fglrx causing APIC errors on AMD 780G

Bug #357457 reported by floid
This bug affects 3 people
Affects          Status      Importance   Assigned to   Milestone
fglrx            New         Undecided    Unassigned
linux (Ubuntu)   Won't Fix   Undecided    Unassigned

Bug Description

Binary package hint: xorg-driver-fglrx

Running an up-to-date Jaunty on AMD64, on an Asus M3A78-EM board (specs: http://www.asus.com/product.aspx?P_ID=KjpYqzmAd9vsTM2D&templete=2) with an Athlon X2 "4850E", I have observed fglrx 2:8.600-0ubuntu2 reliably inducing two APIC errors early in the boot process, on every boot:

[ 20.945703] APIC error on CPU0: 00(08)
[ 20.945711] APIC error on CPU1: 00(08)

I noticed this after catching the system misbehaving more severely and throwing a handful of 00(08) and 08(08) errors across both cores. In fact, I only went looking and found those because gnome-panel suddenly started crashing/looping a few hours into a session, possibly due to a crashing indicator-applet(?!). I'm not sure what was cause and what was effect there, though it was very surprising and reproducible across reboots this morning. Perhaps fglrx corrupted memory or state enough to make a random GNOME process lose its mind, or perhaps the confused GNOME process (repeatable, but resolved by the time I reverted drivers?) put enough load on fglrx to turn up bugs.

The only peripheral other than PATA HD, mouse, and keyboard is a Ralink-based Belkin F5D7050 USB wireless adapter -- left scanning via Network Manager, as I'm using a wired connection.

With the open-source 'radeon' driver, no APIC errors are observed at boot or otherwise, and the system is perfectly stable.

I'm filing this as a courtesy to upstream, if they ever look here, and as documentation for any other affected users.
(If you're looking for the surprisingly hard-to-find decoder ring for APIC error codes, it is buried in the comment in smp_error_interrupt() in arch/x86/kernel/apic.c of your favorite Linux source tree.) Unfortunately, I don't think I'm going to have the time to address this particular issue more thoroughly beyond the dmesg sample and lspci output attached here.
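
For readers who do not have a kernel tree at hand, the decoding can be sketched in a few lines of Python. This is a minimal sketch, not the kernel's own code: the bit names below are transcribed from memory of that comment and should be checked against your own tree, and the helper names (`decode_esr`, `decode_line`) are made up for this example. As far as I understand the handler, the first number is the APIC error status register as read on entry and the number in parentheses is the value read again after the handler writes to the register; treat that reading as an assumption too.

```python
import re

# Bit meanings as listed in the comment in smp_error_interrupt().
# Transcribed from memory: verify against your own kernel source.
ESR_BITS = {
    0: "Send CS error",
    1: "Receive CS error",
    2: "Send accept error",
    3: "Receive accept error",
    4: "Reserved",
    5: "Send illegal vector",
    6: "Received illegal vector",
    7: "Illegal register address",
}

LINE_RE = re.compile(
    r"APIC error on CPU(\d+): ([0-9a-fA-F]{2})\(([0-9a-fA-F]{2})\)"
)


def decode_esr(value):
    """Return the names of all bits set in an 8-bit error status value."""
    return [name for bit, name in ESR_BITS.items() if value & (1 << bit)]


def decode_line(line):
    """Parse one 'APIC error on CPUn: xx(yy)' line, or return None."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    cpu = int(match.group(1))
    first = int(match.group(2), 16)
    second = int(match.group(3), 16)
    return cpu, decode_esr(first), decode_esr(second)


if __name__ == "__main__":
    print(decode_line("[ 20.945703] APIC error on CPU0: 00(08)"))
```

Under that table, the 08 seen throughout this report corresponds to bit 3, "Receive accept error".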

floid (jkanowitz) wrote :

Hmm. This seems to have wound up pointed at "linux-restricted-modules-2.6.15 (Ubuntu)" after attempting to point it to xorg-driver-fglrx. I can't seem to find a more applicable package that Launchpad believes is "in Ubuntu," so pardon if I'm in the wrong place.

mkaz (mubashir-kazia) wrote :

I'm having the same problem on a Gigabyte GA-MA78GM-S2H with the same chipset and fglrx 8.60.40. I don't know whether the messages appeared before upgrading to Jaunty, but I never had the freeze problem before.
I have downgraded the kernel to 2.6.27-14. The messages 'APIC error on CPU0: 00(08)' and 'APIC error on CPU1: 00(08)' still appear while fglrx is initially loading, but I no longer get the 'APIC error on CPU0: 08(08)' and 'APIC error on CPU1: 08(08)' messages and have not noticed the system freeze.

mkaz (mubashir-kazia)
affects: linux-restricted-modules-2.6.15 (Ubuntu) → linux (Ubuntu)
rfried (rfried)
tags: removed: amd64
summary: - [jaunty] fglrx causing APIC errors on SMP AMD64, 780G
+ fglrx causing APIC errors on SMP 780G
summary: - fglrx causing APIC errors on SMP 780G
+ fglrx causing APIC errors on AMD 780G
rfried (rfried) wrote :

I can confirm this bug with Ubuntu 8.04.2/Hardy (kernel 2.6.24-24-generic #1 SMP Wed Apr 15 15:54:25 UTC 2009 i686 GNU/Linux)
using ATI Catalyst 9.4 (fglrx 8.60.3) and Catalyst 9.5 (fglrx 8.61.2),
running on a Gigabyte GA-MA78GM-S2H (AMD 780G chipset; integrated Radeon HD 3200 GPU; AMD Athlon X2 4850e CPU).

The kernel log shows the APIC error:
  [ 42.198172] [fglrx] Maximum main memory to use for locked dma buffers: 2395 MBytes.
  [ 42.198231] [fglrx] vendor: 1002 device: 9610 count: 1
  [ 42.198707] [fglrx] ioport: bar 1, base 0xee00, size: 0x100
  [ 42.199138] [fglrx] Driver built-in PAT support is enabled successfully
  [ 42.199200] [fglrx] module loaded - fglrx 8.61.2 [Apr 28 2009] with 1 minors
  [ 57.714534] [fglrx] GART Table is not in FRAME_BUFFER range
  [ 57.715407] [fglrx] Firegl kernel thread PID: 7326
  [ 57.718888] APIC error on CPU1: 00(08)
  [ 57.718936] APIC error on CPU0: 00(08)
  [ 57.734356] [fglrx] Gart USWC size:1202 M.
  [ 57.734361] [fglrx] Gart cacheable size:60 M.
  [ 57.734366] [fglrx] Reserved FB block: Shared offset:0, size:1000000
  [ 57.734368] [fglrx] Reserved FB block: Unshared offset:1fffc000, size:4000

Packages installed:
  fglrx-amdcccle 2:8.612-0ubuntu1
  fglrx-kernel-source 2:8.612-0ubuntu1
  fglrx-modaliases 2:8.612-0ubuntu1
  xorg-driver-fglrx 2:8.612-0ubuntu1

When loading the Hardy-provided Catalyst 8.3 (fglrx 8.47.3) drivers from the linux-restricted-modules-generic package,
_no_ APIC error is reported.

Apart from this kernel log error message, no failure occurs. The system operates normally.

Kind Regards,
Roland

gzahl (gzahl) wrote :

Hello,

This bug affects me too. I also have problems with video playback. When I play a video (the source does not matter), it hangs from time to time for a varying period (usually a few minutes). While the video hangs, other processes hang as well, but it only seems to take down GNOME applications such as gnome-do or the terminal; xterm and Firefox, for example, are never affected.
It is a really annoying bug, but I'm not sure whether it is related to this problem. Any idea how I could find out?
Kind regards
Manuel

Kim Botherway (dj-dvant) wrote :

I finally tracked down why my computer would pause for 2 minutes every so often. Typically I would hit reset, but today I left the machine running, and within 2 minutes the screen came back to life. While the screen was frozen, other users' network connections running through my machine worked perfectly; it was just the screen that was locked up. I have tried the latest AMD drivers as well.

Description: Ubuntu 9.04
Release: 9.04
Codename: jaunty

vendor_id : AuthenticAMD
cpu family : 15
model : 107
model name : AMD Athlon(tm) 64 X2 Dual Core Processor 5600+

Video Chipset : AMD 780G
Memory : 4Gb

xorg-driver-fglrx:
  Installed: 2:8.612-0ubuntu1
  Candidate: 2:8.612-0ubuntu1
  Version table:
 *** 2:8.612-0ubuntu1 0
        100 /var/lib/dpkg/status

fglrx-amdcccle:
  Installed: 2:8.612-0ubuntu1
  Candidate: 2:8.612-0ubuntu1
  Version table:
 *** 2:8.612-0ubuntu1 0
        100 /var/lib/dpkg/status

syslog:Jun 15 16:20:49 server1 kernel: [41903.192807] APIC error on CPU1: 08(08)
syslog:Jun 15 16:20:49 server1 kernel: [41903.192891] APIC error on CPU0: 08(08)
syslog:Jun 15 16:20:49 server1 kernel: [41903.204428] APIC error on CPU0: 08(08)
syslog:Jun 15 16:20:49 server1 kernel: [41903.204433] APIC error on CPU1: 08(08)
syslog:Jun 15 18:58:06 server1 kernel: [51339.956857] APIC error on CPU1: 08(08)
syslog:Jun 15 18:58:06 server1 kernel: [51339.956864] APIC error on CPU0: 08(08)
syslog.0:Jun 14 19:24:17 server1 kernel: [ 84.469384] APIC error on CPU1: 00(08)
syslog.0:Jun 14 19:24:17 server1 kernel: [ 84.469388] APIC error on CPU0: 00(08)
syslog.0:Jun 15 04:42:53 server1 kernel: [ 27.708206] APIC error on CPU1: 00(08)
syslog.0:Jun 15 04:42:53 server1 kernel: [ 27.708211] APIC error on CPU0: 00(08)
syslog.0:Jun 15 04:44:55 server1 kernel: [ 149.246565] APIC error on CPU1: 08(08)
syslog.0:Jun 15 04:44:55 server1 kernel: [ 149.246570] APIC error on CPU0: 08(08)
syslog.0:Jun 15 04:47:52 server1 kernel: [ 325.831562] APIC error on CPU0: 08(08)
syslog.0:Jun 15 04:47:52 server1 kernel: [ 325.831566] APIC error on CPU1: 08(08)
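
A grep result like the one above is easier to compare across machines once it is tallied by code pair. The following is a minimal Python sketch under that assumption; the function name `tally_apic_errors` is invented for this example, and the sample lines are copied from the log excerpt above.

```python
import re
from collections import Counter

APIC_RE = re.compile(r"APIC error on CPU(\d+): ([0-9a-f]{2})\(([0-9a-f]{2})\)")


def tally_apic_errors(lines):
    """Count APIC error lines per 'xx(yy)' code pair, ignoring the CPU number."""
    counts = Counter()
    for line in lines:
        match = APIC_RE.search(line)
        if match:
            counts[f"{match.group(2)}({match.group(3)})"] += 1
    return counts


# Four lines taken from the excerpt above.
sample = [
    "syslog:Jun 15 16:20:49 server1 kernel: [41903.192807] APIC error on CPU1: 08(08)",
    "syslog:Jun 15 16:20:49 server1 kernel: [41903.192891] APIC error on CPU0: 08(08)",
    "syslog.0:Jun 14 19:24:17 server1 kernel: [ 84.469384] APIC error on CPU1: 00(08)",
    "syslog.0:Jun 14 19:24:17 server1 kernel: [ 84.469388] APIC error on CPU0: 00(08)",
]

if __name__ == "__main__":
    for code, count in sorted(tally_apic_errors(sample).items()):
        print(code, count)
```

Separating the two codes matters in this thread because several reporters see 00(08) only while fglrx is loading, whereas 08(08) is the one reported alongside the pauses.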

Kim Botherway (dj-dvant) wrote :

I have some evidence that the new fglrx driver fixes the problem. It's only been 12 hours since I built and installed the new packages from AMD, but so far there have been no pauses and no APIC errors.

xorg-driver-fglrx:
  Installed: 2:8.620-0ubuntu1
  Candidate: 2:8.620-0ubuntu1
  Version table:
 *** 2:8.620-0ubuntu1 0
        100 /var/lib/dpkg/status

Kim Botherway (dj-dvant) wrote :

36 hours later and I haven't had any pauses using the new fglrx driver.

gzahl (gzahl) wrote :

Hi,

I seem to have a similar problem to yours, but the new driver didn't fix it. The open-source radeon driver 6.12.1 works just fine, though.

Greetings
Manuel

floid (jkanowitz) wrote :

Still trying to sit this one out due to time constraints/the proprietary nature, but:

@djdvant, @gzahl: rfried's already showing this on plain i686, but it's not clear to me if you're running i686 or 64-bit kernels.

@rfried: I see you took SMP out of the subject, but you're still running a dual-core chip like everyone else. Since the APICs are involved, it might be interesting to confirm if forcing UP brings stability.

Also, I gather the Ubuntu-based Neuros LINK PVR is using this same hardware, and it may be worth a look to see how they're dealing with the issue. But it could be that they're still basing their distribution on 8.10 and haven't run into it yet... or not using fglrx at all, I'm not sure.

Kim Botherway (dj-dvant) wrote :

I am running x86_64 kernel.

fglrx 8.62 seems to have fixed the problems, at least X has stopped pausing for several minutes before resuming.

-None- (mike-nycmoma) wrote :

I have the same problem on a Gigabyte GA-MA78GM-S2H with an AMD Athlon(tm) 64 X2 Dual Core Processor 4000+.
My computer pauses for 1-2 minutes at random, 2-3 times a day. Nothing shows up in the logs afterwards.
dmesg shows the following:
[ 15.159106] sysctl net.netfilter.nf_conntrack_acct=1 to enable it.
[ 15.165222] VBoxNetFlt: dbg - g_abExecMemory=ffffffffa06e87e0
[ 15.174578] VBoxNetAdp: dbg - g_abExecMemory=ffffffffa070af80
[ 16.064849] Bluetooth: BNEP (Ethernet Emulation) ver 1.3
[ 16.064852] Bluetooth: BNEP filters: protocol multicast
[ 16.914755] ppdev: user-space parallel port driver
[ 17.836410] [fglrx] GART Table is not in FRAME_BUFFER range
[ 17.838679] fglrx_pci 0000:01:05.0: irq 2300 for MSI/MSI-X
[ 17.852712] [fglrx] Firegl kernel thread PID: 3486
[ 17.863453] APIC error on CPU1: 00(08)
[ 17.863459] APIC error on CPU0: 00(08)
[ 17.865974] [fglrx] Gart USWC size:803 M.
[ 17.865978] [fglrx] Gart cacheable size:60 M.
[ 17.865983] [fglrx] Reserved FB block: Shared offset:0, size:1000000
[ 17.865986] [fglrx] Reserved FB block: Unshared offset:fffc000, size:4000
[ 19.782632] JBD: barrier-based sync failed on dm-4:8 - disabling barriers
[ 22.880019] eth0: no IPv6 routers present
[ 25.860021] virbr0: no IPv6 routers present
[ 79.000034] Clocksource tsc unstable (delta = -261862324 ns)
[ 110.920595] JBD: barrier-based sync failed on dm-3:8 - disabling barriers

Felipe Sánchez (felipiwi) wrote :

How can I build the latest version of the fglrx driver?

Balr0g (y-launchpad-net-oliverkonz-de) wrote :

I get the same APIC errors on all 4 CPU cores, though I haven't noticed any hangups yet.

OS: Ubuntu 9.04 (x86, 32bit)
Kernel: 2.6.28-15-generic
Chipset: AMD 790GX
CPU: AMD Phenom(tm) 9350e Quad-Core Processor
Graphics: Onboard Radeon HD 3300 with fglrx (Using the driver from the Ubuntu repositories.)

rfried (rfried) wrote :

As suggested by floid, I tested:

 - Ubuntu 8.04 kernel 2.6.24-24.59 (the latest as of 2009-08-29),
    with kernels built with SMP disabled and UP_APIC enabled

 - ATI Catalyst 9.8 (8.640)

 - Architecture i686 (32bit)

The kernel APIC error message still comes up every time fglrx initializes the graphics hardware
(e.g. on X startup and on every console switch to the X server's VT (ctrl-f1 -> ctrl-f7)).

  [ 384.946962] Linux agpgart interface v0.102
  [ 384.990098] [fglrx] Maximum main memory to use for locked dma buffers: 2646 MBytes.
  [ 384.990307] [fglrx] vendor: 1002 device: 9610 count: 1
  [ 384.990471] [fglrx] ioport: bar 1, base 0xee00, size: 0x100
  [ 384.990485] ACPI: PCI Interrupt 0000:01:05.0[A] -> GSI 18 (level, low) -> IRQ 16
  [ 384.990490] PCI: Setting latency timer of device 0000:01:05.0 to 64
  [ 384.991008] [fglrx] Driver built-in PAT support is enabled successfully
  [ 384.991221] [fglrx] module loaded - fglrx 8.64.3 [Jul 14 2009] with 1 minors
  [ 853.195360] [fglrx] GART Table is not in FRAME_BUFFER range
  [ 853.197088] [fglrx] Firegl kernel thread PID: 8071
  [ 853.200338] APIC error on CPU0: 08(08)
  [ 341.554523] [fglrx] Gart USWC size:869 M.
  [ 341.554530] [fglrx] Gart cacheable size:344 M.
  [ 341.554736] [fglrx] Reserved FB block: Shared offset:0, size:1000000
  [ 341.554739] [fglrx] Reserved FB block: Unshared offset:fffc000, size:4000
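
The claim that the error follows fglrx initialization can be checked mechanically against a saved dmesg. Below is a minimal Python sketch, assuming the "[fglrx] Firegl kernel thread PID" line is a usable marker for initialization and that 0.1 s is a reasonable window; both are assumptions chosen for this example, as is the function name.

```python
import re

# Kernel timestamp followed by the message text, e.g. "[ 853.197088] ...".
TS_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s*(.*)")


def apic_errors_after_fglrx_init(lines, window=0.1):
    """Return (timestamp, delay) for each APIC error logged within `window`
    seconds after a '[fglrx] Firegl kernel thread PID' line."""
    last_init = None
    hits = []
    for line in lines:
        match = TS_RE.search(line)
        if not match:
            continue
        timestamp, message = float(match.group(1)), match.group(2)
        if message.startswith("[fglrx] Firegl kernel thread PID"):
            last_init = timestamp
        elif message.startswith("APIC error on CPU") and last_init is not None:
            delay = timestamp - last_init
            if 0 <= delay <= window:
                hits.append((timestamp, round(delay, 6)))
    return hits


# Three lines taken from the log above.
sample = [
    "  [ 853.195360] [fglrx] GART Table is not in FRAME_BUFFER range",
    "  [ 853.197088] [fglrx] Firegl kernel thread PID: 8071",
    "  [ 853.200338] APIC error on CPU0: 08(08)",
]

if __name__ == "__main__":
    print(apic_errors_after_fglrx_init(sample))
```

An APIC error that never lands inside the window would suggest a different trigger, such as the later 08(08) bursts other reporters describe.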

These are the deb packages I created:
  http://mimas.selfip.org/s/apicerr/linux-image-2.6.24-24-rfried_2.6.24-24.59_i386.deb
  http://mimas.selfip.org/s/apicerr/linux-headers-2.6.24-24-rfried_2.6.24-24.59_i386.deb
  http://mimas.selfip.org/s/apicerr/linux-ubuntu-modules-2.6.24-24-rfried_2.6.24-24.39_i386.deb
  http://mimas.selfip.org/s/apicerr/linux-restricted-modules-2.6.24-24-rfried_2.6.24.18-24.1_i386.deb
  http://mimas.selfip.org/s/apicerr/fglrx-kernel-source_8.640-0ubuntu1_i386.deb
  http://mimas.selfip.org/s/apicerr/xorg-driver-fglrx_8.640-0ubuntu1_i386.deb
  http://mimas.selfip.org/s/apicerr/fglrx-amdcccle_8.640-0ubuntu1_i386.deb
  http://mimas.selfip.org/s/apicerr/fglrx-modaliases_8.640-0ubuntu1_i386.deb
  (flavour "rfried", so there is no conflict with the package versions in the Ubuntu updates repository)
  I did not have stability problems here.
  But anyone who does may want to try the uniprocessor kernels.

floid (jkanowitz) wrote :

No longer an issue for me with fglrx 2:8.660-0ubuntu4, linux 2.6.31-17.54 (=="2.6.31.17.30"? I am not so familiar with interpreting dpkg-query output yet.).

I'd mark it resolved, but how are things for the 8.04LTS crowd?
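
On reading the version strings: Debian-style package versions have the form [epoch:]upstream_version[-debian_revision], with the epoch before the first colon and the revision after the last hyphen. A minimal Python sketch of that split follows; the function name is invented for this example, and it only splits the string without implementing dpkg's full comparison rules.

```python
def split_debian_version(version):
    """Split '[epoch:]upstream[-revision]' into (epoch, upstream, revision)."""
    epoch, sep, rest = version.partition(":")
    if not sep:
        # No colon present: the epoch defaults to 0.
        epoch, rest = "0", version
    upstream, sep, revision = rest.rpartition("-")
    if not sep:
        # No hyphen present: there is no separate revision part.
        upstream, revision = rest, ""
    return int(epoch), upstream, revision


if __name__ == "__main__":
    for v in ("2:8.660-0ubuntu4", "2.6.31-17.54", "2.6.31.17.30"):
        print(v, split_debian_version(v))
```

Split this way, "2.6.31-17.54" has upstream version 2.6.31 and revision 17.54, while "2.6.31.17.30" has no revision part at all, so the two strings are not the same version. They most likely belong to different packages, though that is a guess rather than something this thread confirms.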

rfried (rfried) wrote :

I installed fglrx 2:8.681-0ubuntu1 on hardy running kernel 2.6.24-26-generic #1 SMP today.

The messages

[ 661.170093] APIC error on CPU1: 08(08)
[ 661.170242] APIC error on CPU0: 08(08)

still appear in the console logs each time the X server initializes.

What is new is that several additional console messages such as
[ 599.642650] Assertion failed in ../../../../../../../../drivers/2d/lnx/fgl/drm/kernel/hal_rs780.c at line: 53
appear when switching from the X server to the console (chvt 1).

But as always, there is no stability problem here.

Jeremy Foshee (jeremyfoshee) wrote :

Hi floid,

Please be sure to confirm this issue exists with the latest development release of Ubuntu. ISO CD images are available from http://cdimage.ubuntu.com/releases/lucid . If the issue remains, please run the following command from a Terminal (Applications->Accessories->Terminal). It will automatically gather and attach updated debug information to this report.

apport-collect -p linux 357457

Also, if you could test the latest upstream kernel available that would be great. It will allow additional upstream developers to examine the issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text. Please let us know your results.

Thanks in advance.

[This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: needs-kernel-logs
tags: added: needs-upstream-testing
tags: added: kj-triage
Changed in linux (Ubuntu):
status: New → Incomplete
Brad Figg (brad-figg) wrote : Unsupported series, setting status to "Won't Fix".

This bug was filed against a series that is no longer supported and so is being marked as Won't Fix. If this issue still exists in a supported series, please file a new bug.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: Incomplete → Won't Fix