Resuming from sleep leaves Xorg using 100% CPU and unable to turn on the screen

Bug #968265 reported by Simon Strandman
This bug affects 3 people
Affects: fglrx-installer (Ubuntu)
Status: Invalid
Importance: High
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

With the open source radeon driver the system correctly resumes from sleep, but after installing fglrx or fglrx-updates the screen is just black when resuming. The computer hasn't actually hung, so I can SSH to it. At that point Xorg is using 100% CPU and I'm unable to kill or restart the process (even when using kill -9). I'm not sure what to do about this, but I guess it should be possible to check what Xorg is stuck doing.
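One way to get a first hint about what an unkillable process is doing is to read its entry under /proc over SSH. The sketch below is only an illustration, not something taken from this report: it parses /proc/<pid>/stat and returns the scheduler state plus the user and kernel CPU tick counters. The function names are hypothetical, and the field positions are assumed to follow the layout documented in proc(5).

```python
from pathlib import Path


def parse_proc_stat(text):
    """Parse the contents of /proc/<pid>/stat into a small dict.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split on the LAST ')' instead of on spaces.
    """
    head, _, tail = text.rpartition(")")
    pid_str, _, comm = head.partition("(")
    fields = tail.split()
    # After the command name: fields[0] is the state (field 3 in proc(5)),
    # fields[11] is utime (field 14) and fields[12] is stime (field 15).
    return {
        "pid": int(pid_str),
        "comm": comm,
        "state": fields[0],
        "utime_ticks": int(fields[11]),
        "stime_ticks": int(fields[12]),
    }


def read_proc_stat(pid):
    """Read and parse /proc/<pid>/stat for a live process (Linux only)."""
    return parse_proc_stat(Path(f"/proc/{pid}/stat").read_text())
```

Calling read_proc_stat twice a few seconds apart and comparing the counters shows where the CPU time is going. If stime_ticks keeps growing while utime_ticks stays flat, the process is probably spinning inside the kernel (for example in a driver ioctl), which would also be consistent with kill -9 having no visible effect.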

WORKAROUND: Use fglrx 12.8 and later.

ProblemType: Bug
DistroRelease: Ubuntu 12.04
Package: fglrx-updates 2:8.960-0ubuntu1
ProcVersionSignature: Ubuntu 3.2.0-20.33-generic 3.2.12
Uname: Linux 3.2.0-20-generic x86_64
NonfreeKernelModules: fglrx
ApportVersion: 1.95-0ubuntu1
Architecture: amd64
Date: Thu Mar 29 16:06:27 2012
InstallationMedia: Ubuntu 12.04 LTS "Precise Pangolin" - Alpha amd64 (20120306)
ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 LANG=sv_SE.UTF-8
 SHELL=/bin/bash
SourcePackage: fglrx-installer-updates
UpgradeStatus: No upgrade log present (probably fresh install)

Simon Strandman (nejsimon) wrote :

The Xorg log from when this happens doesn't seem to contain anything interesting.

Bryce Harrington (bryce) wrote :

Often with these 100% X cpu bugs, the problem is caused by a client application, which is stuck in a loop making X requests. In such cases, the debugging procedure is to examine your process table (e.g. `ps aux`) and start killing processes one by one until the system unfreezes.

However, the fact that this occurs on resume makes this bug sound a bit different. It can't hurt to try the above, and it might turn up something (I'd probably test killing gnome-settings-daemon, compiz/unity and some of the other gnome infrastructural bits first).

If all that fails, you can connect to the running X process using gdb and gather a series of backtraces to see what series of routines it is hitting. strace can be used here too, although with the X server it produces so much output it's often unusable for diagnostics.
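The backtrace-gathering step could be scripted roughly as follows. This is a minimal sketch under assumptions, not an official procedure: it assumes gdb is installed, that the script runs as root on the affected machine, and that the X server's PID is already known. The helper names (build_gdb_command, collect_backtraces) are made up for illustration.

```python
import subprocess
import time


def build_gdb_command(pid):
    """Build a non-interactive gdb invocation that dumps all thread backtraces."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def collect_backtraces(pid, samples=5, interval=1.0, timeout=30.0):
    """Attach to `pid` several times and return the gdb output of each attempt.

    A timeout is used because gdb itself may hang if the target is stuck
    inside the kernel; such attempts are recorded as None.
    """
    outputs = []
    for _ in range(samples):
        try:
            result = subprocess.run(
                build_gdb_command(pid),
                capture_output=True,
                text=True,
                timeout=timeout,
            )
            outputs.append(result.stdout)
        except subprocess.TimeoutExpired:
            outputs.append(None)
        time.sleep(interval)
    return outputs
```

Taking several samples rather than one makes it easier to see which routines keep recurring. Note that if the target never leaves kernel space, gdb may be unable to attach at all, and the timeout may not be enough to recover a stuck gdb process.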

Changed in fglrx-installer (Ubuntu):
status: New → Incomplete
Simon Strandman (nejsimon) wrote :

Hello!

I tried killing some processes but it didn't help. I also attached gdb to the Xorg process but gdb just hung and wouldn't produce a backtrace, even if I ran "killall Xorg" or "killall -9 Xorg" from another ssh session. It seems impossible to stop the Xorg process once it gets into this state.

Anyway, I just discovered an error in kern.log. It looks similar to bug #881526. That bug isn't about suspend/resume issues but I guess it could be the same problem. I also guess there isn't much to do about this then except waiting for AMD to fix it. :(

Apr 11 18:40:38 simon-305U1A kernel: [ 416.884015] [fglrx] ASIC hang happened
Apr 11 18:40:38 simon-305U1A kernel: [ 416.884029] Pid: 1049, comm: Xorg Tainted: P O 3.2.0-23-generic #36-Ubuntu
Apr 11 18:40:38 simon-305U1A kernel: [ 416.884033] Call Trace:
Apr 11 18:40:38 simon-305U1A kernel: [ 416.884147] [<ffffffffa00d4ffe>] KCL_DEBUG_OsDump+0xe/0x10 [fglrx]
Apr 11 18:40:38 simon-305U1A kernel: [ 416.884196] [<ffffffffa00e258c>] firegl_hardwareHangRecovery+0x1c/0x50 [fglrx]
Apr 11 18:40:38 simon-305U1A kernel: [ 416.884281] [<ffffffffa017e119>] ? _ZN4Asic9WaitUntil15ResetASICIfHungEv+0x9/0x10 [fglrx]
Apr 11 18:40:38 simon-305U1A kernel: [ 416.884360] [<ffffffffa017e0bc>] ? _ZN4Asic9WaitUntil15WaitForCompleteEv+0x9c/0xf0 [fglrx]
Apr 11 18:40:38 simon-305U1A kernel: [ 416.884440] [<ffffffffa01813cc>] ? _ZN8AsicR60012IO_QuietdownEv+0x2c/0x40 [fglrx]
Apr 11 18:40:38 simon-305U1A kernel: [ 416.884520] [<ffffffffa0178bd4>] ? _ZN15ExecutableUnits10CPRingIdleE15idle_WaitMethod12_QS_CP_RING_+0x134/0x1e0 [fglrx]
Apr 11 18:40:38 simon-305U1A kernel: [ 416.884600] [<ffffffffa0178a4c>] ? _ZN15ExecutableUnits7PM4idleE15idle_WaitMethod+0x4c/0x90 [fglrx]
Apr 11 18:40:38 simon-305U1A kernel: [ 416.884678] [<ffffffffa01785b6>] ? _ZN15ExecutableUnits9assertPM4Eb+0x56/0x70 [fglrx]
Apr 11 18:40:38 simon-305U1A kernel: [ 416.884756] [<ffffffffa0182959>] ? _ZN8AsicR6009assertPM4Eb+0x39/0x80 [fglrx]
Apr 11 18:40:38 simon-305U1A kernel: [ 416.884809] [<ffffffffa0101550>] ? firegl_cmmqs_disabledriver+0xf0/0xf0 [fglrx]
Apr 11 18:40:38 simon-305U1A kernel: [ 416.884876] [<ffffffffa0150d65>] ? CMMQS_ReinitializeHardware+0x75/0xd0 [fglrx]
Apr 11 18:40:38 simon-305U1A kernel: [ 416.884929] [<ffffffffa010266b>] ? firegl_cmmqs_Enable_QS+0xbb/0x160 [fglrx]
Apr 11 18:40:38 simon-305U1A kernel: [ 416.884938] [<ffffffff81073537>] ? capable+0x17/0x20
Apr 11 18:40:38 simon-305U1A kernel: [ 416.884990] [<ffffffffa0101562>] ? firegl_cmmqs_enableqs+0x12/0x70 [fglrx]
Apr 11 18:40:38 simon-305U1A kernel: [ 416.885041] [<ffffffffa0101550>] ? firegl_cmmqs_disabledriver+0xf0/0xf0 [fglrx]
Apr 11 18:40:38 simon-305U1A kernel: [ 416.885088] [<ffffffffa00de12d>] ? firegl_ioctl+0x1ed/0x250 [fglrx]
Apr 11 18:40:38 simon-305U1A kernel: [ 416.885130] [<ffffffffa00ce9be>] ? ip_firegl_unlocked_ioctl+0xe/0x20 [fglrx]
Apr 11 18:40:38 simon-305U1A kernel: [ 416.885137] [<ffffffff81189cfa>] ? do_vfs_ioctl+0x8a/0x340
Apr 11 18:40:38 simon-305U1A kernel: [ 416.885143] [<ffffffff81659f0c>] ? __schedule+0x3cc/0x6f0
Apr 11 18:40:38 simon-305U1A kernel: [ 416.885149] [<ffffffff8118a041>] ? sys_ioctl+0x91/...
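For anyone checking their own kern.log for the same signature, a small script along these lines could pull out the hang marker and the frames that follow it. It is only a sketch: the regular expression is written against the line format shown above and may need adjusting for other kernels, and the function name is hypothetical.

```python
import re

# Matches lines such as:
#   ... kernel: [ 416.884147] [<ffffffffa00d4ffe>] KCL_DEBUG_OsDump+0xe/0x10 [fglrx]
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?:\?\s+)?"
    r"(?P<symbol>[^\s+]+)\+\S+(?:\s+\[(?P<module>\w+)\])?"
)
HANG_MARKER = "[fglrx] ASIC hang happened"


def find_asic_hangs(lines):
    """Return one list of (symbol, module) frames per ASIC hang found in `lines`."""
    hangs = []
    current = None
    for line in lines:
        if HANG_MARKER in line:
            # Start a new trace each time the driver reports a hang.
            current = []
            hangs.append(current)
            continue
        if current is None:
            continue  # ignore everything before the first hang marker
        match = FRAME_RE.search(line)
        if match:
            current.append((match.group("symbol"), match.group("module")))
    return hangs
```

It can be fed an open file, for example find_asic_hangs(open("/var/log/kern.log")). The sketch keeps collecting frames until the next hang marker, so unrelated traces logged later would be attributed to the preceding hang.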


Bryce Harrington (bryce) wrote :

Good find, although the stack traces don't 100% match up. Also sometimes with freeze bugs, crash dumps can be misleading. Still, worth mentioning on the other bug report (which I've done.)

More importantly, the trace proves this is a gpu lockup, not an application issue as originally suspected.

@alberto, please forward to ATI.

Changed in fglrx-installer (Ubuntu):
assignee: nobody → Alberto Milone (albertomilone)
importance: Undecided → High
status: Incomplete → Confirmed
Simon Strandman (nejsimon) wrote :

This might have been fixed in fglrx 12.6 beta! My laptop hasn't crashed yet after a few sleep-resume cycles.

Simon Strandman (nejsimon) wrote :

I can confirm that the problem is fixed in fglrx 12.6 or later. My computer hasn't crashed even once on suspend/resume since updating. However, support for my hardware was briefly dropped in 12.6, so 12.8 is the first version that's stable without the annoying "Unsupported hardware" watermark.

It would be nice if this version could be added to 12.10 before the feature freeze!

penalvch (penalvch) wrote :

Simon Strandman, thank you for reporting this and helping make Ubuntu better. Could you please execute the following via a terminal:
apport-collect -p linux 968265

As well, using the fglrx-installer provided by Ubuntu, could you please provide the information following https://wiki.ubuntu.com/DebuggingKernelSuspend ?

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: high-spu
tags: added: high-cpu
removed: high-spu
Simon Strandman (nejsimon) wrote :

@Christopher

Hello. This problem is fixed in fglrx 12.8 and later so I don't think there is a need to debug it now. However, it would be great if a newer version of fglrx could be backported to 12.04.

I'm using Ubuntu 12.10 now btw so I don't have the problem any more.

Simon

penalvch (penalvch) wrote :

Simon Strandman, thank you for your comments. Regarding them :
>"I'm using Ubuntu 12.10 now btw so I don't have the problem any more."

In 12.10, are you using the version of fglrx-installer that comes in the Ubuntu repositories or 12.8 downloaded from amd.com ?

description: updated
Simon Strandman (nejsimon) wrote :

I'm using the one from the repos (9.000-0ubuntu3) now. I also tried 12.11 beta from amd.com and it works too! I don't think 12.8 works on quantal due to the newer xserver but when I used precise I got 12.8 from amd.com.

penalvch (penalvch) wrote :

Simon Strandman, thank you for providing the requested information. Since you noted fglrx-installer from the Ubuntu repositories works for you in Quantal, did you need a backport of the fix to a release prior to Quantal, or may we close this as Status Invalid?

Simon Strandman (nejsimon) wrote :

For me it's fine if this bug is closed since my issue is solved. But I guess others might have the same issue? There is a very similar bug (#881526) btw, and they might also be helped by an fglrx backport. But feel free to close this one!

penalvch (penalvch) wrote :

Simon Strandman, this bug report is being closed due to your last comment https://bugs.launchpad.net/ubuntu/+source/fglrx-installer/+bug/968265/comments/13 regarding this being fixed for you in Quantal. For future reference you can manage the status of your own bugs by clicking on the current status in the yellow line and then choosing a new status in the revealed drop down box. You can learn more about bug statuses at https://wiki.ubuntu.com/Bugs/Status. Thank you again for taking the time to report this bug and helping to make Ubuntu better. Please submit any future bugs you may find.

no longer affects: linux (Ubuntu)
Changed in fglrx-installer (Ubuntu):
assignee: Alberto Milone (albertomilone) → nobody
status: Confirmed → Invalid