fglrx + compiz fusion won't resume [AMD Feature#7647] [EPR#260329]

Bug #197209 reported by Andrew Hutchings on 2008-03-01
This bug affects 3 people
Affects | Status | Importance | Assigned to | Milestone
fglrx | Fix Released | Wishlist | Unassigned |
fglrx-installer (Ubuntu) | | Medium | Unassigned |
linux-restricted-modules-2.6.24 (Ubuntu) | | Undecided | Unassigned |

Bug Description

Binary package hint: linux-restricted-modules-2.6.24-10-generic

Distribution: Ubuntu Hardy 64bit (up to date 1st March 2008)
Hardware: Macbook Pro Core2Duo with an ATi X1600

With fglrx as a driver, composite enabled and compiz enabled this laptop won't resume properly after suspend. The screen flashes but then stays blank (sometimes with some garbled red characters at the top of the screen) and it appears as if everything else starts up fine again in the background.

Running pm-trace gives:

[ 25.527032] Magic number: 0:32:773
[ 25.527085] hash matches device ptyp9
[ 25.527112] /build/buildd/linux-2.6.24/drivers/rtc/hctosys.c: unable to open rtc device (rtc0)

Trond Thorbjørnsen (tthorb) wrote :

This happens to me too, with an ATI X1400.

With the open-source driver, suspend works perfectly.

I'll attach dmesg.

Bryce Harrington (bryce) wrote :

Could you please attach your /var/log/Xorg.0.log and /etc/X11/xorg.conf?

Changed in linux-restricted-modules-2.6.24:
status: New → Incomplete

Sorry, due to this and other problems I switched to the i386 edition a couple of weeks ago, which works fine with fglrx and suspend. I no longer have the 64-bit edition installed, so I can't test this anymore.

Bryce Harrington (bryce) wrote :

Can anyone else reproduce this issue? If not, within a couple of months I guess we can close it as invalid.

DaveAbrahams (boostpro) wrote :

Yes, I can confirm it! Actually I thought suspend was hopeless until I ssh'd into the resumed machine with the black screen and I noticed this line at the end of Xorg.0.log:

(II) AIGLX: Suspending AIGLX clients for VT switch

Then I did /etc/init.d/gdm stop and killed Xorg and related processes, rmmod fglrx and reloaded it, and restarted gdm to get X back. Then I turned off visual effects and suspend/resume worked perfectly.
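
The recovery sequence described above can be sketched as a small shell function, meant to be run as root over ssh. This is only an illustration of the steps from the comment: the names `run`, `recover_x` and the `DRY_RUN` switch are assumptions added here, and the X server process may be called `X` rather than `Xorg` on some setups.

```shell
#!/bin/sh
# Sketch of the manual recovery sequence (stop gdm, kill X, reload fglrx,
# start gdm). DRY_RUN, run and recover_x are hypothetical names.
DRY_RUN=${DRY_RUN:-1}

run() {
    # In dry-run mode only print the command; otherwise execute it.
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

recover_x() {
    run /etc/init.d/gdm stop      # stop the display manager
    run pkill -x Xorg             # kill an X server left behind (name may differ)
    run rmmod fglrx               # unload the driver module
    run modprobe fglrx            # load it again
    run /etc/init.d/gdm start     # bring X back
}
```

With the default `DRY_RUN=1`, calling `recover_x` only prints the five commands; setting `DRY_RUN=0` would actually execute them.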

I'm attaching relevant files. My hardware is an IBM T60p with ATI FireGL v5200.

A proper fix ought to be possible, but if nothing else, please issue a workaround update that disables/re-enables compiz around suspend/resume. Thanks!

Changed in linux-restricted-modules-2.6.24:
status: Incomplete → Confirmed
DaveAbrahams (boostpro) wrote :

I have various partially-working scripts for suspend.d/resume.d to address this, and will attach my working versions in the morning when I can think straight.

I have the mobility radeon X1400, and suspend and resume "work" always. The only problem I have is sometimes there will be 100% (and I mean 100%) CPU usage on resume, but only when compiz is enabled. Another interesting point is that this only seems to happen when I suspend for long periods of time (i.e. overnight). The last two nights I have tried suspending and in the morning when it resumes there is 100% cpu usage and I must hard boot the computer. However, during the day I suspend and resume multiple times without problems as I am out and about with classes and such.

Attaching /var/log/Xorg.0.log

Bryce Harrington (bryce) wrote :

@Alex, from your description you seem to be having a different problem than the original reporter; please file it as a separate bug report.

@DaveAbrahams, are you also running the 64 bit version? (Check uname -a). Also, your files didn't show a backtrace or error message - please install dbg versions of the xserver and please get a full backtrace. For directions on this, please see http://wiki.ubuntu.com/X/Backtracing .

I have a T60p with Mobility FireGL V5250, running fglrx 8.4 (some problem with distro 8.3) on a 32bit Hardy, dist-upgraded from Gutsy. I can confirm this bug.

Without compiz (running 'metacity --replace' before suspending), suspend (and resume) works (tried 3 times in a row).
With compiz, the first try may succeed, but the second fails with Xorg taking 100% of one core.
I ran a gdb backtrace full on it (like described in http://wiki.ubuntu.com/X/Backtracing), but got only "no symbol" warnings. Yes, I installed the -dbg packages...

Attached it anyway.

Anything more I could help to make this work?

DaveAbrahams (boostpro) wrote :

I am on x86_32:

$ uname -a
Linux mcbain 2.6.24-16-generic #1 SMP Thu Apr 10 13:23:42 UTC 2008 i686 GNU/Linux

I'll give the backtracing approach a try, but my machine is so close to Niklas' that I'm not optimistic we'll get new information.

DaveAbrahams (boostpro) wrote :

Actually it's worse than that. The Backtracing instructions start out by giving me ambiguous answers ("pgrep Xorg" yields two PIDs, neither of which is Xorg, but instead X) and end by giving me no useful information. gdb says that the process is not being run even though "top" says that it is taking 100% of the CPU. I've attached two logs in case they're of any interest.

Josh (jharris0221) wrote :

I am also running a T60p with a 5250 and have very similar symptoms to Niklas Hofer.

Running 64bit kernel. With compiz enabled, suspend works fine, but upon resume I get a blank screen. Can recover by resetting X with CTRL+PrtSc.

After I disabled compiz without a restart, I could resume fine but I got the 100% CPU error. Interestingly, it was pegging one core at a time with 100% but the other one was idling properly. It would randomly bounce between both cores as to which was 100% used. No processes showed using more than 2-5% of the CPU, which was odd.

After a restart without compiz, resume works fine after multiple trials.

Has someone filed a bug for the 100% CPU problem? It's awful! It's the only thing keeping me from using compiz fusion on hardy!

DaveAbrahams (boostpro) wrote :

After resume with compiz, when I ssh into my T60p running the 32-bit kernel with a FireGL v5200, I also get the 100% cpu problem.

When I wrote "I have various partially-working scripts for suspend.d/resume.d to address this" I meant to write that I have *found* such scripts on the web. See http://www.google.com/search?q=25-compiz-stop.sh&ie=utf-8&oe=utf-8&aq=t&rls=com.ubuntu:en-US:unofficial&client=firefox-a for a few of them. I tried to use that general approach to fixing the problem. Actually I have my own version of the suspend hook script that should be much more robust against things like X on displays other than :0 (attached), but have been completely unable to get it to run automatically at suspend time. I'm utterly in the dark about why it isn't working, so if anyone can offer a solution, I'd appreciate it.
My script does seem to work around the problem if I run it from a plain root shell before suspending.

DaveAbrahams (boostpro) wrote :

I should add that this is almost certainly a hardware-related issue. My Dell Inspiron 9300 with ATI Mobility Radeon x300 suspends and resumes just fine with fglrx and compiz.

DaveAbrahams (boostpro) wrote :

My inability to work around this bug appears to be due to https://bugs.launchpad.net/ubuntu/+source/gnome-power-manager/+bug/220383

Sheesh.

DaveAbrahams (boostpro) wrote :

I could in theory get some traction by placing a script in /etc/pm/sleep.d, but I've tried that -- it appears to get called too late... I'm unable to effectively replace compiz with metacity :(

DaveAbrahams (boostpro) wrote :

The actual bug is an interaction with chvt, which is the first thing pm-utils does. One has to get in before chvt and invoke metacity. One way to do that is something like this file, which would go in /etc/pm/config.d/00-compiz-stop

# override the real chvt to stop compiz before doing anything
chvt() {
    # invoke metacity as user "dave"
    sudo -b -H -u dave XAUTHORITY=/home/dave/.Xauthority DISPLAY=:0 metacity --replace gconf

    # wait until compiz.real actually dies
    while pgrep compiz.real; do sleep 0.1; done

    # forget this function definition
    unset -f chvt

    # call the real chvt
    chvt "$1"
}

The above is a proof-of-concept hack. Tomorrow I should have a "real" version that also properly restores compiz on all the screens where it has been killed.
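
The trick above relies on the shell looking up functions before external commands on PATH, so a function named like a command runs in its place until it is unset. A minimal sketch of just that mechanism, using the harmless `basename` as a stand-in for `chvt` (the stand-in and the echo line are assumptions for illustration; nothing here touches compiz or the VT):

```shell
#!/bin/sh
# Demo of shadowing an external command with a one-shot function.
# basename stands in for chvt; the echo stands in for stopping compiz.
basename() {
    echo "pre-hook: runs before the real command"
    # Remove the function so the name resolves to the real binary again.
    unset -f basename
    # This now calls the external basename, not this function.
    basename "$1"
}
```

Calling `basename /etc/pm/config.d` in the same shell prints the pre-hook line followed by `config.d`; any later call prints only the real command's output, because the function has removed itself.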

Curious to see if this works, as I'm stuck the same way on my T60p (ATI 5250) too. Is this a bit of a hack using metacity? I'd rather fix the issue than do an end around.

Mike

DaveAbrahams: Any progress on that "real" version?

DaveAbrahams (boostpro) wrote :

@Alex Strabala: Thanks for asking. Sorry, I've just been busy. I have all the necessary elements but it will be at least a few more days before I can put them together.

DaveAbrahams (boostpro) wrote :

@Mike Phillips: It's all hacks, since we're working around a bug. The alternative hack is to kill compiz directly and then re-start it in whichever way it was launched. I wasn't sure how well the windows would be restored in that case, so I felt more comfortable replacing it with another window manager. However, the metacity --replace technique will end up squishing all the windows onto one desktop/workspace, so the technique of killing compiz is still worth trying. I've attached some code in progress that identifies the process that launched compiz and could thus re-invoke its command line in case you want to play with it or try to make it work for real. I was planning to combine that with some of the code above for my "real" solution. It's still invoking metacity in most cases but there's a case for killing compiz in there too.

Dave, thanks so much for the script. As of this AM, I tried a suspend (worked fine) but resume did not...

Mike

DaveAbrahams (boostpro) wrote :

It turns out that you can do this even a bit more simply. My current script looks like this:

CHVT="`which chvt`"
chvt() {
    killall -s 15 compiz.real
    while pgrep compiz.real; do sleep 0.1; done
    "$CHVT" "$1"
    unset -f chvt
}

I happen to be running fusion-icon, so when the machine resumes I just select "reload window manager" and I'm all set. As mentioned earlier, it's possible to do something cleaner that restores compiz, but I haven't had the time yet.

Rob (robhester) wrote :

Dave,
I registered just to message you. I've also got a T61 with a FireGL5200, and the suspend/restore bug in compiz is what forced me to abandon my last bout with linux. Does your patch fix our problem? On behalf of all other lurkers on the 'net, thank you so much for putting your time into this!

Dave:

The script you attached on 2008-05-01 (25-compiz-stop.sh) seems to have fixed the problem. I have suspended and resumed greater than 15 times without a hiccup. Interestingly enough, fusion-icon seemed to cause problems, so I simply removed it and now things seem to be working perfectly. I don't have to reload compiz or anything. Thanks so much for your contribution; I really really appreciate it.

For anyone else, here are the commands (assuming you downloaded the script into your home directory):

chmod +x ~/25-compiz-stop.sh
sudo cp ~/25-compiz-stop.sh /etc/acpi/suspend.d/25-compiz-stop.sh

You may or may not have to restart after running these. I do not know.

I'd really love to get something working but am struggling greatly.

My suspend procedure goes into text mode before powering down.

Upon resume, I see a flashing cursor in the upper right. Sometimes that stays there, sometimes it goes black. Sometimes I can ctrl+alt+F2 back to a text login (which only sometimes I can use - or is responsive). Usually pressing ctrl+alt+F7 lands me on a black screen where, I cannot get any input to respond. No mouse, no login, no display, no keyboard. Usually here I cannot get back to ctrl+alt+F2 text.

TIA,

Mike

Mike Phillips:

I suggest you make some changes in your /etc/default/acpi-support file.

gksudo gedit /etc/default/acpi-support

Change "SAVE_VBE_STATE=true" to "SAVE_VBE_STATE=false"
Change "POST_VIDEO=true" to "POST_VIDEO=false"

Reboot and you may have better luck.
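
For anyone who prefers not to edit the file by hand, the two changes above could be applied with sed. This is a sketch only: `set_acpi_video_opts` is a hypothetical helper name, and it assumes the settings appear at the start of a line exactly as `SAVE_VBE_STATE=true` and `POST_VIDEO=true`.

```shell
#!/bin/sh
# Sketch: flip SAVE_VBE_STATE and POST_VIDEO to false in the given file,
# keeping a backup next to it. set_acpi_video_opts is a hypothetical name.
set_acpi_video_opts() {
    file=$1
    cp -p "$file" "$file.bak" || return 1   # keep the original as a backup
    # Read from the backup and rewrite the file with both settings changed.
    sed -e 's/^SAVE_VBE_STATE=true/SAVE_VBE_STATE=false/' \
        -e 's/^POST_VIDEO=true/POST_VIDEO=false/' \
        "$file.bak" >"$file"
}
```

It would be run as root with /etc/default/acpi-support as the argument; other lines in the file are left untouched.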

DaveAbrahams (boostpro) wrote :

Alex: I'm surprised that 25-compiz-stop.sh is actually working for you. It doesn't work for me in its current form. If it's working for you, it would be a good thing to log everything it is doing so we can see what's going on.

I think the problem with fusion-icon may be that it tries to restart compiz after it is killed; you may have to disable the crash handler plugin.

I am currently recommending my script of 2008-05-04

Mike Phillips: yes, you'll need those changes. And if you ever use the USB ports in a docking station you probably want to set "DOUBLE_CONSOLE_SWITCH=true" as well.

I do usually dock my T60p into a port replicator but not always. I had some success 2 times in a row! But then I saw a bug that CPU0 is stuck for 11s. I also noticed the load was unusually high but processor use was ok. I tried a manual powernowd stop before suspend and no good either.

I'm still stuck. :(

Hope I'm helping someone with all my pain. ugh...

thanks for all the support in this bug thread!

Mike

DaveAbrahams (boostpro) wrote :

Mike,

* The docking station issue is real (https://bugs.launchpad.net/ubuntu/+bug/218760).

* I see the same "stuck for 11s" message. If you google for it you'll see that many others do, too. For me it seems to be benign when all my suspend/resume workarounds are in place. Some people think this is a kernel 2.6.24 bug, although I tend to suspect fglrx, which introduces about 11s into any suspend cycle.

* My non-hack workaround for this problem is almost ready. You can see its current state at http://news.gmane.org/find-root.php?message_id=%3c87y76n7w19.fsf%40mcbain.luannocracy.com%3e

Rob (robhester) wrote :

I am quite the noob, but I was able to use DaveAbrahams' hack listed in the url above to successfully resume!

First, I did the chmod command given by Alex above, and then used that second command to copy the two files (after I renamed them once downloaded) to the two different folders as specified by Dave's message @ that URL.

The one issue is that the window borders are not enabled upon resume. But a quick alt-F2 and typing "compiz" fixed that. It would be nice to not have to do this step, but I am tempted to not fiddle anymore now that I finally have resume working in linux!

My machine is a T60p with FireGL 5200 gfx. I am not running any extra beryl or compiz effects, just the enhanced option that comes stock w/Hardy.

Also, my machine is a Core Duo version 1. I don't know the command to check my CPU usage, so I can't give you guys that info.

Thanks a MILLION!

I too had 1 success of 1 so far with the scripts on the gmane.org page! I wasn't about to call it a success yet, but feel much more confident. I used Dave's "Compiz Fusion Icon" trick to reload the windows manager. It might be easier than dropping into VT and typing it by hand...

Mike

PS - Thanks much for all the help on this one! PHEW!

DaveAbrahams (boostpro) wrote :

Rob: the missing window borders are expected unless you are using the partial solution at http://news.gmane.org/find-root.php?message_id=%3c87y76n7w19.fsf%40mcbain.luannocracy.com%3e. In particular, /etc/pm/sleep.d/00compiz is responsible for re-launching the compiz process. If, even with that script in place (and chmod+x'd), you are still not getting your borders back, something is wrong. I suggest you add the line:

  set -x

Just before

  case "$1" in

and the line:

  set +x

just after

  esac

do a suspend/resume cycle, and post the contents of /var/log/pm-suspend.log
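
To make the placement of the two tracing lines concrete, here is a skeleton of a sleep hook with `set -x` and `set +x` wrapped around the case statement. It is not the real 00compiz script: the function name `sleep_hook` and the echo placeholders are assumptions, and it assumes the hook receives the action name (suspend, hibernate, resume or thaw) as its first argument, as the `case "$1" in` line suggests.

```shell
#!/bin/sh
# Skeleton of a sleep hook with shell tracing around the case statement.
# sleep_hook and the echo lines are placeholders, not the actual script.
sleep_hook() {
    set -x                      # start tracing: each command is echoed to stderr
    case "$1" in
        suspend|hibernate)
            echo "going down: $1"
            ;;
        resume|thaw)
            echo "coming back: $1"
            ;;
    esac
    set +x                      # stop tracing
}
```

The trace goes to standard error, so it ends up alongside the hook's other output in whatever log captures it.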

Dave:

Still using 25-compiz-stop.sh. This is extremely strange, but when I suspend for periods of less than 1.5 hours or so, resume works perfectly every time. If it's any longer than that, I get the 100% CPU bug, every time. The past three nights I have suspended and in the morning upon resume I get the bug, despite having gone through many successful suspend/resume cycles throughout each day. I'm not sure what to attach; please let me know. Right now I'm going to try your other script and see how that works.

Dave:

I've used your scripts from the gmane.org (and have chmodded both +x). I have also added the set -x and set +x as instructed, and can successfully resume for short periods (without window borders upon resume). Running "Reload Window Manager" from compiz-icon reloads compiz correctly.

I have to go to a meeting today, so I will suspend before it and resume after it and let you know if it resumes correctly after a several hour period. (see my post immediately above this one).

DaveAbrahams (boostpro) wrote :

Alex,

That's very odd; it appears that by the time 00-compiz-stop.sh gets to run, there are no processes named 'compiz.real' that were spawned by other processes (e.g. compiz-icon). When you get back to a normal state it would be instructive to see the result of 'ps faux' on your system. Before you post it look it over and make sure there's no information you consider sensitive or private, because it will reveal the commands used to launch all your running processes.

Crackh34d (bartmacco) wrote :

I forgot to chmod the file 00compiz, so I did that. But it doesn't make any difference, after five attempts I only get a blank screen upon resume.

Installing fusion-icon didn't change anything either, even if compiz is launched through it.

Crackh34d (bartmacco) wrote :

This is a fresh installation. I tweaked nothing at first, I tried suspending as soon as I could.
Do you have a T60 with X1400? I have a T60p with a FireGL V5250 (which is basically an X1600/X1650, I believe).

A friend of mine has exactly the same notebook as me and can't suspend either with a fresh Intrepid install.

schlehmil (mail-schlehmil) wrote :

It is a T60 with a X1400. The only thing I did was to modify the kernel parameters. GRUB passes vga=794 acpi_sleep=s3_bios pci=bios to the kernel. I do not know if this (acpi_sleep=...) is still needed but it has fixed some of my suspend problems years ago.
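
When comparing setups like this, it can help to confirm that a given parameter actually reached the running kernel. A small sketch, assuming a Linux system where the boot command line is readable from /proc/cmdline; `has_kernel_param` is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: check whether a parameter is present on the kernel command line.
# $1 = parameter to look for, $2 = optional command line string
# (defaults to the contents of /proc/cmdline).
has_kernel_param() {
    cmdline=${2:-$(cat /proc/cmdline)}
    # Pad with spaces so only whole, space-separated words match.
    case " $cmdline " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `has_kernel_param acpi_sleep=s3_bios && echo present` reports whether that option is active on the current boot.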

Crackh34d (bartmacco) wrote :

I tried adding those commands to GRUB, but to no avail...

Hi Bart,

on Sun Oct 26 2008, Crackh34d <bartmacco-AT-hotmail.com> wrote:

> Dear Dave, I also have a T60p and have used an early version of your
> script on Hardy, and it worked fine with fusion-icon.
>
> Today I installed the Intrepid RC, and suspend was working with the open
> source Radeon driver, but I get ´bars´ of screen corruption when
> resuming.

Did you report that bug to the ubuntu people?

> I installed the fglrx driver, so I had to use your scripts again.

Did you report the fact that it still needs hacks/workarounds to the
ubuntu people also?

> I did the following:
>
> - I downloaded the latest 00compiz and 00-compiz-stop.sh to my home directory
> - sudo chmod +x 00-compiz-stop.sh
> - sudo cp 00-compiz-stop.sh /etc/acpi/suspend.d/00-compiz-stop.sh
> - sudo cp 00compiz /etc/pm/sleep/00compiz
>
> Is this correct?

Not on Hardy.
My 00-compiz-stop.sh is in /etc/pm/config.d/
My 00compiz is in /etc/pm/sleep.d/

I can't speak for intrepid.

> With compiz active, I tried suspending, which worked fine. Upon resume I
> was prompted with the login-screen, which is more than the blank screen
> before! But the computer was not responding at all, I could only move my
> mouse. No keyboard input at all, and the cursor in the text field wasn´t
> blinking. CTRL-ALT-Fx didnt work either.
>
> After a hard reboot I tried the command metacity --replace before
> suspending, after which suspending/resuming worked like a charm (Except
> for the Atheros wifi, but that is another bug I have to file I guess :P)

Well, that's promising. I would try putting the scripts where I put
them.

> After another reboot I tried to suspend again with compiz active, but
> this time I didn't even get the login screen.
>
> I have not installed fusion-icon yet, that is the first thing I'm gonna
> try now.
>
> I would appreciate some more help with this, and maybe see if others
> have it too with Intrepid RC?

I probably can't go much further with this than I have already.
Although I will probably make a feeble attempt to get Intrepid to work
on my T60p, I am not going to sink a lot more time into getting Ubuntu
to work properly on a laptop, and the Ubuntu devs are not picking up my
workarounds and folding them into the distro. My Hardy suspend/resume
stopped working reliably again a few weeks ago, for unknown reasons.
I'm planning to switch to Macbook Pro as soon as the new 17" models come
out; I just can't waste any more of my life getting the thing to an
acceptable baseline working state.

Regards,

--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com

DaveAbrahams (boostpro) wrote :

on Sun Oct 26 2008, schlehmil <mail-AT-schlehmil.org> wrote:

> It is a T60 with a X1400. The only thing I did was to modify the kernel
> parameters. GRUB passes vga=794 acpi_sleep=s3_bios pci=bios to the
> kernel. I do not know if this (acpi_sleep=...) is still needed but it
> has fixed some of my suspend problems years ago.

Yeah, as far as I know the bug addressed by these scripts is a
FireGL-52xx-only issue.

--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com

Hello Dave,

No, it is not. Suspend also fails with X1400 on my T60 or X300 on my old T43 on Ubuntu Hardy without the scripts you have posted above. With Intrepid everything works fine for me. I do not know if this bug has been fixed in the ATI binary driver or in compiz.

Regards,
Thilo

Crackh34d (bartmacco) wrote :

Dear Dave,

I put those files where you told me, and had one perfect suspend/resume! But when I tried again for a few times, the screen remained black. If I manually start metacity before suspending, all is fine, as expected.

I will do some more testing

It is a shame the Ubuntu-devs don't pick up on your scripts, but I do really appreciate your work! :)

I think my problem with suspending using the open source driver is related to Bug #272969

But is it correct that your file 25-compiz-stop.sh is no longer needed? I only need 00compiz and 00-compiz-stop.sh?

Thanks for your great work so far,
Bart

on Mon Oct 27 2008, Crackh34d <bartmacco-AT-hotmail.com> wrote:

> Dear Dave,
>
> I put those files where you told me, and had one perfect suspend/resume!
> But when I tried again for a few times, the screen remained black. If I
> manually start metacity before suspending, all is fine, as expected.
>
> I will do some more testing
>
> It is a shame the Ubuntu-devs don't pick up on your scripts, but I do
> really appreciate your work! :)
>
> I think my problem with suspending using the open source driver is
> related to Bug #272969
>
> But is it correct that your file 25-compiz-stop.sh is no longer
> needed?

Yes, that's correct.

> I only need 00compiz and 00-compiz-stop.sh?
>
> Thanks for your great work so far,

You're welcome!

--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com

[This is an automated message]

As of Intrepid (8.10), we have a dedicated package 'fglrx-installer' for fglrx bugs, which now includes a process for upstreaming bugs to AMD.

  http://bugs.launchpad.net/ubuntu/+source/fglrx-installer

To transition your bug into the new fglrx-package, we need your help. Please do the following:

 a. Verify the bug occurs in Intrepid.
     (Intrepid ISOs: http://cdimage.ubuntu.com)
 b. If you haven't already, please include in the bug:
     * Your /var/log/Xorg.0.log
     * The output of `lspci -vvnn`
     * Steps to reproduce the issue
 c. Click 'Also affects distribution'
 d. Set 'Source Package Name' to 'fglrx-installer'
 e. Click Continue

Thank you. This will assist us in reviewing and upstreaming your fglrx bug, as appropriate.

[We'll expire the fglrx bugs in l-r-m-* in a month or so.]

Crackh34d (bartmacco) wrote :

The bug is also in Intrepid

Ok, here is my Xorg.0.log and output of lspci -vvnn.

To reproduce, you need to have Compiz running when suspending. Upon resume, the screen will stay blank. When Compiz was running but you turn it off just before suspending (e.g. with fusion-icon), suspend and resume work fine.

Bryce Harrington (bryce) on 2008-11-03
Changed in fglrx-installer:
importance: Undecided → Medium
status: New → Confirmed
Mario Limonciello (superm1) wrote :

This bug is sufficient to forward upstream now.

Changed in fglrx-installer:
status: Confirmed → Triaged
Changed in linux-restricted-modules-2.6.24:
status: Confirmed → Won't Fix
Bryce Harrington (bryce) on 2008-11-24
Changed in fglrx:
importance: Undecided → Medium
status: New → Confirmed
Jarkko Lietolahti (jarkko-jab) wrote :

Resume just hangs with Jaunty.
1) Suspend ok
2) Resume starts to work, there's the unlocking (username,password) widget on the screen but no text inside it. Mouse cursor moves for a while (~1min) but then it also freezes. Hard-drive light shows some activity. Magic keys (Alt-SysRq-REISUB) do not work. Only option is to hard reboot by keeping power button pressed on for 10 seconds.

I think I've "some visual desktop effects" turned on, even though there's no check in the settings (does the tick go away if I change the setting through ccsm?).

 Attaching output of lspci -vvnn and Xorg.log.

Jarkko Lietolahti (jarkko-jab) wrote :

I think the importance should be bumped up, as hanging when resuming may cause data loss, especially when resume is halfway up.

Jarkko Lietolahti (jarkko-jab) wrote :

Also attaching Xorg.0.log.old (does ubuntu copy Xorg.0.log to old automatically now upon reboot? That's nice) because there are some interesting messages there (see the end):

[mi] EQ overflowing. The server is probably stuck in an infinite loop.
[mi] mieqEnequeue: out-of-order valuator event; dropping.
[mi] EQ overflowing. The server is probably stuck in an infinite loop.
[mi] mieqEnequeue: out-of-order valuator event; dropping.
[mi] EQ overflowing. The server is probably stuck in an infinite loop.
[mi] mieqEnequeue: out-of-order valuator event; dropping.

These repeat a lot. Any idea if this is relevant?
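
To put a number on how often a message like the ones above repeats, the log can be scanned with a fixed-string grep. A minimal sketch; `count_log_matches` is a hypothetical helper name, and the log path in the usage note is the file mentioned in this comment:

```shell
#!/bin/sh
# Sketch: count the lines in a log file that contain a fixed string.
# $1 = string to look for (matched literally, not as a regex), $2 = log file
count_log_matches() {
    grep -cF -- "$1" "$2"
}
```

For example, `count_log_matches 'EQ overflowing' /var/log/Xorg.0.log.old` prints the number of matching lines (and prints 0 if there are none).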

Jarkko Lietolahti (jarkko-jab) wrote :

/var/log/messages also contains ASIC hang message:
Nov 24 23:29:00 gandalf kernel: [34547.824138] [fglrx] ASIC hang happened
Nov 24 23:29:00 gandalf kernel: [34547.824159] Pid: 6643, comm: Xorg Tainted: P 2.6.27-8-generic #1
Nov 24 23:29:00 gandalf kernel: [34547.824164]
Nov 24 23:29:00 gandalf kernel: [34547.824166] Call Trace:
Nov 24 23:29:00 gandalf kernel: [34547.824305] [<ffffffffa03474ce>] KCL_DEBUG_OsDump+0xe/0x10 [fglrx]
Nov 24 23:29:00 gandalf kernel: [34547.824404] [<ffffffffa035508c>] firegl_hardwareHangRecovery+0x1c/0x50 [fglrx]
Nov 24 23:29:00 gandalf kernel: [34547.824549] [<ffffffffa03cb499>] ? _ZN4Asic9WaitUntil15ResetASICIfHungEv+0x9/0x10 [fglrx]
Nov 24 23:29:00 gandalf kernel: [34547.824690] [<ffffffffa03cb44c>] ? _ZN4Asic9WaitUntil15WaitForCompleteEv+0x6c/0xb0 [fglrx]
Nov 24 23:29:00 gandalf kernel: [34547.824832] [<ffffffffa03ca53d>] ? _ZN4Asic19PM4ElapsedTimeStampERK23PM4_TS_INTERRUPT_PARAMSj14_LARGE_INTEGER+0x18d/0x1c0 [fglrx]
Nov 24 23:29:00 gandalf kernel: [34547.824846] [<ffffffff80245b80>] ? load_balance_newidle+0x90/0x280
Nov 24 23:29:00 gandalf kernel: [34547.824855] [<ffffffff80210219>] ? test_ti_thread_flag+0x9/0x20
Nov 24 23:29:00 gandalf kernel: [34547.824861] [<ffffffff802107f6>] ? __switch_to+0x3e6/0x490
Nov 24 23:29:00 gandalf kernel: [34547.824879] [<ffffffff8050066a>] ? thread_return+0x6d/0x3c3
Nov 24 23:29:00 gandalf kernel: [34547.825016] [<ffffffffa03bd492>] ? _Z19uQSTimeStampRetiredmjj14_LARGE_INTEGER+0xd2/0xe0 [fglrx]
Nov 24 23:29:00 gandalf kernel: [34547.825160] [<ffffffffa03b9479>] ? _Z8uCWDDEQCmjjPvjS_+0x379/0x1040 [fglrx]
Nov 24 23:29:00 gandalf kernel: [34547.825269] [<ffffffffa036ffd8>] ? firegl_cmmqs_CWDDE_32+0x368/0x410 [fglrx]
Nov 24 23:29:00 gandalf kernel: [34547.825373] [<ffffffffa036eba3>] ? firegl_cmmqs_CWDDE32+0x73/0x110 [fglrx]
Nov 24 23:29:00 gandalf kernel: [34547.825386] [<ffffffff803862e1>] ? apparmor_capable+0x21/0x70
Nov 24 23:29:00 gandalf kernel: [34547.825489] [<ffffffffa036eb30>] ? firegl_cmmqs_CWDDE32+0x0/0x110 [fglrx]
Nov 24 23:29:00 gandalf kernel: [34547.825582] [<ffffffffa0350acd>] ? firegl_ioctl+0x1ed/0x260 [fglrx]
Nov 24 23:29:00 gandalf kernel: [34547.825590] [<ffffffff8021b1a0>] ? init_fpu+0x60/0x120
Nov 24 23:29:00 gandalf kernel: [34547.825680] [<ffffffffa03456a6>] ? ip_firegl_ioctl+0x16/0x20 [fglrx]
Nov 24 23:29:00 gandalf kernel: [34547.825688] [<ffffffff802f85c5>] ? vfs_ioctl+0x85/0xb0
Nov 24 23:29:00 gandalf kernel: [34547.825693] [<ffffffff802f8873>] ? do_vfs_ioctl+0x283/0x2f0
Nov 24 23:29:00 gandalf kernel: [34547.825700] [<ffffffff802f8981>] ? sys_ioctl+0xa1/0xb0
Nov 24 23:29:00 gandalf kernel: [34547.825707] [<ffffffff8021285a>] ? system_call_fastpath+0x16/0x1b
Nov 24 23:29:00 gandalf kernel: [34547.825712]
Nov 24 23:29:00 gandalf kernel: [34547.825718] pubdev:0xffffffffa05459c0, num of device:1 , name:fglrx, major 8, minor 55.
Nov 24 23:29:00 gandalf kernel: [34547.825725] device 0 : 0xffff88013d080000 .
Nov 24 23:29:00 gandalf kernel: [34547.825730] Asic ID:0x9591, revision:0x51, MMIOReg:0xffffc200110e0000.
Nov 24 23:29:00 gandalf kernel: [34547.825736] FB phys addr: 0xe0000000, MC :0xc0000000, Total FB size :0x40000000....


Bryce Harrington (bryce) on 2008-12-09
Changed in fglrx:
status: Confirmed → Triaged
Bryce Harrington (bryce) on 2008-12-17
Changed in fglrx:
importance: Medium → Wishlist
DaveAbrahams (boostpro) wrote :

The current status of this bug for me in Intrepid is that without my scripts, resume works the first time and then fails the second time. The failure manifests itself by showing me a blank screen. If I wait long enough, X is restarted automatically; it appears to be on vt8 instead of vt7.

Unfortunately, I've been unable to get my scripts to fix the problem.

Suspend and resume under metacity work just fine.

I'm attaching Xorg.1.log and Xorg.0.log, which cover the suspend cycle in question (seems to be right on the boundary between them?). Also, I'm seeing this in my syslog, for what that's worth:

Dec 21 14:34:24 mcbain kernel: [ 2457.812276] PM: suspend devices took 31.904 seconds
Dec 21 14:34:24 mcbain kernel: [ 2457.812289] ------------[ cut here ]------------
Dec 21 14:34:24 mcbain kernel: [ 2457.812293] WARNING: at /build/buildd/linux-2.6.27/kernel/power/main.c:176 suspend_test_finish+0x74/0x80()
Dec 21 14:34:24 mcbain kernel: [ 2457.812298] Modules linked in: aes_i586 aes_generic af_packet binfmt_misc rfcomm bnep sco l2cap ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp bridge stp kvm_intel kvm ppdev ipv6 acpi_cpufreq cpufreq_userspace cpufreq_stats cpufreq_powersave cpufreq_ondemand freq_table cpufreq_conservative wmi sbs sbshc pci_slot container rpcsec_gss_krb5 auth_rpcgss nfs lockd nfs_acl sunrpc iptable_filter ip_tables x_tables ext3 jbd mbcache parport_pc lp parport loop joydev pcmcia snd_hda_intel snd_pcm_oss snd_mixer_oss arc4 snd_pcm ecb crypto_blkcipher thinkpad_acpi nvram snd_seq_dummy btusb iwl3945 evdev rfkill mac80211 bluetooth battery snd_seq_oss snd_seq_midi led_class ac snd_rawmidi psmouse snd_seq_midi_event cfg80211 serio_raw snd_seq yenta_socket fglrx(P) video rsrc_nonstatic bay output pcmcia_core snd_timer snd_seq_device intel_agp button pcspkr snd soundcore agpgart iTCO_wdt iTCO_vendor_support shpchp pci_hotplug snd_page_alloc jfs sr_mo
Dec 21 14:34:24 mcbain kernel: cdrom sd_mod crc_t10dif sg pata_acpi ata_piix ata_generic libata uhci_hcd scsi_mod ehci_hcd dock usbcore e1000e dm_mirror dm_log dm_snapshot dm_mod thermal processor fan fuse vesafb fbcon tileblit font bitblit softcursor
Dec 21 14:34:24 mcbain kernel: [ 2457.812446] Pid: 10335, comm: pm-suspend Tainted: P W 2.6.27-10-generic #1
Dec 21 14:34:24 mcbain kernel: [ 2457.812451] [<c037c9f6>] ? printk+0x1d/0x1f
Dec 21 14:34:24 mcbain kernel: [ 2457.812461] [<c0131dd9>] warn_on_slowpath+0x59/0x90
Dec 21 14:34:24 mcbain kernel: [ 2457.812480] [<c014d285>] ? sched_clock_cpu+0xd5/0x170
Dec 21 14:34:24 mcbain kernel: [ 2457.812485] [<c014c09f>] ? down_trylock+0x2f/0x40
Dec 21 14:34:24 mcbain kernel: [ 2457.812488] [<c01324b2>] ? try_acquire_console_sem+0x12/0x40
Dec 21 14:34:24 mcbain kernel: [ 2457.812492] [<c024e920>] ? kobject_put+0x20/0x50
Dec 21 14:34:24 mcbain kernel: [ 2457.812496] [<c02c7df6>] ? suspend_device+0x96/0x140
Dec 21 14:34:24 mcbain kernel: [ 2457.812501] [<c015d234>] suspend_test_finish+0x74/0x80
Dec 21 14:34:24 mcbain kernel: [ 2457.812504] [<c015d2bd>] suspend_devices_and_enter+0x7d/0x190
Dec 21 14:34:24 mcbain kernel: [ 2457.812507] [<c015d5a1>] enter_state+0xd1/0x100
Dec 21...


DaveAbrahams (boostpro) wrote :

Correction: it wasn't working at all

@schlehmil: I have no idea what made things work OOTB for you, unless you were unintentionally using the open source driver rather than fglrx. Everything still failed in exactly the same way for me.

However, the good news is that the fix is the same: kill off compiz before any VT switch takes place. The bad news is that the old scripts don't work. I think this is why Hardy suspend stopped working for me after a while also: pm-utils uses a new protocol for which the "redefine chvt" hack no longer works. I think I may have picked that new version of pm-utils up by enabling the "backports" repository under Hardy.

I am attaching two new files that work under intrepid; you should use these instead of the earlier ones posted for Hardy. I'm also going to delete some of the older versions of these files if the interface will let me, just to reduce clutter.

DaveAbrahams (boostpro) wrote :

The previous file is /etc/pm/config.d/50compiz-fglrx-noclear

This one is /etc/pm/sleep.d/00compiz-fglrx, and should be marked executable with "chmod +x /etc/pm/sleep.d/00compiz-fglrx"
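
For readers who want a feel for what such a hook looks like, here is a minimal sketch of a pm-utils sleep.d hook that stops the compositor before suspend. It is not the attached script. pm-utils passes the action (suspend, hibernate, resume or thaw) as the first argument; the process name compiz.real, the COMPOSITOR variable and the action_for helper are assumptions made for illustration.

```shell
#!/bin/sh
# Illustrative sketch only; NOT the script attached to this bug.

# Assumption: process name of the running compositor (override via environment).
COMPOSITOR="${COMPOSITOR:-compiz.real}"

# action_for is a hypothetical helper: map the pm-utils action to what the hook does.
action_for() {
    case "$1" in
        suspend|hibernate) echo stop ;;
        resume|thaw)       echo resume ;;
        *)                 echo ignore ;;
    esac
}

case "$(action_for "${1:-}")" in
    stop)
        # Ask the compositor to exit before the machine goes to sleep.
        # pkill returns non-zero when nothing matched; that is not an error here.
        pkill -x "$COMPOSITOR" || true
        ;;
    resume)
        # Restarting the compositor needs the user's X session (DISPLAY,
        # credentials), which this sketch deliberately leaves out.
        :
        ;;
esac
```

Installed as an executable file under /etc/pm/sleep.d/, a hook like this would be invoked by pm-utils with "suspend" before sleeping and "resume" afterwards.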

Just tested DaveAbrahams' new scripts and they're working for me. At first they hadn't -- it suspended OK, but resumed to the blinking cursor. I then noticed that when I'd cp'ed /etc/pm/sleep.d/00compiz-fglrx into place, I'd done so with ownership by my ordinary login uid and not root. I chown'ed it and also defined a LOG_FILE_NAME on line 41 to suss out any further issues (because /var/log/pm-suspend.log wasn't showing anything useful). After this, I was able to suspend/resume three times in a row without a problem. Not sure why this fixed it -- perhaps it was previously trying to save the pickle file to /var/run as me instead of as root; but the file isn't SUID so I'm not sure why that would happen. Some ACPI hook magic, maybe...?

In any case, thanks once again to DaveAbrahams for sharing his workarounds. It sure is nice to be able to use compiz again.
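
Since the comment above traces the failure to a hook file copied into place with the wrong owner, a quick sanity check before suspending may save a reboot. This is only a sketch: check_hook is a hypothetical helper, it relies on GNU stat, and whether pm-utils itself cares about ownership is not established in this thread.

```shell
#!/bin/sh
# check_hook FILE [OWNER]
# Succeeds if FILE is a regular file, is executable by the calling user,
# and is owned by OWNER (default: root). Hypothetical helper for illustration.
check_hook() {
    file=$1
    want_owner=${2:-root}

    [ -f "$file" ] || { echo "$file: not a regular file" >&2; return 1; }
    [ -x "$file" ] || { echo "$file: not executable" >&2; return 1; }

    # GNU stat: %U prints the owner's user name.
    owner=$(stat -c '%U' "$file") || return 1
    [ "$owner" = "$want_owner" ] || {
        echo "$file: owned by $owner, expected $want_owner" >&2
        return 1
    }
}
```

For example, "check_hook /etc/pm/sleep.d/00compiz-fglrx" prints a complaint and returns non-zero if the file is not executable or not owned by root.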

Dave Hay (david-hay) wrote :

I can confirm that Dave's circumvention works for me as well - I'm using Intrepid on a 2007-AE7 Thinkpad T60p with the binary fglrx driver and compiz.

I've written this up on my blog here:

http://www.davehay.f2s.com/2009/01/happy-new-year-and-its-looking-good-so.html

The post includes the scripts that I've used in three different Intrepid builds, as well as my xorg.conf and lspci listing.

As per other comments here, much kudos to Dave for coming up with this. In my opinion, this is still a circumvention of a bug rather than a permanent solution, especially given that suspend/resume was broken (for me) by a recent update after the Intrepid installation.

Dejan (dejan-rodiger) wrote :

Hi All,

I have an IBM/Lenovo R60 with an x1400 (9461-DXG). I am using Intrepid with the fglrx driver and compiz. My problem is similar to yours. When I start the laptop on battery power only, it gets as far as starting the gdm login but shows only a blank screen. Ctrl-Alt-F1 to F5 and other key combinations have no visible effect; the machine probably does switch to VT1, but nothing appears on screen. I know the laptop is running, because Ctrl-Alt-Del reboots it and I can ssh into it. Everything else starts properly.
The GDM login fails to appear only when the laptop is started on battery power; it works when the laptop is started on mains power and then switched to battery. I don't think compiz is part of the problem, since this happens during gdm login startup, before I am logged in.

Should I report this problem somewhere else?

Thanks

I see this in /var/log/messages:
 [fglrx] ASIC hang happened
 Pid: 6557, comm: Xorg Tainted: P 2.6.27-11-generic #1
and then a Call trace

 Call Trace:
[<ffffffffa028f4de>] KCL_DEBUG_OsDump+0xe/0x10 [fglrx]
[<ffffffffa029d08c>] firegl_hardwareHangRecovery+0x1c/0x50 [fglrx]
[<ffffffffa0313499>] ? _ZN4Asic9WaitUntil15ResetASICIfHungEv+0x9/0x10 [fglrx]
[<ffffffffa031344c>] ? _ZN4Asic9WaitUntil15WaitForCompleteEv+0x6c/0xb0 [fglrx]
[<ffffffffa031253d>] ? _ZN4Asic19PM4ElapsedTimeStampERK23PM4_TS_INTERRUPT_PARAMSj14_LARGE_INTEGER+0x18d/0x1c0 [fglrx]
[<ffffffffa0305492>] ? _Z19uQSTimeStampRetiredmjj14_LARGE_INTEGER+0xd2/0xe0 [fglrx]
[<ffffffffa0301479>] ? _Z8uCWDDEQCmjjPvjS_+0x379/0x1040 [fglrx]
[<ffffffffa02b7fd8>] ? firegl_cmmqs_CWDDE_32+0x368/0x410 [fglrx]
[<ffffffffa02b6ba3>] ? firegl_cmmqs_CWDDE32+0x73/0x110 [fglrx]
[<ffffffff80386a31>] ? apparmor_capable+0x21/0x70
[<ffffffffa02b6b30>] ? firegl_cmmqs_CWDDE32+0x0/0x110 [fglrx]
[<ffffffffa0298acd>] ? firegl_ioctl+0x1ed/0x260 [fglrx]
[<ffffffffa028d6b6>] ? ip_firegl_ioctl+0x16/0x20 [fglrx]
[<ffffffff802f8975>] ? vfs_ioctl+0x85/0xb0
[<ffffffff802f8c13>] ? do_vfs_ioctl+0x273/0x2f0
[<ffffffff802f8d31>] ? sys_ioctl+0xa1/0xb0
[<ffffffff8021285a>] ? system_call_fastpath+0x16/0x1b
pubdev:0xffffffffa048d9c0, num of device:1 , name:fglrx, major 8, minor 55.
device 0 : 0xffff88007c874000 .
Asic ID:0x7145, revision:0xd, MMIOReg:0xffffc20005000000.
FB phys addr: 0xd8000000, MC :0xc0000000, Total FB size :0x8000000.
gart table MC:0xc7fb7000, Physical:0xdffb7000, size:0x44000.
mc_node :MC_NODE__FB, total 1 zones
MC start:0xc0000000, Physical:0xd8000000, size:0x8000000.
Mapped heap -- Offset:0x0, size:0x7fb7000, reference count:5, mapping count:0,
Mapped heap -- Offset:0x0, size:0x1000000, reference count:1, mapping count:0,
Mapped heap -- Offset:0x7fb7000, size:0x44000, reference count:1, mapping count:0,
 mc_node :MC_NODE__GART_USWC, total 2 zones
MC start:0xb40c0000, Physical:0x0, size:0xbf40000.
Mapped heap -- Offset:0x0, size:0x2000000, reference count:3, mapping count:0,
mc_node :MC_NODE__GART_CACHEABLE, total 3 zones
MC start:0xaf400000, Physical:0x0, size:0x4cc0000.
Mapped heap -- Offset:0x0, size:0x200000, reference count:2, mapping count:0,
Dump the trace queue.
End of dump

Dejan (dejan-rodiger) wrote :

I forgot to mention that I am using x86_64:
Linux drodiger-tp1 2.6.27-11-generic #1 SMP Fri Dec 19 16:29:35 UTC 2008 x86_64 GNU/Linux

01:00.0 VGA compatible controller [0300]: ATI Technologies Inc Radeon Mobility X1400 [1002:7145]
 Subsystem: Lenovo Device [17aa:2006]
 Flags: bus master, fast devsel, latency 0, IRQ 16
 Memory at d8000000 (32-bit, prefetchable) [size=128M]
 I/O ports at 2000 [size=256]
 Memory at ee000000 (32-bit, non-prefetchable) [size=64K]
 [virtual] Expansion ROM at ee020000 [disabled] [size=128K]
 Capabilities: <access denied>
 Kernel driver in use: fglrx_pci
 Kernel modules: fglrx
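
The "Kernel driver in use" line in a listing like the one above is a quick way to confirm that fglrx, and not the open source driver, is actually bound to the card. A minimal sketch, assuming lspci output in the format shown; vga_driver is a hypothetical helper name:

```shell
#!/bin/sh
# vga_driver: read `lspci -k` style text on stdin and print the kernel
# driver bound to the first VGA controller (prints nothing if none is found).
vga_driver() {
    awk '
        /VGA compatible controller/ { in_vga = 1; next }  # start of the VGA device block
        /^[0-9a-f]/                 { in_vga = 0 }        # header of another device
        in_vga && /Kernel driver in use:/ {
            sub(/.*Kernel driver in use:[ \t]*/, "")
            print
            exit
        }
    '
}
```

Typical use would be "lspci -k | vga_driver"; for the listing above it would print fglrx_pci.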

on Sun Jan 04 2009, Dejan <dejan.rodiger-AT-ck.t-com.hr> wrote:

> Hi All,
>
> I have IBM/Lenovo R60 with x1400 (9461-DXG). I am using Intrepid with fglrx driver and
> compiz. My problem is similar to yours. When I start laptop only on battery power it
> comes to starting the gdm login, but it only shows blank screen.

Not similar enough, I'm afraid.

> Should I report this problem somewhere else?

Yes, this is almost certainly a different bug, since it has nothing to
do with suspend/resume. Please open a new issue.

Cheers,

--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com

DaveAbrahams (boostpro) wrote :

To anyone watching this bug for my scripts, I am now hosting them at:

  http://github.com/techarcana/fglrx-support

You can use the "watch" button to subscribe to updates.

Bryce Harrington (bryce) wrote :

AMD reports that this issue is fixed with the 2.6.27.9 kernel.

Suspend/resume problems are often kernel issues rather than X server issues, so if anyone is still having suspend/resume-related problems with -fglrx on Intrepid or newer, please file a new bug (one bug report per machine per issue, please!) against linux.

Changed in fglrx-installer:
status: Triaged → Fix Released

Bryce Harrington (bryce) wrote :

[AMD is closing EPR 260329]

Changed in fglrx:
status: Triaged → Fix Released

Dave Hay (david-hay) wrote :

Following Bryce Harrington's comments above, I do NOT find the problem to be fixed in the kernel. I am running 2.6.27-11-generic on a 32-bit Intel dual core (Thinkpad T60p 2007-AE7) and still find that I need to use Dave Abrahams' scripts.

Bryce comments that AMD report the problem being fixed in 2.6.27.9 - given that 2.6.27.11 is now available, it's my assumption that, if it was fixed, it's been broken again - with apparently identical symptoms.

Is anyone else able to test or comment?

Slobodan Simić (slsimic) wrote :

Well, the only thing I know is that suspend/resume is NOT working. Even with the newest kernel update, 2.6.27-12, and the newest Catalyst 9.1 fglrx, I still can't make my laptop resume from suspend. I use KDE4, but with effects disabled.

Thanks for verifying what I just found out in the past 2 days. KDE4.1 on Kubuntu Intrepid (8.10) on the T60p does NOT resume even with Dave's great scripts.

I'm now testing whether adding GDM (ubuntu-desktop) and GNOME to the Kubuntu install helps. I did find KDE great looking, but somewhat restrictive and buggy. Maybe I'll try KDE again after these are fixed:

Bluetooth
crashing plasma
crashing dolphin on CTRL+X cut and paste

I'll post an update about gnome suspend resume in a day or so.
Mike

I don't think using the GNOME desktop with Dave's scripts helped. I was able to resume again, but compiz didn't restart, and when I used fusion-icon to start or reload it, the window contents were frozen even though the windows still responded to emerald. I can move the windows, but can't type in the console, click on FF links, etc.

I'm going to rip out fglrx and try with the all open source for sanity!

Mike

Is there anything else needed besides the scripts and setting them read/write/execute? I've got a clean Intrepid install and cannot get resume to work more than once without rebooting. I tried the ubuntu restricted drivers as well as envy-ng.

This persistent bug has me thinking about openSUSE.

It seems to have started working after a reboot. I did attempt an openSUSE live CD via unetbootin, which didn't work. Perhaps the Intrepid gods were threatened by that and decided to shape up! I did a suspend/resume about half a dozen times this morning and all worked great!

Thanks to all!

DaveAbrahams (boostpro) wrote :

Bryce, are you claiming that with my stock ubuntu 2.6.27-11 kernel (which postdates 2.6.27-9), I can remove all my hacks and suspend/resume will work? If that doesn't turn out to be the case, then closing this bug would be *wrong* because after all, the OP reported the problem that's addressed by my hacks and not whatever AMD thinks is the same as this. Unless the bug has gone away for me, making me open a new bug with the same info seems backwards.

Urban Engberg (uengberg) wrote :

I can confirm that resume is *not* working on my Lenovo W500 with an ATI Mobility FireGL V5700, fglrx, compiz, Intrepid, 2.6.27-11-server. It gives me the black screen when trying to resume.

One try with Dave's scripts (http://github.com/techarcana/fglrx-support) made it work perfectly -- thanks, Dave.

OpenSUSE is not for me, so I'm back on Ubuntu, but rather than redoing Intrepid, I went to alpha Jaunty. I'm impressed, and now my T60p (5250) works without fglrx! Yes, that is, it works with full 3D compositing and full compiz, very smooth. Suspend and resume take seconds, with NO issues other than those associated with running alpha/beta Ubuntu.

I'm happy!

Mike
