[i945] [KMS] [GM945]: Xorg hang after resume from suspend

Bug #382884 reported by Carey Underwood
This bug affects 4 people
Affects          Status         Importance   Assigned to   Milestone
X.Org X server   Fix Released   Critical
linux (Ubuntu)   Fix Released   High         Unassigned

Bug Description

Binary package hint: xorg

With xorg-edgers and 2.6.30-rc7 from the mainline kernel, enabling KMS (via i915.modeset=1) causes the system to hang after resuming from S3.

I have the output of intel_gpu_dump attached.

As far as I can tell, this is the same as bug #381659, but I'm attaching this to a separate report as preferred by upstream.

ProblemType: Bug
Architecture: i386
Date: Tue Jun 2 13:33:13 2009
DistroRelease: Ubuntu 9.10
MachineType: Acer Aspire 3690
Package: xorg 1:7.4~5ubuntu20
ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.30-999-generic root=UUID=681fabcf-a526-481f-9858-80d1901cdcb4 ro single
ProcEnviron:
 SHELL=/bin/bash
 PATH=(custom, user)
 LANG=en_CA.UTF-8
RelatedPackageVersions:
 xserver-xorg 1:7.4~5ubuntu20
 libgl1-mesa-glx 7.6.0~git20090601.9f6ec50f-0ubuntu0sarvatt
 libdrm2 2.4.11+git20090519.f355ad89-0ubuntu0sarvatt
 xserver-xorg-video-intel 2:2.7.99.1+git20090602.ec2fde7c-0ubuntu0sarvatt
 xserver-xorg-video-ati 1:6.12.99+git20090529.7599dc40-0ubuntu0sarvatt
SourcePackage: xorg
Uname: Linux 2.6.30-999-generic i686
dmi.bios.date: 02/13/2007
dmi.bios.vendor: Acer
dmi.bios.version: V3.50
dmi.board.name: Grapevine
dmi.board.vendor: Acer
dmi.board.version: N/A
dmi.chassis.type: 10
dmi.chassis.vendor: Acer
dmi.chassis.version: N/A
dmi.modalias: dmi:bvnAcer:bvrV3.50:bd02/13/2007:svnAcer:pnAspire3690:pvrV3.50:rvnAcer:rnGrapevine:rvrN/A:cvnAcer:ct10:cvrN/A:
dmi.product.name: Aspire 3690
dmi.product.version: V3.50
dmi.sys.vendor: Acer
fglrx: Not loaded
fglrx-loaded: Error: command ['grep', 'fglrx', '/var/log/kern.log', '/proc/modules'] failed with exit code 1:
system:
 distro: Ubuntu
 architecture: i686
 kernel: 2.6.30-999-generic

Revision history for this message
In , Martin Pitt (pitti) wrote :

Created an attachment (id=23611)
dmesg

dmesg output (nothing interesting after the freeze). This is a clean boot, hibernate, and resume.

Revision history for this message
In , Martin Pitt (pitti) wrote :

Created an attachment (id=23612)
registers after clean boot

Revision history for this message
In , Martin Pitt (pitti) wrote :

Created an attachment (id=23613)
registers after hibernate

Probably not too interesting, since right after hibernate, everything works fine, but for completeness:

$ diff -U0 regs.cleanboot.txt regs.afterhibernate.txt
--- regs.cleanboot.txt 2009-03-06 18:49:36.000000000 +0100
+++ regs.afterhibernate.txt 2009-03-07 08:24:27.000000000 +0100
@@ -34 +34 @@
-(II): LVDS: 0xc0308300 (enabled, pipe B, 18 bit, 1 channel)
+(II): LVDS: 0x40300300 (disabled, pipe B, 18 bit, 1 channel)
@@ -46 +46 @@
-(II): PFIT_CONTROL: 0x00000000
+(II): PFIT_CONTROL: 0x00002668
@@ -166 +166 @@
-(II): pipe B dot 77142 n 2 m1 14 m2 8 p1 2 p2 14
+(II): pipe B dot 108000 n 2 m1 14 m2 8 p1 2 p2 10
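
A comparison like the one above can also be done per register instead of per line. The following is a minimal sketch, assuming the dump uses the "(II): NAME: 0xVALUE" layout seen in the diff; the helper names parse_regs and changed_bits are made up for illustration and are not part of any Intel tool.

```python
import re

# Matches lines such as "(II): LVDS: 0xc0308300 (enabled, pipe B, 18 bit, 1 channel)".
# Lines without a "NAME: 0x..." pair (for example the "pipe B dot ..." line) are skipped.
REG_LINE = re.compile(r"^\(II\):\s+(\w+):\s+(0x[0-9a-fA-F]+)")

def parse_regs(text):
    """Return {register name: integer value} for every hex-valued line."""
    regs = {}
    for line in text.splitlines():
        m = REG_LINE.match(line.strip())
        if m:
            regs[m.group(1)] = int(m.group(2), 16)
    return regs

def changed_bits(before, after):
    """Map each register whose value differs to (old, new, old XOR new)."""
    return {
        name: (before[name], after[name], before[name] ^ after[name])
        for name in before
        if name in after and before[name] != after[name]
    }

# Sample input copied from the diff above.
clean_boot = """\
(II): LVDS: 0xc0308300 (enabled, pipe B, 18 bit, 1 channel)
(II): PFIT_CONTROL: 0x00000000
"""
after_hibernate = """\
(II): LVDS: 0x40300300 (disabled, pipe B, 18 bit, 1 channel)
(II): PFIT_CONTROL: 0x00002668
"""

diff = changed_bits(parse_regs(clean_boot), parse_regs(after_hibernate))
for name, (old, new, xor) in sorted(diff.items()):
    print(f"{name}: {old:#010x} -> {new:#010x} (changed bits {xor:#010x})")
```

The XOR column only shows which bits flipped; what each bit means still has to be looked up in the register documentation.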

Revision history for this message
In , Martin Pitt (pitti) wrote :

Created an attachment (id=23614)
registers after screen freeze

Revision history for this message
In , Martin Pitt (pitti) wrote :

Created an attachment (id=23615)
Xorg.0.log

Revision history for this message
In , Martin Pitt (pitti) wrote :

Created an attachment (id=23617)
stracing X after freeze

This is from ssh'ing into the frozen box and attaching strace to X. I see

ioctl(11, 0x6458, 0) =

Then I walked over, wiggled the mouse a bit, and pressed two keys. The strace shows that apparently those events were still received, and it didn't get stuck in a tight infinite loop or something like this. Thus I think that by and large the server still worked.

However, it should be noted that I tried to press "q" to quit the mutt I was working on when the freeze started. Going back to the ssh session mutt was still running, so I don't think that the "q" keypress actually made it all the way through to mutt. So maybe it's not just a screen freeze, but a little harder than that.
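
For reference, the request number 0x6458 in the strace line can be split into its fields. This is a sketch assuming the usual x86 ioctl encoding from asm-generic/ioctl.h; decode_ioctl is a name made up here. The type character 'd' is the one DRM uses, and 0x40 is where the driver-private DRM commands start. The resulting driver command 0x18 appears to correspond to DRM_I915_GEM_THROTTLE in i915_drm.h, but that mapping is an assumption worth checking against the headers of the kernel in use.

```python
# Field layout of Linux ioctl request numbers on x86 (asm-generic/ioctl.h):
# bits 0-7 command number, bits 8-15 type character,
# bits 16-29 argument size, bits 30-31 transfer direction.
def decode_ioctl(request):
    return {
        "nr": request & 0xFF,
        "type": chr((request >> 8) & 0xFF),
        "size": (request >> 16) & 0x3FFF,
        "dir": (request >> 30) & 0x3,
    }

DRM_COMMAND_BASE = 0x40  # start of the driver-private DRM ioctl range

fields = decode_ioctl(0x6458)
driver_nr = fields["nr"] - DRM_COMMAND_BASE
print(fields, hex(driver_nr))
```

Size and direction are both zero, which fits the third argument of 0 in the strace output: the call carries no argument structure.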

Revision history for this message
In , Martin Pitt (pitti) wrote :

Trying to attach gdb wasn't very successful unfortunately. I do have the debug symbols of X.org, libx11, libc6, etc. installed, but still the stack trace is totally useless. Perhaps the "Cannot access memory at address 0xffe85fec" has something to do with it, but I don't know why it's doing that.

$ ps aux|grep X
root 3470 0.0 5.2 115892 53076 tty7 Ss+ Mar06 0:45 /usr/bin/X :0 -br -audit 0 -auth /var/lib/gdm/:0.Xauth -nolisten tcp vt7
martin 6497 0.0 0.0 3348 816 pts/0 S+ 08:39 0:00 grep X
0 martin@tick:~/xdebug
$ sudo gdb /usr/bin/X
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i486-linux-gnu"...
(no debugging symbols found)
(gdb) attach 3470
Attaching to program: /usr/bin/X, process 3470
Cannot access memory at address 0xffe85fec
(gdb) bt
#0 0xb7f2b430 in ?? ()
#1 0xb783fee2 in ?? ()
#2 0xb77d60ff in ?? ()
#3 0x0817c0eb in ?? ()
#4 0x08145088 in ?? ()
#5 0x080910c8 in ?? ()
#6 0x081319a4 in ?? ()
#7 0x0808d1ce in ?? ()
#8 0x080721fd in ?? ()
#9 0xb7af5775 in ?? ()
#10 0x080716b1 in ?? ()
(gdb) quit
The program is running. Quit anyway (and detach it)? (y or n) y
Detaching from program: /usr/bin/X, process 3470

XrandR information: (LVDS off, external TFT on, laptop is docked and closed):

$ xrandr
Screen 0: minimum 320 x 200, current 1280 x 1024, maximum 1280 x 1280
VGA disconnected (normal left inverted right x axis y axis)
LVDS connected (normal left inverted right x axis y axis)
   1280x800 59.8 +
   1024x768 85.0 75.0 70.1 60.0
   832x624 74.6
   800x600 85.1 72.2 75.0 60.3 56.2
   640x480 85.0 72.8 75.0 59.9
   720x400 85.0
   640x400 85.1
   640x350 85.1
TMDS-1 connected 1280x1024+0+0 (normal left inverted right x axis y axis) 340mm x 270mm
   1280x1024 75.0*+ 60.0
   1280x960 60.0
   1152x864 75.0
   1024x768 85.0 75.0 70.1 60.0
   832x624 74.6
   800x600 85.1 72.2 75.0 60.3 56.2
   640x480 85.0 75.0 72.8 66.7 59.9
   720x400 70.1
TV disconnected (normal left inverted right x axis y axis)

Revision history for this message
In , Martin Pitt (pitti) wrote :

Created an attachment (id=23618)
lspci -vvnn

Revision history for this message
In , Eric Anholt (eric-anholt) wrote :

that bit just says that the ring is busy -- it's probably just a side effect of the chip being hung.

Revision history for this message
In , Martin Pitt (pitti) wrote :

Finally, my xorg.conf:

$ cat /etc/X11/xorg.conf
Section "Device"
        Identifier "Configured Video Device"
        Option "FramebufferCompression" "off"
EndSection

I need to set this option because of bug 19304.

Revision history for this message
In , Martin Pitt (pitti) wrote :

I confirm that this also happens if I use the laptop undocked, with just the internal LVDS:

$ xrandr
Screen 0: minimum 320 x 200, current 1280 x 800, maximum 1280 x 1280
VGA disconnected (normal left inverted right x axis y axis)
LVDS connected 1280x800+0+0 (normal left inverted right x axis y axis) 261mm x 163mm
   1280x800 59.8*+
   1024x768 85.0 75.0 70.1 60.0
   832x624 74.6
   800x600 85.1 72.2 75.0 60.3 56.2
   640x480 85.0 72.8 75.0 59.9
   720x400 85.0
   640x400 85.1
   640x350 85.1
TMDS-1 disconnected (normal left inverted right x axis y axis)
TV disconnected (normal left inverted right x axis y axis)

Revision history for this message
In , Lubos Kolouch (lubos-kolouch) wrote :

I can confirm this issue, it happens in Gentoo and Arch for me... it is very annoying

Revision history for this message
In , Martin Pitt (pitti) wrote :

I have now upgraded to Linux 2.6.28.8 and -intel 2.6.3, and suspend/hibernate now works fine again, no hangs any more. Thus I tentatively close this now.

Lubos, if it still happens for you with the latest version, please reopen.

Revision history for this message
In , Martin Pitt (pitti) wrote :

Sorry, just got it again. It seems to happen a lot less often now, but still there.

Revision history for this message
In , Lubos Kolouch (lubos-kolouch) wrote :

It *just* happened to me as well.
I suspended and resumed several times during the weekend, all OK,
but now X has stopped responding.

Gentoo kernel 2.6.28-r4, xf86-video-intel-2.6.3-r1

Revision history for this message
In , Lubos Kolouch (lubos-kolouch) wrote :

After latest upgrade it happens again 100% of the time...
work->hibernate->resume->wait->freeze->reboot

gentoo-sources-2.6.29
xf86-video-intel-2.6.3-r1
mesa-7.3-r1

Revision history for this message
In , Martin Pitt (pitti) wrote :

Confirmed that this still happens with the latest (v 5) patch in bug 18651, so this is apparently not related to pipe underruns.

Revision history for this message
In , Lubos Kolouch (lubos-kolouch) wrote :

I wonder if it is not related to

http://bugzilla.kernel.org/show_bug.cgi?id=12778

Revision history for this message
In , Martin Pitt (pitti) wrote :

Indeed, I also get this message when it happens:

Mar 29 23:32:54 tick kernel: [14858.069290] [drm:i915_get_vblank_counter] *ERROR* trying to get vblank count for disabled pipe 1
Mar 29 23:32:54 tick kernel: [14858.074255] mtrr: no MTRR for d0000000,10000000 found

Revision history for this message
In , Martin Pitt (pitti) wrote :

I confirm that running X with Option "DRI" "off", and rmmod'ing i915 and drm, suspend works fine. This might indicate that http://bugzilla.kernel.org/show_bug.cgi?id=12778 is indeed the cause of this.

Revision history for this message
In , Jesse Barnes (jbarnes-virtuousgeek) wrote :

Can you confirm that you're not running 'vbetool post' or with any of the ACPI S3 reposting stuff? That's caused problems for us in the past...

Revision history for this message
In , Martin Pitt (pitti) wrote :

I just gave a thorough testing to the pm-utils scripts and quirks, and confirm that /usr/lib/pm-utils/sleep.d/98smart-kernel-video still does the right thing. I. e. it filters out all quirks for intel on >= 2.6.26 and thus does not run any quirks (and thus no VBE post/S3 stuff).

Revision history for this message
In , Jesse Barnes (jbarnes-virtuousgeek) wrote :

So far it looks ok on my 945 with the latest Jaunty bits (so 2.6.28-11-generic and xf86-video-intel 2.6.3), but I've only been waiting a few minutes (while moving windows around and browsing the web). Can you reproduce it with the 2.6.3 driver? It has quite a few fixes that might be relevant.

Revision history for this message
In , Martin Pitt (pitti) wrote :

When I tested the suspend quirks, I was running with DRI enabled again (on current Jaunty, i. e. with 2.6.3). It indeed survived for about 10 minutes, then it froze. This also happened to a colleague of mine here at the CELF/LF summit, who also has a 945.

As I said, it is totally erratic. I had it survive for as much as 2 hours, then only for 1 minute; in most cases it's about 5 minutes. I couldn't see a pattern in when it happens with respect to the actions performed. In many cases I was just reading something and didn't even move the mouse.

Revision history for this message
In , Martin Pitt (pitti) wrote :

We finally found the reason for this. Our kernel had the patch from http://bugzilla.kernel.org/show_bug.cgi?id=12950 applied, to improve performance for netbooks. This patch was now identified as causing this regression, and we reverted it.

Thus I close this bug report now. Lubos, if you want to "take over" this bug, please reopen; perhaps you could check if above patch is in Gentoo as well?

Revision history for this message
In , Jesse Barnes (jbarnes-virtuousgeek) wrote :

Thanks for the update Martin... It's strange that the MCHBAR patch would cause problems with suspend/resume though. I'll look through the patch again but if you get a chance could you try running with the patch but with tiling disabled in your xorg.conf (option "tiling" "false")?

Revision history for this message
In , Lubos Kolouch (lubos-kolouch) wrote :

Martin, it happens to me also with vanilla kernel.

Revision history for this message
In , Martin Pitt (pitti) wrote :

I booted the previous kernel with the MCHBAR patch, disabled tiling, suspended, and it hung again after about an hour.

I have run with the updated kernel (with the MCHBAR patch reverted) all day, and at the conference I'm using suspend/resume a lot. No hang here. However, I haven't looked at what that MCHBAR patch was about. I cannot assert whether reverting it really fixed the suspend hang 100%, or whether it was just sheer luck that it survived a day. Before that, I got the hang pretty reliably within an hour, though.

Revision history for this message
In , Martin Pitt (pitti) wrote :

It seems I just was lucky yesterday, it survived the entire day without freezing. But sure enough, when I kept my laptop suspended over night and resumed this morning, it froze after a couple of minutes.

So it was unrelated to the MCHBAR patch after all. Darn! :-/

Revision history for this message
In , Jesse Barnes (jbarnes-virtuousgeek) wrote :

Thanks for testing Martin, I'll see if I can reproduce locally (again, I guess I'm in for lots of waiting). If you could capture a backtrace via gdb of the hung server that might help a lot.

Revision history for this message
In , Martin Pitt (pitti) wrote :

I think I did already, and it delivered nothing but ??. Also, I don't think it's actually hung, since I can still strace it and see mouse/keyboard activity. But I'll try harder to gdb it once I'm back home next week (with just a single laptop at the conference I don't have a place to ssh into the box).

Revision history for this message
In , Jesse Barnes (jbarnes-virtuousgeek) wrote :

Oh yeah you did, forgot about that. I'm not sure why gdb wasn't able to attach properly but hopefully you can figure that out and get a useful trace. I usually just su and do it as root rather than using sudo (not sure how that affects uid and effective uid etc).

Revision history for this message
In , Martin Pitt (pitti) wrote :

Created an attachment (id=24962)
GPU dump with 2.6.30rc2

I tried to reproduce this with linux 2.6.30RC2 and libdrm 2.4.9, so that I could use intel_gpu_dump (standard Jaunty, where I encountered the hang before, has 2.6.28.8 and libdrm 2.4.5). However, the symptoms are now slightly different, so I'm not sure whether this is useful at all:

 - I get hangs without any special VT switches/suspend/etc. after a few hours.
 - After suspend, the first hang again occurs after a few minutes.
 - Unlike with standard Jaunty, I can recover from the hang with a VT switch, but then it happens again after a few minutes. GPU dump attached (compressed, sorry, raw file was too big for bugzilla).
 - This also happens without compositing (whereas disabling compiz was a good workaround for the original bug here).

For each hang that happens, I get

  [ 204.095061] [drm:i915_get_vblank_counter] *ERROR* trying to get vblank count for disabled pipe 1

in dmesg.

Revision history for this message
In , Lubos Kolouch (lubos-kolouch) wrote :

I attached similar dump of frozen GPU to #20560 ... seems like we are tracing the same issue in two bugs...

Revision history for this message
In , Martin Pitt (pitti) wrote :

Lubos, thanks. However, please note that the GPU dump is for the hangs which happen on 2.6.30RC2, which behave very different to the hangs I get on 2.6.28.8. I just can't use intel_gpu_dump on the latter, so this was my (vain) attempt to provide info for the original hang.

Revision history for this message
In , Lubos Kolouch (lubos-kolouch) wrote :

Martin, my dump is also from 2.6.30RC2 and it behaves exactly the same for me as in 2.6.28 and 2.6.29 ! I can't get away from it just by changing the VT!

Revision history for this message
In , Martin Pitt (pitti) wrote :

For the record, I now updated to linux 2.6.30rc3, -intel 2.7.0, libdrm 2.4.9, and turned on UXA. Things are running smoothly now, and I suspended about 5 times during the afternoon/evening without any problem.

Revision history for this message
In , Jesse Barnes (jbarnes-virtuousgeek) wrote :

Ok, marking fixed. Thanks Martin.

Revision history for this message
Carey Underwood (cwillu) wrote :
Revision history for this message
Carey Underwood (cwillu) wrote :

intel_gpu_dump executed from vt1 with X running

Revision history for this message
Carey Underwood (cwillu) wrote :

intel_gpu_dump executed from vt1 with X running, immediately before running pm-suspend

Revision history for this message
Carey Underwood (cwillu) wrote :

After running intel_gpu_dump after resuming, I switched VTs back to X and experienced the same symptoms (locked up, touchpad could move the mouse cursor, external mouse could not, system not responsive to VT changes, could still ssh into it, sysrq's were responded to).

tags: added: kms suspend
Changed in xorg (Ubuntu):
status: New → Confirmed
Changed in xorg-server:
status: Unknown → Confirmed
Revision history for this message
Geir Ove Myhr (gomyhr) wrote :

Carey, is there any particular reason you filed this for xorg and not xserver-xorg-video-intel? I guess, technically the bug is in the Linux kernel when it is in KMS, but at least freedesktop.org is handling the bug reports, so it makes some sense to keep it within an xorg-package. Did you just guess a package, or do you know something I don't?

Revision history for this message
Carey Underwood (cwillu) wrote : Re: [Bug 382884] Re: [KMS] [GM945]: Xorg hang after resume from suspend

No, I should have reported it against -intel. Couldn't remember the package
name when I ran ubuntu-bug, and completely forgot to change it afterwards.

Geir Ove Myhr (gomyhr)
affects: xorg (Ubuntu) → xserver-xorg-video-intel (Ubuntu)
Changed in xserver-xorg-video-intel (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Triaged
Revision history for this message
Bryce Harrington (bryce) wrote : Re: [KMS] [GM945]: Xorg hang after resume from suspend

Carey, thanks for forwarding this upstream.

Changed in xserver-xorg-video-intel (Ubuntu):
importance: Medium → High
Revision history for this message
Carey Underwood (cwillu) wrote : Re: [Bug 382884] Re: [KMS] [GM945]: Xorg hang after resume from suspend

intel_gpu_dump's from a failed hibernate (which I thought was working for a
moment)

Bryce Harrington (bryce)
summary: - [KMS] [GM945]: Xorg hang after resume from suspend
+ [i945] [KMS] [GM945]: Xorg hang after resume from suspend
Revision history for this message
In , Lubos Kolouch (lubos-kolouch) wrote :

As mentioned in #20560 , this is far from fixed...

Revision history for this message
In , Lubos Kolouch (lubos-kolouch) wrote :

Created an attachment (id=26643)
Dump with 2.6.30-rc8-git6

Revision history for this message
In , Martin Pitt (pitti) wrote :

Created an attachment (id=26783)
KMS/composite freeze logs from Martin Pitt

It had worked fine for some weeks (KMS+compiz) on my i945, but now it's back. I'm following Ubuntu's "xorg-edgers" archive which has very current snapshots of upstream. Unlike most regressions that I see, this one isn't just a temporary glitch, it's been broken for over a week now. It now freezes about two seconds after resuming, not several minutes, but otherwise the symptoms are very similar. Should I open a new bug about this, or is it the same? Logs attached (dmesg, gpu, registers, Xorg.log). My current versions:

  Linux 2.6.30 final, with git pull from anholt/drm-intel.git (commit 03d606991)
  libdrm from 2009-06-06 (3d4bfe8c)
  mesa from 2009-06-13 (18af7c38)
  intel from 2009-06-11 (6d062e9e)

I tried the following combinations:

 - KMS, X.org session with compiz: usually freezes; occasionally it survives the first suspend and freezes on the second
 - no KMS, X.org session with compiz: ok
 - KMS, VT only: ok
 - KMS, gdm only (no composite): ok
 - KMS, X.org session with metacity (no composite): ok
 - KMS, X.org with compiz, switch to VT1 before suspend: ok on resume, often freezes as soon as switching back to X.org

Revision history for this message
In , Yifei-chen (yifei-chen) wrote :

We tested this bug on 945GM with the master branch. The display freezes right after the system wakes from S4 if we are running GNOME, with or without compiz. If we run raw X, most of the time the system wakes from S4 correctly, but one time it crashed the whole system. S3 works fine.

Revision history for this message
In , Jesse Barnes (jbarnes-virtuousgeek) wrote :

*** Bug 22039 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Jesse Barnes (jbarnes-virtuousgeek) wrote :

Ug, ok sounds like there are real issues with KMS resume. Let's keep S3 and S4
separate though; can someone seeing an issue with hibernate file a separate
bug?

Revision history for this message
In , Jesse Barnes (jbarnes-virtuousgeek) wrote :

*** Bug 22010 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Peng-li (peng-li) wrote :

Created an attachment (id=26881)
script to do s3 automatically

This is a script to do S3 resume automatically; it should help to reproduce this issue.

Revision history for this message
In , Peng-li (peng-li) wrote :

I met the same problem in Moblin: after 3 S3 resumes, the screen becomes blank. I got the regdump diff of a good and a bad S3 resume, same as above:

-(II): MI_MODE: 0x00000200
+(II): MI_MODE: 0x00000000

Changed in xorg-server:
status: Confirmed → Invalid
Revision history for this message
Mario Limonciello (superm1) wrote :

updating to new upstream bug (the original was marked as a duplicate)

Changed in xorg-server:
status: Invalid → Unknown
Changed in xorg-server:
status: Unknown → Confirmed
Revision history for this message
In , Gordon Jin (gordon-jin) wrote :

(In reply to comment #44)
> Ug, ok sounds like there are real issues with KMS resume. Let's keep S3 and S4
> separate though; can someone seeing an issue with hibernate file a separate
> bug?
>

Is bug#22263 the hibernation bug?

Revision history for this message
In , Peng-li (peng-li) wrote :

(In reply to comment #46)
> Created an attachment (id=26881) [details]
> script to do s3 automatically
>
> This is a script to do S3 resume automatically, should be help to reproduce
> this issue
>

Maybe 10 sec is not enough. I changed the sleep and wake-up time to 15 sec and tested 20 suspend/resume cycles; it works well.
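
The attached script is not reproduced in this report. As a rough sketch of what such a loop could look like, assuming rtcwake from util-linux is available (its -m mem mode suspends to RAM and -s sets the wake-up alarm in seconds), the helper names s3_cycle_commands and run_cycles below are made up for illustration. The example at the end is a dry run that only prints the commands.

```python
import subprocess

def s3_cycle_commands(cycles=20, sleep_seconds=15):
    """Build one rtcwake invocation per suspend/resume cycle.

    "rtcwake -m mem" suspends to RAM (S3); "-s N" programs the RTC alarm
    that wakes the machine again after N seconds.
    """
    return [["rtcwake", "-m", "mem", "-s", str(sleep_seconds)] for _ in range(cycles)]

def run_cycles(commands, runner=subprocess.run, pause=None):
    """Run each cycle in turn; stop at the first failure and return how many passed."""
    completed = 0
    for cmd in commands:
        result = runner(cmd)
        # subprocess.run returns an object with .returncode; a dry-run runner may return None.
        if getattr(result, "returncode", 0) != 0:
            break
        completed += 1
        if pause:
            pause()  # time to stay awake between cycles
    return completed

# Dry run: print the commands instead of suspending the machine.
done = run_cycles(s3_cycle_commands(cycles=3), runner=lambda cmd: print(" ".join(cmd)))
print(done)
```

For a real run the default runner would be used as root, with something like pause=lambda: time.sleep(15) to keep the machine awake for 15 seconds between cycles.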

Revision history for this message
In , Milan Bouchet-Valat (nalimilan) wrote :

Gordon: bug#22263 is not the hibernation problem I'm seeing, and doesn't seem to be Martin's either (comment #41). I don't get any screen corruption. See bug 22366.

Revision history for this message
In , Clotho67 (clotho67) wrote :

(In reply to comment #45)
> *** Bug 22010 has been marked as a duplicate of this bug. ***
>

I'm not sure whether this is a duplicate of this bug. I have done some tests. I'm sure kernels 2.6.29.4 and 2.6.30-rc5 are good. The screen corruption and X hang only occur on kernels after 2.6.30-rc6. I'll try to bisect to see which commit is suspicious.

Revision history for this message
In , Clotho67 (clotho67) wrote :

Well, git bisect shows that reverting

commit: 79f11c19a396e8cea7dad322dcfb46c0a8517fe6
drm/i915: save/restore fence registers across suspend/resume

makes resume on kernel 2.6.30 work again. Kernel 2.6.30-rc5 plus the above commit doesn't cause this hang, so it could be some conflict between this commit and other commits for kernel 2.6.30-rc6.

Here is some additional info.

i915_gem_fence_regs before suspend:

Reserved fences = 3
Total fences = 16
Fenced object[ 0] = unused
Fenced object[ 1] = unused
Fenced object[ 2] = unused
Fenced object[ 3] = f676c360: P 00c00000 00400000 00001000 X 00000002 00000002 0 (name: 1)
Fenced object[ 4] = f6901f00: 02000000 00400000 00001000 X 00000002 00000002 0 (name: 2)
Fenced object[ 5] = f6901f60: 02400000 00400000 00001000 X 00000002 00000002 0 (name: 3)
Fenced object[ 6] = unused
Fenced object[ 7] = unused
Fenced object[ 8] = unused
Fenced object[ 9] = unused
Fenced object[10] = unused
Fenced object[11] = unused
Fenced object[12] = unused
Fenced object[13] = unused
Fenced object[14] = unused
Fenced object[15] = unused

i915_gem_fence_regs after resume:

Reserved fences = 3
Total fences = 16
Fenced object[ 0] = unused
Fenced object[ 1] = unused
Fenced object[ 2] = unused
Fenced object[ 3] = f6042780: P 00c00000 00400000 00001000 X 00000002 00000000 0 (name: 1)
Fenced object[ 4] = unused
Fenced object[ 5] = unused
Fenced object[ 6] = unused
Fenced object[ 7] = unused
Fenced object[ 8] = unused
Fenced object[ 9] = unused
Fenced object[10] = unused
Fenced object[11] = unused
Fenced object[12] = unused
Fenced object[13] = unused
Fenced object[14] = unused
Fenced object[15] = unused

Revision history for this message
In , Clotho67 (clotho67) wrote :

(In reply to comment #52)

Sorry, this is the one after resume.

i915_gem_fence_regs after resume:

Reserved fences = 3
Total fences = 16
Fenced object[ 0] = unused
Fenced object[ 1] = unused
Fenced object[ 2] = unused
Fenced object[ 3] = f676c360: P 00c00000 00400000 00001000 X 00000002 00000000 0 (name: 1)
Fenced object[ 4] = unused
Fenced object[ 5] = unused
Fenced object[ 6] = unused
Fenced object[ 7] = unused
Fenced object[ 8] = unused
Fenced object[ 9] = unused
Fenced object[10] = unused
Fenced object[11] = unused
Fenced object[12] = unused
Fenced object[13] = unused
Fenced object[14] = unused
Fenced object[15] = unused
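
To compare two such dumps without reading them line by line, something along these lines could be used. It is a sketch only: parse_fences and dropped_slots are names made up here, and the sample text is copied from the dumps above (shortened to the slots of interest).

```python
import re

# Matches "Fenced object[ 3] = f676c360: P 00c00000 ..." and "Fenced object[ 0] = unused".
FENCE_LINE = re.compile(r"Fenced object\[\s*(\d+)\]\s*=\s*(.+)")

def parse_fences(text):
    """Return {slot: description} for the fence slots that are in use."""
    used = {}
    for line in text.splitlines():
        m = FENCE_LINE.search(line)
        if m and m.group(2).strip() != "unused":
            used[int(m.group(1))] = m.group(2).strip()
    return used

def dropped_slots(before, after):
    """Slots that held an object before suspend but are unused after resume."""
    return sorted(set(before) - set(after))

before = parse_fences("""\
Fenced object[ 2] = unused
Fenced object[ 3] = f676c360: P 00c00000 00400000 00001000 X 00000002 00000002 0 (name: 1)
Fenced object[ 4] = f6901f00: 02000000 00400000 00001000 X 00000002 00000002 0 (name: 2)
Fenced object[ 5] = f6901f60: 02400000 00400000 00001000 X 00000002 00000002 0 (name: 3)
""")
after = parse_fences("""\
Fenced object[ 2] = unused
Fenced object[ 3] = f676c360: P 00c00000 00400000 00001000 X 00000002 00000000 0 (name: 1)
Fenced object[ 4] = unused
Fenced object[ 5] = unused
""")

print("dropped after resume:", dropped_slots(before, after))
print("slot 3 unchanged:", before[3] == after[3])
```

On the dumps above this reports slots 4 and 5 as dropped, and slot 3 as still present but with one field changed.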

Revision history for this message
In , Jesse Barnes (jbarnes-virtuousgeek) wrote :

If fence register save/restore really is the issue, this patch should help.

Current code saves the fence registers before rendering has completed, which can affect fence register allocation. If we save before rendering completes, and restore again at resume time, we may end up causing trouble with whatever objects land in the fenced space after resume.

Saving register state (including fences) *after* we've idled the memory manager should help with that.

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 98560e1..e3cb402 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -67,8 +67,6 @@ static int i915_suspend(struct drm_device *dev, pm_message_t s

        pci_save_state(dev->pdev);

-       i915_save_state(dev);
-
        /* If KMS is active, we do the leavevt stuff here */
        if (drm_core_check_feature(dev, DRIVER_MODESET)) {
                if (i915_gem_idle(dev))
@@ -77,6 +75,8 @@ static int i915_suspend(struct drm_device *dev, pm_message_t s
                drm_irq_uninstall(dev);
        }

+       i915_save_state(dev);
+
        intel_opregion_free(dev, 1);

        if (state.event == PM_EVENT_SUSPEND) {
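
Why the ordering matters can be illustrated with a toy model. This is not the driver code: ToyGpu and its methods are invented for illustration, "hw" stands in for the fence registers and "owner" for the driver's own record of which slot belongs to which object. Which objects get evicted while idling is an assumption chosen to mirror the dumps earlier in this report.

```python
class ToyGpu:
    """Toy model only, not i915: tracks fence registers and driver bookkeeping."""

    def __init__(self):
        self.hw = {3: "obj1", 4: "obj2", 5: "obj3"}  # fence registers
        self.owner = dict(self.hw)                   # driver bookkeeping
        self.saved = None

    def save_state(self):
        self.saved = dict(self.hw)

    def idle(self):
        # Assumption: idling evicts the objects in slots 4 and 5 and frees their fences.
        for slot in (4, 5):
            self.hw.pop(slot, None)
            self.owner.pop(slot, None)

    def restore_state(self):
        self.hw = dict(self.saved)

    def stale_fences(self):
        """Slots programmed in hardware that the bookkeeping considers free."""
        return sorted(set(self.hw) - set(self.owner))

def suspend_resume(save_first):
    gpu = ToyGpu()
    if save_first:
        gpu.save_state()  # old ordering: save, then idle
        gpu.idle()
    else:
        gpu.idle()        # patched ordering: idle, then save
        gpu.save_state()
    gpu.restore_state()
    return gpu.stale_fences()

print("save before idle:", suspend_resume(save_first=True))
print("save after idle: ", suspend_resume(save_first=False))
```

In the model, saving first leaves fences programmed for slots the driver believes are free, so whatever lands in that range after resume is affected; saving after idling leaves nothing stale.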

Revision history for this message
In , Clotho67 (clotho67) wrote :

(In reply to comment #54)
> If fence register save/restore really is the issue, this patch should help.
>

Yes, it does help my problem. The system can resume correctly again. I didn't see a hang so far.

Revision history for this message
In , Martin Pitt (pitti) wrote :

I tested the patch in comment 54 and also confirm that it fixes suspend/resume with the internal laptop monitor. Thanks!

It still fails with the external one, but that's a different problem, and I'm going to report it separately.

Revision history for this message
In , Tomas M. (el-dragon) wrote :

(In reply to comment #54)
> If fence register save/restore really is the issue, this patch should help.

applied the patch here and it appears to have fixed it for me..

intel gma950 laptop.

Revision history for this message
In , Jesse Barnes (jbarnes-virtuousgeek) wrote :

Great, thanks for testing. Fix has been pushed into the kernel:

commit 9e06dd39f2b6d7e35981e0d7aded618686b32ccb
drm/i915: correct suspend/resume ordering

Revision history for this message
In , Gordon Jin (gordon-jin) wrote :

(In reply to comment #58)
> Great, thanks for testing. Fix has been pushed into the kernel:
>
> commit 9e06dd39f2b6d7e35981e0d7aded618686b32ccb
> drm/i915: correct suspend/resume ordering

The fix is in drm-intel-next branch.

Eric, please cherry-pick it into qa-branch so it'll be in Q2 package.

Revision history for this message
In , Clotho67 (clotho67) wrote :

(In reply to comment #58)
> Great, thanks for testing. Fix has been pushed into the kernel:
>
> commit 9e06dd39f2b6d7e35981e0d7aded618686b32ccb
> drm/i915: correct suspend/resume ordering
>

Maybe this fix should also be sent to the 2.6.30.x stable branch, since it's a regression from the 2.6.30 rc process. It would make users of the stable kernel happy. Thanks.

Changed in xorg-server:
status: Confirmed → Fix Released
Revision history for this message
In , Jesse Barnes (jbarnes-virtuousgeek) wrote :

On Tue, 23 Jun 2009 20:16:32 -0700 (PDT)
> --- Comment #60 from Jie Luo <email address hidden> 2009-06-23
> 20:16:32 PST --- (In reply to comment #58)
> > Great, thanks for testing. Fix has been pushed into the kernel:
> >
> > commit 9e06dd39f2b6d7e35981e0d7aded618686b32ccb
> > drm/i915: correct suspend/resume ordering
> >
>
> Maybe this fix should also be send to 2.6.30.x stable branch, since
> it's a regression during the 2.6.30 rc process. And it will make user
> of the stable kernel happy. Thanks.

Good point, want to send a note to <email address hidden> with the commit
info, proposing the patch for inclusion?

Thanks,

Revision history for this message
Bryce Harrington (bryce) wrote :

According to the upstream bug report, this was resolved with a kernel fix. Sounds like it might be included for .31 but I'm uncertain from the bug report so leaving open against the kernel.

"Great, thanks for testing. Fix has been pushed into the kernel:

commit 9e06dd39f2b6d7e35981e0d7aded618686b32ccb
drm/i915: correct suspend/resume ordering
"

affects: xserver-xorg-video-intel (Ubuntu) → linux (Ubuntu)
Changed in linux (Ubuntu):
status: Triaged → New
tags: added: xorg-needs-kernel-fix
Revision history for this message
Leann Ogasawara (leannogasawara) wrote :

Thanks Bryce. Seems this patch is already available in the Karmic kernel git tree. Marking this Fix Released against the kernel.

ogasawara@emiko:~/ubuntu-karmic$ git log -p 9e06dd39f2b6d7e35981e0d7aded618686b32ccb
commit 9e06dd39f2b6d7e35981e0d7aded618686b32ccb
Author: Jesse Barnes <email address hidden>
Date: Mon Jun 22 18:05:12 2009 -0700

    drm/i915: correct suspend/resume ordering

    We need to save register state *after* idling GEM, clearing the ring,
    and uninstalling the IRQ handler, or we might end up saving bogus
    fence regs, for one. Our restore ordering should already be correct,
    since we do GEM, ring and IRQ init after restoring the last register
    state, which prevents us from clobbering things.

    I put this together to potentially address a bug, but I haven't heard
    back if it fixes it yet. However I think it stands on its own, so I'm
    sending it in.

    Signed-off-by: Jesse Barnes <email address hidden>
    Signed-off-by: Eric Anholt <email address hidden>

Changed in linux (Ubuntu):
status: New → Fix Released
Revision history for this message
Carey Underwood (cwillu) wrote :

Yep, fix looks good here.

Changed in xorg-server:
importance: Unknown → Critical
Changed in xorg-server:
importance: Critical → Unknown
Changed in xorg-server:
importance: Unknown → Critical