[i945] (Needs UXA) X freezes a few minutes after resuming

Bug #339091 reported by Martin Pitt on 2009-03-07
This bug affects 5 people
Affects                            Status        Importance  Assigned to  Milestone
xf86-video-intel                   Fix Released  Critical
xserver-xorg-video-intel (Ubuntu)                Medium      Unassigned
  Jaunty                                         Medium      Unassigned
  Karmic                                         Medium      Unassigned

Bug Description

Binary package hint: xserver-xorg-video-intel

A few minutes after resuming from suspend or hibernate, the display suddenly
freezes. This is not triggered by anything obvious (such as starting a
particular program), just randomly after some key presses or mouse movements.

After display freezing, I can still ssh into the box. The entire user session
continues to run, I can start programs, etc.

I didn't see anything interesting in dmesg or Xorg.0.log, and the gdb stack
trace is totally useless. Stracing the X server shows that it's usually waiting
in an ioctl() and still receives keyboard and mouse events. I'll attach detailed
logs in a minute.

The single change that I spotted was in the registers:

--- regs.afterhibernate.txt 2009-03-07 08:24:27.000000000 +0100
+++ regs.freeze.txt 2009-03-07 08:35:18.000000000 +0100
@@ -140 +140 @@
-(II): MI_MODE: 0x00000200
+(II): MI_MODE: 0x00000000

This value consistently changes like this after such a freeze. On a working
system, MI_MODE is always 0x00000200. No other registers change after the
freeze.
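
Checking a register dump for this symptom is a single mask test. A minimal sketch, assuming the dump lines keep the "(II): MI_MODE: 0x..." form shown above and that bit 9 (0x200) is the bit that is always set on a working system; the macro and helper names are hypothetical and are not part of the driver or the register dumper:

```c
#include <stdbool.h>
#include <stdio.h>

/* Bit 9 of MI_MODE (0x200): set on every working system in this report.
 * Hypothetical name; per a reply in this thread it reflects ring state. */
#define MI_MODE_BIT9 (1u << 9)

/* Parse one dump line such as "(II): MI_MODE: 0x00000200".
 * Returns true and stores the value only for MI_MODE lines. */
bool parse_mi_mode(const char *line, unsigned int *value)
{
    /* %x accepts an optional 0x prefix */
    return sscanf(line, "(II): MI_MODE: %x", value) == 1;
}

/* True when the bit that is always set on a working system is cleared. */
bool mi_mode_looks_frozen(unsigned int value)
{
    return (value & MI_MODE_BIT9) == 0;
}
```

On the two dumps above, the after-hibernate value 0x00000200 passes and the post-freeze value 0x00000000 is flagged.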

This started happening two or three weeks ago. I am on Ubuntu jaunty
(development release), which closely tracks X.org upstream releases. It never
happened until then.

It definitely did not happen with -intel 2.4.1/X.org 1.5.2/Linux 2.6.27. Now
I have -intel 2.6.1/X.org 1.6.0/Linux 2.6.28.7.

My hardware:
 Intel mobile 945GM
 Intel Core 2 Duo 1.2
 1 GB RAM
 Internal 1280x800 LVDS (switched off)
 External 1280x1024 TFT

I attached logs and more details in the upstream bug report at https://bugs.freedesktop.org/show_bug.cgi?id=20520

Created an attachment (id=23611)
dmesg

dmesg output (nothing interesting after the freeze). This is a clean boot, hibernate, and resume.

Created an attachment (id=23612)
registers after clean boot

Created an attachment (id=23613)
registers after hibernate

Probably not too interesting, since right after hibernate, everything works fine, but for completeness:

$ diff -U0 regs.cleanboot.txt regs.afterhibernate.txt
--- regs.cleanboot.txt 2009-03-06 18:49:36.000000000 +0100
+++ regs.afterhibernate.txt 2009-03-07 08:24:27.000000000 +0100
@@ -34 +34 @@
-(II): LVDS: 0xc0308300 (enabled, pipe B, 18 bit, 1 channel)
+(II): LVDS: 0x40300300 (disabled, pipe B, 18 bit, 1 channel)
@@ -46 +46 @@
-(II): PFIT_CONTROL: 0x00000000
+(II): PFIT_CONTROL: 0x00002668
@@ -166 +166 @@
-(II): pipe B dot 77142 n 2 m1 14 m2 8 p1 2 p2 14
+(II): pipe B dot 108000 n 2 m1 14 m2 8 p1 2 p2 10
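
The two pipe B lines differ only in p2 (14 vs. 10). Assuming the i9xx-style divider arithmetic used by the intel driver with a 96 MHz reference clock (our reading of the driver, not something stated in this report), both dot values can be reproduced from n, m1, m2, p1 and p2; pll_dot_khz is a hypothetical helper:

```c
/* PLL divider values as printed in the register dump. */
struct pll_params { int n, m1, m2, p1, p2; };

/* Dot (pixel) clock in kHz, assuming the i9xx formula and integer
 * division as in the dump. refclk_khz is assumed to be 96000 here. */
int pll_dot_khz(struct pll_params c, int refclk_khz)
{
    int m = 5 * (c.m1 + 2) + (c.m2 + 2);  /* combined feedback divider */
    int p = c.p1 * c.p2;                  /* combined post divider */
    int vco = refclk_khz * m / (c.n + 2); /* VCO frequency in kHz */
    return vco / p;
}
```

With n=2, m1=14, m2=8, p1=2 this gives m = 90 and a VCO of 2160000 kHz, so p2=14 yields 77142 kHz and p2=10 yields 108000 kHz, matching the dump.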

Created an attachment (id=23614)
registers after screen freeze

Created an attachment (id=23615)
Xorg.0.log

Created an attachment (id=23617)
stracing X after freeze

This is from ssh'ing into the frozen box and attaching strace to X. It is waiting in:

ioctl(11, 0x6458, 0) =

Then I walked over, wiggled the mouse a bit, and pressed two keys. The strace shows that apparently those events were still received, and it didn't get stuck in a tight infinite loop or something like this. Thus I think that by and large the server still worked.
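
The request number in that strace line can be decoded by hand. A minimal sketch, assuming the generic Linux ioctl encoding (the layout the _IOC_* macros in <linux/ioctl.h> unpack on x86) and the DRM convention that driver-private commands start at DRM_COMMAND_BASE (0x40); decode_ioctl and drm_driver_offset are hypothetical helpers:

```c
#include <stdint.h>

/* Fields of a Linux ioctl request number in the generic layout:
 * bits 0-7 nr, 8-15 type, 16-29 size, 30-31 direction. */
struct ioctl_fields {
    unsigned dir;  /* 0 = no data transfer */
    unsigned type; /* "magic" byte; 'd' (0x64) is the DRM base */
    unsigned nr;   /* command number within that type */
    unsigned size; /* size of the argument struct, 0 if none */
};

struct ioctl_fields decode_ioctl(uint32_t req)
{
    struct ioctl_fields f;
    f.nr   = req & 0xffu;
    f.type = (req >> 8) & 0xffu;
    f.size = (req >> 16) & 0x3fffu;
    f.dir  = (req >> 30) & 0x3u;
    return f;
}

/* Offset into the driver's private ioctl table, or -1 for anything
 * that is not a driver-private DRM command (assumed base 0x40). */
int drm_driver_offset(struct ioctl_fields f)
{
    if (f.type != 'd' || f.nr < 0x40)
        return -1;
    return (int)(f.nr - 0x40);
}
```

For 0x6458 this gives type 'd', command number 0x58, no payload, and a driver-private offset of 0x18. Reading offset 0x18 as DRM_I915_GEM_THROTTLE is our interpretation of i915_drm.h from that period, not something the strace output itself states.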

However, note that when the freeze started I tried to press "q" to quit mutt, which I was working in. Back in the ssh session, mutt was still running, so I don't think the "q" keypress actually made it all the way through to mutt. So maybe it's not just a screen freeze, but something a little harder than that.

Trying to attach gdb wasn't very successful unfortunately. I do have the debug symbols of X.org, libx11, libc6, etc. installed, but still the stack trace is totally useless. Perhaps the "Cannot access memory at address 0xffe85fec" has something to do with it, but I don't know why it's doing that.

$ ps aux|grep X
root 3470 0.0 5.2 115892 53076 tty7 Ss+ Mar06 0:45 /usr/bin/X :0 -br -audit 0 -auth /var/lib/gdm/:0.Xauth -nolisten tcp vt7
martin 6497 0.0 0.0 3348 816 pts/0 S+ 08:39 0:00 grep X
0 martin@tick:~/xdebug
$ sudo gdb /usr/bin/X
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i486-linux-gnu"...
(no debugging symbols found)
(gdb) attach 3470
Attaching to program: /usr/bin/X, process 3470
Cannot access memory at address 0xffe85fec
(gdb) bt
#0 0xb7f2b430 in ?? ()
#1 0xb783fee2 in ?? ()
#2 0xb77d60ff in ?? ()
#3 0x0817c0eb in ?? ()
#4 0x08145088 in ?? ()
#5 0x080910c8 in ?? ()
#6 0x081319a4 in ?? ()
#7 0x0808d1ce in ?? ()
#8 0x080721fd in ?? ()
#9 0xb7af5775 in ?? ()
#10 0x080716b1 in ?? ()
(gdb) quit
The program is running. Quit anyway (and detach it)? (y or n) y
Detaching from program: /usr/bin/X, process 3470

XRandR information (LVDS off, external TFT on, laptop docked and closed):

$ xrandr
Screen 0: minimum 320 x 200, current 1280 x 1024, maximum 1280 x 1280
VGA disconnected (normal left inverted right x axis y axis)
LVDS connected (normal left inverted right x axis y axis)
   1280x800 59.8 +
   1024x768 85.0 75.0 70.1 60.0
   832x624 74.6
   800x600 85.1 72.2 75.0 60.3 56.2
   640x480 85.0 72.8 75.0 59.9
   720x400 85.0
   640x400 85.1
   640x350 85.1
TMDS-1 connected 1280x1024+0+0 (normal left inverted right x axis y axis) 340mm x 270mm
   1280x1024 75.0*+ 60.0
   1280x960 60.0
   1152x864 75.0
   1024x768 85.0 75.0 70.1 60.0
   832x624 74.6
   800x600 85.1 72.2 75.0 60.3 56.2
   640x480 85.0 75.0 72.8 66.7 59.9
   720x400 70.1
TV disconnected (normal left inverted right x axis y axis)

Created an attachment (id=23618)
lspci -vvnn

That bit just says that the ring is busy -- it's probably just a side effect of the chip being hung.

Finally, my xorg.conf:

$ cat /etc/X11/xorg.conf
Section "Device"
        Identifier "Configured Video Device"
        Option "FramebufferCompression" "off"
EndSection

I need to set this option because of bug 19304.

I confirm that this also happens if I use the laptop undocked, with just the internal LVDS:

$ xrandr
Screen 0: minimum 320 x 200, current 1280 x 800, maximum 1280 x 1280
VGA disconnected (normal left inverted right x axis y axis)
LVDS connected 1280x800+0+0 (normal left inverted right x axis y axis) 261mm x 163mm
   1280x800 59.8*+
   1024x768 85.0 75.0 70.1 60.0
   832x624 74.6
   800x600 85.1 72.2 75.0 60.3 56.2
   640x480 85.0 72.8 75.0 59.9
   720x400 85.0
   640x400 85.1
   640x350 85.1
TMDS-1 disconnected (normal left inverted right x axis y axis)
TV disconnected (normal left inverted right x axis y axis)

Changed in xserver-xorg-video-intel:
status: Unknown → Confirmed

I can confirm this issue; it happens for me on Gentoo and Arch. It is very annoying.

I have now upgraded to Linux 2.6.28.8 and -intel 2.6.3, and suspend/hibernate now works fine again, no hangs any more. Thus I tentatively close this now.

Lubos, if it still happens for you with the latest version, please reopen.

This seems to be fixed in current jaunty now, either with linux 2.6.28.8 or the new -intel 2.6.3 driver.

Changed in xserver-xorg-video-intel (Ubuntu):
status: New → Fix Released
Changed in xserver-xorg-video-intel:
status: Confirmed → Fix Released

Sorry, just got it again. It seems to happen a lot less often now, but still there.

It *just* happened to me as well.
I did several times suspend & resume during the weekend, all OK,
but now X stopped responding.

Gentoo kernel 2.6.28-r4, xf86-video-intel-2.6.3-r1

Changed in xserver-xorg-video-intel (Ubuntu):
status: Fix Released → Confirmed
Changed in xserver-xorg-video-intel:
status: Fix Released → Confirmed

My 2 cents: it happened to me many times (almost every time) on Intrepid with the -proposed repository enabled (and the system fully up to date). I had to reinstall, and this time I did not enable -proposed; the freeze still happens, but much less often.

I just want to make sure, though, that we have the same problem: my system seems to freeze with a black screen, and I don't even get the option to type in my password. It doesn't seem to receive any keyboard input: I blindly switched to a console, typed in my username and password (correctly; I did it very slowly) and then issued a "sudo reboot now" command, but nothing happened, and I had to hard-reboot the laptop.

I am using a Dell Inspiron 1545 with Intel X4500HD graphics and a Pentium Dual-Core.

Hope it helps.

After latest upgrade it happens again 100% of the time...
work->hibernate->resume->wait->freeze->reboot

gentoo-sources-2.6.29
xf86-video-intel-2.6.3-r1
mesa-7.3-r1

Confirmed that this still happens with the latest (v 5) patch in bug 18651, so this is apparently not related to pipe underruns.

Indeed, I also get this message when it happens:

Mar 29 23:32:54 tick kernel: [14858.069290] [drm:i915_get_vblank_counter] *ERROR* trying to get vblank count for disabled pipe 1
Mar 29 23:32:54 tick kernel: [14858.074255] mtrr: no MTRR for d0000000,10000000 found

I confirm that running X with Option "DRI" "off", and rmmod'ing i915 and drm, suspend works fine. This might indicate that http://bugzilla.kernel.org/show_bug.cgi?id=12778 is indeed the cause of this.

There is a remote chance that this is related to http://bugzilla.kernel.org/show_bug.cgi?id=12778 . This talks about a DRI regression in 2.6.29rc6, and I know Jaunty's kernel is based on 2.6.28, but I think we pulled in some newer DRM bits from upstream. Also, it never happened for me before jaunty alpha-3 (around that time), in any ubuntu release.

Martin Pitt (pitti) wrote :

So while this loses me hardware acceleration and composite, it is at least a very nice workaround for taking your laptop to conferences and on planes, where suspend matters much more than bling. :-)

Changed in xserver-xorg-video-intel (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Triaged

Can you confirm that you're not running 'vbetool post' or with any of the ACPI S3 reposting stuff? That's caused problems for us in the past...

I just thoroughly tested the pm-utils scripts and quirks, and confirm that /usr/lib/pm-utils/sleep.d/98smart-kernel-video still does the right thing, i.e. it filters out all quirks for intel on >= 2.6.26 and thus does not run any quirks (and thus no VBE post/S3 stuff).

So far it looks ok on my 945 with the latest Jaunty bits (so 2.6.28-11-generic and xf86-video-intel 2.6.3), but I've only been waiting a few minutes (while moving windows around and browsing the web). Can you reproduce it with the 2.6.3 driver? It has quite a few fixes that might be relevant.

When I tested the suspend quirks, I was running with DRI enabled again (on current Jaunty, i.e. with 2.6.3). It indeed survived for about 10 minutes, then it froze. This also happened to a colleague of mine here at the CELF/LF summit, who also has a 945.

As I said, it is totally erratic. I have had it survive for as long as 2 hours, and other times only for 1 minute; in most cases it's about 5 minutes. I couldn't see a pattern in when it happens with respect to the actions performed. In many cases I was just reading something and didn't even move the mouse.

We finally found the reason for this. Our kernel had the patch from http://bugzilla.kernel.org/show_bug.cgi?id=12950 applied, to improve performance for netbooks. This patch was now identified as causing this regression, and we reverted it.

Thus I close this bug report now. Lubos, if you want to "take over" this bug, please reopen; perhaps you could check if above patch is in Gentoo as well?

Thanks for the update Martin... It's strange that the MCHBAR patch would cause problems with suspend/resume though. I'll look through the patch again but if you get a chance could you try running with the patch but with tiling disabled in your xorg.conf (option "tiling" "false")?

Martin, it happens to me also with vanilla kernel.

This was caused by the patch in bug 349314 which now got reverted. Suspend with DRI once again works flawlessly.

I also updated the upstream bug for this issue.

Changed in xserver-xorg-video-intel (Ubuntu Jaunty):
status: Triaged → Fix Released

I booted the previous kernel with the MCHBAR patch, disabled tiling, suspended, and it hung again after about an hour.

I have run with the updated kernel (with the MCHBAR patch reverted) all day, and at the conference I'm using suspend/resume a lot. No hang here. However, I haven't looked at what that MCHBAR patch was about, so I cannot say for sure whether reverting it really fixed the suspend hang completely, or whether it was just sheer luck that it survived a day. Before that, though, I got the hang pretty reliably within an hour.

It seems I was just lucky yesterday; it survived the entire day without freezing. But sure enough, when I kept my laptop suspended overnight and resumed this morning, it froze after a couple of minutes.

So it was unrelated to the MCHBAR patch after all. Darn! :-/

Thanks for testing Martin, I'll see if I can reproduce locally (again, I guess I'm in for lots of waiting). If you could capture a backtrace via gdb of the hung server that might help a lot.

I think I did already, and it delivered nothing but ??. Also, I don't think it's actually hung, since I can still strace it and see mouse/keyboard activity. But I'll try harder to gdb it once I'm back home next week (with just a single laptop at the conference I don't have a place to ssh into the box).

Martin Pitt (pitti) on 2009-04-09
Changed in xserver-xorg-video-intel (Ubuntu Jaunty):
status: Fix Released → Confirmed
Bryce Harrington (bryce) on 2009-04-09
summary: - [i945] screen freezes a few minutes after resuming
+ [i945] X freezes a few minutes after resuming
tags: added: freeze
tags: added: intel
Changed in xserver-xorg-video-intel:
status: Confirmed → Fix Released

Does downgrading xserver-xorg-video-intel to version 2.4 work around this problem?

Zack Evans [2009-05-05 14:09 -0000]:
> 1. Where did you find the 2.7.0 version, I don't seem to have that
> available and I thought I had every Intel driver repo known to man
> configured. :-)

https://launchpad.net/~xorg-edgers/+archive/ppa

> 2. If we file an upstream bug specific to EXA... will it be marked
> WONTFIX, because EXA is on the way out? Seems a bit harsh on the massive
> installed base of laptops out there (not just Ubuntu!)

Unfortunately that's pretty much the case, I'm afraid. They won't
outright close it, but they won't put a lot of effort into it either.
Well, UXA could really use some robustification, so work spent there
will be good for the future.

Well, if someone can confirm that downgrading the intel driver (and explain how to do it :-) ) will fix the issue on Jaunty, I'll be happy...

On the other hand: is there a tutorial anywhere on how to use UXA and the new drivers (and how to recover in case of disaster)? I suppose I should compile a 30rcX kernel with kernel modesetting and things like that...

Thanks

Bryce Harrington (bryce) wrote :
summary: - [i945] X freezes a few minutes after resuming
+ [i945] (Needs UXA) X freezes a few minutes after resuming
Changed in xserver-xorg-video-intel (Ubuntu Karmic):
status: Confirmed → Triaged

OK, thanks. Maybe I am thick here, but just a last question: is there confirmation (Guido?) that reverting to 2.4 will fix the suspend/resume locks? If yes, I can upgrade and maybe help with the UXA testing.

Thanks!

Bordi (borderlinedancer) wrote :

With i865G, X freezes too.

I think I might be experiencing the same bug: soon after resume (<30 seconds) the display freezes with the symptoms described above.

Two additional observations:
1.) It is still possible to switch virtual terminals. (This is a real difference from the other sporadic freezes I rarely experience under heavy load.) Switching to a text console and back to the X screen restores everything, and I can continue to work! Input definitely gets through while the screen is frozen. The screen freezes again a few seconds later, and switching VTs helps again. This repeats several times (about 5-10 times); after that, there are no more freezes.
2.) I almost never get freezes when I turn off LVDS and use an external VGA monitor. If I'm not fast enough there will be at most one freeze, which can be resolved by switching VTs, but then no more freezes.

If you are also affected by these freezes, could you please test whether my observations hold for you as well, and report back? I guess this could help to distinguish between different types of freezes and perhaps even help identify the problem.

I'm using UXA on Jaunty with a kernel from a PPA:
Linux 2.6.30-020630rc2-generic #020630rc2 SMP Wed Apr 15 13:20:18 UTC 2009 x86_64 GNU/Linux
Intel Corporation Mobile 945GM/GMS, 943/940GML Express Integrated Graphics Controller (rev 03)

Addendum: I was talking about resume from suspend only. Not tested with hibernation yet.

Well, I am quite confused now.

The bug name was changed to "needs UXA". But AFAIK that's not true. On my notebook, booting a Jaunty live CD, X freezes after a resume from S2R with the *standard* configuration; no UXA needed at all.

I suspect there are two bugs mixed here:

1) the first one is the fact that, after updating from Intrepid to Jaunty, a perfectly fine configuration (default xorg.conf, 3D-enabled desktop) stops working. I think S2R is fundamental for laptop use; it's not just a convenience.

2) then there are the problems with using UXA with a 2.6.30-rc kernel, but this is just, well, living on the bleeding edge.

Related to problem 1), can anyone please tell me if downgrading the intel driver fixes it? If yes, I could then upgrade to Jaunty and maybe help testing 2) :-)

Zack Evans (zevans23) wrote :

There absolutely are freeze bugs in both UXA and EXA in the default Jaunty configuration, so we need an SRU of some type, even if it's one that affects performance.

The good news is that a whole bunch of them are fixed, either in kernel updates or in updates in -proposed - so with a little time devoted to packaging the smallest possible SRU (I'd suggest 2.7.1, the MTRR fixes, and some sort of fix for the tiling stuff, maybe a cherry-pick), we can at least make Jaunty work out of the box, probably using EXA and Greedy migration, since that seems to work for most people I've seen posting on these bugs, at a reasonable level of performance.

Anyhoo, the even better news is that EXA and UXA are BOTH rock solid for me under .30rc7, or a patched kernel (from one of these bugs), and desktop performance is acceptable. Game performance isn't what it should be, but I now realise that UXA and DRI2 are the way to go; bleeding-edge gamers will have to wait for Karmic, and we can't fix the Intel issues from where we are standing. That's life!

So in summary - I think 2.7.1 really does fix this bug - it did for me.

UXA has all sorts of cosmetic problems and the font-corruption bug comes and goes, but I guess I'll need to upgrade to Karmic before there's any point in reporting bugs against UXA. UXA is about half the speed of EXA for 3d apps (but MIGHT be slightly quicker in 2d.)

Bryce Harrington (bryce) wrote :

While certainly there are freeze bugs in UXA, this particular one is EXA-only apparently, judging from comment #22 from the original reporter, thus we will consider this bug fixed by enabling UXA.

People with other freezes that can be reproduced with UXA+KMS on karmic, please file a new bug and refer to comment #20 for the dump info needed for investigating the issue.

Regarding SRUs for specific fixes, the kernel team is reviewing a list of patches for consideration. I have the MTRR fixes on my todo list. For 2.7.1, we can't SRU entire driver version revs, but it is available in the x-updates PPA.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package xserver-xorg-video-intel - 2:2.7.99.1+git20090602.ec2fde7c-0ubuntu1

---------------
xserver-xorg-video-intel (2:2.7.99.1+git20090602.ec2fde7c-0ubuntu1) karmic; urgency=low

  * Update to git 20090602 (master branch) up to commit ec2fde7c
    - xvmc is disabled since DRI1 no longer supported
    - LP: #96991 - 3D stuff breaks with Compiz: Redirected Direct Rendering
      is needed in DRI
    - LP: #120834 - X freezes with I830WaitLpRing error when running OpenGL apps
    - LP: #337608 - X crashes in fbBlt() when using Sun Java Plugin 6 + firefox3.0
    - LP: #339555 - compiz slowmotion after Jaunty upgrade
    - LP: #363900 - X.org freezes with intel driver, no apparent trigger
    - LP: #331719 - VT switching doesn't work on Intel 915GM
    - LP: #339091 - X freezes a few minutes after resuming
    - LP: #348436 - Kubuntu: X server crash when screensaver is started (4500MHD)
    - LP: #279727 - Kubuntu: Display Corruption w/ Intel 4700MHD
    - LP: #357851 - Kubuntu: Distorted display after switching virtual desktops w/ exa
    - LP: #158415 - Front buffer dynamic resize not supported
    - LP: #324998 - x server restarts itself w/ compiz on Intel 945GM
    - LP: #355593 - after upgrade to 9.04, rotating desktop cube ran slow
    - LP: #357290 - 1 fps in 3d apps like neverball with EXA
    - LP: #360774 - Graphical Corruption with EXA on X4500
    - LP: #364126 - screensaver prefs dialog in 9.04 RC livecd leaves dirt
    - LP: #375712 - Native resolution for dell "2005fpw" monitor not listed
    - LP: #375264 - Choppy flash video and poor performance with compiz
    - LP: #349568 - Jaunty / Compiz slow and tearing on GMA 4500MHD
    - LP: #356056 - window tearing during movement on 965 (no compiz)
    - LP: #330460 - xorg shows black image/hangs with jpg in firefox
    - LP: #347587 - X asserts on pI830->batch_ptr != 0 on resume from suspend
  * Merge with Debian experimental. Remaining Ubuntu changes:
    - Add lpia architecture
    - Re-enable the patch system, add quilt to build-deps.
    - 110_quirk_hp_mini.patch: quirk (sent upstream)
    - 117_quirk_thinkpad_x30.patch: quirk (sent upstream)
  * Drop 116_8xx_disable_dri.patch. There have been fixes for 3d on 8xx
    chipsets upstream, so drop the DRI disablement so the fixes can be
    re-tested.
  * Drop 103_quirk_intel_mb890.patch. Better quirk available upstream.
    (LP: #305269)

 -- Bryce Harrington <email address hidden> Tue, 02 Jun 2009 10:47:32 -0700

Changed in xserver-xorg-video-intel (Ubuntu Karmic):
status: Triaged → Fix Released

OK. great news. What about a recipe to fix it in Jaunty? Thanks!

Zack Evans (zevans23) wrote :

@Bryce: Fair comment, but for Jaunty, if you enable UXA to remove the EXA-freeze bug, you get the UXA-fonts bug instead - which does stop you working since everything becomes unreadable.

What we need is an SRU which fixes or works around BOTH bugs...

As mentioned in #20560, this is far from fixed...

Created an attachment (id=26643)
Dump with 2.6.30-rc8-git6

Changed in xserver-xorg-video-intel:
status: Fix Released → Confirmed

Created an attachment (id=26783)
KMS/composite freeze logs from Martin Pitt

It had worked fine for some weeks (KMS+compiz) on my i945, but now it's back. I'm following Ubuntu's "xorg-edgers" archive which has very current snapshots of upstream. Unlike most regressions that I see, this one isn't just a temporary glitch, it's been broken for over a week now. It now freezes about two seconds after resuming, not several minutes, but otherwise the symptoms are very similar. Should I open a new bug about this, or is it the same? Logs attached (dmesg, gpu, registers, Xorg.log). My current versions:

  Linux 2.6.30 final, with git pull from anholt/drm-intel.git (commit 03d606991)
  libdrm from 2009-06-06 (3d4bfe8c)
  mesa from 2009-06-13 (18af7c38)
  intel from 2009-06-11 (6d062e9e)

I tried the following combinations:

 - KMS, X.org session with compiz: usually freezes; occasionally it survives the first suspend and freezes on the second
 - no KMS, X.org session with compiz: ok
 - KMS, VT only: ok
 - KMS, gdm only (no composite): ok
 - KMS, X.org session with metacity (no composite): ok
 - KMS, X.org with compiz, switch to VT1 before suspend: ok on resume, often freezes as soon as switching back to X.org

We tested this bug on 945GM with the master branch: the display freezes right after the system wakes from S4 if we are running GNOME, with or without compiz. If we run raw X, most of the time the system wakes from S4 correctly, but once it crashed the whole system. S3 works fine.

*** Bug 22039 has been marked as a duplicate of this bug. ***

Ug, ok sounds like there are real issues with KMS resume. Let's keep S3 and S4
separate though; can someone seeing an issue with hibernate file a separate
bug?

*** Bug 22010 has been marked as a duplicate of this bug. ***

Created an attachment (id=26881)
script to do s3 automatically

This is a script to do S3 suspend/resume automatically; it should help to reproduce this issue.

I hit the same problem on Moblin: after three S3 resumes, the screen became blank. The register dump diff between a good and a bad S3 resume is the same as above:

-(II): MI_MODE: 0x00000200
+(II): MI_MODE: 0x00000000

(In reply to comment #44)
> Ug, ok sounds like there are real issues with KMS resume. Let's keep S3 and S4
> separate though; can someone seeing an issue with hibernate file a separate
> bug?
>

Is bug#22263 the hibernation bug?

(In reply to comment #46)
> Created an attachment (id=26881) [details]
> script to do s3 automatically
>
> This is a script to do S3 resume automatically, should be help to reproduce
> this issue
>

Maybe 10 seconds is not enough. I changed the sleep and wake-up time to 15 seconds and tested 20 suspend/resume cycles; it works well.
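
The attached script itself is not reproduced here. A hypothetical C sketch of the same idea, assuming util-linux rtcwake is installed ("-m mem" selects suspend-to-RAM, "-s" sets the seconds until the RTC alarm wakes the machine); the helper names, the settle delay and the dry-run switch are illustrative, not taken from the attachment:

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Build the command for one suspend-to-RAM cycle (assumes util-linux
 * rtcwake). Returns the snprintf result, i.e. the command length. */
int build_rtcwake_cmd(char *buf, size_t len, int wake_after_s)
{
    return snprintf(buf, len, "rtcwake -m mem -s %d", wake_after_s);
}

/* Run `cycles` suspend/resume cycles, pausing `settle_s` seconds after
 * each resume. With dry_run set, commands are only printed.
 * Returns the number of cycles that completed. */
int run_s3_cycles(int cycles, int wake_after_s, int settle_s, int dry_run)
{
    char cmd[64];
    int done = 0;
    for (int i = 0; i < cycles; i++) {
        build_rtcwake_cmd(cmd, sizeof cmd, wake_after_s);
        if (dry_run) {
            printf("cycle %d: %s\n", i + 1, cmd);
        } else {
            if (system(cmd) != 0) {
                fprintf(stderr, "cycle %d: '%s' failed\n", i + 1, cmd);
                break;
            }
            sleep((unsigned)settle_s); /* give X time to hit the freeze */
        }
        done++;
    }
    return done;
}
```

For the test described above, run_s3_cycles(20, 15, 15, 0) would perform 20 cycles with a 15-second wake timer; it has to run as root, and a dry run first is a cheap way to check the commands.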

Gordon: bug#22263 is not the hibernation problem I'm seeing, and doesn't seem to be Martin's either (comment #41). I don't get any screen corruption. See bug 22366.

(In reply to comment #45)
> *** Bug 22010 has been marked as a duplicate of this bug. ***
>

I'm not sure whether this is a duplicate of this bug. I have done some tests, and I'm sure kernels 2.6.29.4 and 2.6.30-rc5 are good. The screen corruption and X hang only occur on kernels after 2.6.30-rc6. I'll try to bisect to see which commit is suspicious.

Well, git bisect shows that reverting

commit: 79f11c19a396e8cea7dad322dcfb46c0a8517fe6
drm/i915: save/restore fence registers across suspend/resume

makes resume work again on kernel 2.6.30. Kernel 2.6.30-rc5 plus the above commit doesn't show this hang, so it could be a conflict between this commit and other commits that went into 2.6.30-rc6.

Here is some additional info.

i915_gem_fence_regs before suspend:

Reserved fences = 3
Total fences = 16
Fenced object[ 0] = unused
Fenced object[ 1] = unused
Fenced object[ 2] = unused
Fenced object[ 3] = f676c360: P 00c00000 00400000 00001000 X 00000002 00000002 0 (name: 1)
Fenced object[ 4] = f6901f00: 02000000 00400000 00001000 X 00000002 00000002 0 (name: 2)
Fenced object[ 5] = f6901f60: 02400000 00400000 00001000 X 00000002 00000002 0 (name: 3)
Fenced object[ 6] = unused
Fenced object[ 7] = unused
Fenced object[ 8] = unused
Fenced object[ 9] = unused
Fenced object[10] = unused
Fenced object[11] = unused
Fenced object[12] = unused
Fenced object[13] = unused
Fenced object[14] = unused
Fenced object[15] = unused

i915_gem_fence_regs after resume:

Reserved fences = 3
Total fences = 16
Fenced object[ 0] = unused
Fenced object[ 1] = unused
Fenced object[ 2] = unused
Fenced object[ 3] = f6042780: P 00c00000 00400000 00001000 X 00000002 00000000 0 (name: 1)
Fenced object[ 4] = unused
Fenced object[ 5] = unused
Fenced object[ 6] = unused
Fenced object[ 7] = unused
Fenced object[ 8] = unused
Fenced object[ 9] = unused
Fenced object[10] = unused
Fenced object[11] = unused
Fenced object[12] = unused
Fenced object[13] = unused
Fenced object[14] = unused
Fenced object[15] = unused

(In reply to comment #52)

Sorry, this is the one after resume.

i915_gem_fence_regs after resume:

Reserved fences = 3
Total fences = 16
Fenced object[ 0] = unused
Fenced object[ 1] = unused
Fenced object[ 2] = unused
Fenced object[ 3] = f676c360: P 00c00000 00400000 00001000 X 00000002 00000000 0 (name: 1)
Fenced object[ 4] = unused
Fenced object[ 5] = unused
Fenced object[ 6] = unused
Fenced object[ 7] = unused
Fenced object[ 8] = unused
Fenced object[ 9] = unused
Fenced object[10] = unused
Fenced object[11] = unused
Fenced object[12] = unused
Fenced object[13] = unused
Fenced object[14] = unused
Fenced object[15] = unused
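
Comparing the dumps, three fence registers hold objects before suspend (slots 3 to 5) but only one (slot 3) after resume. A small sketch that counts occupied slots in such a dump, assuming the "Fenced object[ N] = ..." line format shown above; count_used_fences is a hypothetical helper, not a kernel or debugfs API:

```c
#include <stdio.h>
#include <string.h>

/* Count slots in an i915_gem_fence_regs dump whose entry is not
 * "unused". The dump is passed as one newline-separated string. */
int count_used_fences(const char *dump)
{
    int used = 0;
    const char *line = dump;
    while (*line) {
        int slot, consumed = 0;
        /* %n records how far the match got; it stays 0 on a mismatch */
        if (sscanf(line, "Fenced object[%d] =%n", &slot, &consumed) == 1 &&
            consumed > 0) {
            const char *rest = line + consumed;
            while (*rest == ' ' || *rest == '\t')
                rest++;
            if (strncmp(rest, "unused", 6) != 0)
                used++;
        }
        const char *eol = strchr(line, '\n');
        if (!eol)
            break;
        line = eol + 1;
    }
    return used;
}
```

Fed the before-suspend and after-resume dumps above, it returns 3 and 1 respectively.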

If fence register save/restore really is the issue, this patch should help.

Current code saves the fence registers before rendering has completed, which can affect fence register allocation. If we save before rendering completes, and restore again at resume time, we may end up causing trouble with whatever objects land in the fenced space after resume.

Saving register state (including fences) *after* we've idled the memory manager should help with that.

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 98560e1..e3cb402 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -67,8 +67,6 @@ static int i915_suspend(struct drm_device *dev, pm_message_t s

        pci_save_state(dev->pdev);

-        i915_save_state(dev);
-
        /* If KMS is active, we do the leavevt stuff here */
        if (drm_core_check_feature(dev, DRIVER_MODESET)) {
                if (i915_gem_idle(dev))
@@ -77,6 +75,8 @@ static int i915_suspend(struct drm_device *dev, pm_message_t s
                drm_irq_uninstall(dev);
        }

+        i915_save_state(dev);
+
        intel_opregion_free(dev, 1);

        if (state.event == PM_EVENT_SUSPEND) {
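
The ordering change in the patch can be modelled with stubs. This is a hypothetical sketch: the functions below only record the names of the real calls from the diff (pci_save_state, i915_save_state, i915_gem_idle, drm_irq_uninstall) so the old and new orderings can be compared; they do not touch hardware:

```c
#include <string.h>

#define MAX_STEPS 8
static const char *steps[MAX_STEPS]; /* recorded call order */
static int nsteps;

static void record(const char *step)
{
    if (nsteps < MAX_STEPS)
        steps[nsteps++] = step;
}

/* Old order: registers (including fences) saved while rendering may
 * still be in flight. */
void suspend_original(int kms_active)
{
    nsteps = 0;
    record("pci_save_state");
    record("i915_save_state");
    if (kms_active) {
        record("i915_gem_idle");
        record("drm_irq_uninstall");
    }
}

/* Patched order: state is saved only after the GPU has been idled. */
void suspend_fixed(int kms_active)
{
    nsteps = 0;
    record("pci_save_state");
    if (kms_active) {
        record("i915_gem_idle");
        record("drm_irq_uninstall");
    }
    record("i915_save_state");
}

/* Position of a step in the recorded order, or -1 if it did not run. */
int step_index(const char *name)
{
    for (int i = 0; i < nsteps; i++)
        if (strcmp(steps[i], name) == 0)
            return i;
    return -1;
}
```

With KMS active, suspend_original saves state before i915_gem_idle, while suspend_fixed saves it afterwards, so the saved fence state reflects an idle memory manager.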

(In reply to comment #54)
> If fence register save/restore really is the issue, this patch should help.
>

Yes, it does help my problem. The system can resume correctly again. I didn't see a hang so far.

I tested the patch in comment 54 and also confirm that it fixes suspend/resume with the internal laptop monitor. Thanks!

It still fails with the external one, but that's a different problem, and I'm going to report it separately.

(In reply to comment #54)
> If fence register save/restore really is the issue, this patch should help.

Applied the patch here and it appears to have fixed it for me.

Intel GMA950 laptop.

Great, thanks for testing. Fix has been pushed into the kernel:

commit 9e06dd39f2b6d7e35981e0d7aded618686b32ccb
drm/i915: correct suspend/resume ordering

(In reply to comment #58)
> Great, thanks for testing. Fix has been pushed into the kernel:
>
> commit 9e06dd39f2b6d7e35981e0d7aded618686b32ccb
> drm/i915: correct suspend/resume ordering

The fix is in drm-intel-next branch.

Eric, please cherry-pick it into qa-branch so it'll be in Q2 package.

(In reply to comment #58)
> Great, thanks for testing. Fix has been pushed into the kernel:
>
> commit 9e06dd39f2b6d7e35981e0d7aded618686b32ccb
> drm/i915: correct suspend/resume ordering
>

Maybe this fix should also be sent to the 2.6.30.x stable branch, since it's a regression from the 2.6.30 rc process. It will also make users of the stable kernel happy. Thanks.

Changed in xserver-xorg-video-intel:
status: Confirmed → Fix Released

On Tue, 23 Jun 2009 20:16:32 -0700 (PDT)
> --- Comment #60 from Jie Luo <email address hidden> 2009-06-23
> 20:16:32 PST --- (In reply to comment #58)
> > Great, thanks for testing. Fix has been pushed into the kernel:
> >
> > commit 9e06dd39f2b6d7e35981e0d7aded618686b32ccb
> > drm/i915: correct suspend/resume ordering
> >
>
> Maybe this fix should also be send to 2.6.30.x stable branch, since
> it's a regression during the 2.6.30 rc process. And it will make user
> of the stable kernel happy. Thanks.

Good point, want to send a note to <email address hidden> with the commit
info, proposing the patch for inclusion?

Thanks,

Changed in xserver-xorg-video-intel:
importance: Unknown → Critical
Changed in xserver-xorg-video-intel:
importance: Critical → Unknown
Changed in xserver-xorg-video-intel:
importance: Unknown → Critical

Jaunty reached end-of-life on 23 October 2010. The bug is marked as fixed in later versions of Ubuntu.

Changed in xserver-xorg-video-intel (Ubuntu Jaunty):
status: Confirmed → Won't Fix