Frequent crashes on i915GM (Thinkpad X41) Error in I830WaitLpRing()

Bug #176377 reported by Dagfinn Ilmari Mannsåker
Affects                             Status         Importance   Assigned to   Milestone
xf86-video-intel                    Fix Released   Medium
xserver-xorg-video-intel (Ubuntu)   Fix Released   High         Unassigned

Bug Description

After upgrading to Hardy, X crashes several times a day on my IBM Thinkpad X41. Most of the time the crash leaves the hardware in a state from which X cannot reinitialise it, so I have to reboot to get X working again.

||/ Name Version Description
++-============================-============================-========================================================================
ii xserver-xorg-core 2:1.4.1~git20071119-1ubuntu1 Xorg X server - core server
ii xserver-xorg-video-intel 2:2.2.0-1ubuntu1 X.Org X server -- Intel i8xx, i9xx display driver

Revision history for this message
Dagfinn Ilmari Mannsåker (ilmari) wrote :

Here's the difference between the lspci output before and after X crashed:

 00:02.0 VGA compatible controller: Intel Corporation Mobile 915GM/GMS/910GML Express Graphics Controller (rev 03) (prog-if 00 [VGA])
  Subsystem: IBM Unknown device 0582
- Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
+ Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
  Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
+ Latency: 0
  Interrupt: pin A routed to IRQ 17
  Region 0: Memory at a0080000 (32-bit, non-prefetchable) [size=512K]
  Region 1: I/O ports at 1800 [size=8]
@@ -10,7 +11,7 @@
  Capabilities: [d0] Power Management version 2
   Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
   Status: D0 PME-Enable- DSel=0 DScale=0 PME-
-00: 86 80 92 25 03 00 90 00 03 00 00 03 00 00 80 00
+00: 86 80 92 25 07 00 90 00 03 00 00 03 00 00 80 00
 10: 00 00 08 a0 01 18 00 00 08 00 00 c0 00 00 00 a0
 20: 00 00 00 00 00 00 00 00 00 00 00 00 14 10 82 05
 30: 00 00 00 00 d0 00 00 00 00 00 00 00 0b 01 00 00
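
The only register-level change in the diff above is the 16-bit PCI command register at config-space offset 0x04, which goes from `03 00` to `07 00`. A minimal Python sketch of decoding that word follows. The bit names are the standard low three bits of the PCI command register; the helper names `command_word` and `decode_command` are made up for illustration.

```python
# Low three bits of the standard PCI command register.
COMMAND_BITS = {
    0: "I/O",        # I/O space decoding enabled
    1: "Mem",        # memory space decoding enabled
    2: "BusMaster",  # device is allowed to initiate transfers (DMA)
}

def command_word(row: str) -> int:
    """Extract the little-endian command word (offset 0x04) from an `lspci -x` row."""
    data = bytes.fromhex(row.split(":", 1)[1].replace(" ", ""))
    return int.from_bytes(data[4:6], "little")

def decode_command(value: int) -> dict:
    """Map each known flag name to True/False for the given command word."""
    return {name: bool((value >> bit) & 1) for bit, name in COMMAND_BITS.items()}

# The two "00:" rows from the diff ("-" row first, "+" row second).
before = command_word("00: 86 80 92 25 03 00 90 00 03 00 00 03 00 00 80 00")
after = command_word("00: 86 80 92 25 07 00 90 00 03 00 00 03 00 00 80 00")

changed = [
    name
    for name in COMMAND_BITS.values()
    if decode_command(before)[name] != decode_command(after)[name]
]
print(hex(before), hex(after), changed)
```

The single changed flag matches the `BusMaster-` versus `BusMaster+` difference on the `Control:` lines of the same diff.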

Bryce Harrington (bryce) wrote :

SetGrabKeysState - enabled
Error in I830WaitLpRing(), timeout for 2 seconds
pgetbl_ctl: 0x5ffc0001 pgetbl_err: 0x0
ipeir: 0 iphdr: 7d000006
LP ring tail: 14198 head: 1404c len: 1f001 start 0
eir: 0 esr: 0 emr: ffff
instdone: fa41 instpm: 0
memmode: 306 instps: 800f00c4
hwstam: fffe ier: 2 imr: 8 iir: a0
Ring at virtual 0xa7828000 head 0x1404c tail 0x14198 count 83
 00013fcc: 00000000
 00013fd0: 02000011
 00013fd4: 00000000
 00013fd8: 54f00006
 00013fdc: 03cc0b20
 00013fe0: 000402b0
 00013fe4: 000a02c2
 00013fe8: 00c5e0a0
 00013fec: 00000000
 00013ff0: 00000050
 00013ff4: 00c05f90
 00013ff8: 02000011
 00013ffc: 00000000
 00014000: 54f00006
 00014004: 03cc0b20
 00014008: 000a02b0
 0001400c: 001102c2
 00014010: 00c5e0a0
 00014014: 00000000
 00014018: 00000050
 0001401c: 00c05f90
 00014020: 02000011
 00014024: 00000000
 00014028: 54f00006
 0001402c: 03cc0050
 00014030: 00000000
 00014034: 00010012
 00014038: 00c05fe0
 0001403c: 00000000
 00014040: 00000050
 00014044: 00c05f90
 00014048: 7d000006
 0001404c: 00000003
Ring end
space: 130732 wanted 131064
(II) intel(0): [drm] removed 1 reserved context for kernel
(II) intel(0): [drm] unmapping 8192 bytes of SAREA 0xf8d41000 at 0xb7a37000
(II) intel(0): [drm] Closed DRM master.

Fatal server error:
lockup

(II) AIGLX: Suspending AIGLX clients for VT switch
(II) intel(0): fbc disabled on plane a
Error in I830WaitLpRing(), timeout for 2 seconds
pgetbl_ctl: 0x5ffc0001 pgetbl_err: 0x0
ipeir: 0 iphdr: 7d000006
LP ring tail: 141a0 head: 1404c len: 1f001 start 0
eir: 0 esr: 0 emr: ffff
instdone: fa41 instpm: 0
memmode: 306 instps: 800f00c4
hwstam: fffe ier: 2 imr: 8 iir: a0
Ring at virtual 0xa7828000 head 0x1404c tail 0x141a0 count 85
 00013fcc: 00000000
 00013fd0: 7d000011
 00013fd4: 00000000
 00013fd8: 00c00006
 00013fdc: 00000b20
 00013fe0: 006002b0
 00013fe4: 00c002c2
 00013fe8: 0100e0a0
 00013fec: 00e00000
 00013ff0: 7d010050
 00013ff4: 00005f90
 00013ff8: 00000011
 00013ffc: 00000000
 00014000: 00000006
 00014004: 00000b20
 00014008: 000002b0
 0001400c: 000002c2
 00014010: 7d8ee0a0
 00014014: 03800000
 00014018: 01050050
 0001401c: 7d855f90
 00014020: 00000011
 00014024: 7d040000
 00014028: ffff0006
 0001402c: 00900050
 00014030: 00000000
 00014034: 00000012
 00014038: 7d805fe0
 0001403c: 00000000
 00014040: 00000050
 00014044: 02635f90
 00014048: 00000006
 0001404c: 00000003
Ring end
space: 130724 wanted 131064

FatalError re-entered, aborting
lockup
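
The `count` and `space` figures in the two dumps above can be recomputed from the logged `head`, `tail` and `len` values, which makes the dumps easier to read. The sketch below does that in Python. The formulas are an assumption inferred from the logged numbers (ring size taken from the page-aligned part of `len` plus one 4 kB page, 8 bytes kept free, 4-byte ring entries); they are not quoted from the driver source, and the function names are illustrative.

```python
def ring_size(len_reg: int) -> int:
    # Assumption: bits 12-20 of the register hold (pages - 1); bit 0 is an enable flag.
    return (len_reg & 0x001FF000) + 0x1000

def ring_space(head: int, tail: int, size: int) -> int:
    # Free bytes between tail and head, keeping 8 bytes in reserve.
    space = head - (tail + 8)
    if space < 0:
        space += size
    return space

def pending_dwords(head: int, tail: int, size: int) -> int:
    # Number of 4-byte entries written but not yet consumed.
    return ((tail - head) % size) // 4

size = ring_size(0x1F001)

# (head, tail) pairs from the first and second dump.
for head, tail in [(0x1404C, 0x14198), (0x1404C, 0x141A0)]:
    print(pending_dwords(head, tail, size), ring_space(head, tail, size), size - 8)
```

Under these assumptions the sketch reproduces the logged `count`, `space` and `wanted` values. In both dumps the head stays at 0x1404c while the tail has advanced, which is what one would expect if the GPU had stopped consuming commands.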

description: updated
Changed in xserver-xorg-video-intel:
importance: Undecided → High
status: New → Triaged
Bryce Harrington (bryce)
description: updated
Changed in xserver-xorg-video-intel:
importance: Undecided → Unknown
status: New → Unknown
Changed in xserver-xorg-video-intel:
status: Unknown → Confirmed
Bryce Harrington (bryce) wrote :

Dagfinn, can you give more specific steps to reproduce the problem?

Changed in xserver-xorg-video-intel:
status: Triaged → Incomplete
Dagfinn Ilmari Mannsåker (ilmari) wrote :

Not really, I'm afraid.

I've seen it happen after anything between a few minutes and several hours of use, and I haven't noticed any particular activity that triggers it.

Currently I'm using the i810 driver as a workaround.

Bryce Harrington (bryce) wrote :

Hmm, okay, since it looks like upstream is unable to reproduce the behavior based on the limited amount we know so far, it would be helpful for you to do some additional forensics for us, if you don't mind.

The first thing needed is a backtrace of Xorg. General directions for doing this are at https://wiki.ubuntu.com/DebuggingXorg. Essentially, you need to ssh into the system from another computer after a crash, attach gdb to the Xorg process, and get a 'backtrace full'.

If that doesn't work, you can ssh in, start up X from within gdb, and run it until it locks up, and then do a backtrace full. Paste the full output into a text file, and attach it to this bug report.
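
The attach-and-backtrace step described above can be scripted so that the output lands directly in a file. This is a rough Python sketch, not the procedure from the wiki page: it assumes gdb is installed, that the script runs with enough privileges to attach to Xorg, and the function names and the PID 1234 are placeholders.

```python
import shlex
import subprocess

def build_gdb_command(pid: int) -> list:
    """Command line that attaches gdb to a running process and prints a full backtrace."""
    # -batch: exit after running the commands; -p: attach to PID; -ex: run one gdb command.
    return ["gdb", "-batch", "-p", str(pid), "-ex", "backtrace full"]

def save_backtrace(pid: int, path: str) -> int:
    """Run gdb against `pid`, write everything it prints to `path`, return its exit status."""
    with open(path, "w") as out:
        result = subprocess.run(
            build_gdb_command(pid), stdout=out, stderr=subprocess.STDOUT
        )
    return result.returncode

# 1234 is a placeholder; on the real machine use the PID of the Xorg process.
print(shlex.join(build_gdb_command(1234)))
```

Calling `save_backtrace(pid, "backtrace.txt")` over the ssh session would then produce a text file suitable for attaching to the bug report.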

Dagfinn Ilmari Mannsåker (ilmari) wrote :

Here's the full backtrace. I'll attach the core file as well.

Bryce Harrington (bryce) wrote :

Hmm, I see in your log it's loading the "fb" driver, which upstream reports is incompatible with the -intel driver. Can you attach your xorg.conf? Perhaps you simply have a misconfiguration.

Dagfinn Ilmari Mannsåker (ilmari) wrote :

There's no mention of "fb" in my xorg.conf, but it might be pulled in by something else in the Module section.

I'll try paring it down to just the keyboard layout settings and see if that helps.

Dagfinn Ilmari Mannsåker (ilmari) wrote :

The X server crashed again just now, with everything except the InputDevice section for the keyboard removed from xorg.conf.

FWIW, I'm attaching the Xorg.0.log from a not-yet-crashed session with the pared-down xorg.conf.

Bryce Harrington (bryce) wrote :

Dagfinn,

Upstream is still curious if you have the kernel framebuffer loaded. Could you attach the output of dmesg and lsmod?

Dagfinn Ilmari Mannsåker (ilmari) wrote :

Nope, no framebuffer drivers loaded.

BTW, would it be easier if I subscribed to the upstream bugzilla so you wouldn't have to proxy everything?

Bryce Harrington (bryce) wrote :

Hi Dagfinn,

Thanks, I'll pass those along. Upstream says `cat /proc/fb` would also be interesting.

And yes, it would _definitely_ help if you could subscribe to the upstream bug and reply directly. Cutting me out as a middleman would probably make this process go a whole lot faster!
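
For the `cat /proc/fb` check mentioned above, the file lists one registered framebuffer per line as a node number followed by a driver name, and is empty when none is registered. A small Python sketch of reading it follows; the function names and the sample text are illustrative and are not taken from any machine in this report.

```python
def parse_proc_fb(text: str) -> list:
    """Parse /proc/fb contents into (node_number, driver_name) tuples."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        number, _, name = line.partition(" ")
        entries.append((int(number), name))
    return entries

def read_proc_fb(path: str = "/proc/fb") -> list:
    """Return the registered framebuffers, or an empty list if the file is missing."""
    try:
        with open(path) as handle:
            return parse_proc_fb(handle.read())
    except FileNotFoundError:
        return []

# Made-up sample: a system with one framebuffer driver registered.
print(parse_proc_fb("0 VESA VGA\n"))
# An empty file means no kernel framebuffer driver is registered.
print(parse_proc_fb(""))
```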

Johan Christiansen (johandc) wrote :

I have the same issue here. Ever since upgrading my X41 to Hardy I have had frequent crashes (anywhere from 5 minutes to several hours apart). The screen just blanks and stays blank. I can't even press CTRL+ALT+F1 to get to a console. It seems like the plug has been pulled, and I can only recover the PC by rebooting it. However, I can press CTRL+ALT+Del and the machine will reboot cleanly, and usplash shows up during shutdown. I'm not sure that switching to i810 fixes the issue.

Simon Kagstrom (simon-kagstrom) wrote :

I'd just like to add that I also see this on my X41 on Hardy (kept up to date). The screen flickers in the same way as when the X server is starting up.

oh (oystein-homelien) wrote :

I think I am experiencing the same bug -
* Thinkpad X60 (logs attached)
* Ubuntu Hardy updated as of today - crash happened last time yesterday
* Started happening some time in the hardy upgrade cycle
* Upon crashing, the screen flickers like on startup of X (backlight is turned completely off for a moment etc)
* It seems xorg is trying to restart, but won't. I have tried killing xorg remotely and restarting it. That does not work; I must reboot, which is very annoying.
* GNOME, No desktop effects enabled
* No apparent reason for crashing, but seems to happen often when I run "heavier" OpenGL applications (tuxracer, wine/directx)
* "Lightweight" OpenGL applications work (my own homemade app has been running an entire night without crashing)
* It also happens when I (to my knowledge) do not run OpenGL at all. I can be just moving the cursor reading a web page in firefox, and suddenly it crashes.
* The caret/cursor disappears, the x server is really trying to restart.
* No artifacts on screen. Turns black (and I can see the backlight switching on/off/on)
* Cannot switch to text ttys with ctrl-alt-f1 after a crash.
* I CAN reboot with ctrl-alt-del, but there is nothing on screen.
* The Ubuntu shutdown splash screen sometimes shows after ctrl-alt-del, but not always.

Last lines of the attached Xorg.0.log.old (I believe this is the log from when xorg tries to _restart_, not from the session that crashed initially, but I am not sure):

(II) intel(0): 0x007bf000: end of stolen memory
(II) intel(0): 0x00800000-0x00bfffff: front buffer (4096 kB) X tiled
(II) intel(0): 0x00c00000-0x017fffff: exa offscreen (12288 kB)
(II) intel(0): 0x01800000-0x01bfffff: back buffer (4096 kB) X tiled
(II) intel(0): 0x01c00000-0x01ffffff: depth buffer (4096 kB) X tiled
(II) intel(0): 0x02000000-0x03ffffff: classic textures (32768 kB)
(II) intel(0): 0x10000000: end of aperture
(WW) intel(0): PRB0_CTL (0x0001f001) indicates ring buffer enabled
(WW) intel(0): PRB0_HEAD (0xac4167fc) and PRB0_TAIL (0x00016808) indicate ring buffer not flushed
(WW) intel(0): Existing errors found in hardware state.

oh (oystein-homelien) wrote :

Attached is lspci -vvvn, related to my bug report in the previous comment.

Also, I think Bug #197722 might be a duplicate of this. His xorg log shows the same error as mine. Although I believe this should be the master bug.

unggnu (unggnu)
Changed in xserver-xorg-video-intel:
status: Incomplete → Confirmed
Bryce Harrington (bryce) wrote :

On the upstream bug report, it sounds like the issue does not occur with the 2.1 intel driver, so reverting to that version is a temporary workaround.

To debug this issue, we need someone who can reproduce it to git bisect from 2.1 to 2.2.1 and identify which change introduced the problem. Since the problem takes a while to manifest, this may take some time, but it is the most viable way of narrowing it down.
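
To give a feel for how much testing a bisect between 2.1 and 2.2.1 would involve, here is a rough Python sketch of the search that git bisect performs on a linear history. The commit names, the count of 100 commits and the position of the first bad commit are all hypothetical; the real range may be larger or smaller.

```python
def bisect_first_bad(commits, is_bad):
    """Find the first bad commit by binary search.

    `commits` is ordered oldest to newest, commits[0] is known good and
    commits[-1] is known bad. Returns (first_bad_commit, builds_tested).
    """
    good, bad = 0, len(commits) - 1
    steps = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        steps += 1  # one build-and-test cycle
        if is_bad(commits[mid]):
            bad = mid
        else:
            good = mid
    return commits[bad], steps

# Hypothetical history of 100 commits where the regression lands in commit 37.
history = [f"c{i}" for i in range(100)]
first_bad, steps = bisect_first_bad(history, lambda c: int(c[1:]) >= 37)
print(first_bad, steps)
```

The number of builds grows only with the logarithm of the range, but because the crash can take hours to appear, each "good" verdict needs a long enough run to be trusted.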

Liken Otsoa (liken) wrote :

Bug #197722, which I reported, is a duplicate of this one. IBM Thinkpad X41 Tablet.

Well, for some weeks now I have had no more crashes with xserver-xorg-video-intel (version 2:2.2.1-1ubuntu5, from an up-to-date Hardy). But I think the crashes stopped when I put these options in the "Device" section of xorg.conf:

Option "AccelMethod" "exa"
Option "MigrationHeuristic" "greedy"
Option "ExaNoComposite" "true"
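
To make the placement of these options concrete, the following Python sketch renders a complete "Device" section containing them. The identifier string and the helper name are invented for illustration, and the `intel` driver name is assumed from the package discussed in this bug; adapt both to the existing xorg.conf.

```python
def render_device_section(identifier: str, driver: str, options: dict) -> str:
    """Render an xorg.conf Device section with the given options."""
    lines = [
        'Section "Device"',
        f'\tIdentifier "{identifier}"',
        f'\tDriver "{driver}"',
    ]
    for name, value in options.items():
        lines.append(f'\tOption "{name}" "{value}"')
    lines.append("EndSection")
    return "\n".join(lines) + "\n"

# "Intel Graphics" is a hypothetical identifier; the options are the ones listed above.
section = render_device_section(
    "Intel Graphics",
    "intel",
    {
        "AccelMethod": "exa",
        "MigrationHeuristic": "greedy",
        "ExaNoComposite": "true",
    },
)
print(section)
```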

Danilo Piazzalunga (danilopiazza) wrote :

I am experiencing a similar problem with a system recently upgraded to Hardy.

ii xserver-xorg-core 2:1.4.1~git20080131-1ubuntu5 Xorg X server - core server
ii xserver-xorg-video-intel 2:2.2.1-1ubuntu5 X.Org X server -- Intel i8xx, i9xx display driver

Bryce Harrington (bryce) wrote :

Liken, that's interesting - I recently put in a patch to switch the "greedy" heuristic on by default for everyone. I'm curious whether that change alone is sufficient, or whether Option "ExaNoComposite" "true" is also required. (Am I correct that this shuts off Compiz as well? That would be unfortunate...)

Liken Otsoa (liken) wrote :

Hi Bryce. After reading your message, I disabled the options:

#Option "AccelMethod" "exa"
#Option "MigrationHeuristic" "greedy"
#Option "ExaNoComposite" "true"

And I had a crash shortly after. If EXA acceleration and the greedy heuristic are now the defaults in xserver-xorg-video-intel 2:2.2.1-1ubuntu5, then the key could be the "ExaNoComposite" "true" option.

I don't think I have any problems with Compiz when "ExaNoComposite" is "true": I can see the cube and other effects, although I do not use Compiz by default on my system.

I am also using INTEL_BATCH=1 in /etc/environment because it noticeably increases 3D rendering speed (glxgears goes from 4000 to 5000 FPS).

Bryce Harrington (bryce) wrote :

Liken, interesting, thanks for confirming that.

Regarding INTEL_BATCH=1, I got word from the upstream developers that they do not recommend shipping Ubuntu with that option turned on by default. While it may help performance, it also exposes code they feel is not very stable and could cause crashes in certain corner case situations. So unfortunately we won't be supporting Hardy configurations with INTEL_BATCH=1 set. (Of course, you're welcome to use it if you like the performance boost and don't mind the instabilities.)

Could you repeat the test with those three options disabled, AND with INTEL_BATCH disabled? Comment the INTEL_BATCH out rather than setting it to 0 (sometimes the internal code checks if the env var is present but doesn't check the value).
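
The reason for commenting the variable out rather than setting it to 0 can be shown with a short Python sketch. It contrasts a presence-only check with a value check; it does not claim to reproduce how the driver actually reads INTEL_BATCH, and the function names are illustrative.

```python
def enabled_by_presence(env: dict) -> bool:
    # Only asks whether the variable exists, whatever its value.
    return "INTEL_BATCH" in env

def enabled_by_value(env: dict) -> bool:
    # Treats a missing, empty or "0" value as disabled.
    return env.get("INTEL_BATCH", "0") not in ("", "0")

for env in ({}, {"INTEL_BATCH": "0"}, {"INTEL_BATCH": "1"}):
    print(env, enabled_by_presence(env), enabled_by_value(env))
```

With `INTEL_BATCH=0` the two checks disagree, so only removing the variable entirely guarantees the feature is off under either style of check.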

Liken Otsoa (liken) wrote :

Bryce, I am testing this with no INTEL_BATCH and none of the options, and I have had no crash so far (1 day).
I am using xserver-xorg-video-intel 2:2.2.1-1ubuntu6.

But I am seeing a NEW BUG; I do not know whether it is really new or whether I have only just noticed it: 3D rendering is failing.
Try the fullscreen screensavers (atunnel, gears, flipflop, ...), where it is most noticeable.

-------------
Thinkpad X41 Tablet
Hardy, up to date

Bryce Harrington (bryce) wrote :

Liken, please report different issues on separate bugs. However, if you're using Compiz and OpenGL, there is a known issue where this does not always work properly; if you can reproduce the issues with Compiz turned off, they may be worth reporting though. I expect the issue will be solved either in Intrepid or Intrepid+1 by inclusion of TTM (which is too intrusive and large of a change for Hardy).

oh (oystein-homelien) wrote :

Just pitching in to report that

    Option "ExaNoComposite" "true"

definitely solved this problem for me (as a workaround). I usually had 2-3 spontaneous X crashes a day, as described previously in this bug; they stopped immediately when I added this option and restarted X.

What exactly does this option turn off? OpenGL still works nicely for me (I develop opengl apps), but I don't run 3d effects in the window manager.

oh (oystein-homelien) wrote :

I forgot to say in the previous comment that I have been running ExaNoComposite=true for over a week now, without any X crash. Before that 2-3 per day.

Bryce Harrington (bryce) wrote :

I didn't spot ExaNoComposite on the 'intel' man page, but found a description of it on the via man page:

   Option "EXANoComposite" "boolean"
              If Exa is enabled using the above option, Don't accelerate composite. Since EXA, and in particular, it's composite
              acceleration is still experimental, This is a way to disable exa composite acceleration.

So, it sounds like it could make compiz run slower, but perhaps with more stability.

Bryce Harrington (bryce) wrote :

Is anyone subscribed on this bug still having the crashes on i915?

Dagfinn Ilmari Mannsåker (ilmari) wrote :

I have not been seeing any crashes for the last month or so.

Danilo Piazzalunga (danilopiazza) wrote :

Same thing, I have not been seeing this one for the last month or so.

Bryce Harrington (bryce) wrote :

Dagfinn and Danilo, thanks for confirming the crash has gone. I suspect the work we did earlier fixed it. Since Dagfinn is the original reporter and no longer sees the crash, I'm closing this bug as fixed. It sounds like upstream will be doing the same.

If others are also seeing crashes on i915GM, please report them as new bug reports.

Changed in xserver-xorg-video-intel:
status: Confirmed → Fix Released
Changed in xserver-xorg-video-intel:
status: Confirmed → Fix Released
Changed in xserver-xorg-video-intel:
importance: Unknown → Medium
Changed in xserver-xorg-video-intel:
importance: Medium → Unknown
Changed in xserver-xorg-video-intel:
importance: Unknown → Medium