opengl total freeze using DRI with intel 945 graphics chip

Bug #177518 reported by aaronetz
This bug affects 1 person
Affects                              Status        Importance  Assigned to  Milestone
xf86-video-intel                     Fix Released  High
linux (Ubuntu)                       Fix Released  High        Unassigned
  Hardy                              Fix Released  High        Unassigned
  Intrepid                           Fix Released  High        Unassigned
xserver-xorg-video-intel (Ubuntu)    Invalid       Medium      Unassigned
  Hardy                              Invalid       Undecided   Unassigned
  Intrepid                           Invalid       Medium      Unassigned

Bug Description

Crash while running 3D graphics applications (e.g. Blender, Scorched3D, Google Earth) on Intel 945/GMA950 systems:
Complete system lockup: even Alt+SysRq+b reboot doesn't work
Screen contents retained

Please do not confuse this with other X crash bugs such as bug 120834; if you cannot find an exact match for your symptoms then please file a new bug.

WORKAROUND: Adding Option "DRI" "false" to the Device section of xorg.conf stops the freezes.

Reported from Gutsy and Hardy, with and without Compiz. Nothing is logged in Xorg.0.log.old or syslog.

Revision history for this message
aaronetz (aaronetz) wrote :

Some new insights:
Freeze occurs in Google Earth also. So it is probably related to openGL.
After adding DRI=false to xorg.conf the freezes stop.

Revision history for this message
Daniel Hahler (blueyed) wrote :

Do you have compiz installed/running?
What graphic card/drivers are you using?

Revision history for this message
aaronetz (aaronetz) wrote :

Compiz is not installed at all (default Kubuntu installation).
My graphic card is an onboard intel gma950 (i945 chipset).
I'm using the intel driver (the default that was autodetected after Kubuntu installation).
I tried switching to i810, but then X refused to run.

Revision history for this message
Daniel Hahler (blueyed) wrote :

This sounds like a problem with Xorg or the driver you are using.
I'm assigning the bug to the xorg package for now.
Please refer to https://wiki.ubuntu.com/X/Debugging for what information would be useful to include.

description: updated
description: updated
Revision history for this message
aaronetz (aaronetz) wrote :

I'm attaching some files, according to The Ubuntu X Debugger's Handbook.
xorg.conf has the DRI=false option. In order to reproduce the bug this option has to be removed.

Daniel Hahler (blueyed)
Changed in xserver-xorg-video-intel:
assignee: blueyed → nobody
status: Incomplete → New
Revision history for this message
Rebecca Palmer (rebecca-palmer) wrote :

Are those logs from right after the crash? (Xorg only keeps them for one reboot) They don't appear to record one.

Does your screen go blank when the system freezes? If it does, then flashes between off and black with a "busy" mouse pointer several times, that's probably bug 120834.

Revision history for this message
aaronetz (aaronetz) wrote :

The logs are not from after the crash. I should have guessed their irrelevance. I will reproduce the crash and reattach them.

My screen does not go blank. It stays as it was, just completely frozen.

Revision history for this message
Rebecca Palmer (rebecca-palmer) wrote :

Still no error message in the Xorg log, which means this *isn't* bug 120834 (which always logs Error in I830WaitLpRing()), and could be something that freezes the system so completely that it can't log anything. I suggest the following:

-Reproduce the crash
-Try Ctrl+Alt+F1 (should switch to a text mode terminal) and Ctrl+Alt+Backspace (should restart X)
-If nothing happens, hold down Alt and SysRq(PrntScrn) and press 1, t, e, s, u, i, b in that order, allowing any disk activity to finish between each (the first two log a call trace, the rest shut down and reboot the system as gracefully as possible)
-Post /var/log/syslog, and which of the above worked
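
The Alt+SysRq sequence only has a chance of working if the kernel's magic SysRq support is enabled. The following is a minimal Python sketch for checking that, assuming the bitmask layout of /proc/sys/kernel/sysrq described in the kernel's sysrq documentation (0 = disabled, 1 = everything enabled, otherwise a sum of per-function bits). The helper names and the key-to-bit table are illustrative and should be verified against the running kernel.

```python
from __future__ import annotations

from pathlib import Path

# Bit of /proc/sys/kernel/sysrq that each key of the suggested sequence needs.
# Assumed from the kernel sysrq documentation; not taken from this report.
REQUIRED_BIT = {
    "1": 2,    # set console log level
    "t": 8,    # dump task list / call traces
    "e": 64,   # send SIGTERM to all processes
    "s": 16,   # sync disks
    "u": 32,   # remount filesystems read-only
    "i": 64,   # send SIGKILL to all processes
    "b": 128,  # reboot
}


def allowed_keys(sysrq_value: int) -> list[str]:
    """Return the keys of the sequence that this sysrq setting permits."""
    if sysrq_value == 1:  # 1 means "all functions enabled"
        return list(REQUIRED_BIT)
    return [key for key, bit in REQUIRED_BIT.items() if sysrq_value & bit]


def read_sysrq(path: str = "/proc/sys/kernel/sysrq") -> int:
    """Read the current setting (Linux only)."""
    return int(Path(path).read_text().strip())
```

Calling allowed_keys(read_sysrq()) before reproducing the crash shows which steps can respond at all. In this report the sequence got no response even so, which means an enabled setting only rules out misconfiguration.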

Revision history for this message
aaronetz (aaronetz) wrote :

None of the above worked - no response at all.
I'm attaching the file, gzipped.

Revision history for this message
Nic (ntetreau) wrote :

Having the same crash here using scorched3D, screen content still visible, no distortion, hard lock, keyboard doesn't even work. I'm using Hardy with latest intel driver, DRI is enabled, compiz enabled, using
Section "device" #
    Identifier "device1"
    Boardname "VESA driver (generic)"
    Busid "PCI:0:2:0"
    Driver "intel"
    Option "FramebufferCompression" "on"
    Option "AccelMethod" "XAA"
    Option "Tiling" "on"
    Screen 0
EndSection

in xorg.conf

Revision history for this message
Rodrigo Varella Rahmi (rovarella) wrote :

Here it happens frequently in Doom 3; some Wine apps, such as N64 emulators, crash too.

gma950 (i945)
i915 driver

Revision history for this message
Bryce Harrington (bryce) wrote : Re: opengl total freeze using DRI with intel graphics chip

Hmm, the syslog output doesn't seem to have an error message either.

I think to solve this one it'll be necessary to step through the code in gdb to isolate the line / state before the crash.

It would also be worthwhile to research for similar bug reports in the blender or xorg bug trackers.

Changed in xserver-xorg-video-intel:
status: New → Incomplete
Revision history for this message
aaronetz (aaronetz) wrote :

I searched a lot for this, but couldn't find a solution (except disabling DRI).
How can I debug the code you mentioned?
I'm not an ubuntu developer, but I'm a programmer, so if you lead me I might be able to help.

Revision history for this message
Gero (gero-putzar) wrote :

I've got a similar experience with intel 965 with or without explicitly loading the dri-module:

I tried to use the commercial CFD mesh generator IcemCFD with Ubuntu Gutsy (Kubuntu alternate installation on AMD64) on a Dell Optiplex 745 with one Intel 965 graphics controller and two monitors connected. At first all seems well, but after a couple of IcemCFD operations mouse and keyboard input stop working, the screen is messed up and a hard reboot is needed.

In order to reproduce the error more easily (and without the need for an IcemCFD-Licence) I tested gmsh which didn't seem to have problems. I guess from the slow display that it doesn't use OpenGL/Mesa.

Next I compiled the contents of the mesademos-package. Those demos can lead to the same experience: When more than one of them is running and the windows are moved fast on the screen one over the other, sooner or later the system crashes (most of the time). The demos "terrain", "tunnel", "fire" and "bounce" were my favorite candidates, two or three of them, some window-moving preferably also from one screen to the other and one over the other, and it took most certainly less than a minute to stop my computer.

I tested three different setups to check whether the dual-head configuration is the problem. First, the dual-head configuration with the correct resolution on both screens; to set the correct resolution on both screens I call the xrandr utility from within /etc/kde3/kdm/Xsetup: (/usr/bin/xrandr --output TMDS-1 --auto). Second, I skipped the xrandr call, which leads to a wrong aspect ratio on the second screen connected to TMDS-1, since it is a 16:9 wide screen and the other is 4:3. It makes no difference. Last, I used only one screen. This does not help either.

I also tried to install the newer driver package xserver-xorg-video-intel from Hardy (Version 2.2.0+git20080107-1ubuntu2 instead of 2.1.1-0ubuntu9), it did not help.

I tried with and without explicitly loading the dri-module, but that does not seem to have any effect. In the logfile it says that the dri-module is loaded though in the xorg.conf there is no section "Module" at all. How can I prevent xorg from loading the dri-module?

An excerpt from lspci:
00:02.0 VGA compatible controller: Intel Corporation 82Q963/Q965 Integrated Graphics Controller (rev 02)
00:02.1 Display controller: Intel Corporation 82Q963/Q965 Integrated Graphics Controller (rev 02)

Attached are the xorg.conf for the one-screen-configuration and the corresponding logfile.

Revision history for this message
Gero (gero-putzar) wrote :

... the logfile

description: updated
Revision history for this message
Rebecca Palmer (rebecca-palmer) wrote :

Gero, i965 graphics card and screen messed up (rather than just frozen) suggests bug 120834, not this bug. I have edited the description to make the difference clearer.

DRI is on by default; to turn it off (which avoids both these bugs, but may drastically reduce performance) add
         Option "DRI" "false"
to the graphics controller Device section of /etc/X11/xorg.conf.

Note also that X starts a new log file on every reboot, so for crash bugs the log file you need is Xorg.0.log.old, not Xorg.0.log. (This bug doesn't log anything, but 120834 logs Error in I830WaitLpRing(), timeout for 2 seconds)
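
To tell the two bugs apart after a crash, the previous session's log can be searched for the message named above. This is a minimal Python sketch, assuming the log lives at /var/log/Xorg.0.log.old; the function names are hypothetical.

```python
from pathlib import Path

# Message that bug 120834 is said to log; this bug logs nothing.
SIGNATURE = "Error in I830WaitLpRing()"


def has_lpring_error(log_text: str) -> bool:
    """True if the log text contains the bug 120834 signature."""
    return SIGNATURE in log_text


def check_previous_xorg_log(path: str = "/var/log/Xorg.0.log.old") -> bool:
    """Scan the log of the previous X session (the path is an assumption)."""
    return has_lpring_error(Path(path).read_text(errors="replace"))
```

A True result points towards bug 120834. A False result is consistent with this bug but does not prove it, since a clean log can have other causes.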

Revision history for this message
Bryce Harrington (bryce) wrote :

https://wiki.ubuntu.com/DebuggingXorg has some tips on using gdb to debug issues like this, although it doesn't give much detail on stepping through things. I'm afraid this is something you have to just do, to learn.

Revision history for this message
In , mcisely (isely) wrote :

Created an attachment (id=15005)
standard signed-off kernel patch to fix intel driver lockup on buffer swap

Running an opengl application which attempts to synchronize buffer swapping to the vertical sync event eventually results in a hard lockup of the Linux kernel. When this happens even trying the magic sysrq key won't show any signs of life. This problem takes from 5 to 90 minutes of running before the lockup happens, and you need to be running dual head cloned mode for it to happen at all. The root cause is not related to dual head, but the less predictable interrupt rate when running this way I am guessing really exacerbates the underlying race condition, raising the probability of failure.

A standard signed-off kernel patch which fixes the problem is attached to this bug. I am not a git expert so I apologize that I don't have a git changeset built against the drm git repository. (I did however check that this problem also still appears to be present in the git repository.) It should be trivial to apply this patch in any case. This patch was built against the 2.6.24.3 vanilla kernel source tree.

Following are the comments from the attached patch:

<blockquote>
The i915_vblank_swap() function schedules an automatic buffer swap
upon receipt of the vertical sync interrupt. Such an operation is
lengthy so it can't happen in normal interrupt context, so the DRM
implements this by scheduling the work in a kernel softirq-scheduled
tasklet. In order for the buffer swap to work safely, the DRM's
central lock must be taken, via a call to drm_lock_take() in drm_irq.c
within the function drm_locked_tasklet_func(). The lock-taking logic
uses a non-interrupt-blocking spinlock to implement the manipulations
needed to take the lock. Note that a non-interrupt-blocking spinlock
blocks kernel pre-emption and atomically sets a flag, but interrupts
are still enabled. This semantic is safe if ALL attempts to use the
spinlock only happen from process context. However this buffer swap
happens from softirq context which is really a form of interrupt
context that WILL pre-empt execution even when normal thread
pre-emption is otherwise disabled. Thus we have an unsafe situation,
in that drm_locked_tasklet_func() can block on a spinlock already
taken by a thread in process context which will never get scheduled
again because of the blocked softirq tasklet. This wedges the kernel
hard.

It's a very small race condition, but a race nonetheless with a very
undesirable potential outcome. To trigger this bug, run a dual-head
cloned mode configuration which uses the i915 drm, then execute an
opengl application which synchronizes buffer swaps against the
vertical sync interrupt. In my testing, a lockup always results after
running anywhere from 5 minutes to an hour and a half. I believe
dual-head is needed to really trigger the problem because then the
vertical sync interrupt handling is no longer predictable (due to
being interrupt-sourced from two different heads running at different
speeds). This raises the probability of the tasklet trying to run
while the userspace DRI is doing things to the GPU (and manipulating
the DRM lock).

The fix is to ...


Revision history for this message
mcisely (isely) wrote :

Hi all,

I'm not an Ubuntu user (Debian actually), however a Google search for this issue has led me here. I am pretty sure I'm seeing the same problem while running an OpenGL app with this driver (hard random lockup, frozen screen, magic SysRq key is useless). I believe I've found the cause and have a viable fix. I've generated a bug report with the patch at freedesktop.org. Go here if you want to examine / test:

http://bugs.freedesktop.org/show_bug.cgi?id=14937

I am curious to know if this fixes the problem for people here (thus adding further confirmation that I found it). My e-mail address is included there with the bug report.

  -Mike Isely

Revision history for this message
aaronetz (aaronetz) wrote :

This patch solves my problem.
I did a long test with blender (with DRI turned on, of course) and the freezes do not occur.
Thank you Mike!

Revision history for this message
unggnu (unggnu) wrote :

I guess it is then a kernel issue only.

Changed in xserver-xorg-video-intel:
status: Incomplete → Invalid
Changed in linux:
status: New → Confirmed
Changed in xserver-xorg-video-intel:
status: Unknown → Confirmed
Revision history for this message
In , Michel-tungstengraphics (michel-tungstengraphics) wrote :

Dave pushed the fix to drm Git and submitted it for 2.6.25.

Changed in xserver-xorg-video-intel:
status: Confirmed → Fix Released
Revision history for this message
Leann Ogasawara (leannogasawara) wrote :

Thanks for the pointer to the upstream fix. I'm adding the upstream git commit id and title for the kernel team to reference. I'll have them take a look. Thanks.

commit 9df5808cca52f33e1deb52b5010c68c6ed1656fe
Author: Mike Isely <email address hidden>
Date: Thu Mar 13 15:30:35 2008 -0500

    drm: Fix race that can lockup the kernel

Changed in linux:
assignee: nobody → ubuntu-kernel-team
importance: Undecided → High
status: Confirmed → Triaged
Revision history for this message
mcisely (isely) wrote :

You're welcome.

Thank you (aaronetz) for testing and verifying the fix for me.

  -Mike

Stefan Bader (smb)
Changed in linux:
assignee: ubuntu-kernel-team → stefan-bader-canonical
status: Triaged → In Progress
Revision history for this message
Stefan Bader (smb) wrote :

Cherry picked the mentioned patch.

Changed in linux:
status: In Progress → Fix Committed
Revision history for this message
In , Thomas-tungstengraphics (thomas-tungstengraphics) wrote :

I think this fix is partially incorrect.

The idea of tasklets is that you should be able to run a "bottom half" of the IRQ handler with hard IRQs enabled.

Data that is shared between a hard interrupt handler and a tasklet or a normal process should use spin_lock_irqsave() whereas data shared between a tasklet and a normal process should use spin_lock_bh()

So in this case, I think we should be using spin_lock_bh() to avoid unnecessary disabling of hard IRQs.

/Thomas
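
The difference between the three locking calls discussed here can be illustrated with a toy model. This Python sketch is not kernel code: it assumes a single CPU, one lock holder in process context and one vblank interrupt arriving while the lock is held. All names apart from the three kernel function names are made up for illustration.

```python
from collections import namedtuple

Outcome = namedtuple("Outcome", ["result", "hardirq_delayed"])

# What each locking call masks on the local CPU while the lock is held.
MASKS = {
    "spin_lock":         {"softirq": False, "hardirq": False},
    "spin_lock_bh":      {"softirq": True,  "hardirq": False},
    "spin_lock_irqsave": {"softirq": True,  "hardirq": True},
}


def vblank_while_locked(lock_call: str) -> Outcome:
    """Toy single-CPU model: a vblank interrupt fires while process
    context holds the spinlock taken with `lock_call`."""
    masks = MASKS[lock_call]
    lock_owner = "process"
    deferred = []  # work postponed until the lock is released

    if masks["hardirq"]:
        # The interrupt itself is held off, and the tasklet with it.
        deferred.append("vblank irq + tasklet")
    elif masks["softirq"]:
        # The interrupt is handled at once; only the tasklet is held off.
        deferred.append("tasklet")
    else:
        # The tasklet runs immediately on top of the process and spins on a
        # lock whose owner can never run again on this CPU.
        return Outcome("deadlock", hardirq_delayed=False)

    lock_owner = None  # process context unlocks
    for _work in deferred:
        lock_owner = "tasklet"  # the tasklet takes the now-free lock ...
        lock_owner = None       # ... swaps buffers and releases it
    return Outcome("ok", hardirq_delayed=masks["hardirq"])
```

In this model the plain spin_lock case wedges, while both other variants complete. Only spin_lock_irqsave delays the hard interrupt, which is the unnecessary cost that spin_lock_bh avoids.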

Revision history for this message
In , mcisely (isely) wrote :

Interesting. I was unaware of spin_lock_bh(), but having looked at it now I agree. If I had seen it before, I would have used it.

But for what it's worth, I didn't use spin_lock_irqsave() without some care first. This spin lock is only ever taken for very short, reasonably deterministic, non-blocking intervals in the DRM code (as part of taking a much heavier-weight lock). So I felt that any latency impact from interrupt blockage in these cases should be negligible and thus I didn't really look for a different (potentially more complex) solution.

  -Mike

Revision history for this message
In , Michel-tungstengraphics (michel-tungstengraphics) wrote :

(In reply to comment #2)
> Data that is shared between a hard interrupt handler and a tasklet or a normal
> process should use spin_lock_irqsave() whereas data shared between a tasklet
> and a normal process should use spin_lock_bh()

Note that the tasklet callback function may also be called from normal process context if the tasklet couldn't acquire the DRM lock. Would spin_lock_bh() still work in that case?

Revision history for this message
In , Michel-tungstengraphics (michel-tungstengraphics) wrote :

(In reply to comment #4)
> Note that the tasklet callback function may also be called from normal process
> context if the tasklet couldn't acquire the DRM lock. Would spin_lock_bh()
> still work in that case?

Never mind, of course this spinlock is only used by the actual tasklet...

Changed in xserver-xorg-video-intel:
status: Fix Released → Confirmed
Revision history for this message
In , Airlied-freedesktop (airlied-freedesktop) wrote :

so did someone want to submit a patch to use _bh?

Revision history for this message
In , Thomas-tungstengraphics (thomas-tungstengraphics) wrote :

I've pushed a fix for this now. Let me know if there are any problems with it.

/Thomas

Changed in xserver-xorg-video-intel:
status: Confirmed → Fix Released
Revision history for this message
aaronetz (aaronetz) wrote :

I've just upgraded to Hardy, and the bug occurs there, although it seems that the fix for this bug has been released.
Is this fix already included in Hardy?

Revision history for this message
Roderick B. Greening (roderick-greening) wrote :

I have an i945 and it works with Hardy. I am using the now default EXA rather than XAA, perhaps that may help.

Start with a new xorg.conf and see if the problem still exists.

Revision history for this message
aaronetz (aaronetz) wrote :

I did a fresh install.
How can I tell if the fix for this bug is included in the current release (besides looking at the kernel's source)?

Revision history for this message
Leann Ogasawara (leannogasawara) wrote :

Hrm, Stefan mentioned he committed this patch but I can't seem to find the commit in the hardy git tree. I'll nudge the kernel team to take a look again. Thanks.

Changed in linux:
assignee: stefan-bader-canonical → ubuntu-kernel-team
status: Fix Committed → Triaged
milestone: none → ubuntu-8.04.1
Changed in linux:
milestone: ubuntu-8.04.1 → none
status: Triaged → Fix Released
assignee: nobody → ubuntu-kernel-team
importance: Undecided → High
milestone: none → ubuntu-8.04.1
status: New → Triaged
Changed in xserver-xorg-video-intel:
status: New → Invalid
Changed in linux:
status: Fix Released → Fix Committed
Revision history for this message
Tim Gardner (timg-tpi) wrote :

commit 104d415651e1d8f5a0f0720bdc2e1f527544a24b was released with 2.6.24-13-23
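
One rough way to tell whether a running kernel is at least as new as that release is to compare the output of `uname -r` with it. This is a minimal Python sketch, assuming Ubuntu-style release strings such as 2.6.24-16-generic and assuming that the number after the dash can be compared with the 13 quoted above; the function names are hypothetical.

```python
import re

# Taken from "released with 2.6.24-13-23" above; only the first four
# numbers are compared, which is an assumption of this sketch.
FIXED_IN = (2, 6, 24, 13)


def parse_release(release: str) -> tuple:
    """Turn e.g. '2.6.24-16-generic' into (2, 6, 24, 16)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) for part in match.groups())


def has_fix(release: str) -> bool:
    """True if the release is at or after the one said to carry the fix."""
    return parse_release(release) >= FIXED_IN
```

For the running system, pass platform.release() from the standard library to has_fix(). A kernel at exactly 2.6.24-13 is ambiguous, because the trailing upload number is not visible in `uname -r`. A True result says only that the patch should be present, not that the freezes are gone.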

Changed in linux:
milestone: ubuntu-8.04.1 → none
status: Triaged → Fix Released
Revision history for this message
Leann Ogasawara (leannogasawara) wrote :

Marking "Fix Released" against Intrepid since the Intrepid kernel is now available in the archives.

Changed in linux:
status: Fix Committed → Fix Released
Revision history for this message
João Pinto (joaopinto) wrote :

I am experiencing this bug on Hardy and I don't understand the bug status: the status list at the top shows Fix Released for Hardy, but the bug is not fixed on Hardy. Would someone care to explain how to get the fix?

Thanks

Revision history for this message
theblackkat (theblackkat) wrote :

I'm still having total freezes when running opengl apps, like blender, armagetron, etc, with a fresh install of Intrepid. Is there a way to apply that fix Mike proposed, or is it supposed to be included already? =/

Revision history for this message
Launchpad Janitor (janitor) wrote : Kernel team bugs

Per a decision made by the Ubuntu Kernel Team, bugs will no longer be assigned to the ubuntu-kernel-team in Launchpad as part of the bug triage process. The ubuntu-kernel-team is being unassigned from this bug report. Refer to https://wiki.ubuntu.com/KernelTeamBugPolicies for more information. Thanks.

Changed in xserver-xorg-video-intel:
importance: Unknown → High
Revision history for this message
Andrew Crome (acrome) wrote :

I have a Toshiba Portege R500 running Ubuntu 10.10. Despite following every instruction that I can find, I cannot get programs such as Google Earth or Doom 3 to either load or install. What the hell is going on?

Changed in xserver-xorg-video-intel:
importance: High → Unknown
Changed in xserver-xorg-video-intel:
importance: Unknown → High
bdien (vl624)
Changed in xserver-xorg-video-intel (Ubuntu):
status: Invalid → Incomplete
status: Incomplete → Invalid