Xorg crashes w/ failed assert in bm_fake_NotifyContendedLockTake, with high CPU usage after OpenGL screensaver [965GM]

Bug #234768 reported by Martin Olsson
Affects          Status         Importance   Assigned to   Milestone
X.Org X server   Fix Released   Medium
mesa (Ubuntu)    Fix Released   Undecided    Unassigned

Bug Description

I'm using Ubuntu Hardy Heron. I just got back to my computer and, for the third time, found it completely hung: no mouse cursor, no response, and a single frozen image from the screen saver on the display (I use the screensaver called "busy spheres" and I think it uses OpenGL? not sure). The screen saver image also has some defects along the borders. I have attached a screenshot (photo) of what it looks like.

This is what I know:
* the machine is completely unresponsive to the usual keyboard/mouse input (I don't even see a mouse cursor).
* I can however SSH to the machine, so the kernel is still operating correctly.
* If I start the "top" program I can see X spinning permanently at 100% (and this does not go away; I've tried waiting for more than 15 minutes). This is what it looks like in top:

  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
 3487 root 20 0 264m 3292 2496 R 99.2 0.2 26:19.56 Xorg
    1 root 20 0 2844 1688 544 S 0.0 0.1 0:01.16 init
    2 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 kthreadd
    3 root 15 -5 0 0 0 S 0.0 0.0 0:00.94 ksoftirqd/0
    4 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/0
    5 root 15 -5 0 0 0 S 0.0 0.0 0:00.08 events/0

* Further, I also know that no command that requires super user privileges works. The first thing I tried was to launch "sudo gdb" in order to attach to X.org and see what it's doing, but this "sudo gdb" command just hangs too, blocking indefinitely, and I can't even press CTRL-C on it. Other commands such as "sudo su", "sudo strace" and even "sudo apt-get install something" also hang and I can't CTRL-C out of them. Basically, whenever I run any command as super user, that ssh connection becomes unusable, so I have to connect a new ssh client to continue trying other stuff.
* I also know that this doesn't happen all the time. Most of the time when I get back to my computer I just move the mouse a little bit, the screensaver (busy spheres) goes away and I can continue to use my machine normally. I've run into this freeze/hang bug three times since I installed hardy (so the bug happens about once a week). Note that I always leave this machine on overnight though, so it's getting a pretty nice stress test each night I suppose?
* I did not have any problems like this with ubuntu gutsy gibbon. I must have left this machine on overnight every single day for at least 5 months with gutsy gibbon installed and using the exact same screen saver. I never ever saw this bug using gutsy gibbon.

I have an intel 965GM gfx card and I use compiz. I will attach x.org conf and log later today. If there is any additional information I can provide which will help you track down this bug, let me know.
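
The check described above (logging in over SSH and looking for a process pinned at 100% CPU) can be scripted. This is only a minimal sketch: the function name busy_procs and the 90% threshold are made up for illustration, and it expects "PID %CPU COMMAND" lines on stdin.

```shell
#!/bin/sh
# Hypothetical helper: print processes whose CPU usage is at or above a
# threshold. Reads "PID %CPU COMMAND" lines on stdin; the header line is
# skipped. Output format is "PID COMMAND %CPU".
busy_procs() {
    threshold="${1:-90}"
    awk -v t="$threshold" 'NR > 1 && $2 + 0 >= t { print $1, $3, $2 }'
}
```

From an SSH session it could be fed with "ps -eo pid,pcpu,comm | busy_procs 90". Note that ps reports CPU usage averaged over the lifetime of the process, so a spin that has only just started takes a while to cross the threshold.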

[lspci]
00:00.0 Host bridge: Intel Corporation Mobile PM965/GM965/GL960 Memory Controller Hub (rev 0c)
00:02.0 VGA compatible controller: Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller (rev 0c)
00:00.0 0600: 8086:2a00 (rev 0c)
00:00.0 Host bridge: Intel Corporation Mobile PM965/GM965/GL960 Memory Controller Hub (rev 0c)
     Subsystem: Hewlett-Packard Company Unknown device 30cc
00:02.0 VGA compatible controller: Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller (rev 0c) (prog-if 00 [VGA controller])
     Subsystem: Hewlett-Packard Company Unknown device 30cc

Revision history for this message
Martin Olsson (mnemo) wrote : x.org hangs during screensaver (100% CPU, no mouse cursor, no response)


Revision history for this message
Martin Olsson (mnemo) wrote :
Revision history for this message
Martin Olsson (mnemo) wrote :
Revision history for this message
Martin Olsson (mnemo) wrote :
Changed in xorg-server:
status: Unknown → Confirmed
Revision history for this message
unggnu (unggnu) wrote :

I guess it is a driver bug, but you can recheck it with the vesa driver. If the problem still appears with the -vesa driver, it is an Xorg bug.

Please try to get a proper backtrace as described in this howto: https://wiki.ubuntu.com/X/Backtracing .

Changed in xorg:
status: New → Incomplete
Revision history for this message
unggnu (unggnu) wrote : Re: periodic Xorg crashes w/ high cpu usage after OpenGL screensaver [965GM]

Btw. many thanks for reporting it upstream. For future reports there is a howto on which data should be uploaded for Intel drivers, http://intellinuxgraphics.org/how_to_report_bug.html , since some information seems to be missing. Thanks again.

Revision history for this message
Martin Olsson (mnemo) wrote :

First of all, as I explained above this bug does not allow me to launch any "sudo" operation and in particular I cannot start "sudo gdb" so I will not be able to obtain a backtrace. This is in itself a problem because it means some X.org bugs cannot be efficiently debugged at all (it's probably going to be a looong time before airlied's non-root x.org patches, see http://airlied.livejournal.com/59521.html, make it into ubuntu so maybe someone should start to figure out what on earth could possibly block ALL SUDO operations on the system?).

Moving on, here is some of the additional information (this information was extracted after a reboot, when the system was operating correctly again):
"uname -a" reports Linux 2.6.24-16.386 #1 Thu Apr 10 12:50:06 UTC 2008 i86 GNU/Linux

Revision history for this message
Martin Olsson (mnemo) wrote :
Revision history for this message
Martin Olsson (mnemo) wrote :
Revision history for this message
unggnu (unggnu) wrote :

The easiest way to test a nearly current Intel driver is to use the Debian sid driver in Hardy, I guess. You have to remove the old i810 driver because the new one cannot be installed in parallel with it.
sudo apt-get remove xserver-xorg-video-i810 xserver-xorg-video-all
Download the current driver from http://mirrors.kernel.org/debian/pool/main/x/xserver-xorg-video-intel/xserver-xorg-video-intel_2.3.1-1_i386.deb and install it.
After that, restart X and check if your problem is gone. If you have made changes to your xorg.conf, please generate a new one (sudo dpkg-reconfigure xserver-xorg) or just remove it to be sure.

If you want to reset everything, run the following commands.
sudo apt-get remove xserver-xorg-video-intel
sudo apt-get install xserver-xorg-video-all

Afterwards you will have the standard hardy driver again.
Btw. I am pretty sure this won't always work, but atm Sid and Ubuntu stable are similar in terms of libs and apps.

Revision history for this message
unggnu (unggnu) wrote :

The data mentioned in the Intel Howto is meant for the upstream bug report. The data needed for an Xorg Ubuntu bug report is described here: https://wiki.ubuntu.com/X/Reporting . Apart from the backtrace, you have posted all the needed data afaik.

Revision history for this message
Martin Olsson (mnemo) wrote :

I tried upgrading to the new driver. One happy moment was that I now get textured video (so when I move totem the video surface now moves with it), that's just awesome. One not so good thing though is that this new driver consumes HUGE amounts of CPU. When I play a 320x240 pixel .OGG file (which was downloaded from youtube), my computer grinds away at 100% CPU just playing that single video (check out the attached screenshot). This is a dual core 2GHz, 2GB machine and with the previous driver playing that same .OGG file was barely noticeable on the CPU meter.

I will see how long I can last using this new driver. I'll keep it for a few days though for sure, to see if this freeze bug happens again.

Thanks for responding so quickly unggnu!

I also have another, non-related idea; instead of asking users to complete these long lists of tasks, as seen here:
https://wiki.ubuntu.com/X/Reporting#head-8c2902c284112c960b29386ea3d2f97fd3c57109
and here:
http://intellinuxgraphics.org/how_to_report_bug.html

Maybe someone could create a script that gathers and submits all the necessary information? I know that the ALSA project did this for sound cards and it has been VERY successful for them in getting more issues resolved with less effort. Check out their script here:
http://hg.alsa-project.org/alsa/raw-file/2ea9a8a108ea/alsa-info.sh
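
As a rough illustration of that idea, a collector could start out like the sketch below. The function name collect_x_info and the choice of files are assumptions based on what was attached to this report (Xorg logs, dmesg, lspci, xorg.conf); it is not an existing Ubuntu or Intel tool.

```shell
#!/bin/sh
# Sketch of a report collector; the file list and function name are illustrative.
collect_x_info() {
    outdir="$1"
    mkdir -p "$outdir" || return 1

    # Copy log and config files that exist and are readable.
    for f in /var/log/Xorg.0.log /var/log/Xorg.0.log.old /etc/X11/xorg.conf; do
        [ -r "$f" ] && cp "$f" "$outdir/"
    done

    # Capture command output; skip tools that are not installed.
    uname -a > "$outdir/uname.txt"
    command -v lspci >/dev/null 2>&1 && lspci -vvnn > "$outdir/lspci.txt" 2>&1
    command -v dmesg >/dev/null 2>&1 && dmesg > "$outdir/dmesg.txt" 2>&1

    # Report success once the directory has been populated.
    return 0
}
```

Something like "collect_x_info /tmp/x-report && tar czf x-report.tar.gz -C /tmp x-report" would then give one file to attach. Submitting it automatically, as the ALSA script does, is left out here.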

If this bug repros with the new driver I will post about it here right away. Again, thanks for the help so far!

Revision history for this message
unggnu (unggnu) wrote :

I was very unhappy that upstream made textured video the default while it isn't really usable. That's why overlay is still used in the Ubuntu release afaik.
I guess the best way to test it is to just let the pc run with the screensaver enabled.
Btw. have you added INTEL_BATCH=1 to the environment file or somewhere else? It is known to break things.

Revision history for this message
Martin Olsson (mnemo) wrote :

I don't have INTEL_BATCH=1 set (I basically don't have any non-standard configuration options set, just pure ubuntu default). The only change I made was to activate evdev for my X.org input (makes my mouse smooth and low-latency).

I really hope they can get textured video optimized a little bit, it's a very nice thing to have. I've also noticed some other glitches though, like for instance the screensaver preview dialog is rendering the preview surface at a fixed location even though I move around the gtk window, and also there is some random pixel dirt shown temporarily when I launch the firefox window.

Anyway, I just figured out that I can attach gdb _BEFORE_ the hang and see if that yields a backtrace (not sure why I didn't think of that in the beginning). So I got "sudo gdb" started from an ssh client now and I got the screensaver chewing away. Of course, this bug only happens once a week or so, so it's probably going to take a long time before I get a shot at capturing a backtrace (and then maybe the symbols will be screwed up or so, even though I have installed a bunch of dbg packages specified on the Ubuntu X debugging webpage). I might even revert back to the original driver and then try the "have gdb attach" trick on it. I'll post again when I have more info.

Revision history for this message
In , Gordon Jin (gordon-jin) wrote :

Is "sudo not working" a side effect of the weekly hang, or always an issue on your system?

It's really hard to debug such a hard-to-reproduce issue. Anyway you can try the latest upstream driver with the guide at http://www.intellinuxgraphics.org/install.html.

Revision history for this message
unggnu (unggnu) wrote : Re: periodic Xorg crashes w/ high cpu usage after OpenGL screensaver [965GM]

Some glitches may come from the fact that MigrationHeuristic isn't greedy by default with the Debian driver.
If you want to get a backtrace, please reinstall the standard driver as described above, plus the dbg packages mentioned in the backtrace howto. You can only get a backtrace if you use ssh from a remote system or something similar.

Revision history for this message
In , Martin Olsson (mnemo) wrote :

"sudo not working" is a side effect of this bug. I usually don't have any problems running sudo and I can normally also attach to X.org with symbols etc no problems.

I know this is a hard bug to fix, anyway to make a long story short; yesterday I came up with two ideas that when combined allowed me to make progress on this bug:

A) attach "sudo gdb" before the crash happens and have it attached all along
B) turn off the "blank screen in 20 minutes" which makes the screensaver run forever.

The first trick made it possible for me to get a backtrace, the second trick made this bug A LOT easier to repro (when I woke up this morning it was hung again).

I'm going to do some additional testing during this week. I really recommend that you activate the busy spheres screensaver and turn off "blank screen" in power options. This is a great way to stress the driver.
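
Trick A can be put into a small wrapper so that gdb sits attached and writes a backtrace on its own when the server aborts. This is a sketch under assumptions: the function name attach_and_wait, the DRY_RUN switch and the default log name are invented for illustration, and only the standard gdb options -p, -batch and -ex are used.

```shell
#!/bin/sh
# Sketch: attach gdb to a running X server, let it continue, and dump a full
# backtrace once it stops on a signal such as SIGABRT. Run as root over SSH.
attach_and_wait() {
    pid="$1"
    log="${2:-gdb-Xorg.txt}"
    # SIGPIPE is common when clients disconnect, so pass it through
    # instead of stopping on it.
    set -- gdb -p "$pid" -batch \
        -ex "handle SIGPIPE nostop noprint pass" \
        -ex "continue" \
        -ex "bt full"
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"              # only show what would be run
    else
        "$@" > "$log" 2>&1     # blocks until the server stops or exits
    fi
}
```

It has to be started before the hang, for example with the PID printed by "pidof X", because sudo stops working once the hang has happened.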

Revision history for this message
In , Martin Olsson (mnemo) wrote :

(read the ubuntu bug report for details on the backtrace etc)

Revision history for this message
Martin Olsson (mnemo) wrote : Re: periodic Xorg crashes w/ high cpu usage after OpenGL screensaver [965GM]

After about an hour of CPU hogging pain with the new driver I reverted to the old Hardy driver. I then changed my power options to "Never blank out screen" and also set my screensaver timeout to 1 minute. Then I went to sleep. When I woke up, this bug had reproduced. I believe I can actually get this bug to repro whenever I want; the only thing I have to do is leave the busy spheres screensaver on without setting the screen to blank out.

When I got back to the machine this morning I found it hung but the graphics was not yet corrupted. When I checked the already attached gdb from ssh the program had received a SIGABRT with the following backtrace:

(gdb) bt full
#0 0xb7ef5410 in __kernel_vsyscall ()
No symbol table info available.
#1 0xb7c98085 in raise () from /lib/tls/i686/cmov/libc.so.6
No symbol table info available.
#2 0xb7c99a01 in abort () from /lib/tls/i686/cmov/libc.so.6
No symbol table info available.
#3 0xb7c9110e in __assert_fail () from /lib/tls/i686/cmov/libc.so.6
No symbol table info available.
#4 0xa75f0d68 in bm_fake_NotifyContendedLockTake () from /usr/lib/dri/i965_dri.so
No symbol table info available.
#5 0xa75f6871 in LOCK_HARDWARE () from /usr/lib/dri/i965_dri.so
No symbol table info available.
#6 0xa760fff5 in ?? () from /usr/lib/dri/i965_dri.so
No symbol table info available.
#7 0x0840e880 in ?? ()
No symbol table info available.
#8 0xb7bd7a0c in __pthread_mutex_unlock_usercnt () from /lib/tls/i686/cmov/libpthread.so.0
No symbol table info available.
#9 0xa7610545 in brw_draw_prims () from /usr/lib/dri/i965_dri.so
No symbol table info available.
#10 0xa76ac96c in ?? () from /usr/lib/dri/i965_dri.so
No symbol table info available.
#11 0x0840e880 in ?? ()
No symbol table info available.
#12 0x08862030 in ?? ()
No symbol table info available.
#13 0xbfe30d30 in ?? ()
No symbol table info available.
#14 0x00000001 in ?? ()
No symbol table info available.
#15 0x00000000 in ?? ()
No symbol table info available.
(gdb)

At this point I used the "c" command to resume execution and the computer then went into the corrupted graphics defect mode and then it printed this in gdb:

Program terminated with signal SIGABRT, Aborted.
The program no longer exists.

After this the computer hung in exactly the same way as it did originally. Even though I could not execute any more commands in gdb, I could clearly see the X.org process still alive and taking 100% CPU.

Revision history for this message
In , Martin Olsson (mnemo) wrote :

I just repro'ed this bug using the xserver-xorg-video-intel_2.3.1-1_i386.deb driver as well.

Revision history for this message
Martin Olsson (mnemo) wrote : Re: periodic Xorg crashes w/ high cpu usage after OpenGL screensaver [965GM]

This morning after I found the hang associated with the backtrace above, I restarted X and I upgraded to the new driver again. I attached GDB and I left for work. Now, I just got home and I found my machine hung once again. The backtrace for the crash in the new driver shows the exact same abort() call.

This confirms that this bug is not only present in the hardy driver version but it's also still not fixed in the upstream version "xserver-xorg-video-intel_2.3.1-1_i386.deb".

Revision history for this message
In , unggnu (unggnu) wrote :

Just for the record:

Uploaded files:
http://launchpadlibrarian.net/14699416/DSC_0053.jpg (Screenshot of the frozen screen)
http://launchpadlibrarian.net/14699465/Xorg.0.log
http://launchpadlibrarian.net/14701101/dmesg.log
http://launchpadlibrarian.net/14701110/lspci.log

Backtrace against the Ubuntu 8.04 Intel Driver 2:2.2.1-1ubuntu13

(gdb) bt full
#0 0xb7ef5410 in __kernel_vsyscall ()
No symbol table info available.
#1 0xb7c98085 in raise () from /lib/tls/i686/cmov/libc.so.6
No symbol table info available.
#2 0xb7c99a01 in abort () from /lib/tls/i686/cmov/libc.so.6
No symbol table info available.
#3 0xb7c9110e in __assert_fail () from /lib/tls/i686/cmov/libc.so.6
No symbol table info available.
#4 0xa75f0d68 in bm_fake_NotifyContendedLockTake () from /usr/lib/dri/i965_dri.so
No symbol table info available.
#5 0xa75f6871 in LOCK_HARDWARE () from /usr/lib/dri/i965_dri.so
No symbol table info available.
#6 0xa760fff5 in ?? () from /usr/lib/dri/i965_dri.so
No symbol table info available.
#7 0x0840e880 in ?? ()
No symbol table info available.
#8 0xb7bd7a0c in __pthread_mutex_unlock_usercnt () from /lib/tls/i686/cmov/libpthread.so.0
No symbol table info available.
#9 0xa7610545 in brw_draw_prims () from /usr/lib/dri/i965_dri.so
No symbol table info available.
#10 0xa76ac96c in ?? () from /usr/lib/dri/i965_dri.so
No symbol table info available.
#11 0x0840e880 in ?? ()
No symbol table info available.
#12 0x08862030 in ?? ()
No symbol table info available.
#13 0xbfe30d30 in ?? ()
No symbol table info available.
#14 0x00000001 in ?? ()
No symbol table info available.
#15 0x00000000 in ?? ()
No symbol table info available.
(gdb)

Revision history for this message
Martin Olsson (mnemo) wrote : Re: periodic Xorg crashes w/ high cpu usage after OpenGL screensaver [965GM]

I've tried a couple more things now:
* liveCD with busySpheres screensaver has bug
* liveCD with bioF screensaver has bug

This means that we can rule out any special configuration I have on my ubuntu HDD installation. It also means we can rule out a problem with the particular screensaver.

Also, I now have a solid set of repro steps which can most likely be used on any machine with intel 965GM hardware:

1. Boot hardy live CD
2. System::Open Preferences::Power Management and switch to the "When on AC power" tab, then set the "Blank out screen after" to "Never".
3. System::Open Preferences::Screen Saver and set it to bioF.
4. Wait for a couple of hours while screensaver is working.

Actual results:
X.org hangs with high CPU hogging.

Expected results:
Screensaver should continue to run until user gets back and then go away gracefully when the mouse is moved.

------------------------------------------------------------

Now, I've looked carefully at "top" while this is happening and I believe that what happens is that the first X.org instance exits through the abort() function. Then for some reason another X.org instance starts and it's that second instance which causes the CPU hogging hang. For instance, during the 100% CPU hang, if I run "kill -9 `pidof X`" then X.org just continues to hog CPU at 100% but it changes PID.

When I saw this I wanted to debug the startup of X.org and stop the process a couple of times in gdb in order to find out the exact loop where it's spinning, but this failed. I was able to launch "gdb X" from a su prompt and I saw the usual startup lines like "(II) blah blah" and so on. However, once X enters the CPU hogging loop it's not possible to use CTRL-C in gdb; this keystroke is just ignored.

I have two questions:

* Where is the mechanism that restarts X.org all the time? Is it a parent process or a shell script or what?
* How can I disable the automatic restarting of X.org temporarily?
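
The PID change noted above can be logged from an SSH session to confirm that the server is being respawned. A minimal sketch, assuming pidof is available; the function name watch_pid and its arguments are hypothetical.

```shell
#!/bin/sh
# Sketch: print a line whenever the PID reported for a process name changes.
# Arguments: process name, polling interval in seconds (default 5),
# number of polls (default 0 = poll forever).
watch_pid() {
    name="$1"; interval="${2:-5}"; count="${3:-0}"
    last=""
    i=0
    while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
        cur=$(pidof "$name" 2>/dev/null)
        if [ "$cur" != "$last" ]; then
            echo "$(date '+%H:%M:%S') $name pid: ${last:-none} -> ${cur:-none}"
            last="$cur"
        fi
        i=$((i + 1))
        sleep "$interval"
    done
}
```

Running "watch_pid X 5" in one session while killing the server from another should show a new PID each time something restarts it.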

Revision history for this message
unggnu (unggnu) wrote :

It seems that you haven't installed the dbg packages. Please try it with the standard Hardy driver and install the packages xserver-xorg-core-dbg and xserver-xorg-video-intel-dbg. After that you should get a proper backtrace.

unggnu (unggnu)
Changed in xserver-xorg-video-intel:
status: Incomplete → Confirmed
Revision history for this message
unggnu (unggnu) wrote :

Please also attach the file /var/log/Xorg.0.log.old directly after the reboot that follows a hang. Maybe there are some important messages from when X was crashing.
Btw. the upstream devs are the fastest ones to fix such a problem, so I would also upload important data to the upstream report instead of just pointing to the Ubuntu one.
But thanks for gathering all the important information so fast.

Revision history for this message
In , Martin Olsson (mnemo) wrote :

When I boot up the first time after this crash I see the following in the end of /var/log/Xorg.0.log.old:

(WW) intel(0): ESR is 0x00000001
(WW) intel(0): PRB0_CTL (0x0001f001) indicates ring buffer enabled
(WW) intel(0): PRB0_HEAD (0x95e1c1a0) and PRB0_TAIL (0x0001fd30) indicate ring buffer not flushed
(WW) intel(0): Existing errors found in hardware state.

A complete copy of the Xorg.0.log.old file taken on first boot after the crash is available here:

http://launchpadlibrarian.net/14752765/Xorg.0.log.old
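
Pulling those lines out of the old log after a reboot can be done with a short filter. A minimal sketch: the function name intel_warnings is invented, and it only assumes that driver messages start with "(WW) intel(" or "(EE) intel(" as in the excerpt above.

```shell
#!/bin/sh
# Sketch: show the last warnings and errors the intel driver wrote to an X log.
# Arguments: log file (default /var/log/Xorg.0.log.old), max lines (default 20).
intel_warnings() {
    logfile="${1:-/var/log/Xorg.0.log.old}"
    grep -E '^\((WW|EE)\) intel\(' "$logfile" | tail -n "${2:-20}"
}
```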

Revision history for this message
Martin Olsson (mnemo) wrote : Re: periodic Xorg crashes w/ high cpu usage after OpenGL screensaver [965GM]

Ah, there was in fact some information related to the crash in the /var/log/Xorg.0.log.old file! I'm attaching it now. It's not much but the end of it says:

(WW) intel(0): ESR is 0x00000001
(WW) intel(0): PRB0_CTL (0x0001f001) indicates ring buffer enabled
(WW) intel(0): PRB0_HEAD (0x95e1c1a0) and PRB0_TAIL (0x0001fd30) indicate ring buffer not flushed
(WW) intel(0): Existing errors found in hardware state.

Revision history for this message
Martin Olsson (mnemo) wrote :

unggnu, are you able to answer the two questions I posted above about why/how X.org automatically restarts?

Revision history for this message
unggnu (unggnu) wrote :

Sorry, I overlooked it. The restarting process is Gdm, your login manager. You can stop Gdm after a console login with the command "sudo /etc/init.d/gdm stop", but you have to start X manually afterwards. I guess the command "startx gnome" should do it but I haven't tested it.

Revision history for this message
In , Martin Olsson (mnemo) wrote :

I downloaded and burned Ubuntu Gutsy Gibbon yesterday and I ran an overnight stress test during the busy spheres screensaver. Often the hardy intel driver won't even last 60 minutes using busy spheres and the gutsy driver ran for the entire night and it's still running this morning with no sign of the bug.

Therefore I believe that, with a pretty high probability, this bug is a regression. The regression range for it, although very wide, is:

It worked fine with the "2:2.1.1-0ubuntu9" package and the bug had already been introduced in "2:2.2.1-1ubuntu12".

However, there is still a chance that the bug was present in both drivers, but that some other non-driver change in ubuntu between gutsy and hardy made the driver run through a new code path etc. One such change that comes to mind is the fact that compiz was enabled for my graphics card between gutsy and hardy. It was previously blacklisted because video didn't work well when compiz was active.

Further, the fact that it runs on gutsy also makes it pretty unlikely that the graphics card in this machine is defective somehow.

Revision history for this message
unggnu (unggnu) wrote : Re: periodic Xorg crashes w/ high cpu usage after OpenGL screensaver [965GM]

Just for the record, you can use Overlay with the Debian SID driver with mplayer -vo xv:port=<n> where <n> is the port number you get from xvinfo for the video overlay adaptor.

Revision history for this message
In , Martin Olsson (mnemo) wrote :

When I turn off compiz I can run the same screensavers without problems while using the 2:2.2.1-1ubuntu12 driver.

When I tried 2:2.1.1-0ubuntu9 before, I did it with the gutsy gibbon live CD and that has compiz turned off by default (so it was kind of a bad test, it's entirely possible that the 2:2.1.1-0ubuntu9 version is also affected).

Revision history for this message
Martin Olsson (mnemo) wrote : Re: periodic Xorg crashes w/ high cpu usage after OpenGL screensaver [965GM]

Ah, thanks for the tip unggnu, that's a nice workaround for textured video.

By the way, I was able to figure out why my backtrace didn't have full symbols. The two dbg packages you mentioned do not cover everything; it's also necessary to install libgl1-mesa-dri-dbg in order to get symbols for i965_dri.so functions. So next time you see an intel x.org driver bug, tell them to install these dbg packages:

    xserver-xorg-core-dbg xserver-xorg-video-intel-dbg libgl1-mesa-dri-dbg
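
A quick way to judge whether the dbg packages made a difference is to count the frames gdb could not name in a saved backtrace. This is only a sketch with a made-up function name, unresolved_frames. A "?? ()" frame can also come from a garbage stack address, so the count is a hint rather than proof of missing symbols.

```shell
#!/bin/sh
# Sketch: report how many frames of a saved "bt full" output lack a function name.
# Frames look like "#6 0xa760fff5 in ?? () from /usr/lib/dri/i965_dri.so".
unresolved_frames() {
    total=$(grep -c '^#[0-9]' "$1")
    unknown=$(grep -c '^#[0-9].* in ?? ()' "$1")
    echo "$unknown of $total frames unresolved"
}
```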

Thirdly, I also discovered one more clue about this bug. I tried to turn off Compiz and then the bug doesn't repro at all. I ran the screensaver for three days and three nights, no sign of the bug. When I turned compiz back on the bug happened again after just 2 hours.

Revision history for this message
unggnu (unggnu) wrote :

Could you upload the whole backtrace with symbols then please?
I guess the Intel devs are mostly interested in the 2.3.1 one. The dbg package for this driver is available at http://mirrors.kernel.org/debian/pool/main/x/xserver-xorg-video-intel/xserver-xorg-video-intel-dbg_2.3.1-1_i386.deb .

Revision history for this message
In , Martin Olsson (mnemo) wrote :

FWIW, the game gunroar also seems to trigger this bug, I can rarely play more than say 15 minutes before my entire X freezes with the exact same graphics defects etc.

Use "sudo apt-get install gunroar" if you want to try it.

Revision history for this message
rysiek (mikiwoz) wrote : Re: periodic Xorg crashes w/ high cpu usage after OpenGL screensaver [965GM]

I can confirm this bug, been having it in gutsy too. for some time it seemed to be less frequent, then - about a week ago - an update for kernel came that made this bug readily reproducible again.

my system:
Kubuntu Hardy
Intel GMA965/X3100, default drivers
electricsheep screensaver.

I seem to be having those hangups not only after/while the screensaver is running, but also sometimes just after loading KDE or plugging-in/disconnecting the power cord.

more details (dmesg, lspci, xorg.conf) coming up tomorrow.

Revision history for this message
Martin Olsson (mnemo) wrote :

I was wrong about the improved stacktrace, when I looked carefully at it I noticed that for some reason I'm getting symbols for some addresses in the DRI module but not all addresses. I find this a little bit strange. Anyway, I can also repro this bug using the 2.3.1 driver that bryce provided a test .deb for here:
http://people.ubuntu.com/~bryce/Testing/intel/hardy-i386/xserver-xorg-video-intel_2.3.1-1ubuntu1_i386.deb
Below is the backtrace (slightly different) that I'm getting with the new 2.3.1 version of the driver:

Program received signal SIGABRT, Aborted.
0xb7f2d410 in __kernel_vsyscall ()
(gdb) bt full
#0 0xb7f2d410 in __kernel_vsyscall ()
No symbol table info available.
#1 0xb7cd0085 in raise () from /lib/tls/i686/cmov/libc.so.6
No symbol table info available.
#2 0xb7cd1a01 in abort () from /lib/tls/i686/cmov/libc.so.6
No symbol table info available.
#3 0xb7cc910e in __assert_fail () from /lib/tls/i686/cmov/libc.so.6
No symbol table info available.
#4 0xa7620d68 in bm_fake_NotifyContendedLockTake () from /usr/lib/dri/i965_dri.so
No symbol table info available.
#5 0xa7626871 in LOCK_HARDWARE () from /usr/lib/dri/i965_dri.so
No symbol table info available.
#6 0xa763fff5 in ?? () from /usr/lib/dri/i965_dri.so
No symbol table info available.
#7 0xa7640545 in brw_draw_prims () from /usr/lib/dri/i965_dri.so
No symbol table info available.
#8 0xa76dc96c in ?? () from /usr/lib/dri/i965_dri.so
No symbol table info available.
#9 0xb7b55686 in __glXDisp_DrawArrays () from /usr/lib/xorg/modules/extensions//libglx.so
No symbol table info available.
#10 0xb7b3332d in DoRender () from /usr/lib/xorg/modules/extensions//libglx.so
No symbol table info available.
#11 0xb7b3344c in __glXDisp_Render () from /usr/lib/xorg/modules/extensions//libglx.so
No symbol table info available.
#12 0xb7b37996 in __glXDispatch () from /usr/lib/xorg/modules/extensions//libglx.so
No symbol table info available.
#13 0x081506de in ?? ()
No symbol table info available.
#14 0x0808d8df in Dispatch ()
No symbol table info available.
#15 0x0807471b in main ()
No symbol table info available.
(gdb)

Bryce Harrington (bryce)
Changed in xserver-xorg-video-intel:
status: Confirmed → Triaged
Revision history for this message
In , Martin Olsson (mnemo) wrote :

Today I got a new version (2:2.2.1-1ubuntu13.6) of the intel driver and also the intel-dbg package and so I decided to see if this bug still exists. The bug is still easily reproducible unfortunately. However, the new dbg package seems to provide a much better stacktrace (now I also get some, but not all, DRI function names which I think could be useful):

Program received signal SIGABRT, Aborted.
[Switching to Thread 0xb7c2ea30 (LWP 6312)]
0xb7f56410 in __kernel_vsyscall ()
(gdb) bt full
#0 0xb7f56410 in __kernel_vsyscall ()
No symbol table info available.
#1 0xb7cf8085 in raise () from /lib/tls/i686/cmov/libc.so.6
No symbol table info available.
#2 0xb7cf9a01 in abort () from /lib/tls/i686/cmov/libc.so.6
No symbol table info available.
#3 0xb7cf110e in __assert_fail () from /lib/tls/i686/cmov/libc.so.6
No symbol table info available.
#4 0xa7650d68 in bm_fake_NotifyContendedLockTake ()
   from /usr/lib/dri/i965_dri.so
No symbol table info available.
#5 0xa7656871 in LOCK_HARDWARE () from /usr/lib/dri/i965_dri.so
No symbol table info available.
#6 0xa7656391 in ?? () from /usr/lib/dri/i965_dri.so
No symbol table info available.
#7 0xa77ae76f in _mesa_resizebuffers () from /usr/lib/dri/i965_dri.so
No symbol table info available.
#8 0xa7693e3c in _mesa_make_current () from /usr/lib/dri/i965_dri.so
No symbol table info available.
#9 0xa7656add in intelMakeCurrent () from /usr/lib/dri/i965_dri.so
No symbol table info available.
#10 0xa764cd3a in ?? () from /usr/lib/dri/i965_dri.so
No symbol table info available.
---Type <return> to continue, or q <return> to quit---
#11 0xb7b94d2a in __glXDRIcontextForceCurrent ()
   from /usr/lib/xorg/modules/extensions//libglx.so
No symbol table info available.
#12 0xb7b5f506 in __glXForceCurrent ()
   from /usr/lib/xorg/modules/extensions//libglx.so
No symbol table info available.
#13 0xb7b5b2b7 in DoRender () from /usr/lib/xorg/modules/extensions//libglx.so
No symbol table info available.
#14 0xb7b5b44c in __glXDisp_Render ()
   from /usr/lib/xorg/modules/extensions//libglx.so
No symbol table info available.
#15 0xb7b5f996 in __glXDispatch ()
   from /usr/lib/xorg/modules/extensions//libglx.so
No symbol table info available.
#16 0x081506ee in ?? ()
No symbol table info available.
#17 0x0808d8df in Dispatch ()
No symbol table info available.
#18 0x0807471b in main ()
No symbol table info available.
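
When comparing traces taken before and after installing a dbg package, it can help to list which frames gdb still prints as `??`. Below is a minimal sketch, assuming the backtrace has been saved as plain text. The helper name `unresolved_frames` and the regular expression are illustrative and do not come from any gdb tooling; the sample lines are copied from the trace above.

```python
import re

# Matches gdb frame lines such as
#   "#4 0xa7650d68 in bm_fake_NotifyContendedLockTake ()"
# Continuation lines ("   from /usr/lib/...") do not match and are skipped.
FRAME_RE = re.compile(r"^#(\d+)\s+(0x[0-9a-f]+) in (\S+) \(\)")


def unresolved_frames(backtrace):
    """Return the frame numbers whose function name gdb printed as '??'."""
    frames = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match and match.group(3) == "??":
            frames.append(int(match.group(1)))
    return frames


# A few frames copied from the backtrace above (hypothetical selection).
sample = """\
#4 0xa7650d68 in bm_fake_NotifyContendedLockTake ()
   from /usr/lib/dri/i965_dri.so
#5 0xa7656871 in LOCK_HARDWARE () from /usr/lib/dri/i965_dri.so
#6 0xa7656391 in ?? () from /usr/lib/dri/i965_dri.so
#10 0xa764cd3a in ?? () from /usr/lib/dri/i965_dri.so
#16 0x081506ee in ?? ()
"""

print(unresolved_frames(sample))
```

Frames that remain unresolved inside `i965_dri.so` suggest the matching debug symbols for that library are still incomplete or not installed.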

Revision history for this message
Timo Aaltonen (tjaalton) wrote :

So it's a bug in Mesa, which provides the DRI drivers.

Revision history for this message
In , Michael Fu (michael-fu-intel) wrote :

Haihao, could you please check whether you can still reproduce this on your 965GM machine? Otherwise, we may close this bug. Thanks.

Revision history for this message
In , Haihao-xiang (haihao-xiang) wrote :

I can't reproduce it with the latest drivers, so I am marking this bug as fixed. If you still experience this issue with the latest drivers, feel free to reopen it.

Bryce Harrington (bryce)
description: updated
Changed in xorg-server:
status: Confirmed → Fix Released
Revision history for this message
Bryce Harrington (bryce) wrote :

Upstream indicates this has not been reproducible for quite some time, so I am marking it fixed now.

Changed in mesa (Ubuntu):
status: Triaged → Fix Released
Changed in xorg-server:
importance: Unknown → Medium
Changed in xorg-server:
importance: Medium → Unknown
Changed in xorg-server:
importance: Unknown → Medium