X.org XServer - ATI gfx chipset driver

compiz crashes gnome desktop using default ati driver (radeon X600)

Reported by phord on 2009-04-27
This bug affects 7 people
Affects                   Status         Importance   Assigned to   Milestone
xserver-xorg-driver-ati   Fix Released   High
mesa (Ubuntu)                            High         Unassigned
mesa (Ubuntu Jaunty)                     Undecided    Unassigned

Bug Description

[Impact]

Fixes frequent Xorg segfaults for Radeon r300/r400 users, a "severe regression since jaunty".
The patch prevents memory from being used after it is freed, and it has been confirmed
to solve two upstream bugs (the linked one plus another bug that Michel Dänzer mentions in the upstream bug report).

Examples of hardware that is known to be affected:
01:00.0 VGA compatible controller [0300]: ATI Technologies Inc RV380 [Radeon
X600 (PCIE)] [1002:5b62]
01:00.0 VGA compatible controller [0300]: ATI Technologies Inc RV350 AP [Radeon
9600] [1002:4150]

[Karmic - Development]
This same patch has been uploaded to Karmic. Also, we will soon be updating to a newer version of mesa which already includes this change.

[Jaunty - Stable]
This patch, a cherrypick from the upstream release, is proposed for jaunty:
http://launchpadlibrarian.net/26309613/lp368049_jaunty_proposed.debdiff

[Testcase]
1. boot jaunty on the affected hardware
2. install "compizconfig-settings-manager"
3. activate the "ring switcher" plugin (launch settings manager from system::preferences) and make sure its Initiate keybinding is SUPER+TAB
4. hold down SUPER+TAB for 2-3 seconds so that the windows swirl around at full speed.
5. SEGV in xorg, back to gdm.

(There are very likely several other compiz operations/transformations that
trigger the crash, but the ring switcher seems to be the most reliable repro.
For example, in the upstream bug report one user reported a crash that
happened when he was "constantly resizing a gnome-terminal", and that this crash
stopped happening when he installed the testing DEB that held the patch in this SRU.)

[Regression Potential]
The scope of the patch is limited by the fact that the code changes are against the r300* files.

The patch changes the handling of a texstate pointer; like any change involving pointers, this carries some regression risk, but probably no worse than the existing bug. In any case, this patch should receive extra testing by r300/r400 users before deployment.

[Original Bug Report]

I have Compiz configured for Ring Switcher. When I hold Alt-Tab and get windows swirling around a bunch, gnome eventually crashes and restarts. I find this in syslog:

Apr 26 19:38:22 ipsn-hordp2 kernel: [ 1180.965338] [drm] Num pipes: 1
Apr 26 19:38:22 ipsn-hordp2 x-session-manager[4016]: WARNING: Detected that screensaver has left the bus
Apr 26 19:38:22 ipsn-hordp2 gdm[3506]: WARNING: gdm_slave_xioerror_handler: Fatal X error - Restarting :0
Apr 26 19:38:23 ipsn-hordp2 acpid: client connected from 5037[0:0]
Apr 26 19:38:24 ipsn-hordp2 kernel: [ 1182.932084] [drm] Setting GART location based on new memory map
Apr 26 19:38:24 ipsn-hordp2 kernel: [ 1182.933712] [drm] Loading R300 Microcode
Apr 26 19:38:24 ipsn-hordp2 kernel: [ 1182.933752] [drm] Num pipes: 1
Apr 26 19:38:24 ipsn-hordp2 kernel: [ 1182.933760] [drm] writeback test succeeded in 1 usecs
Apr 26 19:38:32 ipsn-hordp2 pulseaudio[5209]: pid.c: Stale PID file, overwriting.
Apr 26 19:38:34 ipsn-hordp2 pulseaudio[5209]: module-x11-xsmp.c: X11 session manager not running.
Apr 26 19:38:34 ipsn-hordp2 pulseaudio[5209]: module.c: Failed to load module "module-x11-xsmp" (argument: ""): initialization failed.
Apr 26 19:38:35 ipsn-hordp2 NetworkManager: <info> Unmanaged Device found; state CONNECTED forced. (see http://bugs.launchpad.net/bugs/191889)
Apr 26 19:40:01 ipsn-hordp2 /USR/SBIN/CRON[5468]: (root) CMD ([ -x /usr/sbin/update-motd ] && /usr/sbin/update-motd 2>/dev/null)
Apr 26 19:42:23 ipsn-hordp2 kernel: [ 1421.504431] [drm] Num pipes: 1
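
The telltale line in the excerpt above is the gdm warning about a fatal X error. A minimal sketch for counting such restarts in a log file follows. The function name `count_x_restarts` is only illustrative, and the default path `/var/log/syslog` is an assumption about where these messages end up.

```shell
#!/bin/sh
# Count how many times gdm logged a fatal X error (one line per X restart).
# Usage: count_x_restarts [logfile]
# The default path /var/log/syslog is an assumption, not taken from the report.
count_x_restarts() {
    # grep -c prints the number of matching lines; it exits non-zero when the
    # count is 0, so "|| true" keeps the function's exit status clean.
    grep -c 'gdm_slave_xioerror_handler: Fatal X error' "${1:-/var/log/syslog}" || true
}

# Example: count_x_restarts /var/log/syslog
```

A count that grows each time the ring switcher is spun would suggest the session is being restarted by the same failure.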

$ lsb_release -rd
Description: Ubuntu 9.04
Release: 9.04

$ apt-cache policy xorg gdm
xorg:
  Installed: 1:7.4~5ubuntu18
  Candidate: 1:7.4~5ubuntu18
  Version table:
 *** 1:7.4~5ubuntu18 0
        500 cdrom://Ubuntu 9.04 _Jaunty Jackalope_ - Release i386 (20090420.1) jaunty/main Packages
        500 http://us.archive.ubuntu.com jaunty/main Packages
        100 /var/lib/dpkg/status
gdm:
  Installed: 2.20.10-0ubuntu2
  Candidate: 2.20.10-0ubuntu2
  Version table:
 *** 2.20.10-0ubuntu2 0
        500 cdrom://Ubuntu 9.04 _Jaunty Jackalope_ - Release i386 (20090420.1) jaunty/main Packages
        500 http://us.archive.ubuntu.com jaunty/main Packages
        100 /var/lib/dpkg/status

ProblemType: Bug
Architecture: i386
DistroRelease: Ubuntu 9.04
Package: xorg 1:7.4~5ubuntu18
ProcEnviron:
 SHELL=/bin/zsh
 PATH=(custom, user)
 LANG=en_US.UTF-8
ProcVersion: Linux version 2.6.28-11-generic (buildd@palmer) (gcc version 4.3.3 (Ubuntu 4.3.3-5ubuntu4) ) #42-Ubuntu SMP Fri Apr 17 01:57:59 UTC 2009
SourcePackage: xorg
Uname: Linux 2.6.28-11-generic i686

Created an attachment (id=19724)
Another log displaying the same crash

The crashes are occurring again. Log attached.

Can you get a backtrace with gdb?

(In reply to comment #2)
> Can you get a backtrace with gdb?
>

I tried attaching gdb to a running X session, and directing logging to a file. After a crash, I got this:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1209039088 (LWP 6417)]
0x081167ea in miSpriteSourceValidate (pDrawable=0xd0c5d58, x=1, y=134693389,
    width=136110148, height=187747912) at misprite.c:423
423 SCREEN_PROLOGUE (pScreen, SourceValidate);
Detaching from program: /home/alex/xserver/bin/Xorg, process 6417
Quitting: ptrace: No such process.

Created an attachment (id=19890)
Full gdb log until crash

(In reply to comment #3)
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread -1209039088 (LWP 6417)]
> 0x081167ea in miSpriteSourceValidate (pDrawable=0xd0c5d58, x=1, y=134693389,
> width=136110148, height=187747912) at misprite.c:423
> 423 SCREEN_PROLOGUE (pScreen, SourceValidate);
> Detaching from program: /home/alex/xserver/bin/Xorg, process 6417
> Quitting: ptrace: No such process.

Did you explicitly detach after the SIGSEGV, or did that happen automatically? If you get a prompt after the SIGSEGV, enter

bt full

to get a detailed backtrace.
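
For an unattended session (for example over ssh), the same backtrace can be captured with a gdb command file. This is a sketch under assumptions: the helper name `write_xorg_gdb_script` and the file paths are hypothetical, and the SIGPIPE handling is a common precaution rather than something requested in this thread.

```shell
#!/bin/sh
# Write a gdb command file that waits for the crash, then dumps a full backtrace.
# Usage: write_xorg_gdb_script <script-path> <log-path>
write_xorg_gdb_script() {
    cat > "$1" <<EOF
set pagination off
set logging file $2
set logging on
handle SIGPIPE nostop noprint pass
continue
bt full
EOF
}

# Example (as root, from a remote shell; PID is the X server's process ID
# and both paths are placeholders):
#   write_xorg_gdb_script /tmp/xorg.gdb /tmp/xorg-bt.log
#   gdb -batch -x /tmp/xorg.gdb -p PID
```

In batch mode gdb runs the commands in order, so `continue` blocks until the server receives SIGSEGV and `bt full` then runs without anyone needing to reach a prompt.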

Created an attachment (id=20818)
Crash with week-old git tree

This is a crash from a week-old git compile.

I must add that almost all the times I have seen the crash, emerald quits first, without leaving any backtrace, and has to be restarted.

Created an attachment (id=21406)
Crash with git tree from December 18, 2008

Created an attachment (id=21726)
GDB log with more detailed backtrace

This time I managed to attach to the X server via ssh before the crash and log a full backtrace. Backtrace attached.

Created an attachment (id=21727)
Xorg.0.log after X server crashes

Log file from the crash with GDB and detailed trace. Apart from this, it is identical to the previous log.

Created an attachment (id=22115)
Debug patch with checks for invalid pointers

I tried making this patch to check whether this fixes the bug, or at least shows any messages. Now I am getting a different crash.

Created an attachment (id=22116)
New log file with debug patch, after crash

This is the log file after the crash, with the debug patch applied.

Is there anything else I should be doing? The previous patch was made from an educated guess at where the first crash was. What do you think of this patch?

So, texUnit->CurrentRect is NULL. That should never happen (unless the context is being torn down/deleted). The "Current" texture object pointers should never be null. They should either point to the texture that the user bound with glBindTexture() or should point to the default texture objects in the ctx->Shared state.

I'm afraid the patch is just hiding the real issue elsewhere.

This may be a reference counting bug somewhere. I could add some assertions to try to narrow it down. I'll check them into git ASAP. I probably won't hold up the 7.3 release though unless we can make progress on this today.

Martin Olsson (mnemo) wrote :

XorgOld has backtrace in it:
Backtrace:
0: /usr/X11R6/bin/X(xorg_backtrace+0x3b) [0x813518b]
1: /usr/X11R6/bin/X(xf86SigHandler+0x55) [0x80c7be5]
2: [0xb80c0400]
3: /usr/lib/dri/r300_dri.so(_mesa_update_state_locked+0x832) [0xad542152]
4: /usr/lib/dri/r300_dri.so(_mesa_update_state+0x2a) [0xad54228a]
5: /usr/lib/dri/r300_dri.so(_mesa_GetIntegerv+0x278) [0xad6140c8]
6: /usr/lib/xorg/modules/extensions//libglx.so [0xb7a55132]
7: /usr/lib/xorg/modules/extensions//libglx.so [0xb7a472e8]
8: /usr/lib/xorg/modules/extensions//libglx.so [0xb7a461a7]
9: /usr/lib/xorg/modules/extensions//libglx.so [0xb7a4ad6a]
10: /usr/X11R6/bin/X(Dispatch+0x33f) [0x808d57f]
11: /usr/X11R6/bin/X(main+0x3bd) [0x80722ed]
12: /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe5) [0xb7c92775]
13: /usr/X11R6/bin/X [0x80717a1]
Saw signal 11. Server aborting.
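
When comparing backtraces from several reporters, it can help to pull out just the first frame inside the r300 DRI driver. The following is a rough sketch; the function name `first_r300_frame` is made up for illustration, and it assumes the frame lines look exactly like the ones quoted above.

```shell
#!/bin/sh
# Print the first named function in an Xorg backtrace that lives in r300_dri.so.
# Reads the log from the file(s) given as arguments, or from stdin if none.
first_r300_frame() {
    # A frame looks like:
    #   3: /usr/lib/dri/r300_dri.so(_mesa_update_state_locked+0x832) [0xad542152]
    # The sed expression keeps only the symbol name before "+0x...".
    sed -n 's/^[0-9][0-9]*: .*r300_dri\.so(\([A-Za-z_][A-Za-z0-9_]*\)+0x[0-9a-f]*).*/\1/p' "$@" |
        head -n 1
}

# Example: first_r300_frame Xorg.0.log.old
```

If two logs report the same symbol here, they are plausibly the same crash, though only a full backtrace with debug symbols can confirm that.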

affects: xorg (Ubuntu) → xserver-xorg-video-ati (Ubuntu)
Martin Olsson (mnemo) wrote :

I tried to follow the repro steps and I can trigger the bug as well using the same driver but a different ATI card. I will run apport-collect shortly to submit my xorg.log showing the SEGV etc.

Architecture: i386
DistroRelease: Ubuntu 9.04
Package: xserver-xorg-video-ati 1:6.12.1-0ubuntu2
PackageArchitecture: i386
ProcEnviron:
 SHELL=/bin/bash
 PATH=(custom, user)
 LANG=en_DK.UTF-8
ProcVersion: Linux version 2.6.28-11-generic (buildd@palmer) (gcc version 4.3.3 (Ubuntu 4.3.3-5ubuntu4) ) #42-Ubuntu SMP Fri Apr 17 01:57:59 UTC 2009
Uname: Linux 2.6.28-11-generic i686
UserGroups: adm admin audio cdrom dialout dip floppy fuse lpadmin plugdev video
Xrandr:

glxinfo:

setxkbmap:

xdpyinfo:

xkbcomp:

Martin Olsson (mnemo) wrote :

The stacktrace I saw using my ATI RV350 AP (Radeon 9600) [1002:4150] was:

0: /usr/X11R6/bin/X(xorg_backtrace+0x3b) [0x813518b]
1: /usr/X11R6/bin/X(xf86SigHandler+0x55) [0x80c7be5]
2: [0xb7f6f400]
3: /usr/lib/dri/r300_dri.so(_mesa_update_state_locked+0x832) [0xad3fa152]
4: /usr/lib/dri/r300_dri.so(_mesa_update_state+0x2a) [0xad3fa28a]
5: /usr/lib/dri/r300_dri.so(_mesa_GetIntegerv+0x278) [0xad4cc0c8]
6: /usr/lib/xorg/modules/extensions//libglx.so [0xb78f6132]
7: /usr/lib/xorg/modules/extensions//libglx.so [0xb78e82e8]
8: /usr/lib/xorg/modules/extensions//libglx.so [0xb78e71a7]
9: /usr/lib/xorg/modules/extensions//libglx.so [0xb78ebd6a]
10: /usr/X11R6/bin/X(Dispatch+0x33f) [0x808d57f]
11: /usr/X11R6/bin/X(main+0x3bd) [0x80722ed]
12: /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe5) [0xb7b39775]
13: /usr/X11R6/bin/X [0x80717a1]

Martin Olsson (mnemo) wrote :

Phord, great work with this bug. It's awesome that you found such an easy way to reproduce the crash!

Changed in xserver-xorg-driver-ati:
status: Unknown → Confirmed

*** Bug 17829 has been marked as a duplicate of this bug. ***

*** Bug 15809 has been marked as a duplicate of this bug. ***

*** Bug 20673 has been marked as a duplicate of this bug. ***

(In reply to comment #12)
> This may be a reference counting bug somewhere. I could add some assertions to
> try to narrow it down. I'll check them into git ASAP.

Are those assertions in now? I suppose distribution binaries will usually be built with assertions disabled though... Maybe somebody could try catching the problem with a gdb watchpoint or something like that.

Really easy repro steps for this bug are:

1. boot ubuntu jaunty final version on one of the affected systems
2. run "sudo apt-get install compizconfig-settings-manager"
3. launch the settings manager from system::preferences and activate the ring
switcher plugin
4. hold down SUPER+TAB so that the ring spins around full speed
5. xorg SEGV after like 2-3 seconds tops

I have triggered the bug using these steps on the following cards:

01:00.0 VGA compatible controller [0300]: ATI Technologies Inc RV380 [Radeon
X600 (PCIE)] [1002:5b62]
01:00.0 VGA compatible controller [0300]: ATI Technologies Inc RV350 AP [Radeon
9600] [1002:4150]

PS. I'd be willing to try patches with assertions or whatever and send back results.

Bryce Harrington (bryce) on 2009-04-28
tags: added: crash
Changed in xserver-xorg-video-ati (Ubuntu):
status: New → Confirmed
Martin Olsson (mnemo) on 2009-04-28
Changed in xserver-xorg-driver-ati:
status: Confirmed → Unknown
Changed in xserver-xorg-driver-ati:
status: Unknown → Confirmed

I'm also experiencing this bug.

My card is:

02:00.0 VGA compatible controller [0300]: ATI Technologies Inc R430 [Radeon X800 (PCIE)] [1002:554f]

Backtrace:
0: /usr/X11R6/bin/X(xorg_backtrace+0x3b) [0x813518b]
1: /usr/X11R6/bin/X(xf86SigHandler+0x55) [0x80c7be5]
2: [0xb7f22400]
3: /usr/lib/dri/r300_dri.so(_mesa_update_state_locked+0x832) [0xa53a6152]
4: /usr/lib/dri/r300_dri.so(_mesa_update_state+0x2a) [0xa53a628a]
5: /usr/lib/dri/r300_dri.so(_mesa_GetIntegerv+0x278) [0xa54780c8]
6: /usr/lib/xorg/modules/extensions//libglx.so [0xb78a1132]
7: /usr/lib/xorg/modules/extensions//libglx.so [0xb78932e8]
8: /usr/lib/xorg/modules/extensions//libglx.so [0xb78921a7]
9: /usr/lib/xorg/modules/extensions//libglx.so [0xb7896d6a]
10: /usr/X11R6/bin/X(Dispatch+0x33f) [0x808d57f]
11: /usr/X11R6/bin/X(main+0x3bd) [0x80722ed]
12: /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe5) [0xb7af5775]
13: /usr/X11R6/bin/X [0x80717a1]
Saw signal 11. Server aborting.

AlckO (alckox) wrote :

Same error for me on Radeon X700, X crashes randomly when I click on a button for example
Ubuntu 9.04

Backtrace:
0: /usr/X11R6/bin/X(xorg_backtrace+0x3b) [0x813518b]
1: /usr/X11R6/bin/X(xf86SigHandler+0x55) [0x80c7be5]
2: [0xb7f4b400]
3: /usr/lib/dri/r300_dri.so(_mesa_update_state_locked+0x832) [0xa53cb152]
4: /usr/lib/dri/r300_dri.so(_mesa_update_state+0x2a) [0xa53cb28a]
5: /usr/lib/dri/r300_dri.so(_mesa_GetIntegerv+0x278) [0xa549d0c8]
6: /usr/lib/xorg/modules/extensions//libglx.so [0xb78cc132]
7: /usr/lib/xorg/modules/extensions//libglx.so [0xb78be2e8]
8: /usr/lib/xorg/modules/extensions//libglx.so [0xb78bd1a7]
9: /usr/lib/xorg/modules/extensions//libglx.so [0xb78c1d6a]
10: /usr/X11R6/bin/X(Dispatch+0x33f) [0x808d57f]
11: /usr/X11R6/bin/X(main+0x3bd) [0x80722ed]
12: /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe5) [0xb7b20775]
13: /usr/X11R6/bin/X [0x80717a1]
Saw signal 11. Server aborting.
(II) USB Keyboard: Close
(II) UnloadModule: "evdev"
(II) USB Keyboard: Close
(II) UnloadModule: "evdev"
(II) Macintosh mouse button emulation: Close
(II) UnloadModule: "evdev"
(II) Trust GM-4200 Gamer Optical Mouse: Close
(II) UnloadModule: "evdev"
(II) AIGLX: Suspending AIGLX clients for VT switch
disable primary dac
disable FP1
(II) RADEON(0): RADEONRestoreMemMapRegisters() :
(II) RADEON(0): MC_FB_LOCATION : 0xcfffc000 0xcfffc000
(II) RADEON(0): MC_AGP_LOCATION : 0x003f0000
finished PLL2
finished PLL1
Entering Restore TV
Restore TV PLL
Restore TVHV
Restore TV Restarts
Restore Timing Tables
Restore TV standard
Leaving Restore TV
 ddxSigGiveUp: Closing log

Martin Olsson (mnemo) wrote :

AlckO, can you paste "lspci -nn | grep VGA" here as well so I can see exactly which card you have.

In general this bug is now well understood by the driver devs so we basically just need to wait for them to fix upstream bug 17895. After it's fixed we can see if we can cherry pick that patch for ubuntu.

No further "me too" comments are needed unless you can confirm the bug on a non-R300 radeon chipset. Thanks.

It's a long shot, but can someone try the Mesa r300 driver patch attached to bug 20539 to see if it helps for this? It should fix a case of using memory after free, which could theoretically cause all sorts of funny behaviour...

Just out of curiosity, do you have an idea when this bug was introduced ?

I've tried the patch suggested by Michel Dänzer in comment #19.

For me it sort of worked, actually, but I accidentally ran "sudo dpkg -i *.deb" on mesa, which borked my system a bit. Afterwards I think I managed to restore my system (at least I could repro the bug again, and then I installed only the patched dri and glx packages). One strange thing remains on my system after this hiccup, though: "glxinfo | grep direct" now says "no" (if anyone knows how to fix this, please tell me).

Also, I uploaded my x86/x64 debs of ubuntu jaunty's mesa 7.4 with michel dänzers patch added, to this location (if someone else is using ubuntu jaunty maybe you can use this for testing as well?):
http://temp.minimum.se/mesa_with_fixed_ati_bug/

cnom (cnom) wrote :

Ok, then. I have this on an ATI Radeon X800 Pro. That's the R420 chipset, isn't it?

% lspci -nn | grep VGA
02:00.0 VGA compatible controller [0300]: ATI Technologies Inc R420 JI [Radeon X800PRO] [1002:4a49]

By the way, the same thing happens using the Shift Switcher.

Martin Olsson (mnemo) wrote :

Interesting... Conrad, can you please install debug symbols:
sudo apt-get install libc6-dbg xserver-xorg-core-dbg libgl1-mesa-dri-dbg xserver-xorg-video-ati-dbg

Then reboot your machine, repro the bug, and collect xorg.log and xorg.log.old right _after_ you have reproduced it (ideally using ssh while the machine is still at gdm). I wonder if the backtrace is identical in your case.

AlckO (alckox) wrote :

01:00.0 VGA compatible controller [0300]: ATI Technologies Inc RV410 [Radeon X700 (PCIE)] [1002:5e4d]
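
The part of such a line that identifies the chip unambiguously is the trailing vendor:device pair. A small sketch for extracting it from `lspci -nn` output follows; the function name `vga_pci_ids` is hypothetical.

```shell
#!/bin/sh
# Read `lspci -nn` output on stdin and print the vendor:device ID of each
# VGA controller, one per line.
vga_pci_ids() {
    # Only the [vvvv:dddd] bracket contains a colon, so the [0300] class
    # code earlier on the line is never matched.
    grep 'VGA compatible controller' |
        sed -n 's/.*\[\([0-9a-f]\{4\}:[0-9a-f]\{4\}\)\].*/\1/p'
}

# Example: lspci -nn | vga_pci_ids
```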

cnom (cnom) wrote :

I've made copies of Xorg.0.log and Xorg.0.log.old from a tty right after gdm restarted.

Backtrace in Xorg.0.log.old:
0: /usr/bin/X(xorg_backtrace+0x3b) [0x813518b]
1: /usr/bin/X(xf86SigHandler+0x55) [0x80c7be5]
2: [0xb804a400]
3: /usr/lib/dri/r300_dri.so(_mesa_update_state_locked+0x832) [0xad4d7152]
4: /usr/lib/dri/r300_dri.so(_mesa_update_state+0x2a) [0xad4d728a]
5: /usr/lib/dri/r300_dri.so(_mesa_GetIntegerv+0x278) [0xad5a90c8]
6: /usr/lib/xorg/modules/extensions//libglx.so [0xb79c6132]
7: /usr/lib/xorg/modules/extensions//libglx.so [0xb79b82e8]
8: /usr/lib/xorg/modules/extensions//libglx.so [0xb79b71a7]
9: /usr/lib/xorg/modules/extensions//libglx.so [0xb79bbd6a]
10: /usr/bin/X(Dispatch+0x33f) [0x808d57f]
11: /usr/bin/X(main+0x3bd) [0x80722ed]
12: /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe5) [0xb7c1a775]
13: /usr/bin/X [0x80717a1]

(In reply to comment #21)
> For me it sort of worked actually [...]

What does this mean exactly? The steps from comment #17 no longer cause a crash?

> One strange thing remains on my system after this hickup though and that is
> that "glxinfo | grep direct" now says "no" (if anyway knows how to fix this
> please tell me).

LIBGL_DEBUG=verbose glxinfo

should give more information.

Yes, I can spin the ring switcher full speed for 30 seconds straight without crashes. I also tried lots of other things like spinning the cube fast and what not and with this fix I was unable to crash. The only thing that made me unsure was the "direct rendering: no" because I was afraid I had misconfigured my system in such a way that I was no longer hitting the same execution path in the code (and thus just wasn't seeing the bug anymore).

If I run glxinfo with verbose this is what it says:
direct rendering: No (LIBGL_ALWAYS_INDIRECT set)
Also if I do "env | grep LIBGL" I can see "LIBGL_ALWAYS_INDIRECT=1" but I have no idea where this was set and by what file/program etc? As far as I know I have not set this myself.

Actually after googling around I found this bug (marked INVALID):
https://bugs.launchpad.net/ubuntu/+source/desktop-effects/+bug/137388
(This bug describes my issue pretty accurately. If I launch a gnome-terminal and then do "glxinfo | grep direct" it says "direct rendering: No (LIBGL_ALWAYS_INDIRECT set)"; however, in the same session on the same computer, if I press ALT-F2, type "xterm" and then do "glxinfo | grep direct", it says "direct rendering: yes".)

Actually, if I put a gnome-terminal launcher onto the GNOME panel and launch it from there, "glxinfo | grep direct" displays "yes". However, if I launch gnome-terminal using my custom keybinding "CTRL-ALT-A", then "glxinfo | grep direct" prints "No (LIBGL_ALWAYS_INDIRECT set)". I understand why this happens now: that keybinding is something I configured in gconf under /apps/metacity/global_keybindings, and I think compiz reuses the keybindings read from metacity's gconf settings. So compiz is the parent process of my gnome-terminal when I launch it with CTRL-ALT-A, and of course compiz sets LIBGL_ALWAYS_INDIRECT for its own process when starting up, which the terminal then inherits.

God, that had me really confused for a while.
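
The inheritance described here can be demonstrated, and worked around, in a couple of lines of shell. This is only a sketch: the function names `indirect_in_child` and `without_indirect` are invented for the example.

```shell
#!/bin/sh
# Report LIBGL_ALWAYS_INDIRECT as a newly started child process sees it.
# Prints the value, or "unset" if the child did not inherit the variable.
indirect_in_child() {
    sh -c 'echo "${LIBGL_ALWAYS_INDIRECT:-unset}"'
}

# Run a command with the variable removed from its environment,
# for example: without_indirect glxinfo
without_indirect() {
    env -u LIBGL_ALWAYS_INDIRECT "$@"
}
```

Running `indirect_in_child` from a terminal started via the compiz keybinding should print the inherited value, while a terminal started from the panel should print `unset`.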

@Dänzer, I've asked a bunch of other ubuntu users if they could try my DEBs as well, to see if that fixes the bug for them. See LP bug 368049:
https://bugs.launchpad.net/xserver-xorg-driver-ati/+bug/368049

So far, at least one other ubuntu user (who has a "Radeon X700 (PCIE)" with RV410 chipset) has confirmed that this fixes the bug on their machine.

Fix pushed to Git master and mesa_7_4_branch, thanks for testing.

Martin Olsson (mnemo) wrote :

I've taken a not-yet-committed patch suggested by upstream and applied it to jaunty's mesa 7.4, and with that combination I no longer get this crash. I've uploaded DEBs with this patch (built for both x86 and x64) here:
http://temp.minimum.se/mesa_with_fixed_ati_bug/

It would be nice to hear if this fixes the issue for other people as well. Basically what you need to do is:
1. download the right DEBs for your arch
2. "sudo dpkg -i libgl1-mesa-glx_*.deb" followed by "sudo dpkg -i libgl1-mesa-dri_*.deb"
3. reboot the computer just to be sure everything reloads properly
4. see if the bug still exists and report back here
5. "sudo apt-get install --reinstall libgl1-mesa-glx/jaunty libgl1-mesa-dri/jaunty" to revert back to supported distro packages
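
After step 2 or step 5 it can be useful to confirm which build is actually installed. The sketch below parses `apt-cache policy` output; the function name `installed_version` is hypothetical.

```shell
#!/bin/sh
# Read `apt-cache policy <package>` output on stdin and print the version
# shown on the "Installed:" line.
installed_version() {
    awk '$1 == "Installed:" { print $2; exit }'
}

# Example: apt-cache policy libgl1-mesa-dri | installed_version
```

Comparing the output before and after the `--reinstall` step shows whether the system is back on the distro package.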

cnom (cnom) wrote :

Works for me!

Does anything speak against leaving your patches installed? If so, how long do you think it will take until this finds its way into the official repositories?

Martin Olsson (mnemo) wrote :

The patch was written by upstream developer / coding ninja Michel Dänzer; he's the one you should thank. You can safely leave the DEBs installed for a while if they help your stability. In general we never apply patches until they have been committed upstream, because we want to make sure that the patch is approved as a high-quality fix by upstream. As soon as Dänzer commits a final patch I will talk to Bryce about getting this into karmic and also a potential SRU for jaunty.

The more people that test these DEBs, the higher the probability is that we can release it as an update to jaunty.

Martin Olsson (mnemo) wrote :

Upstream just committed the patch to both mesa master (commit c28707b50701b1cf8727be29d61e2d939c6ee58f) and the mesa 7.4 stable branch (commit a1ce4efefbb7f796a0a24544a1e893a56848f0c1).

Martin Olsson (mnemo) on 2009-04-30
affects: xserver-xorg-video-ati (Ubuntu) → mesa (Ubuntu)

I tried your debs on my debian/sid system. They apparently fix a problem where constantly resizing a gnome-terminal could hang the server, but they don't fix server hangs when running OpenGL apps, like the Carousel screensaver (see bug #9252).

Changed in xserver-xorg-driver-ati:
status: Confirmed → Fix Released
Martin Olsson (mnemo) wrote :

I got some feedback from tormod on the previous debdiff. In this new version I've fixed the following things:
- added patch description tags to the .diff file
- make sure version number is correct (was using "dch -i" before and now it's blah.1)
- it now targets the "jaunty-proposed" pocket instead of "jaunty"

Martin Olsson (mnemo) on 2009-05-04
description: updated
Bryce Harrington (bryce) on 2009-05-04
description: updated
description: updated
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package mesa - 7.4-0ubuntu4

---------------
mesa (7.4-0ubuntu4) karmic; urgency=low

  * debian/patches/106_compiz_ring_switcher_xorg_segv_on_radeon.diff:
    fix xserver segv triggered by compiz ring switcher plugin for users
    with r300/r400 radeon chipsets and -ati driver. Patch previously
    commited to mesa master as c28707b50701b1cf8727be29d61e2d939c6ee58f
    and also to mesa_7_4_branch as a1ce4efefbb7f796a0a24544a1e893a56848f0c1.
    Note: it was commited to the 7.4 branch after mesa 7.4.0 release.
    (LP: #368049)

 -- Martin Olsson <email address hidden> Mon, 04 May 2009 12:25:29 +0200

Changed in mesa (Ubuntu):
status: Confirmed → Fix Released
Bryce Harrington (bryce) wrote :

Thanks mnemo, looks good, I've sponsored the uploads to karmic and jaunty.

Changed in mesa (Ubuntu):
importance: Undecided → High
status: Fix Released → Fix Committed
Martin Pitt (pitti) wrote :

+--- mesa-7.4.orig/src/mesa/drivers/dri/r300/r300_context.h 2009-05-04 12:07:48.000000000 +0200
++++ mesa-7.4/src/mesa/drivers/dri/r300/r300_context.h 2009-05-04 12:08:56.000000000 +0200
+@@ -211,7 +211,7 @@
+ };
+
+ struct r300_texture_env_state {
+- r300TexObjPtr texobj;
++ struct gl_texture_object *texobj;

Do I smell an ABI change here, or is that header file (and thus the struct) not exported anywhere? Or are these two pointers of "by and large" the same type?

Patch is limited to the r300 driver, thus shouldn't change behaviour on other chipsets. Looks okay to me, if above struct change is opaque to mesa clients.

Martin Olsson (mnemo) wrote :

@Pitti, great question. I do think it's safe, but I'm not familiar with the mesa codebase, so I asked on #dri-devel to be sure:

[00:50] <mnemo> could the first change in this patch cause ABI breakage if this patch was applied to an otherwise stable distro release? --> https://bugs.freedesktop.org/attachment.cgi?id=25219
[00:51] <airlied> mnemo: don't think so
[00:52] <mnemo> we're about to ship it as a stable release update for ubuntu
[00:52] <mnemo> im thinking that all the apps call mesa and then mesa calls the mesa driver for that card right?
[00:52] <airlied> pretty much
[00:53] <mnemo> thanks daniel

Bryce Harrington (bryce) wrote :

[timing error]

Changed in mesa (Ubuntu):
status: Fix Committed → Fix Released
Martin Pitt (pitti) wrote :

Thanks Martin.

Changed in mesa (Ubuntu Jaunty):
status: New → Fix Committed
tags: added: verification-needed
Martin Pitt (pitti) wrote :

Accepted mesa into jaunty-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!
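
For testers unfamiliar with -proposed, enabling it amounts to adding one APT source line. The sketch below only prints such a line. The function name `proposed_source_line`, the default mirror URL, and the component list are assumptions; the wiki page linked above remains the authoritative reference.

```shell
#!/bin/sh
# Print an APT source line for the -proposed pocket of a release.
# Usage: proposed_source_line <codename> [mirror]
# The default mirror is an assumption; prefer the one already used in
# your /etc/apt/sources.list.
proposed_source_line() {
    printf 'deb %s %s-proposed main restricted universe multiverse\n' \
        "${2:-http://archive.ubuntu.com/ubuntu/}" "$1"
}

# Example: proposed_source_line jaunty
```

The printed line would go into the APT sources, followed by a package list update, before installing the proposed mesa packages.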

Martin Olsson (mnemo) wrote :

FWIW: works for me. Compiz was updated in karmic today so I could finally start it there again. I've done some testing of this bugfix for karmic: spinning the windows around with the ring switcher and the shift switcher plugins at full speed for a solid 25 seconds, and I'm not able to induce any crash. Also, nothing was barfed to dmesg or xorg.log while doing so.

Martin Olsson (mnemo) wrote :

Good news. Turns out LP bug #336320 is a duplicate of this one and one of the posters on that bug has confirmed the SRU:
https://bugs.launchpad.net/ubuntu/+source/mesa/+bug/336320

Martin Pitt (pitti) on 2009-05-06
tags: added: verification-done
removed: verification-needed
Martin Olsson (mnemo) wrote :

I've been following the ATI/mesa bugs closely because of this SRU.

One user posted a question "did anyone else see a perf regression from the mesa 3.1 SRU?" here:
https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-ati/+bug/351990/comments/67

Note that 1) the claim is not verified yet, and 2) this SRU fixes "frequent crashes" whereas the reported issue is perf related. I don't know much more yet; I've asked the user to roll back the SRU to confirm.

pablomme (pablomme) wrote :

I haven't experienced any performance problems after the SRU, it's pretty much like it was. I use Gnome and Compiz, not KDE and Kwin, though.

One performance issue that I noticed with the -ati driver is when rotating the desktop cube with an animated skydome, for which the animation is visibly choppy. This was equally true before and after installing the proposed SRU (perhaps slightly better after, if anything).

My card is an ATI xpress 200, by the way.

Martin Olsson (mnemo) wrote :

@pablomme, thanks for posting. I tested this very carefully before posting the SRU, I've been running it since it was uploaded to proposed, and I've checked again now on a clean machine; I'm not seeing any regression (I'm using GNOME, not KDE, though). The user who reported it has also said in various comments that he's trying various mainline kernels, tormod's PPA etc., so it could potentially be a config issue on his machine only. I will keep looking for any other signs that this update caused issues.

haneya (a-haneya) wrote :

Hi, same issue with "ATI Technologies Inc M56P [Radeon Mobility X1600]". I did not find any solution to this. I also have another issue with video freezing, and X restarts when playing with compiz.

Jaunty 2.6.28-11-generic #42-Ubuntu

Thanks.

haneya (a-haneya) wrote :

Does this patch apply to my case? Can someone answer, please?

Martin Olsson (mnemo) wrote :

@haneya, install the mesa 7.4-0ubuntu3.1 package from jaunty-proposed to find out.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package mesa - 7.4-0ubuntu3.1

---------------
mesa (7.4-0ubuntu3.1) jaunty-proposed; urgency=low

  * debian/patches/106_compiz_ring_switcher_xorg_segv_on_radeon.diff:
    fix xserver segv triggered by compiz ring switcher plugin for users
    with r300/r400 radeon chipsets and -ati driver. Patch previously
    commited to mesa master as c28707b50701b1cf8727be29d61e2d939c6ee58f
    and also to mesa_7_4_branch as a1ce4efefbb7f796a0a24544a1e893a56848f0c1.
    Note: it was commited to the 7.4 branch after mesa 7.4.0 release.
    (LP: #368049)

 -- Martin Olsson <email address hidden> Mon, 04 May 2009 12:25:29 +0200

Changed in mesa (Ubuntu Jaunty):
status: Fix Committed → Fix Released
Stefan (stefan-helmer) wrote :

Hi,
I have the issue where ring switcher crashes. The bug seems to be posted in a few places, but this looked like the most current thread.

I noticed that the patch seemed to be in the last round of updates I got from the Update Manager today (May 14), but now, if anything, I'd say ring switcher crashes even faster than previously.

Initial info on my computer:
Dell Inspiron 4150 w/ Radeon Mobility 7500
Ubuntu 9.04 w/ all the updates from the Update Manager
Haven't tinkered with much else. Have the default ATI drivers that Ubuntu installs

I'm not sure what else is useful to know, but I'm happy to post info.

Thx
Stefan

Martin Olsson (mnemo) wrote :

@Stefan, please start up the crash reporter tool using this command:

      sudo force_start=1 /etc/init.d/apport start

...and after that reproduce your issue and let it file a new bug. Once we have seen a full backtrace we can determine whether your issue is related to this bug or not. If the crash reporter doesn't work for some reason, open a new bug manually and link to your comment here and we can take it from there.
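
After reproducing the crash, it can help to confirm that a report was actually written before filing. This is a minimal sketch, assuming apport's default report directory `/var/crash` and that X server reports have `Xorg` in their file name; `list_xorg_crashes` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: list crash reports that look like X server crashes.
# $1 is the report directory (defaults to /var/crash, an assumption).
list_xorg_crashes() {
    dir=${1:-/var/crash}
    for f in "$dir"/*Xorg*.crash; do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "$f" ] && basename "$f"
    done
    return 0
}

# Usage:
#   list_xorg_crashes            # look in /var/crash
#   list_xorg_crashes /some/dir  # look elsewhere
```

An empty result suggests the crash reporter did not catch the crash, in which case filing the bug manually is the fallback.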

Stefan (stefan-helmer) wrote :

https://bugs.launchpad.net/ubuntu/+bug/377122

hope I did that right....
In the syslog, May 15 17:17:01 is the last entry before I made it crash.

Hope that helps! Let me know if you need more.

-Stefan

Carlo Capocasa (carlotheman) wrote :

The alternate mesa packages in Comment 21 (https://bugs.launchpad.net/ubuntu/+source/mesa/+bug/368049/comments/21) worked for me.

If I don't follow up (search for my name to find out), the fix worked permanently.

Carlo

Carlo Capocasa (carlotheman) wrote :

Correction: The mentioned package didn't fix the issue for me after all, as there was another crash.

Stefan (stefan-helmer) wrote :

I can confirm that it also crashes for me with this package installed.

i'm experiencing a similar problem. however it's not necessary to switch the emerald-theme, but i have to switch between the window managers or at least reload them (compiz has to be involved: either switch from compiz, switch to compiz or reload compiz). i am not able to reproduce it reliably, it happens randomly

i have an ATI Radeon Mobility 9800 and i'm using radeon V6.12.2 and xorg 1.6.3
i'll attach my x-logfile

maybe the following is helpful, maybe completely irrelevant:
when i don't load the module dri2, compiz starts, but the screen is completely white, only the mouse pointer is visible. the system stays usable, i can click around and it reacts, but invisible. when i interrupt (ctrl+C) compiz again, the white disappears and my system is normally usable again.
when i set the server flag "AIGLX" to "off" compiz doesn't start

Created an attachment (id=28788)
logfile after the crash

that's the log about the crash with radeon 6.12.2 and xorg 1.6.3
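
When attaching a log like this, the most useful part is usually the backtrace the X server writes when it crashes. This is a minimal sketch, assuming the log contains a line with `Backtrace:` and that the log of the crashed session is at `/var/log/Xorg.0.log.old` after X restarts; `xorg_backtrace` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: print everything from the first "Backtrace:" line
# to the end of the given X server log file.
xorg_backtrace() {
    sed -n '/Backtrace:/,$p' "$1"
}

# Usage (path is an assumption; adjust to the log of the crashed session):
#   xorg_backtrace /var/log/Xorg.0.log.old
```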

Changed in xserver-xorg-driver-ati:
status: Fix Released → Confirmed

(In reply to comment #28)
> i'm experiencing a similar problem. however it's not necessary to switch the
> emerald-theme,

Please always file a new bug unless it's 100% certain it's the same one (which is unlikely here given the above). It's easier to mark separate reports as duplicates than to untangle information about several issues in a single report.

> but i have to switch between the window managers or at least reload them
> (compiz has to be involved: either switch from compiz, switch to compiz or
> reload compiz).

That sounds like an X server issue which has been fixed in http://cgit.freedesktop.org/xorg/xserver/commit/?id=2075d4bf9e53b8baef0b919da6c44771220cd4a5 and http://cgit.freedesktop.org/xorg/xserver/commit/?id=3020b1d43e34fca08cd51f7c7c8ed51497d49ef3 .

Changed in xserver-xorg-driver-ati:
status: Confirmed → Fix Released
hkais (r-2) wrote :

I have a similar problem with nvidia. I tried 3 versions of the proprietary driver, with no success.
See Bug #412113.

Changed in xserver-xorg-driver-ati:
importance: Unknown → High
Changed in xserver-xorg-driver-ati:
importance: High → Unknown
Changed in xserver-xorg-driver-ati:
importance: Unknown → High