ATI Technologies Inc RV350 AR [Radeon 9600] hangs occasionally when dri enabled

Bug #32124 reported by Jeff Bailey
Affects: xserver-xorg-video-ati (Ubuntu)
Status: Fix Released
Importance: High
Assigned to: Ubuntu-X

Bug Description

Ever since the move to the modular X tree, this driver has tried to enable DRI, and it now crashes seemingly at random. There does not seem to be a trigger: it doesn't wait until I run something with GL, although it doesn't happen when the system is idle. Sometimes it happens while I'm typing in a web form or on IRC, while scrolling in a web page, or during mouse activity such as testing the latest gnome-games.

When it hangs, I can usually (but not always) ssh to the machine and killall -9 Xorg. The machine generally is happiest when I reboot after that.

I have solved this problem by preventing radeon.ko from loading, and forcing GL to be disabled. I had worked with Daniel and BenH on this near the end of last year, and we were not able to get to the bottom of it. I suspect that for this card, dri should be disabled by default.

Information:
0000:f0:10.0 VGA compatible controller: ATI Technologies Inc RV350 AR [Radeon 9600]

2.6.15-15-powerpc64-smp

Xorg.log file attached.

Jeff Bailey (jbailey) wrote : Xorg log file

Xorg log file

Jeff Bailey (jbailey) wrote :

Setting severity to Major; this causes machine crashes.

Jeff Bailey (jbailey) wrote :

(For those playing along, this and 32125 are unrelated. Solving one doesn't solve the other.)

Rocco Stanzione (trappist) wrote :

Is this the same bug as 31527?

Jeff Bailey (jbailey) wrote :

I'm concerned about the drop from Major severity to Normal. This causes me to power cycle my machine. Can you tell me why it was lowered, please?

Fabio Massimo Di Nitto (fabbione) wrote :

By mistake. I was batch processing.

Fabio

Cory Maccarrone (darkstar6262) wrote :

I've been able to predictably reproduce the lockup on my machine. First, my hardware:

0000:01:05.0 VGA compatible controller: ATI Technologies Inc Radeon Mobility U1 (prog-if 00 [VGA])
        Subsystem: Hewlett-Packard Company Pavilion ze4400 builtin Video
        Flags: bus master, stepping, fast Back2Back, 66MHz, medium devsel, latency 66, IRQ 10
        Memory at e8000000 (32-bit, prefetchable) [size=128M]
        I/O ports at 9000 [size=256]
        Memory at e0100000 (32-bit, non-prefetchable) [size=64K]
        Expansion ROM at e0120000 [disabled] [size=128K]
        Capabilities: <available only to root>

The method is as follows:

In Gnome, open the forecast window from the weather applet and switch to the radar map tab. Then keep resizing the window until the system locks. On my computer, the screen either corrupts slightly and freezes, goes blank, or turns off completely. The only way I can recover is to reboot.

This is on Linux 2.6.15.5-ubuntu1 (custom compiled) on a Compaq Presario 2172us (AMD Athlon XP-M 2500+).

Just my two cents.

- Cory

Cory Maccarrone (darkstar6262) wrote :

Came across a patch for a problem I was having with PCI devices not being initialized properly here:

http://lkml.org/lkml/2006/1/5/287

After applying it, the messages went away, and my lockup problems seem to have stopped as well. Does this work for anyone else?

I'll continue to test, but my usual test with the weather applet seems rock solid now.

- Cory

Cory Maccarrone (darkstar6262) wrote :

Never mind. It's more solid, but it still eventually crashes.

- Cory

Michał Sawicz (saviq) wrote :

Looks like the same thing here on a Radeon 7000. It crashed a lot in Breezy Badger; in Dapper it hung only after a few hours of work, but still.
I can't find any errors in any of the logs, and ssh doesn't work either.

Michał Sawicz (saviq) wrote : Xorg.log

Here's my Xorg log.

Erik Andrén (erik-andren) wrote :

Try to get a backtrace of the crash and post it here.

1. Log in via ssh from a remote computer
2. Get the PID of the X process
3. Attach gdb to the process
4. Catch the crash and output the backtrace by typing bt in gdb.
5. Post the results here.
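
The steps above can be sketched as a short shell helper. This is a minimal sketch, assuming gdb is installed on the affected machine and that the server process is named Xorg; the function name xorg_bt_cmd is hypothetical. It only prints the gdb command line so that it can be reviewed before being run as root.

```shell
# Hypothetical helper: print a gdb command that attaches to the given PID,
# lets the process continue until it stops (for example on a fatal signal),
# and then prints a backtrace.
xorg_bt_cmd() {
    pid="$1"
    printf 'gdb -p %s -ex continue -ex bt\n' "$pid"
}

# Typical use over ssh from a second machine (run as root):
#   pid=$(pidof Xorg)              # step 2: find the PID of the X server
#   eval "$(xorg_bt_cmd "$pid")"   # steps 3-4: attach, wait for the crash, print bt
```

If pidof reports more than one PID, pick the one that belongs to the hanging display. This only helps when the machine stays reachable over ssh at the time of the crash.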

Alexandre Otto Strube (surak) wrote :

Waiting for what Erik asked for.

Changed in xserver-xorg-driver-ati:
status: Unconfirmed → Needs Info
Changed in xserver-xorg-driver-ati:
assignee: nobody → ubuntu-x-swat
Andy Hird (andyhird) wrote :

This bug looks similar to 38181, which I've just added a comment to. I'll try to get the backtrace in the next few days, when I have access to more than one machine. Here's the comment I added there:

I'm experiencing this problem after upgrading from breezy to dapper (release).

Using the same xorg.conf that I'd been running with for the whole of breezy, my laptop would lock up hard (I have to hold down the reset button to restart) after anything between 2 and 5 minutes. It usually occurred when running something like evolution or firefox, and I can reproduce it every time by dragging an xterm over the top of evolution. Some relevant versions:
ii xserver-xorg-c 1.0.2-0ubuntu1 X.Org X server -- core server
ii xserver-xorg-d 6.5.7.3-0ubunt X.Org X server -- ATI display driver
kernel was linux-image-2.6.15-23-686

I tried commenting out the dri, GLcore and glx load modules in my xorg.conf one by one but each time I'd still get a lockup. Finally I tried rebooting with a 2.6.12-10-686 image and that seems to have solved the problem. I've been running for more than 12 hours with no repeat of the lockups. I haven't tried adding back the load modules in my xorg.conf but I may give that a go.

I'm using a pretty old Dell Latitude C510/C610 with a Radeon Mobility (16mb). I have no onboard wireless, bluetooth or anything fancy which may have caused problems with newer kernels.

Andy Hird (andyhird) wrote :

I tried to catch the crash with gdb via ssh. Unfortunately, when the crash happens the machine really does lock hard: it's not pingable, and the ssh session (with gdb in it) is well and truly dead.

Any other suggestions for getting a backtrace from kernel lockups?

Benjamin Herrenschmidt (benh-kernel) wrote :

I'm pretty much out of clues about those crashes... they are not kernel crashes, it's the gfx card that locks up afaik. There is one remaining case where I suspect X goes bunk without the gfx card actually locking up which _might_ be related to a signal problem. You can test running X with the -dumbSched option and let us know if that makes any difference. Also, don't run any DRI application, just the server, for a while, to avoid mixing problems.

Andy Hird (andyhird) wrote :

I'm not sure how I can tell what apps are using DRI? I've been testing using just a standard gnome installation and running evolution and a few xterms (none of which I'd have thought would use DRI).

Would disabling DRI in the xorg.conf have the same effect (just comment out the load dri line)?

I'll do that and try running with the extra option to see how it goes. Like I mentioned before everything works fine when running with a 2.6.12-10-686 kernel I have installed (I guess from breezy), including DRI and GL turned on.

Benjamin Herrenschmidt (benh-kernel) wrote :

Apps using DRI -> mostly apps using 3D. Just avoid running 3D screensavers :) If you disable DRI in xorg.conf, that will completely disable CCE acceleration, which is likely to fix your problem, as the DRM will be completely disabled.

So leave xorg.conf alone, try -dumbSched and tell us. I'm surprised when you say that 2.6.12 works with DRI and GL, though. Did it already contain the r300 DRM? Maybe it does; I thought it was merged more recently. I suspect recent X will disable DRI with such an old DRM. Can you verify that in your logs?
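
One way to check the log for this is sketched below. It is a minimal sketch, assuming the driver writes a line containing "Direct rendering enabled" or "Direct rendering disabled" to the Xorg log; the exact wording may differ between driver versions, and the function name dri_status and the default log path are assumptions.

```shell
# Hypothetical helper: report whether an Xorg log says direct rendering is on.
# Prints "enabled", "disabled", or "unknown" if neither phrase is found.
dri_status() {
    log="${1:-/var/log/Xorg.0.log}"
    if grep -qi 'direct rendering enabled' "$log"; then
        echo enabled
    elif grep -qi 'direct rendering disabled' "$log"; then
        echo disabled
    else
        echo unknown
    fi
}

# Example: dri_status /var/log/Xorg.0.log
```

Running it against the logs from both kernels makes the comparison quick; "unknown" means the log needs to be read by hand.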

Andy Hird (andyhird) wrote :

As I mentioned above, disabling DRI (or the GLcore and glx modules) by commenting out the lines from my xorg.conf didn't previously fix the problem when running with the 2.6.15 kernel.

Anyway I added the -dumbSched to the X servers startup line (in gdm.conf) and verified it was running with it (just via ps):
/usr/bin/X :0 -br -audit 0 -dumbSched -auth /var/lib/gdm/:0.Xauth -nolisten tcp vt7
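
The ps check can also be scripted. This is a minimal sketch; the function name has_dumbsched is hypothetical, and the ps invocation in the comment assumes the procps ps and a server process named X.

```shell
# Hypothetical helper: succeed if a command line contains the -dumbSched
# option as a separate word.
has_dumbsched() {
    case " $1 " in
        *" -dumbSched "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example, assuming the server process is named X:
#   has_dumbsched "$(ps -C X -o args=)" && echo "running with -dumbSched"
```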

Initially (I didn't read your last message until just now) I did comment out the load dri line from my xorg.conf and ran with -dumbSched with 2.6.15. The machine stayed up for much longer, close to 2 hours (woohoo!). That included running evolution, xterms, firefox, and even xscreensaver kicked in a few times. Then it locked hard as previously described.

After just reading your message I rebooted, reenabled DRI, restarted X still running with -dumbSched and on 2.6.15. I ran evolution and an xterm. Moved the xterm around the screen and got a hard lockup. Lasted about 2 minutes. Rebooting I get the same results.

Onto the DRI / r300 DRM. I've no idea what r300 DRM is, but I'm going to attach my Xorg.log as andyh.xorg.log to this bug. Looking in the log I do see the lines:

(II) Loading sub module "drm"
(II) LoadModule: "drm"
(II) Loading /usr/lib/xorg/modules/linux/libdrm.so
(II) Module drm: vendor="X.Org Foundation"
        compiled for 7.0.0, module version = 1.0.0
        ABI class: X.Org Server Extension, version 0.2

and later on:
drmOpenDevice: node name is /dev/dri/card0
drmOpenDevice: open result is -1, (No such device or address)
drmOpenDevice: open result is -1, (No such device or address)
drmOpenDevice: Open failed
drmOpenByBusid: Searching for BusID pci:0000:01:00.0
drmOpenDevice: node name is /dev/dri/card0
drmOpenDevice: open result is 7, (OK)
drmOpenByBusid: drmOpenMinor returns 7
drmOpenByBusid: drmGetBusid reports pci:0000:01:00.0

Seems ok?
I did notice that at the top of the log file it reports:

X Window System Version 7.0.0
Release Date: 21 December 2005
X Protocol Version 11, Revision 0, Release 7.0
Build Operating System:Linux 2.6.12 i686
Current Operating System: Linux zebedee 2.6.12-10-686 #1 Sat Mar 11 16:22:51 UTC 2006 i686

The build OS is 2.6.12 (which happens to be the same as the kernel I'm running), which seems a little odd; I'd expect it to match the newly released 2.6.15 kernel. A quick further (paranoid) investigation reveals:

root@zebedee:~$ dpkg -S /usr/bin/X
x11-common: /usr/bin/X
root@zebedee:~$ dpkg -l x11-common
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Installed/Config-files/Unpacked/Failed-config/Half-installed
|/ Err?=(none)/Hold/Reinst-required/X=both-problems (Status,Err: uppercase=bad)
||/ Name Version Description
+++-==============-==============-============================================
ii x11-common 7.0.0-0ubuntu4 X Window System (X.Org) infrastructure

(It's actually package version 7.0.0-0ubuntu45, which, looking at packages.ubuntu.com, appears to be the most up-to-date version.)

Andy Hird (andyhird) wrote : Andyh xorg.0.log

xorg.0.log when running under 2.6.12, dapper and noncrashing X

Benjamin Herrenschmidt (benh-kernel) wrote :

Seems similar to other reports of lockups with M6. Several comments:
 - Does it happen if you use 16 bpp instead of 24?
 - Your video memory isn't big enough for DRI to kick in, at least with the old kernel (you can see that in the log), so DRI is in fact disabled. It would be worth checking what happens with the new kernel. There should be no difference; if there is, we do have a bug somewhere. Basically, with your setup running at 24 bpp, DRI should be disabled in both cases and the chip configured pretty much the same way.
 - So can you also attach a log with the newer kernel so I can compare some values, please?
 - This should probably be filed under a different bug report related to M6 and Radeon 7000 lockups.
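
For the 16 bpp test, the colour depth is normally set by the DefaultDepth entry in the Screen section of xorg.conf. The sketch below prints a copy of the configuration with that value changed. It is only a sketch: the function name set_default_depth and the default path are assumptions, and it assumes the keyword is spelled exactly DefaultDepth and is already present in the file.

```shell
# Hypothetical helper: print xorg.conf with the DefaultDepth value replaced.
# It writes to stdout and leaves the original file untouched.
set_default_depth() {
    depth="$1"
    conf="${2:-/etc/X11/xorg.conf}"
    sed "s/^\([[:space:]]*DefaultDepth[[:space:]][[:space:]]*\)[0-9][0-9]*/\1${depth}/" "$conf"
}

# Example: review the result before replacing the real file.
#   set_default_depth 16 /etc/X11/xorg.conf > /tmp/xorg.conf.16bpp
```

Restart X after installing the edited file, and keep a backup of the original so that 24 bpp can be restored for comparison.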

Andy Hird (andyhird) wrote : Andyh xorg.0.log running under 2.6.15

I've added my xorg.log from running under 2.6.15. If you want me to move this to another bug, point me at one and I'll add them there.

I'm heading out, but I'll try running at the different colour depth when I get back.

Andy Hird (andyhird) wrote : Andyh xorg.0.log at 16bpps and 2.6.15

I've been running under 16bpp and the new 2.6.15 kernel overnight (about 9 hours) and had no lockups. I've tried pretty hard to lock up the machine by running various apps and failed, so I'd say that it's working. Nice one.

I've attached my Xorg.0.log just in case it holds anything valuable. I can move this to a new bug or attach it to another, older one if you want.

Does that actually give you any clues as to why things are crashing under 24bpp?

Rocco Stanzione (trappist) wrote :

No activity in several months. Is this still a problem?

Jeff Bailey (jbailey) wrote :

On current edgy, X hangs solid as soon as it starts up so I'm having a bit of trouble telling. =/

My machine had to go into the shop to replace the power supply, so I haven't been hacking on this recently. I will start again soon.

Jeff Bailey (jbailey) wrote :

I've confirmed that this bug still occurs even with latest upstream git kernel now.

Benjamin Herrenschmidt (benh-kernel) wrote :

You get a solid hang at startup? With up-to-date kernel DRM, X DDX, etc. (including that clock fix we put in the X DDX a while ago)?

Have you tried Option "DynamicClocks" "off" (or "on"; that is, try both)?

Jeff Bailey (jbailey) wrote :

As with bug #54352, I no longer seem to have this problem at all with 6.6.2-0ubuntu3. Marking as fixed, thanks!

Changed in xserver-xorg-video-ati:
status: Needs Info → Fix Released