[RV516] CPU spin OR ioctl() freeze in xorg when visiting http://mundoplus.tv/ using -ati driver

Bug #371279 reported by jlpino
This bug affects 5 people
Affects                          Status        Importance  Assigned to  Milestone
xserver-xorg-driver-ati          Fix Released  Critical
xserver-xorg-video-ati (Ubuntu)  Fix Released  Undecided   Unassigned

Bug Description

-ati driver freezes (CPU spinning in X or an ioctl GPU lockup depending on whether compiz is OFF/ON).

REPRO STEPS:
1. (using firefox) visit website http://mundoplus.tv/

Note: This bug goes away completely when you set "DRI" "off" or if you use "AccelMethod" "XAA" or if you set "EXANoUploadToScreen" "true". The bug also repros on Fedora10.

When compiz is ON, this results in a GPU freeze, xorg bt shows it's permanently blocking on ioctl() and it takes 0% CPU.

When compiz is OFF, this results in a 100% CPU spin in xorg (note: on dual-core machines this shows up as 50% CPU, and on quad-core machines as 25% CPU). In this case repeated xorg bt sampling shows:
#1 0xb7d35ea9 in ioctl () from /lib/tls/i686/cmov/libc.so.6
#2 0xb7b30a6d in drmDMA () from /usr/lib/libdrm.so.2
#3 0xb7aa1948 in .....SOME_FUNCTION_HERE...
(Basically, if you put breakpoints on these three topmost stack frames, 1 and 2 are hit but ....SOME_FUNCTION.... is not, even though ....SOME_FUNCTION.... seems to differ between systems.)

Confirmed affected hardware:
01:00.0 VGA compat: ATI Technologies Inc RV516 [Mobility Radeon X1350] (jlpino)
01:00.0 VGA compat [0300]: ATI Technologies Inc RV350 AP [Radeon 9600] [1002:4150] (mnemo, Tomasz Czapiewski)
01:00.0 VGA compat [0300]: ATI Technologies Inc Radeon R350 [Radeon 9800 Pro] [1002:4e48] (Luc Vigato)
01:00.0 VGA compat [0300]: ATI Technologies Inc Radeon Mobility X1400 [1002:7145] (Thomas Lauckner, juhuu)
01:00.0 VGA compat [0300]: ATI Technologies Inc M56P [Radeon Mobility X1600] [1002:71c5] (aya)
01:00.0 VGA compat [0300]: ATI Technologies Inc RV350 [Mobility Radeon 9600 M10] [1002:4e50] (Gobnuts on kubuntu)
01:05.0 VGA compat [0300]: ATI Technologies Inc Radeon XPRESS 200M 5955 (PCIE) [1002:5955] (Wolfgang Jeltsch)
01:00.0 VGA compat [0300]: ATI Technologies Inc RV350 [Mobility Radeon 9600 M10] [1002:4e50] (doomsword)
also confirmed on radeon r500 chipset

[ORIGINAL BUG REPORT]

Binary package hint: xserver-xorg-video-ati

In Ubuntu 8.10 I had no problems with 3D acceleration, but since I upgraded to Jaunty my PC often locks up and does not respond to key presses (for example, I can't access ttyN with ctrl+alt+N). I can still use the mouse, although the pointer moves very slowly.

This problem happens when I play some games (SuperTuxKart and Planet Penguin Racer) but not all of them (OpenArena works fine), and when I visit some websites such as mundoplus.tv (I think it happens on this site because it uses jQuery; it also happens with Flash disabled).

My graphic card is X1350.
pino@pino-hp6820s:~$ lspci | grep Radeon
01:00.0 VGA compatible controller: ATI Technologies Inc RV516 [Mobility Radeon X1350]

I use radeon's module:
pino@pino-hp6820s:~$ lsmod | grep radeon
radeon 342816 3
drm 96296 4 radeon

Sorry for my bad English.

Tags: freeze
Revision history for this message
Bryce Harrington (bryce) wrote :

Hi jlpino,

Please attach the output of `lspci -vvnn`, and attach your /var/log/Xorg.0.log (and maybe Xorg.0.log.old) file from after reproducing this issue. If you've made any customizations to your /etc/X11/xorg.conf please attach that as well.

[This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: needs-xorglog
tags: added: needs-lspci-vvnn
Changed in xserver-xorg-video-ati (Ubuntu):
status: New → Incomplete
Revision history for this message
In , Martin Olsson (mnemo) wrote :

repro steps:
0. (using ubuntu 9.04 stable version)
1. open http://mundoplus.tv/ in firefox
2. xorg freezes and gdb shows it's stuck in drmIoctl()

hardware known to be affected:
01:00.0 VGA compatible controller: ATI Technologies Inc RV516 [Mobility Radeon X1350]
01:00.0 VGA compatible controller [0300]: ATI Technologies Inc RV350 AP [Radeon 9600] [1002:4150]

xorg.log recorded using ssh while machine was frozen/hung:
http://launchpadlibrarian.net/26396832/XorgLog.txt
(nothing is printed to dmesg when the freeze happens)

downstream bug report is here:
https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-ati/+bug/371279

---

Please let me know if you want additional information or if you want me to try any patches. If this bug happens to be fixed upstream, some cherry-pick guesses would be super nice. Thanks.

Revision history for this message
In , Martin Olsson (mnemo) wrote :

Created an attachment (id=25555)
gdb trace show xorg CPU spin caused by visiting website mundoplus.tv

If I turn off compiz and open that URL in firefox xorg still locks up but instead of being stuck permanently blocking in drmIoctl() it goes into a CPU spin.

At this time the backtrace is essentially:

#1 0xb7d35ea9 in ioctl () from /lib/tls/i686/cmov/libc.so.6
#2 0xb7b30a6d in drmDMA () from /usr/lib/libdrm.so.2
#3 0xb7aa1948 in RADEONCPGetBuffer (pScrn=0x9e575c8) at ../../src/radeon_accel.c:651
#4 0xb7af60fb in RADEONPrepareSolidCP (pPix=0xa508620, alu=3, pm=4294967295, fg=0) at ../../src/radeon_exa_funcs.c:92
#5 0xb78ce96a in exaFillRegionSolid (pDrawable=0xa508620, pRegion=0xa4fef40, pixel=0, planemask=4294967295, alu=<value optimized out>)
    at ../../exa/exa_accel.c:939
#6 0xb78d0312 in exaPolyFillRect (pDrawable=0xa508620, pGC=0xa0ba0f8, nrect=1, prect=0xa4865cc) at ../../exa/exa_accel.c:751
#7 0x08180b94 in damagePolyFillRect (pDrawable=0xa508620, pGC=0xa0ba0f8, nRects=1, pRects=0xa4865cc) at ../../../miext/damage/damage.c:1404
#8 0x0808a4f0 in ProcPolyFillRectangle (client=0xa4e4008) at ../../dix/dispatch.c:1769
#9 0x0808d57f in Dispatch () at ../../dix/dispatch.c:437

If I put breakpoints on the three top most stack frames I see ioctl() and drmDMA() being hit constantly but the breakpoint on RADEONCPGetBuffer() is never hit so I don't think that function ever exits.

Breakpoint 1, 0xb7d35e90 in ioctl () from /lib/tls/i686/cmov/libc.so.6
Continuing.
Breakpoint 2, 0xb7b309f5 in drmDMA () from /usr/lib/libdrm.so.2
Continuing.
Breakpoint 1, 0xb7d35e90 in ioctl () from /lib/tls/i686/cmov/libc.so.6
Continuing.
Breakpoint 2, 0xb7b309f5 in drmDMA () from /usr/lib/libdrm.so.2
Continuing.

etc etc

I'm attaching a full gdb showing this trace.

Revision history for this message
Martin Olsson (mnemo) wrote : Re: Sistem lockup when I play games or visit some websites

I can confirm xorg lockup when visiting http://mundoplus.tv/ using jaunty with -ati and:
01:00.0 VGA compatible controller [0300]: ATI Technologies Inc RV350 AP [Radeon 9600] [1002:4150]

What I see in gdb is a GPU lockup:
(gdb) bt
#0 0xb7ff1430 in __kernel_vsyscall ()
#1 0xb7c5dea9 in ioctl () from /lib/tls/i686/cmov/libc.so.6
#2 0xb7a56add in drmIoctl () from /usr/lib/libdrm.so.2
#3 0xb7a56e2b in drmCommandWrite () from /usr/lib/libdrm.so.2
#4 0xad53274e in ?? () from /usr/lib/dri/r300_dri.so
#5 0xad566311 in _mesa_Finish () from /usr/lib/dri/r300_dri.so
#6 0xb7aba78b in ?? () from /usr/lib/xorg/modules/extensions//libglx.so
#7 0xb7ab6d6a in ?? () from /usr/lib/xorg/modules/extensions//libglx.so
#8 0x0808d57f in Dispatch ()
#9 0x080722ed in main ()

Note to other triagers/devs: Alex Deucher recently suggested that we will have GPU lockup debugging facilities (similar to intel batch buffer dumps introduced in the 2.6.30-rc4 kernel) once the radeon-rewrite branch is merged into stable. See this e-mail: http://lists.freedesktop.org/archives/xorg/2009-May/045486.html
Jerome Glisse blogged about this work here: http://jglisse.livejournal.com/1822.html

Good news is that jlpino found a solid repro for this particular lockup so hopefully it will be actionable even without such a dump.

I will shortly attach apport-collect data recorded over ssh while xorg was in the hung state.

Revision history for this message
Martin Olsson (mnemo) wrote : apport-collect data

Architecture: i386
DistroRelease: Ubuntu 9.10
Package: xserver-xorg-video-ati 1:6.12.2-0ubuntu3
PackageArchitecture: i386
ProcEnviron:
 SHELL=/bin/bash
 LANG=en_DK.UTF-8
ProcVersion: Linux version 2.6.28-11-generic (buildd@palmer) (gcc version 4.3.3 (Ubuntu 4.3.3-5ubuntu4) ) #42-Ubuntu SMP Fri Apr 17 01:57:59 UTC 2009
Uname: Linux 2.6.28-11-generic i686
UserGroups: adm admin audio cdrom dialout dip floppy fuse lpadmin plugdev video
Xrandr:

glxinfo:

setxkbmap:

xdpyinfo:

Changed in xserver-xorg-video-ati (Ubuntu):
status: Incomplete → New
Revision history for this message
Martin Olsson (mnemo) wrote : Re: Sistem lockup when I play games or visit some websites

Basically nothing was printed in xorg.log or dmesg leading up to the freeze. I know for sure that the line:
Changing OV0_BASE_ADDR from 0xe0000000 to 0xe5c00000
was appended to xorg.log before the lockup happened.

Revision history for this message
Martin Olsson (mnemo) wrote :

I've upstreamed this bug report here:
https://bugs.freedesktop.org/show_bug.cgi?id=21598

Changed in xserver-xorg-driver-ati:
status: Unknown → Confirmed
Bryce Harrington (bryce)
Changed in xserver-xorg-video-ati (Ubuntu):
status: New → Confirmed
Revision history for this message
Srik (maxpower-email) wrote :

I have the same problem on jaunty (ubuntu 9.04) with an ATI RADEON X300 SE (RV370 SE).

Revision history for this message
jlpino (jlpino) wrote :

Thanks to all for your help. I attach the requested files.

I removed the tags "needs-lspci-vvnn" and "needs-xorglog" because I think that there is enough info with the files attached by Martin Olsson and me.

If I can help with this bug, contact me.

tags: removed: needs-lspci-vvnn needs-xorglog
Revision history for this message
In , Martin Olsson (mnemo) wrote :

We're getting _a lot_ of duplicate bug reports (and confirms) for this bug in the downstream bug tracker. Around 10 bug reports each containing 1-4 users confirming the problem.

Hardware affected (all confirmed using live cd, i.e. no configuration problems on the machines):

01:00.0 VGA compat: ATI Technologies Inc RV516 [Mobility Radeon X1350]
01:00.0 VGA compat [0300]: ATI Technologies Inc RV350 AP [Radeon 9600] [1002:4150]
01:00.0 VGA compat [0300]: ATI Technologies Inc Radeon R350 [Radeon 9800 Pro] [1002:4e48]
01:00.0 VGA compat [0300]: ATI Technologies Inc Radeon Mobility X1400 [1002:7145]
01:00.0 VGA compat [0300]: ATI Technologies Inc M56P [Radeon Mobility X1600] [1002:71c5]

Revision history for this message
In , Michel Dänzer (michel-daenzer) wrote :

Does

    Option "RenderAccel" "off"

or

    Option "DRI" "off"

or any other usual suspect work around the problem?

Revision history for this message
In , Martin Olsson (mnemo) wrote :

With DRI off the bug does not repro any more. If I remove DRI "off" and use RenderAccel "off" instead then bug comes back.

Revision history for this message
In , Martin Olsson (mnemo) wrote :

Option "AccelMethod" "XAA" also makes the bug go away (I guess it's just hitting a different execution path).

As I mentioned earlier, if you have compiz ON this turns into an ioctl() that blocks indefinitely with 0% CPU activity in xorg. If you have compiz OFF it instead turns into a CPU spin inside Xorg hitting the breakpoints in the order I explained in comment #1.

This latter fact means that on modern dual core machines people basically see "50% CPU in use by Xorg" while their system is sluggish but still usable, whereas quad core users see "xorg hogs 25% CPU constantly". By looking more carefully into the downstream bug reports I think I have another 10 duplicates of this bug; it's just that quad core users tend to report "a performance problem" rather than a freeze/lockup, so I didn't realize at first that these bugs were probably duplicates.

There is still one thing that makes me not want to dup all those bugs against my bug though, and that is the fact that some of these "25% CPU in xorg" bugs show a spinning stacktrace rooted in exaGlyphs() whereas, as I explained above, the CPU spin that I see for _this_ bug is rooted in RADEONPrepareSolidCP(). However, both of these stacks have drmIoctl() and ioctl() as their two topmost stack frames.

FWIW, an example of this (a CPU spinning xorg stack rooted in exaGlyphs function) in the downstream tracker is bug 347078:
https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-ati/+bug/347078

So, I even believe that upstream FDO bug 21683 is a potential duplicate of this bug:
https://bugs.freedesktop.org/show_bug.cgi?id=21683

Martin Olsson (mnemo)
description: updated
summary: - Sistem lockup when I play games or visit some websites
+ CPU spin lockup in xorg when visiting http://mundoplus.tv/ using -ati
+ driver
Martin Olsson (mnemo)
description: updated
Martin Olsson (mnemo)
description: updated
Martin Olsson (mnemo)
description: updated
Revision history for this message
In , Martin Olsson (mnemo) wrote :

all defaults except EXANoComposite==true.... still freezes
all defaults except EXANoDownloadFromScreen==true.... still freezes
all defaults except EXANoUploadToScreen==true.... DOES NOT FREEZE

Let me know if there is anything else that can help narrow it down.

Martin Olsson (mnemo)
description: updated
Revision history for this message
In , Martin Olsson (mnemo) wrote :

I've downloaded all the files on that website using "wget -m" and then I opened them one by one in Firefox. The offending item is a bitmap with 8K pixels height:
http://mundoplus.tv/tpl/v3/panBR.png

I put a copy here in case they change their website:
http://pages.minimum.se/crashers/ddx_stressers/panBR.png

An interesting detail is that the following 10K pixels wide bitmap opens just fine on the same machine:
http://pages.minimum.se/crashers/ddx_stressers/Singapore_port_panorama.jpg
(this latter 10K wide bitmap actually crashes the intel DDX driver when running in UXA mode but that's another story)

description: updated
Revision history for this message
In , Martin Olsson (mnemo) wrote :

otaylor said in #radeon that he was able to repro this bug on Fedora 10 + r500 card.

Martin Olsson (mnemo)
description: updated
Martin Olsson (mnemo)
description: updated
Martin Olsson (mnemo)
description: updated
Martin Olsson (mnemo)
summary: - CPU spin lockup in xorg when visiting http://mundoplus.tv/ using -ati
- driver
+ CPU spin OR ioctl() freeze in xorg when visiting http://mundoplus.tv/
+ using -ati driver
description: updated
description: updated
Revision history for this message
jlpino (jlpino) wrote : Re: CPU spin OR ioctl() freeze in xorg when visiting http://mundoplus.tv/ using -ati driver

Using AdBlockPlus, when I block |http://mundoplus.tv/tpl* and |http://mundoplus.tv/imagenes* the site mundoplus.tv doesn't freeze my computer.

Revision history for this message
Martin Olsson (mnemo) wrote :

I've mirrored their entire site and tried the files one by one, the offending item is:
http://mundoplus.tv/tpl/v3/panBR.png
Most likely this is because the image has a pretty large height (8K pixels).

I've also kept a copy of that file (in case they change their site we can still run the repro):
http://pages.minimum.se/crashers/ddx_stressers/panBR.png

Further, I've also found that adding:
Option "EXANoUploadToScreen" "true"
in the device section of xorg.conf makes the bug go away. This means we can rule out the other EXA hooks (fewer lines of code in the driver to search in).

Finally, otaylor over at Red Hat confirmed this bug on an r500 chipset running Fedora 10, so the range of affected hardware is most likely all radeon r100 up to r500. F10 still doesn't have the radeon-rewrite branch bits, so they couldn't record the GPU command buffer contents though.

description: updated
Revision history for this message
In , Michel Dänzer (michel-daenzer) wrote :

Created an attachment (id=25789)
UploadToScreen coordinate paranoia

Does this patch help?

(In reply to comment #5)
> So, I even believe that upstream FDO bug 21683 is a potential duplicate of this
> bug:
> https://bugs.freedesktop.org/show_bug.cgi?id=21683

Hold your horses... that only talks about slowdowns, not a lockup.

Backtraces are generally not useful for diagnosing GPU lockups because they will just more or less randomly show one of the places where the drivers wait for the GPU to catch up (which never happens because it's locked up). Instead one has to focus on what triggers the lockup.

Revision history for this message
In , Øyvind Stegard (oyvindstegard) wrote :

(In reply to comment #9)
> Created an attachment (id=25789) [details]
> UploadToScreen coordinate paranoia
>
> Does this patch help?
<snip>

Yes, it does. With these extra coordinate checks the crash no longer occurs here. Tested both the website and the individual PNG image in Firefox. I have an ATI X1400 Radeon Mobility [1002:7145] w/128 MB RAM. Patch applied to master radeon branch as of today.

For the hell of it, I also tested with a 10x6000 PNG image in Firefox (which should be *within* the limits introduced by this patch, no?), and that works as well, no crash .. Also, no crash for a 10x7000 PNG image.

This problem also affects ATI Radeon Mobility X1600/M56P [1002:71c5] with 256MB RAM.

Revision history for this message
In , Michel Dänzer (michel-daenzer) wrote :

(In reply to comment #10)
> With these extra coordinate checks the crash no longer occours here.

So far, so good. Can you set a gdb breakpoint on the RADEONUploadToScreenCP() line that returns FALSE after these new checks, and attach a backtrace from when it triggers?

Revision history for this message
In , Øyvind Stegard (oyvindstegard) wrote :

(In reply to comment #10)
> (In reply to comment #9)
> > Created an attachment (id=25789) [details] [details]
> > UploadToScreen coordinate paranoia
> >
> > Does this patch help?
> <snip>
>
> Yes, it does. With these extra coordinate checks the crash no longer occours
> here. Tested both the website and the individual PNG image in Firefox. I have
> an ATI X1400 Radeon Mobility [1002:7145] w/128 MB RAM. Patch applied to master
> radeon branch as of today.
>
> For the hell of it, I also tested with a 10x6000 PNG image in Firefox (which
> should be *within* the limits introduced by this patch, no?), and that works as
> well, no crash .. Also, no crash for a 10x7000 PNG image.
>
> This problem also affects ATI Radeon Mobility X1600/M56P [1002:71c5] with 256MB
> RAM.
>

Also tested without the "..(x + w) > 8191" criterion (that is, only the y+height check), and still no crash. Tested the website, plus a 12000x10 PNG and a 10x12000 PNG (both PNGs have transparency). And also tested with a 9000x9000 PNG (also w/transparency). I'll see if I can get you the debugging info you have requested ..

Revision history for this message
In , Øyvind Stegard (oyvindstegard) wrote :

(In reply to comment #11)
> (In reply to comment #10)
> > With these extra coordinate checks the crash no longer occours here.
>
> So far, so good. Can you set a gdb breakpoint on the RADEONUploadToScreenCP()
> line that returns FALSE after these new checks, and attach a backtrace from
> when it triggers?
>

I succeeded in attaching gdb to the Xorg process and setting the required breakpoint. I did this from a VT. Then I told gdb to continue execution (with c command), but when switching back to Xorg all I get is a black screen (had to reboot).

Revision history for this message
In , Øyvind Stegard (oyvindstegard) wrote :

(In reply to comment #13)
> (In reply to comment #11)
> > (In reply to comment #10)
> > > With these extra coordinate checks the crash no longer occours here.
> >
> > So far, so good. Can you set a gdb breakpoint on the RADEONUploadToScreenCP()
> > line that returns FALSE after these new checks, and attach a backtrace from
> > when it triggers?
> >
>
> I succeeded in attaching gdb to the Xorg process and setting the required
> breakpoint. I did this from a VT. Then I told gdb to continue execution (with c
> command), but when switching back to Xorg all I get is a black screen (had to
> reboot).
>

I'll have a look at this:
http://www.x.org/wiki/Development/Documentation/ServerDebugging
...

Revision history for this message
In , Øyvind Stegard (oyvindstegard) wrote :

(In reply to comment #14)
> (In reply to comment #13)
> > (In reply to comment #11)
> > > (In reply to comment #10)
> > > > With these extra coordinate checks the crash no longer occours here.
> > >
> > > So far, so good. Can you set a gdb breakpoint on the RADEONUploadToScreenCP()
> > > line that returns FALSE after these new checks, and attach a backtrace from
> > > when it triggers?
> > >
> >
> > I succeeded in attaching gdb to the Xorg process and setting the required
> > breakpoint. I did this from a VT. Then I told gdb to continue execution (with c
> > command), but when switching back to Xorg all I get is a black screen (had to
> > reboot).
> >
>
> I'll have a look at this:
> http://www.x.org/wiki/Development/Documentation/ServerDebugging
> ...
>

Perhaps this will do it for me:
handle SIGUSR1 nostop

Apparently gdb halts things on VT switch. I don't have the possibility of logging in via ssh where I'm currently at.

Revision history for this message
In , Øyvind Stegard (oyvindstegard) wrote :

(In reply to comment #11)
> (In reply to comment #10)
> > With these extra coordinate checks the crash no longer occours here.
>
> So far, so good. Can you set a gdb breakpoint on the RADEONUploadToScreenCP()
> line that returns FALSE after these new checks, and attach a backtrace from
> when it triggers?
>

This is as far as I got today:

(gdb) break radeon_exa_funcs.c:276
Breakpoint 1 at 0xb78d0b0e: file ../../src/radeon_exa_funcs.c, line 276.
(gdb) continue
Continuing.
[Switching to Thread 0xb79a36d0 (LWP 2978)]

Breakpoint 1, RADEONUploadToScreenCP (pDst=0xa0580008, x=0, y=8190, w=16, h=2,
    src=0xa3bef70 "<BINARY GARBAGE>",
    src_pitch=64) at ../../src/radeon_exa_funcs.c:276
276 return FALSE;
(gdb) backtrace full
#0 RADEONUploadToScreenCP (pDst=0xa0580008, x=0, y=8190, w=16, h=2,
    src=0xa3bef70 "<BINARY GARBAGE>",
    src_pitch=64) at ../../src/radeon_exa_funcs.c:276
 pScrn = (ScrnInfoPtr) 0x9776e18
 info = (RADEONInfoPtr) 0x9777320
 bpp = 32
 hpass = 3213989268
 buf_pitch = 3213989272
 dst_pitch_off = 2690121736
 __FUNCTION__ = "RADEONUploadToScreenCP"
#1 0xb765b255 in ?? () from /usr/lib/xorg/modules//libexa.so
No symbol table info available.
#2 0x0818258d in ?? ()
No symbol table info available.
#3 0x0808a20e in ProcPutImage ()
No symbol table info available.
#4 0x0808d57f in Dispatch ()
No symbol table info available.
#5 0x080722ed in main ()
No symbol table info available.

The breakpoint was triggered by opening this image:
http://mundoplus.tv/tpl/v3/panBR.png

Revision history for this message
In , Øyvind Stegard (oyvindstegard) wrote :

(In reply to comment #16)
> (In reply to comment #11)
> > (In reply to comment #10)
> > > With these extra coordinate checks the crash no longer occours here.
> >
> > So far, so good. Can you set a gdb breakpoint on the RADEONUploadToScreenCP()
> > line that returns FALSE after these new checks, and attach a backtrace from
> > when it triggers?
> >
>
> This is as far as I got today:
<snip>
> The breakpoint was triggered by opening this image:
> http://mundoplus.tv/tpl/v3/panBR.png
>

Oh, and I compiled the radeon-driver with no gcc optimization (-O0).

Revision history for this message
In , Øyvind Stegard (oyvindstegard) wrote :

I did some more testing with sizes because of the number 8192 (=2^13), which is the height of the image that triggers this.

Here are the results:

16x8191 PNG image: No crash. [http://folk.uio.no/oyvinst/fdsbug21598/16x8191.png]
16x8192 PNG image: FREEZE [http://folk.uio.no/oyvinst/fdsbug21598/16x8192.png]
16x8193 PNG image: No crash. [http://folk.uio.no/oyvinst/fdsbug21598/16x8193.png]

It looks like this bug has something to do with height being exactly 8192.

I created various PNG test-images of different dimensions, they can be found here:
http://folk.uio.no/oyvinst/fdsbug21598/

Of all these images, *only* the 16x8192 image triggers the crash/freeze. All images created in Gimp with transparency (don't know if that's really necessary..).

Hope this might help somewhat in tracking this down.

Revision history for this message
In , Martin Olsson (mnemo) wrote :

I modified my -ati driver according to the patch in comment #9 and then I put a
breakpoint on the "return FALSE" just below it. This coordinate hack patch
indeed makes the bug go away! And when I surf to mundoplus.tv I do hit the
proposed breakpoint and here is the "bt full" from that breakpoint. I compiled
with DEB_BUILD_OPTIONS="noopt nostrip" which I think means essentially "-O0
-g3" or something like that.

Program received signal SIGINT, Interrupt.
0xb7f6a430 in __kernel_vsyscall ()
(gdb) break radeon_exa_funcs.c:277
Breakpoint 2 at 0xb796c956: file ../../src/radeon_exa_funcs.c, line 277.
(gdb) info breakpoints
Num Type Disp Enb Address What
2 breakpoint keep y 0xb796c956 in RADEONUploadToScreenCP at
../../src/radeon_exa_funcs.c:277
(gdb) c
Continuing.

Breakpoint 2, RADEONUploadToScreenCP (pDst=0xa2c94008, x=0, y=8190, w=16, h=2,
    src=0x8a97228 "<BINARY GARBAGE>",
    src_pitch=64) at ../../src/radeon_exa_funcs.c:277
277 return FALSE;
(gdb) bt full
#0 RADEONUploadToScreenCP (pDst=0xa2c94008, x=0, y=8190, w=16, h=2,
    src=0x8a97228 "<BINARY GARBAGE>",
    src_pitch=64) at ../../src/radeon_exa_funcs.c:277
        pScrn = (ScrnInfoPtr) 0x85fd5e8
        info = (RADEONInfoPtr) 0x85fbea8
        bpp = 32
        hpass = 2731098120
        buf_pitch = 3077628745
        dst_pitch_off = 3213388776
        __func__ = "RADEONUploadToScreenCP"
#1 0xb770f344 in exaPutImage (pDrawable=0xa2c94008, pGC=0x8a71738, depth=24,
    x=0, y=8190, w=16, h=2, leftPad=0, format=2,
    bits=0x8a97228 "<BINARY GARBAGE>")
    at ../../exa/exa_accel.c:211
No locals.
#2 0x08182a10 in damagePutImage (pDrawable=0xa2c94008, pGC=0x8a71738,
    depth=24, x=0, y=8190, w=16, h=2, leftPad=0, format=2,
    pImage=0x8a97228 "<BINARY GARBAGE>")
    at ../../../miext/damage/damage.c:905
        pGCPriv = (DamageGCPrivPtr) 0x8a054b0
        oldFuncs = (GCFuncs *) 0x8213a80
#3 0x0808a301 in ProcPutImage (client=0x8980830) at ../../dix/dispatch.c:1897
        pGC = (GC *) 0x8a71738
        pDraw = (DrawablePtr) 0xa2c94008
        length = <value optimized out>
#4 0x0808cff7 in Dispatch () at ../../dix/dispatch.c:437
        result = <value optimized out>
        client = (ClientPtr) 0x8980830
        nready = 0
        start_tick = 660
#5 0x080722fd in main (argc=10, argv=0xbf886f24, envp=0xbf886f50) at
../../dix/main.c:397
        i = <value optimized out>
        alwaysCheckForInput = {0, 1}
(gdb)

Revision history for this message
In , Martin Olsson (mnemo) wrote :

Actually when I run the repro with the patch from comment #9 it hits the subsequent "return FALSE" twice. The second time the "bt full" is:

(gdb) bt full
#0 RADEONUploadToScreenCP (pDst=0xa2896008, x=0, y=8190, w=16, h=2,
    src=0x8a97228 "<BINARY GARBAGE>", src_pitch=64) at ../../src/radeon_exa_funcs.c:277
 pScrn = (ScrnInfoPtr) 0x85fd5e8
 info = (RADEONInfoPtr) 0x85fbea8
 bpp = 32
 hpass = 2726912008
 buf_pitch = 3077628745
 dst_pitch_off = 3213388776
 __func__ = "RADEONUploadToScreenCP"
#1 0xb770f344 in exaPutImage (pDrawable=0xa2896008, pGC=0x899cc58, depth=24, x=0, y=8190, w=16, h=2, leftPad=0, format=2,
    bits=0x8a97228 "<BINARY GARBAGE>") at ../../exa/exa_accel.c:211
No locals.
#2 0x08182a10 in damagePutImage (pDrawable=0xa2896008, pGC=0x899cc58, depth=24, x=0, y=8190, w=16, h=2, leftPad=0, format=2,
    pImage=0x8a97228 "<BINARY GARBAGE>") at ../../../miext/damage/damage.c:905
 pGCPriv = (DamageGCPrivPtr) 0x89aa428
 oldFuncs = (GCFuncs *) 0x8213a80
#3 0x0808a301 in ProcPutImage (client=0x8980830) at ../../dix/dispatch.c:1897
 pGC = (GC *) 0x899cc58
 pDraw = (DrawablePtr) 0xa2896008
 length = <value optimized out>
#4 0x0808cff7 in Dispatch () at ../../dix/dispatch.c:437
 result = <value optimized out>
 client = (ClientPtr) 0x8980830
 nready = 0
 start_tick = 680
#5 0x080722fd in main (argc=10, argv=0xbf886f24, envp=0xbf886f50) at ../../dix/main.c:397
 i = <value optimized out>
 alwaysCheckForInput = {0, 1}
(gdb) c

Revision history for this message
In , Martin Olsson (mnemo) wrote :

I can confirm that 16x8191.png and 16x8193.png load just fine on unpatched jaunty versions whereas 16x8192.png triggers the bug.

I also grepped in the -ati driver for the number 8192 and that turns up several very interesting source lines that uses this number as maximum texture sizes and vport_scissor for example.

Maybe the "viewport cropping" has a off by one error?

Revision history for this message
In , Michel Dänzer (michel-daenzer) wrote :

With <width>x8193, does the breakpoint get hit?

What about 8191x<height> vs. 8192x<height> vs. 8193x<height>?

P.S. Please try to keep comments tidy to avoid cluttering up reports.

Revision history for this message
In , Martin Olsson (mnemo) wrote :

Using the -ati driver modified with the coord paranoia patch as per comment #9:
For 16x8192 the breakpoint gets hit.
For 16x8191 and 16x8193 the breakpoint does not get hit.

Using the same patched -ati driver and breakpoint set, I also tried the following (after rotating oyvind's PNGs in GIMP; I'm unsure what this transformation did to transparency, if that matters):
8191x16 (no crash, no breakpoint hit)
8192x16 (no crash, no breakpoint hit)
8193x16 (no crash, no breakpoint hit)

I also reverted to the unpatched jaunty driver and, just as I suspected, 8191x16, 8192x16 and 8193x16 load just fine there as well. If needed, you can find my rotated PNGs here:
http://pages.minimum.se/crashers/ddx_stressers/ati_bug_21598/

Revision history for this message
In , Michel Dänzer (michel-daenzer) wrote :

Created an attachment (id=25819)
EXA coordinate limit fixups

This is probably the real fix. Not sure about the R600 changes, but they make everything consistent with the stricter checks in R600CheckComposite().

(In reply to comment #23)
> Using the same patched -ati driver and breakpoint set I also tried (after
> rotating oyvind's PNGs in GIMP, i'm unsure about what this transformation did
> to transparency if that matters):
> 8191x16 (no crash, no breakpoint hit)
> 8192x16 (no crash, no breakpoint hit)
> 8193x16 (no crash, no breakpoint hit)

I don't think the transparency matters, but I realized in the meantime that it hits the pitch limit at a much lower width.

Revision history for this message
In , Martin Olsson (mnemo) wrote :

I can confirm that the patch from comment #24 makes it possible to open all PNGs mentioned in this bug report, and no lockups or issues were seen (nothing barfed in xorg.log or dmesg).

Is this suitable for cherry picking onto the 6.12.2 we currently have in Ubuntu? The patch clearly applies cleanly and works nicely from what I can tell with my limited testing so far at least.

Thanks a ton MrCooper.

Bryce Harrington (bryce)
summary: - CPU spin OR ioctl() freeze in xorg when visiting http://mundoplus.tv/
- using -ati driver
+ [RV516] CPU spin OR ioctl() freeze in xorg when visiting
+ http://mundoplus.tv/ using -ati driver
Revision history for this message
In , Michel Dänzer (michel-daenzer) wrote :

Fix pushed to the Git master and 6.12 branches, thanks for all the testing.

I haven't pushed the R600 changes as they don't seem to be necessary. Maybe the stricter check in R600CheckComposite() should be relaxed instead.

Changed in xserver-xorg-driver-ati:
status: Confirmed → Fix Released
Bryce Harrington (bryce)
tags: added: freeze
Revision history for this message
jlpino (jlpino) wrote :

The status in freedesktop is "resolved". How can I fix this problem on my computer? Where can I download the necessary source code?

Revision history for this message
Martin Olsson (mnemo) wrote :

@jlpino. Yes, the bug has been fixed upstream, which means that the next version of the driver won't have this bug (for example, it will be fixed in karmic). If you want a decent workaround quickly, I suggest you set "EXANoUploadToScreen" to "true" in the device section of your xorg.conf. Personally I'm pretty busy right now so I can't prepare an SRU, but I think it would be nice if one was issued for sure.
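
As a rough illustration of that workaround, the Device section might look like the fragment below. The Identifier value is a placeholder and the Driver line is an assumption, so keep whatever your existing xorg.conf already uses and add only the Option line.

```
Section "Device"
    # Placeholder name; keep the Identifier already present in your file.
    Identifier "Configured Video Device"
    # Assumed driver name for the -ati driver discussed in this report.
    Driver     "ati"
    # Workaround from this bug report: disable EXA UploadToScreen acceleration.
    Option     "EXANoUploadToScreen" "true"
EndSection
```

X has to be restarted for the change to take effect.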

Revision history for this message
jlpino (jlpino) wrote :

Thank you Martin Olsson, this workaround fixed the mundoplus.tv problem, but SuperTuxKart still locks up my computer.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package xserver-xorg-video-ati - 1:6.12.2-2ubuntu1

---------------
xserver-xorg-video-ati (1:6.12.2-2ubuntu1) karmic; urgency=low

  * Merge from debian unstable
    - Remaining Ubuntu changes:
      + 104_use_exa.patch: use EXA by default
    - Fixes: "RV516: CPU spin / ioctl freeze visiting mundoplus.tv"
      (LP: #371279)
  * Drop patches now included upstream:
    - 105_pre_avivo_vblank_interrupt.patch
    - 106_fix_dvi_on_rs690.patch
  * Add 107_r420_pciid.patch: Cherrypick from upstream to add PCI ID
    for R420 [Radeon X800 VE] [1002:4a54]
    (LP: #376226)

 -- Bryce Harrington <email address hidden> Thu, 04 Jun 2009 14:56:12 -0700

Changed in xserver-xorg-video-ati (Ubuntu):
status: Confirmed → Fix Released
Revision history for this message
Gompie (kamaraadski) wrote :

I wanted to post this message to a duplicate of this bug, but I suppose this is a better place. I'm using Ubuntu Studio 9.04 with the manually updated kernel 2.6.28-15-generic. I managed to fix some serious audio errors by updating the ALSA drivers. After updating the ALSA drivers, I encountered a bug with my Nvidia driver, which was solved by manually updating the kernel headers to 2.6.28-15-generic (from 2.6.8.28-3-generic).

I've been playing games and installing programs from last Saturday up till yesterday evening, and suddenly I encountered the errors described above. I'm now working from within Studio 8.04 (on a different disk) because I don't have the time for writing too much before everything hangs.

Difference from all notifications above: I'm not using Intel, I'm using AMD VIA.

I'm going to try to attach the outcome of lspci -vvnn to this message now.

Revision history for this message
In , Michel-brabants-gmail (michel-brabants-gmail) wrote :

Hello,

I'm still having the Firefox lockup with the mentioned PNG image, and my xorg was apparently released in August. Maybe this is normal, maybe not, but I just wanted to share it.

My xorg-info:

X.Org X Server 1.6.3.901 (1.6.4 RC 1)
Release Date: 2009-8-25
X Protocol Version 11, Revision 0
Build Operating System: Linux 2.6.30-ARCH x86_64
Current Operating System: Linux natuur 2.6.30-ARCH #1 SMP PREEMPT Wed Sep 9 14:16:44 CEST 2009 x86_64
Build Date: 04 September 2009 05:45:43PM

(--) PCI:*(0:1:0:0) 1002:5b63:17ee:0373 ATI Technologies Inc RV370 [Sapphire X550 Silent] rev 0, Mem @ 0xd8000000/134217728, 0xfe9f0000/65536, I/O @ 0x0000c000/256, BIOS @ 0x????????/131072
(--) PCI: (0:1:0:1) 1002:5b73:17ee:0372 ATI Technologies Inc RV370 secondary [Sapphire X550 Silent] rev 0, Mem @ 0xfe9e0000/65536

Kind regards,

Michel

Revision history for this message
In , Michel-brabants-gmail (michel-brabants-gmail) wrote :

My ati-module-version:

(II) Module ati: vendor="X.Org Foundation"
        compiled for 1.6.1, module version = 6.12.2
        Module class: X.Org Video Driver
        ABI class: X.Org Video Driver, version 5.0

It seems to be 6.12, as mentioned above.

Kind regards,

Michel

Revision history for this message
In , Michel Dänzer (michel-daenzer) wrote :

(In reply to comment #28)
> My ati-module-version:
>
> II) Module ati: vendor="X.Org Foundation"
> compiled for 1.6.1, module version = 6.12.2

The fix is only in 6.12.3.
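
A quick way to check this against an Xorg log is sketched below. It assumes the log contains a "Module ati: ... module version = X.Y.Z" entry like the one quoted above; the helper names are made up for this example. It only compares upstream version numbers, so a distribution package that backports the patch onto 6.12.2 would be reported as unfixed.

```python
import re

# Upstream release that contains the fix, per the comment above.
FIXED_IN = (6, 12, 3)


def module_version(log_text: str):
    """Return the ati module version from Xorg log text as a tuple, or None."""
    match = re.search(
        r"Module ati:.*?module version = (\d+)\.(\d+)\.(\d+)",
        log_text,
        re.S,  # the version is on the line after "Module ati:"
    )
    return tuple(int(part) for part in match.groups()) if match else None


def has_upstream_fix(log_text: str) -> bool:
    """True if the logged upstream module version is at least FIXED_IN."""
    version = module_version(log_text)
    return version is not None and version >= FIXED_IN
```

For example, passing the contents of /var/log/Xorg.0.log (the usual location, but it may differ on your system) to has_upstream_fix() returns False for the 6.12.2 module quoted above.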

Revision history for this message
In , Michel-brabants-gmail (michel-brabants-gmail) wrote :

Hello,

thank you. I'll upgrade then :).

Kind regards,

Michel

Changed in xserver-xorg-driver-ati:
importance: Unknown → Critical
Changed in xserver-xorg-driver-ati:
importance: Critical → Unknown
Changed in xserver-xorg-driver-ati:
importance: Unknown → Critical