Time out error with r128 and some games

Bug #214442 reported by jhansonxi
4
Affects Status Importance Assigned to Milestone
X.org xf86-video-r128
Won't Fix
Medium
xorg (Ubuntu)
Invalid
Low
Unassigned

Bug Description

Binary package hint: xserver-xorg-video-ati

Some 3D games exit abruptly on an ATI Rage 128 Pro. Occurs at both 16 and 24bpp depths. Does not affect all games. Terminal message:
Error: Rage 128 timed out... exiting

Workarounds:
Chromium - none
Armagetron Advanced - Turn off floor and object textures.

Similar report on Mandriva:
http://fixunix.com/mandriva/123547-3d-rage-driver-crashing-mandriva-one-2007-spring.html

Possible freedesktop.org bug reference in Mandriva report:
http://bugs.freedesktop.org/show_bug.cgi?id=9379

[Chromium]
ATTENTION!! Could not load alAttenuationScale
ERROR!! <> alGetError() = Invalid Value
alAttenuationScale == 0. Kludge it.
...startup complete.
high scores:
01/01/2000 nobody 250000
01/01/2000 nobody 200000
01/01/2000 nobody 150000
01/01/2000 nobody 100000
01/01/2000 nobody 50000
Error: Rage 128 timed out... exiting
[New Thread 0xb56cbb90 (LWP 5730)]

Program exited with code 0377.

[armagetronad
(gdb) run
Starting program: /usr/games/armagetronad.real --datadir /usr/share/games/armagetronad --configdir /etc/armagetronad --userconfigdir /home/administrator/.armagetronad --vardir /home/administrator/.armagetronad/var --userdatadir /home/administrator/.armagetronad/data
[Thread debugging using libthread_db enabled]
[New Thread 0xb741a720 (LWP 5868)]
[New Thread 0xb72d1b90 (LWP 5871)]
[Thread 0xb72d1b90 (LWP 5871) exited]
[New Thread 0xb72d1b90 (LWP 5872)]
Error: Rage 128 timed out... exiting
[Thread 0xb72d1b90 (LWP 5872) exited]

Program exited with code 0377.
(gdb) backtrace full
No stack.
[lspci]
00:00.0 Host bridge [0600]: VIA Technologies, Inc. VT82C598 [Apollo MVP3] [1106:0598] (rev 04)
     Subsystem: VIA Technologies, Inc. MVP3 ISA Bridge [1106:0000]
01:00.0 VGA compatible controller [0300]: ATI Technologies Inc Rage 128 PF/PRO AGP 4x TMDS [1002:5046] (prog-if 00 [VGA controller])
     Subsystem: ATI Technologies Inc Unknown device [1002:0404]

Revision history for this message
In , Sustmidown (sustmidown) wrote :

When I'm running some game (in this case Paintball2 (it is 3D
FirstPersonShooter based on Quake2 engine)) and go to some very "hard-
renderable" (there is very low fps) place (in that game), it quits with error
"Error: Rage 128 timed out... exiting".
I found out that this error is in r128_ioctl.c file in function:
void r128WaitForIdleLocked( r128ContextPtr rmesa )
So i did some changes to see what is wrong and edited code a little:

do {
  i = 0;
  do {
    ret = drmCommandNone( fd, DRM_R128_CCE_IDLE );
+ fprintf( stderr, "ret=%d, i=%d, to=%d\n" , ret, i, to);
+ if (ret) DO_USLEEP( 100 );
- } while ( ret && errno == EBUSY && i++ < R128_IDLE_RETRY );
+ } while ( ret && i++ < R128_IDLE_RETRY );
} while ( ( ret == -EBUSY ) && ( to++ < R128_TIMEOUT ) );

if ( ret < 0 ) {
+ fprintf( stderr, "(ret=%d, i=%d, to=%d, errno=%d)\n" , ret, i, to, errno);
  drmCommandNone( fd, DRM_R128_CCE_RESET);

And when I run the game and go around the level looking on the wall everything
if OK, it prints out:

    ret=0, i=0, to=0

(with every frame, I think)
So IDLE call is successful instantly (in the first iteration).

But when I look's on some "hard-to-render" place (many polygons or something),
game hangs and it prints out:

ret=-22, i=0, to=0
ret=-22, i=1, to=0
ret=-22, i=2, to=0
...
ret=-22, i=32, to=0
(ret=-22, i=33, to=0, errno=22)
Error: Rage 128 timed out... exiting

Is seems that something causes the IDLE call can't be done successful, anymore.
Also I tried to add line with DO_USLEEP(500000), but it didn't help.
Error number 22 is EINVAL (invalid argument) and is thrown when ioctl's
"request" or "argp" arguments are not valid.
It should throw error 16 (busy) at most...
I'm not so expert in Mesa's DRI and DRM code to find the problem.
(ps. this problem I had with some other games and with Blender(3d modeling).
and some games eg. Quake3Arena doesn't do this problem...
maybe it occurs only when rendered scene is very very full of polygons...
win32 version of Paintball2 doesn't hangs in this scenes but have big spf
(seconds per frame))

Thank your for helping.

Revision history for this message
In , Randrik (randrik) wrote :

I can confirm this bug, Paintball2_alpha016 exit during loading first map.

in dmseg:

[drm] Initialized drm 1.1.0 20060810
ACPI: PCI Interrupt 0000:01:00.0[A] -> Link [LNKA] -> GSI 11 (level, low) -> IRQ
 11
[drm] Initialized r128 2.5.0 20030725 on minor 0
agpgart: Found an AGP 1.0 compliant device at 0000:00:00.0.
agpgart: Putting AGP V2 device at 0000:00:00.0 into 2x mode
agpgart: Putting AGP V2 device at 0000:01:00.0 into 2x mode
[drm:r128_cce_idle] *ERROR* r128_cce_idle called without lock held, held 0 owne
r c947c5a0 c947c5a0

And i'm not sure related this with kpovmodeler lockup? I can see program window,
but after all UI elements rendered - whole X freeze.

[drm:drm_lock_take] *ERROR* 3 holds heavyweight lock
SysRq : Terminate All Tasks

kde 3.5.5, X.org 7.1.1 , Mesa/drm from git (22/12/2006). (also with Mesa
6.5.2/kernel 2.6.17.14 drm)

Revision history for this message
In , Diskman (diskman) wrote :

AlphaPC 164SX/UP2000
XOrg-7.1
Mesa-6.5 and Mesa-6.5.2
Kernel-2.6.13.1 [alpha linux]
Radeon 7500 PCI

Have the same error with GLCORE/GLX enabled... GL screensavers generate this same error 100% of the time, along with ANY other applications using DRI/OpenGL:

Jan 28 20:46:18 jericho kernel: [drm:drm_lock_take] *ERROR* 3 holds heavyweight lock

Revision history for this message
In , Sustmidown (sustmidown) wrote :

Created attachment 8627
patch to disable span locking

I found similar problem in R300 drivers and I also found patch which should solve it (but I don´t know if its real solution).
So I rewrote this patch for r128 with little change - disabling span locking by default.
Set R128_ENABLE_SPAN_LOCKING if you want to temporarily disable this patch.

Revision history for this message
In , Sustmidown (sustmidown) wrote :

Edit:
> Set R128_SPAN_ENABLE_LOCKING if you want to temporarily disable this patch.

Revision history for this message
In , Randrik (randrik) wrote :

Patch from comment #3 works fine!

No more lockups with kpovmodeler, paintball2 nearly hangs, but system is responsible, i can switch to another VTs and kill game. More or less, it work!

(today Mesa git + patch, and 2-3 days old drm git, kernel 2.6.19.7 X.org 7.2 xf86-video-ati from 22 feb 2007 git )

HW: p2-420Mhz/416 Mb RAM/
video-HW:
(--) R128(0): Chipset: "ATI Rage 128 Pro GL PF (AGP)" (ChipID = 0x5046)
(--) R128(0): VideoRAM: 16384 kByte (64-bit SDR SGRAM 1:1)

Revision history for this message
In , Michel-tungstengraphics (michel-tungstengraphics) wrote :

I'm afraid the patch is not acceptable in this form. If anything, it should default to the span locking being enabled, with the possibility of disabling it via an environment variable, as is done in the r300 driver. However, AFAIK the latter hasn't actually needed it in a while (certainly not in my experience), so it would be even better to identify and fix the actual cause of the hangs. It sounds like other parts of the driver may not handle the locking correctly.

Revision history for this message
In , Peter Cordes (peter-cordes) wrote :

On my core2duo running Ubuntu Edgy amd64, mesa and dri from git (as of March 2), dual seat i965+r128 (X -sharevts ...), I can reproduce the "r128_cce_idle called without lock held", program exits, thing with armagetron on the r128 X server.

peter@tesla:~$ armagetron
 menu comes up. start a game and...
Error: Rage 128 timed out... exiting
peter@tesla:~$ echo $?
255
 dmesg:
[1465378.259327] [drm:r128_cce_idle] *ERROR* r128_cce_idle called without lock held, held 0 owner ffff81003b14e180 ffff81003b14e180

peter@tesla:~$ LIBGL_DEBUG=verbose MESA_DEBUG=1 armagetron
libGL: XF86DRIGetClientDriverName: 4.1.0 r128 (screen 0)
libGL: OpenDriver: trying /usr/local/lib/dri/r128_dri.so
drmOpenDevice: node name is /dev/dri/card0
drmOpenDevice: open result is 5, (OK)
drmOpenByBusid: Searching for BusID pci:0000:06:01.0
drmOpenDevice: node name is /dev/dri/card0
drmOpenDevice: open result is 5, (OK)
drmOpenByBusid: drmOpenMinor returns 5
drmOpenByBusid: drmGetBusid reports pci:0000:00:02.0
drmOpenDevice: node name is /dev/dri/card1
drmOpenDevice: open result is 5, (OK)
drmOpenByBusid: drmOpenMinor returns 5
drmOpenByBusid: drmGetBusid reports pci:0000:06:01.0
Mesa warning: couldn't open libtxc_dxtn.so, software DXTn compression/decompression unavailable
libGL error:
Can't open configuration file /etc/drirc: No such file or directory.
Error: Rage 128 timed out... exiting
Segmentation fault
peter@tesla:~$ echo $?
139
 dmesg:
[1465931.617255] [drm:r128_cce_idle] *ERROR* r128_cce_idle called without lock held, held 0 owner ffff81002c9ce0c0 ffff81002c9ce0c0
[1465931.733266] armagetron.real[3733]: segfault at 00002ad9f4b427f9 rip 00002ad9f4b427f9 rsp 00007fffb8ab5948 error 14

 It didn't segfault the previous two times, without the debug env vars...
My armagetron settings include full mirroring of everything, dual textured plane, and a bunch of stuff that my i965 can handle just fine. It's probably unplayably slow on r128. If anyone tries to reproduce this, but can't, I can send my armagetron config file...

 BTW, https://bugs.freedesktop.org/show_bug.cgi?id=7994 mentions seeing the same kernel log message.

Revision history for this message
In , agd5f (agd5f) wrote :

does mesa commit 25f21b5331d27225b1f6b7aaf2c9bf3f32764d91 (see bug 7994) help?

Revision history for this message
In , Peter Cordes (peter-cordes) wrote :

> does mesa commit 25f21b5331d27225b1f6b7aaf2c9bf3f32764d91 (see bug 7994) help?

 Unfortunately no, it doesn't.

armagetron still crashes. In System Settings->Display Settings->Detail Settings, if I set "object textures" to "off", it runs fine when I start a game. In game, if I hit escape and change the "object textures" setting, the program crashes right away with stderr of:
Error: Rage 128 timed out... exiting

kernel log:
[drm:r128_cce_idle] *ERROR* r128_cce_idle called without lock held, held 0 owner ffff810052cc0680 ffff810052cc0680

 Another problem that's still present (and was mentioned in this bug re: kpovmodeler) is the lockup of the whole server until the client program is killed. This is https://bugs.freedesktop.org/show_bug.cgi?id=3493. If you have access to any r128 hardware, progs/demos/engine is the best test case for it. It locks up, with 100% repeatability, before it can even draw its window the first time (the previous contents of that screen area are still visible inside the window frame). And killall engine restores everything to normal. The behaviour of the "engine" lockup is completely unchanged by that commit. It still says:
[drm:drm_lock_take] *ERROR* 3 holds heavyweight lock

Revision history for this message
In , András Péteri (apeteri) wrote :

Created attachment 11428
log showing two consecutive locking attempts

Revision history for this message
In , András Péteri (apeteri) wrote :

(Sorry for the double post, I'm a first-time bugzilla user.)

re: [drm:drm_lock_take] *ERROR* 3 holds heavyweight lock

I have attached some heavily formatted log output, which shows drm debug messages at the time of the hang (the messages come from the kernel log, obtained using "modrobe drm debug=1". You can see that after the cce_idle ioctl call, context 3 tries to lock again without unlocking first. I recompiled the Mesa DRI driver with the DEBUG_LOCKING flag set in r128_lock.h, and found out that this double locking behavior occurs in the depth buffer calls in r128_span.c. Basically

 /* These functions require locking */

+/*
 #undef HW_LOCK
 #undef HW_UNLOCK
 #define HW_LOCK() LOCK_HARDWARE(R128_CONTEXT(ctx));
 #define HW_UNLOCK() UNLOCK_HARDWARE(R128_CONTEXT(ctx));
+*/

 /* 16-bit depth buffer functions
  */

fixed the problem for me. Apparently, the lock is taken before these functions get called (AFAICR, r128SpanRenderStart() does the job, but I'm not sure).

re: *ERROR* r128_cce_idle called without lock held

drmP.h has a LOCK_TEST_WITH_RETURN macro, which returns -EINVAL along this error message when the lock isn't held. Because of this, the infinite loop in r128WaitForIdleLocked exits with ret == -EINVAL, so the actual error message is quite misleading in this case. So, the real error is that r128WaitForIdleLocked got called without a lock held.

I modified DEBUG_LOCK in r128_lock.h to print all lock operations to the standard error output, and modified the code to call DEBUG_LOCK on unlocks too. r128WaitForIdleLocked was also replaced with a macro to show where it's called from. It turned out that sometimes r128SpanRenderFinish() is called without calling r128SpanRenderStart() first -- so no lock is held during the call to r128WaitForIdleLocked. I think this may be a problem with the software fallback. Armagetron exhibits this behavior, while gl-117 calls RenderStart and RenderFinish in pairs; lock operations also.

Using kernel 2.6.22 with drm and mesa from the freedesktop git.

Revision history for this message
In , András Péteri (apeteri) wrote :

Created attachment 13855
patch to fix various span locking issues

Here goes my proposed patch for fixing some locking issues with the r128.

I think that most of the problems are caused by some primitives which ended up in the software rasterizer without setting up the software part first. The defined HW_LOCK() and HW_UNLOCK() macros mentioned in comment #11 caused "double locking" ("*ERROR* 3 holds heavyweight lock"), if the correct fallback path was taken, and r128SpanRenderStart was called before reading/writing buffers. Removing these #defines as in the radeon source - leaving the job solely to ...SpanRenderStart and ...SpanRenderFinish - in turn left the software rasterizer without *any* locking for some primitives, which resulted in early application termination (because now eg. r128WaitForIdleLocked could be called without acquiring the locks -- hence the "*ERROR* r128_cce_idle called without lock held").

I added two new fallback flags based on error logs - after removing the #defines, blender crashed shortly after enabling GL_LINE_STIPPLE, and armagetron had some problems with GL_POLYGON_SMOOTH. (armagetron also has GL_LINE_SMOOTH, but it didn't crash when enabled, so I didn't add a separate case for it.)

Another issue is caused by projective textures (seen in armagetron). In r128RenderStart, when the routine determines from the vertex format (texture coordinate size) that software rendering should be enabled, it is already "too late" - some of the function pointer changes that _swsetup_Wakeup makes (particularly tnl->Driver.Render.Start, which is changed to _swsetup_RenderStart in _swsetup_Wakeup, which in turn would call r128SpanRenderStart to acquire proper locking) will only take effect (if at all) at the beginning of the next frame. So, once again, no locking would take place for that iteration. I added a check to see if R128_FALLBACK_PROJTEX was actually set in FALLBACK, and a call to run tnl->Driver.Render.Start if needed. This part is commented out, because I'm not sure if this is the right way to handle this; so, for now, projective textures are disabled by this patch.

Revision history for this message
In , András Péteri (apeteri) wrote :

Comment on attachment 13855
patch to fix various span locking issues

I tried to run more games after applying the patch, and some of them terminated with a segmentation fault and/or put multiple "r128_cce_* called without lock held" messages into the kernel log. So it's not working properly.

Revision history for this message
jhansonxi (jhansonxi) wrote :
Revision history for this message
Bryce Harrington (bryce) wrote :

Please attach your /var/log/Xorg.0.log

Changed in xserver-xorg-video-ati:
status: New → Incomplete
Revision history for this message
jhansonxi (jhansonxi) wrote :
Revision history for this message
jhansonxi (jhansonxi) wrote :
Bryce Harrington (bryce)
Changed in xserver-xorg-video-ati:
importance: Undecided → Medium
status: Incomplete → Triaged
Revision history for this message
Bryce Harrington (bryce) wrote :

Hi jhansonxi,

Could you obtain a full backtrace of a couple games that are exiting in this fashion? Perhaps an strace as well.

This error message "timed out...exiting" sounds too generic, but if a backtrace or strace can identify where in the code the application is exiting, perhaps that would give a better clue.

Changed in xserver-xorg-video-ati:
status: Triaged → Incomplete
Revision history for this message
jhansonxi (jhansonxi) wrote :
Revision history for this message
jhansonxi (jhansonxi) wrote :
Revision history for this message
jhansonxi (jhansonxi) wrote :
Revision history for this message
jhansonxi (jhansonxi) wrote :
Bryce Harrington (bryce)
Changed in xserver-xorg-video-ati:
status: Incomplete → Triaged
Revision history for this message
Brian Murray (brian-murray) wrote : Ubuntu needs you!

Thanks for taking the time to report this bug and helping to make Ubuntu better. In the development cycle for Intrepid there have been some vast improvements in the open source ati video driver and we could use your help testing them. Could you please download the latest Alpha CD image of Intrepid and test this particular bug just using the Live CD? You can find the latest image at http://www.ubuntu.com/testing . Your testing can help make Ubuntu and the open source ati driver even better! Thanks in advance.

Changed in xserver-xorg-video-ati:
status: Triaged → Incomplete
Bryce Harrington (bryce)
Changed in xserver-xorg-video-ati:
assignee: nobody → bryceharrington
status: Incomplete → Triaged
Revision history for this message
jhansonxi (jhansonxi) wrote :

No improvement in Xubuntu 8.10 (Intrepid). Same exact behavior and error with both games.

Changed in xserver-xorg-video-r128:
status: Triaged → New
Bryce Harrington (bryce)
Changed in xserver-xorg-video-r128:
status: New → Confirmed
Revision history for this message
In , jhansonxi (jhansonxi) wrote :

I believe I've encountered this same bug on Ubuntu with Armagetron Advanced and Chromium:
https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-r128/+bug/214442

Bryce Harrington (bryce)
description: updated
description: updated
Revision history for this message
Bryce Harrington (bryce) wrote :

It looks like this issue is in progress upstream, although the bug has not received an update in about a year.

Thanks for collecting the backtraces; unfortunately like you mentioned at the start, the games are exiting with an error code, not an actual crash, so it doesn't look like we can debug this easily. So there's not much we can do on this except wait for a fix from upstream.

I browsed through the recent changes to the -r128 we have on jaunty, but didn't see any changes that looked like they'd fix this issue.

Changed in xserver-xorg-video-r128:
assignee: bryceharrington → nobody
status: Confirmed → Triaged
Changed in xserver-xorg-video-r128:
status: Unknown → Confirmed
Bryce Harrington (bryce)
description: updated
Revision history for this message
In , András Péteri (apeteri) wrote :

Created attachment 24949
span locking patch

The previous patch left three calls to the software rasterizer without locking (these are called when both hardware accelerated and software rendered primitives are drawn), this one wraps them correctly.

Texture projection is still commented out, because not too many OpenGL applications have an option to disable it (if I remember correctly, armagetronad recently had this setting removed), and rendering these kind of primitives in software hits the frame rate quite hard. Since only the first and second texture coordinate components are left in place, leaving it this way (=unhandled) can cause rendering artifacts (some texcoord transformation magic could solve this, but it could only occur at the vertex level, and the texture would still be linearly interpolated pixel-wise).

The attached patch applies to git commit dc1153ce83041a397b1d1815db4133ce8c53eaa1 and mesa_7.4-0ubuntu3 source package from the ubuntu jaunty repository, but was tested only with the latter; to test, start a free flight in gl-117, leave the camera in first person mode and fly your plane into the Sun :)

Revision history for this message
In , András Péteri (apeteri) wrote :

Created attachment 27576
patch to add support for projective textures

Enables hardware-accelerated rendering of primitives that have vertices with 'q' texture coordinates used in projective texture mapping. It must be applied after attachment 24949.

The patch is generally based on the "ptex hack" found in the S3 Savage sources (which uses the standard vertex format, then shuffles values around to their correct position just before sending the vertices to the DMA buffer), modified to support two sets of texture coordinates; the vertex layout used by the r128, along with the additional vertex format flag needed, were obtained from earlier Mesa/DRI sources (eg. [1][2]). The performed calculations are essentially the same in both cases (savage & r128).

Both patches apply against the mesa_7.4-0ubuntu3.1 source package and git commit eb33c0ab8b3594f0b1d58534a13a26e3fb050cff, however only the former has been tested.

(ps. I'm unsure which vertex size the SAREA structure should store, but I also couldn't find any piece of code where that member gets referenced, so it may not matter either way.)

[1] http://www.koders.com/c/fid0D7829772F2499BEA37AC17768976CD64AD3459C.aspx?s=rhw2#L62

[2] http://www.koders.com/c/fidDF4ACA3257B4F85D79E4E734EA089EC50D1DBB7F.aspx#L79

Revision history for this message
In , András Péteri (apeteri) wrote :

Comment on attachment 27576
patch to add support for projective textures

Well, apparently it *does* matter, or something else is amiss, because after further testing I managed to hardlock the machine and got a few related segfaults in the process. I will try to look into this as time permits.

Revision history for this message
In , Ajax-a (ajax-a) wrote :

Mass version move, cvs -> git

Changed in xserver-xorg-video-r128:
importance: Unknown → Medium
Changed in xserver-xorg-video-r128:
importance: Medium → Unknown
Changed in xserver-xorg-video-r128:
importance: Unknown → Medium
Revision history for this message
In , Anton Patsev (patsev-anton) wrote :

Miroslav Šustek,
Do you still experience this issue with newer drivers ?
Please check the status of your issue.

Revision history for this message
In , Sustmidown (sustmidown) wrote :

(In reply to comment #19)
> Miroslav Šustek,
> Do you still experience this issue with newer drivers ?
> Please check the status of your issue.

I'm sorry, I no longer have the HW to test it.

Revision history for this message
In , Anton Patsev (patsev-anton) wrote :

Please check the status of your issue.
Or close this bug.

Revision history for this message
In , Sustmidown (sustmidown) wrote :

Guys, thank you for all your work here. I also spent good times hacking r128 drivers.
Unfortunately, I gave the video card away four years ago, so I can't participate on this bug anymore.
Closing. *** drying nostalgic tear ***

Changed in xserver-xorg-video-r128:
status: Confirmed → Won't Fix
Revision history for this message
penalvch (penalvch) wrote :

jhansonxi, thank you for reporting this bug to Ubuntu. Intrepid reached EOL on April 30, 2010.
See this document for currently supported Ubuntu releases: https://wiki.ubuntu.com/Releases

Is this an issue in a supported release? If so, could you please execute the following command, as it will automatically gather debugging information, in a terminal:
apport-collect 214442

Please ensure you have xdiagnose installed, and that you click the Yes button for attaching additional debugging information.

affects: xserver-xorg-video-r128 (Ubuntu) → xorg (Ubuntu)
Changed in xorg (Ubuntu):
importance: Medium → Low
status: Triaged → Incomplete
Changed in xorg (Ubuntu):
status: Incomplete → Invalid
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.