Comment 0 for bug 68291

In , Lam (lam-lac) wrote :

I started to cry over it in bug #7271, but it's time to give it a bug of its own.

This is the Xorg from Fedora development (Rawhide), rebuilt for FC5. It reports:
X Window System Version 7.1.1
Release Date: 12 May 2006
X Protocol Version 11, Revision 0, Release 7.1.1
(...)
Build ID: xorg-x11-server 1.1.1-26
I don't know if it's closer to 7.1 or CVS_head, sorry.

tdfx driver 1.2.1 makes my X crash. I don't know how that's possible, but
1.1.1.3 (meant for Xorg 7.0) works flawlessly.

With 1.2.1, gdm shows up, I get my Fedora bubbles and stuff, but X crashes
immediately after I enter my password. I'm guessing that it crashes while
clearing the bubbles from the screen.

Getting a backtrace was impossible, and so was stepping up to the crash with
gdb's "next" (strangely, it looped somewhere, ignoring my keystrokes).

So I tried to be smarter and recompiled Xorg without -O and with -g3, and did
the same for the tdfx driver. I attached gdb to it, and instead of crashing
right after I entered the password, it worked for some time before crashing
while I was browsing with SeaMonkey. It still produces useless backtraces, like:

(gdb) bt
#0 0x0900ccb2 in ?? ()
#1 0x10100001 in ?? ()
#2 0x03800424 in ?? ()
#3 0x03600000 in ?? ()
#4 0x01f60322 in ?? ()
#5 0x08ca9690 in ?? ()
#6 0x00009164 in ?? ()
#7 0x00000002 in ?? ()
#8 0x00000900 in ?? ()
#9 0xb3f82000 in ?? ()
#10 0x09038b94 in ?? ()
#11 0x00000000 in ?? ()

or

(gdb) bt
#0 0x095fa392 in ?? ()
#1 0x0000007b in ?? ()
#2 0x10100001 in ?? ()
#3 0x016019fc in ?? ()
#4 0x03600000 in ?? ()
#5 0x01f60322 in ?? ()
#6 0x09360690 in ?? ()
#7 0x0001b4bf in ?? ()
#8 0x00000002 in ?? ()
#9 0x00000900 in ?? ()
#10 0xb3f6c000 in ?? ()
#11 0x095e2a04 in ?? ()
#12 0x00000000 in ?? ()

and many shorter ones, equally meaningless.

The only one that seems to make any sense was this:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1208702768 (LWP 13488)]
0x0a2f0841 in ?? ()
(gdb) bt
#0 0x0a2f0841 in ?? ()
#1 0x00000291 in ?? ()
#2 0x0819ff38 in damagePutImage (pDrawable=0x940000, pGC=0x2a20471,
    depth=168035168, x=223396, y=168653104, w=0, h=0, leftPad=0, format=0,
    pImage=0x940000 "\001") at damage.c:750
Previous frame inner to this frame (corrupt stack?)

damage.c:750 is a "}", the return from a void function that has just called a
PutImage function through a pointer in some struct (Xorg code is black magic to
me). I can only guess that in the end this call translates to some PutImage in
my card's driver.
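
To make the guess above concrete, here is a minimal sketch of the general
pattern: a wrapper does some bookkeeping and then calls the lower layer through
a function pointer stored in a struct. All names here (Gc, GcOps,
wrapper_put_image, driver_put_image) are hypothetical stand-ins, not the real
Xorg types or the actual contents of damage.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; NOT the real Xorg structures. */
typedef struct Gc Gc;

typedef struct GcOps {
    /* The lower layer (e.g. a driver) installs its implementation here. */
    void (*PutImage)(Gc *gc, int x, int y, int w, int h, const char *bits);
} GcOps;

struct Gc {
    const GcOps *ops;     /* operations of the wrapped (lower) layer */
    int damaged_pixels;   /* bookkeeping done by the wrapper */
    int driver_calls;     /* how many times the lower layer was reached */
};

/* Stand-in for the driver's implementation. */
static void driver_put_image(Gc *gc, int x, int y, int w, int h,
                             const char *bits)
{
    (void)x; (void)y; (void)w; (void)h; (void)bits;
    gc->driver_calls++;
}

static const GcOps driver_ops = { driver_put_image };

/* Stand-in for a wrapping layer: record the area, then forward the call. */
static void wrapper_put_image(Gc *gc, int x, int y, int w, int h,
                              const char *bits)
{
    assert(gc != NULL && gc->ops != NULL && gc->ops->PutImage != NULL);
    gc->damaged_pixels += w * h;
    gc->ops->PutImage(gc, x, y, w, h, bits);
}   /* a return address overwritten during the call would typically
       only be noticed here, when the wrapper itself returns */

/* Runs one wrapped call and packs both counters into a single value. */
static int demo_put_image(void)
{
    Gc gc = { &driver_ops, 0, 0 };
    wrapper_put_image(&gc, 0, 0, 4, 2, "\001");
    return gc.damaged_pixels * 10 + gc.driver_calls;
}
```

If something below the wrapper overwrote the saved return address, a debugger
would likely point at the wrapper's closing brace, which would match a
backtrace ending at a "}" line.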

This even correlates with the only change in code (not the build system) between
1.1.1.2 and 1.2.1, which is
http://gitweb.freedesktop.org/?p=projects/xorg-driver-xf86-video-tdfx;a=blobdiff;h=5b7c7a349c27b72889bf060991297407a37ca9de;hp=60d7b0b97dbbd125cb6e550a5af3a72be1c050f2;hb=55954625ff815b932d365422864745b31c4eadb4;f=src/tdfx_video.c
but I don't know why it crashes at semi-random moments (under load, when doing
things fast, when not being traced by gdb): all that changed is an additional
parameter, which isn't even used. I'm not an expert in calling conventions, but
I thought the calling function frees the argument space on the stack, not the
callee (if the callee did it, a different number of arguments would make a
difference and could be the cause of the corruption). Maybe some Fedora CFLAGS
make the difference (stack protector or something)? :)
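
For reference, with the usual 32-bit x86 cdecl convention the caller does
remove the arguments from the stack, while stdcall-style conventions leave that
to the callee. Either way, calling a function through a pointer of a different
type is undefined behavior in C. The sketch below only shows how a typed
function pointer lets the compiler catch such a mismatch. PutImageOld,
PutImageNew and put_image_new are hypothetical names, and the parameter lists
are invented for illustration, not taken from the tdfx driver.

```c
#include <assert.h>

/* Hypothetical callback types; the real signatures differ. */
typedef int (*PutImageOld)(int x, int y, int w, int h);
typedef int (*PutImageNew)(int x, int y, int w, int h, void *draw);

/* New-style callback: takes one extra trailing parameter and ignores it. */
static int put_image_new(int x, int y, int w, int h, void *draw)
{
    (void)draw;
    return x + y + w * h;
}

/* The caller pushes five arguments because the pointer type says so. */
static int call_new(PutImageNew fn)
{
    assert(fn != 0);
    return fn(1, 2, 3, 4, 0);
}

static int demo_call(void)
{
    PutImageNew ok = put_image_new;
    /* PutImageOld bad = put_image_new;
       The line above is an incompatible pointer assignment, and the
       compiler diagnoses it unless a cast hides the mismatch. */
    return call_new(ok);
}
```

A mismatch can only slip through when a cast, or two components built against
different headers, hides it from the compiler.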

Or maybe the PutImage stuff was just random garbage on the stack and some
change in the build system breaks something I can't find? :) That seems quite
likely given the other random pointers there. Still, the TDFXPutImage*
change is in fact the only one between the versions :)

Either way, tdfx 1.1.1.3 compiles against the new Xorg and works just fine in
2D. I can't tell why other people use tdfx 1.2.1 with a newer Xorg and don't
complain about crashes.

I still have the unoptimized Xorg with the extra debug info, so if there's
something I can check, tell me. I want to fix this, but I need help (and I need
it quickly, this thing is really slow now).