Xorg crashed with SIGABRT in _mesa_GenTextures()

Bug #1221041 reported by Misha Bazanov
This bug affects 4 people
Affects          Status       Importance   Assigned to   Milestone
Mesa             Won't Fix    Medium
mesa (Ubuntu)    Invalid      Medium       Unassigned

Bug Description

The open-source radeon driver crashes every time I try to run FTL: Faster Than Light.

ProblemType: Crash
DistroRelease: Ubuntu 13.10
Package: xserver-xorg-core 2:1.14.2.901-2ubuntu4
ProcVersionSignature: Ubuntu 3.11.0-4.9-generic 3.11.0-rc7
Uname: Linux 3.11.0-4-generic x86_64
.tmp.unity.support.test.0:

ApportVersion: 2.12.1-0ubuntu3
Architecture: amd64
CompizPlugins: No value set for `/apps/compiz-1/general/screen0/options/active_plugins'
CompositorRunning: compiz
CompositorUnredirectDriverBlacklist: '(nouveau|Intel).*Mesa 8.0'
CompositorUnredirectFSW: true
Date: Thu Sep 5 12:18:18 2013
DistUpgraded: Fresh install
DistroCodename: saucy
DistroVariant: ubuntu
DkmsStatus: virtualbox, 4.2.16, 3.11.0-4-generic, x86_64: installed
ExecutablePath: /usr/bin/Xorg
ExtraDebuggingInterest: Yes
GraphicsCard:
 Advanced Micro Devices, Inc. [AMD/ATI] Caicos [Radeon HD 6450/7450/8450] [1002:6779] (prog-if 00 [VGA controller])
   Subsystem: ASUSTeK Computer Inc. Device [1043:03da]
InstallationDate: Installed on 2013-08-12 (24 days ago)
InstallationMedia: Ubuntu 13.10 "Saucy Salamander" - Alpha amd64 (20130630)
MachineType: Gigabyte Technology Co., Ltd. EG41MF-US2H
MarkForUpload: True
ProcCmdline: /usr/bin/X -core :0 -auth /var/run/lightdm/root/:0 -nolisten tcp vt7 -novtswitch
ProcEnviron:

ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.11.0-4-generic root=UUID=74c6d537-9a52-45a2-a434-ecde1217e8d1 ro quiet splash vt.handoff=7
Signal: 6
SourcePackage: xorg-server
StacktraceTop:
 _mesa_GenTextures () from /usr/lib/x86_64-linux-gnu/libdricore9.2.0.so.1
 ?? () from /usr/lib/xorg/modules/extensions/libglx.so
 ?? () from /usr/lib/xorg/modules/extensions/libglx.so
 ?? ()
 ?? ()
Title: Xorg crashed with SIGABRT in _mesa_GenTextures()
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:

dmi.bios.date: 04/22/2010
dmi.bios.vendor: Award Software International, Inc.
dmi.bios.version: F5
dmi.board.name: EG41MF-US2H
dmi.board.vendor: Gigabyte Technology Co., Ltd.
dmi.board.version: x.x
dmi.chassis.type: 3
dmi.chassis.vendor: Gigabyte Technology Co., Ltd.
dmi.modalias: dmi:bvnAwardSoftwareInternational,Inc.:bvrF5:bd04/22/2010:svnGigabyteTechnologyCo.,Ltd.:pnEG41MF-US2H:pvr:rvnGigabyteTechnologyCo.,Ltd.:rnEG41MF-US2H:rvrx.x:cvnGigabyteTechnologyCo.,Ltd.:ct3:cvr:
dmi.product.name: EG41MF-US2H
dmi.sys.vendor: Gigabyte Technology Co., Ltd.
version.compiz: compiz 1:0.9.10+13.10.20130828.2-0ubuntu1
version.ia32-libs: ia32-libs N/A
version.libdrm2: libdrm2 2.4.46-1
version.libgl1-mesa-dri: libgl1-mesa-dri 9.2-1ubuntu1
version.libgl1-mesa-dri-experimental: libgl1-mesa-dri-experimental N/A
version.libgl1-mesa-glx: libgl1-mesa-glx 9.2-1ubuntu1
version.xserver-xorg-core: xserver-xorg-core 2:1.14.2.901-2ubuntu4
version.xserver-xorg-input-evdev: xserver-xorg-input-evdev 1:2.7.3-0ubuntu3.1
version.xserver-xorg-video-ati: xserver-xorg-video-ati 1:7.2.0-0ubuntu3
version.xserver-xorg-video-intel: xserver-xorg-video-intel 2:2.21.14-4ubuntu3
version.xserver-xorg-video-nouveau: xserver-xorg-video-nouveau 1:1.0.9-2ubuntu1
xserver.bootTime: Thu Sep 5 12:18:26 2013
xserver.configfile: default
xserver.errors:

xserver.logfile: /var/log/Xorg.0.log
xserver.version: 2:1.14.2.901-2ubuntu4
xserver.video_driver: radeon

Revision history for this message
In , Philip Armstrong (phil-ubuntu) wrote :

System: Radeon HD 5770, AMD Phenom II. Debian Linux kernel 3.9.8, mesa 9.1.4, libdrm-radeon 2.4.45, xserver-xorg-video-radeon 6.14.4

Running the Linux version of the game FTL causes the Xserver to segfault.

The backtrace I get is:

Backtrace:
0: /usr/bin/Xorg (xorg_backtrace+0x36) [0x7fba2ad6dd06]
1: /usr/bin/Xorg (0x7fba2abef000+0x182859) [0x7fba2ad71859]
2: /lib/x86_64-linux-gnu/libpthread.so.0 (0x7fba29f14000+0xf210) [0x7fba29f23210]
3: /usr/lib/x86_64-linux-gnu/dri/r600_dri.so (0x7fba256e5000+0x10c7a7) [0x7fba257f17a7]
4: /usr/lib/xorg/modules/extensions/libglx.so (0x7fba273a2000+0xddb1) [0x7fba273afdb1]
5: /usr/lib/xorg/modules/extensions/libglx.so (0x7fba273a2000+0x3c223) [0x7fba273de223]
6: /usr/bin/Xorg (0x7fba2abef000+0x52e61) [0x7fba2ac41e61]
7: /usr/bin/Xorg (0x7fba2abef000+0x41ec5) [0x7fba2ac30ec5]
8: /lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main+0xf5) [0x7fba28b9f995]
9: /usr/bin/Xorg (0x7fba2abef000+0x4219d) [0x7fba2ac3119d]

Segmentation fault at address 0x2d6a83b0

I'll see if I can get a better backtrace by installing the dbg packages.

No errors appear in the dmesg output - this appears to be a userspace crash.

In , Philip Armstrong (phil-ubuntu) wrote :

Turns out the real problem is that FTL bundles a version of libstdc++ that the DRI drivers won't link against.

It looks like the net result is that *no* DRI drivers (not even swrast) can be loaded, and the Xserver dies when trying to invoke the first GLX call.

Here's the output of the program with LIBGL_DEBUG=verbose :

$ cat libgl_debug-output2.txt
libGL: OpenDriver: trying /usr/lib/x86_64-linux-gnu/dri/tls/r600_dri.so
libGL: OpenDriver: trying /usr/lib/x86_64-linux-gnu/dri/r600_dri.so
libGL error: dlopen /usr/lib/x86_64-linux-gnu/dri/r600_dri.so failed (/home/phil/games/FTL/data/amd64/lib/libstdc++.so.6: version `GLIBCXX_3.4.15' not found (required by /usr/lib/x86_64-linux-gnu/libLLVM-3.2.so.1))
libGL: OpenDriver: trying ${ORIGIN}/dri/tls/r600_dri.so
libGL: OpenDriver: trying ${ORIGIN}/dri/r600_dri.so
libGL error: dlopen ${ORIGIN}/dri/r600_dri.so failed (/home/phil/games/FTL/data/amd64/lib/libstdc++.so.6: version `GLIBCXX_3.4.15' not found (required by /usr/lib/x86_64-linux-gnu/libLLVM-3.2.so.1))
libGL: OpenDriver: trying /usr/lib/dri/tls/r600_dri.so
libGL: OpenDriver: trying /usr/lib/dri/r600_dri.so
libGL error: dlopen /usr/lib/dri/r600_dri.so failed (/usr/lib/dri/r600_dri.so: cannot open shared object file: No such file or directory)
libGL error: unable to load driver: r600_dri.so
libGL error: driver pointer missing
libGL error: failed to load driver: r600
libGL: OpenDriver: trying /usr/lib/x86_64-linux-gnu/dri/tls/swrast_dri.so
libGL: OpenDriver: trying /usr/lib/x86_64-linux-gnu/dri/swrast_dri.so
libGL error: dlopen /usr/lib/x86_64-linux-gnu/dri/swrast_dri.so failed (/home/phil/games/FTL/data/amd64/lib/libstdc++.so.6: version `GLIBCXX_3.4.15' not found (required by /usr/lib/x86_64-linux-gnu/libLLVM-3.2.so.1))
libGL: OpenDriver: trying ${ORIGIN}/dri/tls/swrast_dri.so
libGL: OpenDriver: trying ${ORIGIN}/dri/swrast_dri.so
libGL error: dlopen ${ORIGIN}/dri/swrast_dri.so failed (/home/phil/games/FTL/data/amd64/lib/libstdc++.so.6: version `GLIBCXX_3.4.15' not found (required by /usr/lib/x86_64-linux-gnu/libLLVM-3.2.so.1))
libGL: OpenDriver: trying /usr/lib/dri/tls/swrast_dri.so
libGL: OpenDriver: trying /usr/lib/dri/swrast_dri.so
libGL error: dlopen /usr/lib/dri/swrast_dri.so failed (/usr/lib/dri/swrast_dri.so: cannot open shared object file: No such file or directory)
libGL error: unable to load driver: swrast_dri.so
libGL error: failed to load driver: swrast
XIO: fatal IO error 11 (Resource temporarily unavailable) on X server ":3"
      after 10305 requests (10305 known processed) with 0 events remaining.

If I remove the bundled libstdc++.so & use the system one then everything works as expected.

Obviously this is still an Xorg crash bug though: the server ought not to crash if a userspace program fails to load a glx driver!
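The log above repeats the same failure for every driver candidate, so it can help to pull out just the library and symbol version involved. The following is a minimal sketch, assuming the message layout is exactly as in the pasted LIBGL_DEBUG=verbose output; the function name find_version_mismatches is made up for this illustration and is not part of Mesa.

```python
import re

# Matches lines of the form seen in the log above:
#   libGL error: dlopen <driver> failed (<lib>: version `<ver>' not found (required by <other lib>))
DLOPEN_VERSION_ERROR = re.compile(
    r"libGL error: dlopen (?P<driver>\S+) failed "
    r"\((?P<lib>[^:]+): version `(?P<version>[^']+)' not found "
    r"\(required by (?P<required_by>[^)]+)\)\)"
)


def find_version_mismatches(log_text):
    """Return one dict per dlopen failure caused by a missing symbol version."""
    return [m.groupdict() for m in DLOPEN_VERSION_ERROR.finditer(log_text)]


if __name__ == "__main__":
    sample = (
        "libGL: OpenDriver: trying /usr/lib/x86_64-linux-gnu/dri/r600_dri.so\n"
        "libGL error: dlopen /usr/lib/x86_64-linux-gnu/dri/r600_dri.so failed "
        "(/home/phil/games/FTL/data/amd64/lib/libstdc++.so.6: version "
        "`GLIBCXX_3.4.15' not found (required by "
        "/usr/lib/x86_64-linux-gnu/libLLVM-3.2.so.1))\n"
    )
    for hit in find_version_mismatches(sample):
        print(f"{hit['driver']}: {hit['lib']} lacks {hit['version']}")
```

Lines that fail for other reasons (for example "cannot open shared object file") are deliberately not matched, so only the libstdc++ version mismatches are reported.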

In , Philip Armstrong (phil-ubuntu) wrote :

Created attachment 82474
gdb backtrace of all threads

Backtrace

In , Philip Armstrong (phil-ubuntu) wrote :

Created attachment 82475
Backtrace with dri symbols

In , Philip Armstrong (phil-ubuntu) wrote :

Changed title to reflect underlying bug: I don't believe that the user should be able to make the XServer process segfault by substituting the wrong libstdc++ library when running an ordinary user process.

In , Marek Olšák (maraeo) wrote :

The solution is simple: don't use an older libstdc++. Lots of closed source apps do that, which breaks them if the Mesa driver was linked against a newer version. There is nothing we can do about that.
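To see whether a bundled libstdc++ is older than what the system libraries expect, one rough check is to list the GLIBCXX version tags it contains. The sketch below is only a byte-level string scan (similar in spirit to running strings on the file), not a real parse of the ELF version definitions, and the helper names glibcxx_versions and provides are invented for this example.

```python
import re

# Version tags look like GLIBCXX_3.4.15; names such as GLIBCXX_FORCE_NEW are skipped.
GLIBCXX_TAG = re.compile(rb"GLIBCXX_(\d+(?:\.\d+)*)")


def glibcxx_versions(blob):
    """Return the sorted, de-duplicated GLIBCXX versions found in raw bytes."""
    found = {
        tuple(int(part) for part in match.group(1).split(b"."))
        for match in GLIBCXX_TAG.finditer(blob)
    }
    return sorted(found)


def provides(blob, wanted):
    """True if the bytes mention the given tag, e.g. 'GLIBCXX_3.4.15'."""
    want = tuple(int(part) for part in wanted.split("_", 1)[1].split("."))
    return want in glibcxx_versions(blob)


if __name__ == "__main__":
    import sys

    # Usage: python check_glibcxx.py <path to a libstdc++.so.6> GLIBCXX_3.4.15
    with open(sys.argv[1], "rb") as handle:
        data = handle.read()
    print("newest tag found:", glibcxx_versions(data)[-1:])
    print(sys.argv[2], "present:", provides(data, sys.argv[2]))
```

Running it once against the bundled library and once against the system one should show which of the two lacks the tag named in the dlopen error.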

In , Marek Olšák (maraeo) wrote :

Oh and by the way, the X crash seems to be caused by indirect rendering, which had been broken according to what Keith Packard said at XDC2012, IIRC. The issue will be trivially resolved by nuking indirect rendering if I understood Keith's plan correctly, leaving you with no rendering whatsoever and an even stronger incentive to delete the old libstdc++.

In , Philip Armstrong (phil-ubuntu) wrote :

(In reply to comment #5)
> The solution is simple: don't use an older libstdc++. Lots of closed source
> apps do that, which breaks them if the Mesa driver was linked against a
> newer version. There is nothing we can do about that.

Obviously the solution is simple: I've already implemented it.

But seriously? You think a hard Xserver crash caused by a userspace client is NotABug?

I don't believe that it's reasonable for the Xserver to crash in this fashion: Refuse to run the application? Sure. Fall back to software rendering? Why not. Segfault and kill the entire desktop? That doesn't seem very user friendly to me frankly.

In , Marek Olšák (maraeo) wrote :

Sorry for my hastiness. If you have a fix, that's great! Feel free to send it to the appropriate mailing list. Thanks!

In , Philip Armstrong (phil-ubuntu) wrote :

Sadly I was only referring to the 'dump the old libstdc++' solution to the immediate problem of the Xserver crashing. I don't have a patch at this point.

If the reality is 'this problem is real, but will be going away in the next Xorg release because that rendering pipeline is going away' so it's a WONTFIX (or rather a WILLFIXINLATERRELEASE perhaps), then maybe that's ok. I do (personally) think crashing is always a bug however: the user shouldn't be able to crash the Xserver like this. (Segfaults smell of potential security issues too, even if in reality anyone with the level of access required to trigger this one can probably punch any number of holes in the system.)

I'll take a look at the code and see what's going on, but if it turns out that the code in question is going away anyway then maybe it should be simply ripped out forthwith if it crashes like this?

In , Idr (idr) wrote :

Is the libstdc++ that the Xserver sees replaced? If the server is picking up the wrong libstdc++... yeah, don't do that. You wouldn't replace parts of your car engine with random parts from a different model, would you? :)

If the driver loaded by the server is still using the correct system libstdc++, it should work fine. Two things to try:

1. Try running the app with LIBGL_ALWAYS_INDIRECT=y. Does it still crash the server?

2. Try collecting an apitrace of the application. You'll probably have to run it via ssh or from the console. Otherwise everything will be forcibly killed when the server dies, and you won't get a trace. It might be sketchy anyway. Does replaying the trace on an unmangled system still crash the server?

In , Philip Armstrong (phil-ubuntu) wrote :

> Is the libstdc++ that the Xserver sees replaced? If the server is picking up the wrong libstdc++... yeah, don't do that. You wouldn't replace parts of your car engine with random parts from a different model, would you? :)

Nope, the Xserver is being linked against the system libstdc++ - it's being launched by gdm3 in a completely stock fashion.

The only place the older libstdc++ is being used is when the binary in question is run: the shell script wrapper sets LD_LIBRARY_PATH to point to a directory of support libs, including the old libstdc++. I'm running it from a terminal which in turn is running on the desktop of the original Xsession launched by gdm3.
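As a small illustration of why the wrapper's setting should not reach the Xserver: an environment variable such as LD_LIBRARY_PATH is inherited only by the process the wrapper launches, not by processes that are already running. This is a sketch under that assumption; wrapper_env is a hypothetical helper and /opt/example-game/lib is a made-up directory standing in for the game's bundled library folder.

```python
import os
import subprocess
import sys


def wrapper_env(base_env, lib_dir):
    """Return a copy of base_env with lib_dir prepended to LD_LIBRARY_PATH."""
    env = dict(base_env)
    existing = env.get("LD_LIBRARY_PATH", "")
    env["LD_LIBRARY_PATH"] = lib_dir + (os.pathsep + existing if existing else "")
    return env


if __name__ == "__main__":
    # Hypothetical path; the real wrapper script's directory is not reproduced here.
    child_env = wrapper_env(os.environ, "/opt/example-game/lib")
    child = subprocess.run(
        [sys.executable, "-c", "import os; print(os.environ['LD_LIBRARY_PATH'])"],
        env=child_env,
        capture_output=True,
        text=True,
        check=True,
    )
    # Only the child sees the extra directory; this process is unchanged.
    print("child sees: ", child.stdout.strip())
    print("parent sees:", os.environ.get("LD_LIBRARY_PATH"))
```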

If you look at the error messages from the program, it appears that the r600_dri.so (or any of the other mesa drivers) can't load as a result, because they're trying to link against the old libstdc++ (thanks to the LD_LIBRARY_PATH). I suspect the Xserver crashes because it tries to call into them anyway, despite the fact that the dlopen() call failed.

I'll try the INDIRECT thing in the morning, if I get a chance. I doubt the API trace will kill the Xserver, because removing the old libstdc++ from the LD_LIBRARY_PATH of the binary works just fine, although I suppose the binary could be looking at GL features and changing its behaviour depending on what's available: this is doubtful though as the OpenGL usage is very basic. It's just texture blits and scaling from watching the program in action. Can't hurt to try of course!

In , Philip Armstrong (phil-ubuntu) wrote :

NB. To put this another way, why is the Xserver letting a userspace program decide which libraries it should link its own glx drivers against? Isn't that asking for trouble?

In , Alan Coopersmith (alan-coopersmith) wrote :

(In reply to comment #12)
> NB. To put this another way, why is the Xserver letting a userspace program
> decide which libraries it should link its own glx drivers against?

It shouldn't, unless that program is doing something like ldconfig to change
the global linker configuration underneath the X server - the X server relies
on the system loader & dlopen() to find its libraries.

In , Philip Armstrong (phil-ubuntu) wrote :

Given that the program is being run as an ordinary unprivileged user, it shouldn't be playing games with ldconfig.

In , Idr (idr) wrote :

(In reply to comment #14)
> Given that the program is being run as an ordinary unprivileged user, it
> shouldn't be playing games with ldconfig.

It seems unlikely that it is, and that's why I've asked for those tests. Removing the actual application and the old libstdc++ from the equation (by using the apitrace with force indirect-rendering) will confirm whether or not this is a legit Xserver (or Mesa driver) bug or a system configuration issue.

In , Philip Armstrong (phil-ubuntu) wrote :

So, running the Xserver as usual (ie, unchanged from stock install) & running FTL linked against the old libstdc++ but with LIBGL_ALWAYS_INDIRECT=y causes the Xserver to crash as before.

I'm fairly sure that the setting is having an effect, because if I also set LIBGL_DEBUG=verbose, I don't get any extra output, whereas if I just set LIBGL_DEBUG & not LIBGL_ALWAYS_INDIRECT, I get the expected debugging output as seen above.

I'm installing apitrace as I type.

In , Philip Armstrong (phil-ubuntu) wrote :

Replaying the crashing trace recorded by apitrace does not cause the Xserver to crash, which seems unsurprising since everything is fine if the binary in question is linked against the system libstdc++ instead of the older bundled one. During the replay, we're recreating the latter situation, so it seems consistent that the Xserver is fine.

In , Michel Dänzer (michel-daenzer) wrote :

(In reply to comment #17)
> Replaying the crashing trace recorded by apitrace does not cause the Xserver
> to crash, [...]

What if you replay it with LIBGL_ALWAYS_INDIRECT=y?

I think this is just a normal indirect rendering bug, and libstdc++ only matters insofar as the bad one causes the app to fall back to indirect rendering.

In , Philip Armstrong (phil-ubuntu) wrote :

Ah, I'd missed that case!

In , Philip Armstrong (phil-ubuntu) wrote :

(Hit post accidentally there)

OK, so if I just set LIBGL_ALWAYS_INDIRECT, and link the binary against the usual system libraries, not the bundled ones (verified with ldd & running the binary directly, not via any shellscripts) then the Xserver crashes.

Replaying the trace doesn't seem to trigger the crash though. It claims that the final call is 'incomplete' so perhaps I'm missing some crucial data?

In , Philip Armstrong (phil-ubuntu) wrote :

Running apitrace on the trace generated by running

$ DISPLAY=:0.0 LIBGL_ALWAYS_INDIRECT=y apitrace trace ./amd64/bin/FTL

from an ssh shell (which kills the Xserver)

gives me the following output when I replay the trace:

$ LIBGL_ALWAYS_INDIRECT=y apitrace replay FTL.trace
apitrace: warning: caught signal 11
11813: error: caught an unhandled exception
apitrace: info: taking default action for signal 11

but the Xserver remains live. The trace is 85 MB or so.

In , Philip Armstrong (phil-ubuntu) wrote :

tail of the trace dump is as follows:

11773 glGenTextures(n = 1, textures = &1870)
11774 glBindTexture(target = GL_TEXTURE_2D, texture = 1870)
11775 glTexImage2D(target = GL_TEXTURE_2D, level = 0, internalformat = GL_RGBA, width = 32, height = 32, border = 0, format = GL_RGBA, type = GL_UNSIGNED_BYTE, pixels = blob(4096))
11776 glTexParameterf(target = GL_TEXTURE_2D, pname = GL_TEXTURE_MIN_FILTER, param = GL_NEAREST)
11777 glTexParameterf(target = GL_TEXTURE_2D, pname = GL_TEXTURE_MAG_FILTER, param = GL_NEAREST)
11778 glGenLists(range = 256) = 1
11779 glGenTextures(n = 256, textures = ?) // incomplete
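With an 85 MB trace it is easier to locate the call that never returned programmatically than by eye. The following is a minimal sketch, assuming each dumped call sits on one line in the layout shown in the tail above (call number, function name, arguments, optional trailing marker); incomplete_calls is a hypothetical helper, not an apitrace feature.

```python
import re

# One dumped call per line: "<number> <function>(<arguments>)<optional trailer>"
CALL_LINE = re.compile(r"^(?P<no>\d+)\s+(?P<name>\w+)\((?P<args>.*)\)(?P<rest>.*)$")


def incomplete_calls(dump_text):
    """Return (call number, function name) for calls marked '// incomplete'."""
    result = []
    for line in dump_text.splitlines():
        match = CALL_LINE.match(line.strip())
        if match and match.group("rest").strip().endswith("// incomplete"):
            result.append((int(match.group("no")), match.group("name")))
    return result


if __name__ == "__main__":
    tail = (
        "11778 glGenLists(range = 256) = 1\n"
        "11779 glGenTextures(n = 256, textures = ?) // incomplete\n"
    )
    for number, name in incomplete_calls(tail):
        print(f"call {number} ({name}) did not complete")
```

On the tail shown here this singles out the glGenTextures(n = 256, ...) call as the last request issued before the crash.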

In , Philip Armstrong (phil-ubuntu) wrote :

Sorry: the mismatch in numbers is because the replay came from a different dump. Running it on the dump I posted gives the expected

$ ~/Code/apitrace/build/apitrace replay FTL.trace
apitrace: warning: caught signal 11
11779: error: caught an unhandled exception
apitrace: info: taking default action for signal 11

Misha Bazanov (bmw-) wrote :
Apport retracing service (apport) wrote :

StacktraceTop:
 _mesa_GenTextures (n=256, textures=<optimized out>) at ../../../../../src/mesa/main/texobj.c:1028
 __glXDisp_GenTextures (cl=0x7fe953918270, pc=<optimized out>) at ../../glx/indirect_dispatch.c:2851
 __glXDispatch (client=<optimized out>) at ../../glx/glxext.c:581
 Dispatch () at ../../dix/dispatch.c:432
 main (argc=9, argv=0x7fff261547d8, envp=<optimized out>) at ../../dix/main.c:298

Apport retracing service (apport) wrote : Stacktrace.txt
Apport retracing service (apport) wrote : StacktraceSource.txt
Apport retracing service (apport) wrote : ThreadStacktrace.txt
Changed in xorg-server (Ubuntu):
importance: Undecided → Medium
tags: removed: need-amd64-retrace
Timo Aaltonen (tjaalton)
information type: Private → Public
affects: xorg-server (Ubuntu) → mesa (Ubuntu)
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in mesa (Ubuntu):
status: New → Confirmed
Changed in mesa:
importance: Unknown → Medium
status: Unknown → Won't Fix
penalvch (penalvch)
tags: added: latest-bios-f5
penalvch (penalvch) wrote :

Misha Bazanov / Anton Sudak, thank you for reporting this and helping make Ubuntu better.

As per https://wiki.ubuntu.com/Releases, Ubuntu 13.10 reached EOL on July 17, 2014.

Is this reproducible with a supported release?

Changed in mesa (Ubuntu):
status: Confirmed → Incomplete
Oibaf (oibaf) wrote :

No reply to previous comment after 7 years, and the upstream bug was closed.

Changed in mesa (Ubuntu):
status: Incomplete → Invalid