Shutdown crash due to possible mesa race at startup

Bug #1267893 reported by Chris Coulson
This bug affects 1 person
Affects          Status          Importance    Assigned to          Milestone
Oxide            Fix Released    Critical      Unassigned
mesa (Ubuntu)    Triaged         Medium        Maarten Lankhorst

Bug Description

When running the oxide test suite, I saw this crash:

#0 glx_display_free (priv=priv@entry=0x0) at ../../../../src/glx/glxext.c:226
#1 0x00002aaaae215305 in __glXCloseDisplay (dpy=0x624970, codes=<optimised out>) at ../../../../src/glx/glxext.c:272
#2 0x00002aaaaeee44f2 in XCloseDisplay (dpy=0x624970) at ../../src/ClDisplay.c:65
#3 0x00002aaab3ac8c3a in QXcbConnection::~QXcbConnection (this=0x6245d0, __in_chrg=<optimised out>) at qxcbconnection.cpp:382
#4 0x00002aaab3ac8f89 in QXcbConnection::~QXcbConnection (this=0x6245d0, __in_chrg=<optimised out>) at qxcbconnection.cpp:388
#5 0x00002aaab3accfd6 in qDeleteAll<QList<QXcbConnection*>::const_iterator> (end=..., begin=...) at ../../../../include/QtCore/../../src/corelib/tools/qalgorithms.h:321
#6 qDeleteAll<QList<QXcbConnection*> > (c=...) at ../../../../include/QtCore/../../src/corelib/tools/qalgorithms.h:329
#7 QXcbIntegration::~QXcbIntegration (this=0x61a7d0, __in_chrg=<optimised out>) at qxcbintegration.cpp:127
#8 0x00002aaab3acd1e9 in QXcbIntegration::~QXcbIntegration (this=0x61a7d0, __in_chrg=<optimised out>) at qxcbintegration.cpp:128
#9 0x00002aaaaafa82f6 in QGuiApplicationPrivate::~QGuiApplicationPrivate (this=0x6183d0, __in_chrg=<optimised out>) at kernel/qguiapplication.cpp:1067
#10 0x00002aaaaafa8469 in QGuiApplicationPrivate::~QGuiApplicationPrivate (this=0x6183d0, __in_chrg=<optimised out>) at kernel/qguiapplication.cpp:1070
#11 0x00002aaaac7d0b66 in cleanup (pointer=<optimised out>) at ../../include/QtCore/../../src/corelib/tools/qscopedpointer.h:63
#12 ~QScopedPointer (this=0x7fffffffd568, __in_chrg=<optimised out>) at ../../include/QtCore/../../src/corelib/tools/qscopedpointer.h:99
#13 QObject::~QObject (this=0x7fffffffd560, __in_chrg=<optimised out>) at kernel/qobject.cpp:750
#14 0x00002aaaaafa814c in QGuiApplication::~QGuiApplication (this=0x7fffffffd560, __in_chrg=<optimised out>) at kernel/qguiapplication.cpp:387
#15 0x000000000040080d in main (argc=3, argv=0x7fffffffd668) at tst_qmltests.cc:34

What's happening is that __glXCloseDisplay() is being called twice, because the glx extension seems to be registered twice on the Display handle.

This appears to be happening because of a race in __glXInitialize() between the main thread and Chromium's GPU thread. The main thread calls it for the first time from here:

#0 __glXInitialize (dpy=dpy@entry=0x624970) at ../../../../src/glx/glxext.c:798
#1 0x00002aaaae211a27 in glXGetFBConfigs (dpy=0x624970, screen=0, nelements=nelements@entry=0x7fffffffce7c) at ../../../../src/glx/glxcmds.c:1663
#2 0x00002aaaae2124e3 in glXChooseFBConfig (dpy=<optimised out>, screen=<optimised out>, attribList=0x68a388, nitems=0x7fffffffcfec) at ../../../../src/glx/glxcmds.c:1623
#3 0x00002aaab3ae928f in qglx_findConfig (display=display@entry=0x624970, screen=screen@entry=0, format=..., drawableBit=drawableBit@entry=1) at glxconvenience/qglxconvenience.cpp:126
#4 0x00002aaab3ae93dc in qglx_findVisualInfo (display=0x624970, screen=0, format=format@entry=0x6f3fc8) at glxconvenience/qglxconvenience.cpp:171
#5 0x00002aaab3ad978d in QXcbWindow::create (this=0x6f3f70) at qxcbwindow.cpp:271
#6 0x00002aaab3accd11 in QXcbIntegration::createPlatformWindow (this=<optimised out>, window=0x64b950) at qxcbintegration.cpp:132
#7 0x00002aaaaafb259e in QWindow::create (this=this@entry=0x64b950) at kernel/qwindow.cpp:323
#8 0x00002aaaaafb2e58 in QWindow::setVisible (this=this@entry=0x64b950, visible=visible@entry=true) at kernel/qwindow.cpp:269
#9 0x00002aaaaafb3429 in QWindow::showNormal (this=this@entry=0x64b950) at kernel/qwindow.cpp:1524
#10 0x00002aaaaafb344e in QWindow::show (this=this@entry=0x64b950) at kernel/qwindow.cpp:1455
#11 0x00002aaaaacdbf37 in quick_test_main (argc=<optimised out>, argv=argv@entry=0x7fffffffd668, name=name@entry=0x400a00 "qmltests",
    sourceDir=sourceDir@entry=0x4009c8 "/home/chr1s/src/oxide/oxide/qt/tests/qmltests/data") at quicktest.cpp:341
#12 0x0000000000400801 in main (argc=3, argv=0x7fffffffd668) at tst_qmltests.cc:34

... and Chromium's GPU thread calls it for the first time from here:

#0 __glXInitialize (dpy=0x624970) at ../../../../src/glx/glxext.c:798
#1 0x00002aaaae211851 in glXQueryVersion (dpy=<optimised out>, major=0x2aaaed43d7ec, minor=0x2aaaed43d7f0) at ../../../../src/glx/glxcmds.c:487
#2 0x00002aaac606b74b in gfx::GLXApiBase::glXQueryVersionFn (this=0x2aab1c008480, dpy=0x624970, maj=0x2aaaed43d7ec, min=0x2aaaed43d7f0)
    at /home/chr1s/src/oxide/oxide/chromium/src/out/Debug/obj/gen/ui/gl/gl_bindings_autogen_glx.cc:724
#3 0x00002aaac6061c11 in gfx::GLSurfaceGLX::InitializeOneOff () at chromium/src/ui/gl/gl_surface_glx.cc:407
#4 0x00002aaac601b4ac in gfx::GLSurface::InitializeOneOffInternal () at chromium/src/ui/gl/gl_surface_x11.cc:57
#5 0x00002aaac6018890 in gfx::GLSurface::InitializeOneOff () at chromium/src/ui/gl/gl_surface.cc:58
#6 0x00002aaac96878dd in content::GpuChildThread::GpuChildThread (this=0x2aab1c001f40, channel_id=..., share_group=0x0) at chromium/src/content/gpu/gpu_child_thread.cc:76
#7 0x00002aaac968b386 in content::InProcessGpuThread::Init (this=0x2aab0805c720) at chromium/src/content/gpu/in_process_gpu_thread.cc:29
#8 0x00002aaac4af5a1e in base::Thread::ThreadMain (this=0x2aab0805c720) at chromium/src/base/threading/thread.cc:218
#9 0x00002aaac4ae310a in base::(anonymous namespace)::ThreadFunc (params=0x2aaad37d0020) at chromium/src/base/threading/platform_thread_posix.cc:80
#10 0x00002aaaae762f6e in start_thread (arg=0x2aaaed43f700) at pthread_create.c:311
#11 0x00002aaaabacb9cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

__glXInitialize() first enters a global lock and then searches a global list for the display handle. It returns the corresponding glx_display if it is found. All good so far, however...

If the display is not found, it releases the lock and begins to initialize a glx_display. Whilst it is doing this, another thread can come along and search the global list, not find the corresponding display, and also begin to initialize a glx_display.

After the glx_display has been initialized, the global lock is re-taken and the global list is searched again. If another thread has beaten the current thread to this point (the current display now appears in the list), it frees the glx_display it has just created and returns the one created by the other thread instead.

*However*, __glXInitialize() has already called XInitExtension() and XESetCloseDisplay(), meaning that the extension is now registered twice on the display handle despite __glXInitialize() detecting that another thread has beaten us to it. The result is now a guaranteed crash when calling XCloseDisplay().
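To make the sequence above concrete, here is a minimal single-threaded sketch that replays the interleaving by hand. It is not mesa's code: struct display, struct glx_display, lookup(), init_unlocked(), publish() and hooks_after_race() are all hypothetical stand-ins, and the close_hooks counter merely models what XInitExtension() plus XESetCloseDisplay() do to the real Display. The defer_hook variant shows one possible way to avoid the double registration (only the thread that wins the race registers the hook); it is an assumption for illustration, not necessarily how mesa should or does fix it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, not mesa's or Xlib's real types. */
struct display { int close_hooks; };          /* counts registered close hooks */
struct glx_display { struct display *dpy; };

static struct glx_display *registry;          /* models the global list (one slot) */

/* Step 1 (done under the global lock in the report): search the list. */
static struct glx_display *lookup(struct display *dpy)
{
    return (registry != NULL && registry->dpy == dpy) ? registry : NULL;
}

/* Step 2 (lock released): initialise a new entry. In the racy variant this
 * is also where the close hook is registered on the display. */
static void init_unlocked(struct glx_display *priv, struct display *dpy,
                          int defer_hook)
{
    priv->dpy = dpy;
    if (!defer_hook)
        dpy->close_hooks++;                   /* side effect survives losing the race */
}

/* Step 3 (lock re-taken): publish the entry unless another thread won. */
static struct glx_display *publish(struct glx_display *priv, int defer_hook)
{
    struct glx_display *winner = lookup(priv->dpy);
    if (winner != NULL)
        return winner;                        /* loser discards its own entry */
    if (defer_hook)
        priv->dpy->close_hooks++;             /* only the winner registers */
    registry = priv;
    return priv;
}

/* Replays "A and B both miss, both initialise, both try to publish" and
 * returns how many close hooks ended up registered on the display. */
int hooks_after_race(int defer_hook)
{
    struct display dpy = { 0 };
    struct glx_display a, b;
    struct glx_display *ra, *rb;

    registry = NULL;
    assert(lookup(&dpy) == NULL);             /* thread A misses */
    assert(lookup(&dpy) == NULL);             /* thread B misses too */
    init_unlocked(&a, &dpy, defer_hook);
    init_unlocked(&b, &dpy, defer_hook);
    ra = publish(&a, defer_hook);             /* A wins */
    rb = publish(&b, defer_hook);             /* B sees A's entry and backs off */
    assert(ra == &a && rb == &a);
    registry = NULL;                          /* do not keep a pointer to the stack */
    return dpy.close_hooks;
}
```

Calling hooks_after_race(0) replays the racy ordering described above and leaves two hooks on the display, which corresponds to __glXCloseDisplay() running twice at XCloseDisplay() time; hooks_after_race(1) leaves one.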

Just to verify this, I set a breakpoint on glx_display_free(), and it is hit during startup on the GPU thread:

#0 glx_display_free (priv=priv@entry=0x2aab180083b0) at ../../../../src/glx/glxext.c:225
#1 0x00002aaaae215724 in __glXInitialize (dpy=0x624970) at ../../../../src/glx/glxext.c:893
#2 0x00002aaaae211851 in glXQueryVersion (dpy=<optimised out>, major=0x2aaaed0307ec, minor=0x2aaaed0307f0) at ../../../../src/glx/glxcmds.c:487
#3 0x00002aaac606b74b in gfx::GLXApiBase::glXQueryVersionFn (this=0x2aab18008480, dpy=0x624970, maj=0x2aaaed0307ec, min=0x2aaaed0307f0)
    at /home/chr1s/src/oxide/oxide/chromium/src/out/Debug/obj/gen/ui/gl/gl_bindings_autogen_glx.cc:724
#4 0x00002aaac6061c11 in gfx::GLSurfaceGLX::InitializeOneOff () at chromium/src/ui/gl/gl_surface_glx.cc:407
#5 0x00002aaac601b4ac in gfx::GLSurface::InitializeOneOffInternal () at chromium/src/ui/gl/gl_surface_x11.cc:57
#6 0x00002aaac6018890 in gfx::GLSurface::InitializeOneOff () at chromium/src/ui/gl/gl_surface.cc:58
#7 0x00002aaac96878dd in content::GpuChildThread::GpuChildThread (this=0x2aab18001f40, channel_id=..., share_group=0x0) at chromium/src/content/gpu/gpu_child_thread.cc:76
#8 0x00002aaac968b386 in content::InProcessGpuThread::Init (this=0x2aab0805c6e0) at chromium/src/content/gpu/in_process_gpu_thread.cc:29
#9 0x00002aaac4af5a1e in base::Thread::ThreadMain (this=0x2aab0805c6e0) at chromium/src/base/threading/thread.cc:218
#10 0x00002aaac4ae310a in base::(anonymous namespace)::ThreadFunc (params=0x2aaad37d0020) at chromium/src/base/threading/platform_thread_posix.cc:80
#11 0x00002aaaae762f6e in start_thread (arg=0x2aaaed032700) at pthread_create.c:311
#12 0x00002aaaabacb9cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Chris Coulson (chrisccoulson) wrote :

Maarten, is this a mesa bug? :)

Changed in oxide:
importance: Undecided → High
status: New → Triaged
Changed in oxide:
importance: High → Critical
Changed in oxide:
assignee: nobody → Maarten Lankhorst (mlankhorst)
Changed in mesa (Ubuntu):
status: New → Triaged
assignee: nobody → Maarten Lankhorst (mlankhorst)
Changed in oxide:
assignee: Maarten Lankhorst (mlankhorst) → nobody
Robert Hooker (sarvatt) wrote :
Maarten Lankhorst (mlankhorst) wrote :

It seems to be a mesa bug.

Chris Coulson (chrisccoulson) wrote :

Note that we've worked around this temporarily in Oxide.

Changed in oxide:
status: Triaged → Fix Released
penalvch (penalvch)
Changed in mesa (Ubuntu):
importance: Undecided → Medium