Ubuntu

xserver crashes when closing application using clutter

Reported by Laurent Bigonville on 2010-03-28
92
This bug affects 18 people
Affects Status Importance Assigned to Milestone
X.Org X server
In Progress
Critical
xorg (Debian)
Fix Released
Unknown
xorg-server (Ubuntu)
High
Bryce Harrington
Lucid
High
Bryce Harrington

Bug Description

Binary package hint: xorg

When closing empathy (ppa version) after opening the map windows, the whole xserver crashes with:

Backtrace:
0: /usr/bin/X (xorg_backtrace+0x28) [0x4a2648]
1: /usr/bin/X (0x400000+0x6538d) [0x46538d]
2: /lib/libpthread.so.0 (0x7f7d4f3fe000+0xf8f0) [0x7f7d4f40d8f0]
3: /usr/bin/X (dixLookupPrivate+0xa) [0x44a4fa]
4: /usr/lib/xorg/modules/extensions/libdri2.so (DRI2DestroyDrawable+0x37) [0x7f7d4c5611e7]
5: /usr/lib/xorg/modules/extensions/libglx.so (0x7f7d4cd82000+0x40fb4) [0x7f7d4cdc2fb4]
6: /usr/lib/xorg/modules/extensions/libglx.so (0x7f7d4cd82000+0x36bb0) [0x7f7d4cdb8bb0]
7: /usr/bin/X (FreeClientResources+0xd3) [0x44d6d3]
8: /usr/bin/X (CloseDownClient+0x60) [0x42c370]
9: /usr/bin/X (0x400000+0x30a60) [0x430a60]
10: /usr/bin/X (0x400000+0x2613a) [0x42613a]
11: /lib/libc.so.6 (__libc_start_main+0xfd) [0x7f7d4e0f6c4d]
12: /usr/bin/X (0x400000+0x25ce9) [0x425ce9]
Segmentation fault at address 0x290

Related branches

Update:

I first made sure that my application only used GLX >=1.3 API. This would still crash. I then made sure it only used GLX <1.3, and now I have no crashes anymore. I'm planning to write a simple program that reproduces the crash/bug.

Created an attachment (id=34041)
Reproduces the crash

Here is a program that should crash the server eventually (just run it a few times, it might not happen every time). Also, it seems to happen more often if the program is exited the normal way (e.g. the window close button) than if it is killed (with some signal).

I think it is interesting that the server seems to *not* crash if the make context current call is removed.

Another thing, I now use version 2:1.7.5-1 in Debian.

Now please, someone reply. Currently, the only thing I've got for sending in this report is spam to my email address.

Is your program using indirect rendering?

Created an attachment (id=34244)
glxinfo output

glxinfo says "direct rendering: Yes".

Adding this piece of code just after the context is created in the example program:

printf ("Direct: %s\n", (glXIsDirect (dpy, glxcontext) == True) ? "yes" : "no");

Gives output "yes" in both the cases when it does not crash and the cases it
does crash.

I have attached the output of glxinfo too.

Exiting any Clutter 1.2 application causes the same crash. I have the same environment as the reporter but newer:

xserver-xorg-core: 2:1.7.5.902-1
xserver-xorg-video-intel: 2:2.10.903-1
linux-image-2.6.33-2-amd64: 2.6.33-1~experimental.4

I have tested running the program (attached before, with modification to print "Direct: yes/no") on my other computer (same server version, but software rasterizer). The program crashes, but not the server. The program outputs the following:

failed to create drawable
Direct: yes
Segmentation fault

Happens also when I run remote programs (it also says "Direct: yes" in that case, which makes no sense, I think).

Top of backtrace:

#0 0xb77d305f in XCreateDrawable (psc=0x80aa2f8, xDrawable=56623137,
    drawable=56623137, modes=0x80b4480) at drisw_glx.c:89
        gcvalues = {function = 3, plane_mask = 3450467868,
          foreground = 3540936910, background = 1915168984,
          line_width = -928079311, line_style = -2102111902,
          cap_style = 1901201503, join_style = -1946909669,
          fill_style = 504982920, fill_rule = -1981880903,
          arc_mode = 742396610, tile = 3411661115, stipple = 3456143762,
          ts_x_origin = 2135257643, ts_y_origin = -1073749304,
          font = 3221217988, subwindow_mode = -1073749324,
          graphics_exposures = 0, clip_x_origin = 63, clip_y_origin = 1,
          clip_mask = 3221218224, dash_offset = 1870652674,
          dashes = -48 '\320'}
        visTemp = {visual = 0xbfffe494, visualid = 0, screen = 0,
          depth = 134938528, class = 56623137, red_mask = 3221218296,
          green_mask = 3078446832, blue_mask = 56623137,
          colormap_size = 134934184, bits_per_rgb = -1073748968}
        num_visuals = 0
#1 driCreateDrawable (psc=0x80aa2f8, xDrawable=56623137, drawable=56623137,
    modes=0x80b4480) at drisw_glx.c:321
        pdraw = <value optimized out>
        swrast = <value optimized out>
#2 0xb77b175e in FetchDRIDrawable (dpy=<value optimized out>,
    glxDrawable=56623137, gc=0x80aeea8) at glxcurrent.c:294
        priv = <value optimized out>
        pdraw = <value optimized out>
        psc = <value optimized out>
#3 0xb77b1ab6 in MakeContextCurrent (dpy=0x8060a30, draw=56623137,
    read=56623137, gc=0x80aeea8) at glxcurrent.c:370
        pdraw = 0xb7fff697
        pread = <value optimized out>
        reply = {type = 14 '\016', unused = 0 '\000', sequenceNumber = 0,
          length = 0, contextTag = 134521636, pad2 = 1, pad3 = 3078236324,
          pad4 = 134934184, pad5 = 3221218996, pad6 = 134899480}
        oldGC = 0xb77f77e0
        opcode = 151 '\227'
        oldOpcode = <value optimized out>
        bindReturnValue = <value optimized out>
        state = <value optimized out>
#4 0x08048dd9 in configure ()
...

In my previous comment I meant to say "run the program remotely" instead of "run remote programs".

*** Bug 27326 has been marked as a duplicate of this bug. ***

Laurent Bigonville (bigon) wrote :

Binary package hint: xorg

When closing empathy (ppa version) after opening the map windows, the whole xserver crashes with:

Backtrace:
0: /usr/bin/X (xorg_backtrace+0x28) [0x4a2648]
1: /usr/bin/X (0x400000+0x6538d) [0x46538d]
2: /lib/libpthread.so.0 (0x7f7d4f3fe000+0xf8f0) [0x7f7d4f40d8f0]
3: /usr/bin/X (dixLookupPrivate+0xa) [0x44a4fa]
4: /usr/lib/xorg/modules/extensions/libdri2.so (DRI2DestroyDrawable+0x37) [0x7f7d4c5611e7]
5: /usr/lib/xorg/modules/extensions/libglx.so (0x7f7d4cd82000+0x40fb4) [0x7f7d4cdc2fb4]
6: /usr/lib/xorg/modules/extensions/libglx.so (0x7f7d4cd82000+0x36bb0) [0x7f7d4cdb8bb0]
7: /usr/bin/X (FreeClientResources+0xd3) [0x44d6d3]
8: /usr/bin/X (CloseDownClient+0x60) [0x42c370]
9: /usr/bin/X (0x400000+0x30a60) [0x430a60]
10: /usr/bin/X (0x400000+0x2613a) [0x42613a]
11: /lib/libc.so.6 (__libc_start_main+0xfd) [0x7f7d4e0f6c4d]
12: /usr/bin/X (0x400000+0x25ce9) [0x425ce9]
Segmentation fault at address 0x290

Laurent Bigonville (bigon) wrote :

I can reproduce it with an intel and an ati card

Changed in xorg-server:
status: Unknown → Confirmed
Bryce Harrington (bryce) on 2010-03-29
affects: xorg (Ubuntu) → xorg-server (Ubuntu)
Timo Aaltonen (tjaalton) wrote :

looks like it's confirmed upstream.

Changed in xorg-server (Ubuntu):
importance: Undecided → Medium
status: New → Confirmed
Robert Hooker (sarvatt) wrote :

Can you reproduce this, then run apport-collect 550218 after the next startup so we can see your hardware information and the logs?

from bug 27326 - this issue is also affecting Clutter 1.2, since we switched to GLX 1.3 API.

Jiri Techet (techet) wrote :

I'm experiencing the very same problem with my Lenovo R61 notebook and 64bit Lucid. (Unfortunatelly with apport-collect I keep getting "No additional information collected" message with anything being uploaded so I'll try to describe my configuration and how to reproduce the bug manually.).

This is what Xorg says about my graphics card:

(--) PCI:*(0:0:2:0) 8086:2a02:17aa:20b5 Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller rev 12, Mem @ 0xf8100000/1048576, 0xe0000000/268435456, I/O @ 0x00001800/8

This is how to reproduce the bug:

1. Install all required development packages to be able to build clutter.
2. Download and build clutter 1.2.4.
3. Go to tests/interactive and run some of the tests, i.g. test-viewport.
4. Run and close the test repeatably - Xorg will crash eventually when closing the window. (It seems that the probability of crash is higher just after the system boot. But maybe it's my impression only.)

Not sure if it's related, but I also get the following message in the console after executing the test:

do_wait: drmWaitVBlank returned -1, IRQs don't seem to be working correctly.
Try adjusting the vblank_mode configuration parameter.

I have also tested this with desktop PC using Nvidia proprietary drivers and wasn't able to reproduce this behaviour.

Neil J. Patel (njpatel) wrote :

Same thing happens with i965 on a Lenovo t400s, same backtrace. Has been happening since Clutter 1.2.2 landed in Lucid. I've spoken to the Clutter guys and it seems like it's fixed in Fedora 13 as they have a newer Xorg, though I can't confirm. I'll try and test out a F13 live cd this weekend.

I think this is going to be a bigger issue as a few Gnome games have also started using Clutter.

Changed in xorg-server (Ubuntu):
assignee: nobody → Bryce Harrington (bryceharrington)
importance: Medium → High
Robert Hooker (sarvatt) wrote :

From happyhamster in a duped bug report:
https://bugs.edge.launchpad.net/ubuntu/+source/xorg-server/+bug/549576/comments/6

"I narrowed it down to libclutter-1.0-0_1.2.4-0ubuntu1 I think (and perhaps libclutter-gtk-0.10-0_0.10.4-0ubuntu1). On a fully upgraded system, downgrading to libclutter-1.0-0_1.0.6-0ubuntu1 (and libclutter-gtk-0.10-0_0.10.2-0ubuntu2, else quadrapassel won't start) is enough to make the X crash go away.

Or, start a live session (lucid beta-1), and upgrade only the libclutter-1.0-0 package to see X crash when closing quadrapassel (the crash happens about 1 in 3 tries on my system)."

Changed in xorg (Debian):
status: Unknown → New

Here is a clearer backtrace of the crash in case it helps.

Program received signal SIGSEGV, Segmentation fault.
0x0807fdae in privateExists (privates=0x14f, key=0xffa0fc) at ../../dix/privates.c:79
79 return *key && *privates &&
(gdb) bt full
#0 0x0807fdae in privateExists (privates=0x14f, key=0xffa0fc) at ../../dix/privates.c:79
No locals.
#1 0x0807ffd1 in dixLookupPrivate (privates=0x14f, key=0xffa0fc) at ../../dix/privates.c:162
        ptr = 0x4f0
#2 0x00ff6bf9 in DRI2GetScreen (pScreen=0xffffffff) at ../../../../hw/xfree86/dri2/dri2.c:78
No locals.
#3 0x00ff731b in DRI2DestroyDrawable (pDraw=0xbae9110) at ../../../../hw/xfree86/dri2/dri2.c:344
        ds = 0xa25c568
        pPriv = 0x0
        pWin = 0xffa100
        pPixmap = 0xbae9490
#4 0x0056c7f9 in __glXDRIdrawableDestroy (drawable=0xbae93c0) at ../../glx/glxdri2.c:110
        private = 0xbae93c0
        core = 0x129bd40
#5 0x0055cd31 in DrawableGone (glxPriv=0xbae93c0, xid=67108869) at ../../glx/glxext.c:163
        c = 0x0
#6 0x08094ab3 in FreeClientResources (client=0xb9ec3a8) at ../../dix/resource.c:806
        rtype = 54
        head = 0xb9ec47c
        resources = 0xb9ec468
        this = 0xbae9a10
        j = 5
#7 0x0808fa43 in CloseDownClient (client=0xb9ec3a8) at ../../dix/dispatch.c:3631
        really_close_down = 1
#8 0x08087bc8 in Dispatch () at ../../dix/dispatch.c:424
        clientReady = 0xb8feac8
        result = -1
        client = 0xb9ec3a8
        nready = 0
        icheck = 0x82594fc
        start_tick = 720
#9 0x08066d33 in main (argc=8, argv=0xbfd8efd4, envp=0xbfd8eff8) at ../../dix/main.c:285
        i = 1
        alwaysCheckForInput = {0, 1}

#2 0x00ff6bf9 in DRI2GetScreen (pScreen=0xffffffff)

That doesn't look good.

The backtraces in comments #6 and #10 are not congruent, are these actually the same crash? #10 matches the original reported trace so perhaps is the one to focus on.

(In reply to comment #11)
>
> The backtraces in comments #6 and #10 are not congruent, are these actually the
> same crash? #10 matches the original reported trace so perhaps is the one to
> focus on.
>

The back trace in comment #6 is not from the server, but the previously attached example program that reproduces the bug. When I run it on my laptop with Intel chip the server crashes, but when I run it on my desktop with software rasterizer the server survives but the program crashes.

Applying this commit to debian/ubuntu's xserver 1.7.6 fixes the server segfault when closing clutter apps (with clutter 1.2.4) for me, but the problem still remains with the testcase on this bug. Both cases have identical backtraces to the one in comment 10, and forcing swrast with LIBGL_ALWAYS_SOFTWARE=1 gives an identical backtrace and experience as comment #6 for the test case on comment #2. I should mention I am using intel as well as is everyone I have seen reporting this so far.

http://cgit.freedesktop.org/xorg/xserver/commit/?id=fa5103a02bd509e4a102afdad2ab26cb22210367

dri2: No need to blit from front on DRI2GetBuffers if they're just being reused.
It can be quite an expensive operation, so we're better off not doing
it unless it's totally required.

Signed-off-by: Francisco Jerez <email address hidden>
Signed-off-by: Kristian Høgsberg <email address hidden>
Reviewed-by: Kristian Høgsberg <email address hidden>

The fix suggested in the previous comment does not resolve the crashes in clutter 1.2.4 here. Sorry :/

Can you try this mesa fix?

diff --git a/src/glx/dri2_glx.c b/src/glx/dri2_glx.c
index c4b5cb9..27a700d 100644
--- a/src/glx/dri2_glx.c
+++ b/src/glx/dri2_glx.c
@@ -164,7 +164,7 @@ dri2DestroyDrawable(__GLXDRIdrawable * pdraw)
    const __DRIcoreExtension *core = pdraw->psc->core;

    (*core->destroyDrawable) (pdraw->driDrawable);
- DRI2DestroyDrawable(pdraw->psc->dpy, pdraw->xDrawable);
+ DRI2DestroyDrawable(pdraw->psc->dpy, pdraw->drawable);
    Xfree(pdraw);
 }

Created an attachment (id=34628)
Make DRI2DestroyDrawable more robust if pDraw is invalid

If pDraw is invalid (i.e. we already freed the priv) don't try to use it to look up the screen.

I think this is just a workaround... any comments Kristian?

*** Bug 27406 has been marked as a duplicate of this bug. ***

(In reply to comment #16)
> Created an attachment (id=34628) [details]
> Make DRI2DestroyDrawable more robust if pDraw is invalid
>
> If pDraw is invalid (i.e. we already freed the priv) don't try to use it to
> look up the screen.
>
> I think this is just a workaround... any comments Kristian?

With this patch in a ubuntu lucid environment (xserver 1.7.6, intel 2.9.1+backports, mesa 7.7.1) it still happens, but the probability of occurance is greatly reduced instead of being guaranteed at every closure of quadrapassel. With xorg-edgers packages (mesa 7.9, intel 2.11) on lucid as well as this xserver patch it completely goes away though.

I don't see(In reply to comment #16)
> Created an attachment (id=34628) [details]
> Make DRI2DestroyDrawable more robust if pDraw is invalid
>
> If pDraw is invalid (i.e. we already freed the priv) don't try to use it to
> look up the screen.

I don't see how that can work - if the WindowRec pointed to by pDraw was already freed, DRI2GetDrawable() can't be relied upon.

See e.g. commits 7b6400a1b8d2f228fcbedf17c30a7e3924e4dd2a ('glx: Fix drawable private leak on destroy'), 2075d4bf9e53b8baef0b919da6c44771220cd4a5 ('glx: If a destroyed window is bound to the current context, make it not current.') and 3020b1d43e34fca08cd51f7c7c8ed51497d49ef3 ('glx: Clean up more thoroughly if the drawable of a current context goes away.') for how similar issues were dealt with previously.

BTW, it may not be immediately related to this problem, but I suspect commit 711e26466ae04ae93ff4c48d377d83d68a6320e9 ('DRI2: handle drawable destruction properly at DRI2SwapComplete time') can't work properly for windows - it tries to defer the drawable destruction, but that's not possible for windows.

Yeah, I know it's just a bad workaround. It makes the crashes much less frequent but definitely doesn't address the real problem. Thanks for the heads up on the window destruction too; I think more of this code needs to use resource IDs and lookups for tracking rather than the drawable pointers. Eric made a change like this to the intel driver...

I wonder if there's a better way of tracking the GLX drawable vs X window drawable handles in both the Mesa and X code; it seems like we've been confusing them a lot lately.

Created an attachment (id=34687)
Make sure corresponding X drawable exists before trying to use it

Looked over this some more with Kristian today. It looks like the root of the problem is that the GLX drawable we're destroying in __glXdrawableDestroy has a stale corresponding X drawable, which was destroyed in the GLX DestroyWindow hook (added by 120286aef59dabdb7c9fa762e08457e5cc8ec3a6).

That'll make the X drawable go away, but we'll be left with a pDraw pointing into freed space, probably crashing when we get to __glXdrawableDestroy for the actual GLX drawable in question.

This patch does a lookup on the X drawable ID corresponding to the GLX drawable at destruction time, and ignores it if it no longer exists. Kristian put together an alternate approach as well but it needs a little work: http://people.freedesktop.org/~krh/glx-mess.patch.

And per the last comment, DRI2 priv destruction also needs some fixing; I think it's now safe to always free the priv and clear the private wherever we call DRI2FreeDrawable, since we check everywhere we need to whether the priv still exists and so shouldn't crash.

(In reply to comment #21)
> Created an attachment (id=34687) [details]
> Make sure corresponding X drawable exists before trying to use it
>
> Looked over this some more with Kristian today. It looks like the root of the
> problem is that the GLX drawable we're destroying in __glXdrawableDestroy has a
> stale corresponding X drawable, which was destroyed in the GLX DestroyWindow
> hook (added by 120286aef59dabdb7c9fa762e08457e5cc8ec3a6).
>
> That'll make the X drawable go away, but we'll be left with a pDraw pointing
> into freed space, probably crashing when we get to __glXdrawableDestroy for the
> actual GLX drawable in question.
>
> This patch does a lookup on the X drawable ID corresponding to the GLX drawable
> at destruction time, and ignores it if it no longer exists. Kristian put
> together an alternate approach as well but it needs a little work:
> http://people.freedesktop.org/~krh/glx-mess.patch.
>
> And per the last comment, DRI2 priv destruction also needs some fixing; I think
> it's now safe to always free the priv and clear the private wherever we call
> DRI2FreeDrawable, since we check everywhere we need to whether the priv still
> exists and so shouldn't crash.

Jesse, I tested your patch and it fixed the X crash issue in MeeGo. Thank you.

*** Bug 27473 has been marked as a duplicate of this bug. ***

Changed in xorg (Debian):
status: New → Confirmed

(In reply to comment #21)
> Looked over this some more with Kristian today. It looks like the root of the
> problem is that the GLX drawable we're destroying in __glXdrawableDestroy has a
> stale corresponding X drawable, which was destroyed in the GLX DestroyWindow
> hook (added by 120286aef59dabdb7c9fa762e08457e5cc8ec3a6).

glxDestroyWindow() itself doesn't destroy the DrawableRec, but as I've pointed out over and over again, the destruction of a window indeed cannot be deferred.

> That'll make the X drawable go away, but we'll be left with a pDraw pointing
> into freed space, probably crashing when we get to __glXdrawableDestroy for the
> actual GLX drawable in question.

The purpose of glxDestroyWindow() was to clean up such dangling references. Apparently it's missing some of them (now).

> This patch does a lookup on the X drawable ID corresponding to the GLX drawable
> at destruction time, and ignores it if it no longer exists. Kristian put
> together an alternate approach as well but it needs a little work:
> http://people.freedesktop.org/~krh/glx-mess.patch.

Yeah, when we discussed this kind of issue earlier, Kristian was against using IDs everywhere, and I think there are good arguments for not doing that.

> And per the last comment, DRI2 priv destruction also needs some fixing; I think
> it's now safe to always free the priv and clear the private wherever we call
> DRI2FreeDrawable, since we check everywhere we need to whether the priv still
> exists and so shouldn't crash.

If you're referring to the last paragraph of comment #19, I'm not sure that'll do. If there are outstanding swaps referencing the DrawableRec of a window being destroyed, those swaps will need to be cancelled or whatever is necessary to clean up the dangling references.

(In reply to comment #24)
> (In reply to comment #21)
> > Looked over this some more with Kristian today. It looks like the root of the
> > problem is that the GLX drawable we're destroying in __glXdrawableDestroy has a
> > stale corresponding X drawable, which was destroyed in the GLX DestroyWindow
> > hook (added by 120286aef59dabdb7c9fa762e08457e5cc8ec3a6).
>
> glxDestroyWindow() itself doesn't destroy the DrawableRec, but as I've pointed
> out over and over again, the destruction of a window indeed cannot be deferred.

Right, thus this patch.

> > That'll make the X drawable go away, but we'll be left with a pDraw pointing
> > into freed space, probably crashing when we get to __glXdrawableDestroy for the
> > actual GLX drawable in question.
>
> The purpose of glxDestroyWindow() was to clean up such dangling references.
> Apparently it's missing some of them (now).
>
> > This patch does a lookup on the X drawable ID corresponding to the GLX drawable
> > at destruction time, and ignores it if it no longer exists. Kristian put
> > together an alternate approach as well but it needs a little work:
> > http://people.freedesktop.org/~krh/glx-mess.patch.
>
> Yeah, when we discussed this kind of issue earlier, Kristian was against using
> IDs everywhere, and I think there are good arguments for not doing that.

Such as? Do you have ideas for an alternate approach? Or do you like Kristian's proposed patch better?

>
> > And per the last comment, DRI2 priv destruction also needs some fixing; I think
> > it's now safe to always free the priv and clear the private wherever we call
> > DRI2FreeDrawable, since we check everywhere we need to whether the priv still
> > exists and so shouldn't crash.
>
> If you're referring to the last paragraph of comment #19, I'm not sure that'll
> do. If there are outstanding swaps referencing the DrawableRec of a window
> being destroyed, those swaps will need to be cancelled or whatever is necessary
> to clean up the dangling references.

I don't know how to handle this yet, I'll look at it some more...

Robert Hooker (sarvatt) wrote :

I've got an xserver package with the patch on the linked fdo bug [1] here which fixes the segfaults for me. It will take a few hours to build on the PPA though.

https://edge.launchpad.net/~sarvatt/+archive/bugfixes

Also the patched xserver is available in xorg-edgers, but if you are going to test one please try the above linked one instead.

https://edge.launchpad.net/~xorg-edgers/+archive/ppa

[1] https://bugs.freedesktop.org/show_bug.cgi?id=26394#c21

(In reply to comment #25)
> Such as? Do you have ideas for an alternate approach? Or do you like
> Kristian's proposed patch better?

I'll leave the argument to Kristian, as he was previously arguing against patches of mine using ID lookup, but I thought the alternative approach should be clear by now: Clean up all references to the DrawableRec of a window being destroyed.

(In reply to comment #26)
> (In reply to comment #25)
> > Such as? Do you have ideas for an alternate approach? Or do you like
> > Kristian's proposed patch better?
>
> I'll leave the argument to Kristian, as he was previously arguing against
> patches of mine using ID lookup, but I thought the alternative approach should
> be clear by now: Clean up all references to the DrawableRec of a window being
> destroyed.

Ok I'll chat with Kristian. The main problem with the cleanup approach is that it's a bit more invasive than the small patch here, and seems to violate some of the existing GLX layering; I was hoping you had something clever to make the impact small. I agree that we'll add some lookup overhead, I'm just not sure how much impact that will have.

Not to derail, but I just wanted to say that the patch in comment #21 did fix all 4 machines I've tested in a ubuntu lucid environment. Thank you!

There's a xserver package here for anyone else experiencing this on lucid and wanting to test it. The publisher is being a little slow today so it may not show up for a bit.

https://edge.launchpad.net/~sarvatt/+archive/bugfixes

Download full text (3.2 KiB)

(In reply to comment #24)
> (In reply to comment #21)
> > Looked over this some more with Kristian today. It looks like the root of the
> > problem is that the GLX drawable we're destroying in __glXdrawableDestroy has a
> > stale corresponding X drawable, which was destroyed in the GLX DestroyWindow
> > hook (added by 120286aef59dabdb7c9fa762e08457e5cc8ec3a6).u
>
> glxDestroyWindow() itself doesn't destroy the DrawableRec, but as I've pointed
> out over and over again, the destruction of a window indeed cannot be deferred.
>
> > That'll make the X drawable go away, but we'll be left with a pDraw pointing
> > into freed space, probably crashing when we get to __glXdrawableDestroy for the
> > actual GLX drawable in question.
>
> The purpose of glxDestroyWindow() was to clean up such dangling references.
> Apparently it's missing some of them (now).

It (120286aef59dabdb7c9fa762e08457e5cc8ec3a6) always missed GLX windows created using glXCreateWindow. Those have a different XID than the X window. For the DestroyWindow hook to work correctly, we have to store the __GLXdrawable as a private on the window, look it up in the hook and then call FreeResource on the glx drawable XID.

> > This patch does a lookup on the X drawable ID corresponding to the GLX drawable
> > at destruction time, and ignores it if it no longer exists. Kristian put
> > together an alternate approach as well but it needs a little work:
> > http://people.freedesktop.org/~krh/glx-mess.patch.
>
> Yeah, when we discussed this kind of issue earlier, Kristian was against using
> IDs everywhere, and I think there are good arguments for not doing that.

The idea was mostly to avoid the lookup overhead - not that it's a too expensive operation, but since we already looked up the drawable pointer once we might as well hold on to it. The other concern is robustness in case of reused XIDs.

> > And per the last comment, DRI2 priv destruction also needs some fixing; I think
> > it's now safe to always free the priv and clear the private wherever we call
> > DRI2FreeDrawable, since we check everywhere we need to whether the priv still
> > exists and so shouldn't crash.
>
> If you're referring to the last paragraph of comment #19, I'm not sure that'll
> do. If there are outstanding swaps referencing the DrawableRec of a window
> being destroyed, those swaps will need to be cancelled or whatever is necessary
> to clean up the dangling references.

Before 711e26466ae04ae93ff4c48d377d83d68a6320e9, the callback mechanism was designed so that the we would hold on to the DRI2 private and pass it back in the DRI2SwapComplete callback. While we can't prevent a window from being destroyed, we can make the DRI2 private ref-counted and keep it alive long enough that any outstanding flips finish. If a window goes away while we have a flip pending, we can NULL the DrawablePtr in the DRI2 private. When the DRI2SwapComplete eventually happens, the DRI2 private is still valid, and will allow us to reliably discover that the window is gone. That mechanism worked well enough I think, but the problem was that we didn't always check to see if the drawable was gone in the callback. Jesse is saying...

Read more...

Robert Hooker (sarvatt) wrote :

I've confirmed this has fixed 4 machines of mine now with no side affects under a stock lucid environment so I'm attaching a debdiff.

Created an attachment (id=34730)
Alternate fix

To sum up, what I'd recommend is that we pick up Francisco's patch:

  http://lists.freedesktop.org/archives/xorg-devel/2010-February/005525.html

and add the patch attached here on top to properly destroy GLX drawables in the DestroyWindow hook, whether they were created through the compatibility mechanism or glXCreateWindow(). Compared to Jesses patch, this doesn't have issues with reused XID (not sure that it's a big issue though).

If we go with Jesses patch we should be able to just drop the DestroyWindow hook, ie revert 120286aef59dabdb7c9fa762e08457e5cc8ec3a6.

happyhamster (morrgiah) wrote :

Robert's patch (post #22) works fine here too. No crashes on lucid beta-2 (AMD64) when closing Quadrapassel, and no other complaints. (Clean installs on 2 machines; Intel and AMD/ATI graphics.)

Bryce Harrington (bryce) wrote :

Patch looks good to me, I've committed it to our xserver git

Changed in xorg-server (Ubuntu Lucid):
milestone: none → ubuntu-10.04-beta-2
status: Confirmed → Fix Committed
Bryce Harrington (bryce) wrote :

The package is uploaded and waiting on archive admin approval.

subject: [ubuntu/lucid] xorg-server 2:1.7.6-2ubuntu2 (Waiting for approval)

RoiSoleil (heliosgilles) on 2010-04-07
Changed in xorg-server (Ubuntu Lucid):
status: Fix Committed → Fix Released
Steve Langasek (vorlon) on 2010-04-08
Changed in xorg-server (Ubuntu Lucid):
status: Fix Released → Fix Committed
milestone: ubuntu-10.04-beta-2 → ubuntu-10.04
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package xorg-server - 2:1.7.6-2ubuntu2

---------------
xorg-server (2:1.7.6-2ubuntu2) lucid; urgency=low

  [Bryce Harrington]
  * Add 113_quell_nouveau_aiglx.patch: Don't emit error message about
    AIGLX on nouveau. 3D is not supported yet on -nouveau so this error
    message serves only to confuse bug reporters.
    (LP: #529590)

  [Robert Sarvatt]
  * Add 114_dri2_make_sure_x_drawable_exists.patch: Makes sure
    a corresponding X drawable exists before trying to use it, fixing
    xserver segfaults under DRI2 when closing down GLX apps.
    (LP: #550218)
 -- Bryce Harrington <email address hidden> Wed, 31 Mar 2010 16:37:45 -0700

Changed in xorg-server (Ubuntu Lucid):
status: Fix Committed → Fix Released

Reassigning to Kristian as he is pushing the fix.

Created an attachment (id=35001)
DRI2: Track DRI2 drawables as resources, not privates

The main motivation here is to have the resource system clean up the
DRI2 drawable automatically so glx doesn't have to. Right now, the
glx drawable resource must be destroyed before the X drawable, so that
calling DRI2DestroyDrawable doesn't crash. By making the DRI2
drawable a resource, GLX doesn't have to worry about that and the
resource destruction order becomes irrelevant.

https://bugs.freedesktop.org/show_bug.cgi?id=26394

Signed-off-by: Kristian Høgsberg <email address hidden>

Created an attachment (id=35002)
glx: Drop DestroyWindow hook

Now that glx doesn't call DRI2DestroyDrawable anymore, we don't need to
force a specific resource destruction order in the DestroyWindow hook.

Signed-off-by: Kristian Høgsberg <email address hidden>

Created an attachment (id=35003)
DRI2: Track DRI2 drawables as resources, not privates

The main motivation here is to have the resource system clean up the
DRI2 drawable automatically so glx doesn't have to. Right now, the
glx drawable resource must be destroyed before the X drawable, so that
calling DRI2DestroyDrawable doesn't crash. By making the DRI2
drawable a resource, GLX doesn't have to worry about that and the
resource destruction order becomes irrelevant.

https://bugs.freedesktop.org/show_bug.cgi?id=26394

Signed-off-by: Kristian Høgsberg <email address hidden>

Created an attachment (id=35004)
glx: Drop DestroyWindow hook

Now that glx doesn't call DRI2DestroyDrawable anymore, we don't need to
force a specific resource destruction order in the DestroyWindow hook.

Signed-off-by: Kristian Høgsberg <email address hidden>

Created an attachment (id=35005)
DRI2: Track DRI2 drawables as resources, not privates

The main motivation here is to have the resource system clean up the
DRI2 drawable automatically so glx doesn't have to. Right now, the
glx drawable resource must be destroyed before the X drawable, so that
calling DRI2DestroyDrawable doesn't crash. By making the DRI2
drawable a resource, GLX doesn't have to worry about that and the
resource destruction order becomes irrelevant.

https://bugs.freedesktop.org/show_bug.cgi?id=26394

Signed-off-by: Kristian Høgsberg <email address hidden>

Created an attachment (id=35006)
glx: Drop DestroyWindow hook

Now that glx doesn't call DRI2DestroyDrawable anymore, we don't need to
force a specific resource destruction order in the DestroyWindow hook.

Signed-off-by: Kristian Høgsberg <email address hidden>

Attachment 35005 and attachment 35006 are the latest proposed fix. They've been posted to xorg-devel for review, blocking on keithp to review and commit at this point.

The patches are now on xserver master, please give that a try and close the bug if it fixes the problem for you.

Kristian:

I have a question about the patch "DRI2: Track DRI2 drawables as resources, not privates", it calls AddResource() in DRI2CreateDrawable() to add DRI2 drawable as resources, but where we free them, should we call FreeResourceByType() to free the DRI2 drawable in ProcDRI2DestroyDrawable() ?

Also in my test in MeeGo. I find that in ProcDRI2CreateDrawable(), sometimes the value of "pDrawable->id" is 0, which is different with stuff->drawable. that would cause some problem. because we will get an old DRI2 drawable from DRI2GetDrawable() that attached to the zero value of "pDrawable->id". I think we need to add some check to make sure "pDrawable->id" is synced with "stuff->drawable".

I attached a draft patch that addressing above two points, please review.

Created an attachment (id=35191)
DRI2-drawable-id-check-and -free-resource.patch

with the below 4 commits in xserver, I don't see crash anymore.

glx: Drop DestroyWindow hook
DRI2: Track DRI2 drawables as resources, not privates
glx: Let the resource system destroy pixmaps
glx: Track GLX 1.3 style GLX drawables under their X drawable ID as well

Created an attachment (id=35195)
Also handle AIGLX

This patch I posted to the xorg-devel list also handles the DRI2CreateDrawable() call in __glXDRIscreenCreateDrawable(). However, I'm not sure it's valid to just assign the pixmap id field like this.

See bug 27767 for one possible symptom of this problem.

Changed in xorg (Debian):
status: Confirmed → Fix Released
stel (stel-onshore) wrote :

not for me. i'm to date but still can't log into X. either there is more to the problem or this bug is different from #533901. here's my system info.

kubuntu lucid (w/ medibuntu source.list) dist-upgrade just now 2am 5-6-2010
 kernel 2.6.32-22-generic #33-Ubuntu SMP
 kdm
 AMD Athlon(tm) Processor LE-1660
 MCP78S [GeForce 8200] SMBus
 asus M3N78-VM
 using nvidia-current package = v 195.136.15

stel (stel-onshore) wrote :

btw, there isn't a pkg called xorg-server. i assume you mean xserver-xorg

William Chambers (bioselement) wrote :

I'm marking this as new as I'm not sure the protocol for this. It seems that this bug is the same as the one hitting kubuntu's kdm.

Changed in xorg-server (Ubuntu Lucid):
status: Fix Released → New
Bryce Harrington (bryce) wrote :

No, the previous commenter is using the proprietary -nvidia driver so it's going to be something unrelated. Particularly since this bug was fixed and confirmed so (and if it actually did regress a lot of people would notice.)

It is better practice to file a NEW bug and reference other bugs you think are related, rather than reopen closed ones when you are not certain.

Changed in xorg-server (Ubuntu Lucid):
status: New → Fix Released

*** Bug 28380 has been marked as a duplicate of this bug. ***

Changed in xorg-server:
importance: Unknown → Critical
status: Confirmed → In Progress
Changed in xorg-server:
importance: Critical → Unknown
Changed in xorg-server:
importance: Unknown → Critical
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.