dbus-launch hangs at session start waiting on socket output in libxcb

Bug #232364 reported by Remove Me on 2008-05-20
This bug affects 1 person
Affects Status Importance Assigned to Milestone
D-Bus
Fix Released
High
libxcb
Won't Fix
Critical
dbus (Ubuntu)
High
Unassigned
Hardy
Undecided
Unassigned
libx11 (Ubuntu)
Undecided
Unassigned
Hardy
Undecided
Unassigned
libxcb (Ubuntu)
Critical
Unassigned
Hardy
Undecided
Unassigned
linux (Ubuntu)
Undecided
Unassigned
Hardy
Undecided
Unassigned
xfce4-session (Ubuntu)
Undecided
Unassigned
Hardy
Undecided
Unassigned
xfce4-utils (Ubuntu)
Critical
Cody A.W. Somerville
Hardy
Critical
Cody A.W. Somerville

Bug Description

Ubuntu 8.04

It is not reliably reproducible (I don't know what triggers it), but it happens often enough.
When it happens, dbus-launch can't be killed with SIGTERM (nor does it die when the X session
is killed, nor after Ctrl-Alt-Backspace), and it seems to hang in some synchronization
routine, IIRC from the last time I tried to debug the issue.

I installed dbgsyms this time and will try to debug dbus-launch the next time it happens.

BTW, isn't dbus-launch supposed to exit after dbus-daemon is started (for the user session)?

[Workaround]
Kill the "dbus-launch --sh-syntax --exit-with-session" process with signal 9 (SIGKILL). This allows the login process to finish.

Or else just reboot several times. It seems to be a race condition and some people see it affecting login only intermittently.
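
The signal-9 workaround can be sketched as a small shell helper. This is only an illustration: `force_kill` is a hypothetical name, not part of dbus, and the PID has to be looked up first (for example in `ps ux` output). It tries SIGTERM first in case the process is not actually hung, then falls back to SIGKILL after a grace period.

```sh
#!/bin/sh
# force_kill PID [GRACE_SECONDS]  (hypothetical helper, not part of dbus)
# Try SIGTERM first; if the process is still alive after the grace
# period, send SIGKILL (signal 9), which cannot be caught or ignored.
force_kill() {
    pid=$1
    grace=${2:-3}
    kill -TERM "$pid" 2>/dev/null || return 0   # already gone (or not ours)
    while [ "$grace" -gt 0 ]; do
        sleep 1
        kill -0 "$pid" 2>/dev/null || return 0  # exited after SIGTERM
        grace=$((grace - 1))
    done
    kill -KILL "$pid" 2>/dev/null || :
}
```

For the hung launcher described here, SIGTERM is reported not to work, so the helper would be expected to reach the SIGKILL step.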

[Background]
Due to various problems in the venerable xlib, Xorg upstream created the "X C-Language Bindings" (XCB). Debian and Ubuntu switched to an xcb-enabled libx11 in the Hardy timeframe. Prior to this, a known issue with Java (bug LP: #86103) prevented us from shipping XCB in Xorg. A patch to enable 'sloppy locking' solved the java issue and allowed Ubuntu to follow Debian in shipping with this enabled in Hardy.

[Next Steps]
XCB is a new technology and as such is a prominent suspect, however we've not yet proven it as the culprit beyond a shadow of doubt. A non-XCB libx11 package has been prepared for testing, and so the first step is to demonstrate conclusively that the issue is completely absent with that package.

Since Hardy is an LTS, it is important that we have this issue fixed. However, disabling XCB in libx11 may be too short-sighted; doing so could unpredictably generate regressions in other packages, and it really only sweeps the problem under the rug for us to re-encounter later. Much better would be to work with upstream to get this issue resolved definitively. Going forward, as more X client applications start depending on XCB, having it available in Hardy will be of obvious benefit. But if we can show that a non-XCB libx11 resolves the issue, and no other viable workaround or solution comes to light, we may have no choice but to fall back to that.

Tried to reproduce this in Xnest/Xvnc on fatou with

  /usr/bin/dbus-launch --sh-syntax --exit-with-session startkde

But I can't. How can I help here?

I can reproduce it if I just ssh to fatou and start
# dbus-launch --sh-syntax --exit-with-session bash
(starts a bash under the dbus-launch process)
then I can see (or rather not see) lost characters as I type

since I can now test it from remote ...

/work/built/mbuild/oldboy-dmueller-35098/x86_64/dbus-1-x11-1.1.4-9.x86_64.rpm

this fixes the problem for me from remote, will try again with startkde
as I get to the office.

I believe the hanging keyboard is a different issue, at least I hope so. please try oldboy-dmueller-35101

I don't know exactly how to reproduce it either, but it seems to happen when ssh-askpass-x11 is also involved. ro's login script started dbus-launch and then started ssh-agent, asking for a password. After that, all newly started apps were hanging in a blocking read() from the X server, called from deep inside XOpenDisplay.

I've tried to reproduce it on my machine but can't; then again, I only run on one CPU.

ro, can you paste the exact sequence of commands that invoked ssh-password query and then startkde? perhaps I was overlooking something

in .xinitrc:
---------------------------------------------------------------------------
WINDOWMANAGER="$dbuslaunch --sh-syntax --exit-with-session $WINDOWMANAGER"
SSH_AG="ssh-agent"
exec $SSH_AG ~/bin/ssh_agent_script
---------------------------------------------------------------------------

bin/ssh_agent_script:
---------------------------------------------------------------------------
#!/bin/bash

~/bin/sshaskwhile
exec $WINDOWMANAGER
---------------------------------------------------------------------------

bin/sshaskwhile
---------------------------------------------------------------------------
#!/bin/bash

until ssh-add -l | grep $USER@ > /dev/null; do
    ssh-add .ssh/identity .ssh/id_rsa < /dev/null
done
---------------------------------------------------------------------------

oops: and no, 35101 does not help either :(

Dirk, can you reproduce this now with the instructions Rudi provided?

you can remove the ssh stuff from any test setup.

reproduced it here on my laptop MacBookPro with a Core2Duo CPU

let .xinitrc end as follows:
......................................................................
# add dbus-launch if found
dbuslaunch="`which dbus-launch 2>/dev/null`"
if [ -n "$dbuslaunch" ] && [ -x "$dbuslaunch" ]; then
    WINDOWMANAGER="$dbuslaunch --sh-syntax --exit-with-session $WINDOWMANAGER"
fi

#
exec $WINDOWMANAGER

# call failsafe
exit 0
.......................................................................

and that's enough to reproduce the hang.
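
Because the hang is intermittent, one way to count how often it occurs is to run the command repeatedly under a time limit. The sketch below assumes timeout(1) from GNU coreutils is available; `hangs_within` is a hypothetical helper name, and the 10-second limit and 20 attempts in the example are arbitrary. Whether this triggers the race outside a real login has not been verified.

```sh
#!/bin/sh
# hangs_within SECONDS COMMAND [ARGS...]  (hypothetical helper)
# Returns 0 if COMMAND was still running after SECONDS and had to be
# stopped by timeout(1); returns 1 if it finished on its own.
hangs_within() {
    limit=$1
    shift
    status=0
    # -k 1: follow SIGTERM with SIGKILL one second later, since the
    # hung dbus-launch is reported not to die on SIGTERM.
    timeout -k 1 "$limit" "$@" >/dev/null 2>&1 || status=$?
    # 124: stopped at the limit; 137: SIGKILL was needed.
    [ "$status" -eq 124 ] || [ "$status" -eq 137 ]
}

# Example (not run here): count hangs over 20 attempts.
#   n=0
#   for i in $(seq 20); do
#       hangs_within 10 dbus-launch --sh-syntax --exit-with-session true \
#           && n=$((n + 1))
#   done
#   echo "$n of 20 attempts hung"
```

A command that itself exits with status 124 or 137 would be miscounted as a hang, so this is only suitable as a rough probe.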

I can't reproduce this issue on shannon (x86_64), also running STABLE. Dirk, if you can reproduce it, please reopen.

I cannot reproduce it on my machine (single processor). It is reproducible on multicore, though.

BTW, the bug report was opened because IMHO the autostart-dbus stuff in the global xinitrc should be removed.

So you want to remove this completely from /etc/X11/xinit/xinitrc.common?

#
# Launch dbus if no session is activ or if the session is not reachable
#
if dbuslaunch="$(type -p dbus-launch)" && dbussend="$(type -p dbus-send)" ; then
    dbustest () {
        $dbussend --session --type=method_call \
                  --dest='org.freedesktop.DBus' \
                  /org/freedesktop/DBus \
                  org.freedesktop.DBus.NameHasOwner \
                  string:'org.freedesktop.DBus' > /dev/null 2>&1
    }
    if test -z "$DBUS_SESSION_BUS_ADDRESS" || ! dbustest ; then
        WINDOWMANAGER="$dbuslaunch --sh-syntax --exit-with-session \
                       $WINDOWMANAGER"
    fi
fi
unset dbuslaunch dbussend dbustest

IIRC it was introduced intentionally. So why remove it again now?

because I don't think it serves any purpose. it also seems that xcb_connect is apparently racy and likes to deadlock if dbus-launch is involved. Fixing the latter is much preferable of course, but removing the code that triggers it is fine with me.

OK. I talked with Dirk, and what he remembered was that starting the window manager via dbus was still required at the time the code was introduced.
So I'll simply remove the code now. :-)

XUbuntu 8.04, 2.6.26-rc3

It is not reliably reproducible (I don't know what triggers it), but it happens often enough.
When it happens, dbus-launch can't be killed with SIGTERM (nor does it die when the X session
is killed, nor after Ctrl-Alt-Backspace), and it seems to hang in some synchronization
routine, IIRC from the last time I tried to debug the issue.

I installed dbgsyms this time and will try to debug dbus-launch the next time it happens.

BTW, isn't dbus-launch supposed to exit after dbus-daemon is started (for the user session)?

: ~; gdb -p 3045
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i486-linux-gnu".
Attaching to process 3045
Reading symbols from /usr/bin/dbus-launch...Reading symbols from /usr/lib/debug/usr/bin/dbus-launch...done.
done.
Reading symbols from /usr/lib/libX11.so.6...done.
Loaded symbols for /usr/lib/libX11.so.6
Reading symbols from /lib/tls/i686/cmov/libc.so.6...done.
Loaded symbols for /lib/tls/i686/cmov/libc.so.6
Reading symbols from /usr/lib/libxcb-xlib.so.0...Reading symbols from /usr/lib/debug/usr/lib/libxcb-xlib.so.0.0.0...done.
done.
Loaded symbols for /usr/lib/libxcb-xlib.so.0
Reading symbols from /usr/lib/libxcb.so.1...Reading symbols from /usr/lib/debug/usr/lib/libxcb.so.1.0.0...done.
done.
Loaded symbols for /usr/lib/libxcb.so.1
Reading symbols from /lib/tls/i686/cmov/libdl.so.2...done.
Loaded symbols for /lib/tls/i686/cmov/libdl.so.2
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /usr/lib/libXau.so.6...done.
Loaded symbols for /usr/lib/libXau.so.6
Reading symbols from /usr/lib/libXdmcp.so.6...done.
Loaded symbols for /usr/lib/libXdmcp.so.6
Reading symbols from /lib/tls/i686/cmov/libnss_compat.so.2...done.
Loaded symbols for /lib/tls/i686/cmov/libnss_compat.so.2
Reading symbols from /lib/tls/i686/cmov/libnsl.so.1...done.
Loaded symbols for /lib/tls/i686/cmov/libnsl.so.1
Reading symbols from /lib/tls/i686/cmov/libnss_nis.so.2...done.
Loaded symbols for /lib/tls/i686/cmov/libnss_nis.so.2
Reading symbols from /lib/tls/i686/cmov/libnss_files.so.2...done.
Loaded symbols for /lib/tls/i686/cmov/libnss_files.so.2
0xb8002424 in __kernel_vsyscall ()
(gdb) bt
#0 0xb8002424 in __kernel_vsyscall ()
#1 0xb7e8484d in select () from /lib/tls/i686/cmov/libc.so.6
#2 0xb7da309a in _xcb_in_read_block (c=0x80579a8, buf=0x8057040, len=8)
    at xcb_in.c:248
#3 0xb7da2343 in xcb_connect_to_fd (fd=13, auth_info=0xbff1cdf0)
    at xcb_conn.c:133
#4 0xb7da4a51 in xcb_connect (displayname=0x0, screenp=0x0) at xcb_util.c:279
#5 0xb7f43717 in _XConnectXCB () from /usr/lib/libX11.so.6
#6 0xb7f2c029 in XOpenDisplay () from /usr/lib/libX11.so.6
#7 0x0804b3de in x11_init () at dbus-launch-x11.c:218
#8 0x0804abb2 in main (argc=5, argv=0xbff1d5a4) at dbus-launch.c:432
(gdb) quit
The program is running. Quit anyway (and detach it)? (y or n) EOF [assumed Y]
Detaching from program: /usr/bin/dbus-launch, process 3045

I tried replacing select in _xcb_in_read_block (actually read_block) with
poll and open the polling loop, like in the attached patch.
Now I cannot reproduce the problem at all (it was already very hard
before, needing many reboots).

Just opening the loop turned out to be not enough.

Now, I don't understand what (in this case) seems to work around
the issue when I use poll.

OK, poll does not work around anything. It just seems to make the hang
happen even more seldom, but does not eliminate the problem.

Remove Me (remove-me) wrote :

http://ubuntuforums.org/showthread.php?s=e0e7563685c47e976781c3f846c1a6b5&p=4918094#post4918094

seems to be the same issue. The workaround (to disable ssh-agent lines in /etc/xdg/xfce4/xinitrc) works for me too.

I can confirm this bug. "dbus-launch --sh-syntax --exit-with-session" is hanging in __kernel_vsyscall(), which usually indicates the problem is elsewhere and could be a million different things. The select() is on the fd that is the bidirectional link to the X server - maybe X is not fully initialized or is waiting on dbus? Or maybe it is within the ioctl of a shonky device driver, since beyond that point it'll call the read routine in a device driver?

Changed in dbus:
importance: Undecided → High
milestone: none → ubuntu-8.04.2
status: New → Confirmed

From bug #232122:

 Alex Riesen wrote on 2008-05-27:

After a while testing the ssh-agent workaround I have to say that it *does not* work.
It is harder to get the hanging, but it still happens.
The problem is still there.

Also note, killing the "dbus-launch --sh-syntax --exit-with-session" with signal 9 works for me and the login process finishes.

"thread apply all bt" in gdm returns nothing. However, strace outputs:

select(14, [13], NULL, NULL, NULL

and then hangs. This, I believe, means that it is waiting for activity on fd 13.
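
In the strace line above, the first argument to select() (14) is one more than the highest descriptor being watched, the second is the set of descriptors watched for reading (only fd 13), and the final NULL timeout means the call blocks indefinitely. To check what a given descriptor refers to without lsof, the /proc filesystem can be read directly. A minimal, Linux-only sketch follows; `fd_target` is a hypothetical helper name.

```sh
#!/bin/sh
# fd_target PID FD  (hypothetical helper, Linux-only: relies on /proc)
# Print what file descriptor FD of process PID refers to, for example
# a "socket:[inode]" entry for a socket or "/dev/null" for that device.
fd_target() {
    readlink "/proc/$1/fd/$2"
}

# Example (not run here), using the PID of the hung process:
#   fd_target 5524 13
```

For a socket, the inode number printed can then be matched against the NODE column in lsof output to confirm which connection the process is blocked on.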

Output of lsof follows for the process (full output of lsof attached):

COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
dbus-laun 5524 cody-somerville cwd DIR 8,4 4096 2 /
dbus-laun 5524 cody-somerville rtd DIR 8,4 4096 2 /
dbus-laun 5524 cody-somerville txt REG 8,4 21416 2095178 /usr/bin/dbus-launch
dbus-laun 5524 cody-somerville mem REG 8,4 38412 1376724 /lib/tls/i686/cmov/libnss_files-2.7.so
dbus-laun 5524 cody-somerville mem REG 8,4 34352 1376726 /lib/tls/i686/cmov/libnss_nis-2.7.so
dbus-laun 5524 cody-somerville mem REG 8,4 83708 1376721 /lib/tls/i686/cmov/libnsl-2.7.so
dbus-laun 5524 cody-somerville mem REG 8,4 30436 1376722 /lib/tls/i686/cmov/libnss_compat-2.7.so
dbus-laun 5524 cody-somerville mem REG 8,4 16616 2094387 /usr/lib/libXdmcp.so.6.0.0
dbus-laun 5524 cody-somerville mem REG 8,4 6988 2094376 /usr/lib/libXau.so.6.0.0
dbus-laun 5524 cody-somerville mem REG 8,4 9684 1376718 /lib/tls/i686/cmov/libdl-2.7.so
dbus-laun 5524 cody-somerville mem REG 8,4 93832 2094104 /usr/lib/libxcb.so.1.0.0
dbus-laun 5524 cody-somerville mem REG 8,4 4172 2095100 /usr/lib/libxcb-xlib.so.0.0.0
dbus-laun 5524 cody-somerville mem REG 8,4 1364388 1376715 /lib/tls/i686/cmov/libc-2.7.so
dbus-laun 5524 cody-somerville mem REG 8,4 944876 2095128 /usr/lib/libX11.so.6.2.0
dbus-laun 5524 cody-somerville mem REG 8,4 109152 1342259 /lib/ld-2.7.so
dbus-laun 5524 cody-somerville 0r CHR 1,3 6194 /dev/null
dbus-laun 5524 cody-somerville 1u CHR 1,3 6194 /dev/null
dbus-laun 5524 cody-somerville 2u CHR 1,3 6194 /dev/null
dbus-laun 5524 cody-somerville 3u unix 0xef15d1c0 14430 socket
dbus-laun 5524 cody-somerville 4r FIFO 0,5 14350 pipe
dbus-laun 5524 cody-somerville 5w FIFO 0,5 13719 pipe
dbus-laun 5524 cody-somerville 6w FIFO 0,5 14350 pipe
dbus-laun 5524 cody-somerville 7r FIFO 0,5 13723 pipe
dbus-laun 5524 cody-somerville 8r FIFO 0,5 14351 pipe
dbus-laun 5524 cody-somerville 9w FIFO 0,5 14351 pipe
dbus-laun 5524 cody-somerville 10r FIFO 0,5 14352 pipe
dbus-laun 5524 cody-somerville 11w FIFO 0,5 14352 pipe
dbus-laun 5524 cody-somerville 12u CHR 1,3 6194 /dev/null
dbus-laun 5524 cody-somerville 13u unix 0xef15da80 14454 socket
dbus-laun 5524 cody-somerville 15u unix 0xef05a380 14386 /tmp/seahorse-xcipkb/S.gpg-agent
dbus-laun 5524 cody-somerville 17r FIFO 0,5 14436 pipe

Correction: In my previous comment, I meant to say gdb and not gdm. :)

Forwarding a Ubuntu bug:
https://bugs.edge.launchpad.net/ubuntu/+source/libxcb/+bug/232364

A number of Xubuntu users have been experiencing failures on startup when launching dbus-launch. Backtraces indicate the problem always occurs during a select() call in _xcb_in_read_block. The freezes are intermittently reproducible (i.e., restart several times and eventually it'll come up).

(gdb) bt
#0 0xb8002424 in __kernel_vsyscall ()
#1 0xb7e8484d in select () from /lib/tls/i686/cmov/libc.so.6
#2 0xb7da309a in _xcb_in_read_block (c=0x80579a8, buf=0x8057040, len=8)
    at xcb_in.c:248
#3 0xb7da2343 in xcb_connect_to_fd (fd=13, auth_info=0xbff1cdf0)
    at xcb_conn.c:133
#4 0xb7da4a51 in xcb_connect (displayname=0x0, screenp=0x0) at xcb_util.c:279
#5 0xb7f43717 in _XConnectXCB () from /usr/lib/libX11.so.6
#6 0xb7f2c029 in XOpenDisplay () from /usr/lib/libX11.so.6
#7 0x0804b3de in x11_init () at dbus-launch-x11.c:218
#8 0x0804abb2 in main (argc=5, argv=0xbff1d5a4) at dbus-launch.c:432
(gdb) quit

strace also shows that the hang is occurring on a select call:

  select(14, [13], NULL, NULL, NULL

Hi,

 I'm the Xubuntu Team Lead. Please let me know if I can do anything to assist in fixing/testing this bug.

Cheers,

The traces so far show libxcb, so it's probably not worth having a libx11 component open on it.

Changed in libx11:
status: New → Invalid

Here is a copy of my /var/log/Xorg.0.log - note, this bug did not occur during this uptime.

Attached is the xfce4 xinitrc script that launches the offending dbus-launch --sh-syntax --exit-with-session process. You'll notice that my copy is slightly modified to include debug statements (which did not help me locate said offending process, as the script always completed as expected).

Bryce Harrington (bryce) wrote :

Please try setting this environment variable, to see if the locking logic has an effect (this change was made in Hardy on March 11th):
  export LIBXCB_DISABLE_SLOPPY_LOCK=1

Changed in libxcb:
status: New → Incomplete
Bryce Harrington (bryce) wrote :

Thanks for attaching the files, I've forwarded the bug upstream at https://bugs.freedesktop.org/show_bug.cgi?id=16420. Please subscribe to the upstream bug, so if they have questions or suggestions, they can reach you directly.

Changed in libxcb:
importance: Undecided → High
status: Incomplete → Triaged
Bryce Harrington (bryce) wrote :

Fwiw, there appears to be a rewrite of the socket handling, discussed here:

  http://lists.freedesktop.org/archives/xcb/2008-March/003347.html

However, in reviewing the patches they do not appear to directly relate to _xcb_in_read_block() or its parents.

Here is dbus-launch wrapped in strace, which leads me to believe that maybe xcb is returning to dbus.

Changed in libxcb:
status: Unknown → Confirmed

Link to a similar fixed bugreport on opensuse:
https://bugzilla.novell.com/show_bug.cgi?id=361180
The problem was sidestepped there; I don't know if this can be done here as well.

Created an attachment (id=17263)
dbus-launch trace

cody-somerville@mercurial:~$ cat /usr/bin/dbus-launch
#!/bin/sh

exec /usr/bin/strace /usr/bin/dbus-launch.real "$@" 2> /tmp/dbus-launch.out

Created an attachment (id=17264)
strace after killing process

Created an attachment (id=17265)
lsof output

Created an attachment (id=17266)
fd/pid listing

From the postkill:

[pid 7877] read(20, 0x8056f3c, 4096) = -1 EAGAIN (Resource temporarily unavailable)
[pid 7877] ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, 0xbfd17a18) = -1 ENOTTY (Inappropriate ioctl for device)
[pid 7877] select(21, [20], NULL, [20], NULL) = 1 (in [20])
[pid 7877] read(20, "", 4096) = 0

Is it just me, or does this seem to occur more on xubuntu-desktop than on the gnome-desktop? (It happens on both for me, but is *A LOT* more frequent on the xubuntu-session.)

Further research seems to indicate to me that this is a known issue with XCB/Xlib since 2004.

http://osdir.com/ml/freedesktop.xcb/2004-03/msg00001.html

"So I'm declaring Xlib/XCB usable, aside from one irritating bug that I
haven't figured out how to track down yet (some apps sometimes hang
until an event is recieved). It'd help a lot if people would try
building freedesktop.org's Xlib with `configure --with-xcb`, and let me
know whether it works for them as a drop-in replacement for normal Xlib."

http://xcb.freedesktop.org/XCBCompletedTasks/

"17 Mar 2004
<...>
Current known Xlib/XCB bugs:
    * Internal connections are broken, which I think means input methods won't work.
    * Apps sometimes hang until an event is received.
    * Last I checked, the display managers I tested crashed"

http://www.sidefx.com/index.php?option=com_forum&Itemid=172&page=viewtopic&p=52983&sid=01795b3f1f48fc2ea7bd55870380772e

"Non-technical/non-Linux people, feel free to skip this paragraph: For those who want to know, it's libX11's 1.1 release, which builds libX11 on top of XCB, the new low-level X library and protocol. There seems to be several deadlocks involved in waiting for responses from the X server."

and finally http://bugs.freedesktop.org/show_bug.cgi?id=9528

Changed in libxcb:
importance: High → Critical
Changed in xfce4-session:
status: New → Invalid
description: updated
Changed in linux:
status: New → Invalid
Bryce Harrington (bryce) wrote :

The locking assertion failure sounds like bug 185311.

Since we're seeing a few issues with traces involving libxcb, I've prepared a non-xcb-enabled libx11 here: http://people.ubuntu.com/~bryce/Testing/libx11/ . (I'm not certain I reverted all the deps correctly, so let me know if there are dependency-related issues with this. You probably should save your old libx11 debs in case the lack of xcb causes problems in other apps.)

Would someone mind testing with this noxcb libx11 and see if it makes any difference?

Bryce Harrington (bryce) on 2008-06-21
description: updated
Bryce Harrington (bryce) wrote :

Ignore that first sentence in the prior reply (paste-o).

uzi (uzzi09) wrote :

Syslog: I believe the crash occurred somewhere between 22:00 - 22:20, but I'm not positive unfortunately (Sorry). I pasted a large chunk anyway because there were some "WARNING"s earlier and to avoid cutting out things that may be useful to you guys that I may not know about

uzi (uzzi09) wrote :

Sorry, I forgot to mention that the system had been perfectly fine since I gave up on xubuntu-desktop and stuck with my default, gnome-desktop, until I decided to play an flv file in xine. It always tends to happen when there is something graphics intensive that takes place.... Hope it helps

Changed in dbus:
status: Unknown → Fix Released

I had thought that dbus-launch --autolaunch was launched by dbus-launch --sh-syntax --exit-with-session, but after reading the man page I realized that it only occurs when something tries to access dbus and there is no session bus already started. A quick check showed that gnome-screensaver, which is launched shortly before dbus-launch --sh-syntax --exit-with-session, depends on dbus, which led me to hypothesize that the race is between the two dbus-launch processes and is caused by the libxcb issues. After moving dbus-launch before gnome-screensaver, I'm happy to report that it doesn't appear to hang anymore. Although this does not prove that it is fixed (since the issue is intermittent and I may have just gotten lucky), and it definitely doesn't fix the libxcb problems, I'm going to upload my changes to Intrepid right away, as I'd like them to be shipped with the first alpha of Intrepid. I'll begin working on an SRU tomorrow after getting more feedback and doing more tests, provided the issue doesn't come back. :)
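
The reordering described above could look roughly like the following in an xinitrc-style script. This is a sketch under assumptions, not the actual xfce4-utils patch: `ensure_session_bus` is a hypothetical helper name, and the ordering is shown only as comments.

```sh
#!/bin/sh
# ensure_session_bus  (hypothetical helper for an xinitrc-style script)
# Start a session bus only when none is advertised in the environment,
# and export its address so later clients do not autolaunch their own.
ensure_session_bus() {
    if [ -n "${DBUS_SESSION_BUS_ADDRESS:-}" ]; then
        return 0                      # a bus is already advertised
    fi
    command -v dbus-launch >/dev/null 2>&1 || return 1
    # dbus-launch --sh-syntax prints shell commands that set
    # DBUS_SESSION_BUS_ADDRESS and DBUS_SESSION_BUS_PID.
    eval "$(dbus-launch --sh-syntax --exit-with-session)"
    export DBUS_SESSION_BUS_ADDRESS
}

# Intended ordering in the startup script (shown as comments):
#   ensure_session_bus
#   gnome-screensaver &     # only after the bus address is exported
```

The point of the ordering is that a dbus client started with the address already in its environment has no reason to trigger a second, autolaunched dbus-launch.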

Changed in xfce4-utils:
assignee: nobody → cody-somerville
importance: Undecided → Critical
milestone: none → intrepid-alpha-1
status: New → In Progress
uzi (uzzi09) wrote :

Is it possible for me to give it a try? I seem to be able to replicate the issue on my machine by playing an flv in xine and then turning up the volume. Attached is one of the syslogs.

I thought about turning off visual effects (which were on moderate originally) and give it a try, and it seems to work fine without them.

Changed in xfce4-utils:
milestone: intrepid-alpha-1 → ubuntu-8.04.1
assignee: nobody → cody-somerville
importance: Undecided → Critical
milestone: none → ubuntu-8.04.1
status: New → In Progress
milestone: ubuntu-8.04.1 → intrepid-alpha-1
Steve Langasek (vorlon) wrote :

invalid for intrepid, so presumed invalid for hardy as well.

Changed in libx11:
status: New → Invalid
Changed in xfce4-session:
status: New → Invalid
Changed in linux:
status: New → Invalid

Uploaded xfce4-utils_4.4.2-4ubuntu1.1 to hardy-proposed.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package xfce4-utils - 4.4.2-8ubuntu3

---------------
xfce4-utils (4.4.2-8ubuntu3) intrepid; urgency=low

  * debian/patches/04_avoid-xcblib-hang.patch: NEW
    - Avoid race condition and deadlock in libxcb by launching
       dbus-launch before gnome-screensaver (which should occur anyhow).
    - Added sleep 1 after dbus-launch to help avoid race condition.
    - Fixes lp: #232364
  * debian/control:
    - Updated maintainer field to Xubuntu Developers.
    - Bumped standards version to 3.8.0

 -- <email address hidden> (Cody A.W. Somerville) Mon, 30 Jun 2008 09:33:01 -0300

Changed in xfce4-utils:
status: In Progress → Fix Released
Remove Me (remove-me) wrote :

Since when do race conditions get _fixed_ by sticking random "sleep"s somewhere?
This looks like a temporary workaround (which likely will not work on some slower machines).

The sleep statement is not the fix, launching dbus-launch before gnome-screensaver is the fix. The sleep statement is there simply as a supportive measure in case my last few weeks of testing was just pure luck.

Changed in xfce4-utils:
status: In Progress → Triaged
Steve Langasek (vorlon) wrote :

Accepted into -proposed, please test and give feedback here. Please see https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

Changed in dbus:
status: New → Fix Committed
Changed in libxcb:
status: New → Fix Committed
Changed in xfce4-utils:
milestone: ubuntu-8.04.1 → none
status: Triaged → Fix Committed
Steve Langasek (vorlon) on 2008-06-30
Changed in dbus:
status: Fix Committed → New
Changed in libxcb:
status: Fix Committed → New
Changed in xfce4-utils:
milestone: none → ubuntu-8.04.1
Remove Me (remove-me) wrote :

Still looks fishy. _Why_ can gnome-screensaver block dbus-launch? Was that ever found? How did you find that it is gnome-screensaver, BTW?

Besides, it makes the login process longer by 1 second. It is already annoyingly slow.

As I've already described in an earlier comment, it was not gnome-screensaver particularly blocking but the second instance of dbus-launch (dbus-launch --autolaunch) gnome-screensaver initiated when it attempted to send a dbus message with no active session (before my SRU, dbus-launch (dbus-launch --sh-syntax --exit-with-session) was called after gnome-screensaver). The actual freeze/block is caused by the libxcb race/architecture issues. I was able to determine this via research, extensive time spent in gdb, and some good ol' intuition.

Your last point is a good one. The sleep statement probably should be removed - I simply added it "just in case". My rationale was that users would rather wait an extra second than have the login process freeze altogether. However, on my computer after applying my changes (without the sleep), I've had no trouble whatsoever.

yaztromo (tromo) wrote :

To quote Cody in this very thread:

"I realized that it only occurs when something tries to access dbus and there is no session bus already started. A quick check showed that gnome-screensaver which is launched just shortly before dbus --sh-syntax --exit-with-session depends on dbus which lead me to hypothesis that the race is between the two dbus-launch processes caused by the libxcb issues."

There is your answer.

Bryce Harrington (bryce) wrote :

Marking libxcb won't fix for Hardy because a) while we know it's a bug in libxcb, the only fix we know of at this time is to disable libxcb entirely - with the side effect of breaking compiz for everyone, b) Cody's workaround by shifting when gnome-screensaver starts up seems likely to avoid the problem condition, and c) if/when a fixed libxcb becomes available, it will probably be too large of a patch to be SRU'd.

However, leaving this open for Intrepid since we expect to see a new libxcb in time for Intrepid, that should hopefully solve this issue.

Changed in libxcb:
status: New → Won't Fix
Bryce Harrington (bryce) wrote :

Invalidating dbus task because we're pretty sure we know it to be a libxcb issue, and can be worked around in the xfce4-utils scripts.

Changed in dbus:
status: Confirmed → Invalid
status: New → Invalid
Remove Me (remove-me) wrote :

Ok, thanks for the explanation.
Do you think gnome-screensaver won't even attempt to start dbus-launch (through the dbus library itself, supposedly) if it finds the bus address already set in the environment? In that case the sleep will definitely be unnecessary.

A quick inspection of dbus sources (_dbus_transport_open in dbus-transport.c) seems to show that the first successfully opened transport will be used. The transport gets selected according to the current environment (including information stored in X display atoms).

Agreed. I added the sleep statement to /help make sure/ our dbus-launch had properly started the bus session before starting gnome-screensaver.
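
A fixed `sleep 1` only guesses how long the bus needs to come up. An alternative sketch is to poll until the bus answers, reusing the dbus-send NameHasOwner check from the xinitrc.common snippet quoted earlier in this report. `wait_until` and `bus_reachable` are hypothetical helper names, and the retry count in the example is arbitrary.

```sh
#!/bin/sh
# wait_until TRIES COMMAND [ARGS...]  (hypothetical helper)
# Run COMMAND up to TRIES times, one second apart; succeed as soon as
# it succeeds, fail if it never does.
wait_until() {
    tries=$1
    shift
    while [ "$tries" -gt 0 ]; do
        "$@" >/dev/null 2>&1 && return 0
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}

# Same check as dbustest in xinitrc.common: can we talk to the session bus?
bus_reachable() {
    dbus-send --session --type=method_call \
              --dest='org.freedesktop.DBus' \
              /org/freedesktop/DBus \
              org.freedesktop.DBus.NameHasOwner \
              string:'org.freedesktop.DBus'
}

# Example (not run here):
#   wait_until 10 bus_reachable && gnome-screensaver &
```

This does not address the underlying hang; it only replaces a fixed delay with a bounded wait.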

Okay, I've been using the -proposed version since it was accepted and I'm still good :) Other individuals report happiness as well.

No sightings of the bug since then for me, hopefully the workaround did it, thank you for great work!

Steve Langasek (vorlon) wrote :

Thanks for the feedback. Marking as 'verification-done' and copying to hardy-updates (the fix is already included in the 8.04.1 CD images).

Changed in xfce4-utils:
status: Fix Committed → Fix Released
Remove Me (remove-me) wrote :

I confirm the problem gone too. Even with that "sleep 1" removed.

Martin Pitt (pitti) wrote :

Why was ubuntu-archive subscribed to this bug? I don't see anything to be done for archive admins here?

Hello!

Today I submitted a bug on freedesktop.org, https://bugs.freedesktop.org/show_bug.cgi?id=16617, which concerns a similar problem. I prepared a minimal test case which
reproduces a problem with libX11 and XCB in multithreaded applications. The problem affects Ubuntu 8.04, which uses libxcb.

Maybe this bug will help resolve this issue too.

Regards,

Bryce Harrington (bryce) wrote :

[Karol's case is already covered in bug #232476]

Changed in libxcb:
assignee: nobody → bryceharrington
Martin Pitt (pitti) wrote :

Unsub'ing ubuntu-archive. Please resubscribe with concrete instructions if there's something to be done from u-a.

Mark Painter (mpainter) wrote :

I still see this problem with the updated xfce4-utils, including when using xscreensaver rather than gnome-screensaver. If I increase the sleep a ridiculous amount (eg a minute), login will proceed rather than deadlock. This is on an amd64 install, I've yet to test it with similar hardware on i386.

Can you please attach "ps ux -e" and a traceback when the deadlock is occurring Mark?

Mark Painter (mpainter) wrote :

I've determined that this is happening to me because a program I am starting out of /etc/X11/Xsession.d/ was causing a dbus-launch start. So I don't think there's anything more that needs to be done from the xfce side. I suppose the init script could check for an autolaunched dbus process and avoid the hang, but it won't leave the xfce session with dbus set up nicely.

Also of note, this hang does not occur for me when I do not use the --exit-with-session flag, and I've been removing that and adding a kill for it to the xinitrc.
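
One reading of this approach is to start the bus without --exit-with-session and stop it explicitly when the session ends. The sketch below assumes that reading; `stop_session_bus` is a hypothetical helper name, and `xfce4-session` stands in for whatever command the xinitrc actually runs.

```sh
#!/bin/sh
# stop_session_bus  (hypothetical helper)
# Terminate the bus daemon recorded in DBUS_SESSION_BUS_PID, if any,
# and clear the variables that pointed at it.
stop_session_bus() {
    [ -n "${DBUS_SESSION_BUS_PID:-}" ] || return 0
    kill "$DBUS_SESSION_BUS_PID" 2>/dev/null || true
    unset DBUS_SESSION_BUS_PID DBUS_SESSION_BUS_ADDRESS
}

# Intended use in an xinitrc-style script (shown as comments):
#   eval "$(dbus-launch --sh-syntax)"   # no --exit-with-session
#   xfce4-session                       # runs until logout; not exec'ed
#   stop_session_bus
```

The session command must not be started with `exec` in this arrangement, otherwise the cleanup line is never reached.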

For completeness, a backtrace of the dbus-launch started from xinitrc:
#0 0x00007fa915278d53 in __select_nocancel () from /lib/libc.so.6
#1 0x00007fa914d9637b in _xcb_in_read_block (c=0x6118c0, buf=<value optimized out>, len=8) at xcb_in.c:248
#2 0x00007fa914d957a9 in xcb_connect_to_fd (fd=<value optimized out>, auth_info=0x7fff1da2ac40)
    at xcb_conn.c:133
#3 0x00007fa914d97ae0 in xcb_connect (displayname=<value optimized out>, screenp=<value optimized out>)
    at xcb_util.c:279
#4 0x00007fa91555152a in _XConnectXCB () from /usr/lib/libX11.so.6
#5 0x00007fa91553a7c6 in XOpenDisplay () from /usr/lib/libX11.so.6
#6 0x0000000000403957 in ?? ()
#7 0x000000000040337e in ?? ()
#8 0x00007fa9151c61c4 in __libc_start_main () from /lib/libc.so.6
#9 0x0000000000401b09 in ?? ()
#10 0x00007fff1da2b4e8 in ?? ()
#11 0x0000000000000000 in ?? ()

Bryce Harrington (bryce) on 2008-10-15
Changed in libxcb:
assignee: bryceharrington → nobody

I have a user of Intrepid who used to be able to login who is affected now.
He has no internet, so no changes in software installed triggered the issue.
Login in X using the ssh option is a workaround for him for the time being.

Steve Langasek (vorlon) wrote :

It's my understanding that the libxcb in jaunty has significantly refactored the locking code. Is this bug still an issue in Ubuntu 9.04?

Remove Me (remove-me) wrote :

Haven't seen it for a long while. OTOH, I modified the xfce startup scripts back then
to workaround the problem, and I'm not sure if I ever turned the Ubuntu scripts back.

IOW, I have no idea.

Andrew Pollock (apollock) wrote :

So is there any way we can get libxcb patched or backported for Hardy? Given it's LTS, this, and other third-party software breakage is giving us some grief. So far, we've found that building the existing Hardy version of libxcb without optimisation has helped, so I have no idea if that makes the problem a compiler bug or not. We're still investigating.

Bjorne (bjorn39) wrote :

I have not read all the comments here so this might be outdated info.
After the latest update, I got a problem when logging in: the keyboard and mouse were not responding, so I could not get in. Strangely enough, the Ctrl+Alt+Del combination worked, and pressing Num Lock and Caps Lock turned on the LEDs on the keyboard, but there was still no response when pressing the alphabetic keys.
I found out, though, that if I press the "Print Screen/SysRq" key it suddenly works again.

Bryce Harrington (bryce) on 2009-09-02
tags: added: hardy

I can't actually believe this was ever an XCB bug. The strace output posted on the launchpad bug shows that it was waiting for the connection setup response from the X server, and if that never arrived, it's hard to imagine how it could be XCB's fault.

I could believe, though, that two instances of dbus-launch somehow deadlocked against each other. Perhaps one calls XGrabServer, then waits for the other one to finish connecting to the X server?

The fix that Ubuntu seems to have settled on, if I'm reading the launchpad bug correctly, is to ensure that there aren't two dbus-launch instances racing each other. That seems plausible to me.

Changed in libxcb:
status: Confirmed → Invalid
Bryce Harrington (bryce) wrote :

According to the upstream bug report this is not an xcb bug but rather dbus. If you disagree, please reopen the upstream bug report with additional detail.

Changed in libxcb (Ubuntu):
status: Triaged → Invalid
Changed in libxcb:
importance: Unknown → Critical
status: Invalid → Won't Fix
Changed in libxcb:
importance: Critical → Unknown
Changed in libxcb:
importance: Unknown → Critical
Changed in dbus:
importance: Unknown → High