telnetd: deadlock on cleanup

Bug #507455 reported by Simon Kagstrom
This bug affects 1 person
Affects                 Status  Importance  Assigned to  Milestone
netkit-telnet (Ubuntu)  New     Undecided   Unassigned

Bug Description

(Please submit to Debian upstream as well)

The cleanup function in telnetd is called both directly and on SIGCHLD signals. This, unfortunately, triggered a deadlock in eglibc 2.9 while running on a 2.6.31.11 kernel.

What we were seeing was hangs like this one:

      (gdb) bt
      #0 0xb7702424 in __kernel_vsyscall ()
      #1 0xb7658e61 in __lll_lock_wait_private () from ./lib/libc.so.6
      #2 0xb767e7b5 in _L_lock_15 () from ./lib/libc.so.6
      #3 0xb767e6e0 in utmpname () from ./lib/libc.so.6
      #4 0xb76bcde7 in logout () from ./lib/libutil.so.1
      #5 0x0804c827 in cleanup ()
      #6 <signal handler called>
      #7 0xb7702424 in __kernel_vsyscall ()
      #8 0xb7641003 in __fcntl_nocancel () from ./lib/libc.so.6
      #9 0xb767e0c3 in getutline_r_file () from ./lib/libc.so.6
      #10 0xb767d675 in getutline_r () from ./lib/libc.so.6
      #11 0xb76bce42 in logout () from ./lib/libutil.so.1
      #12 0x0804c827 in cleanup ()
      #13 0x0804a0b5 in telnet ()
      #14 0x0804a9c3 in main ()

What has happened here is that the user closed the telnet session via the escape character. This causes telnetd to call cleanup in frame #12, and the SIGCHLD signal is then delivered while telnetd is still executing cleanup.

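Telnetd then calls the signal handler for SIGCHLD, which is also cleanup() (frame #5). Ouch. The actual deadlock is in libc: getutline_r in frame #10 takes the __libc_utmp_lock lock, and utmpname in frame #3 then tries to take the same lock. The lock is still held by the interrupted code, which cannot resume until the handler returns, so the handler waits forever.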
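To make the re-entrancy problem concrete, here is a small model of it in C. It is only a sketch: the flag model_lock_held stands in for libc's internal utmp lock, SIGUSR1 stands in for SIGCHLD, and all function names are hypothetical. Instead of actually hanging, the nested call just records that it found the lock already held, which is the point where a real non-recursive lock would block forever.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Stand-in for libc's internal utmp lock (a model, not the real lock). */
static volatile sig_atomic_t model_lock_held = 0;

/* Set when a nested call finds the lock held by the code it interrupted. */
static volatile sig_atomic_t nested_call_found_lock_held = 0;

/* Models a libc routine that takes the lock, such as getutline_r or utmpname. */
void model_locked_routine(int deliver_signal_inside)
{
    if (model_lock_held) {
        /* A real non-recursive lock would wait here forever: the owner is
         * the interrupted code, which cannot run until the handler returns. */
        nested_call_found_lock_held = 1;
        return;
    }
    model_lock_held = 1;
    if (deliver_signal_inside)
        raise(SIGUSR1);          /* the signal arrives while the lock is held */
    model_lock_held = 0;
}

/* Models the handler, which re-enters the same locked routine. */
void model_handler(int sig)
{
    (void)sig;
    model_locked_routine(0);
}

/* Returns 1 if the handler ran into the held lock, -1 on setup failure. */
int run_model(void)
{
    struct sigaction sa;

    sa.sa_handler = model_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    model_locked_routine(1);
    return nested_call_found_lock_held;
}
```

Calling run_model() from a small main() is enough to try it out. The model avoids the hang on purpose; in telnetd the nested lock attempt is what shows up as __lll_lock_wait_private in frame #1.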

The fix in the patch registers the SIGCHLD handler as cleanup_sighandler, and makes cleanup disable the SIGCHLD signal before calling cleanup_sighandler.
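The shape of that fix can be sketched roughly as follows. This is not the attached patch: the function bodies, the cleanup_runs counter and the install_sigchld_handler helper are assumptions made for illustration, and sigprocmask is just one possible way to "disable" SIGCHLD (the patch may do it differently).

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Hypothetical counter so the sketch can show how often the body ran. */
static volatile sig_atomic_t cleanup_runs = 0;

/* Registered as the SIGCHLD handler. In telnetd this would contain the
 * real cleanup work (logout() and so on) and then exit. */
void cleanup_sighandler(int sig)
{
    (void)sig;
    cleanup_runs++;
}

/* Entry point for direct calls: block SIGCHLD first, so the handler can
 * no longer interrupt the cleanup work half-way through. SIGCHLD is left
 * blocked on purpose, since the real cleanup never returns. */
void cleanup(int sig)
{
    sigset_t block;

    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block, NULL);

    cleanup_sighandler(sig);
}

/* Install cleanup_sighandler (not cleanup) for SIGCHLD. Returns 0 on success. */
int install_sigchld_handler(void)
{
    struct sigaction sa;

    sa.sa_handler = cleanup_sighandler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGCHLD, &sa, NULL);
}
```

With this split, a SIGCHLD that arrives after cleanup() has started stays pending instead of re-entering the cleanup code, so the libc utmp functions are never called recursively from the handler.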

I wrote the test script below to trigger the bug. Whether it actually triggers it probably depends on timing, glibc and/or kernel versions, but here it's quite reliable. login-script is an expect script that logs in, executes a command and exits (not via shell exit though, see above).

The script simply forks 10 telnet sessions in the background in a loop.

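#!/bin/sh
# Repeatedly start batches of telnet logins against the target machine.

if [ $# -ne 1 ]; then
    echo "Usage: $0 machine-to-test"
    exit 1
fi
tgt=$1

n=10    # sessions started in parallel per batch
while true; do
    i=0
    while [ "$i" -lt "$n" ]; do
        # login-script is the expect script described above
        login-script "$tgt" ls > /dev/null &
        i=$((i + 1))
    done
    wait    # let the whole batch finish before starting the next one
done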
