Comment 2 for bug 732322

In , Michaelk (michaelk) wrote :

Created attachment 1429
strace sessions of vzctl enter

Hi!

Problem:

When running "vzctl enter <VEID>", the command hangs with the latest
version of bash, i.e. 4.2.7 (this happens with all 4.2.x versions).

I first thought this was a bash bug and filed a report with the maintainer
of bash (Chet Ramey), but got back the response below, which indicates this
might be a bug in vzctl instead.

Let me know if you need any more information.

There is also a thread discussing this issue at:

https://groups.google.com/group/gnu.bash.bug
https://groups.google.com/group/gnu.bash.bug/browse_thread/thread/0be5df8f41c8b88c#

//Michael

> Hi Chet!
>
> I ran 3 different strace sessions (see attached file)
>
> 1. A working session (bash 3.2.25)
>
> # strace -ff -o /tmp/bash_strace/bash_working/bash_working.log vzctl enter 152
>
> 2. A failing session (bash 4.2.7):
>
> # strace -ff -o /tmp/bash_strace/bash_not_working/bash_not_working.log vzctl enter 152
>
> for the failing session pstree shows:
>
> bash(23067)---strace(23230)---vzctl(23231)---vzctl(23232)---bash(23233)

I suspect this is a bug in vzctl that was masked by bash-4.1 and previous
versions.

The only change of any significance here is that bash-4.1 closed file
descriptors 3-20 at startup. That's a bug; you can't close fds out
from under libraries like that. This caused mysterious crashes on Mac
OS X, for example when running bash as a login shell under iTerm.
Bash-4.2 sets the fds to close-on-exec instead.
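
To make the difference concrete, here is a minimal sketch in C of marking a
range of descriptors close-on-exec rather than closing them. It is not taken
from the bash source; the helper names mark_cloexec_range and is_cloexec are
hypothetical, and only the standard fcntl() calls are assumed.

```c
#include <fcntl.h>
#include <unistd.h>

/* Mark every open descriptor in [lo, hi] close-on-exec.
 * Descriptors that are not open are skipped.
 * Returns the number of descriptors that were marked. */
int mark_cloexec_range(int lo, int hi)
{
    int marked = 0;

    for (int fd = lo; fd <= hi; fd++) {
        int flags = fcntl(fd, F_GETFD);
        if (flags == -1)
            continue;               /* fd is not open, nothing to do */
        if (fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == 0)
            marked++;
    }
    return marked;
}

/* Returns 1 if fd is open with FD_CLOEXEC set, 0 if it is open
 * without the flag, and -1 if fd is not open at all. */
int is_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags == -1)
        return -1;
    return (flags & FD_CLOEXEC) ? 1 : 0;
}
```

A descriptor marked this way stays usable in the current process and is only
closed when that process calls exec, so a shell that inherits descriptors and
marks them keeps holding them for its own lifetime.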

The problem is that vzctl plays fast and loose with file descriptors.
It leaves read and write ends of pipes open in the child process it
forks to exec bash when it uses the other ends internally to communicate
with that child through the pty it opens as the controlling terminal.
The big difference between the non-working and working versions is that
bash-4.2 inherits file descriptors 3, 7, 9, and 10 and leaves them open,
where bash-4.1 closed them.

This results in the process group that bash-4.2 is using being
orphaned, which makes read() return EOF and the kernel send SIGHUP and
SIGCONT to bash. This is consistent with the strace output.

You can test this by changing shell.c to call close(i) instead of
SET_CLOSE_ON_EXEC(i) around line 541. That's just to prove vzctl has
a bug, however -- I'm not going to revert that change.

Keep in mind that I haven't looked at the vzctl source code, and so don't
have any patches for it. Somehow, though, the file descriptors that
get closed in process 23231 after forking 23232 (in the bash-not-working
set of traces, fds 3,7,9,10) need to get closed in 23233 after 23232 forks
it and before it execs bash.

Let me know how it goes. If you can make the right changes to vzctl and
that fixes the problem, so much the better.