ssh child process hangs (Call proc.terminate before proc.wait when possible?)

Bug #686224 reported by dobey
This bug affects 1 person
Affects: Bazaar
Status: Expired
Importance: Medium
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

Sometimes the child ssh process appears to get stuck waiting for something, which causes the parent to hang in proc.wait. The fix applied for bug #659590 does not fix this. However, calling proc.terminate before proc.wait does make the problem go away. It seems safe to terminate the process at this point, since its stdin and stdout have already been closed.
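
The workaround described above can be sketched roughly as follows. This is a minimal illustration, not the bzr code: the helper name `close_and_reap` and the `timeout` fallback are assumptions, and `wait(timeout=...)` requires Python 3.3 or later.

```python
import subprocess
import sys


def close_and_reap(proc, timeout=5.0):
    """Close our ends of the child's pipes, ask it to exit, then reap it.

    Hypothetical helper: the name and the timeout value are illustrative.
    """
    # Close stdin/stdout first, so the child sees EOF on its input.
    for stream in (proc.stdin, proc.stdout):
        if stream is not None:
            stream.close()

    # Only signal the child if it has not already exited.
    if proc.poll() is None:
        proc.terminate()  # sends SIGTERM on POSIX

    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Last resort if the child ignores the terminate request.
        proc.kill()
        return proc.wait()
```

Checking `proc.poll()` before `proc.terminate()` avoids signalling a child that has already exited on its own.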

Tags: hpss ssh
Martin Pool (mbp) wrote :

Thanks for the report.

Just killing the subprocess seems like kind of a big hammer, and one I'd rather not apply until we either know the problem, or we have consciously given up on understanding it.

If you can reproduce this, can you please tell us:

1- what strace shows that bzr and the child are doing
2- ditto 'ps o pid,wchan $SSH_PID'
3- 'lsof -p $SSH_PID'
4- the C-level backtrace of the child (attach using gdb, etc.)

You may need to run some of them as root to get around current security restrictions.

summary: - Call proc.terminate before proc.wait when possible
+ ssh child process hangs (Call proc.terminate before proc.wait when
+ possible?)
tags: added: hpss ssh
Changed in bzr:
importance: Undecided → Medium
status: New → Incomplete
Max Bowsher (maxb) wrote :

I'd go so far as to say that bzr should NOT WAIT for ssh children to exit. Consider the use of ssh in ControlMaster auto mode. A ssh child invoked by bzr may remain active managing a multiplexed connection long after bzr itself is done with the channels being run over that process's stdin/out.

Martin Pool (mbp) wrote : Re: [Bug 686224] Re: ssh child process hangs (Call proc.terminate before proc.wait when possible?)

On 9 December 2010 21:53, Max Bowsher <email address hidden> wrote:
> I'd go so far as to say that bzr should NOT WAIT for ssh children to
> exit. Consider the use of ssh in ControlMaster auto mode. A ssh child
> invoked by bzr may remain active managing a multiplexed connection long
> after bzr itself is done with the channels being run over that process's
> stdin/out.

I'm a bit surprised to discover this is true. I had assumed that the
opportunistic master would detach itself once the original command
completed, but apparently not. So I agree, yes, we shouldn't wait.
We could do a non-blocking check whether they have exited but even
that may be of limited value.

--
Martin
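
The non-blocking check suggested in the comment above could look something like the sketch below. It is an illustration only: the helper name `finish_without_waiting` is hypothetical, and this is not what bzr actually does. `Popen.poll()` returns the exit status if the child has already exited and `None` otherwise, so the caller never blocks on an ssh process that is still serving as a ControlMaster.

```python
import subprocess
import sys


def finish_without_waiting(proc):
    """Close our ends of the pipes and return the exit status if known.

    Hypothetical helper. Returns None when the child is still running,
    for example an ssh process still managing a multiplexed connection.
    """
    for stream in (proc.stdin, proc.stdout):
        if stream is not None:
            stream.close()
    # poll() never blocks: it reaps the child only if it has already exited.
    return proc.poll()
```

A caller would treat a `None` result as "still running, leave it alone" rather than as an error.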

Launchpad Janitor (janitor) wrote :

[Expired for Bazaar because there has been no activity for 60 days.]

Changed in bzr:
status: Incomplete → Expired