Interrupting bzr Ctrl-C does not kill openssh subprocess

Bug #141172 reported by Andrew Bennetts
6
Affects Status Importance Assigned to Milestone
Bazaar
Confirmed
Medium
Unassigned

Bug Description

Jono Lange reports that when bzr is interrupted with Ctrl-C while connected to a bzr+ssh:// service, that the openssh child process is not terminated. i.e the old process, and thus the old connection, hangs around indefinitely until the server closes its end.

Ideally the openssh child process would be killed when the bzr parent process is.

Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

There was a lot of discussion about this issue, which I now can't find. Was it on some mailing list?

Revision history for this message
Jonathan Lange (jml) wrote :

I remember most of the conversations as voice, so I can't point you to anything. However, I'd be able to answer specific questions that you have.

Revision history for this message
John A Meinel (jameinel) wrote :

I believe the first portion is that we didn't want to kill the ssh subprocess instantly, so that we have time to recover from errors, rather than just getting EOF or broken pipe from the connection.

But I agree that we should eventually get the process to close down and exit.

Even more, though, I think that if bzr+ssh is waiting on a lock on the server end, even if the client is killed, it won't detect that until it can actually grab the lock and then tries to talk to the client. Which causes some odd double-locking issues (I think that is a different bug, though.)

Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

Ah, yes, it was the server side double locking that I was worrying about. Is there a bug number for that?

Revision history for this message
John A Meinel (jameinel) wrote :

I don't think it has been officially reported as such.

Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

I filed bug 172392.

Martin Pool (mbp)
Changed in bzr:
importance: Undecided → Medium
status: New → Confirmed
Revision history for this message
Martin Pool (mbp) wrote :

See also bug 140579 about Launchpad desire for this.

tags: added: hpss launchpad ssh
Revision history for this message
Andrew Bennetts (spiv) wrote :

We do disable SIGINT in the SSH subprocess, see the os_specific_subprocess_params in bzrlib/transport/ssh.py (to fix bug 5987, which is what John is referring to I think).

Perhaps we should keep a set of active SSH transports (maybe using weakrefs?), and have an almost top-level try/except handler for KeyboardInterrupt that sends SIGINT to them. Ideally we should be closing them directly as part of cleanups when unwinding the stack, but it seems reasonable to have insurance in case we don't clean up our internals perfectly.

In fact, we already have that set of weakrefs, bzrlib.transport.ssh._subproc_weakrefs, with callbacks that try to close those subprocesses when the last reference to that transport goes away. So if we are still leaving openssh subprocesses, either:

 * _close_ssh_subproc isn't strong enough (it doesn't send SIGINT, it just closes stdin and stdout of the process then waits)
 * the transport object isn't being garbage collected as bzr shuts down, which is probably fairly likely, especially considering the way we use os._exit (but could be possible even if we didn't)

So probably it is reasonable to change e.g. bzrlib.commands.exception_to_return_code to call _close_ssh_subproc or send SIGINT to those subprocesses if they are still live. The main trick would seem to be avoiding importing bzrlib.transport.ssh if it wasn't already imported.

Jelmer Vernooij (jelmer)
tags: added: check-for-breezy
Jelmer Vernooij (jelmer)
tags: removed: check-for-breezy
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.