"bzr break-lock bzr+ssh://bazaar.launchpad.net/..." fails to break a lock

Bug #148087 reported by Andrew Bennetts
4
Affects Status Importance Assigned to Milestone
Bazaar
Fix Released
High
Andrew Bennetts
Launchpad itself
Fix Released
High
Jonathan Lange

Bug Description

It appears that "bzr break-lock bzr+ssh://bazaar.launchpad.net/..." fails to actually break locks.

See comments of bug #125420 for some examples. Also reported on IRC and possibly the mailing list IIRC.

The workaround is to use sftp:// instead of bzr+ssh:// with break-lock.

Tags: hpss lp-code
Tim Penhey (thumper)
Changed in launchpad-bazaar:
milestone: 1.1.10 → 1.1.11
Jonathan Lange (jml)
Changed in launchpad-bazaar:
assignee: nobody → jml
importance: Undecided → Medium
Jonathan Lange (jml)
Changed in launchpad-bazaar:
assignee: jml → nobody
status: New → Confirmed
Tim Penhey (thumper)
Changed in launchpad-bazaar:
milestone: 1.1.11 → 1.1.12
Tim Penhey (thumper)
Changed in launchpad-bazaar:
importance: Medium → High
Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

I don't think this is a problem with the supermirror specifically. I can confuse myself rotten just playing around pushing to bzr+ssh://localhost/tmp/whatever (and locking it by just python2.4 -c 'from bzrlib import branch; branch.Branch.open("bzr+ssh://localhost/tmp/whatever").lock_write()').

I think the problem could be that there are bzr serve processes hanging around after the client has disconnected, i.e. I think this could be a dupe of bug #141172.

Jonathan Lange (jml)
Changed in launchpad-bazaar:
assignee: nobody → jml
Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

Actually I think this is related to the just filed bug #172392, not 141172.

Revision history for this message
Jonathan Lange (jml) wrote :

This has been observed on vanilla smartservers, so it's not a Launchpad bug.

Changed in launchpad-bazaar:
milestone: 1.1.12 → none
Revision history for this message
Jonathan Lange (jml) wrote :

Andrew has said that he will look at this on Monday. We'd really like this fixed for the 1.0 / 1.12 release.

Changed in bzr:
assignee: jml → spiv
status: Confirmed → Triaged
Revision history for this message
Andrew Bennetts (spiv) wrote :

I'll try to fix this today, hopefully we can get it into bzr 1.0.

Changed in bzr:
milestone: none → 1.0final
Revision history for this message
Andrew Bennetts (spiv) wrote :

Oops, target for 1.0rc3 rather than final!

Changed in bzr:
milestone: 1.0final → 1.0rc3
Revision history for this message
Martin Pool (mbp) wrote :

not for 1.0

Changed in bzr:
milestone: 1.0rc3 → none
status: Triaged → Confirmed
Revision history for this message
Paul Tagliamonte (paultag) wrote :

bug still present in 1.0.0

Revision history for this message
C.Kontros (coryisatm) wrote :

Still present in 1.2.0.candidate.1 (hardy). This is really unacceptable and unfair to people managing projects with LP/BZR. This has been open for 4 months.

Revision history for this message
John A Meinel (jameinel) wrote :

I thought the fix was to change the timeout in the server process. Which means this is actually a launchpad-bazaar bug, not a bzr client bug.

Specifically in bzr-1.1 this was introduced:
        old_lockdir_timeout = lockdir._DEFAULT_TIMEOUT_SECONDS
        try:
            ui.ui_factory = ui.SilentUIFactory()
            lockdir._DEFAULT_TIMEOUT_SECONDS = 0
            smart_server.serve()
        finally:
            ui.ui_factory = old_factory
            lockdir._DEFAULT_TIMEOUT_SECONDS = old_lockdir_timeout

Setting the timeout to 0 seconds means that the bzr service will not hang waiting for the lock, but fail immediately. (This only needs to be present on the server, not on the client).

So marking it fixed in bzr, but open in launchpad-bazaar.

Changed in bzr:
milestone: none → 1.1
status: Confirmed → Fix Released
Changed in launchpad-bazaar:
status: New → Confirmed
Revision history for this message
Jonathan Lange (jml) wrote :

I could have sworn this was fixed...

Changed in launchpad-bazaar:
importance: Undecided → High
milestone: none → 1.2.4
status: Confirmed → Triaged
Jonathan Lange (jml)
Changed in launchpad-bazaar:
assignee: nobody → jml
milestone: 1.2.4 → 1.2.3
Revision history for this message
Jonathan Lange (jml) wrote :

It turns out that it *is* fixed, sort of.

I'm guessing that you are doing 'bzr push', seeing the lock message and then hitting Ctrl-C, because that's the only way I could reproduce this big. When you do that, Bazaar leaves the SSH process lying around still connected to Launchpad (see bug 141172 and bug 172392).

This means that running 'bzr break-lock' actually does break the lock. Unfortunately, the ssh process immediately re-acquires the lock, so when you try to push again, it looks as if break-lock didn't work.

This behaviour is undesirable and work to fix it is being tracked in the bugs I mentioned earlier. I'll have a chat to the Bazaar guys to see if we can get it fixed soon.

Changed in launchpad-bazaar:
status: Triaged → Fix Released
Revision history for this message
John A Meinel (jameinel) wrote :

So.... I believe when we spawn ssh we explicitly first call something to ignore SIGINT before we spawn the process.

I believe we did that originally so that when someone does ^C we would have a chance to clean up before we exited out of our process.

So the question is, how can we have both? Have the subprocess not die until we have done everything we can. We can't just grab the SIGINT and queue it up to be raised later. Because if we don't set the subprocess to ignore it, then it dies right away. If we do set it to ignore it, then there is no way we can send it SIGINT when we are ready.

Anyway, the quick fix would be to do:
=== modified file 'bzrlib/transport/ssh.py'
--- bzrlib/transport/ssh.py 2008-02-08 18:36:35 +0000
+++ bzrlib/transport/ssh.py 2008-03-14 15:19:29 +0000
@@ -587,8 +587,7 @@
         # Running it in a separate process group is not good because then it
         # can't get non-echoed input of a password or passphrase.
         # <https://launchpad.net/products/bzr/+bug/40508>
- return {'preexec_fn': _ignore_sigint,
- 'close_fds': True,
+ return {'close_fds': True,
                 }

But I think that will end up causing more locks to be left around.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.