Bazaar Version Control System

Comment 15 for bug 98836

I quit using BZR a long time ago because this problem blocked me. I use Git now. Works great.

Winston Wolff
Stratolab - Games for Learning
tel: (646) 827-2242

On Aug 4, 2010, at 11:04 PM, John Gilmore wrote:

> NFS works fine with BZR unless your BZR installation is broken. It's
> always "that other guy's problem". But real users in real network
> configurations who don't happen to give bzr what it demands in terms of
> file locks have been complaining about this for five years. Bzr doesn't
> gracefully fall back; it dies and leaves an ugly corpse behind. Patches
> have been posted for this but never applied. And yes, my NFS server is
> running lockd, but it doesn't actually work with my Ubuntu client's NFS
> implementation. Somehow, my network filesystem all works, flawlessly,
> for years, but not for bzr.
>
> This bug has fourteen duplicates. In many, you tried to patiently
> explain that it was *their* problem and that if they just teased their
> network configuration long enough, no bug would actually need to be
> fixed. Several of those bug reports had other users tack on comments
> saying, "Me too - I'm having the same trouble". As far as I could tell,
> few of them ever actually got their problem resolved -- they just went
> away unsatisfied. In one of them, #137387, you went so far as to
> analyze a tcpdump of the underlying NFS traffic and detected a possible
> bug in file locking in NFSv4, then wrote "I'm not sure what to do with
> this bug. I'm close to saying it's a server bug or quirk and bzr is not
> doing anything wrong." From other comments, I think it related to
> subtle semantics: can you upgrade a read lock to a write lock by having
> the same program make a second lock on the same file, or does the second
> lock produce some kind of later error? It's much harder to "do the
> obvious thing" in that case, from the wrong end of a network connection
> and without any idea of which user process an NFS request is coming
> from. Instead, a truncate that tries to drop a byte locked by the read
> lock might well return an error, protecting the file contents for the
> reader. Even if you're right and they're wrong, you didn't solve that
> user's problem. The sysadmins involved had been fighting much more
> serious NFS bugs for years; they were hoping to find version control
> software that didn't tickle subtle file locking semantics questions.
> They patched out the read lock and it fixed the problem, but you didn't
> accept the patch. You didn't even take their fix for leaving a lockfile
> lying around when a filesystem lock fails (requiring a bzr break-lock to
> recover). In another report, #114528, two users reported switching
> their project to subversion because it worked on their AFP network and
> bzr didn't. Still no response.
>
> Rather than telling me and every other bug reporter to reconfigure our
> filesystems and LANs and patch up our kernels, please consider this a
> wake-up call. File locking is giving your users more trouble than it
> cures. The existing code doesn't work cleanly in a wide variety of
> actual installations, despite the specs saying that it should. Even
> when it works, it takes too much sysadmin effort, and when it fails, it
> invariably hurts somebody who never actually needed any file locking,
> somebody who'll never do bzr operations in parallel. Rewriting the
> whole file format seems to be taking a lot of time and isn't working
> yet. Maybe the old file format will work even if you don't lock it, or
> you can add a command that script writers can use to explicitly control
> parallelism for the few who care. Perhaps before the decade is out, you
> or another bzr maintainer could eliminate the misfeature, remove the
> poorly chosen dependency, accept some of the patches, handle an error
> condition, finish the rewrite, fix the bug or whatever it takes. Thank
> you for your work on bzr.
> --
> [MASTER] "OS locks must die" - dirstate file write locks exclude readers and limit portability
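
For readers who want to see what the two behaviours asked for in the quoted message could look like (falling back when the filesystem cannot lock, and not leaving a lockfile behind when the OS lock fails), here is a minimal Python sketch. It is not bzr's code and not any of the posted patches. The `guarded_write` helper, the `.lock` marker file name and the choice of errno values treated as "locking unsupported" are all assumptions made for illustration.

```python
import contextlib
import errno
import fcntl
import os

# Assumption: these errno values mean "this filesystem cannot lock at all",
# as opposed to EACCES/EAGAIN, which mean "somebody else holds the lock".
UNSUPPORTED = {errno.ENOLCK, errno.EOPNOTSUPP}


def posix_lock(fileobj):
    """Take a non-blocking exclusive POSIX lock on an open file."""
    fcntl.lockf(fileobj, fcntl.LOCK_EX | fcntl.LOCK_NB)


@contextlib.contextmanager
def guarded_write(path, os_lock=posix_lock):
    """Hypothetical helper: marker file plus best-effort OS lock.

    Yields (fileobj, os_locked). os_locked is False when the filesystem
    refused to lock and only the marker file protects the write.
    """
    marker = path + ".lock"
    # O_EXCL makes creation fail if the marker already exists.
    fd = os.open(marker, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    os.close(fd)
    try:
        with open(path, "ab") as f:
            try:
                os_lock(f)
                os_locked = True
            except OSError as e:
                if e.errno not in UNSUPPORTED:
                    raise  # real contention: let the caller see it
                os_locked = False  # fall back to the marker file alone
            yield f, os_locked
    finally:
        # Runs on success, on fallback and on error, so no stale marker
        # is left behind for someone to clean up by hand.
        os.remove(marker)
```

A caller would write `with guarded_write("some/file") as (f, os_locked):` and could log a warning when `os_locked` is false. Whether running with only the marker file is safe enough depends on how much parallel access a given installation really has, which is the trade-off the quoted message is arguing about.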