lstat on NFS4 hangs while bzr's trying to read the dirstate file

Bug #403697 reported by poelzi
This bug affects 5 people
Affects                  Status   Importance  Assigned to  Milestone
Bazaar                   Invalid  Medium      Unassigned
nfs4-acl-tools (Ubuntu)  New      Undecided   Unassigned

Bug Description

bzr's use of OSLocks appears to cause wedges in the NFS4 code, but either NFS or bzr has changed sufficiently in the karmic cycle that the problem has gone away.

steps to reproduce:
bzr branch lp:pybindgen
cd pybindgen
bzr status

This locks up bzr so hard that not even kill -9 kills the process. In gdb I can't get a backtrace; however, when bzr runs under strace, a trace can be made and the process can be killed. Strange...
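
A process that ignores kill -9 is usually blocked in uninterruptible sleep (state D) inside the kernel, which would be consistent with a stuck filesystem call. A minimal sketch for checking that on Linux, assuming /proc is mounted; `parse_state` and `process_state` are hypothetical helper names, not part of bzr:

```python
import os

def parse_state(stat_line):
    """Return the one-letter state field from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the LAST ')' before reading the state.
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]

def process_state(pid):
    """Read the state of a live process, e.g. 'R', 'S' or 'D'."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_state(f.read())

if __name__ == "__main__":
    # Replace os.getpid() with the pid of the hung bzr process.
    print(process_state(os.getpid()))
```

A state of D for the hung bzr process would point at the kernel or the filesystem rather than at bzr itself.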

poelzi (poelzi) wrote :
Martin Pool (mbp) wrote : Re: [Bug 403697] [NEW] hard lockup on bzr command

The strace (if accurate) shows the last operation is

lstat("/home/poelzi/Projects/pybindgen/pybindgen/.bzr/checkout/dirstate",
 <unfinished ...>

This isn't normally something bzr should be able to cause to hang even
if it tried, so I would suspect some filesystem or kernel problem.
What fs are you using?
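
To check whether a plain lstat on the path hangs without bzr being involved at all, something like the following could be tried. This is only a sketch: `lstat_with_timeout` is a made-up helper, and if the call really is stuck in the kernel the helper thread stays stuck too, so the probing process itself might not exit cleanly.

```python
import os
import threading

def lstat_with_timeout(path, timeout=5.0):
    """Run os.lstat(path) in a helper thread.

    Returns (finished, result). result is the os.stat_result on success,
    the OSError on failure, or None if the call did not return in time.
    """
    box = {}

    def worker():
        try:
            box["result"] = os.lstat(path)
        except OSError as exc:
            box["result"] = exc

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout)
    if t.is_alive():
        # The lstat call has not returned: the path is probably wedged.
        return False, None
    return True, box["result"]

if __name__ == "__main__":
    print(lstat_with_timeout(".bzr/checkout/dirstate"))
```

If this also hangs on the dirstate path, that would support the filesystem or kernel theory.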

--
Martin <http://launchpad.net/~mbp/>

summary: - hard lockup on bzr command
+ bzr hangs trying to lstat dirstate file
Changed in bzr (Ubuntu):
status: New → Incomplete
status: Incomplete → Confirmed
Changed in bzr:
status: New → Incomplete
Changed in bzr (Ubuntu):
status: Confirmed → Incomplete
Changed in bzr:
importance: Undecided → Medium
Martin Pool (mbp) wrote : Re: bzr hangs trying to lstat dirstate file

Those operations work for me.

poelzi (poelzi) wrote :

I have to investigate this more. The reason I filed the bug report is that someone else was able to reproduce it.

It's on an NFS share. The filesystem is fine; I checked on the server. But I can't access the directory on the client (even a cd hangs), so something is very wrong. I will reboot soon and report back.

poelzi (poelzi) wrote :

I did some more investigation now.
The problem seems to be that the checkout is on an NFS(4) share. After a fresh reboot everything is fine: I can cd into the .bzr/checkout folder and access files. After I run any bzr command, the command freezes and the directory is no longer accessible. It looks like a file lock problem to me.

I also tested bzr from karmic, and there the problem does not occur.

Robert Collins (lifeless) wrote :

Closing because the problem has gone or so we're told ;)

tags: added: dirstate dirstate2
description: updated
Changed in bzr:
status: Incomplete → Invalid
Changed in bzr (Ubuntu):
status: Incomplete → Invalid
Martin Pool (mbp) wrote : Re: bzr stat hangs on nfs4 share

2009/11/28 Albert Cervin <email address hidden>:
> I have a nfs4 share exported (/etc/exports further down) on a server
> running karmic. When i access bazaar branches on this share from my
> client machine and do bzr stat, everything just hangs.
>
> /etc/exports line on the server
> /srv/stuff
> 192.168.0.0/255.255.255.0(rw,sync,fsid=0,crossmnt,no_subtree_check)
>
> I then mount the share with default options on a machine running karmic
> desktop edition (64-bit).
>
> Similar behaviour has been discussed here
> (https://bugs.launchpad.net/ubuntu/+source/bzr/+bug/403697) but in that
> thread it's said that the problem goes away in karmic but that is not
> the case for me.
>
> I can also add that the problem does not occur with other Bazaar
> commands like bzr diff, or if i use NFS3.

Hi Albert,

It does sound like you're hitting the same problem as in 403697 -
running strace on the bzr client process would probably tell us for
sure, if it is hanging during an lstat call.
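
For anyone collecting such a trace, a rough way to pull the final unfinished call out of a saved strace log is sketched below. It assumes plain, unwrapped strace output where a blocked call ends in `<unfinished ...>`; `last_unfinished_call` is a hypothetical helper name.

```python
import re

# Matches e.g.:  lstat("/some/path", <unfinished ...>
_CALL = re.compile(r"(\w+)\((.*?),?\s*<unfinished \.\.\.>\s*$")

def last_unfinished_call(trace_text):
    """Return (syscall, args) if the last trace line is an unfinished call."""
    lines = [ln for ln in trace_text.splitlines() if ln.strip()]
    if not lines:
        return None
    match = _CALL.search(lines[-1])
    if match is None:
        return None
    return match.group(1), match.group(2)

if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as f:
        print(last_unfinished_call(f.read()))
```

A result whose first element is `lstat` would match what the original report showed.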

So you should probably just subscribe to that bug and I'll reopen it.
I suspect it is an OS bug and may not be something bzr can work
around.

It may help if you can capture a network trace showing the NFS
operations up to this point, using wireshark - if you don't know how
to do that, don't worry, and we'll let somebody from the Ubuntu bug
team help.

--
Martin <http://launchpad.net/~mbp/>

summary: - bzr hangs trying to lstat dirstate file
+ lstat on NFS4 hangs while bzr's trying to read the dirstate file
affects: bzr (Ubuntu) → ubuntu
Changed in ubuntu:
status: Invalid → New
Martin Pool (mbp) wrote :

>
> I can also add that the problem does not occur with other Bazaar
> commands like bzr diff, or if i use NFS3.

If it's consistently reproducible that you don't get a problem with diff, that might help localize the bug.

Martin Pool (mbp) wrote :

Albert Cervin writes:

I also think it is an OS problem. I keep getting this error in the os logs: svc: failed to register lockdv1 RPC service (errno 97). This is the nfs OS lock service, right?
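
The errno in that log line can be decoded with the Python standard library. Errno numbering is platform-specific, so this needs to run on the same kind of system that produced the log; on Linux, 97 is EAFNOSUPPORT ("Address family not supported by protocol"). `describe_errno` is just an illustrative helper name.

```python
import errno
import os

def describe_errno(code):
    """Return (symbolic name, message) for a numeric errno on this platform."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)

if __name__ == "__main__":
    print(describe_errno(97))
```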

Fabio Marconi (fabiomarconi) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better.
Is this bug reproducible with the latest Lucid packages ?
Thanks in advance.

Changed in ubuntu:
status: New → Incomplete
Sven Heinrich (s-heinrich) wrote : Re: [Bug 403697] Re: lstat on NFS4 hangs while bzr's trying to read the dirstate file

Thank you for spending time on this bug. It came up with Karmic. I now
use Lucid and still have problems with the NFS lock. I received another
message concerning the locking; see below. But I am not that experienced
with our NFS, which is why I want to check the NFS installation with our
admin first.

Sven Heinrich

*Subject:* [Bug 98836] Re: [MASTER] "OS locks must die" - dirstate file
write locks exclude readers and limit portability
*Reply-To:* Bug 98836 <email address hidden>

John, bzr works fine on NFS unless your NFS installation is broken, as
is explained perfectly clearly in bug 108605. This bug is saying it
would be nice to work around misconfigured NFS. There's also a 'nolock'
NFS option if you want it. Don't be a troll.
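
To see which NFS mounts a client has and whether any of them carry the 'nolock' option mentioned above, the mount table can be inspected. A small sketch, assuming /proc/mounts-style input (device, mount point, fstype, options per line); the helper names and the host names in the test data are made up for illustration.

```python
def nfs_mounts(mounts_text):
    """Yield (mount_point, fstype, options) for NFS entries in mount-table text."""
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        _device, mount_point, fstype, options = fields[:4]
        if fstype in ("nfs", "nfs4"):
            yield mount_point, fstype, options.split(",")

def uses_nolock(options):
    """True if the option list contains the literal 'nolock' option."""
    return "nolock" in options

if __name__ == "__main__":
    with open("/proc/mounts") as f:
        for mount_point, fstype, options in nfs_mounts(f.read()):
            print(mount_point, fstype, "nolock" if uses_nolock(options) else "locking on")
```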

--
[MASTER] "OS locks must die" - dirstate file write locks exclude readers
and limit portability
https://bugs.launchpad.net/bugs/98836
You received this bug notification because you are a direct subscriber
of a duplicate bug (137387).

Status in Bazaar Version Control System: Confirmed

Bug description:
As of bzr 1.12 (and many previous versions) the dirstate working tree
format uses an OS lock on the dirstate file. This was done so that we
could safely make in-place updates to the dirstate file. Bzr also (and
it's tied to the OS lock usage) uses an edit-in-place approach to
modifying the file.

However, this causes several problems:

  * While the dirstate file is locked, it cannot be read: eg by info
    (bug 174055) or by diff
  * OS locks don't work well on all platforms
  * They are particularly problematic on network filesystems, which
    often don't have working file locking either inherently or because
    of a configuration problem, eg smb (bug 31006), AFP (bug 114528),
    nfs (bug 108605)
  * OS lock behaviour varies between platforms therefore is harder to
    test and debug, eg bug 305006
  * On some platforms, OS locks are implicitly shared across a process
    and this makes them harder to test and/or hides bugs
  * OS locks can't be broken and don't show who is holding the lock
  * OS locks are not supported by Jython
  * When the disk is full or bzr crashes the dirstate file can be
    shorter than it should be.
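
The first point in the list above (a locked file excluding readers) can be illustrated with a small sketch. It uses flock(2) advisory locks on a local temporary file purely as an example; this is an assumption for illustration and not necessarily the locking primitive bzr itself uses, and behaviour on network filesystems may differ.

```python
import fcntl
import os
import tempfile

def try_lock(fd, exclusive):
    """Attempt a non-blocking advisory lock on fd; return True if acquired."""
    mode = fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH
    try:
        fcntl.flock(fd, mode | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        return False

def demo():
    """Return (writer_locked, reader_blocked, reader_ok_after_unlock)."""
    with tempfile.NamedTemporaryFile() as tmp:
        # Two independent opens stand in for a writer and a reader.
        writer = os.open(tmp.name, os.O_RDWR)
        reader = os.open(tmp.name, os.O_RDONLY)
        try:
            writer_locked = try_lock(writer, exclusive=True)
            reader_blocked = not try_lock(reader, exclusive=False)
            fcntl.flock(writer, fcntl.LOCK_UN)
            reader_ok = try_lock(reader, exclusive=False)
            return writer_locked, reader_blocked, reader_ok
        finally:
            os.close(writer)
            os.close(reader)

if __name__ == "__main__":
    print(demo())
```

While the writer holds the exclusive lock, the reader's shared lock request is refused; a reader that blocks instead of using LOCK_NB would simply wait.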

All these bugs are collected into this one bug, as few of them can be
fixed without fixing the OS lock issue, and fixing the OS lock issue
will fix them all.

If the use of diff and stat while a commit editor is open is fixed in a
different way - e.g. by a separate stat cache, then we can just modify
this description to only list the remaining issues.

These bugs are somewhat distinct aspects so shouldn't be marked as
dupes, but they can probably best be fixed together.

Totally fixing this requires changing the format not to rely on file
locking, which requires a format that is safe if it is being read and
written simultaneously. That format can't assume any particular
behaviour if an attempt is made to rename a file while it's being read,
because that can either fail, cause an error for the reader, or follow
the rename, depending on the platform.
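
For comparison, a common lock-free pattern on POSIX systems is to write a complete new copy and rename it over the old file, which also avoids leaving a truncated file behind after a crash or a full disk. This is only a sketch of that general technique, not what bzr does, and as noted above the behaviour of a rename while another process is reading differs between platforms. `write_atomically` is a hypothetical helper name.

```python
import os
import tempfile

def write_atomically(path, data):
    """Write bytes to path via a temp file in the same directory, then rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the data is on disk first
        os.replace(tmp_path, path)  # swap the new file into place
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

A reader that opens the file at any moment sees either the complete old contents or the complete new contents on a local POSIX filesystem; network filesystems may not give the same guarantee.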

Some partial fixes may also be possible.

On 2010-08-05 21:35, Fabio Marconi wrot...


Albert Cervin (abbe-c) wrote :

The problem does not occur if the files on the NFS server are owned by root:root, which suggests the problem has something to do with the nfs4 id-mapper...

Albert Cervin (abbe-c) wrote :

It also works correctly if the bzr commands are run as root on the client machine (!?)...
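
For anyone trying to narrow this down, the ownership the client sees can be compared with the user running bzr. A minimal sketch, best run on a freshly mounted share before any bzr command; `owner_report` is a made-up helper, and an owner of "nobody" often hints at an id-mapping mismatch under NFSv4, though that is an assumption to verify rather than a diagnosis.

```python
import os
import pwd

def owner_report(path):
    """Describe who owns path as seen by this client."""
    st = os.lstat(path)
    try:
        owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        owner = None  # uid has no local account entry
    return {
        "uid": st.st_uid,
        "gid": st.st_gid,
        "owner": owner,
        "owned_by_me": st.st_uid == os.geteuid(),
    }

if __name__ == "__main__":
    print(owner_report(".bzr"))
```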

affects: ubuntu → nfs4-acl-tools (Ubuntu)
Changed in nfs4-acl-tools (Ubuntu):
status: Incomplete → New