Comment 50 for bug 974664

In , Tim (tim-redhat-bugs) wrote :

Hmm. I have more interesting information....

On 3.3.0-8.fc17.x86_64, where it works, /usr/bin/flock does this (which is odd, I suppose, but still...):

open("/home/tim/tmp/spon", O_RDONLY|O_CREAT|O_NOCTTY, 0666) = 3
flock(3, LOCK_EX) = -1 EIO (Input/output error)
access("/home/tim/tmp/spon", R_OK|W_OK) = 0
close(3) = 0
open("/home/tim/tmp/spon", O_RDWR|O_CREAT|O_NOCTTY, 0666) = 3
flock(3, LOCK_EX) = 0

So it opens the file read-only, then tries to LOCK_EX and the NFS server (entirely reasonably) says "NFS4ERR_OPENMODE, you silly person". Then and only then does /usr/bin/flock get around to checking the permissions, opening it read/write, and doing another LOCK_EX, which works.
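For anyone who wants to replay that sequence without /usr/bin/flock, here is a minimal sketch of the same open/lock/fallback pattern as seen in the strace above. It is an illustration only, not the util-linux source; the function name lock_with_rw_fallback is made up, and it assumes that EIO from flock() on a read-only descriptor is the only case worth retrying read/write.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Hypothetical helper (name invented for this sketch): mirror the syscall
 * sequence from the strace. Returns a locked fd on success, -1 on failure
 * with errno set.
 */
static int lock_with_rw_fallback(const char *path)
{
    int saved_errno;

    /* First attempt: read-only open, then an exclusive lock. */
    int fd = open(path, O_RDONLY | O_CREAT | O_NOCTTY, 0666);
    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX) == 0)
        return fd;

    saved_errno = errno;
    close(fd);

    /* Assumption: only EIO is treated as "try again read/write". */
    if (saved_errno != EIO || access(path, R_OK | W_OK) != 0) {
        errno = saved_errno;
        return -1;
    }

    /* Second attempt: read/write open, then the exclusive lock again. */
    fd = open(path, O_RDWR | O_CREAT | O_NOCTTY, 0666);
    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX) != 0) {
        saved_errno = errno;
        close(fd);
        errno = saved_errno;
        return -1;
    }
    return fd;
}
```

On a local filesystem the first flock() normally succeeds, so the fallback branch is only expected to matter on a mount that behaves like the NFSv4 one described here.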

On my 3.3.2-4.fc17.x86_64, however, /usr/bin/flock only gets as far as

open("/home/tim/tmp/spon", O_RDONLY|O_CREAT|O_NOCTTY, 0666) = 3
flock(3, LOCK_EX ....never returns

while a sniffer trace shows a continuous stream of OPEN(readonly)/LOCK(write) operations, which get a continuous stream of replies of the form "yeah, all right" and "what are you smoking?" respectively.

So I would venture to suggest that the problem lurks around the -NFS4ERR_OPENMODE case in nfs4_handle_exception() in fs/nfs/nfs4proc.c. I've got printk()s which clearly show it's going through that path, calling nfs4_schedule_stateid_recovery() and dropping down to wait_on_recovery:

Someone more familiar with the code will probably beat me to working out what the precise circumstances are in which it should be retrying and not returning the error, but I shall have a go at working that out anyway, 'cos I have a warped sense of fun...
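To make the suspected failure mode concrete, here is a toy userspace model of a retry loop that never converges. It is emphatically not kernel code: every name in it (sim_lock, sim_handle_exception, sim_flock, the SIM_* values) is invented, and it assumes, as the sniffer trace suggests but does not prove, that "recovery" re-opens the file with the same read-only mode before the LOCK is retried.

```c
#include <stdbool.h>

/* Invented status codes for the toy model; not the real NFSv4 values. */
enum sim_status { SIM_OK = 0, SIM_ERR_OPENMODE = 1 };

struct sim_exception {
    bool retry;
};

/* Toy server: refuses a write lock unless the file was opened read/write. */
static enum sim_status sim_lock(bool opened_rw)
{
    return opened_rw ? SIM_OK : SIM_ERR_OPENMODE;
}

/*
 * Toy handler: on the open-mode error, pretend state recovery ran and ask
 * the caller to retry. Assumption: recovery does not change the open mode,
 * so the next LOCK fails in exactly the same way.
 */
static void sim_handle_exception(enum sim_status st, struct sim_exception *exc)
{
    exc->retry = (st == SIM_ERR_OPENMODE);
}

/*
 * Toy caller: retry while the handler asks for it. max_attempts stands in
 * for "never returns" so the model terminates. Returns the number of LOCK
 * attempts made and reports through *locked whether the lock was obtained.
 */
static int sim_flock(bool opened_rw, int max_attempts, bool *locked)
{
    struct sim_exception exc = { false };
    int attempts = 0;

    *locked = false;
    do {
        enum sim_status st = sim_lock(opened_rw);
        attempts++;
        if (st == SIM_OK)
            *locked = true;
        sim_handle_exception(st, &exc);
    } while (exc.retry && attempts < max_attempts);

    return attempts;
}
```

With a read-only open the model burns through every allowed attempt without getting the lock, while a read/write open succeeds on the first try. Returning the error to the caller instead of setting retry would let /usr/bin/flock reach its own read/write fallback, but whether that is the right fix in the kernel is exactly the open question.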