rpc.statd causes NFS clients to hang while doing hostname lookups
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
nfs-utils (Ubuntu) | Confirmed | Undecided | Unassigned | |
Bug Description
It appears that rpc.statd from nfs-utils 1.2.5-3ubuntu3.1 will cause clients to hang if resolving a client's IP address to a hostname takes several seconds.
We've recently deployed Ubuntu 12.04 NFS servers and shortly after installation clients were hanging. At the time, the server was logging messages like:
Dec 11 20:06:19 nfs-server kernel: [18392.347264] statd: server rpc.statd not responding, timed out
Dec 11 20:06:19 nfs-server kernel: [18392.347298] lockd: cannot monitor client1
Dec 11 20:06:54 nfs-server kernel: [18427.364941] statd: server rpc.statd not responding, timed out
Dec 11 20:06:54 nfs-server kernel: [18427.364972] lockd: cannot monitor client2
Dec 11 20:07:29 nfs-server kernel: [18462.382624] statd: server rpc.statd not responding, timed out
Dec 11 20:07:29 nfs-server kernel: [18462.382654] lockd: cannot monitor client3
I straced rpc.statd and noticed that it was talking to avahi and then hanging for several seconds. Apparently mdns lookups through avahi have a timeout of 5 seconds. It turns out that a few client systems didn't have working reverse DNS, which caused mdns4 to kick in to attempt to resolve them. The host without reverse DNS was on a different subnet, so it's not surprising that avahi couldn't find it.
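For anyone who wants to check whether their own clients are affected, here is a rough sketch that times reverse lookups through the system resolver (and therefore through whatever nsswitch.conf is configured to use). It assumes Python 3 is available on the server. The function names, the 1-second threshold, and the example addresses are my own choices for illustration, not anything taken from nfs-utils.

```python
import socket
import time


def time_reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return (elapsed_seconds, hostname or None) for a reverse lookup of ip.

    The resolver argument is hypothetical plumbing so the function can be
    exercised without a network; by default it uses the system resolver.
    """
    start = time.monotonic()
    try:
        hostname = resolver(ip)[0]
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        hostname = None
    return time.monotonic() - start, hostname


def slow_clients(ips, threshold=1.0, resolver=socket.gethostbyaddr):
    """Return (ip, elapsed, hostname) for lookups taking >= threshold seconds."""
    slow = []
    for ip in ips:
        elapsed, hostname = time_reverse_lookup(ip, resolver)
        if elapsed >= threshold:
            slow.append((ip, elapsed, hostname))
    return slow


if __name__ == "__main__":
    # Placeholder addresses; substitute the IPs of your NFS clients.
    for ip, elapsed, hostname in slow_clients(["192.0.2.10", "192.0.2.11"]):
        print(f"{ip}: {elapsed:.2f}s -> {hostname}")
```

Any address that shows up with a delay of several seconds and no hostname is a likely candidate for the kind of stall described above.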
At the time our nsswitch.conf had:
hosts: files mdns4_minimal [NOTFOUND=return] dns mdns4
After disabling mdns4 and restarting statd the problem did not reoccur. mdns4_minimal does not exhibit the problem as it filters out most IP addresses:
hosts: files mdns4_minimal [NOTFOUND=return] dns
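To check a batch of servers for the same configuration, something like the following sketch could flag a hosts line that still lists a full mdns module. It is only a simple parser under my own assumptions: the function names are made up, and it treats `mdns`, `mdns4` and `mdns6` as the modules to flag while leaving the `_minimal` variants alone.

```python
import re

# Full (non-minimal) nss-mdns modules that this sketch flags.
FULL_MDNS = {"mdns", "mdns4", "mdns6"}


def hosts_sources(nsswitch_text):
    """Return the lookup sources on the hosts line, without [action] items."""
    for line in nsswitch_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if line.startswith("hosts:"):
            rest = line[len("hosts:"):]
            rest = re.sub(r"\[[^\]]*\]", " ", rest)  # drop [NOTFOUND=return] etc.
            return rest.split()
    return []


def uses_full_mdns(nsswitch_text):
    """True if the hosts line lists a full mdns module."""
    return any(src in FULL_MDNS for src in hosts_sources(nsswitch_text))


if __name__ == "__main__":
    with open("/etc/nsswitch.conf") as f:
        print("full mdns in hosts line:", uses_full_mdns(f.read()))
```

With the two hosts lines quoted in this report, the first is flagged and the second is not.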
I suspect that rpc.statd will behave the same way if other directory services (DNS, LDAP, whatever) are slow to return results for the hosts table.
Status changed to 'Confirmed' because the bug affects multiple users.