nfs4 leaves megabytes of errors in syslog
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
nfs-utils (CentOS) | New | Undecided | Unassigned |
nfs-utils (Ubuntu) | Incomplete | Undecided | Unassigned |
Bug Description
Binary package hint: nfs-common
I have a small two-computer network using NFS.
The client runs Ubuntu 8.04.1; the server runs the current Debian testing
distribution.
When I share files with NFSv4, the connection sometimes hangs:
running "ls n" (where "n" is an NFS mount point) blocks forever.
It typically runs for tens of minutes under light use
before it freezes. When it freezes, the kernel keeps running,
and so does everything that does not try to access files mounted
over NFS.
I know it's not a server freeze because I can connect
to the same NFS server and the same exported file system from
another computer, and it works (at least for a while).
I've seen this both ways: Ubuntu freezes while Debian can
still access itself via NFS, and Debian freezes while
Ubuntu can still access Debian via NFS.
The equivalent configuration works reliably with NFSv3.
When it's frozen, my /var/log/syslog on the Ubuntu client
side rapidly fills up with error messages:
Aug 16 20:04:47 kitchen kernel: [10854.865221] Error: state recovery failed on NFSv4 server 192.168.3.1 with error 1
Aug 16 20:04:47 kitchen kernel: [10854.866133] Error: state recovery failed on NFSv4 server 192.168.3.1 with error 10017
Aug 16 20:04:47 kitchen kernel: [10854.866849] Error: state recovery failed on NFSv4 server 192.168.3.1 with error 10017
Aug 16 20:04:47 kitchen kernel: [10854.867614] Error: state recovery failed on NFSv4 server 192.168.3.1 with error 1
Aug 16 20:04:47 kitchen kernel: [10854.867995] NFSv4 callback: too many open TCP sockets, consider increasing the number of nfsd threads
Aug 16 20:04:47 kitchen kernel: [10854.868003] NFSv4 callback: last TCP connect from 192.168.3.2, port=42971
Aug 16 20:04:47 kitchen kernel: [10854.869477] Error: state recovery failed on NFSv4 server 192.168.3.1 with error 1
Aug 16 20:04:47 kitchen kernel: [10854.870381] Error: state recovery failed on NFSv4 server 192.168.3.1 with error 10017
Aug 16 20:04:47 kitchen kernel: [10854.871131] Error: state recovery failed on NFSv4 server 192.168.3.1 with error 10017
Aug 16 20:04:47 kitchen kernel: [10854.871874] Error: state recovery failed on NFSv4 server 192.168.3.1 with error 1
Aug 16 20:04:47 kitchen kernel: [10854.872880] Error: state recovery failed on NFSv4 server 192.168.3.1 with error 1
Aug 16 20:04:47 kitchen kernel: [10854.873780] Error: state recovery failed on NFSv4 server 192.168.3.1 with error 10017
Aug 16 20:04:47 kitchen kernel: [10854.874491] Error: state recovery failed on NFSv4 server 192.168.3.1 with error 10017
...
Aug 16 20:04:47 kitchen kernel: [10854.971314] Error: state recovery failed on NFSv4 server 192.168.3.1 with error 22
Aug 16 20:04:47 kitchen kernel: [10854.972318] Error: state recovery failed on NFSv4 server 192.168.3.1 with error 22
Aug 16 20:04:47 kitchen kernel: [10854.974021] Error: state recovery failed on NFSv4 server 192.168.3.1 with error 22
Aug 16 20:04:47 kitchen kernel: [10854.975075] Error: state recovery failed on NFSv4 server 192.168.3.1 with error 22
Aug 16 20:04:47 kitchen kernel: [10854.976335] Error: state recovery failed on NFSv4 server 192.168.3.1 with error 22
Aug 16 20:04:47 kitchen kernel: [10854.977571] Error: state recovery failed on NFSv4 server 192.168.3.1 with error 22
Aug 16 20:04:47 kitchen kernel: [10854.978437] Error: state recovery failed on NFSv4 server 192.168.3.1 with error 22
Aug 16 20:04:47 kitchen kernel: [10854.979118] Error: state recovery failed on NFSv4 server 192.168.3.1 with error 22
Aug 16 20:04:47 kitchen kernel: [10854.979788] Error: state recovery failed on NFSv4 server 192.168.3.1 with error 22
Aug 16 20:04:47 kitchen kernel: [10854.980474] Error: state recovery failed on NFSv4 server 192.168.3.1 with error 22
Aug 16 20:04:47 kitchen kernel: [10854.983996] Error: state recovery failed on NFSv4 server 192.168.3.1 with error 22
...
It's mostly error 22.
This goes on for thousands and thousands of lines, at a rate of roughly 1000 lines per second (the kernel timestamps above are about 1 ms apart)!
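To put a number on "mostly error 22", the messages can be tallied by error code. This is only a rough sketch: the regular expression is written against the message format in the excerpt above, the function names are made up for illustration, and the symbolic names are my reading of the NFSv4 status codes (the kernel may be reporting a local errno instead, so treat them as a hint).

```python
import re
from collections import Counter

# Matches the kernel message format shown in the syslog excerpt.
PATTERN = re.compile(
    r"state recovery failed on NFSv4 server (\S+) with error (\d+)"
)

# Assumption: these are NFSv4 status codes; names are for readability only.
KNOWN_CODES = {
    1: "NFS4ERR_PERM",
    22: "NFS4ERR_INVAL",
    10017: "NFS4ERR_CLID_INUSE",
}


def tally_recovery_errors(lines):
    """Count 'state recovery failed' messages per error code."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[int(match.group(2))] += 1
    return counts


def summarize(counts):
    """Return one human-readable line per error code, most frequent first."""
    return [
        f"error {code} ({KNOWN_CODES.get(code, 'unknown')}): {n}"
        for code, n in counts.most_common()
    ]


def tally_file(path="/var/log/syslog"):
    """Tally a whole log file; the default path is the one named in this report."""
    with open(path, errors="replace") as handle:
        return tally_recovery_errors(handle)


# A few lines copied from the excerpt, including one that should not match.
SAMPLE = [
    "Aug 16 20:04:47 kitchen kernel: [10854.865221] Error: state recovery failed on NFSv4 server 192.168.3.1 with error 1",
    "Aug 16 20:04:47 kitchen kernel: [10854.866133] Error: state recovery failed on NFSv4 server 192.168.3.1 with error 10017",
    "Aug 16 20:04:47 kitchen kernel: [10854.867995] NFSv4 callback: too many open TCP sockets, consider increasing the number of nfsd threads",
    "Aug 16 20:04:47 kitchen kernel: [10854.971314] Error: state recovery failed on NFSv4 server 192.168.3.1 with error 22",
    "Aug 16 20:04:47 kitchen kernel: [10854.972318] Error: state recovery failed on NFSv4 server 192.168.3.1 with error 22",
]

for entry in summarize(tally_recovery_errors(SAMPLE)):
    print(entry)
```

Calling `tally_file()` on the client would give the breakdown for the whole log rather than for the five sample lines.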
$ lsb_release -rd
Description: Ubuntu 8.04.1
Release: 8.04
$
$ apt-cache policy nfs-common
nfs-common:
Installed: 1:1.1.2-2ubuntu2.1
Candidate: 1:1.1.2-2ubuntu2.1
Version table:
*** 1:1.1.2-2ubuntu2.1 0
500 http://
100 /var/lib/
1:
500 http://
$
Relevant lines from /etc/fstab (the NFS4 lines are currently commented out,
but they were active earlier):
# desk.local:/gpk /home/gpk/n nfs4 bg,intr 0 0
desk.local:
# desk.local:
desk.local:
Here's /etc/exports on the server (again, the NFS4 lines are currently commented out,
but they were active a little while ago):
# /export/big 192.168.
/export/big 192.168.
gpk@desk:~$
ssh, web, and ping connections between the two machines work nicely.
It's a standard wired network, specified in /etc/networks/
I've changed the title because I've gotten NFS4 to work reliably by changing the configuration. Now, I do only one mount, rather than two.
So the real problem is that NFS4 does not handle a misconfiguration gracefully, or does not detect it at all, and/or that the documentation doesn't make it obvious that this configuration isn't allowed.
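Since nothing warned about the configuration, a small check along these lines could flag it before mounting. This is a sketch under a loud assumption: it treats "two active nfs4 entries for the same server" as the suspect configuration, which is how the comment above reads, not something confirmed as the cause. The function names and the second fstab entry in the sample are hypothetical.

```python
from collections import defaultdict


def nfs4_mounts_by_server(fstab_text):
    """Group active (non-commented) nfs4 fstab entries by server name."""
    by_server = defaultdict(list)
    for raw in fstab_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and commented-out entries
        fields = line.split()
        if len(fields) < 3 or fields[2] != "nfs4":
            continue  # not an nfs4 entry
        server, _, export = fields[0].partition(":")
        by_server[server].append((export, fields[1]))
    return dict(by_server)


def servers_with_multiple_mounts(fstab_text):
    """Return only the servers that have more than one active nfs4 entry."""
    return {
        server: mounts
        for server, mounts in nfs4_mounts_by_server(fstab_text).items()
        if len(mounts) > 1
    }


# The first entry is the one quoted in this report (uncommented here);
# the second entry is hypothetical, added only to exercise the check.
SAMPLE_FSTAB = """\
# /etc/fstab sample
desk.local:/gpk /home/gpk/n nfs4 bg,intr 0 0
desk.local:/other /home/gpk/m nfs4 bg,intr 0 0
# desk.local:/old /home/gpk/o nfs4 bg,intr 0 0
"""

for server, mounts in servers_with_multiple_mounts(SAMPLE_FSTAB).items():
    print(f"{server}: {len(mounts)} active nfs4 mounts: {mounts}")
```

On a real client the same function could be fed the contents of /etc/fstab; an empty result would mean each server has at most one active nfs4 entry.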