If the NFS server reboots, the client sometimes fails to remount the file system properly
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
nfs-utils (Ubuntu) | New | Undecided | Unassigned |
Bug Description
If the NFS server reboots, client machines sometimes fail to re-establish the NFS mount properly. This is with NFS v4.2. When a mount is in this state, running df on the client hangs, and attempting to access a file on the mount likewise hangs indefinitely. The problem occurs with the stock 5.3.0.x kernel and still occurs with 5.6. I have not tried 5.7 yet. I filed this against nfs-kernel-server, but I am not 100% sure the fault is not in nfs-common; some sort of communication between the two does not seem to be happening.
ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: nfs-kernel-server 1:1.3.4-2.5ubuntu3
Uname: Linux 5.6.0 x86_64
ApportVersion: 2.20.11-0ubuntu27.2
Architecture: amd64
CasperMD5CheckR
CurrentDesktop: MATE
Date: Tue Jun 2 21:34:31 2020
InstallationDate: Installed on 2017-05-27 (1102 days ago)
InstallationMedia: Ubuntu-MATE 17.04 "Zesty Zapus" - Release amd64 (20170412)
SourcePackage: nfs-utils
UpgradeStatus: Upgraded to focal on 2020-04-26 (38 days ago)
With NFSv4, user and group information is sent on the wire as names, instead of uids. And these names are qualified with a domain. So for example an exported directory containing files for the "ubuntu" user will have the ownership sent to NFSv4 clients as "ubuntu@DOMAIN", where "DOMAIN" is the DNS domain of the server.
What I have seen in some testing is that after a reboot, for some reason (probably service ordering), the NFS server bits do not yet know the domain of the machine, so the domain becomes "localdomain" and the user is sent to the client as "ubuntu@localdomain". The client, which is on the server's actual DNS domain, decides that "localdomain" is none of its business and declares that user as "nobody".
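To make the mapping described above concrete, here is a deliberately simplified model of the client-side decision. It is only a sketch: the function name `map_owner` and its parameters are made up for illustration, and the real idmapping code does considerably more than this.

```python
def map_owner(wire_name, client_domain, local_users, nobody="nobody"):
    """Simplified model: map a 'user@domain' string from the wire to a local user.

    Hypothetical helper for illustration only; not the actual idmapper logic.
    """
    user, sep, domain = wire_name.rpartition("@")
    # A name without a domain, or with a foreign domain, is not trusted.
    if not sep or domain.lower() != client_domain.lower():
        return nobody
    # Same domain: accept the name only if the user exists locally.
    return user if user in local_users else nobody
```

With a client domain of "example.com" (a placeholder), "ubuntu@example.com" would map to "ubuntu", while "ubuntu@localdomain" would fall through to "nobody", which matches the symptom described above.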
You said there was a "hang", but maybe the issue above is related somehow? One way to check is in the client logs, where "nfsidmap" is called. If you configure /etc/request-key.d/id_resolver.conf to call nfsidmap with an extra "-v -v -v" on the command line, it will log more verbosely to /var/log/syslog and say which user it is trying to resolve.
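Once the extra verbosity is enabled, something like the following could help pick the relevant lines out of the client's syslog. It is a rough sketch that only does substring matching, because the exact wording of the nfsidmap messages is not assumed here. The function name and the sample lines are invented for illustration.

```python
def suspicious_idmap_lines(log_lines, bad_domain="localdomain"):
    """Return log lines that mention nfsidmap and a name in the unwanted domain."""
    needle = "@" + bad_domain
    return [
        line.rstrip("\n")
        for line in log_lines
        if "nfsidmap" in line and needle in line
    ]


# Made-up sample lines; real nfsidmap messages will be worded differently.
sample_log = [
    "Jun  2 21:30:01 client nfsidmap[1234]: example message about ubuntu@localdomain\n",
    "Jun  2 21:30:02 client kernel: unrelated line\n",
    "Jun  2 21:30:03 client nfsidmap[1235]: example message about ubuntu@example.com\n",
]
hits = suspicious_idmap_lines(sample_log)
```

On a real client you would pass the open log file instead of the sample list, for example `suspicious_idmap_lines(open("/var/log/syslog", errors="replace"))`.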
If you see the "localdomain" issue, then try hardcoding the domain (the Domain setting in the [General] section of /etc/idmapd.conf) on both the server and the client, to your actual DNS domain, and see if that helps.
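To check what is currently set on each machine, a small script along these lines could read the Domain value from the [General] section of idmapd.conf. This is a sketch that assumes the file is INI-like enough for Python's configparser. The helper name `idmapd_domain` and the sample contents are illustrative, and "example.com" is a placeholder for your real DNS domain.

```python
import configparser


def idmapd_domain(conf_text):
    """Return the Domain set in the [General] section, or None if it is unset."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    for section in parser.sections():
        # Match the section name case-insensitively; option names are
        # already lowercased by configparser.
        if section.lower() == "general":
            return parser[section].get("domain")
    return None


# Illustrative contents: Domain is commented out, so nothing is hardcoded.
unset_conf = """\
[General]
Verbosity = 0
# Domain = localdomain

[Mapping]
Nobody-User = nobody
"""

# Illustrative contents with the domain hardcoded.
fixed_conf = """\
[General]
Verbosity = 0
Domain = example.com
"""
```

Running `idmapd_domain(open("/etc/idmapd.conf").read())` on the server and on the client should give the same value once both have been edited; `None` means the setting is still left to be derived automatically.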
Hope this helps