NFS4 automount using replicated servers doesn't work

Bug #607039 reported by Stephane Miller
This bug affects 2 people
Affects                     Status        Importance  Assigned to  Milestone
autofs5 (Ubuntu)            Fix Released  Medium      Unassigned
module-init-tools (Ubuntu)  Invalid       Undecided   Unassigned
nfs-utils (Ubuntu)          New           Medium      Unassigned

Each of the three tasks above was nominated for Lucid, Maverick, Natty, and Oneiric by Serge Hallyn.

Bug Description

module-init-tools in Debian has 'alias nfs4 nfs' in /etc/modprobe.d/aliases.conf. Ubuntu does not have that file or that line. We need it because nfs4.ko no longer exists, but 'mount -t nfs4' tries to load it.

Binary package hint: autofs5

[ updated description: mount -t nfs4 server:/mnt /mnt fails through oneiric with -ENODEV from mount.nfs4 ]

In lucid, using autofs5 version 5.0.4-3.1ubuntu5, we are having trouble using failover with NFS. Autofs allows you to specify multiple hosts with weighting in its configuration files and will fall back to a secondary host if the first host is not available at mount time. However, currently on our systems, the mount fails completely. This functionality does work with our RHEL 5 hosts (which use autofs 5.0.1). Automounting from a single host does work, but when we supply multiple hosts, an strace shows that it's trying to connect to the IP address '0.0.0.0', port 0, rather than any of the hosts we specify.

I have attached the full strace output, our /etc/auto.master file, and the relevant file from /etc/autofs.d including the multiple hosts.

Revision history for this message
Stephane Miller (stephaneeee) wrote :
Mathias Gug (mathiaz)
Changed in autofs5 (Ubuntu):
importance: Undecided → Medium
Revision history for this message
Dern (nico-sonycom) wrote :

I'm having the same issue using autofs 5.0.4-3.1ubuntu5.1 on lucid.

With NFS3, specifying replicated servers works; with NFS4 it doesn't. If I run automount manually with verbose and debugging enabled, I see the following output when trying to access /usr/local:

#################################################
handle_packet: type = 5
handle_packet_missing_direct: token 296, name /usr/local, request pid
2887
attempting to mount entry /usr/local
lookup_mount: lookup(file): looking up /usr/local
lookup_mount: lookup(file): /usr/local ->
-fstype=nfs4,ro,nodev,nosuid,nonstrict,nodev,sync,_netdev,proto=tcp,retry=10,rsize=8192,wsize=8192,soft server1:/local/ubuntu64 server2:/local/ubuntu64
parse_mount: parse(sun): expanded entry:
-fstype=nfs4,ro,nodev,nosuid,nonstrict,nodev,sync,_netdev,proto=tcp,retry=10,rsize=8192,wsize=8192,soft server1:/local/ubuntu64 server2:/local/ubuntu64
parse_mount: parse(sun): gathered options:
fstype=nfs4,ro,nodev,nosuid,nonstrict,nodev,sync,_netdev,proto=tcp,retry=10,rsize=8192,wsize=8192,soft
parse_mount: parse(sun): dequote("server1:/local/ubuntu64") ->
server1:/local/ubuntu64
parse_mount: parse(sun): dequote("server2:/local/ubuntu64") ->
server2:/local/ubuntu64
parse_mount: parse(sun): core of entry:
options=fstype=nfs4,ro,nodev,nosuid,nonstrict,nodev,sync,_netdev,proto=tcp,retry=10,rsize=8192,wsize=8192,soft, loc=server1:/local/ubuntu64 server2:/local/ubuntu64
sun_mount: parse(sun): mounting root /usr/local, mountpoint /usr/local,
what server1:/local/ubuntu64 server2:/local/ubuntu64, fstype nfs4,
options
ro,nodev,nosuid,nodev,sync,_netdev,proto=tcp,retry=10,rsize=8192,wsize=8192,soft
mount_mount: mount(nfs): root=/usr/local name=/usr/local
what=server1:/local/ubuntu64 server2:/local/ubuntu64, fstype=nfs4,
options=ro,nodev,nosuid,nodev,sync,_netdev,proto=tcp,retry=10,rsize=8192,wsize=8192,soft
mount_mount: mount(nfs): nfs
options="ro,nodev,nosuid,nodev,sync,_netdev,proto=tcp,retry=10,rsize=8192,wsize=8192,soft", nosymlink=0, ro=1
get_nfs_info: called for host server2 proto tcp version 0x40
get_nfs_info: called for host server1 proto tcp version 0x40
mount(nfs): no hosts available
dev_ioctl_send_fail: token = 296
failed to mount /usr/local
####################################################

'get_nfs_info' seems to fail.

I can also confirm it does work properly on CentOS 5.5 running autofs 5.0.1.

Revision history for this message
Dern (nico-sonycom) wrote :

I just verified the problem still exists on Maverick running autofs5 5.0.5-0ubuntu2.

Same scenario: if I list only one server the automount works fine; if I add a second server the mount fails completely without leaving any logs.

Revision history for this message
Dern (nico-sonycom) wrote :

The problem does not appear on Fedora using autofs 5.0.5 so it looks like this is an Ubuntu specific bug.

Please note that the description for this bug report is misleading, as autofs doesn't do failover (as far as I know). However, it is supposed to be able to automatically choose an available server from a list of replicated servers at mount time.

Nico

Revision history for this message
Dern (nico-sonycom) wrote : Re: NFS automount using replicated servers doesn't work

I changed the title of this bug report from

   NFS automount failover doesn't work
to
   NFS automount using replicated servers doesn't work

as I believe it better reflects the problem, and mentioning 'failover' may put off the maintainers, since autofs isn't supposed to do real failover (yet; correct me if I'm wrong).

Anyway, just trying to increase the chances of somebody actually noticing this bug report and doing something about it :-)

Nico

summary: - NFS automount failover doesn't work
+ NFS automount using replicated servers doesn't work
Revision history for this message
Dave Walker (davewalker) wrote :

This needs to be reproduced, and we need to determine whether it is resolved in the current development version.

Thanks.

Changed in autofs5 (Ubuntu):
assignee: nobody → Serge Hallyn (serge-hallyn)
Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

Cannot reproduce this on precise, at least (which has the same autofs version as oneiric).

I used 3 hosts. Two had nfs-server installed, one had autofs4 and nfs-client. The two nfs servers had a file /mnt/whoami ('master1' on one and 'master2' on the other) and in /etc/exports had

     /mnt 10.55.60.39(ro)

I first verified that I could mount this on the client by hand. Then, on the client, I created a /srv/mnt directory, /etc/auto.master with:

/- /etc/auto.mnt --timeout=600

and /etc/auto.mnt with:

/srv/mnt -fstype=nfs4,proto=tcp,soft,intr,rsize=8192,wsize=8192 10.55.60.33(1),10.55.60.59(5):/mnt

restarted autofs. Then 'cat /srv/mnt/whoami' on the client showed 'master1'. Unmounted /srv/mnt on the client, and did 'service nfs-kernel-server stop' on master1. Then 'cat /srv/mnt/whoami' on the client showed 'master2'.

Will try on lucid next.
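
For readers unfamiliar with the replicated-server syntax used in /etc/auto.mnt above, here is a minimal Python sketch of how such a location string breaks down into hosts, weights, and a path. It is not autofs code. The function names are made up for illustration, the default weight of 0 is an assumption, and so is the ordering rule (lower weight preferred, which matches master1 with weight 1 being picked first in the test above). Real autofs selection also takes other factors into account, and IPv6 host addresses are not handled here.

```python
import re
from typing import List, Tuple


def parse_replicated_location(location: str) -> List[Tuple[str, int, str]]:
    """Split a replicated location into (host, weight, path) tuples.

    Handles the two forms that appear in this thread:
      "host1(1),host2(5):/path"   (several hosts sharing one path)
      "host1:/path host2:/path"   (whitespace-separated host:path pairs)
    """
    entries = []
    for chunk in location.split():
        hosts, sep, path = chunk.partition(":")
        if not sep or not path:
            raise ValueError(f"missing ':' or path in {chunk!r}")
        for host in hosts.split(","):
            match = re.fullmatch(r"([^()]+)(?:\((\d+)\))?", host)
            if match is None:
                raise ValueError(f"cannot parse host {host!r}")
            # Assumption: a host without an explicit weight gets weight 0.
            weight = int(match.group(2) or 0)
            entries.append((match.group(1), weight, path))
    return entries


def by_preference(entries: List[Tuple[str, int, str]]) -> List[Tuple[str, int, str]]:
    # Assumption: lower weight means more preferred. sorted() is stable,
    # so hosts with equal weight keep their order from the map entry.
    return sorted(entries, key=lambda entry: entry[1])
```

Calling parse_replicated_location("10.55.60.33(1),10.55.60.59(5):/mnt") yields one tuple per server, both with the path /mnt.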

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

Reproduced on lucid (with -updates as well as with -proposed)

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

On natty, I don't get the 'connect(0.0.0.0)'. However, the mounts still fail, and this time it is a bug in nfs. Even on an oneiric host, if I do

   mount -t nfs4 server:/mnt /mnt

I get back ENODEV from mount.nfs4.

On natty, if I switch /etc/auto.mnt to use nfs instead of nfs4, it succeeds. I need to retry whether that works on lucid, though I doubt it will.

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

I was wrong! In fact using nfs in place of nfs4 does work on lucid.

So I believe this can all be blamed on nfs4 failure on lucid through oneiric.

summary: - NFS automount using replicated servers doesn't work
+ NFS4 automount using replicated servers doesn't work
Changed in autofs5 (Ubuntu):
assignee: Serge Hallyn (serge-hallyn) → nobody
status: New → Fix Released
Changed in nfs-utils (Ubuntu):
importance: Undecided → Medium
description: updated
Revision history for this message
Stefan Bader (smb) wrote :

Doh! I think I found out what is going on... The problem can be, as has been said before, reduced to attempting an NFS mount with:

mount -tnfs4 <server>:<path> <mntpt>

which will fail with a completely useless message of no such device. It was also said that -tnfs does work. Now the interesting thing is that *after* one successful run with -tnfs, the -tnfs4 does *also* work. I must admit it took me a while to think of module loading. But that was the solution in my tests. Apparently a mount -tnfs automatically loads the nfs module, which does not work for -tnfs4 (I guess because there is some glue missing to tell the system that the nfs module is the correct one for both).

Could those affected try a quick work-around, please? Create a new file named /etc/modprobe.d/nfs4.conf with the following content:

alias nfs4 nfs

In theory this should make sure the module gets autoloaded and the nfs4 autofs setup should be working...
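
To see whether the module is already loaded before and after trying the work-around, one option is to check whether nfs4 is listed in /proc/filesystems. The following is a minimal Python sketch under that assumption. The helper names are hypothetical, and the text is passed in as a string so the check can also be run against saved output.

```python
def registered_filesystems(proc_text: str) -> set:
    """Return the filesystem names found in /proc/filesystems-style text."""
    names = set()
    for line in proc_text.splitlines():
        fields = line.split()
        if fields:
            # Lines look like "nodev<TAB>nfs4" or "<TAB>ext4";
            # the filesystem name is always the last field.
            names.add(fields[-1])
    return names


def nfs4_available(proc_text: str) -> bool:
    """True if the kernel currently has the nfs4 filesystem type registered."""
    return "nfs4" in registered_filesystems(proc_text)
```

On a live system, pass in the contents of /proc/filesystems, for example nfs4_available(open("/proc/filesystems").read()). If this returns False right after a failed 'mount -tnfs4', that would be consistent with the module-loading explanation above.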

Revision history for this message
Steve Langasek (vorlon) wrote : Re: [Bug 607039] Re: NFS4 automount using replicated servers doesn't work

On Wed, Nov 23, 2011 at 01:48:24PM -0000, Stefan Bader wrote:
> Doh! I think I found out what is going on... The problem can be, as it
> has been said before, reduced to attempting a NFS mount with:

> mount -tnfs4 <server>:<path> <mntpt>

> which will fail with a completely useless message of no such device. It
> was also said that -tnfs does work. Now the interesting thing is that
> *after* one successful run with -tnfs, the -tnfs4 does *also* work.

Note that this is upstream bug #117957, which has been reported some time
ago.

--
Steve Langasek Give me a lever long enough and a Free OS
Debian Developer to set it on, and I can move the world.
Ubuntu Developer http://www.debian.org/
<email address hidden> <email address hidden>

Revision history for this message
Stefan Bader (smb) wrote :

@Steve, this actually seems to be addressed already, unfortunately in a way that is easy to get wrong. The nfs module would, for example, be autoloaded when idmapd gets started, and the comments in /etc/default/nfs-common clearly say it is needed for v4. However, one seems to be able to get a mount done without it being set to yes (probably running into other issues later). It would be good if there were a simpler way to get these things right, but I cannot think of a good one right now. Should mount.nfs4 start the required daemons? But then that may miss sensible configuration...

Revision history for this message
Steve Langasek (vorlon) wrote :

mount.nfs4 certainly shouldn't start any daemons, but I would expect it to autoload the nfs4 module the same way it previously autoloaded the nfs module.

BTW, for 12.04 we always start idmapd automatically; but if using autofs, it's still not guaranteed that idmapd will have started before autofs starts.

Revision history for this message
Stefan Bader (smb) wrote :

I am not sure whether mount.nfs4 ever did load the right module. At least it has not done so as far back as Lucid, and I have not checked further back. It may be that nfs was once built in, which avoided all these issues. The modprobe call happens in mount, which takes the fstype as the name of the module to load. So I think the way forward is either an alias definition in /etc/modprobe.d or having the module itself declare that alias. I'll try to suggest the latter, but I am not sure how acceptable that is upstream.
The alias is simpler, but the question would be which package it should go into.
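
To make the alias idea concrete: following the description above, the fstype is used as the module name unless an alias maps it to something else. The Python sketch below only illustrates that lookup and is not how modprobe is implemented. It ignores wildcard aliases and every other modprobe.d directive, and the function names are made up.

```python
def parse_aliases(conf_text: str) -> dict:
    """Collect simple 'alias <name> <module>' lines from modprobe.d-style text."""
    aliases = {}
    for raw_line in conf_text.splitlines():
        # Drop comments, then split the remainder on whitespace.
        fields = raw_line.split("#", 1)[0].split()
        if len(fields) == 3 and fields[0] == "alias":
            aliases[fields[1]] = fields[2]
    return aliases


def module_for_fstype(fstype: str, aliases: dict) -> str:
    """Module name that would be requested for a given fstype."""
    # Without an alias, the fstype itself is taken as the module name.
    return aliases.get(fstype, fstype)
```

With no alias defined, module_for_fstype("nfs4", {}) gives "nfs4", a module that no longer exists. With 'alias nfs4 nfs' parsed in, it gives "nfs".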

Revision history for this message
Stefan Bader (smb) wrote :

So, as expected, upstream was not too excited about a change to the kernel module. In fact, the answer suggested that using the fstype nfs4 is rather deprecated. I also observed that even with "mount -tnfs ..." the resulting mount options, as shown by "cat /proc/mounts", were the same as with nfs4. Unfortunately the man pages for mount (and probably others too) do not reflect this at all.

Revision history for this message
Stefan Bader (smb) wrote :

I carefully went through the various man pages again and found that despite me missing those before, there are indications:

man nfs
...
The fstype field contains "nfs". Use of the "nfs4" fstype in /etc/fstab is deprecated.
...
To mount using NFS version 4, use either the nfs file system type, with the nfsvers=4 mount option, or the nfs4 file system type.

man mount.nfs4
...
Under Linux 2.6.32 and later kernel versions, mount.nfs can mount all NFS file system versions. Under earlier Linux kernel versions, mount.nfs4 must be used for mounting NFSv4 file systems while mount.nfs must be used for NFSv3 and v2.

The man page for mount mentions nfs4 but says nothing specific about it. And the examples in nfs(5) rather show the nfs4 fstype approach. So it is a bit mixed. My feeling is that the nfs-related man pages could push people more actively to drop nfs4 as the fstype. But we should probably also add an alias definition in modprobe.d (if that is allowed by policy).
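
For anyone who wants to move existing automount map entries away from the nfs4 fstype, the nfs(5) text quoted above suggests using the nfs fstype together with nfsvers=4. Below is a minimal Python sketch of that rewrite for an option string. The function name is hypothetical, and I have not verified that autofs on every affected release accepts the result, so treat it as a starting point only.

```python
def prefer_nfs_fstype(options: str) -> str:
    """Rewrite 'fstype=nfs4' to 'fstype=nfs,nfsvers=4' in an option string.

    Strings that do not contain fstype=nfs4 are returned unchanged.
    A leading '-' (as used in automount maps) is preserved.
    """
    prefix = "-" if options.startswith("-") else ""
    parts = [part for part in options[len(prefix):].split(",") if part]
    if "fstype=nfs4" not in parts:
        return options
    # Do not add a second version option if one is already present.
    has_version = any(part.startswith(("nfsvers=", "vers=")) for part in parts)
    rewritten = []
    for part in parts:
        if part == "fstype=nfs4":
            rewritten.append("fstype=nfs")
            if not has_version:
                rewritten.append("nfsvers=4")
        else:
            rewritten.append(part)
    return prefix + ",".join(rewritten)
```

For example, "-fstype=nfs4,proto=tcp,soft" becomes "-fstype=nfs,nfsvers=4,proto=tcp,soft".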

Steve, this sounds to me like we should open a related Debian bug as well. Or would we make changes in our package and then ask for them to be picked up?

description: updated
Revision history for this message
Stefan Bader (smb) wrote :

This is actually fixed in Precise by a change to nfs-utils for bug #662711. For that reason I'll mark the bug report here as a duplicate.

Changed in module-init-tools (Ubuntu):
status: New → Invalid