Autofs gives spurious "No such file or directory" with lots of NFS mounts, breaking a typical use pattern

Bug #65499 reported by Asheesh Laroia
This bug affects 1 person
Affects          | Status    | Importance | Assigned to | Milestone
autofs (Ubuntu)  | Confirmed | Undecided  | Unassigned  |

Bug Description

Binary package hint: autofs

We at acm.jhu.edu use a Solaris NFS server with one filesystem per home directory. We rely on autofs to make user home directories available on demand rather than mounting them all at once.

To ensure the problem was not our Solaris server, I created a test case between two Ubuntu 6.06 LTS workstations. Note that in an enterprise environment, a broken autofs is a serious problem.

As root on machine1:

cd /tmp/
for k in $(seq -w 0 999)
do
mkdir -p /tmp/exported/$k
touch /tmp/exported/$k/file-$k
echo "/tmp/exported/$k *(rw)" >> /etc/exports
done
/etc/init.d/nfs-kernel-server reload

As any user on machine2:

cd /net/machine1 # command will hang for a while
cd tmp
cd exported
ls # all dirs appear

Now, even though you are in /net/machine1/tmp/exported/, running "mount" shows that only some of the directories (100 or so) are actually mounted there. If you "cd" into one of the ones that isn't mounted (e.g., 751 on my system):

paulproteus@machine2:/net/machine1/tmp/exported$ cd 751
-bash: cd: 751: No such file or directory

Now the directory no longer shows up in "ls" either:

paulproteus@machine2:/net/machine1/tmp/exported$ ls | grep 751
paulproteus@machine2:/net/machine1/tmp/exported$

But manually mounting it works:

paulproteus@machine2:/net/machine1/tmp/exported$ sudo mount machine1:/tmp/exported/751 /mnt
paulproteus@machine2:/net/machine1/tmp/exported$ ls /mnt/
file-751
paulproteus@machine2:/net/machine1/tmp/exported$
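
The comparison above, between the directories that "ls" lists and the ones that "mount" reports as mounted, can be scripted. Below is a minimal POSIX shell sketch, assuming the Linux /proc/mounts format (the second field of each line is the mount point) and mount point paths without spaces; list_unmounted is a hypothetical helper, not part of autofs. Because it touches every entry under the given path, running it may itself trigger autofs lookups.

```shell
# Hypothetical helper (not part of autofs): print each subdirectory of BASE
# that does not appear as a mount point in MOUNTS (default: /proc/mounts).
# Usage: list_unmounted BASE [MOUNTS]
# Assumes paths contain no spaces (/proc/mounts escapes them as \040).
list_unmounted() {
    local base mounts dir
    base="${1%/}"
    mounts="${2:-/proc/mounts}"
    for dir in "$base"/*/; do
        dir="${dir%/}"
        # If the glob matched nothing it stays unexpanded; skip it.
        [ -d "$dir" ] || continue
        # awk exits 0 if some line's second field equals $dir, 1 otherwise.
        if ! awk -v mp="$dir" '$2 == mp { found = 1 } END { exit !found }' "$mounts"; then
            printf '%s\n' "$dir"
        fi
    done
}
```

For example, "list_unmounted /net/machine1/tmp/exported | wc -l" would count the listed exports that have no entry in the mount table.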

NOTE: In this bug, I created 1000 exports, but the problem appears with as few as 150 exports for me.

In summary, with lots of NFS exports:
* autofs mounts all of the NFS directories up front, which may fail when it has to do a huge number of NFS mounts at once
* Some directories fail to mount, and "cd"ing into them gives an incorrect "No such file or directory" error
* This is easily reproducible
* This is a huge problem for enterprise NFS setups that depend on autofs

The problem may not be in autofs user-space code but could be in autofs kernel code, I don't know. If there's anything more I can do to provide information about this problem, ask!

Siegfried (siegzeit) wrote :

I have the same problem. It makes some work impossible. I even tried other distros, such as Gentoo, and it still happens.

Russell Phillips (ignissport) wrote :

I have had a similar problem, except that the directories disappear and then reappear a few minutes later. They also come back after a reboot.

I found this in my syslog, which may be a clue to the problem.

Jan 12 19:45:59 zoidberg automount[6350]: >> /sbin/showmount: can't get address for 192.168.1.2/home
Jan 12 19:45:59 zoidberg automount[6350]: lookup(program): lookup for 192.168.1.2/home failed
Jan 12 19:45:59 zoidberg automount[6350]: failed to mount /net/192.168.1.2/home
Jan 12 19:53:29 zoidberg automount[6551]: >> /sbin/showmount: can't get address for 192.168.1.2/home
Jan 12 19:53:29 zoidberg automount[6551]: lookup(program): lookup for 192.168.1.2/home failed
Jan 12 19:53:29 zoidberg automount[6551]: failed to mount /net/192.168.1.2/home
Jan 12 19:53:29 zoidberg automount[6543]: >> mount.nfs: mounting 192.168.1.2:/home failed, reason given by server:
Jan 12 19:53:29 zoidberg automount[6543]: >> No such file or directory
Jan 12 19:53:29 zoidberg automount[6543]: mount(nfs): nfs: mount failure 192.168.1.2:/home on /net/192.168.1.2/home
Jan 12 19:53:29 zoidberg automount[6560]: >> /sbin/showmount: can't get address for 192.168.1.2/home/russell
Jan 12 19:53:29 zoidberg automount[6560]: lookup(program): lookup for 192.168.1.2/home/russell failed
Jan 12 19:53:29 zoidberg automount[6560]: failed to mount /net/192.168.1.2/home/russell
Jan 12 19:53:29 zoidberg automount[6563]: >> /sbin/showmount: can't get address for 192.168.1.2/home/pub
Jan 12 19:53:29 zoidberg automount[6563]: lookup(program): lookup for 192.168.1.2/home/pub failed
Jan 12 19:53:29 zoidberg automount[6563]: failed to mount /net/192.168.1.2/home/pub

My server is running 7.10, the client is running 8.04 Development Branch. I have several other 7.10 machines, but the problem appears most frequently on the 8.04 machine.
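
The "lookup(program)" and "/sbin/showmount" lines in these logs suggest that /net is served by a program map, such as the auto.net script shipped with autofs. autofs runs a program map with the lookup key as its only argument and reads the map entry from its standard output. For /net the key is meant to be a host name, so the program runs "showmount -e" on it and returns one multi-mount entry covering every export of that host, which fits the observation that all of a host's exports are mounted at once. In the logs above the key is instead a path such as 192.168.1.2/home, which showmount cannot resolve as a host. The sketch below is a simplified, hypothetical version of the entry-building step (the real auto.net also sorts the exports and formats its output differently); build_multimount is an illustrative name, and the mount options are whatever the caller passes.

```shell
# Hypothetical, simplified sketch of what a /net program map does: turn
# "showmount -e HOST" output (read on stdin) into a single multi-mount entry
# of the form "-OPTS /path HOST:/path /path2 HOST:/path2 ...".
# Usage: showmount -e HOST | build_multimount HOST OPTS
build_multimount() {
    local host opts entry header path rest
    host="$1"
    opts="$2"
    entry="-$opts"
    # First line is the "Export list for HOST:" header; fail if it is missing.
    read -r header || return 1
    # Each remaining line is "EXPORT_PATH ALLOWED_CLIENTS"; keep only the path.
    while read -r path rest || [ -n "$path" ]; do
        [ -n "$path" ] || continue
        entry="$entry $path $host:$path"
    done
    printf '%s\n' "$entry"
}
```

Fed the showmount output for hydra shown further down in this report, this would produce a single line mounting /export, /export/users and /usr/share/scratch together.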

DaveAbrahams (boostpro) wrote :

Something that looks very much like this problem occurs with only three exports for me.

I have a machine called hydra with the following showmount output:

$ showmount -e hydra
Export list for hydra:
/export gss/krb5
/export/users gss/krb5
/usr/share/scratch 192.168.188.0/255.255.255.0

But it doesn't take too many pokes into /export/users for that directory tree to disappear entirely:

$ ls /net/hydra
export usr
$ ls /net/hydra/export
users
$ ls /net/hydra/export/users
ls: /net/hydra/export/users: No such file or directory
$ ls /net/hydra
export usr
$ ls /net/hydra/export/users/dave
ls: /net/hydra/export/users/dave: No such file or directory
$ ls /net/hydra/export/users/
ls: /net/hydra/export/users/: No such file or directory
$ ls /net/hydra
usr

I have no problem accessing the tree under /net/hydra/usr.
Both are Gutsy (7.10) machines.

DaveAbrahams (boostpro) wrote :

I should add my syslog:

Mar 16 12:22:50 gutsy automount[15744]: >> /sbin/showmount: can't get address for hydra/exports
Mar 16 12:22:50 gutsy automount[15744]: lookup(program): lookup for hydra/exports failed
Mar 16 12:22:50 gutsy automount[15744]: failed to mount /net/hydra/exports
Mar 16 12:22:50 gutsy automount[15750]: >> /sbin/showmount: can't get address for hydra/exports
Mar 16 12:22:50 gutsy automount[15750]: lookup(program): lookup for hydra/exports failed
Mar 16 12:22:50 gutsy automount[15750]: failed to mount /net/hydra/exports
Mar 16 12:22:50 gutsy automount[15756]: >> /sbin/showmount: can't get address for hydra/exports
Mar 16 12:22:50 gutsy automount[15756]: lookup(program): lookup for hydra/exports failed
Mar 16 12:22:50 gutsy automount[15756]: failed to mount /net/hydra/exports
Mar 16 12:25:35 gutsy automount[15777]: >> /sbin/showmount: can't get address for hydra/.hidden
Mar 16 12:25:35 gutsy automount[15777]: lookup(program): lookup for hydra/.hidden failed
Mar 16 12:25:35 gutsy automount[15777]: failed to mount /net/hydra/.hidden
Mar 16 12:25:35 gutsy automount[15783]: >> /sbin/showmount: can't get address for .hidden
Mar 16 12:25:35 gutsy automount[15783]: lookup(program): lookup for .hidden failed
Mar 16 12:25:35 gutsy automount[15783]: failed to mount /net/.hidden
Mar 16 12:27:35 gutsy automount[15800]: >> /sbin/showmount: can't get address for hydra/export/users
Mar 16 12:27:35 gutsy automount[15800]: lookup(program): lookup for hydra/export/users failed
Mar 16 12:27:35 gutsy automount[15800]: failed to mount /net/hydra/export/users
Mar 16 12:27:46 gutsy automount[15807]: >> /sbin/showmount: can't get address for hydra/.hidden
Mar 16 12:27:46 gutsy automount[15807]: lookup(program): lookup for hydra/.hidden failed
Mar 16 12:27:46 gutsy automount[15807]: failed to mount /net/hydra/.hidden
Mar 16 12:27:46 gutsy automount[15816]: >> /sbin/showmount: can't get address for hydra/.hidden
Mar 16 12:27:46 gutsy automount[15816]: lookup(program): lookup for hydra/.hidden failed
Mar 16 12:27:46 gutsy automount[15816]: failed to mount /net/hydra/.hidden
Mar 16 12:27:57 gutsy automount[15823]: >> /sbin/showmount: can't get address for hydra/export
Mar 16 12:27:57 gutsy automount[15823]: lookup(program): lookup for hydra/export failed
Mar 16 12:27:57 gutsy automount[15823]: failed to mount /net/hydra/export
Mar 16 12:27:57 gutsy automount[15829]: >> /sbin/showmount: can't get address for hydra/export
Mar 16 12:27:57 gutsy automount[15829]: lookup(program): lookup for hydra/export failed
Mar 16 12:27:57 gutsy automount[15829]: failed to mount /net/hydra/export
Mar 16 12:28:05 gutsy automount[15836]: >> /sbin/showmount: can't get address for hydra/export
Mar 16 12:28:05 gutsy automount[15836]: lookup(program): lookup for hydra/export failed
Mar 16 12:28:05 gutsy automount[15836]: failed to mount /net/hydra/export
Mar 16 12:28:05 gutsy automount[15842]: >> /sbin/showmount: can't get address for hydra/export
Mar 16 12:28:05 gutsy automount[15842]: lookup(program): lookup for hydra/export failed
Mar 16 12:28:05 gutsy automount[15842]: failed to mount /net/hydra/export
Mar 16 12:28:35 gutsy automount[15853]: >> /sbin/showmount: can't get ...

Rich (rincebrain) wrote :

IIRC, we ran into this bug at my location: we couldn't mount all 100+ NFS mounts in parallel, and I think we concluded that NFS ran out of ports. That's actually the reason we started using autofs.

Subscribing, as I'm curious to see if it'll get fixed.

(I'm pretty certain the reports from 3 or 4 mounts are a separate bug)

Rich (rincebrain) wrote :

Oh, hah.

I completely failed to notice this bug was from you, Asheesh.

Good thing I subscribed anyway. :)

Alexander Perlis (alexanderperlis) wrote :

We've also seen this bug (looping through automounted user home directories leads to errors after about the first 100 mounts). Surprisingly, this seems to be related to automount defaulting to using tcp instead of udp for its nfs mounts (nfs documentation claims udp is the default, but with autofs it seems to be the other way around, even though this doesn't seem to be documented anywhere).

It might have to do with too many sockets sitting in the TIME_WAIT state. I examined "netstat 2>/dev/null | grep TIME_WAIT | wc -l" after each loop iteration, and after around 100 mounts the number of TIME_WAITs had reached about 500 and we started getting "No such file or directory" errors. Adding multi-second sleep delays into the loop kept the TIME_WAITs count below about 150 and all mounts were successful.
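
The measurement described above can be reproduced with a loop like the following. This is a minimal POSIX shell sketch, assuming netstat from net-tools is installed; it uses "netstat -tan" (all TCP sockets, numeric, with the state in the sixth column) rather than the plain "netstat" in the comment. walk_automounts is a hypothetical name; the directories to visit and the delay between them are passed as arguments, so a delay of a few seconds corresponds to the sleep workaround.

```shell
# Hypothetical helper: list each directory in turn (for an automounted
# directory this triggers the mount), report whether the listing worked,
# print how many TCP sockets are in TIME_WAIT, then sleep DELAY seconds.
# Usage: walk_automounts DELAY DIR...
walk_automounts() {
    local delay dir status tw
    delay="$1"
    shift
    for dir in "$@"; do
        if ls "$dir" > /dev/null 2>&1; then
            status=ok
        else
            status=FAIL
        fi
        # Field 6 of "netstat -tan" output is the TCP state.
        # If netstat is missing, awk sees no input and prints 0.
        tw=$(netstat -tan 2>/dev/null | awk '$6 == "TIME_WAIT" { n++ } END { print n + 0 }')
        printf '%s %s TIME_WAIT=%s\n' "$status" "$dir" "$tw"
        sleep "$delay"
    done
}
```

For example, "walk_automounts 3 /home/alice /home/bob" (hypothetical user directories) visits each home directory three seconds apart; running it again with a delay of 0 shows whether the failures return without the pauses.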

An alternative is to use UDP: simply add the "udp" option to the autofs mount lines (e.g., in /etc/auto.master) and restart autofs. With that change, no sleep delays were needed to avoid the errors.

So that's two different work-arounds. But what about fixing the bugs? There are two bugs here:

(1) Why does autofs default to using tcp instead of udp for nfs? Is this the way it should be? If so, it should be documented.

(2) When using tcp, every automount seems to create 8 to 15 sockets that sit in TIME_WAIT for 4 minutes. With lots of automounts, this can quickly exhaust the system. Is this an automount bug, or an inherent nfs bug when using nfs over tcp? Does nfs inherently create lots of short-lived tcp connections?

Alexander

Alexander Perlis (alexanderperlis) wrote :

Amending my prior comment: I stand by the two suggested work-arounds, but retract my theory that the problem has to do with how many TCP sockets are in TIME_WAIT.

If you google "autofs TIME_WAIT" and "autofs nfs bindresvport address already in use" and "autofs can't read superblock" (the latter two being two of the errors that show up in the syslog), you'll find various discussions, going back a few years, on this problem of errors when doing lots of automounts in rapid succession. But as far as I can tell, no discussion ended in a definitive explanation and bug fix.

Gaetan Nadon (memsize) wrote :

For the record, I have not tried to reproduce this problem on my home computer.
Thanks for reporting this bug and any supporting documentation. Since this bug has enough information provided for a developer to begin work, I'm going to mark it as confirmed and let them handle it from here. Thanks for taking the time to make Ubuntu better!
BugSquad

Changed in autofs:
status: New → Confirmed
darylb (darylblanc) wrote :

I just hit this autofs bug (connecting to a Solaris NFS server). Restarting autofs didn't help, but I found that stopping and starting portmap let me get into the mounts I previously couldn't access, without rebooting the whole system, so I thought I'd note it here.

I'm running Ubuntu 10.04 LTS (Lucid); here are the upstart commands to stop and start portmap on Lucid:

sudo service portmap stop
sudo service portmap start
