autofs - "Too many levels of symbolic links" after apt upgrade

Bug #1827286 reported by Jason D. Kelleher
This bug affects 1 person
Affects             Status         Importance   Assigned to   Milestone
autofs5 (Ubuntu)    Invalid        Undecided    Unassigned
  Xenial            Invalid        Undecided    Unassigned
  Bionic            Invalid        Undecided    Unassigned
linux (Ubuntu)      Fix Released   Undecided    Unassigned
  Xenial            Confirmed      Undecided    Unassigned
  Bionic            Fix Released   Undecided    Unassigned

Bug Description

Description: Ubuntu 16.04.6 LTS
Release: 16.04

I moved to autofs this week as a workaround for Bug #1577575, which fails to mount NFS entries at boot. This worked fine until I ran 'apt upgrade' today.

Now trying to access /vol/home mount results in the following error:

root@chi:~# ls -la /vol/home/
ls: cannot access '/vol/home/': Too many levels of symbolic links
root@chi:~#

Oddly enough, the other two autofs entries work fine (output trimmed):
root@chi:~# ls -la /vol/media
total 4
...
root@chi:~#
root@chi:~# ls -al /vol/temp/
total 24
...
root@chi:~#

I have a scratch VM I can break tonight by performing upgrades individually and testing after each. Will provide that detail when available.

Here is /var/log/apt/history.log

 Start-Date: 2019-05-01 17:37:00
Commandline: apt upgrade
Upgrade: libdns-export162:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.12, 1:9.10.3.dfsg.P4-8ubuntu1.14), libisccfg140:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.12, 1:9.10.3.dfsg.P4-8ubuntu1.14), ureadahead:amd64 (0.100.0-19, 0.100.0-19.1), libldap-2.4-2:amd64 (2.4.42+dfsg-2ubuntu3.4, 2.4.42+dfsg-2ubuntu3.5), libirs141:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.12, 1:9.10.3.dfsg.P4-8ubuntu1.14), bind9-host:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.12, 1:9.10.3.dfsg.P4-8ubuntu1.14), dnsutils:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.12, 1:9.10.3.dfsg.P4-8ubuntu1.14), libisc160:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.12, 1:9.10.3.dfsg.P4-8ubuntu1.14), passwd:amd64 (1:4.2-3.1ubuntu5.3, 1:4.2-3.1ubuntu5.4), bind9utils:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.12, 1:9.10.3.dfsg.P4-8ubuntu1.14), libisc-export160:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.12, 1:9.10.3.dfsg.P4-8ubuntu1.14), liblwres141:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.12, 1:9.10.3.dfsg.P4-8ubuntu1.14), login:amd64 (1:4.2-3.1ubuntu5.3, 1:4.2-3.1ubuntu5.4), iproute2:amd64 (4.3.0-1ubuntu3.16.04.4, 4.3.0-1ubuntu3.16.04.5), bind9:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.12, 1:9.10.3.dfsg.P4-8ubuntu1.14), libdns162:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.12, 1:9.10.3.dfsg.P4-8ubuntu1.14), unattended-upgrades:amd64 (0.90ubuntu0.10, 1.1ubuntu1.18.04.7~16.04.2), libisccc140:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.12, 1:9.10.3.dfsg.P4-8ubuntu1.14), libbind9-140:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.12, 1:9.10.3.dfsg.P4-8ubuntu1.14), uidmap:amd64 (1:4.2-3.1ubuntu5.3, 1:4.2-3.1ubuntu5.4), bind9-doc:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.12, 1:9.10.3.dfsg.P4-8ubuntu1.14), tzdata:amd64 (2018i-0ubuntu0.16.04, 2019a-0ubuntu0.16.04)
End-Date: 2019-05-01 17:37:24
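
When checking whether an upgrade touched a relevant package, the Upgrade line of /var/log/apt/history.log can be split into per-package version pairs. The following is a minimal sketch, not part of the original report: the function name parse_upgrades, the shortened sample line, and the watch-list of package name prefixes are assumptions made for illustration.

```python
import re

# One "name:arch (old, new)" entry from an apt history.log Upgrade line.
ENTRY = re.compile(r"([^\s:,()]+):([^\s()]+) \(([^,()]+), ([^()]+)\)")


def parse_upgrades(upgrade_line):
    """Map package name -> (old_version, new_version) for one Upgrade line."""
    body = upgrade_line.split(":", 1)[1]  # drop the leading "Upgrade:" label
    return {name: (old, new) for name, _arch, old, new in ENTRY.findall(body)}


# Shortened sample built from entries in the log above.
sample = (
    "Upgrade: ureadahead:amd64 (0.100.0-19, 0.100.0-19.1), "
    "iproute2:amd64 (4.3.0-1ubuntu3.16.04.4, 4.3.0-1ubuntu3.16.04.5), "
    "bind9:amd64 (1:9.10.3.dfsg.P4-8ubuntu1.12, 1:9.10.3.dfsg.P4-8ubuntu1.14)"
)
upgrades = parse_upgrades(sample)

# Hypothetical watch-list: prefixes of packages one might suspect here.
suspects = [p for p in upgrades if p.startswith(("autofs", "linux-image", "nfs-common"))]
print(sorted(upgrades), suspects)
```

Run against the full Upgrade line, the same check shows whether an autofs, kernel image, or NFS client package was part of the transaction.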

Tags: autofs
tags: added: autofs
Jason D. Kelleher (jdkelleher) wrote :

Looks like the issue is symlinking to a directory prior to autofs mounting it.

Everything fine...

root@numbersix:/# ssh chi uname -a
Linux chi 4.4.0-146-generic #172-Ubuntu SMP Wed Apr 3 09:00:08 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
root@chi:~# ls -la /home
ls: cannot access '/home': No such file or directory
root@chi:~#
root@numbersix:/# ssh chi 'ls -la /vol/*'
/vol/home:
total 8
drwxr-xr-x 1 root root 154 Apr 21 19:10 .
drwxr-xr-x 5 root root 4096 Jul 2 2018 ..
drwxr-xr-x 1 2005 2005 1070 Nov 17 2015 bethann
        [trimmed]

/vol/media:
total 4
drwxr-xr-x 1 root root 128 Apr 13 18:29 .
drwxr-xr-x 5 root root 4096 Jul 2 2018 ..
        [trimmed]

/vol/zhora-temp:
total 24
dr-xr-xr-x 6 root root 4096 Mar 3 08:21 .
drwxr-xr-x 5 root root 4096 Jul 2 2018 ..
        [trimmed]
root@numbersix:/#

Everything works fine if the link is made after...

root@chi:/# ln -s /vol/home
root@chi:/# ls -la /home
lrwxrwxrwx 1 root root 9 May 2 12:11 /home -> /vol/home
root@chi:/#
root@chi:/# ls -la /vol/*
/vol/home:
total 8
drwxr-xr-x 1 root root 154 Apr 21 19:10 .
drwxr-xr-x 5 root root 4096 Jul 2 2018 ..
        [trimmed]

/vol/media:
total 4
drwxr-xr-x 1 root root 128 Apr 13 18:29 .
drwxr-xr-x 5 root root 4096 Jul 2 2018 ..
        [trimmed]

/vol/zhora-temp:
total 24
dr-xr-xr-x 6 root root 4096 Mar 3 08:21 .
drwxr-xr-x 5 root root 4096 Jul 2 2018 ..
        [trimmed]
root@chi:/#

It breaks after the reboot...

root@chi:/# init 6
Connection to chi closed by remote host.
Connection to chi closed.
root@numbersix:~# ssh chi
Welcome to Ubuntu 16.04.6 LTS (GNU/Linux 4.4.0-146-generic x86_64)

 * Documentation: https://help.ubuntu.com
 * Management: https://landscape.canonical.com
 * Support: https://ubuntu.com/advantage

21 packages can be updated.
15 updates are security updates.

New release '18.04.2 LTS' available.
Run 'do-release-upgrade' to upgrade to it.

root@chi:~# ls -la /vol/*
ls: cannot open directory '/vol/home': Too many levels of symbolic links
/vol/media:
total 4
drwxr-xr-x 1 root root 128 Apr 13 18:29 .
drwxr-xr-x 5 root root 4096 Jul 2 2018 ..
        [trimmed]

/vol/zhora-temp:
total 24
dr-xr-xr-x 6 root root 4096 Mar 3 08:21 .
drwxr-xr-x 5 root root 4096 Jul 2 2018 ..
        [trimmed]
root@chi:~#

At this point, the link needs to be deleted and the VM rebooted to get back to a normal state - just restarting autofs won't clear it.

affects: systemd (Ubuntu) → autofs5 (Ubuntu)
Christian Ehrhardt  (paelzer) wrote :

To summarize and make sure I understand the case:
1. there is an NFS target
2. you have autofs configured to mount /vol/home from that NFS server
3. if a symlink /home -> /vol/home exists before that mount happens, it fails
Is the above correct?

Christian Ehrhardt  (paelzer) wrote :

I followed the first hits that I found [1][2] to set up a similar situation.

After mounting NFS the first time, I moved everything in my ~ to /vol/home

mount:
192.168.122.247:/mnt/sharedfolder on /vol/home type nfs4 (rw,relatime,vers=4.2,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.122.118,local_lock=none,addr=192.168.122.247)

/etc/auto.master:
/vol /etc/auto.nfs
/etc/auto.nfs:
home -fstype=nfs,rw,retry=0 192.168.122.247:/mnt/sharedfolder

# detach target
$ sudo umount /vol/home

# check autofs
1. /vol is empty
2. ll /vol/home shows the content
3. mount shows an NFS mount that seems about right

Now let's remove /home and replace it with a symlink as you did (that feels wrong btw).
  lrwxrwxrwx 1 root root 9 May 3 10:02 home -> /vol/home/

After a reboot (now that the symlink exists) I find the system in this state:
1. the mount looks ok
  192.168.122.247:/mnt/sharedfolder on /vol/home type nfs4
  (rw,relatime,vers=4.2,rsize=65536,wsize=65536,namlen=255,hard,
  proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.122.118,
  local_lock=none,addr=192.168.122.247)
2. the symlink still is fine
  $ ll /home
  lrwxrwxrwx 1 root root 9 May 3 10:02 /home -> /vol/home/
3. my home dir looks fine
  $ ll /home/ubuntu/
  total 32
  drwxr-xr-x 5 ubuntu ubuntu 4096 May 3 10:05 ./
  drwxrwxrwx 3 nobody nogroup 4096 May 3 09:53 ../
  -rw-r--r-- 1 ubuntu ubuntu 220 Apr 4 2018 .bash_logout
  -rw-r--r-- 1 ubuntu ubuntu 3771 Apr 4 2018 .bashrc
  drwx------ 2 ubuntu ubuntu 4096 May 3 09:39 .cache/
  drwx------ 3 ubuntu ubuntu 4096 May 3 09:39 .gnupg/
  -rw-r--r-- 1 ubuntu ubuntu 807 Apr 4 2018 .profile
  drwx------ 2 ubuntu ubuntu 4096 May 3 09:38 .ssh/
  -rw-r--r-- 1 ubuntu ubuntu 0 May 3 09:40 .sudo_as_admin_successful

Summing that up, the case that you described works fine for me.
Could you outline what is different in your config?
Ideally, could you also attach all of your autofs- and NFS-related configuration?

[1]: https://wiki.ubuntuusers.de/Autofs/
[2]: https://vitux.com/install-nfs-server-and-client-on-ubuntu/

Jason D. Kelleher (jdkelleher) wrote :

I did a little more testing and this looks to be specific to /home. Even mounting over /home (no linking) with autofs results in the same issue.

All autofs files as well as the list of packages installed have been attached.

As a side note, having /home as a link is an artifact of some very old build scripts and worked fine with an /etc/fstab NFS mount. autofs was deployed as a workaround for bug #1577575.

Christian Ehrhardt  (paelzer) wrote :

Hi Jason,
I understand from your comment #12 that we can eliminate the symlink from the equation.
It is not needed to trigger the issue - thanks for that hint.

The only differences between my attempt to reproduce in comment #3 and your setup are:
- you set more NFS options in the master file
- you use direct [1] mounts while I used indirect ones

I modified my config accordingly to be more similar to yours:
Instead of
/etc/auto.master
/vol /etc/auto.nfs
/etc/auto.nfs
home -fstype=nfs,rw,retry=0 192.168.122.247:/mnt/sharedfolder

I now use:
/etc/auto.master
/- /etc/auto.nfs --timeout=0,-fstype=nfs,nfsvers=3,rw,hard,intr,rsize=8192,wsize=8192
/etc/auto.nfs
/vol/home 192.168.122.247:/mnt/sharedfolder
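
The difference between the two map styles can be sketched as follows. This is a simplified illustration, not autofs code: in auto.master a mount point of /- marks a direct map whose keys are absolute paths, while any other mount point marks an indirect map whose keys are names below that mount point. The helper names parse_master_entry and mount_paths are hypothetical, and the sketch ignores includes (+ lines), line continuations, and multi-mount entries.

```python
def parse_master_entry(line):
    """Split one auto.master line into mount point, map name, options, and kind."""
    fields = line.split()
    mount_point, map_name = fields[0], fields[1]
    options = fields[2] if len(fields) > 2 else ""
    # "/-" is the special mount point that marks a direct map.
    kind = "direct" if mount_point == "/-" else "indirect"
    return {"mount_point": mount_point, "map": map_name,
            "options": options, "kind": kind}


def mount_paths(master_line, map_lines):
    """Return the paths managed by one master entry and its map lines."""
    entry = parse_master_entry(master_line)
    keys = [ln.split()[0] for ln in map_lines
            if ln.strip() and not ln.lstrip().startswith("#")]
    if entry["kind"] == "direct":
        return keys  # direct map keys are already absolute paths
    return [entry["mount_point"].rstrip("/") + "/" + key for key in keys]


indirect = mount_paths(
    "/vol /etc/auto.nfs",
    ["home -fstype=nfs,rw,retry=0 192.168.122.247:/mnt/sharedfolder"],
)
direct = mount_paths(
    "/- /etc/auto.nfs --timeout=0,-fstype=nfs,nfsvers=3,rw,hard,intr,rsize=8192,wsize=8192",
    ["/vol/home 192.168.122.247:/mnt/sharedfolder"],
)
print(indirect, direct)
```

Both configurations resolve to the same path, /vol/home; only the way the trigger is registered differs.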

That still comes up just fine after boot (with symlink for home):
ubuntu@bionic-autofs-client:~$ mount | grep nfs
/etc/auto.nfs on /vol/home type autofs (rw,relatime,fd=6,pgrp=735,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=18438)
192.168.122.247:/mnt/sharedfolder on /vol/home type nfs (rw,relatime,vers=3,rsize=8192,wsize=8192,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.122.247,mountvers=3,mountport=48863,mountproto=udp,local_lock=none,addr=192.168.122.247)
ubuntu@bionic-autofs-client:~$ ll /home
lrwxrwxrwx 1 root root 9 May 3 10:02 /home -> /vol/home/
ubuntu@bionic-autofs-client:~$ ll /home/ubuntu/
total 44
drwxr-xr-x 5 ubuntu ubuntu 4096 May 6 06:34 ./
drwxrwxrwx 3 nobody nogroup 4096 May 3 09:53 ../
-rw------- 1 ubuntu ubuntu 330 May 6 06:35 .bash_history

Since you said the symlink isn't needed, I switched to mounting /home directly:
/etc/auto.nfs
/home 192.168.122.247:/mnt/sharedfolder

That worked just as well.
ubuntu@bionic-autofs-client:~$ mount | grep home
/etc/auto.nfs on /home type autofs (rw,relatime,fd=6,pgrp=715,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=18223)
192.168.122.247:/mnt/sharedfolder on /home type nfs (rw,relatime,vers=3,rsize=8192,wsize=8192,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.122.247,mountvers=3,mountport=48863,mountproto=udp,local_lock=none,addr=192.168.122.247)

But then I realized one more difference - out of habit I used the latest LTS which is Bionic, but you reported the bug on Xenial. The setup is easy to copy over. Let me set the same up on Xenial to be sure that it works for me on that version as well.

[1]: https://linux.die.net/man/5/auto.master

Christian Ehrhardt  (paelzer) wrote :

Yep, here we go.
Setting up the same on Xenial got me:

ubuntu@xenial-autofs-client:/$ ll /home
ls: cannot open directory '/home': Too many levels of symbolic links

Very interesting

Changed in autofs5 (Ubuntu):
status: New → Confirmed
Christian Ehrhardt  (paelzer) wrote :

I played a bit with my config and found:
- the error is reported even if /home is set to mount something that won't work (e.g. a nonexistent server), so it must be failing early in the autofs processing
- one can mount things manually (as expected)

The autofs difference between Xenial and Bionic is only 5.1.1-1ubuntu3 to 5.1.2-1ubuntu3, as upstream doesn't make a lot of changes anymore. Unfortunately I found no obvious change between those versions that would be responsible for this.

I get the failure for the direct mount to /home.
With your symlink trick (which you mentioned was for a different case in the past) I can mount it, though:

ubuntu@xenial-autofs-client:/$ cat /etc/auto.master | grep -v '^#'
+dir:/etc/auto.master.d
+auto.master
/vol /etc/auto.nfs --timeout=0,nosymlink,-fstype=nfs,nfsvers=3,rw,hard,intr,rsize=8192,wsize=8192
ubuntu@xenial-autofs-client:/$ cat /etc/auto.nfs | grep -v '^#'
home 192.168.122.54:/mnt/sharedfolder
ubuntu@xenial-autofs-client:/$ ll /home
lrwxrwxrwx 1 root root 9 May 6 08:40 /home -> /vol/home/

=> works

I can confirm on Xenial that I can mount it fine with the symlink in place.
Circling back to your report, a reboot should now break it, but even across a reboot it works fine.

I don't know why the same doesn't work for you - it does for me.
But we can instead continue to use the direct mount case as that recreates your issue.

Christian Ehrhardt  (paelzer) wrote :

Debugging per [1] did not show anything new.
There is simply no activity on the actual access of /home; all I see is the trigger for /home being registered at startup.

Startup:
ubuntu@xenial-autofs-client:/$ sudo automount -f -v --debug
Starting automounter version 5.1.1, master map /etc/auto.master
using kernel protocol version 5.02
lookup_nss_read_master: reading master file /etc/auto.master
parse_init: parse(sun): init gathered global options: (null)
lookup_read_master: lookup(file): read entry +dir:/etc/auto.master.d
lookup_nss_read_master: reading master dir /etc/auto.master.d
lookup_read_master: lookup(dir): scandir: /etc/auto.master.d
lookup_read_master: lookup(file): read entry /-
master_do_mount: mounting /-
automount_path_to_fifo: fifo name /var/run/autofs.fifo--
lookup_nss_read_map: reading map file /etc/auto.nfs
parse_init: parse(sun): init gathered global options: nosymlink,fstype=nfs,nfsvers=3,rw,hard,intr,rsize=8192,wsize=8192
mounted direct on /home with timeouts disabled
do_mount_autofs_direct: mounted trigger /home
st_ready: st_ready(): state = 0 path /-

It is the open syscall itself that fails:
0.000029 open("/home", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = -1 ELOOP (Too many levels of symbolic links) <0.000012>
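
The same open call can be issued from a short script to see which errno a path returns. This is a minimal sketch: the function name open_dir_errno is hypothetical, and the self-referencing symlink is only a stand-in that also yields ELOOP on Linux; it does not reproduce the broken autofs trigger itself.

```python
import errno
import os
import tempfile


def open_dir_errno(path):
    """Open path with the flags from the strace line; return None or the errno name."""
    flags = os.O_RDONLY | os.O_NONBLOCK | os.O_DIRECTORY | os.O_CLOEXEC
    try:
        fd = os.open(path, flags)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    os.close(fd)
    return None


# Stand-in for the failing path: a symlink that points at itself.
with tempfile.TemporaryDirectory() as tmp:
    loop = os.path.join(tmp, "loop")
    os.symlink(loop, loop)
    loop_result = open_dir_errno(loop)
    ok_result = open_dir_errno(tmp)

print(loop_result, ok_result)
```

On an affected system, calling open_dir_errno("/home") would be expected to return the same errno name that strace reports, without any userspace automount activity being involved.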

So the kernel-side trigger must have been set up in a broken state - no userspace component is active at that point (although userspace might have set it up wrongly earlier).
But since I haven't found fixes in autofs, maybe it is fixed in the kernel?

Yes, here I found a fix.
Installing linux-virtual-hwe-16.04 (since I was in a VM to test) resolved the issue.
With the same setup I can now mount /home just fine again on Xenial.

On a bare metal system you likely want linux-generic-hwe-16.04 instead.

I'm adding a kernel task to potentially identify the fix and backport it to the 4.4 kernel, but since it still isn't in the stable releases after so much time, chances are that it is hard to backport.
Until then you can get this issue resolved with the -hwe kernels.

[1]: https://help.ubuntu.com/community/Autofs#Debugging_Auto_Mount_Problems

Changed in autofs5 (Ubuntu Bionic):
status: New → Invalid
Changed in autofs5 (Ubuntu Xenial):
status: New → Invalid
Changed in autofs5 (Ubuntu):
status: Confirmed → Invalid
Changed in linux (Ubuntu):
status: New → Fix Released
Changed in linux (Ubuntu Xenial):
status: New → Confirmed
Changed in linux (Ubuntu Bionic):
status: New → Fix Released
Christian Ehrhardt  (paelzer) wrote :

In the kernel,
$ git log -p v4.4...v4.15 -- fs/autofs4
shows a bunch of fixes, but none clearly matches the case here just from reading them.

Therefore I'll leave that evaluation to the kernel Team.

@Jason I hope that you can just use HWE kernels and be happy for now since those seem to work for your case.
