diskless setup with nfs mounted home hangs on shutdown/reboot

Bug #1594658 reported by Sergey Frolov
This bug affects 8 people
Affects             Status     Importance  Assigned to  Milestone
nfs-utils (Ubuntu)  Confirmed  Undecided   Unassigned
systemd (Ubuntu)    Invalid    Undecided   Unassigned

Bug Description

Ubuntu 16.04 fresh install hangs when shutting down.
The system is diskless and PXE-booted, with a couple of NFS-mounted directories, including the home directory.

As I have no persistent storage whatsoever, I don't have any logs, but running journalctl -f in the debug shell I can see that it hangs at:

[ *] (1 of 2) A stop job running for Raise Network Interfaces (10s / 1min 30s)
Jun 20 20:23:05 $hostname kernel: nfs: server $nfs-ip-address not responding, still trying
Jun 20 20:23:05 $hostname kernel: nfs: server $nfs-ip-address not responding, still trying
Jun 20 20:23:14 $hostname kernel: nfs: server $nfs-ip-address not responding, still trying
Jun 20 20:24:08 $hostname kernel: nfs: server $nfs-ip-address not responding, still trying

The second job in "(1 of 2)" is thermald; turning it off does not fix the problem.
Also, the "(10s / 1min 30s)" counter stops visually updating.

The server $nfs-ip-address is not responding because all network interfaces are already down at this point.

I am not exactly sure why this happens. It looks like the systemd services are stopped in the wrong order, so the interfaces are brought down before something NFS-related has finished, but I am not sure that this is the reason for the hang.

Workaround that fixes the problem:
In /lib/systemd/system/networking.service,
comment out the following line:
ExecStop=/sbin/ifdown -a --read-environment
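
The same workaround could also be expressed as a systemd drop-in, so that the packaged unit file under /lib is left untouched and the change is not lost when the package is upgraded. The following is only a sketch under assumptions: the DROPIN_ROOT variable and the file name no-ifdown.conf are hypothetical, and by default the script writes to a scratch directory so that it is safe to run as-is. It relies on the systemd convention that an empty assignment resets a command list such as ExecStop=.

```shell
#!/bin/sh
# Sketch only: DROPIN_ROOT and no-ifdown.conf are hypothetical names.
# By default the drop-in is written to a scratch directory; point
# DROPIN_ROOT at /etc/systemd/system (as root) to apply it for real.
DROPIN_ROOT="${DROPIN_ROOT:-$(mktemp -d)}"
dropin_dir="$DROPIN_ROOT/networking.service.d"

mkdir -p "$dropin_dir"

# An empty ExecStop= clears the ExecStop commands of the packaged unit,
# which has the same effect as commenting out the ifdown line.
cat > "$dropin_dir/no-ifdown.conf" <<'EOF'
[Service]
ExecStop=
EOF

echo "wrote $dropin_dir/no-ifdown.conf"
```

If applied under /etc/systemd/system, the change takes effect after "systemctl daemon-reload". As with the original workaround, the interfaces are then simply not brought down by this unit at shutdown.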

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in nfs-utils (Ubuntu):
status: New → Confirmed
Changed in systemd (Ubuntu):
status: New → Confirmed
Darko Veberic (darko-veberic-kit) wrote :

this also appears on all our desktops in the institute after an upgrade from 14.04 to 16.04. user yp homes are mounted via autofs/am through nfs and the network is set up via dhcp.

Sean (seanshivak) wrote :

Same issue here on a clean 16.04 install. It only happens on wifi devices; it doesn't seem to affect wired devices. NFS mounts are set up via fstab. Unmounting the NFS shares before shutdown allows the system to shut down.
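
The manual unmount described above could be scripted roughly as follows. This is a minimal sketch, not something taken from the thread: list_nfs_mounts is a hypothetical helper, and the loop is a dry run that only prints what it would unmount.

```shell
#!/bin/sh
# list_nfs_mounts: hypothetical helper, not part of any package.
# Prints the mount point of every nfs/nfs4 entry in a mounts table.
# Field 2 is the mount point and field 3 the filesystem type.
# Defaults to /proc/mounts; a different file can be passed for testing.
# Note: mount points containing spaces appear escaped (\040) there.
list_nfs_mounts() {
    awk '$3 == "nfs" || $3 == "nfs4" { print $2 }' "${1:-/proc/mounts}"
}

# Dry run: show which mount points would be unmounted before shutdown.
list_nfs_mounts | while read -r mnt; do
    echo "would run: umount $mnt"
done
```

To actually unmount, the echo would be replaced by a real umount call run as root. "umount -a -t nfs,nfs4" should do roughly the same in a single command.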

Andy Sayler (andy.sayler) wrote :

In our deployment, this happens on a wired ethernet connection. So it's not wifi only for us. But since it seems like a race condition, different environments may trigger it in different scenarios.

Sean (seanshivak) wrote :

What fixed the issue for me is doing:

sudo systemctl edit --full nfs-config.service

then editing it to look like this:

[Unit]
Description=Preprocess NFS configuration
After=local-fs.target remote-fs.target NetworkManager.service
DefaultDependencies=no

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/lib/systemd/scripts/nfs-utils_env.sh

Other suggestions from various forums, such as adding dbus.service to wpa_supplicant.service, don't work.

Ralf Menzel (menzel) wrote :

I have the same (or a similar) problem on a system with a wired ethernet connection.

Using the debug-shell I found that the shutdown was hanging on a call of

  /bin/sh /sbin/resolvconf -d enp0s25.inet

Here enp0s25 is the ethernet interface. It has a static configuration that contains "dns-nameservers" and "dns-search" lines.

Adding some diagnostic output to /sbin/resolvconf, I found that it seems to hang in the case statement that spans lines 31 to 37. After some trial and error I changed the pattern in line 36 from

  ~*

to

  '~'*

With this change shutdown now proceeds smoothly.

I'm not an expert shell programmer, so I don't know why the change has this effect. Looks almost like some obscure dash bug to me.
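
A possible explanation, offered as an assumption rather than something confirmed in this thread: POSIX specifies that case patterns undergo tilde expansion, so an unquoted ~* can be read as a tilde-prefix with the login name "*", which makes the shell query the user database. If that database is served over the network (NIS or LDAP, for example) and the interfaces are already down, the lookup could block. Quoting the tilde keeps it a literal character, so no lookup is attempted. The sketch below shows only the quoted form; classify is a hypothetical function and is not the actual code in /sbin/resolvconf.

```shell
#!/bin/sh
# classify: hypothetical function illustrating the quoted pattern.
# '~'* matches any word that starts with a literal tilde. Because the
# tilde is quoted, the shell performs no tilde expansion on the pattern.
# (The unquoted form ~* is the one reported to hang in this thread.)
classify() {
    case "$1" in
        '~'*) echo "tilde" ;;
        *)    echo "other" ;;
    esac
}

classify "~foo"
classify "enp0s25.inet"
```

The two calls print "tilde" and "other" respectively, which is the matching behaviour the quoted pattern is meant to provide.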

ben thielsen (btb-bitrate) wrote :

it appears that my symptoms also relate to this bug report. however, i'd like to add that the feedback provided by the system, at least in my particular case, is extremely unhelpful. i think it can be better.

i have a very minimal virtual guest, running 16.04. it has a "physical" wired connection, and an nfs mount in fstab:

>cat /etc/fstab
# /etc/fstab: static file system information.
#
# Use 'blkid' to print the universally unique identifier for a
# device; this may be used with UUID= as a more robust way to name devices
# that works even if disks are added and removed. See fstab(5).
#

# <file system> <mount point> <type> <options> <dump> <pass>
proc /proc proc nodev,noexec,nosuid 0 0
LABEL=root / ext4 errors=remount-ro 0 1
LABEL=var /var ext4 defaults 0 2
LABEL=swap none swap sw 0 0
10.128.35.251:/foo_example_com /home/foo.example.com nfs rw,hard,intr 0 0

>cat /etc/network/interfaces
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto eth0
iface eth0 inet static
 address 10.101.27.31/24
 gateway 10.101.27.1

in this case, there is no [perceptible] delay during shutdown, but a lengthy delay at boot, while system waits for mounting to succeed/timeout.

when there are network problems, the nfs mount attempt cannot succeed, and this is when the system says "a start job is running for raise network interfaces", while it sits for five minutes, yet ultimately proceeds, with a perfectly operational network interface.

this sucks. while the root cause here, in terms of the actual problem, is that 1] the nfs mount obviously fails, and 2] the timeout for this failure is [imho] too long, the troubleshooting process to determine this was made much longer than it should have been due to the poor feedback about what was actually happening.

in this particular case, the system should, at the minimum, at least tell the operator that there is a mount attempt waiting, and ideally, provide some actual detail about what mount attempt in particular. saying "a start job is running for raise network interfaces" is woefully inadequate.

Paul Crawford (psc-sat) wrote :

We were seeing this on a fresh Ubuntu 16.04 64-bit installation using fixed IP addresses and automounter for NFS drives. Suggestion #7 seems to fix it for us, but it would be useful to know just what is happening with the script. Is it trawling through some user's home directory when expanding the '~*' entry in the case statement in /sbin/resolvconf, by any chance?

Bram Van Rensbergen (decius) wrote :

Had the same issue running Ubuntu 16.04 LTS (64-bit as well), with Ubuntu regularly just not shutting down at all, stuck while attempting to unmount an NFS share. Suggestion #7 seems to have fixed the issue for me as well, so thanks for that!

Dan Streetman (ddstreet)
Changed in systemd (Ubuntu):
status: Confirmed → Invalid