On shutdown, NetworkManager shuts down before NFS unmounts, causing a long hang

Bug #113095 reported by Tommy Hurtig on 2007-05-07
This bug affects 10 people
Affects Status Importance Assigned to Milestone
dbus (Ubuntu)
Undecided
Unassigned
Nominated for Intrepid by Daniel J Blueman
network-manager (Ubuntu)
Undecided
Unassigned
Nominated for Intrepid by Daniel J Blueman

Bug Description

I use NFS and NIS to connect some workstations to a server, and /home is mounted over NFS from the server. When the system shuts down, NFS and NIS can't connect to the server and do a correct shutdown, because NetworkManager shuts down all network connections earlier in the process. This results in NFS and NIS hitting long timeouts, and the shutdown or restart process takes about 10-15 minutes.

Tommy Hurtig (tommy-hurtig) wrote :

I tried to stop dbus later in the shutdown/restart process. It now has K90dbus in both rc0.d and rc6.d. This solves the problem with the long timeout, but there are still some error messages on the screen.

How late is it recommended that dbus be stopped, or how early is it possible to stop NFS and NIS?

Jörg Wendel (linux-jwendel) wrote :

Hi,
the same here (Feisty and Gutsy)

this does the trick for me:

rm /etc/rc*/S31umountnfs.sh
ln -s ../init.d/umountnfs.sh /etc/rc0.d/S17umountnfs.sh
ln -s ../init.d/umountnfs.sh /etc/rc6.d/S17umountnfs.sh

So nfs will be stopped before the network goes down.
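
The relinking above can be tried in a scratch directory before touching the real /etc. This is only a minimal sketch, assuming the Feisty/Gutsy layout described in this comment; the function name `relink_umountnfs` and its root argument are made up for illustration and are not part of any package.

```shell
#!/bin/sh
# Hypothetical helper: apply the relinking from this comment under "$1"
# instead of the real root, so the result can be inspected first.
# With no argument it acts on the real /etc, like the original commands.
relink_umountnfs() {
    root="${1:-}"
    # Drop the old S31 links from every runlevel directory that has one.
    rm -f "$root"/etc/rc*/S31umountnfs.sh
    # Re-create the link at sequence 17 for halt (rc0) and reboot (rc6).
    for level in 0 6; do
        ln -s ../init.d/umountnfs.sh "$root/etc/rc$level.d/S17umountnfs.sh" || return 1
    done
}
```

For example, copy /etc/rc0.d, /etc/rc6.d and /etc/init.d into a scratch directory, run `relink_umountnfs /path/to/scratch`, and compare the resulting link names with `ls -1` before repeating the change on the live system.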

I'm experiencing the same issue in Hardy and Intrepid (and before), with an NFSv4 home directory, and have been bumping into it a lot.

To see what's actually happening, change VERBOSE to 'yes' in /etc/default/rcS and boot without 'splash', as you probably already know. There is a race with your suggested change. Take rc6 (reboot preparation) for example:

$ ls -1 /etc/rc6.d/S*
S01linux-restricted-modules-common
S15wpa-ifupdown
S20sendsigs
S30urandom
S31umountnfs.sh
S32portmap
S35networking
S40umountfs
S48cryptdisks
S59cryptdisks-early
S60umountroot
S85kexec
S90reboot

The umountnfs script won't (always) succeed until user processes (with open file handles) are killed by the 'sendsigs' script, but sendsigs also kills NetworkManager, which brings the ethernet interface down.
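
The ordering in a listing like the one above can also be checked mechanically. The following is a rough sketch, assuming sysv-rc style links named S<two digits><script>; the helper names `rc_seq` and `runs_before` are invented here for illustration.

```shell
#!/bin/sh
# Print the two-digit sequence number of script "$2" in rc directory "$1",
# e.g. "31" for S31umountnfs.sh. Fails if there is no such link.
rc_seq() {
    for f in "$1"/S[0-9][0-9]"$2"; do
        # An unmatched glob stays literal; skip it (but accept dangling links).
        [ -e "$f" ] || [ -L "$f" ] || continue
        base=${f##*/}
        num=${base#S}
        echo "${num%"$2"}"
        return 0
    done
    return 1
}

# Succeed if script "$2" is ordered before script "$3" in directory "$1".
# Returns 2 if either script has no S link in that directory.
runs_before() {
    a=$(rc_seq "$1" "$2") || return 2
    b=$(rc_seq "$1" "$3") || return 2
    # Strip one leading zero so the numbers compare as plain decimals.
    [ "${a#0}" -lt "${b#0}" ]
}
```

With the stock layout listed above, `runs_before /etc/rc6.d sendsigs umountnfs.sh` succeeds, which is the situation this comment describes: sendsigs runs first and can take NetworkManager down before the NFS unmount is attempted.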

There is a handy directory, /lib/init/rw/sendsigs.omit.d/, which contains symlinks to the pidfiles of processes that must not be killed here, so the true fix would be to add code like the following to the NetworkManager init script, at the end of the start clause (this version is taken from the portmap init script):

        mkdir -p /lib/init/rw/sendsigs.omit.d
        rm -f /lib/init/rw/sendsigs.omit.d/portmap
        ln -s /var/run/portmap.pid /lib/init/rw/sendsigs.omit.d/portmap
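
The same three steps can be wrapped in a small function so they can be reused for another daemon. This is only an illustrative sketch: the function name `omit_from_sendsigs` is hypothetical, and the pidfile path for NetworkManager is not given in this thread, so it has to be supplied by whoever adapts it.

```shell
#!/bin/sh
# Hypothetical helper: register a daemon's pidfile so that sendsigs skips it.
#   $1  name of the link to create (for example the daemon name)
#   $2  path to the daemon's pidfile (an assumption; check the init script)
#   $3  omit directory, defaulting to the one named in this comment
omit_from_sendsigs() {
    name=$1
    pidfile=$2
    omit_dir=${3:-/lib/init/rw/sendsigs.omit.d}
    mkdir -p "$omit_dir" || return 1
    # Remove a stale link first so that ln does not fail on restart.
    rm -f "$omit_dir/$name"
    ln -s "$pidfile" "$omit_dir/$name"
}
```

Called as `omit_from_sendsigs portmap /var/run/portmap.pid`, it reproduces the portmap lines above. For NetworkManager, the second argument would be whatever pidfile its init script actually writes.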

Patch attached. Works great here - please confirm there.

Changed in dbus:
status: New → In Progress
Alexander Sack (asac) wrote :

I think one of the problems with NM being killed is the use case where /usr is mounted over NFS. Would this patch improve that as well? Do loaded libs/binaries keep the device in use, or wouldn't that be a problem?

Yes, if /usr were mounted over NFS, we'd experience this hang too.

There are secondary implications of the NFS unmount not being seen by the NFS server: the server still has a mount entry for the client, which is no longer contactable. It tries to make a delegation callback, which invariably times out, causing other problems.

This may well fit into the category of 'high' priority, but life is good, as we have a fix ;-) .

On Wed, Aug 27, 2008 at 09:44:48PM -0000, Daniel J Blueman wrote:
> Yes, if /usr were mounted over NFS, we'd experience this hang too.
>
> There are secondary implications of the NFS unmount not being seen by
> the NFS server, since the NFS server still has an mount entry for the
> client, which is no longer contactable. It tries to make a delegation
> callback, which invariably times out, causing other problems.
>
> This may well fit into the category of 'high' priority, but life is
> good, as we have a fix ;-) .
>

OK. The right solution would be to make NM not tear down interfaces on
shutdown. After some discussion in #nm it seems like this makes sense
for a few device classes only (e.g. tunnel or ppp devices should be
stopped on shutdown, while ethernet and wifi should continue to
exist).

Your patch would be a fallback in case we don't get to this during this
cycle.

Thanks!

 - Alexander

I have run into this problem after upgrading to Kubuntu 8.10. On reboot/shutdown it hangs after the sendsigs message "All processes ended within 2 seconds." Daniel's patch didn't help, but having umountnfs.sh run before sendsigs, as suggested by Jörg, fixed it for me.

On Sat, Nov 01, 2008 at 04:22:26PM -0000, Jan Ericsen wrote:
> I have run into this problem after upgrading to Kubuntu 8.10. On
> reboot/shutdown it hangs after sendsigs "All processes ended within 2
> seconds." Daniels patch didnt help, but having the umountnfs.sh run
> before sendsigs like suggested by Jörg fixed it for me.
>

Have you configured all your interfaces in /etc/network/interfaces?

 - Alexander

No, only the default I think. Just the loopback device is there:

auto lo
iface lo inet loopback
address 127.0.0.1
netmask 255.0.0.0
#iface eth0 inet dhcp

There are two other devices, eth0 (wired) and eth1 (ipw2200 wireless), which are autoconfigured via DHCP. The WAP key for eth1 was entered with KNetworkManager.

Dustin Kirkland  (kirkland) wrote :

I am experiencing this problem on all Intrepid machines on my network. It took me quite some time to diagnose, as I was chasing error messages related to acpid and sound, when it was ultimately NFS+NetworkManager that was causing my problem, as described above.

Daniel's patch solves the problem for me. However, I should note that I applied the patch to /etc/init.d/portmap, rather than to any NetworkManager script (as shown in the patch header).

Alexander, what are the chances of an SRU to Intrepid? I suspect this affects a fair number of machines out there. There are a ton of unsolved bugs against acpi and alsa (a la MythTV users) which I suspect might actually be this bug.

:-Dustin

Dustin Kirkland wrote:
> I am experiencing this problem on all Intrepid machines on my network.
> It took me quite some time to diagnose, as I was chasing error messages
> related to acpid and sound, when it was ultimately NFS+NetworkManager
> that was causing my problem, as described above.
>
> Daniel's patch solves the problem for me, however, I should note that I
> applied the patch applied to /etc/init.d/portmap, rather than any
> NetworkManager script (as shown in the patch header).
>
> Alexander, what are the chances of an SRU to Intrepid? I suspect this
> affects a fair number of machines out there. There are a ton of
> unsolved bugs against acpi and alsa (a la MythTV users) which I suspect
> might actually be this bug.
>
>
The issue here would be that / and /usr still cannot be remote mounts, right?

Will those cases be worse off than before when using this patch?

Dustin Kirkland  (kirkland) wrote :

On Fri, Dec 26, 2008 at 5:26 PM, Alexander Sack <email address hidden> wrote:
> the issue here would be that / and /usr still cannot be remote mounts right?
>
> will those cases be worse off before when using this patch?

I can do some testing of those cases next week, I suppose.

What should be clear, though, is that NFS mounted / and /usr are
subsets of all NFS mounts out there.

I _suspect_ that NFS-mounted / and /usr are a distinct minority of NFS mounts.

:-Dustin

I wonder whether we should ship a package-managed link in /lib/init/rw/.... or create it in some postinst script. Would there be a problem with just shipping the "pid-file" link in the package?

Thierry Carrez (ttx) wrote :

This is linked to bug 211631.
CIFS mounts also trigger an annoying timeout at shutdown. A fix preventing NM from being prematurely killed in sendsigs should fix both bugs.
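
To see whether a given machine has any mounts of the kinds discussed here, the mount table can be filtered by filesystem type. A minimal sketch, assuming the usual Linux /proc/mounts format (device, mount point, type, ...) and the type names nfs, nfs4 and cifs; the function name `list_network_mounts` is made up for illustration.

```shell
#!/bin/sh
# Print the mount points of NFS and CIFS filesystems listed in a
# mounts-style file. Reads /proc/mounts unless another file is given.
list_network_mounts() {
    awk '$3 ~ /^(nfs|nfs4|cifs)$/ { print $2 }' "${1:-/proc/mounts}"
}
```

If `list_network_mounts` prints nothing, the machine has no active mounts of these types, and a slow shutdown probably has a different cause.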

Thierry Carrez (ttx) wrote :

For testing purposes I uploaded a network-manager upgrade for intrepid to my PPA:
https://launchpad.net/~ttx/+archive/ppa

Let me know if it fixes this problem as well as bug 211631.

On Tue, Mar 03, 2009 at 12:49:31PM -0000, Thierry Carrez wrote:
> For testing purposes I uploaded a network-manager upgrade for intrepid to my PPA:
> https://launchpad.net/~ttx/+archive/ppa
>
> Let me know if it fixes this problem as well as bug 211631.
>

Can you please provide a patch against the ubuntu.0.7.1 branch?

https://code.edge.launchpad.net/~network-manager/network-manager/ubuntu.0.7.1

Thanks!

 - Alexander

Fix is in my branch: https://code.launchpad.net/~ttx/network-manager/sendsigs-protection
Test packages available in my PPA.

For this to work with wireless, it also needs a fix (yet to come) in wpasupplicant.
Also, it won't work with n-m connections that are not set "for all users", as GNOME will tear them down at session logout.

Not a D-Bus bug

Changed in dbus (Ubuntu):
status: New → Invalid
Thomas Hood (jdthood) on 2012-07-10
Changed in network-manager (Ubuntu):
status: In Progress → Confirmed
Thomas Hood (jdthood) wrote :

Is this still a problem in Ubuntu 12.04?

summary: - nfs timeout on shutdown
+ On shutdown, NetworkManager shuts down before NFS unmounts, causing a
+ long hang
Changed in network-manager (Ubuntu):
status: Confirmed → Incomplete
cotillion (tobias-schwan) wrote :

This problem is still present on my machine (12.04) and has been for as long as I can remember. Shutdown is normally fast. Only when NFS devices are mounted does the shutdown process hang, after stating that "killing all processes failed".

As the patches discussed in this thread are quite old (2009), I do not want to install them on my system.

Standing by for further testing if needed.

Thomas Hood (jdthood) on 2012-08-13
Changed in network-manager (Ubuntu):
status: Incomplete → Confirmed
Magnus (koma-lysator) wrote :

I think this just started happening to me (running 13.10). I started using fstab to NFS-mount my NAS.

This bug is present in Ubuntu 14.04 as well, and really, really annoying. It seems that there is hardly any way to get NFS drives unmounted "before" NetworkManager rips out the network. Sigh.

Peter Liedler (peter-liedler) wrote :

Did not have the problem in 14.10.
Now I am testing Vivid (15.04), and the NFS timeout takes about ten minutes during shutdown because NetworkManager has already closed the connection.

abssorb (abssorb) wrote :

Still seeing this 10 years later on 16.04. Is the patch safe to apply to 16.04?
