/etc/init.d/umountroot: "mount: / is busy" due to dhclient is not stopped

Bug #1169614 reported by Max on 2013-04-16
94
This bug affects 18 people
Affects Status Importance Assigned to Milestone
network-manager (Ubuntu)
High
Unassigned

Bug Description

The result is
"mount: / is busy"
and root filesystem recovery during next boot.

network-manager 0.9.8.0-0ubuntu3 in Raring
neither stop dhclient in response to
"stop network-manager" nor remove pid file
from /run/sendsigs.omit.d/
So dhclient skipped by /etc/init.d/sendsigs
and a lease file is open for writing in
/var/lib/NetworkManager
while /etc/init.d/umountroot is running.
As a consequence root filesystem can not
be cleanly remount readonly before poweroff.

In default setup the problem is hidden by
plymouth. The problem does not appear
if network is disabled or the cable is unplugged
before shutdown.

There are some other bugs that might be caused
by this issue:
Bug lp: #1073433 Ext4 corruption associated with shutdown of Ubuntu 12.10

Max (m-gorodok) wrote :

debian/patches/quit_dhclient_on_exit.patch
relies on TARGET_DEBIAN define.
Unfortunately distro-specific features are removed
from upstream
http://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=012c5f4b271b935852267ed865933ea3bb6983ba

So dhclient is not stopped with network-manager.

Max (m-gorodok) wrote :

Enforce Ubuntu-specific code modification,
although distro-specific features have been
dropped by upstream.

Note that this is a patch for patches, not for
the source code directly

Changed in network-manager (Ubuntu):
status: New → Confirmed
Max (m-gorodok) on 2013-04-19
tags: added: patch raring

The attachment "unconditional preproces instead of TARGET_DEBIAN" seems to be a patch. If it isn't, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of the ~ubuntu-reviewers, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issues please contact him.]

Max (m-gorodok) on 2013-05-16
summary: - dhclient is not stopped during shutdown
+ /etc/init.d/umountroot: "mount: / is busy" due to dhclient is not
+ stopped
description: updated
Steve Dodd (anarchetic) wrote :

I can confirm the above patch fixes the problem of dhclient still being running on shutdown for me (once I figured out how to use quilt so I could rebuild nm.)

Still need to a fix an upstart/ureadahead problem to get a clean shutdown on my machine, but this is progress.

Steve Dodd (anarchetic) on 2013-06-25
tags: added: saucy
M4he (mahe) wrote :

I'm using raring and also having the issue of shutdown/restart hanging for a short time due to "mount: / is busy".

I made a script that executes "ps -Af" on shutdown/reboot and writes it to a log file. (see attachment)
There's a long line containing "/sbin/dhclient"; I presume that indicates dhclient being the root cause?

Steve Dodd (anarchetic) wrote :

M4he, yes, looks like it. Does the patch above apply cleanly to n-m in saucy? It solved things for me. If you're on x86-32 I can probably build you a package to test easily.

Steve Dodd (anarchetic) wrote :

*raring, sorry.

M4he (mahe) wrote :

Thanks for your reply, Steve!
Unfortunately I'm on x86_64, so I'd have to build it myself I guess. Could you give me a little hint on how to do that?
I issued a "apt-get source network-manager" and got 2 packages (*ubuntu6.debian.tar.gz and *0.9.8.0.orig.tar.xz), I also downloaded the patch attached to post #2. How do I proceed?

Steve Dodd (anarchetic) wrote :

Note for those wanting to apply the above patch: network-manager uses quilt, and the above patch only "patches the patches." So do "quilt pop -a" after unpacking the source, then apply the patch above, then "quilt push -a" to reapply the changed patches.

M4he (mahe) wrote :

I was able to apply the patch from post #2 now, thanks to Steve's hint.
Using the patched network-manager I don't experience any hangs at shutdown/reboot anymore!

pfoo (pfoo) wrote :

Considering that this might causes dataloss/filesystem corruption is there any reason explaining why this patch has not been added yet in 13.04 networkmanager ?

Changed in network-manager (Ubuntu):
status: Confirmed → Triaged
importance: Undecided → High
Ivan Larionov (xeron-oskom) wrote :

Still exists in 13.10.

Øyvind Stegard (oyvinst) wrote :

lsof just before remount to read-only of root:
dhclient 1317 root 4w REG 8,5 578 530637 /var/lib/NetworkManager/dhclient-90ac51b1-a118-416f-a126-0ad83a2c7b9c-eth0.lease

.. and unclean shutdown as a result.

On a clean and up-to-date Ubuntu 13.10 installation. When patch exists since april of 2013, why is this bug still present ?

Steve Dodd (anarchetic) wrote :

Looking at this now, and getting sidetracked by RCS confusion .. what is the difference between:

http://bazaar.launchpad.net/~ubuntu-branches/ubuntu/trusty/network-manager/trusty/files

and

http://bazaar.launchpad.net/~network-manager/network-manager/ubuntu/files

? The latter seems to have the up-to-date code that I get when I "apt-get source" on trusty, no idea what the former is..

Anyway, it seems to above patch has not yet been merged, but so far I have not experienced unclean shutdowns on trusty. Will investigate further ...

Steve Dodd (anarchetic) wrote :

@oyvinst, the current NetworkManager in trusty (0.9.8.8-0ubuntu1) *does* seem to stop dhclient:

Jan 26 17:25:02 xubuntu NetworkManager[1766]: <info> caught signal 15, shutting down normally.
Jan 26 17:25:02 xubuntu NetworkManager[1766]: <info> (eth0): device state change: activated -> disconnected (reason 'connection-removed') [100 30 38]
Jan 26 17:25:02 xubuntu NetworkManager[1766]: <info> (eth0): deactivating device (reason 'connection-removed') [38]

According to the changelog, we got a new upstream version merged 24/01/14:

network-manager (0.9.8.8-0ubuntu1) trusty; urgency=medium

  [ Mathieu Trudel-Lapierre ]
  * New upstream release (LP: #1269179).

Can you update and check that the problem is resolved for you? The third log message is printed by nm_device_deactivate(), which calls dhcp4_cleanup(), which calls nm_dhcp_client_stop(). What I can't currently work out is .. what changed?

Steve Dodd (anarchetic) wrote :

We should probably remove the obsolete patches to avoid confusion.

Ivan Larionov (xeron-oskom) wrote :

This problem still exists in the latest 14.04 with all updates.

Ivan Larionov (xeron-oskom) wrote :
Steve Dodd (anarchetic) wrote :

Weird - I'm not seeing it here. I added some debug to n-m a while back, will see if I can find the patch..

Steve Dodd (anarchetic) wrote :

OK, if you can change /etc/init/network-manager.conf so the exec line reads:

exec NetworkManager --log-level=DEBUG

And apply the attached patch to the n-m sources, rebuild & install, what happens? (As you need syslogd to stay up to log the results, just 'sudo stop network-manager' and have a in /var/log/syslog.)

I see (slightly edited):

Mar 4 22:07:44 xubuntu NetworkManager[27154]: <info> caught signal 15, shutting down normally.
Mar 4 22:07:44 xubuntu NetworkManager[27154]: <info> (eth0): device state change: activated -> disconnected (reason 'connection-removed') [100 30 38]
Mar 4 22:07:44 xubuntu NetworkManager[27154]: <info> (eth0): deactivating device (reason 'connection-removed') [38]
Mar 4 22:07:44 xubuntu NetworkManager[27154]: <debug> [nm-dhcp-client.c:148] nm_dhcp_client_stop_pid(): Sending SIGTERM to 27161.
Mar 4 22:07:44 xubuntu NetworkManager[27154]: <info> (eth0): canceled DHCP transaction, DHCP client pid 27161

?field.comment=OK, if you can change /etc/init/network-manager.conf so the exec line reads:

exec NetworkManager --log-level=DEBUG

And apply the attached patch to the n-m sources, rebuild & install, what happens? (As you need syslogd to stay up to log the results, just 'sudo stop network-manager' and have a in /var/log/syslog.)

I see (slightly edited):

Mar 4 22:07:44 xubuntu NetworkManager[27154]: <info> caught signal 15, shutting down normally.
Mar 4 22:07:44 xubuntu NetworkManager[27154]: <info> (eth0): device state change: activated -> disconnected (reason 'connection-removed') [100 30 38]
Mar 4 22:07:44 xubuntu NetworkManager[27154]: <info> (eth0): deactivating device (reason 'connection-removed') [38]
Mar 4 22:07:44 xubuntu NetworkManager[27154]: <debug> [nm-dhcp-client.c:148] nm_dhcp_client_stop_pid(): Sending SIGTERM to 27161.
Mar 4 22:07:44 xubuntu NetworkManager[27154]: <info> (eth0): canceled DHCP transaction, DHCP client pid 27161

Ivan Larionov (xeron-oskom) wrote :

Syslog:

Mar 5 21:36:10 animeworld NetworkManager[16725]: <debug> [1394040970.362890] [nm-netlink-monitor.c:164] link_msg_handler(): netlink link message: iface idx 2 flags 0x11043
Mar 5 21:36:11 animeworld NetworkManager[16725]: <info> (eth0): IP6 addrconf timed out or failed.
Mar 5 21:36:11 animeworld NetworkManager[16725]: <debug> [1394040971.362602] [NetworkManagerUtils.c:637] nm_utils_do_sysctl(): sysctl: setting '/proc/sys/net/ipv6/conf/eth0/disable_ipv6' to '0'
Mar 5 21:36:11 animeworld NetworkManager[16725]: <info> Activation (eth0) Stage 4 of 5 (IPv6 Configure Timeout) scheduled...
Mar 5 21:36:11 animeworld NetworkManager[16725]: <info> Activation (eth0) Stage 4 of 5 (IPv6 Configure Timeout) started...
Mar 5 21:36:11 animeworld NetworkManager[16725]: <info> Activation (eth0) Stage 4 of 5 (IPv6 Configure Timeout) complete.
Mar 5 21:36:21 animeworld NetworkManager[16725]: <debug> [1394040981.549314] [nm-ip6-manager.c:1340] netlink_notification(): netlink event type 24
Mar 5 21:36:35 animeworld NetworkManager[16725]: <info> caught signal 15, shutting down normally.
Mar 5 21:36:35 animeworld NetworkManager[16725]: <debug> [1394040995.480639] [nm-sleep-monitor-systemd.c:78] drop_inhibitor(): Dropping systemd sleep inhibitor
Mar 5 21:36:35 animeworld dnsmasq[16746]: exiting on receipt of SIGTERM
Mar 5 21:36:35 animeworld NetworkManager[16725]: <debug> [1394040995.507176] [nm-dns-manager.c:591] update_dns(): updating resolv.conf
Mar 5 21:36:35 animeworld NetworkManager[16725]: <info> Writing DNS information to /sbin/resolvconf
Mar 5 21:36:35 animeworld NetworkManager[16725]: <info> exiting (success)

Strace in attachment.

Ivan Larionov (xeron-oskom) wrote :

NM with patch: https://launchpad.net/~xeron-oskom/+archive/network-manager

BTW I don't see debug info from patch you provided.

Steve Dodd (anarchetic) wrote :

Interesting, your setup seems to skip a whole bunch of steps .. starting with:

device state change: activated -> disconnected

Will have a look at the source and see how that could get skipped. Anything odd about your system - did you configure your wired connection in n-m "by hand"?

Steve Dodd (anarchetic) wrote :

Hmm, my best guess is that your connection never makes it to an "activated" state .. can you grep your syslog for 'state ch.*activated'?

Ivan Larionov (xeron-oskom) wrote :

Connection configured automatically.

State changed to activated during nm start. Log attached.

Ivan Larionov (xeron-oskom) wrote :

Created new ethernet connection with all defaults — same isue, dhclient isn't stopped after stopping NM.

Steve Dodd (anarchetic) wrote :

Color me baffled. Try adding some more debug??

diff -ur network-manager-0.9.8.8/src/nm-policy.c network-manager-0.9.8.8.sd/src/nm-policy.c
--- network-manager-0.9.8.8/src/nm-policy.c 2014-01-26 17:37:05.000000000 +0000
+++ network-manager-0.9.8.8.sd/src/nm-policy.c 2014-03-06 22:15:15.621617528 +0000
@@ -1971,13 +1971,16 @@
  const GSList *active, *iter;

  active = nm_manager_get_active_connections (manager);
+ nm_log_dbg(LOGD_CORE,"Searching for connection '%s' (%p)", nm_connection_get_id(connection),connection);
  for (iter = active; iter; iter = g_slist_next (iter)) {
   NMActiveConnection *ac = iter->data;
   NMActiveConnectionState state = nm_active_connection_get_state (ac);
   GError *error = NULL;

+ nm_log_dbg(LOGD_CORE,"Found connection '%s' (%p)", nm_connection_get_id(connection),connection);
   if (nm_active_connection_get_connection (ac) == connection &&
       (state <= NM_ACTIVE_CONNECTION_STATE_ACTIVATED)) {
+ nm_log_dbg(LOGD_CORE,"Deactivating connection '%s' (%p)", nm_connection_get_id(connection),connection);
    if (!nm_manager_deactivate_connection (manager,
                                           nm_active_connection_get_path (ac),
                                           NM_DEVICE_STATE_REASON_CONNECTION_REMOVED,

Ivan Larionov (xeron-oskom) wrote :

Looks like it doesn't print this messages as well.

I verified my package and I see this patches applied. No idea why there're no debug messages from your patches.

Seems that code path isn't being triggered at all for you. Interestingly,
older versions exhibit the same issue for me. Going to try to isolate the
commit that "fixed" things for me (binary search time!) .. hopefully then
will have some understanding of what's going on.
On 7 Mar 2014 18:01, "Ivan Larionov" <email address hidden> wrote:

> Looks like it doesn't print this messages as well.
>
> I verified my package and I see this patches applied. No idea why
> there're no debug messages from your patches.
>
> ** Attachment added: "nm_start_stop_2.log"
>
> https://bugs.launchpad.net/ubuntu/+source/network-manager/+bug/1169614/+attachment/4012309/+files/nm_start_stop_2.log
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1169614
>
> Title:
> /etc/init.d/umountroot: "mount: / is busy" due to dhclient is not
> stopped
>
> To manage notifications about this bug go to:
>
> https://bugs.launchpad.net/ubuntu/+source/network-manager/+bug/1169614/+subscriptions
>

Steve Dodd (anarchetic) wrote :

OK, the commit that fixed things for me was upstream commit 52b3810b0e0d87bb47c6d77627d49fbb323d804d, "fix NMPolicy/NMManager refcounting." Not familiar with exactly how git works, but that seems to have been cherry picked and reapplied to the main branch as 8c167c1f8f4338ccdc9906a9595e02c67fc1d9ca:

http://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=8c167c1f8f4338ccdc9906a9595e02c67fc1d9ca

Somewhere between 0.9.8.2 and 0.9.8.4.

So I'm guessing its some sequence of destructor calls that is causing my dhclient to get terminated, and for some reason that's not happening for you.

Ivan Larionov (xeron-oskom) wrote :

So what should we do to get this bug fixed? Report in to NM developers?

Steve Dodd (anarchetic) wrote :

Not sure it's an upstream bug - if you read back, there used to be
Debian-specific code wrapped in an #ifdef, which depends on a
preprocessor symbol that is no longer defined .. if you've got time you
could see if the seem problem ever existed in Debian, and if so, if/how
they fixed it ..

S.

On Tue, Mar 11, 2014 at 08:11:34AM -0000, Ivan Larionov wrote:
> So what should we do to get this bug fixed? Report in to NM developers?
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1169614
>
> Title:
> /etc/init.d/umountroot: "mount: / is busy" due to dhclient is not
> stopped
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/ubuntu/+source/network-manager/+bug/1169614/+subscriptions

Ivan Larionov (xeron-oskom) wrote :

Couldn't find anything like this in debian bugtracker.

Ivan Larionov (xeron-oskom) wrote :

BTW it stops dhclient if I do disconnect from gnome's network-manager menu.

Steve Dodd (anarchetic) wrote :

Interesting .. should be possible to figure out the different code
paths, next time I have time to have a look...

On Wed, Mar 12, 2014 at 07:17:11PM -0000, Ivan Larionov wrote:
> BTW it stops dhclient if I do disconnect from gnome's network-manager
> menu.
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1169614
>
> Title:
> /etc/init.d/umountroot: "mount: / is busy" due to dhclient is not
> stopped
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/ubuntu/+source/network-manager/+bug/1169614/+subscriptions

Max (m-gorodok) wrote :

The bug was apparent in 13.10, but I do not see it after upgrade to 14.04 (amd64).
I do not see dhclient in lsof output but I have not check with ps.

The bad surprise is that /etc/default/halt is ignored in 14.04.
(Consequence of systemd inclusion?)

Ivan Larionov (xeron-oskom) wrote :

This bug is fixed for me by:

network-manager (0.9.8.8-0ubuntu6) trusty; urgency=medium

  [ Martin Pitt ]
  * Mark autopkgtests with "isolation-machine" as these tests can't run in
    schroot or LXC.

  [ Felix Geyer ]
  * Don't check for TARGET_DEBIAN in patches. Upstream has removed the macro
    so just unconditionally enable the code in the patches. (LP: #1296226)
    This fixes the regression of the following patches:
    - quit_dhclient_on_exit.patch (LP: #963106)
    - lp990011_use_tempaddr_sysctl_default.patch (LP: #990011)
    - add_sendsigs_omissions.patch (LP: #869635)

 -- Christopher James Halse Rogers <email address hidden> Tue, 08 Apr 2014 10:32:10 +1000

Anyone else seeing this issue?

Changed in network-manager (Ubuntu):
status: Triaged → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers