NetworkManager doesn't respond to SIGTERM in daemon mode

Bug #1124803 reported by Daniel Gnoutcheff on 2013-02-14
This bug affects 12 people
Affects                          Status        Importance  Assigned to  Milestone
NetworkManager                   Fix Released  Medium
network-manager (Gentoo Linux)   New           Undecided   Unassigned
network-manager (Ubuntu)         Fix Released  Undecided   Unassigned

Bug Description

NetworkManager does not respond to SIGTERM when started in daemon mode, as upstart does by default. This also happens when running NetworkManager manually from the command line as long as the --no-daemon option is not used.

This was reported upstream and was fixed by commit
  64342a313ef497fca8a4fb7567900d4a1460065f

It appears that this bug forces upstart to SIGKILL NetworkManager during shutdown, which, among other things, means that n-m doesn't deconfigure network interfaces as intended. This bug, along with bug 1124789 (in wpasupplicant), means that no attempt is made to disconnect from a wifi network during shutdown. This in turn triggers a bug in my BIOS which causes this system to hard-lock in the middle of the BIOS POST when I reboot while still connected to a wifi network:
  http://thread.gmane.org/gmane.linux.kernel.wireless.general/102862

ProblemType: Bug
DistroRelease: Ubuntu 12.10
Package: network-manager 0.9.6.0-0ubuntu7
ProcVersionSignature: Ubuntu 3.5.0-24.37-generic 3.5.7.4
Uname: Linux 3.5.0-24-generic x86_64
ApportVersion: 2.6.1-0ubuntu10
Architecture: amd64
Date: Wed Feb 13 22:32:52 2013
IpRoute:
 default via 10.179.201.1 dev eth0 proto static
 10.179.201.0/24 dev eth0 proto kernel scope link src 10.179.201.80 metric 1
 10.179.201.0/24 dev wlan0 proto kernel scope link src 10.179.201.105 metric 9
 169.254.0.0/16 dev eth0 scope link metric 1000
MarkForUpload: True
NetworkManager.state:
 [main]
 NetworkingEnabled=true
 WirelessEnabled=true
 WWANEnabled=true
RfKill:
 0: phy0: Wireless LAN
  Soft blocked: no
  Hard blocked: no
SourcePackage: network-manager
UpgradeStatus: Upgraded to quantal on 2012-10-20 (116 days ago)
WpaSupplicantLog:

mtime.conffile..etc.NetworkManager.NetworkManager.conf: 2012-03-21T22:57:56
nmcli-dev:
 DEVICE TYPE STATE DBUS-PATH
 wlan0 802-11-wireless connected /org/freedesktop/NetworkManager/Devices/1
 eth0 802-3-ethernet connected /org/freedesktop/NetworkManager/Devices/0
nmcli-nm:
 RUNNING VERSION STATE NET-ENABLED WIFI-HARDWARE WIFI WWAN-HARDWARE WWAN
 running 0.9.6.0 connected enabled enabled enabled enabled enabled

Daniel Gnoutcheff (gnoutchd) wrote :
description: updated
summary: - NetworkManager doesn't respond to SIGTERM
+ NetworkManager doesn't respond to SIGTERM in daemon mode
Changed in network-manager:
importance: Unknown → Medium
status: Unknown → Fix Released
tags: added: string-fix
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in network-manager (Ubuntu):
status: New → Confirmed
Max (m-gorodok) wrote :

Please issue an update for 12.10 Quantal (network-manager 0.9.6.0-0ubuntu7).
Due to this bug, the root filesystem cannot be cleanly unmounted on shutdown.

Upstart launches NetworkManager as a daemon.
If a system-wide network connection is established,
dhclient and dnsmasq open files for writing in the /var directory.
During shutdown, upstart sends SIGTERM to NetworkManager,
but the signal is blocked. 5 seconds later, NetworkManager
is killed (SIGKILL) by upstart, but dhclient and dnsmasq are still alive.
Also, the directory /var/run/sendsigs.omit.d/ contains the following files:

network-manager.dhclient-eth0.pid network-manager.dnsmasq.pid

At the very end, /etc/init.d/umountroot cannot unmount the / filesystem
because dhclient has files in /var/lib/dhcp open for writing:

mount: / is busy

The message appears just before poweroff. To make it visible, use the
sudo halt
command.
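The stop sequence described above (SIGTERM, a 5-second wait, then SIGKILL) can be modelled in a few lines of Python. This is a simplified sketch, not upstart's code: `stop_like_upstart` is a hypothetical name, the 5-second default mirrors the delay reported here, and the sketch signals only the single process it is given, so helpers that process spawned (dhclient, dnsmasq) are left running.

```python
import signal
import subprocess


def stop_like_upstart(proc: subprocess.Popen, kill_timeout: float = 5.0) -> str:
    """Stop `proc` with SIGTERM, escalating to SIGKILL after `kill_timeout` seconds.

    Returns the name of the signal that actually ended the process. Only `proc`
    itself is signalled; any children it spawned keep running.
    """
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=kill_timeout)
        return "SIGTERM"
    except subprocess.TimeoutExpired:
        # SIGTERM was blocked or ignored: fall back to SIGKILL,
        # which a process can neither block nor ignore.
        proc.kill()
        proc.wait()
        return "SIGKILL"
```

A process that blocks SIGTERM, as NetworkManager did in daemon mode, only goes away through the SIGKILL branch, and only after the full timeout has elapsed.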

There are a bunch of bugs (e.g. #1061639, #1073433, etc.),
sometimes with mysterious comments and workarounds.
They may be connected with this issue.

I think that filesystem recovery on every boot is a strong reason for
a package update in Quantal.

Daniel J Blueman (watchmaker) wrote :

From the upstream report [1], the patch that resolves this regression has been applied to NetworkManager's master branch, so Ubuntu needs to carry this patch until it is included in an upstream release:

https://bug683932.bugzilla-attachments.gnome.org/attachment.cgi?id=224204

[1] https://bugzilla.gnome.org/show_bug.cgi?id=683932

Dimitri John Ledkov (xnox) wrote :

I believe this bug is fixed in Ubuntu Raring Ringtail with this upload:
https://launchpad.net/ubuntu/+source/network-manager/0.9.8.0-0ubuntu1

This is because commit 64342a313ef497fca8a4fb7567900d4a1460065f is part of the 0.9.8.0 release.

If you need a fix for the bug in previous versions of Ubuntu, please do steps 1 and 2 of the SRU Procedure [1] to bring the need to a developer's attention.

[1]: https://wiki.ubuntu.com/StableReleaseUpdates#Procedure

Changed in network-manager (Ubuntu):
status: Confirmed → Fix Released
Ernie 07 (ernestboyd) wrote :

64-bit D07-1304 3.8.0-15-generic #25-Ubuntu SMP Wed Mar 27 19:19:30 UTC 2013 downloaded via the 2013-04_01 build FAILS!

1. Boot system
2. Enable networking
3. Shutdown system
4. Boot an alternate system: the fsck check throws errors EVERY time.

Disabling networking prior to shutdown results in a proper shutdown with NO fsck errors reported via an alternate system.

These symptoms which have existed since the release of 12.10 are obvious, 100% repeatable and can give Ubuntu developers the reputation that they were trained in Redmond by the Vista team.

Russell Faull (rfaull) wrote :

Ernie, what version of Network-Manager is installed?

Ernie 07 (ernestboyd) wrote :

NetworkManager version = 0.9.8.0

Max (m-gorodok) wrote :

Ernie, could you please test whether NetworkManager stops correctly on SIGTERM (immediately) or on SIGKILL (after a 5-second pause) after
sudo stop network-manager
To start it again:
sudo start network-manager

Another interesting point is the last messages shown on shutdown after
sudo halt

I rebuilt network-manager 0.9.6.0-0ubuntu7 with just one patch that sets up the signal handlers
at the correct moment, just after fork. I do not see any problem with the 3.8.3 i386 kernel.
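Max's fix (setting up signal handling just after fork) and his later mention of a "signal handling thread" point at an ordering problem. Assuming the daemon blocks SIGTERM and collects it in a dedicated thread, starting that thread before the fork() that daemonizes the process loses it, because fork() copies only the calling thread: the child keeps SIGTERM blocked with nothing waiting for it. The toy Python program below (Linux only, hypothetical names, not NetworkManager's code) reproduces that ordering under this assumption.

```python
import os
import select
import signal
import threading
import time


def _signal_thread() -> None:
    # Stand-in for a dedicated signal-handling thread: wait synchronously
    # for SIGTERM, then shut the whole process down.
    signal.sigwait({signal.SIGTERM})
    os._exit(0)


def toy_daemon_stops_on_sigterm(thread_after_fork: bool, wait: float = 2.0) -> bool:
    """Daemonize a toy process, send it SIGTERM, report whether it exited.

    thread_after_fork=False starts the signal thread before fork() (the order
    suspected in this bug); True starts it in the daemon after fork().
    """
    pid_r, pid_w = os.pipe()      # launcher -> caller: the daemon's PID
    alive_r, alive_w = os.pipe()  # caller reads EOF once the daemon is gone
    launcher = os.fork()
    if launcher == 0:
        try:
            os.close(pid_r)
            os.close(alive_r)
            # Block SIGTERM so it can only be collected by sigwait(); new
            # threads and fork() children inherit this mask.
            signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGTERM})
            if not thread_after_fork:
                threading.Thread(target=_signal_thread, daemon=True).start()
            daemon = os.fork()  # the child gets a copy of this thread only
            if daemon == 0:
                os.close(pid_w)
                if thread_after_fork:
                    threading.Thread(target=_signal_thread, daemon=True).start()
                time.sleep(30)  # "main loop"; keeps alive_w open meanwhile
                os._exit(0)
            os.write(pid_w, str(daemon).encode())
        finally:
            os._exit(0)  # the launcher exits, like a daemonizing parent
    os.close(pid_w)
    os.close(alive_w)
    os.waitpid(launcher, 0)
    daemon_pid = int(os.read(pid_r, 32).decode())
    os.close(pid_r)
    os.kill(daemon_pid, signal.SIGTERM)
    stopped = bool(select.select([alive_r], [], [], wait)[0])
    if not stopped:
        os.kill(daemon_pid, signal.SIGKILL)  # the upstart-style fallback
        select.select([alive_r], [], [], wait)
    os.close(alive_r)
    return stopped
```

With thread_after_fork=True the toy daemon exits on SIGTERM; with the suspected buggy order it ignores SIGTERM and has to be SIGKILLed, much as upstart ends up doing to NetworkManager.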

Ernie 07 (ernestboyd) wrote :

Hi Max,

Using 64-bit 3.8.0-16-generic #26-Ubuntu SMP Mon Apr 1 19:52:57 UTC 2013 from the 2013-04_02 daily build:

1. NetworkManager stopped correctly (immediately) via sudo stop network-manager.
2. A reboot to an alternate system and fsck of the system under test presented errors.

Regardless of whether I stopped or restarted NetworkManager, as long as I manually unchecked Enable Networking, a clean shutdown would occur and a subsequent fsck would show no errors.

A process (maybe more than one) is being gracefully shut down when I manually uncheck Enable Networking but is not getting properly shut down via sudo stop network-manager. Hope this data point is helpful.

Max (m-gorodok) wrote :

kernel
3.8.3-030803-generic #201303141650 SMP Thu Mar 14 20:58:52 UTC 2013 i686 i686 i686 GNU/Linux
network-manager:
  Installed: 0.9.8.0-0ubuntu2
and related libnm* and libnl* libraries from raring
on top of 12.10 Quantal

The system shuts down correctly, although I have not tried x86_64.

Ernie, I can only suggest trying
sudo stop network-manager
and then checking
ls -l /run/sendsigs.omit.d/
(there should be no files related to NetworkManager) and
ps axw | grep -ie '\(netw\|dhcl\|dns\)'
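The /run/sendsigs.omit.d check can also be scripted. Since /etc/init.d/sendsigs skips the PIDs listed in that directory, a leftover network-manager.dhclient-*.pid file whose process is still running means that dhclient survives until umountroot. A minimal Python sketch, assuming each file holds the PID as its first token; `omitted_pids` is a hypothetical helper name.

```python
import os
from pathlib import Path


def omitted_pids(omit_dir: str = "/run/sendsigs.omit.d") -> dict:
    """Map each PID file in `omit_dir` to (pid, still_running)."""
    result = {}
    directory = Path(omit_dir)
    if not directory.is_dir():
        return result
    for pid_file in sorted(directory.iterdir()):
        try:
            pid = int(pid_file.read_text().split()[0])
        except (OSError, ValueError, IndexError):
            continue  # unreadable, empty, or not a PID file
        if pid <= 0:
            continue  # 0 or negative would address process groups, not a PID
        try:
            os.kill(pid, 0)  # signal 0 only checks that the PID exists
            alive = True
        except ProcessLookupError:
            alive = False
        except PermissionError:
            alive = True  # the process exists but belongs to another user
        result[pid_file.name] = (pid, alive)
    return result


if __name__ == "__main__":
    for name, (pid, alive) in omitted_pids().items():
        print(f"{name}: pid {pid} {'still running' if alive else 'gone'}")
```

A PID that has since been reused by an unrelated process would also be reported as running, so treat the output as a hint and cross-check it with the ps command.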

Have you tried any workarounds or debug scripts from similar bugs
(/etc/init.d/networking, some new jobs in /etc/init,
scripts to save the list of open files and running processes, etc.)?

Perhaps there is something meaningful in the very last messages.
Try to log in from a tty ([Ctrl+Alt+F1]), run
sudo halt
and press [Ctrl+Alt+F1] again to suppress the Plymouth splash image.

The next step would be raising the upstart log-priority.

Max (m-gorodok) wrote :

I have checked x86_64 Quantal 12.10 with NetworkManager 0.9.8.0 as well.
I do not see any problem with a busy /.

The only minor issue is a 10-second pause during shutdown
due to /etc/network/if-post-down.d/avahi-daemon,
but it is likely caused by the provider and its
subdomain in the .local zone. The root filesystem
is definitely cleanly unmounted.

Personally, I would vote for an update of 0.9.6.0 that includes just
the patch fixing this issue, not the 0.9.8 version from Raring.

Ernie, I should ask whether your problem could be related to some
edited configs or debug scripts. You can try to inspect
running processes and open files (lsof), as mentioned
in the comments on bug LP: #1073433.

Ernie 07 (ernestboyd) wrote :

I strongly believe this problem is associated with termination rather than startup.

I can establish conditions for failure by checking enable networking or prevent failure by unchecking enable networking prior to reboot, restart or shutdown.

Using a separate OS (12.04), fsck can be used to demonstrate the failure of the 13.04 under test 100% of the time.

Max (m-gorodok) wrote :

Ernie, I can confirm that 12.10 with the original network-manager
0.9.6.0 has this problem and that it appears during shutdown.

I do not play with unchecking networking; it is always enabled.

The patch mentioned in the comments fixes the issue for me.
Another option is to install the following Raring packages
on 12.10:
libnl-3-200_3.2.16-0ubuntu1_amd64.deb
libnl-genl-3-200_3.2.16-0ubuntu1_amd64.deb
libnl-route-3-200_3.2.16-0ubuntu1_amd64.deb
libnm-glib4_0.9.8.0-0ubuntu2_amd64.deb
libnm-util2_0.9.8.0-0ubuntu2_amd64.deb
network-manager_0.9.8.0-0ubuntu2_amd64.deb

I have tested i686 and x86_64; with the updated
network-manager, both systems cleanly unmount /.

If you see the problem, please provide
the list of running processes and files open
for writing when the root filesystem is about
to be remounted read-only.
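lsof can produce that list; a rough Linux-only alternative is sketched here in Python, assuming /proc is still mounted at that point of the shutdown. It reads each process's descriptors from /proc/<pid>/fd and their open flags from the "flags:" line of /proc/<pid>/fdinfo; `files_open_for_writing` is a hypothetical helper name, and root is needed to inspect other users' processes.

```python
import os
from typing import Dict, Iterable, List, Optional


def files_open_for_writing(pids: Optional[Iterable[int]] = None) -> Dict[int, List[str]]:
    """Return {pid: [paths]} for filesystem paths each process has open for writing."""
    if pids is None:
        pids = [int(name) for name in os.listdir("/proc") if name.isdigit()]
    result: Dict[int, List[str]] = {}
    for pid in pids:
        try:
            fds = os.listdir(f"/proc/{pid}/fd")
        except OSError:
            continue  # the process exited, or we lack permission
        paths = []
        for fd in fds:
            try:
                target = os.readlink(f"/proc/{pid}/fd/{fd}")
                with open(f"/proc/{pid}/fdinfo/{fd}") as info:
                    # The "flags:" line holds the open(2) flags in octal.
                    flags = next(int(line.split()[1], 8)
                                 for line in info if line.startswith("flags:"))
            except (OSError, StopIteration):
                continue  # the descriptor was closed while we looked
            # O_RDONLY is 0, so either access bit set means write access.
            if target.startswith("/") and flags & (os.O_WRONLY | os.O_RDWR):
                paths.append(target)
        if paths:
            result[pid] = paths
    return result
```

Pipes and sockets are skipped, but device nodes such as a terminal under /dev/pts still show up, so filter the result (for example to paths under /var) as needed.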

Ernie 07 (ernestboyd) wrote :

Max,

I have given up on 12.10 but would like to make use of 13.04.

The Raring daily amd64.iso contains network-manager version 0.9.8.0, and it exhibits the same failure pattern as 0.9.6.0 in 12.10.
The failure is 100% reproducible and will manifest each time the OS is shut down UNLESS enable networking has been unchecked first.

Can you get your fix into the raring daily build?

Russell Faull (rfaull) wrote :

Ernie, have you tried temporarily replacing NM with wicd on 13.04? (See https://help.ubuntu.com/community/WICD)

When I use wicd, there is always a clean shutdown, without any delays. On my 12.10 installation using NM 0.9.6.0, I can get clean shutdowns by unchecking 'available to all users'. With it checked on the current network connection, the shutdown is unclean; this is 100% reproducible.

Max (m-gorodok) wrote :

Russell, have you tested network-manager 0.9.8.0 and the related
libraries from Raring, or just network-manager 0.9.6.0
rebuilt with the following patch?
https://bug683932.bugzilla-attachments.gnome.org/attachment.cgi?id=224204

I expect it to work with the network always enabled
and available to all users.

Russell Faull (rfaull) wrote :

Max, I have not tested either. I am using the latest 12.10 version (0.9.6.0-0ubuntu7).

I would be happy to test further. Is there a binary version available of 0.9.6.0 with the patch you mentioned?

BTW, my machine shuts down in 5-8 seconds using wicd, compared with 20-25 seconds with NM installed. Also, all processes seem to end normally. With NM installed, 'Killing all processes' fails before an otherwise clean shutdown; of course, this is with 'available to all users' unchecked, otherwise the shutdown is unclean.

Max (m-gorodok) wrote :

Ernie, you have run into another bug in NetworkManager.
It is discussed in
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1073433/comments/102
so you should post comments there,
or someone should open a dedicated bug.
I have removed the duplicate link.

In 12.10 Quantal, NetworkManager 0.9.8.0 does not open
a lease file in /var/lib/NetworkManager, so I do not
think the version from Raring is desirable in Quantal.

I would like to see a patched network-manager 0.9.6.0
in 12.10 Quantal.
It stops dhclient on 'stop network-manager'.

To Russell: I built patched network-manager myself.

Francisco Reverbel (reverbel) wrote :

After I replaced network-manager with wicd on 12.10 everything works perfectly. Now shutdown is much faster -- no delays at all. And I got rid of the hacked init scripts that I was using to circumvent the bug.

Thanks for mentioning wicd, Russell.

Ernie 07 (ernestboyd) wrote :

Unfortunately, wicd would not install on amd64.deb for me; a failure was indicated partway through the installation.

Ernie 07 (ernestboyd) wrote :

Forgot to mention that the wicd failure occurred on 13.04 amd64.deb.

Max (m-gorodok) wrote :

NetworkManager 0.9.8.0 from Raring is not a solution.
It uses the /var/lib/NetworkManager/ directory to store .lease
files, which is not allowed by the AppArmor policy from
isc-dhcp-client: /etc/apparmor/init/network-interface-security/sbin.dhclient

If permissions for /var/lib/NetworkManager are granted to dhclient,
dhclient is not stopped after 'stop network-manager'.
This leads to problems during shutdown.
Since network-manager-0.9.8.0 does not clean up
/run/sendsigs.omit.d, dhclient is ignored by /etc/init.d/sendsigs.
When /etc/init.d/umountroot runs, dhclient is still alive
and its lease file is open for writing.

The result is
mount: / is busy
and filesystem recovery during next boot.

If the patch that fixes the signal handling thread
is applied to network-manager-0.9.6.0, then
dhclient is stopped along with network-manager
and the root filesystem can be remounted read-only.
