networking.service hangs on shutdown -- killing dhclient has no effect any more

Bug #1556175 reported by Martin Pitt on 2016-03-11
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
bind9 (Ubuntu)
High
LaMont Jones
isc-dhcp (Ubuntu)
High
Unassigned

Bug Description

On current xenial, LXD containers take ages to reboot:

lxc launch images:ubuntu/xenial/amd64 x1
lxc exec x1 reboot

"systemctl list-jobs" shows:

117 networking.service stop running

root@x1:~# systemctl status networking
● networking.service - Raise network interfaces
   Loaded: loaded (/lib/systemd/system/networking.service; enabled; vendor preset: enabled)
  Drop-In: /run/systemd/generator/networking.service.d
           └─50-insserv.conf-$network.conf
   Active: deactivating (stop-sigterm) since Fri 2016-03-11 16:03:51 UTC; 21s ago
     Docs: man:interfaces(5)
  Process: 187 ExecStop=/sbin/ifdown -a --read-environment (code=exited, status=0/SUCCESS)
  Process: 55 ExecStart=/sbin/ifup -a --read-environment (code=exited, status=0/SUCCESS)
  Process: 49 ExecStartPre=/bin/sh -c [ "$CONFIGURE_INTERFACES" != "no" ] && [ -n "$(ifquery --r
 Main PID: 55 (code=exited, status=0/SUCCESS)
    Tasks: 1 (limit: 512)
   CGroup: /system.slice/networking.service
           └─146 /sbin/dhclient -1 -v -pf /run/dhclient.eth0.pid -lf /var/lib/dhcp/dhclient.eth0

Mar 11 16:03:52 x1 dhclient[207]: For info, please visit https://www.isc.org/software/dhcp/
Mar 11 16:03:52 x1 dhclient[207]:
Mar 11 16:03:52 x1 dhclient[207]: Listening on LPF/eth0/00:16:3e:5a:f5:97
Mar 11 16:03:52 x1 ifdown[187]: Listening on LPF/eth0/00:16:3e:5a:f5:97
Mar 11 16:03:52 x1 ifdown[187]: Sending on LPF/eth0/00:16:3e:5a:f5:97
Mar 11 16:03:52 x1 ifdown[187]: Sending on Socket/fallback
Mar 11 16:03:52 x1 dhclient[207]: Sending on LPF/eth0/00:16:3e:5a:f5:97
Mar 11 16:03:52 x1 dhclient[207]: Sending on Socket/fallback
Mar 11 16:03:52 x1 dhclient[207]: DHCPRELEASE on eth0 to 10.0.3.1 port 67 (xid=0x6480ef2b)
Mar 11 16:03:52 x1 ifdown[187]: DHCPRELEASE on eth0 to 10.0.3.1 port 67 (xid=0x6480ef2b)

The above is also the last journal line that I get for the usual 90s timeout, then the container finally reboots.

Not many processes are actually running any more:

root 1 0.0 0.0 37076 5304 ? Ss 16:03 0:00 /sbin/init
root 38 0.0 0.0 41572 3316 ? Ss 16:03 0:00 /lib/systemd/systemd-udevd
root 41 0.0 0.0 35276 5668 ? Ss 16:03 0:00 /lib/systemd/systemd-journald
root 146 0.0 0.0 181376 2664 ? Ss 16:03 0:00 /sbin/dhclient -1 -v -pf /run/dhclient.eth0.pid -lf /var/lib/dhcp/dhclient.eth0.leases -I -df /var/lib/dhcp/dhclient6.eth0.leases eth0

but apparently the dhclient instance refuses to die (note that ifdown -a itself is not running any more).

Martin Pitt (pitti) wrote :

This is actually unrelated to rebooting -- "ifdown -a" returns immediately and leaves eth0 DOWN, but does not kill dhclient despite saying "Killed old client process".

Changed in ifupdown (Ubuntu):
assignee: nobody → Martin Pitt (pitti)
status: New → Triaged
importance: Undecided → High
milestone: none → ubuntu-16.03
summary: - networking.service hangs on shutdown in LXD
+ networking.service hangs on shutdown in LXD / ifdown fails to kill
+ dhclient
summary: - networking.service hangs on shutdown in LXD / ifdown fails to kill
- dhclient
+ networking.service hangs on shutdown -- ifdown fails to kill dhclient

strace shows that ifdown does read the pid and SIGTERMs dhclient correctly:

201 open("/run/dhclient.eth0.pid", O_RDONLY) = 7
201 read(7, "146\n", 4096) = 4
201 kill(146, SIGTERM) = 0

But after that pid 146 (dhclient) is still running happily. "kill 146" also does not help (just kill -9 of course). This is apparently the real regression, in wily and before you can TERM dhclient.

Martin Pitt (pitti) on 2016-03-11
affects: ifupdown (Ubuntu) → isc-dhcp (Ubuntu)
Changed in isc-dhcp (Ubuntu):
assignee: Martin Pitt (pitti) → nobody
tags: added: regression-release xenial
Changed in isc-dhcp (Ubuntu):
assignee: nobody → Martin Pitt (pitti)
summary: - networking.service hangs on shutdown -- ifdown fails to kill dhclient
+ networking.service hangs on shutdown -- killing dhclient has no effect
+ any more
Martin Pitt (pitti) wrote :

Indeed dhclient now blocks SIGTERM -- after sending it, it gets

ShdPnd: 0000000000004000
SigBlk: 0000000000004003

Downgrading to 4.3.3-5ubuntu1 fixes the bug, so the regression is not in the new upstream version nor in Debian, but in one of our recent Ubuntu changes.

Martin Pitt (pitti) wrote :

Further bisection shows that 4.3.3-5ubuntu5 is the last version that works, and 4.3.3-5ubuntu7 is the first version that fails (ubuntu6 cannot be tested as that FTBFSed).

The corresponding diff is http://launchpadlibrarian.net/242313487/isc-dhcp_4.3.3-5ubuntu5_4.3.3-5ubuntu7.diff.gz -- the change was that this now links to the non-export bind libraries. There is no rationale/explanation of that, but supposedly the internal libraries handle SIGTERM differently.

Martin Pitt (pitti) wrote :

Lamont apparently knows what's going on. He'll restore the -export versions soon, so that we can switch isc-dhcp back to those.

Changed in isc-dhcp (Ubuntu):
assignee: Martin Pitt (pitti) → nobody
Changed in bind9 (Ubuntu):
assignee: nobody → LaMont Jones (lamont)
importance: Undecided → High
status: New → Triaged
LaMont Jones (lamont) on 2016-03-16
Changed in bind9 (Ubuntu):
status: Triaged → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package isc-dhcp - 4.3.3-5ubuntu10

---------------
isc-dhcp (4.3.3-5ubuntu10) xenial; urgency=low

  [ Stefan Bader ]
  * Revert back to bind against bind9 export libraries
    (LP: #1556175, LP: #1551351)

  [ LaMont Jones ]
  * libirs-export requires libisccfg-export now

 -- Stefan Bader <email address hidden> Thu, 17 Mar 2016 17:45:53 +0100

Changed in isc-dhcp (Ubuntu):
status: Triaged → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package bind9 - 1:9.10.3.dfsg.P4-3

---------------
bind9 (1:9.10.3.dfsg.P4-3) experimental; urgency=medium

  * Fix vcs links
  * build in debian/tmp, use bind9.install

 -- LaMont Jones <email address hidden> Fri, 18 Mar 2016 14:46:30 -0600

Changed in bind9 (Ubuntu):
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers