systemctl stop networking hang / timeout

Bug #1551415 reported by Scott Moser
This bug affects 4 people
Affects            Status     Importance  Assigned to  Milestone
ifupdown (Ubuntu)  Confirmed  High        Unassigned

Bug Description

I noticed today on LXD that the latest xenial image fails to bring down networking:
$ cat /etc/cloud/build.info
  build_name: server
  serial: 20160227-141431

To reproduce:

$ lxd-images import ubuntu --alias xenial xenial --stream=daily
$ lxc launch xenial xtest
# wait a bit
$ lxc exec xtest /bin/bash

% systemctl stop networking
... long time passes ....

% systemctl status networking
● networking.service - Raise network interfaces
   Loaded: loaded (/lib/systemd/system/networking.service; enabled; vendor preset: enabled)
  Drop-In: /run/systemd/generator/networking.service.d
           └─50-insserv.conf-$network.conf
   Active: failed (Result: timeout) since Mon 2016-02-29 20:28:14 UTC; 5s ago
     Docs: man:interfaces(5)
  Process: 1241 ExecStop=/sbin/ifdown -a --read-environment (code=exited, status=0/SUCCESS)
  Process: 951 ExecStart=/sbin/ifup -a --read-environment (code=exited, status=0/SUCCESS)
  Process: 948 ExecStartPre=/bin/sh -c [ "$CONFIGURE_INTERFACES" != "no" ] && [ -n "$(ifquery --read-environme
 Main PID: 951 (code=exited, status=0/SUCCESS)

Feb 29 20:26:35 xtest dhclient[992]: DHCPOFFER of 10.0.3.238 from 10.0.3.1
Feb 29 20:26:35 xtest dhclient[992]: DHCPACK of 10.0.3.238 from 10.0.3.1
Feb 29 20:26:35 xtest ifup[951]: DHCPACK of 10.0.3.238 from 10.0.3.1
Feb 29 20:26:35 xtest ifup[951]: bound to 10.0.3.238 -- renewal in 1669 seconds.
Feb 29 20:26:35 xtest systemd[1]: Started Raise network interfaces.
Feb 29 20:26:44 xtest systemd[1]: Stopping Raise network interfaces...
Feb 29 20:28:14 xtest systemd[1]: networking.service: State 'stop-sigterm' timed out. Killing.
Feb 29 20:28:14 xtest systemd[1]: Stopped Raise network interfaces.
Feb 29 20:28:14 xtest systemd[1]: networking.service: Unit entered failed state.
Feb 29 20:28:14 xtest systemd[1]: networking.service: Failed with result 'timeout'.
root@xtest:~# ubuntu-bug /lib/systemd/system/networking.service
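The timestamps above put about 90 seconds between "Stopping Raise network interfaces..." (20:26:44) and the stop-sigterm timeout (20:28:14), while ExecStop (ifdown) itself exited with status 0. That pattern is consistent with systemd sending SIGTERM to a process still left in the unit's control group after ExecStop, waiting out the stop timeout (systemd's default is 90 s), and then killing it. Below is a minimal sketch to check the arithmetic. It assumes GNU date, and elapsed_seconds is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: seconds between two journal timestamps (assumes GNU date for -d).
# elapsed_seconds is a hypothetical helper, not part of systemd.
elapsed_seconds() {
    start=$(date -u -d "$1" +%s) || return 1
    end=$(date -u -d "$2" +%s) || return 1
    echo $((end - start))
}

# "Stopping Raise network interfaces..." -> "State 'stop-sigterm' timed out. Killing."
elapsed_seconds '2016-02-29 20:26:44' '2016-02-29 20:28:14'   # prints 90

# Stop timeout actually in effect for the unit, on an affected machine:
#   systemctl show -p TimeoutStopUSec networking.service
```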

Related bugs:
 * bug 1551351: dhclient does not renew leases

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: ifupdown 0.8.10ubuntu1
ProcVersionSignature: Ubuntu 4.4.0-8.23-generic 4.4.2
Uname: Linux 4.4.0-8-generic x86_64
ApportVersion: 2.20-0ubuntu3
Architecture: amd64
Date: Mon Feb 29 20:28:28 2016
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
SourcePackage: ifupdown
UpgradeStatus: No upgrade log present (probably fresh install)

Scott Moser (smoser) wrote :
description: updated
Scott Moser (smoser) wrote :

After that, it reproduces again with simply:
% systemctl start networking
% systemctl stop networking

Scott Moser (smoser) wrote :

This does not seem to be LXC-specific. I reproduced it on a cloud image as well.

Stefan Bader (smb) wrote :

Maybe this is actually related to the dhclient problem. I wonder: if you kill dhclient manually (kill -11) before the ifdown, does that get rid of the long delay?

Scott Moser (smoser)
description: updated
Changed in ifupdown (Ubuntu):
status: New → Confirmed
importance: Undecided → High
Scott Moser (smoser) wrote :

Stefan, yeah. kill -11 "fixes" it:
root@test-x1:~# systemctl start networking
root@test-x1:~# pidof dhclient
1332
root@test-x1:~# kill -11 $(pidof dhclient)
root@test-x1:~# time systemctl stop networking

real 0m0.212s
user 0m0.008s
sys 0m0.000s
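kill -11 sends signal 11, SIGSEGV. Unless a process installs a handler for it, the default action terminates the process immediately. Presumably, then, no dhclient is left over for systemd to wait on when the stop runs. The sketch below shows that default action on a throwaway sleep process standing in for dhclient. exit_signal and demo_kill_11 are hypothetical helper names. The "128 + signal number" exit status is the shell's convention for a child killed by a signal:

```shell
#!/bin/sh
# Sketch: what "kill -11" (SIGSEGV) does to a process that installs no handler for it.
# A throwaway "sleep" stands in for dhclient; this illustrates the default
# signal action only, not dhclient's internals.

# Print the signal name behind a shell exit status (>128 means "killed by a signal").
exit_signal() {
    if [ "$1" -gt 128 ]; then
        kill -l $(($1 - 128))
    else
        echo none
    fi
}

# Start a stand-in process (core dumps disabled), send it signal 11, report how it ended.
demo_kill_11() {
    ( ulimit -c 0 2>/dev/null; exec sleep 100 ) &
    pid=$!
    kill -11 "$pid"
    wait "$pid"
    exit_signal $?
}
```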

Doug McMahon (mc3man) wrote :

I have no idea whether this is related, but just recently, on plain Ubuntu Desktop installs of 16.04, restarts have slowed down considerably.
For me they are about 4-5 times slower: normally 2-3 seconds, now 10-15 seconds. Others seem affected too, with different numbers.
If I disconnect from the network (ethernet) before restarting, the expected 2-3 seconds returns.

Stefan Bader (smb) wrote :

That sounds like it is related. While debugging further, I found that network-manager starts dhclient in foreground mode (-d), which is not affected by bug 1551351. But regardless of how it was started, dhclient seems not to react to SIGHUP. Maybe disconnecting from the network manually uses a more aggressive method of stopping dhclient than a plain shutdown or reboot does. The system probably first tries sending SIGHUP to running processes, then waits a bit, and finally sends SIGTERM. That would explain the increased time.
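One way to check on a live system whether dhclient reacts to a particular signal is to send the signal and poll for a few seconds. This works for SIGHUP, or for SIGTERM, which is what systemd sends by default when stopping a unit. The minimal sketch below assumes Linux, because /proc is read so that an unreaped zombie is not counted as alive. It also assumes root privileges for a root-owned dhclient. exits_on_signal and is_alive are hypothetical helper names:

```shell
#!/bin/sh
# Sketch: does a process exit after a given signal? (Linux; uses /proc to spot zombies.)
# exits_on_signal and is_alive are hypothetical helper names, not part of any package.

# True if PID exists and is not a zombie (a zombie still answers kill -0).
# Without permission to signal PID (e.g. non-root vs. root-owned dhclient), reports dead.
is_alive() {
    kill -0 "$1" 2>/dev/null || return 1
    if [ -r "/proc/$1/stat" ]; then
        # The field after "(comm) " is the state letter; Z = exited but not yet reaped.
        state=$(sed 's/^.*) //' "/proc/$1/stat" | cut -d' ' -f1)
        [ "$state" != "Z" ] || return 1
    fi
    return 0
}

# exits_on_signal SIG PID [TIMEOUT]
# Returns 0 if PID is gone within TIMEOUT seconds (default 5),
# 1 if it is still running, 2 if the signal could not be sent.
exits_on_signal() {
    sig=$1
    pid=$2
    timeout=${3:-5}
    kill -s "$sig" "$pid" 2>/dev/null || return 2
    i=0
    while [ "$i" -lt "$timeout" ]; do
        sleep 1
        is_alive "$pid" || return 0
        i=$((i + 1))
    done
    return 1
}
```

For example, with a single dhclient running: exits_on_signal HUP "$(pidof dhclient)" 5 && echo exited || echo "still running". If the signal does work, dhclient will be gone afterwards, so this is best tried on a test machine.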

costinel (costinel) wrote :

How do I debug this?

If I manually run 'ifup -a --read-environment' or 'systemctl start networking', everything comes up cleanly.
By 'debug' I mean more than just looking at journalctl -u networking.

Below are the contents of network/interfaces, the output of journalctl -u networking, and the contents of the networking unit.

---------

[Unit]
Description=Raise network interfaces
Documentation=man:interfaces(5)
DefaultDependencies=no
Wants=network.target
After=local-fs.target network-pre.target apparmor.service systemd-sysctl.service systemd-modules-load.service
Before=network.target shutdown.target network-online.target
Conflicts=shutdown.target

[Install]
WantedBy=multi-user.target
WantedBy=network-online.target

[Service]
Type=oneshot
EnvironmentFile=-/etc/default/networking
ExecStartPre=-/bin/sh -c '[ "$CONFIGURE_INTERFACES" != "no" ] && [ -n "$(ifquery --read-environment --list --exclude=lo)" ] && udevadm settle'
ExecStart=/sbin/ifup -a --read-environment
ExecStop=/sbin/ifdown -a --read-environment --exclude=lo
RemainAfterExit=true
TimeoutStartSec=3sec #modified from default which just made the machine hang on boot until timeout
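One detail in the unit above: systemd treats only lines that begin with '#' or ';' as comments. The trailing "#modified from default ..." on the TimeoutStartSec line therefore most likely becomes part of the value and fails to parse, so the setting is ignored. The minimal sketch below restates the same 3-second start timeout as a drop-in, with the comment on its own line. The ROOT override, the function name, and the file name timeout.conf are assumptions for illustration:

```shell
#!/bin/sh
# Sketch: restate the 3-second start timeout as a drop-in, with the comment on its own line.
# systemd only treats lines *starting* with '#' or ';' as comments.
# ROOT, write_networking_dropin and timeout.conf are hypothetical names;
# ROOT lets this be tried in a scratch directory first.
ROOT=${ROOT:-/etc/systemd/system}

write_networking_dropin() {
    dir="$ROOT/networking.service.d"
    mkdir -p "$dir" || return 1
    cat > "$dir/timeout.conf" <<'EOF'
[Service]
# Modified from the default, which just made the machine hang on boot until timeout.
TimeoutStartSec=3sec
EOF
}

# On the real system, afterwards:
#   systemctl daemon-reload
#   systemctl show -p TimeoutStartUSec networking.service   # effective value
```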

------

# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

source /etc/network/interfaces.d/*

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
#allow-hotplug eno1
#allow-hotplug ipsec0
auto eno1
auto ipsec0

iface eno1 inet manual
        up ip link set eno1 up

iface ipsec0 inet static
        bridge_ports eno1
        address 192.168.15.113
        netmask 255.255.255.0
        gateway 192.168.15.254
        # dns-* options are implemented by the resolvconf package, if installed
        dns-nameservers 8.8.8.8
        #link-speed 1000
        #link-duplex full
        #ethernet-autoneg off
        #dns-search project.local
        #up dnsmasq -s lan -S //192.168.15.113 -u root --strict-order --pid-file=/run/dnsmasq-lan.pid --dhcp-no-override --except-interface=lo --interface=ipsec0 --dhcp-leasefile=/var/lib/dnsmasq.lan.leases --dhcp-authoritative --listen-address 192.168.15.113 --dhcp-range 192.168.15.101,192.168.15.199 --dhcp-lease-max=252 --bind-interfaces --dhcp-option-force=option:router,192.168.15.254 --dhcp-option-force=option:ntp-server,192.168.15.113
        up dnsmasq -s project -S //192.168.15.113 -u root --strict-order --pid-file=/run/dnsmasq-project.pid --dhcp-no-override --except-interface=lo --interface=ipsec0 --dhcp-leasefile=/var/lib/dnsmasq.project.leases --dhcp-authoritative --listen-address 192.168.15.113 --dhcp-range 192.168.15.101,192.168.15.199 --dhcp-lease-max=252 --bind-interfaces --dhcp-option-force=option:router,192.168.15.254 --dhcp-option-force=option:ntp-server,192.168.15.113
        down kill $(cat /run/dnsmasq-project.pid)

#allow-hotplug eno2
auto eno2
iface eno2 inet static
        address 192.168.100.100
        netmask 255.255.255.0
        #link-speed 100
        #link-duplex full
        #ethernet-autoneg off
        #gateway
        # dns-* options are implemented by the resolvconf package, if installed
        #dns-nameservers 10.10.200.10
        #dns-search proj...
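On the question of debugging beyond journalctl -u networking, the minimal sketch below collects a few more views of the unit. It assumes the stock systemd and ifupdown tools, and diag_step and networking_diag are hypothetical names. systemctl status lists any processes still in the unit's CGroup, which is where a leftover dhclient would show up. ifup --no-act prints what ifup would do without doing it.

```shell
#!/bin/sh
# Sketch: gather more than "journalctl -u networking" for networking.service.
# diag_step and networking_diag are hypothetical helper names.

# Run one diagnostic command with a header; skip it if the tool is missing.
diag_step() {
    echo "### $*"
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1 || echo "(exit status $?)"
    else
        echo "(skipped: $1 not found)"
    fi
}

networking_diag() {
    # Unit state plus any processes still in its CGroup (e.g. a leftover dhclient).
    diag_step systemctl status --no-pager networking.service
    # Full journal for the unit since boot.
    diag_step journalctl -b --no-pager -u networking.service
    # Interfaces ifupdown considers configured (similar to the unit's ExecStartPre query).
    diag_step ifquery --read-environment --list
    # What "ifup -a" would run, printed but not executed.
    diag_step ifup --no-act --verbose -a --read-environment
    # dhclient processes still running.
    diag_step pidof dhclient
}

# Usage (as root, on the affected machine):
#   networking_diag > /tmp/networking-diag.txt
```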


costinel (costinel) wrote :
costinel (costinel) wrote :

This is a workaround that runs correctly for me:

In rc.local:

(systemctl stop networking.service; systemctl start networking.service; systemctl restart networking.service; true) &

If I put the commands one by one, the rc.local unit fails because systemctl start exits non-zero. Notice the 'restart' action for the second command.
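The rc.local line above can be spelled out as a function to make its tricks explicit. Joining the commands with ';' means the sequence keeps going even when one step exits non-zero, and the final true forces a zero exit status. The trailing & keeps rc.local from waiting on the sequence. Below is a minimal sketch. bounce_networking and the SYSTEMCTL override are hypothetical, and the override allows a dry run with SYSTEMCTL=echo:

```shell
#!/bin/sh
# Sketch of the rc.local workaround, spelled out.
# SYSTEMCTL is a hypothetical override so the sequence can be dry-run (SYSTEMCTL=echo).
SYSTEMCTL=${SYSTEMCTL:-systemctl}

bounce_networking() {
    # Plain sequencing (not &&): each step runs even if the previous one failed.
    $SYSTEMCTL stop networking.service
    $SYSTEMCTL start networking.service
    $SYSTEMCTL restart networking.service
    true    # force exit status 0 so the caller never sees a failure
}

# In rc.local, run it in a background subshell so rc.local returns immediately:
#   ( bounce_networking ) &
```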
