systemctl stop networking hang / timeout

Bug #1551415 reported by Scott Moser on 2016-02-29
This bug affects 4 people
Affects: ifupdown (Ubuntu)
Importance: High
Assigned to: Unassigned

Bug Description

I noticed today on lxd that the latest xenial image fails to bring down networking:
$ cat /etc/cloud/build.info
  build_name: server
  serial: 20160227-141431

to reproduce:

$ lxd-images import ubuntu --alias xenial xenial --stream=daily
$ lxc launch xenial xtest
# wait a bit
$ lxc exec xtest /bin/bash

% systemctl stop networking
... long time passes ....

% systemctl status networking
● networking.service - Raise network interfaces
   Loaded: loaded (/lib/systemd/system/networking.service; enabled; vendor preset: enabled)
  Drop-In: /run/systemd/generator/networking.service.d
           └─50-insserv.conf-$network.conf
   Active: failed (Result: timeout) since Mon 2016-02-29 20:28:14 UTC; 5s ago
     Docs: man:interfaces(5)
  Process: 1241 ExecStop=/sbin/ifdown -a --read-environment (code=exited, status=0/SUCCESS)
  Process: 951 ExecStart=/sbin/ifup -a --read-environment (code=exited, status=0/SUCCESS)
  Process: 948 ExecStartPre=/bin/sh -c [ "$CONFIGURE_INTERFACES" != "no" ] && [ -n "$(ifquery --read-environme
 Main PID: 951 (code=exited, status=0/SUCCESS)

Feb 29 20:26:35 xtest dhclient[992]: DHCPOFFER of 10.0.3.238 from 10.0.3.1
Feb 29 20:26:35 xtest dhclient[992]: DHCPACK of 10.0.3.238 from 10.0.3.1
Feb 29 20:26:35 xtest ifup[951]: DHCPACK of 10.0.3.238 from 10.0.3.1
Feb 29 20:26:35 xtest ifup[951]: bound to 10.0.3.238 -- renewal in 1669 seconds.
Feb 29 20:26:35 xtest systemd[1]: Started Raise network interfaces.
Feb 29 20:26:44 xtest systemd[1]: Stopping Raise network interfaces...
Feb 29 20:28:14 xtest systemd[1]: networking.service: State 'stop-sigterm' timed out. Killing.
Feb 29 20:28:14 xtest systemd[1]: Stopped Raise network interfaces.
Feb 29 20:28:14 xtest systemd[1]: networking.service: Unit entered failed state.
Feb 29 20:28:14 xtest systemd[1]: networking.service: Failed with result 'timeout'.
root@xtest:~# ubuntu-bug /lib/systemd/system/networking.service
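The journal lines above show the stop starting at 20:26:44 and systemd giving up at 20:28:14. The small sketch below (bash assumed; `secs_between` is a hypothetical helper) does that arithmetic: the gap is 90 seconds, which matches systemd's default stop timeout (DefaultTimeoutStopSec=90s). Since the status output reports ExecStop= as exited with status 0 and the journal says `State 'stop-sigterm' timed out`, the wait was most likely spent on processes left behind in the unit after ifdown (dhclient being the obvious candidate) not exiting on SIGTERM.

```bash
#!/usr/bin/env bash
# secs_between START END: seconds between two HH:MM:SS times on the same day
secs_between() {
  local ah am as bh bm bs
  IFS=: read -r ah am as <<< "$1"
  IFS=: read -r bh bm bs <<< "$2"
  # 10# forces base 10, so fields such as "08" are not read as octal
  echo $(( (10#$bh * 3600 + 10#$bm * 60 + 10#$bs) - (10#$ah * 3600 + 10#$am * 60 + 10#$as) ))
}

# "Stopping Raise network interfaces..." vs. "State 'stop-sigterm' timed out"
secs_between 20:26:44 20:28:14   # → 90
```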

Related bugs:
 * bug 1551351: dhclient does not renew leases

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: ifupdown 0.8.10ubuntu1
ProcVersionSignature: Ubuntu 4.4.0-8.23-generic 4.4.2
Uname: Linux 4.4.0-8-generic x86_64
ApportVersion: 2.20-0ubuntu3
Architecture: amd64
Date: Mon Feb 29 20:28:28 2016
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
SourcePackage: ifupdown
UpgradeStatus: No upgrade log present (probably fresh install)

Scott Moser (smoser) wrote :
description: updated
Scott Moser (smoser) wrote :

It also reproduces after that with simply:
% systemctl start networking
% systemctl stop networking
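To make this reproduction measurable, here is a minimal sketch, assuming bash, root, and systemd; `repro_stop_timing` and its 10-second default threshold are hypothetical choices, not part of any Ubuntu tooling. It runs the same start/stop pair, reports how long the stop took, and returns non-zero when the stop is slower than the threshold.

```bash
#!/usr/bin/env bash
# Sketch: automate "systemctl start networking; systemctl stop networking"
# and time the stop. repro_stop_timing and the 10 s default are hypothetical.
repro_stop_timing() {
  local unit=${1:-networking} threshold=${2:-10}
  local start elapsed
  systemctl start "$unit" || return 2   # cannot reproduce without a started unit
  start=$(date +%s)
  systemctl stop "$unit"
  elapsed=$(( $(date +%s) - start ))
  echo "stop of $unit took ${elapsed}s"
  # Succeed only if the stop finished faster than the threshold
  [ "$elapsed" -lt "$threshold" ]
}
```

Running `repro_stop_timing networking 10` inside the container should return 1 on an affected image, where the stop lasts until systemd's stop timeout.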

Scott Moser (smoser) wrote :

This does not seem to be lxc-specific. I reproduced it on a cloud image also.

Stefan Bader (smb) wrote :

Maybe that is actually related to the dhclient problem. I wonder: if you kill dhclient manually (kill -11) before the ifdown, does that get rid of the long delay?

Scott Moser (smoser) on 2016-03-01
description: updated
Changed in ifupdown (Ubuntu):
status: New → Confirmed
importance: Undecided → High
Scott Moser (smoser) wrote :

Stefan, yeah. kill -11 "fixes" it:
root@test-x1:~# systemctl start networking
root@test-x1:~# pidof dhclient
1332
root@test-x1:~# kill -11 $(pidof dhclient)
root@test-x1:~# time systemctl stop networking

real 0m0.212s
user 0m0.008s
sys 0m0.000s
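The same steps can be wrapped in one function for repeated testing. This is a minimal sketch, assuming bash, root, and the `pidof` and `systemctl` commands used in the session above; `kill_dhclient_then_stop` is a hypothetical name. The signal is a parameter so that 11 (SIGSEGV on Linux, as used above) can be compared with a plain TERM.

```bash
#!/usr/bin/env bash
# Sketch: signal every dhclient first, then stop networking.service.
# kill_dhclient_then_stop is a hypothetical helper name.
kill_dhclient_then_stop() {
  local sig=${1:-TERM} pids
  # pidof prints space-separated PIDs and exits non-zero if there are none
  pids=$(pidof dhclient) || pids=""
  if [ -n "$pids" ]; then
    # $pids is left unquoted on purpose: one kill argument per PID
    kill -s "$sig" $pids
  fi
  systemctl stop networking
}
```

`kill_dhclient_then_stop 11` repeats the session above; `kill_dhclient_then_stop TERM` checks whether an ordinary SIGTERM is enough to avoid the delay.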

Doug McMahon (mc3man) wrote :

I have no idea whether this is related, but recently, on plain Ubuntu Desktop installs of 16.04, restarts have slowed down considerably.
For me they are about 4-5 times slower: normally 2-3 sec., now 10-15 sec. Others seem to be affected, with different numbers.
If I disconnect from the network (Ethernet) before restarting, the expected 2-3 sec. returns.

Stefan Bader (smb) wrote :

That sounds like it is related. While debugging further I found that network-manager starts dhclient in foreground mode (-d), which is not affected by bug 1551351. But regardless of how it was started, dhclient seems not to react to SIGHUP. Maybe disconnecting from the network manually uses a more aggressive method of stopping dhclient than a plain shutdown or reboot does. The system probably first sends SIGHUP to running processes, waits a bit, and finally sends SIGTERM. That would explain the increased time.
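The journal in the original report (`State 'stop-sigterm' timed out. Killing.`) shows the sequence systemd itself uses on stop: SIGTERM to the processes left in the unit, a wait of up to the stop timeout, then SIGKILL. Below is a minimal sketch of that escalation for a single PID, assuming bash; `stop_with_escalation` is hypothetical, not a systemd interface. The first signal is a parameter so the SIGHUP theory above can be tried against one dhclient process.

```bash
#!/usr/bin/env bash
# stop_with_escalation FIRST_SIGNAL TIMEOUT_SECONDS PID   (hypothetical helper)
# Returns 0 if PID exited after FIRST_SIGNAL, 1 if SIGKILL was needed,
# 2 if PID could not be signalled at all (e.g. it no longer exists).
stop_with_escalation() {
  local sig=$1 timeout=$2 pid=$3 waited=0
  kill -s "$sig" "$pid" 2>/dev/null || return 2
  # kill -0 sends no signal; it only checks that the PID still exists
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge $((timeout * 10)) ]; then
      kill -s KILL "$pid" 2>/dev/null || true
      return 1
    fi
    sleep 0.1
    waited=$((waited + 1))
  done
  return 0
}
```

For example, `stop_with_escalation HUP 5 <PID of dhclient>` returning 1 would mean dhclient survived SIGHUP for five seconds and had to be killed.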

costinel (costinel) wrote :

How do I debug this?

If I manually run 'ifup -a --read-environment' or 'systemctl start networking', everything comes up cleanly.
By 'debug' I mean more than just looking at journalctl -u networking.

Below are the contents of network/interfaces, the output of journalctl -u networking, and the contents of the networking unit.

---------

[Unit]
Description=Raise network interfaces
Documentation=man:interfaces(5)
DefaultDependencies=no
Wants=network.target
After=local-fs.target network-pre.target apparmor.service systemd-sysctl.service systemd-modules-load.service
Before=network.target shutdown.target network-online.target
Conflicts=shutdown.target

[Install]
WantedBy=multi-user.target
WantedBy=network-online.target

[Service]
Type=oneshot
EnvironmentFile=-/etc/default/networking
ExecStartPre=-/bin/sh -c '[ "$CONFIGURE_INTERFACES" != "no" ] && [ -n "$(ifquery --read-environment --list --exclude=lo)" ] && udevadm settle'
ExecStart=/sbin/ifup -a --read-environment
ExecStop=/sbin/ifdown -a --read-environment --exclude=lo
RemainAfterExit=true
# modified from default, which just made the machine hang on boot until timeout
TimeoutStartSec=3sec
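In systemd unit files only lines that begin with `#` or `;` are comments; text after a value, including a trailing `#...`, stays part of the value, so a time setting written that way is likely rejected rather than applied. Below is a rough heuristic check, assuming GNU grep; `find_trailing_comments` is a hypothetical helper, and it can also flag a legitimate ` #` inside an Exec= command line.

```bash
#!/usr/bin/env bash
# find_trailing_comments FILE: print "lineno:line" for Key=value lines that
# contain whitespace followed by '#'. Full-line comments are skipped because
# the first non-blank character must not be '#' or ';'.
find_trailing_comments() {
  grep -nE '^[[:space:]]*[^#;[:space:]][^=]*=.*[[:space:]]#' "$1"
}
```

`systemd-analyze verify` on the unit file is another way to surface values systemd cannot parse.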

------

# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

source /etc/network/interfaces.d/*

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
#allow-hotplug eno1
#allow-hotplug ipsec0
auto eno1
auto ipsec0

iface eno1 inet manual
        up ip link set eno1 up

iface ipsec0 inet static
        bridge_ports eno1
        address 192.168.15.113
        netmask 255.255.255.0
        gateway 192.168.15.254
        # dns-* options are implemented by the resolvconf package, if installed
        dns-nameservers 8.8.8.8
        #link-speed 1000
        #link-duplex full
        #ethernet-autoneg off
        #dns-search project.local
        #up dnsmasq -s lan -S //192.168.15.113 -u root --strict-order --pid-file=/run/dnsmasq-lan.pid --dhcp-no-override --except-interface=lo --interface=ipsec0 --dhcp-leasefile=/var/lib/dnsmasq.lan.leases --dhcp-authoritative --listen-address 192.168.15.113 --dhcp-range 192.168.15.101,192.168.15.199 --dhcp-lease-max=252 --bind-interfaces --dhcp-option-force=option:router,192.168.15.254 --dhcp-option-force=option:ntp-server,192.168.15.113
        up dnsmasq -s project -S //192.168.15.113 -u root --strict-order --pid-file=/run/dnsmasq-project.pid --dhcp-no-override --except-interface=lo --interface=ipsec0 --dhcp-leasefile=/var/lib/dnsmasq.project.leases --dhcp-authoritative --listen-address 192.168.15.113 --dhcp-range 192.168.15.101,192.168.15.199 --dhcp-lease-max=252 --bind-interfaces --dhcp-option-force=option:router,192.168.15.254 --dhcp-option-force=option:ntp-server,192.168.15.113
        down kill $(cat /run/dnsmasq-project.pid)

#allow-hotplug eno2
auto eno2
iface eno2 inet static
        address 192.168.100.100
        netmask 255.255.255.0
        #link-speed 100
        #link-duplex full
        #ethernet-autoneg off
        #gateway
        # dns-* options are implemented by the resolvconf package, if installed
        #dns-nameservers 10.10.200.10
        #dns-search proj...
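One way to go beyond journalctl for the debugging question above is to re-run the unit's ifupdown command by hand with verbose output and a time limit, so the last command ifupdown prints points at the step that hangs (for example an `up` or `down` hook). This is a minimal sketch, assuming bash, coreutils `timeout`, and ifupdown's `-v` (verbose) and `--exclude` options; `debug_ifupdown` and `IFUPDOWN_DIR` are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch: run ifup/ifdown by hand, verbosely, under a time limit.
# debug_ifupdown and IFUPDOWN_DIR are hypothetical names.
IFUPDOWN_DIR=${IFUPDOWN_DIR:-/sbin}

debug_ifupdown() {
  local action=$1 limit=${2:-30} rc=0
  case "$action" in
    up|down) ;;
    *) echo "usage: debug_ifupdown up|down [seconds]" >&2; return 64 ;;
  esac
  # Arguments follow the unit's ExecStop= line (loopback excluded), plus -v
  # so ifupdown prints the commands it runs
  timeout "$limit" "$IFUPDOWN_DIR/if$action" -a --read-environment --exclude=lo -v || rc=$?
  # coreutils timeout exits with 124 when the limit was reached
  if [ "$rc" -eq 124 ]; then
    echo "if$action still running after ${limit}s; check the last printed command" >&2
  fi
  return "$rc"
}
```

Run `debug_ifupdown down 30` while networking is up, or `debug_ifupdown up 30` while it is down; exit status 124 means the time limit was hit.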


costinel (costinel) wrote :
costinel (costinel) wrote :

This is a workaround that works correctly for me:

in rc.local:

(systemctl stop networking.service; systemctl start networking.service; systemctl restart networking.service; true) &

If I put the commands on separate lines, the rc.local unit fails because systemctl start exits non-zero. Note that the second start uses the 'restart' action.
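A variant of the same workaround keeps the stop/start/restart sequence and never fails rc.local, but logs each step's exit status to stderr instead of discarding it with `; true`. This is a minimal sketch, assuming bash or dash (Ubuntu's default rc.local runs under `/bin/sh -e`); `cycle_networking` is a hypothetical helper that also prints how many steps failed.

```bash
#!/usr/bin/env bash
# cycle_networking: stop, start, then restart networking.service, logging each
# step. Hypothetical helper; safe under "set -e" because every step runs
# inside an "if" condition.
cycle_networking() {
  local action rc failures=0
  for action in stop start restart; do
    if systemctl "$action" networking.service; then
      rc=0
    else
      rc=$?
    fi
    echo "networking.service $action exited $rc" >&2
    if [ "$rc" -ne 0 ]; then
      failures=$((failures + 1))
    fi
  done
  echo "$failures"   # number of failed steps
  return 0           # never let a failed step fail rc.local itself
}
```

In rc.local it would be defined and then run in the background as `cycle_networking &`, like the original one-liner.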
