networking stop incorrectly disconnects from (network) root filesystem

Bug #1629972 reported by LaMont Jones on 2016-10-03
This bug affects 3 people
Affects              Status        Importance  Assigned to  Milestone
MAAS                               Undecided   Unassigned
ifupdown (Debian)    Fix Released  Unknown
ifupdown (Ubuntu)                  High        Scott Moser
  Xenial                           High        Scott Moser
  Yakkety                          Medium      Unassigned

Bug Description

=== Begin SRU Template ===
[Impact]
The systemd networking.service unit will bring down the loopback device (lo)
when it is stopped. This behavior differs from the behavior in other
Ubuntu releases (upstart's networking.conf), where 'lo' is not taken down.

The observed problem was that a system with an iSCSI root filesystem over
IPv6 would hang on shutdown.

[Test Case]
The test is fairly simple and can be demonstrated in an LXC container.
The key point is that the lo device should not have its link set down
after networking.service is stopped. That is, the following command:
  out=$(ip address show dev lo up); [ -n "$out" ] && echo "$out" || echo empty

should not print 'empty'; its output should include LOOPBACK,UP,LOWER_UP.
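The same check can be scripted. This is a minimal sketch that assumes the one-line `ip link show dev lo` format shown in the transcript; the helper name lo_link_state is hypothetical and not part of ifupdown. It prints "up" only when UP appears as its own flag in the <...> list, so LOWER_UP alone does not count.

```shell
#!/bin/sh
# Hypothetical helper (not part of ifupdown): read the output of
# `ip link show dev lo` on stdin and print "up" if the flag list
# between '<' and '>' contains UP as a separate flag, else "down".
lo_link_state() {
    # Take the first line and extract the comma-separated flags.
    flags=$(head -n 1 | sed -n 's/.*<\([^>]*\)>.*/\1/p')
    # Wrap in commas so that LOWER_UP does not match ",UP,".
    case ",$flags," in
        *,UP,*) echo up ;;
        *)      echo down ;;
    esac
}

# Usage on a live system:
#   systemctl stop networking.service
#   ip link show dev lo | lo_link_state
```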

$ release=yakkety; name=y1
$ lxc launch ubuntu-daily:$release $name
$ sleep 10
$ lxc exec $name /bin/bash

## show only things that are up (note output has LOOPBACK,UP,LOWER_UP)
% ip link show dev lo
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

% ip address show dev lo up
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever

## Stop the service and show lo link is down (no 'UP' or 'LOWER_UP').
% systemctl stop networking.service
% ip link show dev lo
1: lo: <LOOPBACK> mtu 65536 qdisc noqueue state DOWN mode DEFAULT group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
% out=$(ip address show dev lo up); [ -n "$out" ] && echo "$out" || echo empty
empty

## Now enable proposed, install update, reboot and show.
% rel=$(lsb_release -sc)
% echo "deb http://archive.ubuntu.com/ubuntu $rel-proposed main" |
    sudo tee /etc/apt/sources.list.d/proposed.list
% sudo apt update -qy && sudo apt install -qy ifupdown </dev/null
% dpkg-query --show ifupdown
ifupdown 0.8.13ubuntu3
% sudo reboot

## in rebooted system
% ip link show dev lo
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
% ip address show dev lo up
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
% systemctl stop networking.service
% ip link show dev lo
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
% out=$(ip address show dev lo up); [ -n "$out" ] && echo "$out" || echo empty
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever

[Regression Potential]
Should be pretty low. The fix is already in zesty and yakkety-proposed.
Taking down 'lo' is often the cause of problems, and never the solution to
problems as far as I'm aware.

[Other Info]

=== End SRU Template ===

With the switch to systemd, all support for iSCSI root (and other network root) filesystems disappeared, since shutdown yanks the rug out from under us.

Rather than just relying on /etc/iscsi/iscsi.initramfs (which d-i creates), the DEV check should be expanded to include iSCSI devices, and the ExecStop= of networking.service should honor those checks.
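One possible shape for an iSCSI-aware device check, as a sketch only. It assumes the udev /dev/disk/by-path naming for iSCSI LUNs (ip-<address>:<port>-iscsi-<target IQN>-lun-<n>) and is not the existing DEV check, whose code is not shown here; is_iscsi_by_path is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a /dev/disk/by-path entry looks like
# an iSCSI LUN, based on the udev naming ip-<addr>:<port>-iscsi-<iqn>-lun-<n>.
is_iscsi_by_path() {
    case "$(basename "$1")" in
        ip-*-iscsi-*) return 0 ;;
        *)            return 1 ;;
    esac
}

# Example: list iSCSI-backed block devices present on this host.
for p in /dev/disk/by-path/*; do
    # Skip the literal glob if the directory is empty or missing.
    [ -e "$p" ] || continue
    if is_iscsi_by_path "$p"; then
        echo "$p -> $(readlink -f "$p")"
    fi
done
```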

Related bugs:
  * bug 1229458: grub2 needed changes
  * bug 1621615: network not configured when ipv6 netbooted into cloud-init
  * bug 1621507: ipv6 network boot does not work

LaMont Jones (lamont) on 2016-10-03
description: updated
Changed in maas:
status: New → Triaged
milestone: none → 2.1.0
LaMont Jones (lamont) wrote :

ifupdown is actually behaving correctly in this case. It's likely that cloud-initramfs-tools should be marking the interface "iface foo inet[6] manual" in every case, since that's the indicator that seems to prevent ifdown from downing the interface.
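A quick way to see whether an interfaces file already declares an interface that way. This sketch assumes a flat interfaces(5) file (it does not follow source or source-directory includes), and the helper name iface_is_manual is hypothetical. It only checks the declaration; it does not by itself show that ifdown will leave the interface alone.

```shell
#!/bin/sh
# Hypothetical helper: succeed if FILE contains a stanza header
# "iface IFACE FAMILY manual" (FAMILY is inet or inet6).
# Only single-line stanza headers are examined; includes are ignored.
iface_is_manual() {
    iface=$1 family=$2 file=$3
    awk -v i="$iface" -v f="$family" '
        $1 == "iface" && $2 == i && $3 == f && $4 == "manual" { found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$file"
}

# Usage:
#   iface_is_manual eth0 inet6 /etc/network/interfaces && echo "eth0 is manual"
```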

When I had it do that, I found that we got as far as "Reached target Shutdown", and the system then hit timeouts in the systemd watchdogs. Looking at it with systemd's debug shell, /media/root-ro (the iSCSI volume) had been unmounted, while /media/root-rw (the overlayfs) was still mounted.

Interestingly, the last few console messages are:

[ OK ] Unmounted /media/root-rw.
[ OK ] Reached target Unmount All Filesystems.
[ OK ] Stopped target Local File Systems (Pre).
         Stopped Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling...
[ OK ] Stopped Remount Root and Kernel File Systems
[ OK ] Stopped Create Static Device Nodes in /dev.
[ OK ] Reached target Shutdown

5 seconds later comes the "connection 1:0: ping timeout of 5 secs expired, ..." message, and / is still (or is that "again"?) mounted on overlayfs rw,relatime,lowerdir=/media/root-ro,upperdir=/media/root-rw//overlay,workdir=/media/root-rw//overlay-workdir.

And then there is the collection of "INFO: task systemd:1 blocked for more than 120 seconds." log entries on the console.

Changed in ifupdown (Ubuntu):
assignee: LaMont Jones (lamont) → nobody
Scott Moser (smoser) on 2016-10-12
Changed in ifupdown (Ubuntu):
status: New → Confirmed
importance: Undecided → High
Changed in ifupdown (Ubuntu Xenial):
status: New → Confirmed
importance: Undecided → High
Scott Moser (smoser) wrote :

I've subscribed pitti in an effort to a) have him review my changes and b) ask whether there is any other way to do this. I've verified in the ifupdown source that 'ifdown -a' will ultimately result in '/bin/ip link set dev eth0 down' being called.

Martin Pitt (pitti) wrote :

+ExecStop=/bin/sh -c 'for f in "$@"; do [ -e "$f" ] || continue; echo "$f existed."; exit 1; done; exit 0' -- /run/initramfs/open-iscsi.interface /run/network/network-root-fs
 ExecStop=/sbin/ifdown -a --read-environment

This will leave networking.service in the "failed" state on stop. *If* you do this hack, then please just put it into the existing ExecStop= line and simplify it:

 ExecStop=/bin/sh -c '[ -e /run/initramfs/open-iscsi.interface ] || [ -e /run/network/network-root-fs ] || ifdown -a --read-environment'
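The intent of that guard, sketched as a shell function. The name skip_if_any_exists is hypothetical; the marker paths in the usage line are the ones from the patch above. It returns success when it skips, which is the point of folding the check into a single ExecStop= line: the stop action still exits 0, so the unit should not be left "failed".

```shell
#!/bin/sh
# Hypothetical helper: skip_if_any_exists FILE... -- COMMAND [ARGS...]
# If any FILE exists, print a note and return 0 without running COMMAND;
# otherwise run COMMAND and return its exit status.
skip_if_any_exists() {
    while [ "$#" -gt 0 ] && [ "$1" != "--" ]; do
        if [ -e "$1" ]; then
            echo "$1 exists; skipping" >&2
            return 0
        fi
        shift
    done
    # Drop the "--" separator before running the command.
    [ "$1" = "--" ] && shift
    "$@"
}

# Usage (the ExecStop= guard, written out):
#   skip_if_any_exists /run/initramfs/open-iscsi.interface \
#       /run/network/network-root-fs -- ifdown -a --read-environment
```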

However, this is still a nasty hack. Why would ifdown shut down the interface in the first place? If you use iSCSI, then /e/n/i should *not* have a "dhcp" stanza (and an "auto enXXX" line) for that interface, but a "manual" one, so that the declaration and the behavior don't fight each other.

I.e.:

 - If ifdown downs a "manual" interface, then let's fix that properly.
 - If /e/n/i still has a "dhcp" declaration, then drop that (I thought that got fixed ages ago already).

Changed in maas:
milestone: 2.1.0 → 2.1.1
Paul Graydon (pgraydon-oracle) wrote :

I ran into this with iSCSI root. Installing cloud-init resulted in problems rebooting.

If I install the ifupdown patch (without cloud-init installed), that triggers the bug for me.

Spinning up a clean instance, installing cloud-init, letting it run (so the conditions that cause the bug are triggered), then installing the ifupdown package also fails to fix the reboot problems. I've confirmed this persists after the next boot as well.

Scott Moser (smoser) wrote :

It seems that the issue here is that networking.service does:
  /sbin/ifdown -a --read-environment
but it should do:
  ifdown -a --exclude=lo

This matches the equivalent upstart job (/etc/init/networking.conf).

$ m="ifdown -a --read-environment"
$ sudo sed -i "s,$m,& --exclude=lo," /lib/systemd/system/networking.service
$ sudo systemctl daemon-reload
$ sudo poweroff
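To check the substitution before touching the installed unit, it can be run against a scratch copy. In this sketch `&` in the sed replacement re-inserts the matched text, so `--exclude=lo` is appended to the ifdown command rather than replacing it. The function name add_exclude_lo and the two-line unit fragment are illustrative assumptions, not the real networking.service.

```shell
#!/bin/sh
# Hypothetical helper: print FILE with "--exclude=lo" appended after the
# ifdown command; "&" stands for the matched text in the replacement.
add_exclude_lo() {
    m="ifdown -a --read-environment"
    sed "s,$m,& --exclude=lo," "$1"
}

# Try it on a scratch copy instead of /lib/systemd/system/networking.service.
tmp=$(mktemp)
printf '%s\n' '[Service]' 'ExecStop=/sbin/ifdown -a --read-environment' > "$tmp"
add_exclude_lo "$tmp" | grep '^ExecStop='
rm -f "$tmp"
```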

Demonstration below.

## http://paste.ubuntu.com/23319833/
## ipv6 networking is broken by
## ip link set down dev lo
##
## This seems to be the root cause of bug 1629972 (http://pad.lv/1629972)
## where system networking (networking.service) is brought down on
## shutdown and causes hang.
##
## Note that it is the link state that is the issue; there is actually
## no problem seen in 'ping6' after either of:
## ip addr del 127.0.0.1/8 dev lo
## ip addr del ::1/128 dev lo

$ for n in y1 y2; do lxc delete --force $n; lxc launch ubuntu-daily:yakkety $n; done
error: not found
Creating y1
Starting y1
error: not found
Creating y2
Starting y2

$ for n in y1 y2; do lxc exec $n -- dhclient -6 -1 -v eth0; done

$ lxc list y[12]
+------+---------+----------------------+------------------------------------------------+------------+-----------+
| NAME | STATE   | IPV4                 | IPV6                                           | TYPE       | SNAPSHOTS |
+------+---------+----------------------+------------------------------------------------+------------+-----------+
| y1   | RUNNING | 10.75.205.102 (eth0) | fd42:eee5:7c43:3d62:c7fd:6ccc:6181:8d48 (eth0) | PERSISTENT | 0         |
+------+---------+----------------------+------------------------------------------------+------------+-----------+
| y2   | RUNNING | 10.75.205.5 (eth0)   | fd42:eee5:7c43:3d62:f285:69dd:7210:c0a5 (eth0) | PERSISTENT | 0         |
+------+---------+----------------------+------------------------------------------------+------------+-----------+

$ for n in y1 y2; do
   lxc exec $n -- ip route add fd42:eee5:7c43:3d62::0/64 dev eth0
  done

$ lxc exec y1 -- ping6 fd42:eee5:7c43:3d62:f285:69dd:7210:c0a5
PING fd42:eee5:7c43:3d62:f285:69dd:7210:c0a5(fd42:eee5:7c43:3d62:f285:69dd:7210:c0a5) 56 data bytes
64 bytes from fd42:eee5:7c43:3d62:f285:69dd:7210:c0a5: icmp_seq=1 ttl=64 time=0.133 ms
64 bytes from fd42:eee5:7c43:3d62:f285:69dd:7210:c0a5: icmp_seq=2 ttl=64 time=0.061 ms
64 bytes from fd42:eee5:7c43:3d62:f285:69dd:7210:c0a5: icmp_seq=3 ttl=64 time=0.069 ms
...

<window 2>
$ lxc exec y1 -- ip link set down dev lo

Immediately, ping6 starts dropping packets in window 1.

Changed in ifupdown (Ubuntu):
assignee: nobody → Scott Moser (smoser)
Changed in ifupdown (Ubuntu Xenial):
assignee: nobody → Scott Moser (smoser)
Martin Pitt (pitti) wrote :

> ifdown -a --exclude=lo

Indeed, nicely spotted! So this would make a difference on machines which still have "lo" in /etc/network/interfaces (which hasn't been necessary/recommended since 2014), but I figure we still have too many places (installer, cloud image builder, etc.) which write that. So adding --exclude=lo is correct for sure (even if it might not completely fix this bug).

Martin Pitt (pitti) wrote :

I don't see a matching Debian bug report, so please forward this fix to Debian as well.

Changed in ifupdown (Debian):
status: Unknown → New
Scott Moser (smoser) wrote :

This is currently in the yakkety-proposed queue at 0.8.13ubuntu3 [1]
and the zesty proposed queue at 0.8.13ubuntu4 [2]

--
[1] https://launchpad.net/ubuntu/yakkety/+queue?queue_state=1&queue_text=ifupdown
[2] https://launchpad.net/ubuntu/zesty/+queue?queue_state=1&queue_text=ifupdown

Changed in ifupdown (Ubuntu Yakkety):
importance: Undecided → Medium
status: New → Confirmed

Hello LaMont, or anyone else affected,

Accepted ifupdown into yakkety-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/ifupdown/0.8.13ubuntu3 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in ifupdown (Ubuntu Yakkety):
status: Confirmed → Fix Committed
tags: added: verification-needed
Changed in ifupdown (Ubuntu):
status: Confirmed → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ifupdown - 0.8.13ubuntu4

---------------
ifupdown (0.8.13ubuntu4) zesty; urgency=medium

  * no-change rebuild for upload to zesty.

ifupdown (0.8.13ubuntu3) yakkety-proposed; urgency=medium

  * networking.service: exclude loopback device lo in ExecStop (LP: #1629972)

 -- Scott Moser <email address hidden> Thu, 20 Oct 2016 12:40:08 -0400

Changed in ifupdown (Ubuntu):
status: Fix Committed → Fix Released
Changed in maas:
milestone: 2.1.1 → 2.1.2
Changed in maas:
milestone: 2.1.2 → 2.1.3
Scott Moser (smoser) wrote :

I've verified that networking.service does not take down 'lo' interface
with the proposed version in yakkety (0.8.13ubuntu3).

Below, first I show the problem in 0.8.13ubuntu2 and then install
the proposed version and show it fixed.

$ lxc launch ubuntu-daily:yakkety y1
$ sleep 10
$ lxc exec y1 /bin/bash

% dpkg-query --show ifupdown
ifupdown 0.8.13ubuntu2

## show only things that are up (note output has LOOPBACK,UP,LOWER_UP)
% ip link show dev lo
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

% ip address show dev lo up
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever

## Stop the service and show lo link is down (no 'UP' or 'LOWER_UP').
% systemctl stop networking.service
% ip link show dev lo
1: lo: <LOOPBACK> mtu 65536 qdisc noqueue state DOWN mode DEFAULT group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
% out=$(ip address show dev lo up); [ -n "$out" ] && echo "$out" || echo empty
empty

## Now enable proposed, install update, reboot and show.
% rel=$(lsb_release -sc)
% echo "deb http://archive.ubuntu.com/ubuntu $rel-proposed main" |
    sudo tee /etc/apt/sources.list.d/proposed.list
% sudo apt update -qy && sudo apt install -qy ifupdown </dev/null
% dpkg-query --show ifupdown
ifupdown 0.8.13ubuntu3
% sudo reboot

## in rebooted system
% ip link show dev lo
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
% ip address show dev lo up
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
% systemctl stop networking.service
% ip link show dev lo
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
% out=$(ip address show dev lo up); [ -n "$out" ] && echo "$out" || echo empty
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever

tags: added: verification-done
removed: verification-needed
Scott Moser (smoser) on 2016-11-30
Changed in ifupdown (Ubuntu Xenial):
status: Confirmed → In Progress
Scott Moser (smoser) wrote :

I've adjusted the description of this bug, including the SRU template, to better fit what we found out and the solution that was applied.

description: updated
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ifupdown - 0.8.13ubuntu3

---------------
ifupdown (0.8.13ubuntu3) yakkety-proposed; urgency=medium

  * networking.service: exclude loopback device lo in ExecStop (LP: #1629972)

 -- Scott Moser <email address hidden> Fri, 14 Oct 2016 13:53:32 -0400

Changed in ifupdown (Ubuntu Yakkety):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for ifupdown has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Robie Basak (racb) wrote :

This is tangled up with the MAAS IPv6 SRU happening in bug 1621507. Please check that bug for verification-done before accepting ifupdown into xenial-updates.

Hello LaMont, or anyone else affected,

Accepted ifupdown into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/ifupdown/0.8.10ubuntu1.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in ifupdown (Ubuntu Xenial):
status: In Progress → Fix Committed
tags: removed: verification-done
tags: added: verification-needed
LaMont Jones (lamont) on 2016-12-19
tags: added: verification-done
removed: verification-needed
LaMont Jones (lamont) wrote :

Verification was done by using MAAS to enlist, commission, and deploy xenial (with the fix) on a machine. Previously, it failed to shut down, because the root disk got unmounted.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ifupdown - 0.8.10ubuntu1.2

---------------
ifupdown (0.8.10ubuntu1.2) xenial; urgency=medium

  * networking.service: exclude loopback device lo in ExecStop (LP: #1629972)
    This prevents the stop of networking.service from taking down the
    loopback 'lo' interface.

 -- Scott Moser <email address hidden> Wed, 30 Nov 2016 12:23:26 -0500

Changed in ifupdown (Ubuntu Xenial):
status: Fix Committed → Fix Released
Gavin Panella (allenap) on 2016-12-20
Changed in maas:
status: Triaged → Confirmed
Changed in ifupdown (Debian):
status: New → Fix Released