bond interfaces stop working after restart of systemd-networkd

Bug #1833671 reported by Tom Hughes on 2019-06-21
This bug affects 3 people
Affects            Status         Importance   Assigned to     Milestone
systemd            Unknown        Unknown
systemd (Ubuntu)   Fix Released   Medium       Unassigned
  Bionic           Fix Released   Medium       Dan Streetman

Bug Description

[impact]

restarting systemd-networkd drops carrier on all bond slaves, temporarily interrupting networking over the bond.

[test case]

on a bionic system with 2 interfaces that can be put into a bond, create config files such as:

root@lp1833671:~# cat /etc/systemd/network/10-bond0.netdev
[NetDev]
Name=bond0
Kind=bond

root@lp1833671:~# cat /etc/systemd/network/20-ens8.network
[Match]
Name=ens8

[Network]
Bond=bond0

root@lp1833671:~# cat /etc/systemd/network/20-ens9.network
[Match]
Name=ens9

[Network]
Bond=bond0

root@lp1833671:~# cat /etc/systemd/network/30-bond0.network
[Match]
Name=bond0

[Network]
Address=1.2.3.4/32
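
To double-check which interfaces these files enslave to which bond, the .network files can be read with a few lines of Python. This is only a sketch: bond_members is a hypothetical helper, it reads just the Name= and Bond= keys shown above, and it assumes one Name= per file. It is not how networkd itself parses configuration.

```python
import configparser


def bond_members(network_files):
    """Map each bond name to the interface names whose .network file sets Bond=.

    network_files is a dict of {filename: file contents}.
    """
    members = {}
    for filename in sorted(network_files):
        # strict=False tolerates repeated keys, which systemd files allow.
        parser = configparser.ConfigParser(
            strict=False, interpolation=None, delimiters=("=",)
        )
        parser.optionxform = str  # keep key case (Name, Bond) as written
        parser.read_string(network_files[filename])
        name = parser.get("Match", "Name", fallback=None)
        bond = parser.get("Network", "Bond", fallback=None)
        if name and bond:
            members.setdefault(bond, []).append(name)
    return members
```

With the four files from this test case, only 20-ens8.network and 20-ens9.network carry a Bond= key, so 30-bond0.network is skipped.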

restart networkd, or reboot, and verify the bond is up:

root@lp1833671:~# ip a
3: ens8: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc fq_codel master bond0 state UP group default qlen 1000
    link/ether 42:30:62:cc:36:2b brd ff:ff:ff:ff:ff:ff
4: ens9: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc fq_codel master bond0 state UP group default qlen 1000
    link/ether 42:30:62:cc:36:2b brd ff:ff:ff:ff:ff:ff
5: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 42:30:62:cc:36:2b brd ff:ff:ff:ff:ff:ff
    inet 1.2.3.4/32 scope global bond0
       valid_lft forever preferred_lft forever
    inet6 fe80::4030:62ff:fecc:362b/64 scope link
       valid_lft forever preferred_lft forever
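
For anyone who wants to script this check rather than read the output by eye, here is a rough sketch that parses the text printed by "ip a". The helper names (parse_links, bond_is_up) are made up for illustration, and the check simply assumes that "up" means the flags shown above: MASTER,UP,LOWER_UP on the bond and SLAVE,UP,LOWER_UP plus "master bond0" on each slave.

```python
import re

# First line of each interface stanza, e.g.
# "3: ens8: <BROADCAST,...,LOWER_UP> mtu 1500 ... master bond0 state UP ..."
LINK_RE = re.compile(
    r"^\d+: (?P<name>[^:@]+)(?:@[^:]+)?: <(?P<flags>[^>]*)>(?P<rest>.*)$"
)


def parse_links(ip_output):
    """Return {ifname: {"flags": set of flags, "master": name or None}}."""
    links = {}
    for line in ip_output.splitlines():
        match = LINK_RE.match(line)
        if not match:
            continue  # indented address lines do not start a stanza
        master = re.search(r"\bmaster (\S+)", match.group("rest"))
        links[match.group("name")] = {
            "flags": set(match.group("flags").split(",")),
            "master": master.group(1) if master else None,
        }
    return links


def bond_is_up(ip_output, bond, slaves):
    """True if the bond and every listed slave show the expected flags."""
    links = parse_links(ip_output)
    info = links.get(bond)
    if info is None or not {"MASTER", "UP", "LOWER_UP"} <= info["flags"]:
        return False
    for slave in slaves:
        info = links.get(slave)
        if info is None or info["master"] != bond:
            return False
        if not {"SLAVE", "UP", "LOWER_UP"} <= info["flags"]:
            return False
    return True
```

Feeding it the output above with bond="bond0" and slaves=["ens8", "ens9"] should report the bond as up.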

restart networkd and check /var/log/syslog:

root@lp1833671:~# systemctl restart systemd-networkd
root@lp1833671:~# cat /var/log/syslog
...
Jul 23 21:08:07 lp1833671 systemd-networkd[1805]: ens9: Lost carrier
Jul 23 21:08:07 lp1833671 systemd-networkd[1805]: ens8: Lost carrier
Jul 23 21:08:07 lp1833671 systemd-networkd[1805]: ens9: Gained carrier
Jul 23 21:08:07 lp1833671 systemd-networkd[1805]: ens8: Gained carrier
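
The same check can be automated by scanning the log for carrier events on the bond slaves. A minimal sketch, assuming the message format shown above ("<iface>: Lost carrier" / "<iface>: Gained carrier" logged by systemd-networkd); carrier_events and flapped_slaves are hypothetical helper names.

```python
import re

CARRIER_RE = re.compile(
    r"systemd-networkd\[\d+\]: (?P<iface>\S+): (?P<event>Lost|Gained) carrier$"
)


def carrier_events(log_lines):
    """Return (iface, event) tuples in log order; event is 'Lost' or 'Gained'."""
    events = []
    for line in log_lines:
        match = CARRIER_RE.search(line.rstrip())
        if match:
            events.append((match.group("iface"), match.group("event")))
    return events


def flapped_slaves(log_lines, slaves):
    """Return the subset of slaves that logged a 'Lost carrier' event."""
    return {
        iface
        for iface, event in carrier_events(log_lines)
        if event == "Lost" and iface in slaves
    }
```

On an affected system, the lines above yield both ens8 and ens9; with the fix applied, the expectation is that a restart produces no "Lost carrier" entries for the slaves.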

[regression potential]

this changes how bond slaves are managed, so regressions could affect any configurations using bonding.

[other info]

the patch is already included in Disco (d), and ifupdown manages networking in Xenial (x), so this is needed only for Bionic (b).

[original description]

Running systemd-networkd from systemd 237-3ubuntu10.23 on Ubuntu 18.04.2, I have one machine where, every time systemd-networkd restarts (i.e. every time there is an update to systemd), the bond0 interface stops working.

I see both physical interfaces go soft down and then come back again:

Jun 21 07:28:24 odin.openstreetmap.org systemd[1]: systemd 237 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SEC
Jun 21 07:28:24 odin.openstreetmap.org systemd[1]: Detected architecture x86-64.
Jun 21 07:28:24 odin.openstreetmap.org kernel: bond0: link status down for backup interface eno2, disabling it in 200 ms
Jun 21 07:28:24 odin.openstreetmap.org kernel: bond0: link status down for active interface eno1, disabling it in 200 ms
Jun 21 07:28:24 odin.openstreetmap.org kernel: 8021q: adding VLAN 0 to HW filter on device eno2
Jun 21 07:28:25 odin.openstreetmap.org kernel: 8021q: adding VLAN 0 to HW filter on device eno1
Jun 21 07:28:25 odin.openstreetmap.org kernel: bond0: link status up again after 200 ms for interface eno2
Jun 21 07:28:25 odin.openstreetmap.org kernel: bond0: link status up again after 100 ms for interface eno1

and after that nothing until I stop systemd-networkd, delete the bond interface, and then start systemd-networkd again.
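
For reference, the manual recovery described here (stop networkd, delete the bond, start networkd) could be scripted roughly as follows. This is a sketch of the workaround only, not of the fix: recovery_commands and recover are hypothetical names, the bond is assumed to be called bond0, and the commands need root.

```python
import subprocess


def recovery_commands(bond="bond0"):
    """Commands for the manual workaround, in the order they must run."""
    return [
        ["systemctl", "stop", "systemd-networkd"],
        ["ip", "link", "delete", bond],
        ["systemctl", "start", "systemd-networkd"],
    ]


def recover(bond="bond0", dry_run=True):
    """Print the commands, or run them when dry_run is False (needs root)."""
    for cmd in recovery_commands(bond):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop at the first failure
```

The default is a dry run because deleting the bond takes the network down until networkd recreates it.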

On most machines the cycle seems to take a bit longer and the interfaces reach a hard down state before coming back, and in that case there seems to be no problem.

I think this is likely an instance of this upstream bug:

https://github.com/systemd/systemd/issues/10118

which has a fix here:

https://github.com/systemd/systemd/pull/10465

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in systemd (Ubuntu):
status: New → Confirmed

This bug also affects my company. Please integrate the fix soon.

Dan Streetman (ddstreet) on 2019-07-23
Changed in systemd (Ubuntu):
status: Confirmed → Fix Released
Changed in systemd (Ubuntu Bionic):
assignee: nobody → Dan Streetman (ddstreet)
importance: Undecided → Medium
status: New → In Progress
Dan Streetman (ddstreet) on 2019-07-23
tags: added: ddstreet-next systemd
Dan Streetman (ddstreet) on 2019-07-23
description: updated

Hello Tom, or anyone else affected,

Accepted systemd into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/systemd/237-3ubuntu10.25 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in systemd (Ubuntu Bionic):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-bionic
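
Before reporting a verification result, it helps to confirm that the machine is really running the proposed package. A small sketch, assuming the fixed version is 237-3ubuntu10.25 as stated above and that dpkg-query and dpkg are available; the function names are hypothetical.

```python
import subprocess

FIXED_VERSION = "237-3ubuntu10.25"


def installed_version_command(package="systemd"):
    """dpkg-query invocation that prints only the installed version."""
    return ["dpkg-query", "-W", "-f=${Version}", package]


def compare_command(installed, fixed=FIXED_VERSION):
    """dpkg invocation that exits 0 when installed >= fixed."""
    return ["dpkg", "--compare-versions", installed, "ge", fixed]


def has_fix(package="systemd"):
    """True if the installed package version is at least FIXED_VERSION."""
    result = subprocess.run(
        installed_version_command(package),
        check=True,
        capture_output=True,
        text=True,
    )
    installed = result.stdout.strip()
    return subprocess.run(compare_command(installed)).returncode == 0
```

The version comparison is delegated to dpkg because Debian version strings such as 237-3ubuntu10.25 do not sort correctly as plain text.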
Tom Hughes (tomhughes) wrote :

I've just updated the two machines where I was seeing this to 237-3ubuntu10.25 and in both cases the update was successful and managed to complete without disconnecting the network.

tags: added: verification-done-bionic
removed: verification-needed-bionic
Changed in systemd (Ubuntu):
importance: Undecided → Medium

I can confirm as well, 237-3ubuntu10.25 fixes the issue for me.

Dan Streetman (ddstreet) wrote :

autopkgtest analysis for this upload in bug 1835581

Dan Streetman (ddstreet) on 2019-08-06
tags: removed: ddstreet-next
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package systemd - 237-3ubuntu10.25

---------------
systemd (237-3ubuntu10.25) bionic; urgency=medium

  [ Dan Streetman ]
  * d/p/lp1835581-src-network-networkd-dhcp4.c-set-prefsrc-for-classle.patch:
    - set src address for dhcp 'classless' routes (LP: #1835581)
  * d/p/lp1833671-networkd-keep-bond-slave-up-if-already-attached.patch:
    - keep bond slave up if already attached (LP: #1833671)

  [ Jorge Niedbalski ]
  * d/p/lp1668771-resolved-switch-cache-option-to-a-tri-state-option-s.patch:
    Allows cache=no-negative option to be set, ignoring negative
    answers to be cached (LP: #1668771).

 -- Dan Streetman <email address hidden> Mon, 22 Jul 2019 12:45:02 -0400

Changed in systemd (Ubuntu Bionic):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for systemd has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
