229-4ubuntu20 added ARP option breaks existing bonding interfaces

Bug #1727301 reported by Markus Schade
This bug affects 3 people
Affects            Status         Importance  Assigned to          Milestone
systemd            Fix Released   Unknown
systemd (Ubuntu)   Invalid        Critical    Unassigned
  Xenial           Fix Released   Critical    Dimitri John Ledkov
  Zesty            Invalid        Critical    Unassigned
  Artful           Invalid        Critical    Unassigned
  Bionic           Invalid        Critical    Unassigned

Bug Description

[Impact]

 * Setting [Link] MTUBytes= in a .network file has the side effect of overflowing and setting the NOARP flag. This behaviour is unintended and is a regression.

 * Trying to work around this by setting ARP=on fails, because the tristate is handled incorrectly: the NOARP flag is added unconditionally.

 * This is a regression in -updates.
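
The overflow described above can be modelled outside of systemd. The following is a minimal sketch, not the networkd code: it assumes a hypothetical layout in which a 32-bit mtu field is immediately followed by a 32-bit tristate arp field (-1 = unset, 0 = off, 1 = on), and that the buggy parser stores the MTU as a 64-bit little-endian value. The field layout and the helper names are assumptions made for illustration only.

```python
import struct

# Hypothetical layout (an assumption, not copied from systemd):
# a 32-bit unsigned `mtu` directly followed by a 32-bit signed `arp`.
LAYOUT = "<Ii"
ARP_UNSET = -1


def new_fields() -> bytearray:
    """Return a buffer holding mtu=0 and arp=unset."""
    return bytearray(struct.pack(LAYOUT, 0, ARP_UNSET))


def store_mtu(buf: bytearray, mtu: int, width: str) -> None:
    """Write mtu at offset 0; width is a struct code ('I' = 32 bit, 'Q' = 64 bit)."""
    struct.pack_into("<" + width, buf, 0, mtu)


def read_fields(buf: bytearray) -> tuple:
    """Read back (mtu, arp) from the buffer."""
    return struct.unpack(LAYOUT, buf)


# A 64-bit store into the 32-bit slot spills zero bytes into `arp`.
buf = new_fields()
store_mtu(buf, 9000, "Q")
overflowed = read_fields(buf)

# A store of the right width leaves `arp` untouched.
buf = new_fields()
store_mtu(buf, 9000, "I")
correct = read_fields(buf)

print(overflowed, correct)
```

Under these assumptions the oversized store turns "ARP not configured" (-1) into "ARP off" (0), which would explain why links that only set MTUBytes= come up with NOARP.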

[Affected Users]

 * Those who use networkd

 * Those who do not use netplan (which sets MTUBytes in .link files, not in .network files)

 * Those who specify MTUBytes in a .network file (not in a .link file)

[Test Case]

 * Configure an ethernet device with a .network file alone
 * e.g. Match by mac address and perform DHCP
 * Add [Link] section to the .network file which changes MTUBytes
 * A device brought up using this configuration should not have the NOARP flag in the output of `ip link`

 * Further add ARP=off to that .network file, the link should have NOARP flag
 * Further add ARP=on to that .network file, the link should not have NOARP flag

A test script is attached that, given an interface, abuses it to validate all of the above.
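
The attached script is not reproduced in this report. As a rough illustration of the test matrix only, the sketch below renders the three .network variants from the steps above and records whether NOARP is expected for each. The function names, the placeholder MAC address and the MTU value are hypothetical; actually applying a file and inspecting `ip link` is left out.

```python
from typing import Optional


def render_network(mac: str, mtu: int, arp: Optional[bool] = None) -> str:
    """Render a .network file that matches by MAC, uses DHCP and sets MTUBytes=.

    arp=None omits the ARP= key, True writes ARP=on, False writes ARP=off.
    """
    lines = [
        "[Match]",
        f"MACAddress={mac}",
        "",
        "[Network]",
        "DHCP=yes",
        "",
        "[Link]",
        f"MTUBytes={mtu}",
    ]
    if arp is not None:
        lines.append("ARP=on" if arp else "ARP=off")
    return "\n".join(lines) + "\n"


def expected_noarp(arp: Optional[bool]) -> bool:
    """NOARP should only appear when ARP=off is requested explicitly."""
    return arp is False


# Placeholder values, chosen for illustration only.
MAC = "00:11:22:33:44:55"
for arp in (None, False, True):
    print(render_network(MAC, 9000, arp))
    print("expect NOARP:", expected_noarp(arp))
```

Each rendered file would be placed in /etc/systemd/network, networkd restarted, and the link flags compared against the expectation.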

[Regression Potential]

 * These are upstream fixes for ARP= key that are part of zesty and up

[Other Info]

 * Upstream fixes
https://github.com/systemd/systemd/commit/b8b40317d0355bc70bb23a6240a36f3630c4952b.patch
https://github.com/systemd/systemd/commit/1ed1f50f8277df07918e13cba3331a114eaa6fe3.patch

 * Original bug report

This breaks existing bonding configurations when upgrading from 229-4ubuntu19 to 229-4ubuntu20 on xenial, as bond interfaces are now configured without ARP by default. Hence you suddenly lose network connectivity on upgrade. Very bad for an SRU.

Also, adding "ARP=yes" to the [Link] section of a .network file does not work.

Before this update, bond interfaces (specifically 802.3ad) were defaulting to ARP enabled. After the upgrade, they are created with NOARP set on the link.

pre-upgrade:

eth0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP>
eth1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP>
bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP>

post-upgrade:
eth0: <BROADCAST,MULTICAST,NOARP,SLAVE,UP,LOWER_UP>
eth1: <BROADCAST,MULTICAST,NOARP,SLAVE,UP,LOWER_UP>
bond0: <BROADCAST,MULTICAST,NOARP,MASTER,UP,LOWER_UP>
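
A comparison like the one above can be automated. This is a minimal sketch that parses the flag list from `ip link`-style lines and reports which flags were added per interface; the function names are hypothetical, and the regular expression only assumes the `name: <FLAG,FLAG,...>` shape seen in this report.

```python
import re

# Matches "eth0: <A,B>" as well as "2: eth0: <A,B> mtu ..." lines.
LINK_RE = re.compile(r"^(?:\d+:\s+)?([^:\s]+):\s+<([^>]*)>")


def parse_flags(line: str):
    """Return (interface name, set of flags) for one ip-link style line."""
    match = LINK_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a link line: {line!r}")
    name, flags = match.groups()
    # Strip the "@parent" suffix that ip prints for VLAN devices.
    return name.split("@")[0], {f for f in flags.split(",") if f}


def added_flags(pre_lines, post_lines):
    """Map interface name to the flags present after but not before."""
    pre = dict(parse_flags(line) for line in pre_lines)
    post = dict(parse_flags(line) for line in post_lines)
    return {name: post[name] - pre.get(name, set()) for name in post}


pre_upgrade = [
    "eth0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP>",
    "eth1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP>",
    "bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP>",
]
post_upgrade = [
    "eth0: <BROADCAST,MULTICAST,NOARP,SLAVE,UP,LOWER_UP>",
    "eth1: <BROADCAST,MULTICAST,NOARP,SLAVE,UP,LOWER_UP>",
    "bond0: <BROADCAST,MULTICAST,NOARP,MASTER,UP,LOWER_UP>",
]
changes = added_flags(pre_upgrade, post_upgrade)
print(changes)
```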

Linux cnode11 4.4.0-97-generic #120-Ubuntu SMP Tue Sep 19 17:28:18 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

description: updated
Revision history for this message
Dimitri John Ledkov (xnox) wrote :

How are eth0/eth1/bond0 configured?
Do you use netplan and networkd?
Do you use ifupdown?

Can you paste the output of $ networkctl?

Copies of the configuration e.g.:

/etc/network/interfaces
/etc/network/interfaces.d/*
/etc/netplan/*
/run/systemd/network/*
/etc/systemd/network/*

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in systemd (Ubuntu):
status: New → Confirmed
Markus Schade (lp-markusschade) wrote :

the networking is configured via systemd-networkd.
bonding module is loaded with 'max_bonds=0' to address upcoming systemd change
https://github.com/systemd/systemd/issues/6184

pre-upgrade:

# networkctl
IDX LINK TYPE OPERATIONAL SETUP
  1 lo loopback carrier unmanaged
  2 eth0 ether carrier configuring
  3 eth1 ether carrier configuring
  4 bond0 ether routable configured
  5 bond0.200 ether routable configured

# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 9000 qdisc mq master bond0 state UP mode DEFAULT group default qlen 1000
    link/ether 5a:42:ff:c5:26:61 brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 9000 qdisc mq master bond0 state UP mode DEFAULT group default qlen 1000
    link/ether 5a:42:ff:c5:26:61 brd ff:ff:ff:ff:ff:ff
4: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 5a:42:ff:c5:26:61 brd ff:ff:ff:ff:ff:ff
5: bond0.200@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 5a:42:ff:c5:26:61 brd ff:ff:ff:ff:ff:ff

interfaces(.d) and netplan are empty and unused/disabled

/etc/systemd/network:

# cat eth1.network
[Match]
Name=eth1

[Network]
Bond=bond0

# cat eth0.network
[Match]
Name=eth0

[Network]
Bond=bond0

# cat bond0.netdev
[NetDev]
Name=bond0
Kind=bond

[Bond]
Mode=802.3ad
MIIMonitorSec=0.1s
LACPTransmitRate=fast
UpDelaySec=0.2s
DownDelaySec=0.2s

# cat bond0.network
[Match]
Name=bond0

[Address]
Address=192.168.1.100/24

[Route]
Gateway=192.168.1.1

[Network]
VLAN=bond0.200

[Link]
MTUBytes=9000

# cat bond0.200.netdev
[NetDev]
Name=bond0.200
Kind=vlan

[VLAN]
Id=200

# cat bond0.200.network
[Match]
Name=bond0.200

[Address]
Address=10.10.0.100/16

[Route]
Gateway=10.10.0.1

[Link]
MTUBytes=9000

After upgrade:

# networkctl
IDX LINK TYPE OPERATIONAL SETUP
  1 lo loopback carrier unmanaged
  2 eth0 ether carrier configuring
  3 eth1 ether carrier configuring
  4 bond0 ether routable configured
  5 bond0.200 ether routable configured

However, the link of eth0, eth1 and the bond and vlan interface changes to NOARP

# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,NOARP,SLAVE,UP,LOWER_UP> mtu 9000 qdisc mq master bond0 state UP mode DEFAULT group default qlen 1000
    link/ether 5a:42:ff:c5:26:61 brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST,NOARP,SLAVE,UP,LOWER_UP> mtu 9000 qdisc mq master bond0 state UP mode DEFAULT group default qlen 1000
    link/ether 5a:42:ff:c5:26:61 br...


Tobias Wolf (towolf) wrote :

We are also affected by this. Suddenly, after the systemd upgrade, the network was gone and we had to go in via serial console.

We are only using networkd because Martin Pitt said in the run-up to the 16.04 release that networkd would be supported and was to be used in the next LTS release.

# networkctl
IDX LINK TYPE OPERATIONAL SETUP
  1 lo loopback carrier unmanaged
  2 eth0 ether carrier configured
  3 eth1 ether carrier configured
  4 eth2 ether carrier configured
  5 eth3 ether carrier configured
  6 eth4 ether no-carrier configured
  7 eth5 ether no-carrier configured
  8 bond0 ether off unmanaged
  9 bond1 ether routable configured

9 links listed.

# head /etc/systemd/network/* -n 20
==> /etc/systemd/network/bond1.netdev <==
[NetDev]
Name=bond1
Kind=bond

[Bond]
Mode=802.3ad
MIIMonitorSec=100
TransmitHashPolicy=layer3+4
LACPTransmitRate=1

==> /etc/systemd/network/bond1.network <==
[Match]
Name=bond1

[Link]
MTUBytes=9000

[Network]
LinkLocalAddressing=no

[Network]
Address=10.230.0.4/22
Gateway=10.230.0.1

==> /etc/systemd/network/eth.network <==
[Match]
Name=eth*

[Network]
Bond=bond1

Dennis Kuhn (d.kuhn) wrote :

We have the same problem without bond configured. After upgrading to systemd 229-4ubuntu20 NOARP is set to our network interfaces. The machine has two network interfaces and the configuration is very simple:

# 50-ens5f0.network

[Match]
MACAddress=24:8a:7:11:50:a8

[Link]
MTUBytes=9000

[Network]
Address=10.32.34.34/30
DHCP=no
LinkLocalAddressing=no
IPv6AcceptRA=no

# 50-ens5f1.network
[Match]
MACAddress=24:8a:7:11:50:a9

[Link]
MTUBytes=9000

[Network]
Address=10.32.33.126/30
DHCP=no
LinkLocalAddressing=no
IPv6AcceptRA=no

# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens5f0: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 24:8a:07:11:50:a8 brd ff:ff:ff:ff:ff:ff
3: ens5f1: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 24:8a:07:11:50:a9 brd ff:ff:ff:ff:ff:ff

Dennis Kuhn (d.kuhn) wrote :

I could solve this issue by rebuilding the debian packages without two patches:

- networkd-add-support-to-configure-NOARP-ARP-for-interface.patch
- networkd-bond-support-primary-slave-and-active-slave-4873.patch

Dimitri John Ledkov (xnox) wrote :

I suspect the following cherry-pick is missing in xenial:

https://github.com/systemd/systemd/commit/1ed1f50f8277df07918e13cba3331a114eaa6fe3.patch

I will prepare a PPA with this fix applied and expedite an SRU that includes it.

description: updated
Changed in systemd (Ubuntu):
importance: Undecided → Critical
Changed in systemd (Ubuntu Xenial):
status: New → In Progress
importance: Undecided → Critical
Changed in systemd (Ubuntu Zesty):
status: New → Fix Released
Changed in systemd (Ubuntu Artful):
status: New → Fix Released
Changed in systemd (Ubuntu Bionic):
status: Confirmed → Fix Released
Changed in systemd (Ubuntu Xenial):
assignee: nobody → Dimitri John Ledkov (xnox)
milestone: none → xenial-updates
Changed in systemd:
status: Unknown → Fix Released
Dimitri John Ledkov (xnox) wrote :

I have packages built with above patch applied, which I suspect is the missing portion to fix this.
It is available from this ppa https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/3015
$ sudo add-apt-repository ppa:ci-train-ppa-service/3015
$ sudo apt-get update

I am failing to reproduce the reported regression, with quite similar setup. (See below).

Am I missing some sysctls to reproduce the regression? Or are my bonds/vlans/ethernet not suitable to trigger this regression?

# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp3s0f0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP mode DEFAULT group default qlen 1000
    link/ether 4a:fd:00:df:13:a3 brd ff:ff:ff:ff:ff:ff
3: enp3s0f1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP mode DEFAULT group default qlen 1000
    link/ether 4a:fd:00:df:13:a3 brd ff:ff:ff:ff:ff:ff
4: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 4a:fd:00:df:13:a3 brd ff:ff:ff:ff:ff:ff
5: bond0.204@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether bc:76:4e:20:69:4c brd ff:ff:ff:ff:ff:ff
6: bond0.104@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether bc:76:4e:20:68:3d brd ff:ff:ff:ff:ff:ff

# dpkg -l | grep 229-4ubuntu20
ii libpam-systemd:amd64 229-4ubuntu20 amd64 system and service manager - PAM module
ii libsystemd0:amd64 229-4ubuntu20 amd64 systemd utility library
ii libudev1:amd64 229-4ubuntu20 amd64 libudev shared library
ii systemd 229-4ubuntu20 amd64 system and service manager
ii systemd-sysv 229-4ubuntu20 amd64 system and service manager - SysV links
ii udev 229-4ubuntu20 amd64 /dev/ and hotplug management daemon

==> 10-netplan-bond0.104.netdev <==
[NetDev]
Name=bond0.104
MACAddress=bc:76:4e:20:68:3d
Kind=vlan

[VLAN]
Id=104

==> 10-netplan-bond0.104.network <==
[Match]
Name=bond0.104

[Network]
Address=172.99.85.226/30
Address=2001:4802:78fd:33:be76:4eff:fe20:683d/64

[Route]
Destination=0.0.0.0/0
Gateway=172.99.85.225

[Route]
Destination=::/0
Gateway=2001:4802:78fd:33::1

==> 10-netplan-bond0.204.netdev <==
[NetDev]
Name=bond0.204
MACAddress=bc:76:4e:20:69:4c
Kind=vlan

[VLAN]
Id=204

==> 10-netplan-bond0.204.network <==
[Match]
Name=bond0.204

[Network]
Address=10.184.228.158/30

[Route]
Destination=10.176.0.0/12
Gateway=10.184.228.157

[Route]
Destination=10.208.0.0/12
Gateway=10.184.228.157

==> 10-netplan-bond0.netdev <==
[NetDev]
Name=bond0
Kind=bond

[Bond]
Mode=802.3ad
MIIMonitorSec=100
TransmitHashPolicy=layer3+4

==> 10-netplan...


Dimitri John Ledkov (xnox) wrote :

Aha, to reproduce this I need to specify some setting that forces the code path which changes link flags away from their defaults.

Changed in systemd (Ubuntu Zesty):
status: Fix Released → Confirmed
Changed in systemd (Ubuntu Bionic):
status: Fix Released → Confirmed
Changed in systemd (Ubuntu Zesty):
importance: Undecided → Critical
Changed in systemd (Ubuntu Artful):
status: Fix Released → Confirmed
importance: Undecided → Critical
Markus Schade (lp-markusschade) wrote :

The ppa package works for my use case. Maybe add this test-case to QA.

Changed in systemd (Ubuntu Zesty):
status: Confirmed → Invalid
Changed in systemd (Ubuntu Bionic):
status: Confirmed → Invalid
Changed in systemd (Ubuntu Artful):
status: Confirmed → Invalid
description: updated
Dimitri John Ledkov (xnox) wrote :
description: updated
Andy Whitcroft (apw) wrote : Please test proposed package

Hello Markus, or anyone else affected,

Accepted systemd into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/systemd/229-4ubuntu21 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in systemd (Ubuntu Xenial):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-xenial
Dimitri John Ledkov (xnox) wrote :

$ dpkg -l | grep 229-4
ii libpam-systemd:amd64 229-4ubuntu21 amd64 system and service manager - PAM module
ii libsystemd-dev:amd64 229-4ubuntu21 amd64 systemd utility library - development files
ii libsystemd0:amd64 229-4ubuntu21 amd64 systemd utility library
ii libudev-dev:amd64 229-4ubuntu21 amd64 libudev development files
ii libudev1:amd64 229-4ubuntu21 amd64 libudev shared library
ii systemd 229-4ubuntu21 amd64 system and service manager
ii systemd-sysv 229-4ubuntu21 amd64 system and service manager - SysV links
ii udev 229-4ubuntu21 amd64 /dev/ and hotplug management daemon

Executed the attached regression-apr.sh script against my eth0 interface; all 4 test cases reported good. Previously at least one of them failed.

tags: added: verification-done verification-done-xenial
removed: verification-needed verification-needed-xenial
description: updated
Robie Basak (racb)
tags: added: regression-update
removed: regression-updates
Markus Schade (lp-markusschade) wrote :

the updates-proposed build works in my case as well

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package systemd - 229-4ubuntu21

---------------
systemd (229-4ubuntu21) xenial; urgency=medium

  * networkd: do not unconditionally apply NOARP.
  * networkd: fix size of MTUBytes so that it does not overwrite ARP.
  * Fixes regression-updates LP: #1727301

 -- Dimitri John Ledkov <email address hidden> Fri, 27 Oct 2017 09:21:18 +0100
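
The first changelog item concerns how the ARP= tristate is turned into the NOARP link flag. The sketch below models the described behaviour before and after the fix. It is not the networkd source: the function names are hypothetical, and the encoding -1 = unset, 0 = off, 1 = on is an assumption. IFF_NOARP uses the value from linux/if.h.

```python
IFF_NOARP = 0x80  # value of IFF_NOARP in linux/if.h


def apply_arp_buggy(flags: int, arp: int) -> int:
    """Model of the regression: any configured value adds NOARP."""
    if arp >= 0:
        flags |= IFF_NOARP
    return flags


def apply_arp_fixed(flags: int, arp: int) -> int:
    """Model of the intended behaviour.

    arp < 0  : not configured, leave the flag alone
    arp == 0 : ARP off, set NOARP
    arp > 0  : ARP on, clear NOARP
    """
    if arp < 0:
        return flags
    if arp == 0:
        return flags | IFF_NOARP
    return flags & ~IFF_NOARP


for arp in (-1, 0, 1):
    print(arp, hex(apply_arp_buggy(0, arp)), hex(apply_arp_fixed(0, arp)))
```

In this model ARP=on cannot clear NOARP in the buggy variant, which matches the report that adding ARP=yes did not help.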

Changed in systemd (Ubuntu Xenial):
status: Fix Committed → Fix Released
Steve Langasek (vorlon) wrote : Update Released

The verification of the Stable Release Update for systemd has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
