Bridge MTU not applied at first boot, only after netplan apply/networkctl reload

Bug #2034099 reported by Trent Lloyd
This bug affects 4 people
Affects: linux (Ubuntu)
Status: In Progress
Importance: Medium
Assigned to: Trent Lloyd

Bug Description

If you set a Bridge's MTU to a specific value using netplan/networkd which is different to the MTU of the interface you add, it does not appear to be applied correctly during boot.

However, the MTU switches to the correct value when you re-apply the configuration with "netplan apply" (or with "systemctl restart systemd-networkd" or "networkctl reload").

Additionally:
- It does apply correctly when you create the bridge for the first time (after boot). I did not debug why.
- When deploying the same configuration with Juju & MAAS it works after first boot (likely because juju modifies the netplan config and runs netplan apply) but still fails on reboot.

== Cause ==

Linux bridges, by default, "auto-tune" the bridge MTU to MIN(mtu) of all member ports. This is updated each time an interface is added or removed (except with vlan aware mode, which uses MAX(mtu) instead).

If the user changes the MTU of the bridge explicitly, it disables this behaviour since 804b854d374e ("net: bridge: disable bridge MTU auto tuning if it was set manually") which landed in v4.17.

This is the expected behaviour. Unfortunately, that code has a bug: it only works if the bridge MTU is changed after the bridge was created. If the bridge MTU is set during bridge creation by passing IFLA_MTU (as systemd-networkd and network-manager do), auto-tuning is not disabled. Subsequently, when the interface is added to the bridge, the bridge MTU is auto-tuned to match it.

So while systemd-networkd is setting this MTU initially, it is almost immediately changed by the auto-tuning when the member port is added. If you reload, it notices the MTU doesn't match and changes it again. This change then triggers the relevant code to disable auto-tuning - so it won't change again after that.
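
The sequence described above can be sketched by hand with iproute2, independent of networkd. This is a minimal sketch, not the reproducer used for this report: the interface names brtest0 and dummytest0, the use of a dummy interface as the member port, and the DRY_RUN switch are all assumptions made for illustration. With DRY_RUN=1 (the default here) the commands are only printed. Running them for real needs root, and whether the create-time variant ends up at MTU 9000 depends on the kernel being affected.

```shell
#!/bin/sh
# Sketch: MTU passed at bridge creation (IFLA_MTU) versus MTU set afterwards.
# Interface names and DRY_RUN are hypothetical, chosen for illustration.
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

bridge_mtu_at_creation() {
    # The MTU is part of the same request that creates the bridge.
    run ip link add name brtest0 mtu 1500 type bridge
    run ip link add name dummytest0 mtu 9000 type dummy
    # Adding the port is the step that triggers bridge MTU auto-tuning.
    run ip link set dev dummytest0 master brtest0
}

bridge_mtu_after_creation() {
    run ip link add name brtest0 type bridge
    # Explicit change after creation; per the description above,
    # this is what disables auto-tuning.
    run ip link set dev brtest0 mtu 1500
    run ip link add name dummytest0 mtu 9000 type dummy
    run ip link set dev dummytest0 master brtest0
}

cleanup() {
    run ip link del dev dummytest0
    run ip link del dev brtest0
}
```

After running one of the functions with DRY_RUN=0, "cat /sys/class/net/brtest0/mtu" shows which MTU the bridge ended up with. Call cleanup between the two variants.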

To verify this, you can examine the state of BROPT_MTU_SET_BY_USER with this drgn script:
https://gist.github.com/lathiat/7a3cace35bd28413822c362f76ad2f1a

You can also capture the MTU being set at creation with this bpftrace script:
https://gist.github.com/lathiat/1624723ceef8d17239ae450f03c8eb3b

It can also be helpful to set the following in a drop-in override using "systemctl edit systemd-networkd":
[Service]
Environment=SYSTEMD_LOG_LEVEL=debug
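
The same drop-in can also be written without the interactive editor. This is a sketch that assumes the usual drop-in directory under /etc/systemd/system. The file name debug.conf and the DROPIN_ROOT variable are hypothetical ("systemctl edit" itself writes override.conf).

```shell
#!/bin/sh
# Sketch: write the debug-logging drop-in for systemd-networkd non-interactively.
# DROPIN_ROOT and the file name debug.conf are hypothetical, for illustration only.
write_debug_dropin() {
    root="${DROPIN_ROOT:-/etc/systemd/system}"
    dir="$root/systemd-networkd.service.d"
    mkdir -p "$dir" || return 1
    printf '%s\n' '[Service]' 'Environment=SYSTEMD_LOG_LEVEL=debug' > "$dir/debug.conf"
}

# As root, then reboot and read the boot-time log:
#   write_debug_dropin && systemctl daemon-reload
#   journalctl -b -u systemd-networkd
```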

== Use Case ==

The specific use case that found this bug was wanting to have a VLAN with MTU 9000 and the default untagged VLAN interface with MTU 1500. For a VLAN sub-interface (e.g. eth0.42) to have MTU 9000, the parent interface (eth0) must also have MTU 9000, even if that's not what you actually wanted.

One way to work around this is to create eth0 and eth0.42 with MTU 9000 but then create br0 with MTU 1500 containing eth0. Because we are intentionally setting the MTU of br0 to a value other than that of the member interfaces, the bug is triggered.

However, this also happens in other cases, and the bug has often gone unnoticed for one of the following reasons:

(a) If the same MTU is desired on the bridge and member ports, setting the MTU of the member ports to match results in auto-tuning still happening, but to the desired value.

(b) When member ports are created by other tools (e.g. virtualisation software such as LXD), they often clone the MTU of the bridge to the interface before adding it, specifically to avoid this behaviour. Not all software does this, however, so the bridge MTU can get reduced to 1500 when a higher value is desired.

(c) ifupdown specifically sets the MTU after creation (using "ip link set X mtu N"), so it does not experience this bug, unlike networkd/Network Manager, which set the MTU during creation with IFLA_MTU.

== Test Case ==

# Notes:
# - This requires a multi-core VM to reproduce reliably. Single core VMs frequently fail to reproduce it.
# - Requires a spare interface separate from the primary interface; in the example we use eth2.
# - You can take the netplan generated configs from /run/systemd/network and use them directly in /etc/systemd/network and apply with "networkctl reload". You get the same results.

# 1. Create simple netplan configuration

# Ensure that no eth2 configuration exists in the other files, if so, remove that.
grep eth2 /etc/netplan/ -Ri

cat > /etc/netplan/60-br0.yaml <<EOF
network:
  bridges:
    br0:
      addresses:
      - 172.16.1.1/24
      interfaces:
      - eth2
      mtu: 1500
  ethernets:
    eth2:
      mtu: 9000
  version: 2
EOF

# 2. Apply configuration the first time
# For some racy reason, this works when the configuration is first applied after boot, but not during boot

netplan apply

grep . /sys/class/net/{br0,eth2}/mtu

# Result: Correct - br0 has MTU 1500
# /sys/class/net/br0/mtu:1500
# /sys/class/net/eth2/mtu:9000

# 3. Reboot
# Configuration is applied incorrectly on first boot

reboot

grep . /sys/class/net/{br0,eth2}/mtu

# Result: Incorrect - br0 has MTU 9000 (1500 was configured)
# /sys/class/net/br0/mtu:9000
# /sys/class/net/eth2/mtu:9000

# 4. Netplan apply after boot
# Configuration is fixed when re-applying the config

netplan apply

grep . /sys/class/net/{br0,eth2}/mtu

# Result: Correct - br0 has MTU 1500
# /sys/class/net/br0/mtu:1500
# /sys/class/net/eth2/mtu:9000
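
The grep checks used in the steps above can be wrapped in a small helper that fails when an interface's live MTU differs from the configured value. This is only a sketch: the function name check_mtu and the SYSFS_NET override (useful for trying it against a scratch directory) are hypothetical.

```shell
#!/bin/sh
# Sketch: compare an interface's live MTU with the value set in netplan.
# check_mtu and SYSFS_NET are hypothetical names, for illustration only.
check_mtu() {
    iface="$1"
    expected="$2"
    file="${SYSFS_NET:-/sys/class/net}/$iface/mtu"
    if [ ! -r "$file" ]; then
        echo "$iface: cannot read $file"
        return 2
    fi
    actual="$(cat "$file")"
    if [ "$actual" = "$expected" ]; then
        echo "$iface: OK (mtu $actual)"
    else
        echo "$iface: MISMATCH (mtu $actual, expected $expected)"
        return 1
    fi
}

# Example, run after each step of the test case:
#   check_mtu br0 1500; check_mtu eth2 9000
```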

Trent Lloyd (lathiat)
Changed in linux (Ubuntu):
status: New → Confirmed
assignee: nobody → Trent Lloyd (lathiat)
Trent Lloyd (lathiat) wrote :

Submitted upstream for review:
https://<email address hidden>/T/#u

Trent Lloyd (lathiat)
Changed in linux (Ubuntu):
status: Confirmed → In Progress
importance: Undecided → Medium