Comment 28 for bug 1701023

Tom Verdaat (tom-verdaat) wrote :

We've been doing a lot more testing and debugging and I'd like to share our findings:

1) Unfortunately it turns out this change does not fix the issue of interfaces not coming up correctly for a bond with a (static) network configuration. The race condition seems to be removed so at least there are no more hangs between bonds and their vlan children. All the interfaces also say they are UP both when running ifup and after reboot. However:
- Running "ifup <slavename>" does bring up the bond (and its vlans) in a working state.
- Running "ifup -a" or rebooting does not actually work: it causes "network not available" errors and "Destination Host Unreachable" when pinging other machines. Executing "ifdown -a; ifup -a" shows that ifupdown tries to bring up the bond BEFORE the slaves instead of the other way around. Even though the bond and its slaves say they are UP after the 60s timeout, they don't actually function.
- We're not seeing any issues with bonds that do not have a network configuration of their own.
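
Since the interfaces report UP even when nothing works, one way to look past that flag is to check whether the slaves are actually enslaved. The sketch below is only an illustration: the helper name missing_slaves and the SYSFS_NET override are hypothetical, and it assumes the kernel bonding driver's sysfs file /sys/class/net/<bondname>/bonding/slaves. Whether a missing slave is really what breaks the bond here is an assumption we have not confirmed.

```shell
#!/bin/sh
# Hypothetical helper (not part of ifupdown): print every expected slave
# that is NOT listed in the bond's sysfs "slaves" file.
# SYSFS_NET is an assumed override so this can be tried without real hardware.
# Usage: missing_slaves <bond> <slave>...
missing_slaves() {
    bond=$1; shift
    slaves_file="${SYSFS_NET:-/sys/class/net}/$bond/bonding/slaves"
    # The file holds a space-separated list; pad it so whole names can be matched.
    enslaved=" $(cat "$slaves_file" 2>/dev/null | tr '\n' ' ') "
    for dev in "$@"; do
        case "$enslaved" in
            *" $dev "*) ;;               # enslaved, nothing to report
            *) printf '%s\n' "$dev" ;;   # expected but not enslaved
        esac
    done
}
```

Running "missing_slaves bo-adm enp0s3 enp0s10" prints nothing when both slaves are enslaved. If the bond does not exist at all, every expected slave is printed.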

2) The networking script stack / concept seems fundamentally flawed in three areas:

2.A) bonds relying on slaves having "bond-master" and being started by bringing up the slaves, but not supporting the master having "bond-slaves" and being able to start a bond by just bringing up the bond directly.

2.B) bringing a specific interface up automatically brings up its child vlans. This does not make a lot of sense. The other way around does - e.g. in order to bring up a vlan we need to bring up its raw device - but why would the ifupdown scripts assume that I want to bring up all of its vlans when I bring up an interface that (also) serves as a raw device? In that case I would probably run "ifup -a"!

2.C) a vlan running on top of a bond cannot be brought up directly due to /sys/class/net/<bondname>/ not existing. This results in the following:
> # ifup bo-adm.2
> Set name-type for VLAN subsystem. Should be visible in /proc/net/vlan/config
> cat: /sys/class/net/bo-adm/mtu: No such file or directory
> Device "bo-adm" does not exist.
> bo-adm does not exist, unable to create bo-adm.2
> run-parts: /etc/network/if-pre-up.d/vlan exited with return code 1
> Failed to bring up bo-adm.2.
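
A possible manual way around the failure above is to check for the raw device first and bring up the slaves when it is missing, since "ifup <slavename>" is what creates the bond. This is a rough sketch, not something ifupdown provides: the function names and the SYSFS_NET override are hypothetical, and it assumes the vlan uses dotted naming (<raw device>.<vlan id>).

```shell
#!/bin/sh
# Sketch only: these helpers are hypothetical, not part of ifupdown.

# Derive the raw device from a dotted vlan name, e.g. bo-adm.2 -> bo-adm.
# Assumes dotted naming; a name without a dot is returned unchanged.
vlan_raw_device() {
    printf '%s\n' "${1%.*}"
}

# Succeeds if the kernel already knows the device (its sysfs entry exists).
# SYSFS_NET is an assumed override so this can be tried without real interfaces.
device_exists() {
    [ -d "${SYSFS_NET:-/sys/class/net}/$1" ]
}

# Bring up a vlan; if its raw device is missing, bring up the given slaves first.
# Usage: ifup_vlan_on_bond <vlan> <slave>...
ifup_vlan_on_bond() {
    vlan=$1; shift
    raw=$(vlan_raw_device "$vlan")
    if ! device_exists "$raw"; then
        for slave in "$@"; do
            ifup "$slave" || return 1
        done
    fi
    ifup "$vlan"
}
```

For the case above this would be called as "ifup_vlan_on_bond bo-adm.2 enp0s3 enp0s10". Because bringing up a slave already brings up the bond's vlans (see 2.B), the final ifup may just report that the vlan is already configured.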

3) Our new workaround for boot has become this very intrusive systemd service:
> [Unit]
> Wants=network-online.target
> After=network-online.target
>
> [Install]
> WantedBy=multi-user.target
>
> [Service]
> Type=oneshot
> ExecStartPre=/sbin/ifdown bo-adm
> ExecStart=/sbin/ifup enp0s3
> ExecStart=/sbin/ifup enp0s10
> ExecStop=/sbin/ifdown bo-adm
> RemainAfterExit=yes
> TimeoutStartSec=5min
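
For context on why the unit brings up enp0s3 and enp0s10 rather than bo-adm itself: as described in 2.A, the bond is started through slaves that carry "bond-master". The fragment below is a hypothetical /etc/network/interfaces layout in that style, not our actual configuration. The interface names come from this comment, but the addresses (documentation ranges) and the bond-mode / bond-miimon values are placeholders assumed for illustration.

```
# Hypothetical example, not the real configuration from this report.
# Slaves point at the bond via bond-master.
auto enp0s3
iface enp0s3 inet manual
    bond-master bo-adm

auto enp0s10
iface enp0s10 inet manual
    bond-master bo-adm

# Bond with its own static configuration (the failing case in point 1).
# Address and bond options are placeholders.
auto bo-adm
iface bo-adm inet static
    address 192.0.2.10
    netmask 255.255.255.0
    bond-slaves none
    bond-mode active-backup
    bond-miimon 100

# Vlan on top of the bond (the case in 2.C). Address is a placeholder.
auto bo-adm.2
iface bo-adm.2 inet static
    address 198.51.100.10
    netmask 255.255.255.0
    vlan-raw-device bo-adm
```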