Code to map neutron openvswitch bridge to existing linux (juju) bridge breaks networking after reboot by adding interfaces.d/* include back

Bug #1795296 reported by Trent Lloyd on 2018-10-01
This bug affects 2 people
Affects:
 - Charm Helpers
 - OpenStack neutron-gateway charm
 - OpenStack nova-compute charm

Bug Description

When mapping a linux bridge (typically because juju wants to attach containers to that interface) to an openvswitch bridge for neutron (e.g. data-port='br-ex:br-ens3'), the charm breaks networking after a reboot because the IP address ends up configured on both ens3 and br-ens3.

The reason is the following sequence of events:

 - When juju/MAAS first deploys the machine, /etc/network/interfaces is a stub with loopback and "source /etc/network/interfaces.d/*" to load the interfaces.d files
 - cloud-init writes the various network configurations to /etc/network/interfaces.d/50-cloud-init.cfg
 - When juju deploys an LXD container, it takes the content of the 50-cloud-init.cfg file and then moves it into /etc/network/interfaces and removes the "source /etc/network/interfaces.d/*" statement
 - juju does NOT remove /etc/network/interfaces.d/50-cloud-init.cfg - this file is effectively a duplicate of the contents of /etc/network/interfaces with the exception that the network config is for "ens3" instead of "br-ens3"
 - When configured with data-port=br-ex:br-ens3, the neutron-openvswitch charm (in hooks/charmhelpers/contrib/network/ovs/) creates a veth pair (veth-br-ex / veth-br-ens3) in /etc/network/interfaces.d/veth-br-ens3.cfg, with stanzas that add each half to its respective bridge
 - It then re-adds the "source /etc/network/interfaces.d/*" statement to /etc/network/interfaces
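With the ens3/br-ens3 names from the example, the end state looks roughly like this (an illustrative sketch with hypothetical addresses, not copied from a real deployment):

```
# /etc/network/interfaces -- written by juju; "source" line re-added by the charm
auto lo
iface lo inet loopback

auto br-ens3                      # the bridge now owns the host IP
iface br-ens3 inet static
    address 10.0.0.10/24
    gateway 10.0.0.1
    bridge_ports ens3

source /etc/network/interfaces.d/*

# /etc/network/interfaces.d/50-cloud-init.cfg -- stale leftover from cloud-init
auto ens3
iface ens3 inet static            # same IP, now duplicated on the raw NIC
    address 10.0.0.10/24
    gateway 10.0.0.1
```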

This causes networking to be broken after a reboot, because the IP address of ens3 is configured on BOTH ens3 and br-ens3. The exact mechanism was not obvious at first: ARP/neighbour discovery fails on ens3 but works on br-ens3 (see the comments below for the explanation). In any case, the configuration is clearly incorrect.

It does not break before reboot, presumably because ifdown ens3 and ifup br-ens3 are executed explicitly rather than via ifup -a, or some similar sequence of events. I didn't check exactly why, but it's likely mostly irrelevant.

The MTU is also not set on the veth pair, which means the linux bridge MTU will drop to 1500 even if it was set to 9000. The veth pair itself and the openvswitch bridge mostly don't seem to care about MTU and will transmit packets anyway, but linux bridges will drop packets with the wrong MTU. A separate bug for that is here:
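A hedged sketch of what an MTU-aware veth stanza in /etc/network/interfaces.d/veth-br-ens3.cfg might look like (the 9000 value and the exact stanza layout are assumptions, not the charm's actual output):

```
auto veth-br-ex
iface veth-br-ex inet manual
    pre-up ip link add veth-br-ex type veth peer name veth-br-ens3 || true
    up ip link set veth-br-ex mtu 9000 up       # match the jumbo MTU of the bridge
    up ip link set veth-br-ens3 mtu 9000 up
    down ip link del veth-br-ex
```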

I would argue this is in some ways a bug in juju: other things may rely on interfaces.d, so juju should probably rewrite 50-cloud-init.cfg or otherwise not remove the interfaces.d include (or it could leave the include and remove 50-cloud-init.cfg). However, this behaviour exists in multiple juju versions, so we may need to fix the charm as a workaround and juju as the long-term fix.

Ultimately this stems from multiple parties trying to configure the network. It would be even more ideal if MAAS configured a linux bridge in the first place, but for various reasons that is difficult to predict, and users not using lxd may not want the overhead of a bridge being configured for no reason.

Trent Lloyd (lathiat) wrote :

It seems my guess about rp_filter was wrong: setting it to 0 has no effect, and for whatever reason the outbound traffic is still not transmitted. So I am removing that from the description.

description: updated
Chris Gregan (cgregan) on 2018-10-01
tags: added: cpe-onsite
Trent Lloyd (lathiat) on 2018-11-20
tags: added: sts

On Mon, 2018-10-01 at 06:12 +0000, Trent Lloyd wrote:
> I would argue in some ways this is a bug in juju, as other things may
> rely on interfaces.d and it probably should rewrite 50-cloud-init.cfg
> or otherwise not remove the interfaces.d include (or it could leave
> the include there but remove the 50-cloud-init.cfg)

The removal of the source stanza is intentional, because juju
aims to control the networking of the machine.

Trent Lloyd (lathiat) wrote :

Though it is not material to the bug itself, Jay was able to determine *why* the traffic fails when the interface and the bridge have the same IP address. I originally (incorrectly) suspected this was related to rp_filter.

An ARP request creates an INCOMPLETE neighbour entry (the requested IP plus the device selected to send the request). When an ARP reply is received, the device on which the reply is processed determines whether the reply completes a matching INCOMPLETE entry.

This means that the device that is used to send the ARP request must be the one that processes the ARP reply.

When generating outbound traffic, an INCOMPLETE ARP entry is created for the non-bridge version of the interface.

Example expected "ip neigh show" output, trimmed for relevance:

  dev bond1.246 INCOMPLETE   # note the association between the requested IP and the device

When the reply comes back, it is processed by br-bond1.246, so the ARP entry is never completed.
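Illustrated with the devices above (abridged, hypothetical output for the sake of the explanation):

```
# The request goes out on the raw interface and records an entry there...
$ ip neigh show
<gateway-ip> dev bond1.246 INCOMPLETE

# ...but the reply is delivered via br-bond1.246, so the entry on
# bond1.246 never transitions to REACHABLE and outbound traffic stalls.
```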

Trent Lloyd (lathiat) wrote :

Additionally, on juju 2.3.8 at least, I suspect the interfaces-file updates race between juju (setting up multiple container bridges over time) and the charm updating the same file. juju 2.4 may be better, but I did not test that conclusively.

In some cases (unreliably) the bridge setup failed, leaving no bridge in /etc/network/interfaces. One deploy worked; the next didn't.

As an extra note: when juju first converts /etc/network/interfaces.d/50-cloud-init.cfg into /etc/network/interfaces, it actually includes all of /etc/network/interfaces.d/*.cfg, i.e. including the veth file. This is lost in a later update; I'm not 100% sure offhand why. But the file is rewritten multiple times as different interfaces get bridged, and the charm may also update it multiple times.

It's clear this method isn't going to be reliable on Xenial, but we have existing deployments using this option, so we need to find a way to fix it. It has broken networking in multiple production environments.

The best hack fix I can think of at the moment is to move 50-cloud-init.cfg out of the way; the charm could probably do this. However, we would need to make sure nothing relies on that file later, i.e. that juju doesn't re-read it and use it again (which, given the races above, I wonder whether it actually does).
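A sketch of that hack fix (hypothetical, not actual charm code): move the stale 50-cloud-init.cfg aside so the re-added "source /etc/network/interfaces.d/*" include no longer duplicates the primary NIC configuration. Demonstrated against a scratch directory instead of the real /etc/network:

```shell
# Stand-in for /etc/network/interfaces.d, so the sketch is safe to run.
NETDIR=$(mktemp -d)
printf 'auto ens3\niface ens3 inet dhcp\n' > "$NETDIR/50-cloud-init.cfg"

cfg="$NETDIR/50-cloud-init.cfg"
if [ -f "$cfg" ]; then
    # Keep a copy rather than deleting, for inspection or rollback.
    mv "$cfg" "$cfg.bak"
fi
ls "$NETDIR"    # only 50-cloud-init.cfg.bak remains
```

The rename keeps the file contents available while taking it out of the glob that the source stanza matches.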

It also seems that whether networking breaks on reboot is itself racy, depending on the order in which the routes get installed.

I would suggest this should be triaged High because it causes broken network on production deployments.

Changed in charm-neutron-gateway:
status: New → Triaged
Changed in charm-nova-compute:
status: New → Triaged
Changed in charm-neutron-gateway:
importance: Undecided → High
Changed in charm-nova-compute:
importance: Undecided → High