Activity log for bug #1795296

Date Who What changed Old value New value Message
2018-10-01 06:12:04 Trent Lloyd bug added bug
2018-10-01 06:12:47 Trent Lloyd bug task added charm-helpers
2018-10-01 06:12:59 Trent Lloyd bug task added charm-neutron-gateway
2018-10-01 06:37:11 Trent Lloyd description

Old value:

When mapping a linux bridge (typically because juju wants to attach containers to that interface) to an openvswitch bridge for neutron (e.g. data-port='br-ex:br-ens3') the charm causes networking to be broken after a reboot because the IP is configured on both ens3 and br-ens3.

The reason is that:
 - When juju/MAAS first deploys the machine, /etc/network/interfaces is a stub with loopback and "source /etc/network/interfaces.d/*" to load the interfaces.d files
 - cloud-init writes the various network configurations to /etc/network/interfaces.d/50-cloud-init.cfg
 - When juju deploys an LXD container, it takes the content of the 50-cloud-init.cfg file and then moves it into /etc/network/interfaces and removes the "source /etc/network/interfaces.d/*" statement
 - juju does NOT remove /etc/network/interfaces.d/50-cloud-init.cfg - this file is effectively a duplicate of the contents of /etc/network/interfaces with the exception that the network config is for "ens3" instead of "br-ens3"
 - When configured with data-port=br-ex:br-ens3 the neutron-openvswitch charm (in hooks/charmhelpers/contrib/network/ovs/__init__.py#add_ovsbridge_linuxbridge) creates a veth pair (veth-br-ex / veth-br-ens3) in /etc/network/interfaces.d/veth-br-ens3.cfg with code to then add each half to the respective bridges
 - It then re-adds the "source /etc/network/interfaces.d/*" statement to /etc/network/interfaces

This causes networking to be broken after a reboot, because the IP address of ens3 is configured on BOTH ens3 and br-ens3. The problem with this is that reverse path filtering (rp_filter) means that often the traffic is seen to come in one interface and go out the other and so is blocked. Traffic to the containers from the host would also be inconsistent in a similar way depending on whether the host tried to send the traffic out ens3 or br-ens3.

It does not break before reboot presumably because ifdown for ens3 and ifup for br-ens3 is explicitly executed instead of ifup -a - or some similar sequence of events. I didn't check exactly why but it's likely mostly irrelevant.

The MTU is also not set on the veth pair, which means the linux bridge MTU will drop to 1500 if it was set to 9000. The veth pair itself and the openvswitch bridge mostly don't seem to care about MTU and will transmit packets anyway but linux bridges will drop packets with the wrong MTU. A separate bug for that is here: https://bugs.launchpad.net/charm-neutron-openvswitch/+bug/1791101

I would argue in some ways this is a bug in juju, as other things may rely on interfaces.d and it probably should rewrite 50-cloud-init.cfg or otherwise not remove the interfaces.d include (or it could leave the include there but remove the 50-cloud-init.cfg) -- but this behavior exists in multiple juju versions so we may need to fix the charm as a workaround and juju for a long term fix.

Ultimately this stems from multiple parties trying to configure the network. It would be even more ideal if MAAS originally configured a linux bridge, but for various reasons that is difficult to predict, and users not using lxd may not want the overhead of the bridge being configured for no reason.

New value:

When mapping a linux bridge (typically because juju wants to attach containers to that interface) to an openvswitch bridge for neutron (e.g. data-port='br-ex:br-ens3') the charm causes networking to be broken after a reboot because the IP is configured on both ens3 and br-ens3.

The reason is that:
 - When juju/MAAS first deploys the machine, /etc/network/interfaces is a stub with loopback and "source /etc/network/interfaces.d/*" to load the interfaces.d files
 - cloud-init writes the various network configurations to /etc/network/interfaces.d/50-cloud-init.cfg
 - When juju deploys an LXD container, it takes the content of the 50-cloud-init.cfg file and then moves it into /etc/network/interfaces and removes the "source /etc/network/interfaces.d/*" statement
 - juju does NOT remove /etc/network/interfaces.d/50-cloud-init.cfg - this file is effectively a duplicate of the contents of /etc/network/interfaces with the exception that the network config is for "ens3" instead of "br-ens3"
 - When configured with data-port=br-ex:br-ens3 the neutron-openvswitch charm (in hooks/charmhelpers/contrib/network/ovs/__init__.py#add_ovsbridge_linuxbridge) creates a veth pair (veth-br-ex / veth-br-ens3) in /etc/network/interfaces.d/veth-br-ens3.cfg with code to then add each half to the respective bridges
 - It then re-adds the "source /etc/network/interfaces.d/*" statement to /etc/network/interfaces

This causes networking to be broken after a reboot, because the IP address of ens3 is configured on BOTH ens3 and br-ens3. The exact reason is not 100% obvious. (I guessed rp_filter but it seems not at fault; arp/neighbour discovery fails on ens3 and works on br-ens3 for some reason. In any case, the configuration is clearly incorrect.)

It does not break before reboot presumably because ifdown for ens3 and ifup for br-ens3 is explicitly executed instead of ifup -a - or some similar sequence of events. I didn't check exactly why but it's likely mostly irrelevant.

The MTU is also not set on the veth pair, which means the linux bridge MTU will drop to 1500 if it was set to 9000. The veth pair itself and the openvswitch bridge mostly don't seem to care about MTU and will transmit packets anyway but linux bridges will drop packets with the wrong MTU. A separate bug for that is here: https://bugs.launchpad.net/charm-neutron-openvswitch/+bug/1791101

I would argue in some ways this is a bug in juju, as other things may rely on interfaces.d and it probably should rewrite 50-cloud-init.cfg or otherwise not remove the interfaces.d include (or it could leave the include there but remove the 50-cloud-init.cfg) -- but this behavior exists in multiple juju versions so we may need to fix the charm as a workaround and juju for a long term fix.

Ultimately this stems from multiple parties trying to configure the network. It would be even more ideal if MAAS originally configured a linux bridge, but for various reasons that is difficult to predict, and users not using lxd may not want the overhead of the bridge being configured for no reason.

2018-10-01 10:35:29 Dominique Poulain bug added subscriber Dominique Poulain
2018-10-01 12:35:00 Chris Gregan tags cpe-onsite
2018-11-20 09:16:58 Trent Lloyd tags cpe-onsite cpe-onsite sts
2018-11-26 16:02:31 Dmitrii Shcherbakov bug added subscriber Dmitrii Shcherbakov
2019-05-14 07:56:44 Chris MacNaughton charm-neutron-gateway: status New Triaged
2019-05-14 07:56:46 Chris MacNaughton charm-nova-compute: status New Triaged
2019-05-14 07:56:48 Chris MacNaughton charm-neutron-gateway: importance Undecided High
2019-05-14 07:56:50 Chris MacNaughton charm-nova-compute: importance Undecided High
2019-11-08 13:20:26 Alex Kavanagh tags cpe-onsite sts cold-start cpe-onsite sts
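
Illustrative sketch (not part of the original bug activity): the description above reports that, once the "source /etc/network/interfaces.d/*" line is re-added, the same IP address is configured on both ens3 and br-ens3. A rough way to spot that condition is to scan the ifupdown stanzas and flag any address claimed by more than one interface. The function names, the sample file contents, and the address 10.0.0.10/24 below are all assumptions made up for illustration; they are not taken from the charm, juju, or the reporter's machine, and the parser only handles simple "iface" / "address" stanzas.

```python
from collections import defaultdict


def addresses_by_interface(config_text):
    """Map each static address to the set of interfaces that configure it."""
    owners = defaultdict(set)
    current = None  # interface of the iface stanza we are inside, if any
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        words = line.split()
        if words[0] == "iface" and len(words) >= 2:
            current = words[1]
        elif words[0] in ("auto", "source", "mapping") or words[0].startswith("allow-"):
            current = None  # these keywords end the current iface stanza
        elif words[0] == "address" and current and len(words) >= 2:
            owners[words[1]].add(current)
    return owners


def duplicated_addresses(*config_texts):
    """Return {address: [interfaces]} for addresses set on 2+ interfaces."""
    merged = defaultdict(set)
    for text in config_texts:
        for addr, ifaces in addresses_by_interface(text).items():
            merged[addr] |= ifaces
    return {addr: sorted(ifaces) for addr, ifaces in merged.items() if len(ifaces) > 1}


# Hypothetical /etc/network/interfaces after juju moved the config to the bridge
# and the charm re-added the source line.
MAIN = """
auto lo
iface lo inet loopback

iface ens3 inet manual

auto br-ens3
iface br-ens3 inet static
    address 10.0.0.10/24
    bridge_ports ens3

source /etc/network/interfaces.d/*
"""

# Hypothetical leftover /etc/network/interfaces.d/50-cloud-init.cfg.
CLOUD_INIT = """
auto ens3
iface ens3 inet static
    address 10.0.0.10/24
"""

print(duplicated_addresses(MAIN, CLOUD_INIT))
```

With these made-up inputs the address is reported against both br-ens3 and ens3, which is the state the description says appears after a reboot. The sketch takes file contents as strings rather than following "source" lines itself, so the caller decides which files count as active.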