Comment 5 for bug 1877594

Dmitrii Shcherbakov (dmitriis) wrote :

Hi zzehring,

From what I see:

1) no DVR, data-port is only configured on neutron-gateway (3 units);
2) the br-prov Linux bridge is not listed in `juju status` (I assume because it was set up via the iproute2 procedure mentioned in #2);
  2.1) I am guessing br-prov is not configured in MAAS either.

3) bond0 -> br-bond0 - used for the OAM network only without VLAN tagging (used by the host + containers);
4) bond1 -> br-prov (Linux bridge) -> [veth pair] -> br-data (ovs bridge)
           bond1.705
           bond1.709
           bond1.711 -> br-bond1-711 (used by the host + containers)

Since bond1 (untagged) does not have any directly assigned addresses, I believe the future-proof approach would be to migrate to the following configuration, which does not involve veth interfaces at all:

bond1 -> br-data (OVS bridge)
         bond1.705
         bond1.709
         bond1.711 -> br-bond1-711

# data-port: br-data:bond1

ifupdown and veth pairs are not involved here as you can see.
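
To confirm that a node has ended up in this layout, something like the following could be used. This is only a sketch: `port_in_list` is a helper name I made up, and it relies on `ovs-vsctl list-ports` printing one port name per line.

```shell
# Succeeds if the port named in $1 appears as an exact line on stdin,
# which is the format printed by `ovs-vsctl list-ports <bridge>`.
port_in_list() {
    grep -qxF -- "$1"
}

# On a migrated node (needs root and a running Open vSwitch):
#   sudo ovs-vsctl list-ports br-data | port_in_list bond1 && echo "bond1 is on br-data"
#   ip -br link show type veth    # the br-prov/br-data pair should be gone
```

The exact match matters because the VLAN sub-interfaces (bond1.705 and so on) share the bond1 prefix.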

This could be done in steps without taking machines hosting neutron-gateway units down (to avoid downtime for containers also placed onto neutron-gateway nodes).

a) remove one neutron-gateway unit (not necessarily its machine) and make sure neutron-l3-agent and neutron-openvswitch-agent services are stopped before proceeding (the charm stops them but doesn't remove the packages);
   * stopping neutron-l3-agent should take down qrouter namespaces which is intended;
b) remove the following on the target machine:
   * the veth pair between br-prov and br-data;
   * br-prov Linux bridge;
   * br-data OVS bridge;
c) deploy a separate neutron-gateway app (e.g. neutron-gateway-ng) with the same config as neutron-gateway except for `data-port: br-data:bond1` onto the same machine;
  * relate the new app in the same way as neutron-gateway;
d) repeat for all neutron-gateway units.
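
The steps above could be sketched roughly as the script below for a single unit. Treat it as an illustration rather than a tested runbook: the unit name, machine ID and veth interface name are placeholders I made up, it only prints the commands unless DRY_RUN=0 is set, and it does not touch any persistent network configuration that recreates br-prov on reboot.

```shell
#!/bin/sh
# Dry-run sketch of steps a) to c) for ONE gateway unit; repeat per unit.
# By default commands are only printed. Set DRY_RUN=0 to actually run them.
# UNIT, MACHINE and VETH are hypothetical placeholders, not values from this bug.
DRY_RUN="${DRY_RUN:-1}"
UNIT="${UNIT:-neutron-gateway/0}"
MACHINE="${MACHINE:-0}"
VETH="${VETH:-veth-br-prov}"   # either end of the pair; deleting one end removes both

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Run a single command string on the target machine.
on_host() {
    run juju ssh "$MACHINE" "$1"
}

migrate_unit() {
    # Record the existing relations so the new app can be related the same way.
    run juju status --relations neutron-gateway

    # a) remove the unit (removal is asynchronous, so wait before checking),
    #    then confirm the agents are stopped and qrouter namespaces are gone.
    run juju remove-unit "$UNIT"
    on_host "systemctl is-active neutron-l3-agent neutron-openvswitch-agent"
    on_host "sudo ip netns list"

    # b) remove the veth pair, the br-prov Linux bridge and the br-data OVS bridge.
    on_host "sudo ip link del $VETH"
    on_host "sudo ip link set br-prov down"
    on_host "sudo ip link del br-prov"
    on_host "sudo ovs-vsctl --if-exists del-br br-data"

    # c) deploy the new app onto the same machine. Other options must be copied
    #    from neutron-gateway by hand; only data-port is shown here.
    run juju deploy neutron-gateway neutron-gateway-ng --to "$MACHINE" --config data-port=br-data:bond1
}

migrate_unit
```

For the second and third machines the last step would be `juju add-unit neutron-gateway-ng --to <machine>` instead of a new deploy, since the application already exists by then.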

The L3 agent config will be the same on all gateway nodes even with 2 apps, and the transient L2 setup differences will not matter.

If the routers created via the Neutron API are L3 HA, they will go through the failover procedure (router VIP migration) each time you take down Neutron L3 agents.
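
A quick way to check that a failover has settled before moving on to the next unit might look like this. It is a sketch under assumptions: `count_active` is a helper name I made up, ROUTER is a placeholder, and I am assuming the client exposes the per-agent state in an "HA State" column when `--router` and `--long` are combined.

```shell
# Counts lines on stdin that are exactly "active".
count_active() {
    grep -cx 'active'
}

# With admin credentials loaded, for a hypothetical router named ROUTER:
#   openstack network agent list --router ROUTER --long -f value -c 'HA State' | count_active
# A healthy L3 HA router should report exactly one active agent.
```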

zzehring, does this migration procedure seem viable to you?