data-ports mapping to linuxbridge interfaces fails on fresh bionic-stein install

Bug #1877594 reported by Drew Freiberger
This bug affects 4 people
Affects                               Status         Importance   Assigned to   Milestone
Charm Helpers                         Fix Released   High         Unassigned
OpenStack Neutron Gateway Charm       In Progress    High         Unassigned
OpenStack Neutron Open vSwitch Charm  In Progress    High         Unassigned

Bug Description

With data-port set to "br-data:br-prov", where br-prov is a Linux bridge containing the external provider network trunk (e.g. bond1, eth1), the neutron-gateway config-changed hook on bionic fails with a traceback indicating that the /etc/network/interfaces.d directory does not exist.

Traceback (most recent call last):
  File "hooks/config-changed", line 387, in <module>
    main()
  File "hooks/config-changed", line 379, in main
    hooks.execute(sys.argv)
  File "/var/lib/juju/agents/unit-neutron-gateway-3/charm/hooks/charmhelpers/core/hookenv.py", line 934, in execute
    self._hooks[hook_name]()
  File "/var/lib/juju/agents/unit-neutron-gateway-3/charm/hooks/charmhelpers/contrib/openstack/utils.py", line 1597, in wrapped_f
    stopstart, restart_functions)
  File "/var/lib/juju/agents/unit-neutron-gateway-3/charm/hooks/charmhelpers/core/host.py", line 741, in restart_on_change_helper
    r = lambda_f()
  File "/var/lib/juju/agents/unit-neutron-gateway-3/charm/hooks/charmhelpers/contrib/openstack/utils.py", line 1596, in <lambda>
    (lambda: f(*args, **kwargs)), __restart_map_cache['cache'],
  File "/var/lib/juju/agents/unit-neutron-gateway-3/charm/hooks/charmhelpers/contrib/hardening/harden.py", line 93, in _harden_inner2
    return f(*args, **kwargs)
  File "hooks/config-changed", line 156, in config_changed
    configure_ovs()
  File "/var/lib/juju/agents/unit-neutron-gateway-3/charm/hooks/neutron_utils.py", line 853, in configure_ovs
    add_ovsbridge_linuxbridge(br, port)
  File "/var/lib/juju/agents/unit-neutron-gateway-3/charm/hooks/charmhelpers/contrib/network/ovs/__init__.py", line 174, in add_ovsbridge_linuxbridge
    linuxbridge_port), 'w') as config:
FileNotFoundError: [Errno 2] No such file or directory: '/etc/network/interfaces.d/veth-br-prov.cfg'

To reproduce: deploy bionic-stein neutron-gateway and supporting charms (openstack-on-lxd would work), create a Linux bridge called br-prov on neutron-gateway/0 containing the external network interface (eth1 for openstack-on-lxd), and set data-port="br-data:br-prov". The hook then fails with the above stack trace.
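For reference, the port side of this data-port value (br-prov) is itself a Linux bridge, which is what sends the charm down the veth-pair code path. The following is a minimal Python sketch, not the charm's actual code, that parses a data-port value (assuming the usual space-separated bridge:port format) and flags ports that are Linux bridges by checking for the bridge/ directory the kernel exposes under /sys/class/net/<name>/. The helper names and the sysfs_root parameter are hypothetical and exist only for illustration and testing.

```python
import os
from typing import List, Tuple


def parse_data_port(value: str) -> List[Tuple[str, str]]:
    """Split a data-port value such as "br-data:br-prov" into (bridge, port) pairs.

    Hypothetical helper: entries are assumed to be space-separated and
    "bridge:port" formatted. Only the first ':' separates bridge from port,
    so a MAC address on the port side keeps its own colons.
    """
    pairs = []
    for item in value.split():
        bridge, sep, port = item.partition(":")
        if not sep or not bridge or not port:
            raise ValueError("malformed data-port entry: {!r}".format(item))
        pairs.append((bridge, port))
    return pairs


def is_linux_bridge(name: str, sysfs_root: str = "/sys/class/net") -> bool:
    """Return True if <sysfs_root>/<name>/bridge exists, i.e. the interface is a Linux bridge."""
    return os.path.isdir(os.path.join(sysfs_root, name, "bridge"))


def linuxbridge_ports(value: str, sysfs_root: str = "/sys/class/net") -> List[str]:
    """Return the data-port ports that are Linux bridges (the veth-pair code path)."""
    return [port for _, port in parse_data_port(value)
            if is_linux_bridge(port, sysfs_root)]


if __name__ == "__main__":
    print(parse_data_port("br-data:br-prov"))
```

On the unit from this report, linuxbridge_ports("br-data:br-prov") would be expected to return ["br-prov"].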

Running mkdir /etc/network/interfaces.d resolves this error, but the hook then fails with a traceback about the missing ifup command. ifup is part of the ifupdown package, which is not installed by default on bionic.

The code that needs netplan-aware refactoring is:
https://github.com/juju/charm-helpers/blob/master/charmhelpers/contrib/network/ovs/__init__.py#L389-L398
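Until that refactoring lands, the two failures described above (no /etc/network/interfaces.d, then no ifup) could at least be reported up front rather than one traceback at a time. A minimal pre-flight sketch, assuming the config file name veth-<linux bridge>.cfg seen in the traceback; eni_preflight and its root/path parameters are hypothetical, and this is not how charm-helpers implements the check.

```python
import os
import shutil
from typing import List, Optional


def eni_preflight(linux_bridge: str, root: str = "/",
                  path: Optional[str] = None) -> List[str]:
    """Return problems that would break the ifupdown-based veth setup.

    Hypothetical helper. `root` lets the check run against a test directory;
    `path` is passed to shutil.which() to control where `ifup` is searched.
    """
    problems = []
    eni_dir = os.path.join(root, "etc", "network", "interfaces.d")
    cfg = os.path.join(eni_dir, "veth-{}.cfg".format(linux_bridge))
    if not os.path.isdir(eni_dir):
        problems.append("missing directory {} (needed for {})".format(eni_dir, cfg))
    if shutil.which("ifup", path=path) is None:
        problems.append("ifup not found; it is provided by the ifupdown package")
    return problems


if __name__ == "__main__":
    for problem in eni_preflight("br-prov"):
        print("preflight:", problem)
```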

Changed in charm-helpers:
status: New → Triaged
importance: Undecided → High
Changed in charm-neutron-gateway:
status: New → Triaged
importance: Undecided → High
Aurelien Lourot (aurelien-lourot) wrote :

Indeed, this will be fixed by https://github.com/juju/charm-helpers/pull/449

Unfortunately, as explained in this PR, it's not easy to move to netplan, because of lp:1876730

Changed in charm-neutron-openvswitch:
status: New → Triaged
importance: Undecided → High
Changed in charm-neutron-gateway:
status: Triaged → Confirmed
Changed in charm-helpers:
status: Triaged → Confirmed
Zachary Zehring (zzehring) wrote :

I've subscribed field-critical.

We worked around this issue with the following:

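# On the unit: create the br-prov Linux bridge and enslave bond1 to it
sudo ip link add name br-prov type bridge
sudo ip link set br-prov up
sudo ip link set bond1 master br-prov

# On the unit: provide what the charm's ifupdown code path expects
sudo mkdir /etc/network/interfaces.d
sudo apt install ifupdown  # provides /sbin/ifup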

# From the infra node
juju run -u neutron-gateway/<unit-id> hooks/config-changed

However, this causes issues with Juju, which checks for /sbin/ifup when deciding whether to use /etc/network/interfaces or netplan for bridging when deploying LXD containers (https://github.com/juju/juju/blob/d59b8637e150943cb987a8194addbdc116746b63/container/broker/instance_broker.go#L125).

To work around this, we have temporarily uninstalled ifupdown (removing /sbin/ifup), which allows Juju to deploy LXD containers correctly and create the necessary bridges.
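The Juju behaviour described above amounts to a single file-existence check, which is why installing ifupdown for the charm flips Juju's choice and removing it flips it back. The sketch below is a Python paraphrase of that description, not a port of the linked Go code; the function name and the returned labels are made up for illustration.

```python
import os


def container_bridge_backend(root: str = "/") -> str:
    """Paraphrase of the check described above: /sbin/ifup present -> ENI, else netplan.

    Hypothetical helper; the real decision is made in Juju's Go code.
    """
    if os.path.exists(os.path.join(root, "sbin", "ifup")):
        return "eni"  # /etc/network/interfaces-style bridging
    return "netplan"


if __name__ == "__main__":
    print(container_bridge_backend())
```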

What would the recommended workaround for this be?

Zachary Zehring (zzehring) wrote :

Here's juju status and config for neutron-gateway and neutron-openvswitch.

Ryan Beisner (1chb1n) wrote :

To confirm that we should expect this use case to work as presented, the following work was done to address the feature request, "support use of bridges via veth pairs for openvswitch data port configuration":
 - https://bugs.launchpad.net/charms/+source/neutron-openvswitch/+bug/1635067
 - https://review.opendev.org/#/q/topic:bug/1635067+(status:open+OR+status:merged)

There is work in flight to clear the way for a migration path, but it is not slated to land this close to the 20.05 stable charm release:
 - https://review.opendev.org/#/q/topic:bug/1809190+(status:open+OR+status:merged)

We're still looking at the workaround and will circle back here ASAP. Thank you.

Dmitrii Shcherbakov (dmitriis) wrote :

Hi zzehring,

From what I see:

1) no DVR, data-port is only configured on neutron-gateway (3 units);
2) the br-prov Linux bridge is not listed in `juju status` (I assume because it was created with iproute2 as described in #2);
  2.1) I am guessing br-prov is not configured in MAAS either.

3) bond0 -> br-bond0 - used for the OAM network only without VLAN tagging (used by the host + containers);
4) bond1 -> br-prov (Linux bridge) -> [veth pair] -> br-data (ovs bridge)
           bond1.705
           bond1.709
           bond1.711 -> br-bond1-711 (used by the host + containers)

Since bond1 (untagged) does not have any directly assigned addresses, I believe the future-proof method would be migrating to the following configuration that does not involve veth interfaces at all:

bond1 -> br-data (OVS bridge)
         bond1.705
         bond1.709
         bond1.711 -> br-bond1-711

# data-port: br-data:bond1

As you can see, neither ifupdown nor veth pairs are involved here.
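Choosing the new data-port value amounts to replacing the Linux bridge with its physical member. A hedged sketch, assuming the bridge members can be read from /sys/class/net/<bridge>/brif/ and that the veth end plugged into the Linux bridge can be recognised by a "veth-" name prefix (an assumption based on the names seen in this bug); both helper names are hypothetical.

```python
import os
from typing import List


def bridge_members(bridge: str, sysfs_root: str = "/sys/class/net") -> List[str]:
    """List interfaces enslaved to a Linux bridge, as exposed under <sysfs_root>/<bridge>/brif/."""
    return sorted(os.listdir(os.path.join(sysfs_root, bridge, "brif")))


def migrated_data_port(value: str, linux_bridge: str,
                       sysfs_root: str = "/sys/class/net") -> str:
    """Rewrite every "<ovs-bridge>:<linux_bridge>" entry to use the bridge's physical uplink.

    Hypothetical helper. Assumes the Linux bridge has exactly one member that
    is not a veth end, and that veth ends are recognisable by a "veth-" prefix.
    """
    uplinks = [m for m in bridge_members(linux_bridge, sysfs_root)
               if not m.startswith("veth-")]
    if len(uplinks) != 1:
        raise ValueError("expected one non-veth member of {}, found {}".format(
            linux_bridge, uplinks))
    entries = []
    for item in value.split():
        ovs_bridge, sep, port = item.partition(":")
        if sep and port == linux_bridge:
            item = "{}:{}".format(ovs_bridge, uplinks[0])
        entries.append(item)  # entries not using the Linux bridge are kept verbatim
    return " ".join(entries)
```

For the topology above, migrated_data_port("br-data:br-prov", "br-prov") run on a gateway unit would be expected to return "br-data:bond1".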

This could be done in steps without taking machines hosting neutron-gateway units down (to avoid downtime for containers also placed onto neutron-gateway nodes).

a) remove one neutron-gateway unit (not necessarily its machine) and make sure neutron-l3-agent and neutron-openvswitch-agent services are stopped before proceeding (the charm stops them but doesn't remove the packages);
   * stopping neutron-l3-agent should take down qrouter namespaces which is intended;
b) remove the following on the target machine:
   * the veth pair between br-prov and br-data;
   * br-prov Linux bridge;
   * br-data OVS bridge;
c) deploy a separate neutron-gateway app (e.g. neutron-gateway-ng) with the same config as neutron-gateway except for `data-port: br-data:bond1` onto the same machine;
  * relate the new app in the same way as neutron-gateway;
d) repeat for all neutron-gateway units.

The L3 agent config will be the same on all gateway nodes even with two applications, and the transient L2 setup differences will not matter.

If the routers created via the Neutron API are L3 HA, they will go through the failover procedure (router VIP migration) each time you take down the Neutron L3 agents.

zzehring, does this migration procedure seem viable to you?

Dmitrii Shcherbakov (dmitriis) wrote :

An update after a further discussion with the rest of the OpenStack team and more testing done by me.

The procedure described in #5 can be trimmed down to the following on each neutron-gateway unit (br-prov is the name of the pre-created Linux bridge):

# remove the Linux bridge (this releases <physical-port-name> from its master device)
sudo ip link del dev br-prov
# plug the physical port directly into the OVS bridge
sudo ovs-vsctl add-port br-data <physical-port-name>
# remove the veth pair (deleting one end removes both) and its OVS port
sudo ip link del veth-br-prov
sudo ovs-vsctl del-port br-data veth-br-prov
# remove the ifupdown stanza the charm wrote for the veth pair
sudo rm /etc/network/interfaces.d/veth-br-prov.cfg
# also remove any /etc/network/interfaces or netplan config that sets up the br-prov Linux bridge

# after doing it on all neutron-gateway units, change the application config:

juju config neutron-gateway data-port=br-data:<physical-port-name>

# -----------------
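When repeating the per-unit steps above on several gateway units, it may help to generate the exact command list first and review it. A minimal dry-run sketch: it only builds argv lists and executes nothing. The function name and parameters are hypothetical; the veth name veth-<linux bridge> is taken from the commands above. Running the result (for example with subprocess.run(cmd, check=True)) is left to the operator, and the juju config change still happens once all units are done.

```python
from typing import List


def migration_commands(linux_bridge: str, physical_port: str,
                       ovs_bridge: str = "br-data") -> List[List[str]]:
    """Build, without running, the per-unit commands from the procedure above."""
    veth = "veth-{}".format(linux_bridge)  # veth end plugged into the OVS bridge
    return [
        # Deleting the Linux bridge releases the physical port from it.
        ["sudo", "ip", "link", "del", "dev", linux_bridge],
        # Plug the physical port straight into the OVS bridge.
        ["sudo", "ovs-vsctl", "add-port", ovs_bridge, physical_port],
        # Remove the veth pair (deleting one end deletes both) and its OVS port.
        ["sudo", "ip", "link", "del", veth],
        ["sudo", "ovs-vsctl", "del-port", ovs_bridge, veth],
        # Remove the ifupdown stanza written for the veth pair.
        ["sudo", "rm", "/etc/network/interfaces.d/{}.cfg".format(veth)],
    ]


if __name__ == "__main__":
    for cmd in migration_commands("br-prov", "bond1"):
        print(" ".join(cmd))
```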

There is no need to take down L3 services on a given neutron-gateway unit during this operation, but there will be a brief period during which br-data has no uplink, so workload packets will be dropped. I believe this window will be similar to that of an HA router failover to a different neutron-gateway unit.

I tested this in a lab with a ping test running in parallel with the interface operations and saw no noticeable impact. However, depending on the load there may be some, so I would advise doing this during a maintenance window or a low-usage period.
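To put a number on the brief uplink outage with the same kind of ping test, one option is to record which ICMP sequence numbers were answered and convert the longest run of missing replies into time. A minimal sketch, assuming a fixed ping interval (1 s is ping's default) and that the answered icmp_seq values have already been extracted from the ping output; the function is hypothetical.

```python
from typing import Iterable


def longest_outage_seconds(answered_seqs: Iterable[int],
                           interval: float = 1.0) -> float:
    """Estimate the longest outage as (largest run of missing sequence numbers) * interval.

    Only gaps between answered replies are counted; losses before the first
    or after the last reply are not visible from this data.
    """
    seqs = sorted(set(answered_seqs))
    longest_gap = 0
    for prev, cur in zip(seqs, seqs[1:]):
        longest_gap = max(longest_gap, cur - prev - 1)
    return longest_gap * interval


if __name__ == "__main__":
    # Replies 4-6 missing: 3 lost probes at 0.5 s spacing, prints 1.5
    print(longest_outage_seconds([1, 2, 3, 7, 8], interval=0.5))
```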

Changed in charm-neutron-gateway:
status: In Progress → Incomplete
Changed in charm-neutron-openvswitch:
status: In Progress → Incomplete
Changed in charm-helpers:
status: Confirmed → Incomplete
James Page (james-page) wrote :

Please can we consider removing field-critical from this bug report.

I'd like to mark it as 'Opinion' so that the bug sticks around for future travellers with the solution in #6

Launchpad Janitor (janitor) wrote :

[Expired for OpenStack neutron-gateway charm because there has been no activity for 60 days.]

Changed in charm-neutron-gateway:
status: Incomplete → Expired
Launchpad Janitor (janitor) wrote :

[Expired for Charm Helpers because there has been no activity for 60 days.]

Changed in charm-helpers:
status: Incomplete → Expired
Launchpad Janitor (janitor) wrote :

[Expired for OpenStack neutron-openvswitch charm because there has been no activity for 60 days.]

Changed in charm-neutron-openvswitch:
status: Incomplete → Expired
Billy Olsen (billy-olsen) wrote :

Marking the Charm Helpers task as Fix Released, as the change was merged in https://github.com/juju/charm-helpers/pull/449. The resulting changes should be included in the affected charms. The charm tasks remain open to deprecate the veth pair option, as the bridge support is now included by default.

Changed in charm-helpers:
status: Expired → Fix Released
Billy Olsen (billy-olsen) wrote :

After a conversation with Drew, and based on comment #11, unsubscribing field-high.

OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-neutron-openvswitch (master)

Reviewed: https://review.opendev.org/c/openstack/charm-neutron-openvswitch/+/728382
Committed: https://opendev.org/openstack/charm-neutron-openvswitch/commit/0cbc2a8c0f645d34d3a293928de0be3bf36fd0ef
Submitter: "Zuul (22348)"
Branch: master

commit 0cbc2a8c0f645d34d3a293928de0be3bf36fd0ef
Author: Dmitrii Shcherbakov <email address hidden>
Date: Thu May 14 17:54:55 2020 +0300

    Deprecate linux bridge usage in data-port config

    f832f1073d47a430111c59563962922dfe37a0a5 addressed LP: #1635067 by
    adding support for using pre-created Linux bridges in the data-port
    config option.

    The same use-case of reusing a single physical interface for VLAN
    interfaces and plugging it into an OVS bridge can be addressed in a
    different way by plugging the physical interface directly into the OVS
    bridge and creating VLAN interfaces on that physical interface; this
    does not require the use of veth pairs, which are problematic due to
    performance reasons and the lack of support for veth pairs in netplan
    at the time of writing.

    There is a procedure to move from the setup that uses a Linux bridge
    and veth pair to one that does not; it will be documented so that
    existing environments can be migrated in place.

    Partial-Bug: #1877594
    Change-Id: I5e455fa701cc2f5248ccfd9ed15f3c902aacb1ef
    Co-authored-by: Aurelien Lourot <email address hidden>
