switching from external-network-id and external-port to data-port and bridge-mappings does not remove incorrect nics from bridges

Bug #1809190 reported by Xav Paice on 2018-12-20
Affects | Status | Importance | Assigned to | Milestone
OpenStack Charms Deployment Guide | | Medium | Alex Kavanagh |
OpenStack neutron-gateway charm | | High | Alex Kavanagh |

Bug Description

charm cs:neutron-gateway-258

I upgraded a site from Mitaka to Newton. There are 4 neutron-gateway applications, one for each external network, hosting a set of routers that connect to those nets.

The charm config includes a setting for external-network-id and external-port. Each of the applications has a different nic for the external network, i.e. eth1, eth2, eth3 and eth4, so if I have application gateway3 it is using eth3 as the external-port.

Each of the external networks is configured in openstack as such:

:~$ neutron net-show ext_net
+---------------------------+--------------------------------------+
| Field | Value |
+---------------------------+--------------------------------------+
| admin_state_up | True |
| availability_zone_hints | |
| availability_zones | nova |
| created_at | 2016-08-18T00:12:43Z |
| description | |
| id | foo |
| ipv4_address_scope | |
| ipv6_address_scope | |
| is_default | False |
| l2_adjacency | True |
| mtu | 1458 |
| name | ext_net |
| project_id | foo |
| provider:network_type | gre |
| provider:physical_network | |
| provider:segmentation_id | 85 |
| revision_number | 0 |
| router:external | True |
| shared | False |
| status | ACTIVE |
| subnets | foo |
| tags | |
| tenant_id | foo |
| updated_at | 2016-09-12T21:01:44Z |
+---------------------------+--------------------------------------+

On running the openstack-upgrade action on the neutron-gateway-X units, network connectivity to the external networks was lost.

After some time, we discovered that these deprecated options should have been changed out, so we reset external-network-id and external-port to their defaults and configured all 4 gateway applications with:

data-port='br-ex:eth1 br-2:eth2 br-3:eth3 br-4:eth4' bridge-mappings='physnet1:br-ex net2:br-3 net3:br-3 net4:br-4'
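Since the two settings have to agree on bridge names, a quick consistency check before applying them may help. The following is a minimal sketch, not charm code: `parse_pairs` and `check_bridge_config` are hypothetical helpers, and the only assumption about the format is that each setting is a space-separated list of `key:value` pairs, split on the first colon.

```python
def parse_pairs(value):
    """Split 'a:b c:d' into [('a', 'b'), ('c', 'd')], splitting on the first colon."""
    pairs = []
    for item in value.split():
        key, _, val = item.partition(":")
        pairs.append((key, val))
    return pairs


def check_bridge_config(data_port, bridge_mappings):
    """Return a list of human-readable inconsistencies between the two settings."""
    # data-port is 'bridge:nic', bridge-mappings is 'physnet:bridge'.
    port_bridges = {bridge for bridge, _ in parse_pairs(data_port)}
    mapped = [bridge for _, bridge in parse_pairs(bridge_mappings)]

    problems = []
    for bridge in sorted(port_bridges - set(mapped)):
        problems.append(f"{bridge} has a data-port but no bridge-mapping")
    for bridge in sorted(set(mapped) - port_bridges):
        problems.append(f"{bridge} has a bridge-mapping but no data-port")
    for bridge in sorted({b for b in mapped if mapped.count(b) > 1}):
        problems.append(f"{bridge} is mapped by more than one physical network")
    return problems


if __name__ == "__main__":
    for line in check_bridge_config(
        "br-ex:eth1 br-2:eth2 br-3:eth3 br-4:eth4",
        "physnet1:br-ex net2:br-3 net3:br-3 net4:br-4",
    ):
        print(line)
```

Run against the values quoted above, this sketch reports that br-2 has a data-port but no bridge-mapping and that br-3 is mapped by more than one physical network, which may or may not have been intended.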

We also edited the database directly in mysql to give the networks a network_type of flat and to set physical_network to physnet1 or whichever name each network needed.

We expected this to reconfigure the networks correctly. Instead, the 'old' interface (e.g. eth3) was left in br-ex alongside the newly added eth1. We probably should have created an entirely new bridge rather than re-using br-ex. The two interfaces in the bridge caused some kind of storm and the entire physical network was saturated.

While this is clearly a design fault from the deployment point of view, it would be good, firstly, to have some massive warning flags about these older configs breaking on upgrade. Secondly, if an interface is in a bridge where it shouldn't be, it would be good to remove it rather than leave it there.

Ryan Beisner (1chb1n) on 2018-12-20
Changed in charm-neutron-gateway:
importance: Undecided → High
milestone: none → 19.04
assignee: nobody → Frode Nordahl (fnordahl)

On Wed, Dec 19, 2018 at 8:40 PM Xav Paice <email address hidden> wrote:

> After some time, we discovered that these deprecated options should have
> been changed out, so we reset external-network-id, and external-port to
> default and configured all 4 gateway applications

The options should still work even though they are deprecated. If they are
in config.yaml and not removed they should still work. If they don't that's
a bug in the charm.

It would be nice to have a formal way to run a health check on config
options prior to charm upgrade. For example, if ext-port had been removed in
a charm release, you'd be able to run the check and find out before upgrading.
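One way such a pre-upgrade check could look is sketched below. This is only an illustration under stated assumptions: `check_config_options` is a hypothetical helper, the option definitions are passed in as a plain dict shaped like the `options` section of a charm's config.yaml, and spotting deprecation by looking for the word "deprecated" in the description is a heuristic, not something the charm tooling is known to provide.

```python
def check_config_options(charm_options, user_config):
    """Return (option, reason) pairs for user-set options that need attention.

    charm_options: {name: {"default": ..., "description": ...}} for the
                   charm revision being upgraded to (assumed shape).
    user_config:   {name: value} for options the operator has set.
    """
    findings = []
    for name in sorted(user_config):
        spec = charm_options.get(name)
        if spec is None:
            # The option no longer exists in the target revision.
            findings.append((name, "not defined by this charm revision"))
            continue
        description = str(spec.get("description", "")).lower()
        if "deprecated" in description and user_config[name] != spec.get("default"):
            findings.append((name, "deprecated but still set"))
    return findings


if __name__ == "__main__":
    # Illustrative data only; not the real neutron-gateway config.yaml.
    target = {
        "data-port": {"default": None, "description": "Bridge to port mapping."},
        "external-network-id": {"default": None, "description": "DEPRECATED option."},
    }
    for name, reason in check_config_options(
        target, {"ext-port": "eth3", "external-network-id": "foo"}
    ):
        print(f"{name}: {reason}")
```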

Changed in charm-neutron-gateway:
assignee: Frode Nordahl (fnordahl) → Alex Kavanagh (ajkavanagh)
Changed in charm-deployment-guide:
status: New → In Progress
assignee: nobody → Alex Kavanagh (ajkavanagh)
importance: Undecided → Medium
Alex Kavanagh (ajkavanagh) wrote :

So the actual neutron-gateway code doesn't attempt to delete existing bridges, i.e. if any of the parameters changes a mapping, the old mappings are left in place and the changes to the config are added via ovs. The question is whether the neutron-gateway charm should enumerate what is currently configured in ovs (for example), see what the config is, diff that, and then take the appropriate action to remove bridge/port mappings that are no longer configured.
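The enumerate-and-diff idea can be sketched as a pure function over two bridge-to-ports maps. This is a minimal illustration, not the charm's implementation: `diff_bridge_ports` is a hypothetical name, and gathering the current state (for instance from `ovs-vsctl list-br` and `ovs-vsctl list-ports`) is left out. A real implementation would also need to avoid removing ports it does not own, such as the patch ports that link bridges together.

```python
def diff_bridge_ports(current, desired):
    """Compare two {bridge: set_of_ports} maps.

    Returns (to_add, to_remove), each a sorted list of (bridge, port) tuples.
    """
    current_pairs = {(b, p) for b, ports in current.items() for p in ports}
    desired_pairs = {(b, p) for b, ports in desired.items() for p in ports}
    to_add = sorted(desired_pairs - current_pairs)
    to_remove = sorted(current_pairs - desired_pairs)
    return to_add, to_remove


if __name__ == "__main__":
    # The situation from the report, as seen on a gateway3 unit:
    # eth3 is still in br-ex from the old external-port setting.
    current = {"br-ex": {"eth3"}}
    desired = {
        "br-ex": {"eth1"},
        "br-2": {"eth2"},
        "br-3": {"eth3"},
        "br-4": {"eth4"},
    }
    to_add, to_remove = diff_bridge_ports(current, desired)
    print("add:", to_add)
    print("remove:", to_remove)
```

For the gateway3 example, the only removal is eth3 from br-ex, which is the stale port that was left behind in the report.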

Changed in charm-neutron-gateway:
status: New → Confirmed
Frode Nordahl (fnordahl) wrote :
Changed in charm-deployment-guide:
status: In Progress → Fix Committed
David Ames (thedac) on 2019-04-17
Changed in charm-neutron-gateway:
milestone: 19.04 → 19.07
Alex Kavanagh (ajkavanagh) wrote :

Having discussed this bug in the OpenStack team, we've come to the following conclusion:

"Re-write the config-changed hook to reflect the options

i.e. delete bridge mappings/ports that no longer exist and add port mappings that now do exist. This would mean that any additional configuration done directly on the unit (for example by installers to create networking situations that the charm config doesn't handle) would be deleted/broken on the next config-changed hook for any config item.

This ensures that the charm is in full control of the OVS bridges, and that any tinkering is transient and will potentially be scrubbed; OVS is sufficiently verbose in the data you can get to be able to manage this effectively - we could even add extra data to the ports we add so it makes it easier to discover what the charm did vs anything done outside of the charm operations."
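The "extra data on the ports" idea could be realised with the `external_ids` column that OVS keeps on each Port record. The sketch below only builds `ovs-vsctl` argument lists and does not run them. The marker key and value (`managed-by`, `neutron-gateway-charm`) and the function names are assumptions made for illustration, not something the charm is known to use.

```python
MARKER_KEY = "managed-by"               # hypothetical external_ids key
MARKER_VALUE = "neutron-gateway-charm"  # hypothetical marker value


def add_port_cmd(bridge, port):
    """Add a port to a bridge and tag it as charm-managed in one transaction."""
    return [
        "ovs-vsctl", "--may-exist", "add-port", bridge, port,
        "--", "set", "Port", port,
        f"external_ids:{MARKER_KEY}={MARKER_VALUE}",
    ]


def del_port_cmd(bridge, port):
    """Remove a port from a bridge, tolerating a port that is already gone."""
    return ["ovs-vsctl", "--if-exists", "del-port", bridge, port]


def find_marked_ports_cmd():
    """List the names of ports carrying the charm's marker."""
    return [
        "ovs-vsctl", "--columns=name", "find", "Port",
        f"external_ids:{MARKER_KEY}={MARKER_VALUE}",
    ]


if __name__ == "__main__":
    print(" ".join(add_port_cmd("br-ex", "eth1")))
    print(" ".join(del_port_cmd("br-ex", "eth3")))
    print(" ".join(find_marked_ports_cmd()))
```

Building the commands separately from running them keeps the logic easy to test, and a caller could pass each list to `subprocess.check_call` on a unit where OVS is installed.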
