Disabling/enabling networks in neutron causes traffic loop with linuxbridge agent

Bug #1888666 reported by Sebastian Lohff
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
neutron | New | Undecided | Unassigned |

Bug Description

We observed traffic looping between two linuxbridge agents after a user disabled and then re-enabled a network. Disabling a network causes all vethX interfaces to be removed from the bridge, but the physical interface remains in the bridge. In a hierarchical port binding setup, disabling the network also causes the segment that the network agent is on to be released. When the network is re-enabled, a new VLAN ID may be allocated for that segment, and the physical interface is then added to the bridge with the new VLAN ID while the old one stays. With two network agents each bridging these two different VLANs, we get a loop.
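To make the sequence above concrete, here is a toy model of a single network bridge across a disable/enable cycle. It is a hypothetical sketch, not neutron code: the `Bridge` class, its methods, the bridge name and the `eth1` physical device are all made up for illustration, and it assumes the agent's usual `<physdev>.<vlan>` naming for VLAN subinterfaces.

```python
"""Toy model of one linuxbridge network bridge across a network disable/enable.

Hypothetical: class, method and device names are illustrative only, not neutron code.
"""
from dataclasses import dataclass, field


@dataclass
class Bridge:
    name: str
    physdev: str  # physical NIC the VLAN subinterfaces are created on (assumed)
    members: set = field(default_factory=set)

    def plug_segment(self, vlan_id: int) -> None:
        # The agent adds the VLAN subinterface for the current segment, e.g. "eth1.100".
        self.members.add(f"{self.physdev}.{vlan_id}")

    def plug_port(self, dev: str) -> None:
        # vethX/tap device for a port on this network.
        self.members.add(dev)

    def disable_network(self) -> None:
        # As reported: only the port devices are removed; every VLAN
        # subinterface of the physical device stays in the bridge.
        self.members = {m for m in self.members if m.startswith(self.physdev + ".")}

    def physical_members(self) -> list:
        return sorted(m for m in self.members if m.startswith(self.physdev + "."))


br = Bridge("brq-example", "eth1")
br.plug_segment(100)          # segment bound with VLAN 100
br.plug_port("veth0")
br.disable_network()          # ports cleaned, eth1.100 left behind
br.plug_segment(200)          # re-enable: segment re-allocated with VLAN 200
br.plug_port("veth1")
print(br.physical_members())  # ['eth1.100', 'eth1.200']
```

The bridge now joins VLAN 100 and VLAN 200 at layer 2; if a second agent ends up in the same state, the two bridges form the loop described above.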

A quick mitigation is to identify the bridge that has two physical interfaces in it, determine which one is stale, and remove it.
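A rough helper for finding such bridges on an agent host might look like the following. It reads bridge membership from sysfs (`/sys/class/net/<bridge>/brif/`) and flags bridges with more than one uplink. Assumptions, not taken from the report: agent bridges are named with the `brq` prefix, port devices start with `tap` or `veth`, and the currently valid VLAN ID for the host's segment (`current_vlan`) is supplied by the operator. The helper names are hypothetical.

```python
import os

BRIDGE_PREFIX = "brq"            # assumed linuxbridge agent bridge-name prefix
PORT_PREFIXES = ("tap", "veth")  # assumed port devices; anything else counts as an uplink


def uplinks(members):
    """Return bridge members that look like physical/VLAN uplinks."""
    return sorted(m for m in members if not m.startswith(PORT_PREFIXES))


def suspicious_bridges(bridge_members):
    """Map bridge name -> uplinks, for bridges with more than one uplink."""
    result = {}
    for bridge, members in bridge_members.items():
        ups = uplinks(members)
        if len(ups) > 1:
            result[bridge] = ups
    return result


def stale_uplinks(ups, current_vlan):
    """Uplinks whose VLAN suffix differs from the currently bound segment's VLAN."""
    return [u for u in ups if u.rsplit(".", 1)[-1] != str(current_vlan)]


def read_bridges(sys_net="/sys/class/net"):
    """Read bridge membership from sysfs: /sys/class/net/<bridge>/brif/<port>."""
    bridges = {}
    if not os.path.isdir(sys_net):
        return bridges
    for name in os.listdir(sys_net):
        brif = os.path.join(sys_net, name, "brif")
        if name.startswith(BRIDGE_PREFIX) and os.path.isdir(brif):
            bridges[name] = os.listdir(brif)
    return bridges


if __name__ == "__main__":
    for bridge, ups in sorted(suspicious_bridges(read_bridges()).items()):
        print(f"{bridge}: {', '.join(ups)}")
```

Once the stale interface is known (e.g. `stale_uplinks(["eth1.100", "eth1.200"], 200)` gives `["eth1.100"]`), it can be detached with `ip link set dev <iface> nomaster`. The script only reports; it deliberately leaves the removal to the operator.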

Tested with Neutron Queens. The network was disabled and re-enabled via openstack network set --disable $uuid and openstack network set --enable $uuid.

Tags: linuxbridge
sean mooney (sean-k-mooney) wrote :

What release of OpenStack are you using?

I have not found the code path for disabling a network, but assuming it just sets the admin state of all the ports to down,
that has not removed the interface from the Linux bridge since
https://review.opendev.org/#/c/193485/22, which merged in Liberty 5 years ago.
https://github.com/openstack/neutron/blob/f8b990736ba91af098e467608c6dfa0b801ec19c/neutron/plugins/ml2/drivers/agent/_common_agent.py#L259-L296

Setting the admin state down simply sets the link state down.

If disabling the network (which sets the admin state of the network down) does something other than set the admin state of all the ports on that network down, perhaps there is a race? But if it's just setting the ports down, I don't see how that would cause a loop, since that would just set the link state down.

Changed in neutron:
status: New → Incomplete
Johannes Kulik (jkulik) wrote :

Release: queens
command1: openstack network set --disable $uuid
command2: openstack network set --enable $uuid

Disabling the network reaches this code in the dhcp-agent:
https://github.com/openstack/neutron/blob/stable/queens/neutron/agent/dhcp/agent.py#L391-L396

This calls "disable" on the agent(s), which ends up here:
https://github.com/openstack/neutron/blob/stable/queens/neutron/agent/linux/dhcp.py#L254-L255

This, through
https://github.com/openstack/neutron/blob/stable/queens/neutron/agent/linux/dhcp.py#L260
comes to
https://github.com/openstack/neutron/blob/stable/queens/neutron/agent/linux/dhcp.py#L1584-L1585
which deletes the ports in
https://github.com/openstack/neutron/blob/stable/queens/neutron/api/rpc/handlers/dhcp_rpc.py#L237-L247

If there are no ports left in the network, in a hierarchical port-binding setup
the segment of the network can be freed again, as it's unused. When the network
is re-enabled, this can lead to a VLAN tag change, which the linuxbridge agent
does recognise, as it creates the new interface for it; it just doesn't remove
the old one.
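The release/re-allocation step can be illustrated with a small model of a dynamic VLAN segment. This is a hypothetical sketch, not neutron's type-driver code: the classes are made up, and the pool simply hands out the lowest free tag, which is enough to show that a released tag is not guaranteed to come back. Neutron's actual selection policy is not modelled; here another network grabbing the freed tag is what forces the change.

```python
"""Toy model of dynamic VLAN segment allocation in hierarchical port binding.

Hypothetical: not neutron code; the allocation policy (lowest free tag) is assumed.
"""


class VlanPool:
    def __init__(self, low, high):
        self.free = set(range(low, high + 1))

    def allocate(self):
        vlan = min(self.free)
        self.free.remove(vlan)
        return vlan

    def release(self, vlan):
        self.free.add(vlan)


class DynamicSegment:
    """Dynamic segment for one network on one physnet; lives while ports exist."""

    def __init__(self, pool):
        self.pool = pool
        self.ports = set()
        self.vlan = None

    def bind_port(self, port_id):
        # The first bound port triggers allocation of a VLAN for the segment.
        if self.vlan is None:
            self.vlan = self.pool.allocate()
        self.ports.add(port_id)
        return self.vlan

    def unbind_port(self, port_id):
        self.ports.discard(port_id)
        if not self.ports and self.vlan is not None:
            # Last port gone: the segment is unused and its VLAN is released.
            self.pool.release(self.vlan)
            self.vlan = None


pool = VlanPool(100, 102)
net_a = DynamicSegment(pool)
net_b = DynamicSegment(pool)

print(net_a.bind_port("dhcp-a"))  # 100
net_a.unbind_port("dhcp-a")       # network disabled: DHCP port deleted, VLAN 100 freed
print(net_b.bind_port("vm-b"))    # 100 (another network takes the freed tag)
print(net_a.bind_port("dhcp-a"))  # 101 (network re-enabled with a new tag)
```

On the agent side, the new tag means a new `<physdev>.101` subinterface is plugged into the network's bridge, while the old `<physdev>.100` subinterface is never unplugged.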

Johannes Kulik (jkulik)
Changed in neutron:
status: Incomplete → New