Neutron ml2 linux bridge agent fails to clean up bridges on high volumes of deletes
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
neutron | Fix Released | Low | Kevin Benton |
Bug Description
The neutron linux bridge ml2 plugin fails to clean up brq bridges on nodes hosting dhcp agents.
Environment:
Mirantis OpenStack 8 / Liberty
Linux Bridge ml2 plugin
vxlan segmentation
The workload on the environment involves large numbers of network and subnet creation and deletion over very short periods of time.
It appears that the issue arises from the network getting marked as deleted before the cleanup has an opportunity to take place.
An example log section:
2017-06-15 00:19:11.594 449231 DEBUG neutron.
2017-06-15 00:19:11.595 449231 ERROR neutron.
A collection of example bridges that failed to get cleaned up:
brqfed1d4bd-de 8000.92bb5a5feba5 no vxlan-77488
brqfeebf10d-df 8000.12e256acb4d7 no vxlan-72765
brqff281599-d2 8000.f26313c6ba80 no vxlan-66828
brqff2dca83-db 8000.c6f2263ea94d no vxlan-71111
brqff40fcfa-8e 8000.3621679c4a97 no vxlan-75989
brqff96bdd2-5c 8000.4efaf9b1ce1f no vxlan-69185
brqffe6412f-20 8000.ca59d8a1a1aa no vxlan-66860
brqffea1c9e-82 8000.8a917769c6e4 no vxlan-67359
This is fundamentally an issue with the network being deleted either before the agent had processed the port in that network or while the agent was offline.
We can adjust the logic to always try to delete the bridge associated with a deleted network's ID. However, that won't solve the case where bridges are left behind because a network was deleted while the agent was offline.
To address the offline case, I think you will always need to run a manual cleanup script unless we change the agent to try to delete bridges it doesn't recognize, but that seems risky.
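As a rough illustration of what such a manual cleanup script could look like, here is a minimal sketch. It relies on the linux bridge agent's naming convention, where the bridge name is "brq" followed by the first 11 characters of the network UUID (which matches the bridge names listed above). The function names, the dry_run flag, and the idea of passing in the bridge list and the network ID list are all assumptions made for this example and are not part of neutron. The caller is expected to gather the inputs separately, for instance bridge names from the host and the complete list of network IDs from the Neutron API.

```python
import subprocess

BRIDGE_PREFIX = "brq"
ID_LEN = 11  # characters of the network UUID kept in the bridge name


def bridge_name_for(network_id):
    """Return the bridge name expected for a given network UUID."""
    return BRIDGE_PREFIX + network_id[:ID_LEN]


def find_orphaned_bridges(bridge_names, active_network_ids):
    """Return brq bridges that match no active network, sorted by name.

    Interfaces without the brq prefix are ignored, so unrelated
    bridges on the host are never reported.
    """
    expected = {bridge_name_for(net_id) for net_id in active_network_ids}
    return sorted(
        name for name in bridge_names
        if name.startswith(BRIDGE_PREFIX) and name not in expected
    )


def delete_bridges(names, dry_run=True):
    """Delete each bridge with `ip link delete`; only print when dry_run."""
    for name in names:
        cmd = ["ip", "link", "delete", name, "type", "bridge"]
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

The list of active network IDs has to be complete for the whole deployment, otherwise bridges for live networks would be reported as orphaned. Running with dry_run=True first and reviewing the output is the safer choice. Any leftover vxlan-NNNNN interfaces attached to those bridges would need to be removed separately.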