Possible race condition when port unplugged from ovs
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
neutron | Fix Released | Medium | Arnaud Morin |
Bug Description
A race condition can occur when nova unplugs a port from the integration bridge (e.g. when shelving an instance).
On such an action, openvswitch sends 2 events to neutron:
2022-10-07 01:00:31.790 214734 DEBUG neutron.
2022-10-07 01:00:32.179 214734 DEBUG neutron.
Now imagine that the second event is delayed, so neutron iteration will consider the first event as a port update and will:
- check the ofport --> -1
- put the port in skipped_devices
- put the port DOWN in DB through RPC call
- remove the port from "current" ports
- do nothing else, so the port is still configured: openflow rules, etc. stay in place
Then, on the next iteration, when the "delete" event is received, the agent will:
- try to figure out if this port is configured by looking in "current"
- not find it there, so it does nothing
As a result, the port stays configured on the compute. Some openflow rules are left over.
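
The two iterations described above can be illustrated with a small toy model. This is only a sketch under assumptions, not neutron's actual agent code: the `ToyAgent` class, its `current`, `flows` and `db_status` attributes, the `run` helper and the `tap0` port name are all hypothetical stand-ins for the agent state, the installed openflow rules and the RPC status update.

```python
# Toy model of the race; every name here is hypothetical, not neutron's API.
INVALID_OFPORT = -1


class ToyAgent:
    def __init__(self):
        self.current = set()   # ports the agent believes it manages
        self.flows = {}        # port -> stand-in for installed openflow rules
        self.db_status = {}    # port -> status that would be sent over RPC

    def plug(self, port):
        self.current.add(port)
        self.flows[port] = ["rule-for-" + port]
        self.db_status[port] = "ACTIVE"

    def iterate(self, events, ofports):
        """Process one batch of (kind, port) events; return skipped ports."""
        skipped = set()
        for kind, port in events:
            if kind == "update":
                if ofports.get(port, INVALID_OFPORT) == INVALID_OFPORT:
                    skipped.add(port)             # put in skipped_devices
                    self.db_status[port] = "DOWN"  # port DOWN in DB
                    self.current.discard(port)     # removed from "current"
                    # nothing else happens: the flows are left untouched
            elif kind == "delete":
                if port in self.current:
                    self.flows.pop(port, None)     # normal cleanup path
                    self.current.discard(port)
                # port not in "current": the agent does nothing
        return skipped


def run(batches):
    agent = ToyAgent()
    agent.plug("tap0")
    ofports = {"tap0": INVALID_OFPORT}  # unplugged, but still in the ovs db
    for events in batches:
        agent.iterate(events, ofports)
    return agent


# Delayed "delete": the update is handled alone in the first iteration.
raced = run([[("update", "tap0")], [("delete", "tap0")]])
print(raced.flows)   # the stale rule for tap0 is still present

# For contrast: the delete is seen while the port is still in "current".
clean = run([[("delete", "tap0")]])
print(clean.flows)   # empty, the cleanup path ran
```

In the raced case the port is already gone from `current` when the delete arrives, so the cleanup branch is never taken, which mirrors the leftover openflow rules seen on the compute.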
Note that I am running neutron Stein with openvswitch 2.11.4.
I also checked that the 2 events are received on neutron Victoria with openvswitch 2.15.0.
Note also that the race condition is very rare and difficult to reproduce, because the port needs to be removed from br-int, but still in ovs db with of_port=-1.
This can also happen on port detach (I reproduced it by attaching and detaching a port in a loop).