[ovn] Stale ports in the OVN database at churn
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
neutron | Fix Released | Undecided | Unassigned |
Bug Description
There are situations where, under heavy control plane activity, OVN ports become stale and are never cleaned up (unless the neutron-ovn-db-sync tool is run manually).
A possible scenario for this is:
a) Port creation
a.1) Port created in Neutron DB
a.2) Port created in the OVN Northbound (NB) database
a.3) The NB ovsdb-server notifies all connected workers of the port creation
a.4) Each worker eventually processes this event and updates its in-memory copy of the NB database
Immediately afterwards, the port is deleted via the API, but step a.4) has not yet been completed by all workers. The port deletion API request then lands on one of the workers that has not yet updated its in-memory copy of the OVN NB database with the newly created port.
b) Port deletion
b.1) Port deleted from Neutron DB
b.2) Port deletion is attempted in OVN NB, but the lookup fails and the port's revision number is deleted [0]
At this point, the port remains stale in the OVN database forever. This causes other issues that we have mitigated (e.g. [1]), but ultimately the number of stale OVN resources may grow to the point where it severely degrades overall cluster stability and performance.
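The sequence above can be reproduced with a small toy model. This is only a sketch of the race as described in this report: the `NorthboundDB` and `Worker` classes, the `pending` event queue and the `revisions` dict are hypothetical stand-ins, not Neutron or OVN code.

```python
# Toy model of the race described above. All names here are hypothetical;
# none of these classes exist in Neutron or OVN.

class NorthboundDB:
    """Stand-in for the OVN NB database (the authoritative copy)."""

    def __init__(self):
        self.ports = set()
        self.workers = []

    def add_port(self, port_id):
        self.ports.add(port_id)
        # a.3) notify every connected worker of the creation
        for worker in self.workers:
            worker.pending.append(("create", port_id))

    def del_port(self, port_id):
        self.ports.discard(port_id)
        for worker in self.workers:
            worker.pending.append(("delete", port_id))


class Worker:
    """Stand-in for an API worker with an in-memory copy of the NB DB."""

    def __init__(self, nb):
        self.nb = nb
        self.cache = set()   # in-memory copy of the NB ports
        self.pending = []    # notifications not yet processed
        nb.workers.append(self)

    def process_events(self):
        # a.4) apply queued notifications to the in-memory copy
        for kind, port_id in self.pending:
            if kind == "create":
                self.cache.add(port_id)
            else:
                self.cache.discard(port_id)
        self.pending.clear()

    def create_port(self, neutron_db, revisions, port_id):
        neutron_db.add(port_id)        # a.1)
        revisions[port_id] = 1
        self.nb.add_port(port_id)      # a.2) and a.3)

    def delete_port(self, neutron_db, revisions, port_id):
        neutron_db.discard(port_id)    # b.1)
        if port_id in self.cache:      # b.2) lookup in the in-memory copy
            self.nb.del_port(port_id)
        # The revision number is dropped even when the lookup failed.
        revisions.pop(port_id, None)


def simulate_race():
    nb = NorthboundDB()
    neutron_db, revisions = set(), {}
    w1, w2 = Worker(nb), Worker(nb)

    w1.create_port(neutron_db, revisions, "port-1")
    w1.process_events()                # w1 has caught up, w2 has not
    w2.delete_port(neutron_db, revisions, "port-1")
    w2.process_events()                # too late: the delete was skipped

    # Ports present in OVN NB but no longer known to Neutron.
    return nb.ports - neutron_db


stale = simulate_race()
```

In this model `stale` ends up containing `"port-1"`: the port is gone from the Neutron side and has no revision entry left, yet it still exists in the NB stand-in.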
A potential workaround is to run the neutron-ovn-db-sync tool periodically to remove these stale ports, but doing so while the API is operational is not recommended.
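For illustration, detecting stale ports amounts to a set difference between what OVN holds and what Neutron knows about. The sketch below is not how neutron-ovn-db-sync is implemented; `find_stale_ovn_ports` is a hypothetical helper, and it assumes each OVN port is named after the ID of its Neutron port. It only reports candidates and deletes nothing, since a port that is still being created could show up as a false positive.

```python
def find_stale_ovn_ports(neutron_port_ids, ovn_port_names):
    """Return OVN port names that have no matching Neutron port.

    Assumption: an OVN port's name equals the ID of its Neutron port.
    The result is sorted so that reports are stable between runs.
    """
    return sorted(set(ovn_port_names) - set(neutron_port_ids))


# Hypothetical snapshot of both databases.
neutron_ports = ["p1", "p2"]
ovn_ports = ["p1", "p2", "p3"]
stale_candidates = find_stale_ovn_ports(neutron_ports, ovn_ports)
```

Here `stale_candidates` is `["p3"]`, the only port present on the OVN side alone.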
[0] https:/
[1] https:/
tags: added: ovn
Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/827834