[OVN] Stale ports can be present in OVN NB leading to metadata errors
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
neutron | Fix Released | High | Lucas Alvares Gomes |
Bug Description
Right now, there's a chance that deleting a port in Neutron with ML2/OVN removes the object from the Neutron DB while leaving a stale port in the OVN NB database.
This can happen when deleting a port [0] raises a RowNotFound exception. While this may suggest that the port no longer existed in OVN NB, the truth is that the current port_delete function can raise that exception for other reasons (especially against OVN < 2.10, when Address Sets were used instead of Port Groups).
Such an exception can be raised, for example, if some ACL or Address Set doesn't exist [1][2], among other cases. When that happens, the revision number of the object is deleted anyway [3], and the port remains stale in OVN NB forever (the maintenance task skips it).
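The failure mode above can be illustrated with a small, self-contained model. This is a minimal sketch, not the actual ML2/OVN code: the `RowNotFound` class, the `FakeOvnNb` in-memory tables and the `delete_port_buggy` function are hypothetical stand-ins. It shows how treating any RowNotFound as "the port is already gone" drops the revision number while the port survives in OVN NB.

```python
class RowNotFound(Exception):
    """Hypothetical stand-in for the OVSDB 'row not found' error."""


class FakeOvnNb:
    """Hypothetical in-memory model of the OVN NB tables involved."""

    def __init__(self):
        self.ports = {}  # port_id -> logical switch port data
        self.acls = {}   # port_id -> list of ACL names

    def delete_port(self, port_id):
        # Modeled as one transaction: every lookup happens before any
        # row is removed, so a single missing row aborts everything.
        if port_id not in self.ports:
            raise RowNotFound(f"Logical_Switch_Port {port_id}")
        if port_id not in self.acls:
            raise RowNotFound(f"ACLs for port {port_id}")
        del self.acls[port_id]
        del self.ports[port_id]


def delete_port_buggy(nb, neutron_db, revision_numbers, port_id):
    """Hypothetical model of the problematic delete path."""
    neutron_db.pop(port_id, None)
    try:
        nb.delete_port(port_id)
    except RowNotFound:
        # Assumes the port was already gone, which is not always true:
        # here the error may come from a missing ACL instead.
        pass
    # The revision number is deleted unconditionally, so the maintenance
    # task no longer tracks this port.
    revision_numbers.pop(port_id, None)


nb = FakeOvnNb()
nb.ports["p1"] = {"ip": "10.0.0.5"}  # the ACL entry for p1 is missing
neutron_db = {"p1": {"ip": "10.0.0.5"}}
revision_numbers = {"p1": 3}

delete_port_buggy(nb, neutron_db, revision_numbers, "p1")
print("p1" in neutron_db, "p1" in nb.ports, "p1" in revision_numbers)
# → False True False
```

The port is gone from the Neutron DB and from the revision numbers, yet it is still present in the modeled OVN NB: exactly the stale state described in this report.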
One of the main impacts of this issue is that the OVN NB database will grow and accumulate stale objects that go undetected (they'll be detected by the neutron-
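Because Neutron releases the deleted port's IP address, that address can be allocated to a new port in the same network while the stale OVN NB port still carries it. As per the metadata agent code [4], if more than one port in the same network has the same IP address, a 404 is returned to the instance when it requests metadata.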
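The metadata lookup described above can be sketched as follows. This is a simplified model, not the metadata agent's actual code: the `Port` tuple and the `lookup_metadata_port` function are hypothetical, and returning 404 for zero matches as well is an assumption of the sketch. It shows why a stale port sharing an IP with a live port in the same network turns metadata requests from that IP into 404s.

```python
from typing import NamedTuple, Optional, Tuple


class Port(NamedTuple):
    port_id: str
    network_id: str
    ip_address: str


def lookup_metadata_port(ports, network_id, ip_address) -> Tuple[int, Optional[Port]]:
    """Return (HTTP status, port) for a metadata request (hypothetical)."""
    matches = [
        p for p in ports
        if p.network_id == network_id and p.ip_address == ip_address
    ]
    if len(matches) != 1:
        # Zero or several candidate ports: the requesting instance cannot
        # be identified unambiguously, so the request is rejected.
        return 404, None
    return 200, matches[0]


ports = [
    Port("stale-port", "net1", "10.0.0.5"),  # left behind in OVN NB
    Port("new-port", "net1", "10.0.0.5"),    # new port reusing the IP
]
print(lookup_metadata_port(ports, "net1", "10.0.0.5"))
# → (404, None)
```

Removing the stale entry from `ports` makes the same request resolve to `new-port` with a 200.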
The workaround is to run the neutron-db-sync script in repair mode to remove the stale ports.
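Conceptually, the repair step boils down to comparing the ports known to Neutron with those present in OVN NB and deleting the difference. The sketch below only illustrates that idea under simplifying assumptions (ports are plain IDs, OVN NB is a dict); `find_stale_ports` and `sync_ports` are hypothetical helpers, not the sync script's actual code or options.

```python
import logging

LOG = logging.getLogger(__name__)


def find_stale_ports(neutron_port_ids, ovn_nb_port_ids):
    """Return IDs present in OVN NB but unknown to the Neutron DB."""
    return sorted(set(ovn_nb_port_ids) - set(neutron_port_ids))


def sync_ports(neutron_port_ids, ovn_nb_ports, repair=False):
    """Report stale ports and, only in repair mode, delete them."""
    stale = find_stale_ports(neutron_port_ids, ovn_nb_ports)
    for port_id in stale:
        LOG.warning("Port %s found in OVN NB but not in Neutron", port_id)
        if repair:
            del ovn_nb_ports[port_id]
    return stale


ovn_nb_ports = {"p1": "10.0.0.5", "p2": "10.0.0.6", "stale": "10.0.0.5"}
print(sync_ports({"p1", "p2"}, ovn_nb_ports))               # detect only
# → ['stale']
print(sync_ports({"p1", "p2"}, ovn_nb_ports, repair=True))  # detect and delete
# → ['stale']
print(sorted(ovn_nb_ports))
# → ['p1', 'p2']
```

Without repair mode the stale port is only reported, which matches why such objects can be detected yet remain in the database.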
A proper fix would involve finer-grained handling of the exceptions that can occur around a port deletion, acting accordingly on each of them. In the worst case, we won't delete the revision number if the port still exists, leaving it up to the maintenance task to fix it later on (< 5 minutes). Ideally, we should identify all possible code paths and delete the port from OVN whenever possible, even if some other associated operation fails (with proper logging).
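One possible shape of the finer-grained handling proposed above is sketched below with an in-memory model. All names are hypothetical: `PortNotFound` and `AclNotFound` stand in for distinguishable error types, `fail_lsp_delete` simulates some other OVSDB failure, and `delete_port_fixed` is not the actual patch. The port row is deleted even when a related cleanup fails (with a log message), and the revision number is dropped only once the port is confirmed absent from OVN NB; otherwise it is kept so the maintenance task can retry.

```python
import logging

LOG = logging.getLogger(__name__)


class PortNotFound(Exception):
    """Hypothetical error: the Logical_Switch_Port row does not exist."""


class AclNotFound(Exception):
    """Hypothetical error: an associated ACL row does not exist."""


class FakeOvnNb:
    """Hypothetical in-memory model of the relevant OVN NB tables."""

    def __init__(self):
        self.ports = {}
        self.acls = {}
        self.fail_lsp_delete = False  # simulates an unrelated failure

    def delete_acls(self, port_id):
        if port_id not in self.acls:
            raise AclNotFound(port_id)
        del self.acls[port_id]

    def delete_lsp(self, port_id):
        if self.fail_lsp_delete:
            raise RuntimeError("transient OVSDB failure")
        if port_id not in self.ports:
            raise PortNotFound(port_id)
        del self.ports[port_id]


def delete_port_fixed(nb, neutron_db, revision_numbers, port_id):
    neutron_db.pop(port_id, None)
    try:
        nb.delete_acls(port_id)
    except AclNotFound:
        # A missing ACL must not prevent the port itself from being removed.
        LOG.warning("ACLs for port %s not found; deleting port anyway", port_id)
    try:
        nb.delete_lsp(port_id)
    except PortNotFound:
        # Only this specific error means the port is already gone.
        LOG.info("Port %s already absent from OVN NB", port_id)
    except Exception:
        # Any other failure is logged; the check below decides what to do.
        LOG.exception("Failed to delete port %s from OVN NB", port_id)
    if port_id not in nb.ports:
        # Port confirmed absent: it is safe to stop tracking it.
        revision_numbers.pop(port_id, None)
    # Otherwise the revision number is kept so the maintenance task
    # can detect the inconsistency and retry the deletion.


nb = FakeOvnNb()
nb.ports["p1"] = {"ip": "10.0.0.5"}  # the ACL entry for p1 is missing
neutron_db = {"p1": {"ip": "10.0.0.5"}}
revision_numbers = {"p1": 3}

delete_port_fixed(nb, neutron_db, revision_numbers, "p1")
print("p1" in neutron_db, "p1" in nb.ports, "p1" in revision_numbers)
# → False False False
```

Compared with treating every RowNotFound the same way, the decision to drop the revision number here depends on the observed state of OVN NB rather than on which exception happened to be raised.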
Also, this scenario seems more likely under high concurrency of API operations (such as those issued by Heat), and possibly when Port Groups are not supported by the schema (OVN < 2.10).
Danie Alvarez
[0] https:/
[1] https:/
[2] https:/
[3] https:/
[4] https:/
tags: added: ovn
Changed in neutron:
assignee: nobody → Maciej Jozefczyk (maciej.jozefczyk)
importance: Undecided → High
tags: added: neutron-proactive-backport-potential
Changed in neutron:
status: In Progress → Fix Released
Fix proposed to branch: master (review.opendev.org/722789)
Review: https:/