neutron

Bug #1960006
Comment #18

Frode Nordahl (fnordahl) wrote on 2023-03-02:

@Daniel

Thank you for responding. I'd have to check in more detail whether stale ports are left around indefinitely, but I know that the operators of the CI cloud in question have to periodically run a script to remove stale ports, which may suggest that some resources are missed. I'll endeavor to check.

More importantly, I disagree with the premise of the current approach. You can technically boot an instance that uses a resource which currently has duplicates in the OVN DB, but the fact that the instance has no north/south (N/S) connectivity for a long period of time makes that point moot.

The stale resource most likely belonged to a deleted VM, as you point out, but that does not matter when Neutron allocates the same IP to a new instance before the stale resource has been cleaned out.

For whatever deploys the instance, let's say a functional test executor, this just becomes a weird failure scenario. It can talk to the instance, but the instance cannot reach the internet to install packages, download OCI images or anything like that.

Even if the situation were corrected after 5 minutes, that would not solve the problem. The whole test execution has most likely stalled or failed long before those 5 minutes have passed, and as a consequence the cloud is perceived as unreliable.

I'm not quite sure which negative effects you think adding a pre-flight check would cause.

If you are referring to the fact that a negative response from the API would cause issues for the caller, I actually think that would be better than provisioning something that does not actually work. The caller can always retry the API request.

That said, I would hope for the code change to amend the situation rather than return a negative response.
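To make the two options concrete, here is a minimal sketch of such a pre-flight check, assuming a simplified in-memory model rather than the real OVN DB. `OvnRecord`, `Mode`, `StaleResourceConflict` and `preflight_check` are hypothetical names for illustration only, not Neutron or ovsdbapp APIs. A record counts as stale if it references the IP about to be allocated while its owning Neutron port no longer exists. In AMEND mode the stale rows are dropped before the IP is handed out; in REJECT mode the request fails and the caller can retry.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    AMEND = "amend"    # remove stale duplicates, then proceed
    REJECT = "reject"  # refuse the request so the caller can retry later


class StaleResourceConflict(Exception):
    """Hypothetical error surfaced to the API caller in REJECT mode."""


@dataclass(frozen=True)
class OvnRecord:
    # Hypothetical stand-in for an OVN DB row that references an IP address.
    uuid: str
    ip: str
    neutron_port_id: str


def preflight_check(ovn_records, live_port_ids, ip, mode=Mode.AMEND):
    """Return the records that remain after checking `ip` for stale duplicates.

    A record is stale if it references `ip` but its owning Neutron port
    is no longer in `live_port_ids` (e.g. it belonged to a deleted VM).
    """
    stale = [r for r in ovn_records
             if r.ip == ip and r.neutron_port_id not in live_port_ids]
    if not stale:
        return list(ovn_records)
    if mode is Mode.REJECT:
        raise StaleResourceConflict(
            f"{len(stale)} stale OVN record(s) still reference {ip}")
    # AMEND: drop the stale rows before the IP is reused by a new port.
    stale_uuids = {r.uuid for r in stale}
    return [r for r in ovn_records if r.uuid not in stale_uuids]


# Usage: 10.0.0.5 is still referenced by a port whose VM was deleted.
records = [
    OvnRecord("a1", "10.0.0.5", "port-deleted-vm"),
    OvnRecord("b2", "10.0.0.6", "port-live"),
]
live_ports = {"port-live"}
remaining = preflight_check(records, live_ports, "10.0.0.5")
```

Amending is the friendlier default since the caller never sees the problem; rejecting is the fallback when cleanup cannot safely happen inline, and it at least fails loudly instead of provisioning an instance without N/S connectivity.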