I found another cause for this issue that hasn't been discussed so far. I'm noting it here for future travellers who find this bug and happen to have a workflow that re-uses a neutron port on another instance.
We had a workflow like this:
- Create a static neutron port that is re-used
- Create an instance using that port
- Delete the instance
- Loop waiting for neutron to show the port's device_id has been cleared to know we can re-use it
- Create another instance using the same port
The problem is that the neutron.ports table entry is updated with device_id='', status='DOWN' before the entry in nova.virtual_interfaces is marked deleted.
So there is a small window of time, a few seconds or more, in which you can create a virtual machine using the now-unassigned neutron port, but nova.virtual_interfaces still has a conflicting entry that hasn't been cleared yet, and that causes the server creation to fail.
It seems that nova.instances vm_state='deleted' and deleted=instances.id are set after the virtual_interfaces entry is deleted, so as a workaround in this exact workflow you can wait for the previous instance to be fully deleted instead of waiting on the port. This is probably still a bug, though, in that the virtual_interfaces entry should probably be cleared before the neutron port is released.
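As a rough sketch of that workaround, the loop can poll for the old instance being gone rather than for the port's device_id being cleared. The helper below is generic and uses only the standard library. The `server_is_gone` predicate in the comment is hypothetical and not part of any OpenStack library; how you implement it (for example, by checking that a server lookup no longer finds the instance) is an assumption you would need to verify against your own client.

```python
import time
from typing import Callable


def wait_until(
    predicate: Callable[[], bool],
    timeout: float = 60.0,
    interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll predicate() until it returns True or the timeout expires.

    Returns True if the predicate succeeded, False on timeout.
    The predicate is always evaluated at least once.
    """
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Hypothetical usage: server_is_gone() is a placeholder that should return
# True only once nova reports the previous instance as deleted.
#
#   if not wait_until(server_is_gone, timeout=120, interval=2):
#       raise RuntimeError("old instance still present, not re-using port")
#   # ...only now create the next instance on the same port
```

If you drive this from the CLI instead, `openstack server delete --wait` blocks until the delete completes, which should serve the same purpose as the polling loop, assuming the API only reports the instance as gone after the database rows above have been updated.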
I haven't delved into the code to figure out how feasible that is. The same behaviour exists on ussuri and antelope.
For reference, the order of MySQL queries in my focal-ussuri test environment (some of this may be asynchronous and may vary depending on worker count, load, database size, etc.):
2024-03-15T06:47:07 # start: openstack server create --wait
2024-03-15T06:47:12.2 INSERT INTO nova.instances
2024-03-15T06:47:16.0 UPDATE neutron.ports SET device_id
2024-03-15T06:47:18.1 INSERT INTO nova.virtual_interfaces
2024-03-15T06:47:21.9 UPDATE neutron.ports SET status='ACTIVE'
2024-03-15T06:47:27 # finish: openstack server create --wait
2024-03-15T06:47:45 # openstack server delete
2024-03-15T06:47:45.1 UPDATE nova.instances SET task_state='deleting'
2024-03-15T06:47:46.1 UPDATE neutron.ports SET status='DOWN'
2024-03-15T06:47:47.1 UPDATE neutron.ports SET device_id=''
2024-03-15T06:47:48.3 UPDATE nova.virtual_interfaces SET deleted=id
2024-03-15T06:47:48.4 UPDATE nova.instances SET vm_state='deleted'
2024-03-15T06:47:48.9 UPDATE nova.instances SET deleted=instances.id
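To put numbers on the window in this particular trace, the timestamps above can be subtracted directly. This is only arithmetic on the three log lines from this one test run; the variable names are made up for illustration, and the gap will differ with load and environment.

```python
from datetime import datetime

# Timestamps copied from the query log above (single test run).
PORT_RELEASED = "2024-03-15T06:47:47.1"     # UPDATE neutron.ports SET device_id=''
VIF_DELETED = "2024-03-15T06:47:48.3"       # UPDATE nova.virtual_interfaces SET deleted=id
INSTANCE_DELETED = "2024-03-15T06:47:48.9"  # UPDATE nova.instances SET deleted=instances.id


def parse(ts: str) -> datetime:
    """Parse a log timestamp with fractional seconds."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%f")


# Time during which the port looks free but the stale
# virtual_interfaces row still exists.
race_window = (parse(VIF_DELETED) - parse(PORT_RELEASED)).total_seconds()

# Extra time you wait if you key off instance deletion instead of the port.
wait_for_instance = (parse(INSTANCE_DELETED) - parse(PORT_RELEASED)).total_seconds()

print(f"race window: {race_window:.1f}s")                     # race window: 1.2s
print(f"instance deleted after: {wait_for_instance:.1f}s")    # instance deleted after: 1.8s
```

In this run, waiting for the instance to be deleted costs about 0.6 seconds more than waiting for the port, and it lands after the virtual_interfaces row has been cleared.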