There is a race where nova-conductor will delete the instance mapping while nova-api is trying to update the queued_for_delete field for the instance mapping record. When that happens, nova-conductor deletes the instance mapping after nova-api has retrieved it for the intended update, and then nova-api fails with StaleDataError when it tries to save the instance mapping record to the database. We see the following log in screen-n-cond.txt[1]:
Jun 01 14:33:57.487787 ubuntu-bionic-rax-iad-0016890725 nova-conductor[14142]: DEBUG nova.conductor.manager [None req-e73643cb-efb2-445d-a6dc-5c6fd956c989 tempest-ServersNegativeTestJSON-1435542876 tempest-ServersNegativeTestJSON-1435542876] [instance: d87b9767-d6ac-4c23-ad5b-d1fd139f1662] While scheduling instance, the build request was already deleted. {{(pid=15387) schedule_and_build_instances /opt/stack/new/nova/nova/conductor/manager.py:1515}}
which triggers nova-conductor to delete the instance mapping [2]. Then we fail in the delete path while trying to update queued_for_delete [3].
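To make the interleaving concrete, here is a standalone SQLAlchemy sketch (illustrative only, not nova code; the model is a stand-in for the real instance_mappings table, and it assumes SQLAlchemy 1.4+). One session plays nova-api and another plays nova-conductor: the conductor's delete lands between the api's read and write, so the api's flush emits an UPDATE that matches zero rows and SQLAlchemy raises StaleDataError:

    from sqlalchemy import Boolean, Column, Integer, create_engine
    from sqlalchemy.orm import Session, declarative_base
    from sqlalchemy.orm.exc import StaleDataError

    Base = declarative_base()


    class InstanceMapping(Base):
        __tablename__ = 'instance_mappings'
        id = Column(Integer, primary_key=True)
        queued_for_delete = Column(Boolean, default=False)


    engine = create_engine('sqlite://')
    Base.metadata.create_all(engine)

    with Session(engine) as setup:
        setup.add(InstanceMapping(id=1))
        setup.commit()

    # "nova-api" reads the mapping record, intending to update it later.
    api = Session(engine, expire_on_commit=False)
    mapping = api.get(InstanceMapping, 1)
    api.commit()

    # "nova-conductor" deletes the mapping in between.
    with Session(engine) as conductor:
        conductor.delete(conductor.get(InstanceMapping, 1))
        conductor.commit()

    # nova-api now saves its update: the UPDATE matches 0 rows, so the
    # flush raises StaleDataError ("expected to update 1 row(s); 0 were
    # matched").
    mapping.queued_for_delete = True
    try:
        api.commit()
    except StaleDataError as exc:
        print(exc)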
I think we could fix this with a try-except to catch StaleDataError and then raise InstanceMappingNotFound to treat it as a missing instance mapping.
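A minimal sketch of that idea, assuming the translation happens where nova-api saves the mapping (the wrapper name is hypothetical, and the uuid kwarg is assumed from InstanceMappingNotFound's message format; the real patch would go in the save path itself):

    from sqlalchemy.orm import exc as orm_exc

    from nova import exception


    def save_mapping_translating_stale(instance_mapping):
        # Hypothetical wrapper around InstanceMapping.save(): if
        # nova-conductor deleted the mapping row between our read and this
        # write, the flush raises StaleDataError, which we re-raise as
        # InstanceMappingNotFound so callers handle it like any other
        # missing mapping.
        try:
            instance_mapping.save()
        except orm_exc.StaleDataError:
            raise exception.InstanceMappingNotFound(
                uuid=instance_mapping.instance_uuid)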
I found that this happens when a server is requested to be deleted while it's in the middle of booting, as seen in the ServersNegativeTestJSON code:
    @classmethod
    def resource_setup(cls):
        super(ServersNegativeTestJSON, cls).resource_setup()
        server = cls.create_test_server(wait_until='ACTIVE')
        cls.server_id = server['id']

        server = cls.create_test_server()
        cls.client.delete_server(server['id'])
        waiters.wait_for_server_termination(cls.client, server['id'])
        cls.deleted_server_id = server['id']
[1] https://zuul.opendev.org/t/openstack/build/58647aab9847469cb1dc474a7e7a1e6d/log/logs/screen-n-cond.txt#2379
[2] https://github.com/openstack/nova/blob/2061ce1125039f3595999457da3a6ad3c202ea2a/nova/conductor/manager.py#L1514-L1524
[3] https://github.com/openstack/nova/blob/2061ce1125039f3595999457da3a6ad3c202ea2a/nova/compute/api.py#L2423-L2434