It seems that when we fail, we see a log message like this:
2023-07-12T11:36:01.045934+0000 devstack0 nova-compute[358332]: DEBUG nova.compute.resource_tracker [None req-6880982f-bd20-4983-b212-4ce32465fdd7 None None] [instance: ad7997b9-1c2c-4776-a30b-0f2b62fc2222] Instance with task_state "spawning" is not being actively managed by this compute host but has allocations referencing this compute node (3698a4ab-4810-4beb-915a-ed3ed883a2e1): {'resources': {'DISK_GB': 1, 'MEMORY_MB': 256, 'PCPU': 1}}. Skipping heal of allocations during the task state transition. {{(pid=358332) _remove_deleted_instances_allocations /opt/stack/nova/nova/compute/resource_tracker.py:1717}}
That comment sounds suspiciously relevant to this bug. However, not being familiar with nova internals, I don't yet properly understand under what conditions we reach this code path or why it happens.
edit: This log message is not present every time the bug is present, so the above may be a false suspicion.
The log message originates from here:
https://opendev.org/openstack/nova/src/commit/6f56c5c9fd60ee1d53376a9100a9580cb2b38dc3/nova/compute/resource_tracker.py#L1707-L1730
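For what it's worth, the skip condition the log message describes can be sketched roughly as below. This is only an illustrative paraphrase of the linked `_remove_deleted_instances_allocations` behavior, not nova's actual code; the function name, parameters, and the set of "transitional" task states are hypothetical (the log only confirms that "spawning" is one of them):

```python
# Illustrative sketch, NOT nova's implementation: an instance that is not
# actively managed by this compute host, but still has placement allocations
# referencing this compute node, is skipped during allocation healing while
# its task_state indicates an in-flight transition.

# "spawning" is confirmed by the log message; any other members of this set
# are assumptions for illustration only.
TRANSITIONAL_TASK_STATES = {"spawning"}

def should_skip_heal(task_state, managed_by_this_host, has_allocations):
    """Return True if allocation healing should be skipped for this instance."""
    return (
        not managed_by_this_host
        and has_allocations
        and task_state in TRANSITIONAL_TASK_STATES
    )

# The situation from the log: task_state "spawning", instance not managed by
# this host, but allocations still reference this compute node.
print(should_skip_heal("spawning", managed_by_this_host=False,
                       has_allocations=True))  # → True
```

If this reading is right, the bug window would be the period where the instance's allocations exist but the host does not yet (or no longer) consider the instance its own, which might explain why the message is only sometimes present.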