Different instance states after the compute restart:
* ERROR: instance.host was already set in the DB, so the compute startup detects the instance and pushes it to the ERROR state.
* ACTIVE: either the instance had already spawned successfully before the compute was stopped, or the build request was still in flight in AMQP when the compute stopped.
* BUILD: the build request reached the compute before it was stopped, but instance.host was not set because the instance_claim did not finish before the compute stopped. When the compute starts again, it does not detect this instance because the instance is not assigned to its host.
There is a periodic job in the compute that puts instances into ERROR according to the instance_build_timeout config [1]. However, it also checks only instances assigned to the compute host, so it does not push the stuck instance to ERROR.
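
The gap can be illustrated with a small, self-contained model. This is only a sketch of the behaviour described above, not Nova's actual code: the Instance class, the check_build_timeout function, and the host and timeout values are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instance:
    # Hypothetical, simplified stand-in for an instance record.
    uuid: str
    vm_state: str          # "building", "active" or "error"
    host: Optional[str]    # stays None until the claim sets it
    age_seconds: int       # time since the build started


def check_build_timeout(instances: List[Instance], my_host: str,
                        timeout: int) -> List[Instance]:
    """Model of a per-host periodic job that errors out slow builds."""
    timed_out = []
    for inst in instances:
        # Only instances assigned to this compute host are considered.
        if inst.host != my_host:
            continue
        if inst.vm_state == "building" and inst.age_seconds > timeout:
            inst.vm_state = "error"
            timed_out.append(inst)
    return timed_out


instances = [
    Instance("claimed", "building", "compute1", 900),
    # The claim never finished, so host was never set.
    Instance("unclaimed", "building", None, 900),
]
errored = check_build_timeout(instances, "compute1", 600)
print([i.uuid for i in errored])   # ['claimed']
print(instances[1].vm_state)       # building
```

In this model the instance without a host is skipped by the host filter, so it stays in the building state no matter how long it has been there.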
[1] https://github.com/openstack/nova/blob/c18f7f47f628e266e5b69f4b9733a0f25ed4ffdd/nova/compute/manager.py#L1433