OpenStack Compute (nova)

Comment 1 for bug 1833581

Balazs Gibizer (balazs-gibizer) wrote on 2019-06-20:

Different instance states after the compute restart:

* ERROR: instance.host is already set in the DB, so the compute startup detects the instance and pushes it to the ERROR state.
* ACTIVE: either the instance had already spawned successfully before the compute was stopped, or the build request was still in flight in AMQP when the compute stopped.
* BUILD: the build request reached the compute before it was stopped, but instance.host was not set because the instance_claim did not finish before the compute stopped. When the compute starts again, it does not detect this instance because the instance is not assigned to its host.

There is a periodic job in the compute that puts instances into ERROR according to the instance_build_timeout config option [1]. However, it also only checks instances assigned to the compute host, so it does not push the stuck instance to ERROR.

[1] https://github.com/openstack/nova/blob/c18f7f47f628e266e5b69f4b9733a0f25ed4ffdd/nova/compute/manager.py#L1433
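
To illustrate why the stuck BUILD instance is missed, here is a minimal toy model of a host-filtered build timeout check. This is not nova's code: the Instance class, the find_timed_out_builds helper, and the host name "compute1" are hypothetical, and the only thing taken from the description above is that the check looks at instances assigned to the compute host.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Instance:
    # Hypothetical, simplified stand-in for an instance record.
    uuid: str
    vm_state: str
    host: Optional[str]  # None models "instance_claim never finished"
    created_at: datetime


def find_timed_out_builds(instances: List[Instance], host: str,
                          timeout_seconds: int,
                          now: datetime) -> List[Instance]:
    """Return building instances on `host` older than the timeout.

    Assumption: the check filters on the host first, as described above.
    """
    deadline = now - timedelta(seconds=timeout_seconds)
    return [
        inst for inst in instances
        if inst.host == host
        and inst.vm_state == "building"
        and inst.created_at < deadline
    ]


now = datetime(2019, 6, 20, 12, 0, 0)
one_hour_ago = now - timedelta(hours=1)

instances = [
    # Claim finished before the stop, so the host is set.
    Instance("claimed", "building", "compute1", one_hour_ago),
    # Claim did not finish before the stop, so the host is unset.
    Instance("unclaimed", "building", None, one_hour_ago),
]

timed_out = find_timed_out_builds(instances, "compute1", 600, now)
timed_out_uuids = [inst.uuid for inst in timed_out]
print(timed_out_uuids)
```

In this model only the instance with a host assigned is selected for the timeout handling. The instance whose host was never set is filtered out no matter how old it is, which matches the stuck BUILD case.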