Comment 7 for bug 1884217

Balazs Gibizer (balazs-gibizer) wrote :

I might not have the full context, but the code Sylvain linked to waits for the instance being destroyed [1]. If we change the states we wait for in _unprovision(), then we hold up destroying the instance object in the upper layers, and therefore hold up volume / network resources and instance quotas. Also, if the ironic node ends up in ironic_states.CLEANFAIL, that would mean the end-user-visible instance stays in DELETING state for a very long time.
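For reference, here is a rough paraphrase (not a verbatim quote, and line numbers may have drifted) of the wait condition at [1]: the looping call stops as soon as the node reaches a state where the instance is gone from the user's point of view, even though the node itself may still be cleaning:

    # Paraphrased from _wait_for_provision_state() inside _unprovision()
    # in nova/virt/ironic/driver.py at the commit linked in [1].
    if node.provision_state in (ironic_states.NOSTATE,
                                ironic_states.CLEANING,
                                ironic_states.CLEANWAIT,
                                ironic_states.CLEANFAIL,
                                ironic_states.AVAILABLE):
        # The instance is gone as far as the user is concerned; stop
        # waiting even though the node may still be cleaning.
        raise loopingcall.LoopingCallDone()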

I think that in ironic we need to decouple the "instance is destroyed" state from the "resource is ready to be used again" state. E.g. _unprovision() could put a trait (e.g. CUSTOM_IRONIC_NEEDS_CLEANING) on the ironic node RP, and we could have a placement pre-filter that filters out nodes with that trait during scheduling; see the sketch below.
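A minimal sketch of what such a pre-filter could look like, modeled on the existing compute_status_filter in nova/scheduler/request_filter.py. Both the filter and the trait name are hypothetical; nothing here exists yet:

    # Hypothetical request filter, sketched after compute_status_filter
    # in nova/scheduler/request_filter.py. Assumes the ironic driver has
    # tagged the node resource provider with CUSTOM_IRONIC_NEEDS_CLEANING
    # while the node is cleaning.
    def ironic_needs_cleaning_filter(ctxt, request_spec):
        """Exclude ironic nodes that are still being cleaned.

        Adding the trait as forbidden on the root resource provider
        makes placement drop every candidate node that still carries
        it, so the scheduler cannot pick a node before cleaning is
        finished and the driver has removed the trait again.
        """
        request_spec.root_forbidden.add('CUSTOM_IRONIC_NEEDS_CLEANING')
        return True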

I don't know how this happens:
"This is fixed once the resource tracker comes along and corrects the information in placement,"
but that logic might be extended to the _unprovision case to solve the issue.

Bottom line: I agree that this is possibly a bug (I haven't reproduced it myself), but the suggested solution needs further discussion.

I'll let others with more ironic knowledge confirm / triage this.

[1] https://github.com/openstack/nova/blob/90777d790d7c268f50851ac3e5b4e02617f5ae1c/nova/virt/ironic/driver.py#L1295-L1297