Comment 3 for bug 1829479

Matt Riedemann (mriedem) wrote :

Chris Friesen brought up what sounds like a similar issue in IRC today:

http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2019-06-05.log.html#t2019-06-05T20:13:23

A host goes down and their tooling automatically evacuates the instances from it. In this case the allocations for those instances will still be held against the source host, because nova doesn't remove the allocations from the evacuated host until the compute service on that host is restarted.

If you try to delete the compute service in this case, deleting its resource provider will fail here, but the failure is ignored:

https://github.com/openstack/nova/blob/653515a45032811b6bc2f1d0fb651472005496ec/nova/scheduler/client/report.py#L2183

That means we go on to delete the compute_nodes and services table records for that service anyway:

https://github.com/openstack/nova/blob/653515a45032811b6bc2f1d0fb651472005496ec/nova/api/openstack/compute/services.py#L279

But the resource provider with that hostname still exists. Trying to restart the compute service after that will fail, because the new compute node record gets a new UUID while a provider with that name already exists under a different UUID (which maybe makes this related to bug 1817833).
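
To make the sequence easier to follow, here is a rough toy model of it in Python. Nothing in it is nova or placement code: ToyPlacement, Conflict, delete_compute_service, start_compute_service and reproduce are all hypothetical names, the "database" is a plain dict, and the conflict rules are only my reading of the behavior described above.

```python
import uuid


class Conflict(Exception):
    """Stands in for an HTTP 409 from placement (assumption)."""


class ToyPlacement:
    """In-memory stand-in for placement; not the real API."""

    def __init__(self):
        self.providers = {}    # provider UUID -> provider name
        self.allocations = {}  # provider UUID -> set of consumer UUIDs

    def create_provider(self, rp_uuid, name):
        # A name may only belong to one provider UUID.
        for existing_uuid, existing_name in self.providers.items():
            if existing_name == name and existing_uuid != rp_uuid:
                raise Conflict("provider name %s already in use" % name)
        self.providers[rp_uuid] = name
        self.allocations.setdefault(rp_uuid, set())

    def delete_provider(self, rp_uuid):
        # A provider that still has allocations cannot be deleted.
        if self.allocations.get(rp_uuid):
            raise Conflict("provider %s still has allocations" % rp_uuid)
        self.providers.pop(rp_uuid, None)
        self.allocations.pop(rp_uuid, None)


def delete_compute_service(placement, compute_nodes, host):
    """Mimic the service delete: the provider delete failure is swallowed."""
    node_uuid = compute_nodes[host]
    try:
        placement.delete_provider(node_uuid)
    except Conflict:
        pass  # failure ignored, as described in the comment
    del compute_nodes[host]  # the record is removed regardless


def start_compute_service(placement, compute_nodes, host):
    """Mimic a restart: a missing compute node record gets a fresh UUID."""
    if host not in compute_nodes:
        compute_nodes[host] = str(uuid.uuid4())
    placement.create_provider(compute_nodes[host], host)


def reproduce(host="compute1"):
    placement = ToyPlacement()
    compute_nodes = {}
    start_compute_service(placement, compute_nodes, host)
    old_uuid = compute_nodes[host]
    # An evacuated instance's allocation is still held against the source host.
    placement.allocations[old_uuid].add(str(uuid.uuid4()))
    delete_compute_service(placement, compute_nodes, host)
    try:
        start_compute_service(placement, compute_nodes, host)
    except Conflict:
        return "restart failed", old_uuid in placement.providers
    return "restart ok", old_uuid in placement.providers
```

Calling reproduce() walks through evacuate, delete service, restart. In this model the restart only succeeds if the old provider had no allocations left when the service was deleted.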