Comment 4 for bug 1952745

Artom Lifshitz (notartom) wrote :

I suspect this is a valid bug.

When nova-compute starts up, it looks for migration records with type 'evacuation', status 'done', and itself as the source host. It then destroys the associated libvirt instances on the hypervisor and sets those migration records to 'completed' to avoid destroying the instances again on subsequent startups.
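
Roughly, the startup logic I'm describing has this shape (a simplified sketch for illustration only, not the actual manager code from [1]; the helper names here, like get_migrations, list_guest_uuids and destroy_guest, are placeholders):

def destroy_evacuated_instances(host, db, driver):
    # Migration records with type 'evacuation', status 'done', and this
    # host as the source.
    evacuations = db.get_migrations(source_host=host,
                                    migration_type='evacuation',
                                    status='done')
    evacuated = {m.instance_uuid: m for m in evacuations}

    # Only guests actually present on the local hypervisor are considered.
    for guest_uuid in driver.list_guest_uuids():
        migration = evacuated.get(guest_uuid)
        if migration is None:
            continue
        # The instance was evacuated away, so the local copy is stale:
        # destroy it and mark the migration 'completed' so it isn't
        # destroyed again on the next startup.
        driver.destroy_guest(guest_uuid)
        migration.status = 'completed'
        migration.save()

If the real code follows this shape, the 'completed' update only happens for guests the driver can actually see on the hypervisor.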

This is all well and good if it's the original compute host that comes back after being evacuated, but what if it's a brand new compute host with the same host name?

It'll find the 'done' evacuations and look for the associated libvirt instances to destroy on the hypervisor, but won't find any because it's a brand new compute. I suspect that at this point it does not set the migration records to 'completed', though unfortunately code examination [1] doesn't support this.

Assuming I'm correct, if previously-evacuated instances are later migrated back to the new compute (which has the old hostname) and the nova-compute service is restarted, it will pick up those 'done' migration records and destroy the libvirt instances.
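
To make the sequence concrete, here's a tiny self-contained toy simulation of the two restarts under that assumption (again, purely illustrative code, nothing here is nova's actual API):

class Migration:
    def __init__(self, instance_uuid):
        self.instance_uuid = instance_uuid
        self.status = 'done'          # evacuation finished earlier

def startup_cleanup(migrations, hypervisor_guests):
    destroyed = []
    done = {m.instance_uuid: m for m in migrations if m.status == 'done'}
    for uuid in list(hypervisor_guests):
        m = done.get(uuid)
        if m is not None:
            hypervisor_guests.remove(uuid)   # "destroy" the stale guest
            destroyed.append(uuid)
            m.status = 'completed'
    return destroyed

migrations = [Migration('instance-1')]   # evacuation record left over from before

# Restart 1: brand new compute with the old hostname, no guests yet.
print(startup_cleanup(migrations, set()))      # [] -> record stays 'done'

# The instance is later migrated back to this compute...
guests = {'instance-1'}

# Restart 2: the stale 'done' record is picked up and the live guest is destroyed.
print(startup_cleanup(migrations, guests))     # ['instance-1']

The first restart (empty hypervisor) leaves the record in 'done'; after the instance is migrated back, the second restart destroys a perfectly healthy guest.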

I'm trying to reproduce this with a functional test, but so far I'm not having any luck (due to unrelated issues).

[1] https://opendev.org/openstack/nova/src/branch/master/nova/compute/manager.py#L837-L873