Comment 2 for bug 1419785

Matthew Booth (mbooth-9) wrote:

Some context: this happens because _destroy_evacuated_instances in compute.manager does (lightly edited for clarity):

        local_instances = self._get_instances_on_driver(context, filters)
        for instance in local_instances:
            if instance.host != self.host:
                ...DESTROY...

The only instances destroyed are those for which instance.host != self.host.
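That check is what goes wrong in an HA deployment. A minimal sketch of the failure (all names here are illustrative, not actual Nova code): two Nova services manage the same hypervisor but report different values of self.host, so the standby treats the active node's instances as evacuated.

```python
# Sketch of the _destroy_evacuated_instances check quoted above,
# showing why an HA standby would wrongly destroy local instances.

class FakeInstance:
    def __init__(self, host):
        self.host = host

def instances_to_destroy(service_host, local_instances):
    # Mirrors: destroy every local instance whose host differs from self.host.
    return [i for i in local_instances if i.host != service_host]

# The instance was booted via the active node, so instance.host == 'nova-active'.
local = [FakeInstance('nova-active')]

# On the active node, nothing matches the destroy condition...
assert instances_to_destroy('nova-active', local) == []

# ...but the standby, with a different self.host, would destroy it.
assert len(instances_to_destroy('nova-standby', local)) == 1
```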

The meaning of self.host in this context appears to be 'hypervisor'. However, self.host is also a service endpoint. Historically there was a 1:1 relationship between the two, but there are now a couple of drivers for which that assumption no longer holds.

I think the correct fix would be something like adding driver.get_hypervisor_id(), which returns a driver-specific identifier for the hypervisor location. instance.host would then be set to this value, and HA Nova instances would ensure it returns the same value for all Novas managing the same hypervisor.
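To make the proposal concrete, here is a hypothetical sketch of what such a driver method might look like; the class and attribute names are assumptions for illustration, not existing Nova interfaces.

```python
# Hypothetical driver exposing a stable hypervisor identifier, so that
# every Nova service managing the same hypervisor reports the same value.

class HAClusterDriver:
    """Sketch of a driver where several Nova services share one hypervisor."""

    def __init__(self, cluster_name):
        self.cluster_name = cluster_name

    def get_hypervisor_id(self):
        # instance.host would be set from this value, making the
        # host comparison stable across active/standby failover.
        return self.cluster_name

active = HAClusterDriver('cluster-1')
standby = HAClusterDriver('cluster-1')

# Both services agree on the hypervisor identity.
assert active.get_hypervisor_id() == standby.get_hypervisor_id()
```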

However, that's a spec and a bunch of work, and this is a critical issue.

Note that the above code is not a problem if the active and standby nodes have the same value of self.host. The immediate workaround, then, would be to configure the active and standby nodes accordingly. This presumably also assumes simultaneous failover of DNS/IP.
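In practice that workaround amounts to setting the same `host` value in nova.conf on both nodes (the value below is a made-up example):

```ini
[DEFAULT]
# Must be identical on the active and standby nodes so that
# instance.host == self.host on whichever node is running.
host = nova-ha-pair-1
```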

For this specific issue, I would prefer a solution which detects this situation and refuses to start Nova.