The compute manager's virtapi allows the compute driver to wait for external events via the wait_for_instance_event() method. The common use case is for a compute driver to wait for the vifs to be plugged by neutron before proceeding through the spawn. The pattern is also present in the libvirt driver: see _create_domain_and_network() in libvirt's driver.py, which uses the wait_for_instance_event context manager.
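To make the pattern concrete, here is a minimal, self-contained sketch of how such a context manager can work. This is an illustration only, not Nova's actual implementation; MiniVirtAPI, its deliver() method, and the use of threading.Event are my own simplifications:

```python
import threading
from contextlib import contextmanager

class MiniVirtAPI:
    """Toy stand-in for the compute manager's virtapi (illustration only)."""

    def __init__(self):
        self._events = {}

    @contextmanager
    def wait_for_instance_event(self, instance, event_names, deadline=300):
        # Register a latch per expected event *before* starting work,
        # so events that arrive mid-operation are not lost.
        latches = {name: threading.Event() for name in event_names}
        self._events[instance] = latches
        try:
            # The driver does its work (e.g. spawn) inside the block.
            yield
            # On exit, block until every expected event has arrived,
            # or raise if the deadline passes first.
            for name, latch in latches.items():
                if not latch.wait(deadline):
                    raise TimeoutError('never received event %s' % name)
        finally:
            del self._events[instance]

    def deliver(self, instance, event_name):
        # Called when an external event (e.g. network-vif-plugged)
        # arrives for this instance; wakes any waiter.
        latches = self._events.get(instance)
        if latches and event_name in latches:
            latches[event_name].set()
```

A driver would wrap its spawn in the context manager, and the compute manager would call deliver() when the event lands over RPC. If the event is routed elsewhere (as in the evacuate case below), the wait() times out.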
The flow for events coming into Nova is through nova/api/openstack/compute/server_external_events.py, which eventually calls compute_api.external_instance_event() to dispatch the events. In external_instance_event() you'll see it uses instance.host to call compute_rpcapi.external_instance_event(), so the RPC message goes to whichever host is currently set. In the case of evacuate, at that point in time (while the new host is spawning the recreated VM) instance.host is still set to the original host, which is down. So the compute driver that initiated the action and is waiting for the event will never get it.
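The misrouting is easy to see in a toy model. The following is not Nova code; the Instance class, hosts_up set, and delivered dict are stand-ins I made up to show dispatch-by-instance.host dropping the event during an evacuate:

```python
class Instance:
    """Toy instance record; only the host field matters here."""
    def __init__(self, host):
        self.host = host

def external_instance_event(instance, event, hosts_up, delivered):
    # Mirrors the behaviour described above: the event is cast to
    # whatever host instance.host currently names.
    target = instance.host
    if target in hosts_up:
        delivered.setdefault(target, []).append(event)
    # else: the RPC message goes to a down host and is silently lost

# During an evacuate, instance.host still points at the dead source
# host while the destination host is spawning and waiting.
inst = Instance(host='compute-1')      # original host, now down
delivered = {}
external_instance_event(inst, 'network-vif-plugged',
                        hosts_up={'compute-2'}, delivered=delivered)
# Nothing reaches compute-2, the host actually waiting for the event.
```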
The question was raised of why libvirt doesn't suffer the same fate. I can't answer that authoritatively, but libvirt has a number of conditions that must all be met before it will wait for the event. Here's what it currently checks before waiting for a vif-plugged event:
    timeout = CONF.vif_plugging_timeout
    if (self._conn_supports_start_paused and utils.is_neutron() and
            not vifs_already_plugged and power_on and timeout):
        events = self._get_neutron_events(network_info)
    else:
        events = []
But it does seem, from reading the code, that if all those conditions are met and the operation is an evacuate, libvirt would fail in the same way, though I have not tried it.
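The gating logic can be sketched as a plain function, which shows why the failure is easy to miss: any one condition being false means the driver never waits at all. This is a simplified stand-in for the libvirt check, not the driver's actual code, and the event-tuple shape is my own illustration:

```python
def events_to_wait_for(conn_supports_start_paused, using_neutron,
                       vifs_already_plugged, power_on, timeout,
                       network_info):
    # Mirrors the gate quoted above: wait only when every condition holds.
    if (conn_supports_start_paused and using_neutron and
            not vifs_already_plugged and power_on and timeout):
        # Stand-in for self._get_neutron_events(network_info).
        return [('network-vif-plugged', vif) for vif in network_info]
    return []

# All conditions met: the driver would wait, and on an evacuate the
# event routed to the dead source host would never arrive.
assert events_to_wait_for(True, True, False, True, 300,
                          ['vif-a']) == [('network-vif-plugged', 'vif-a')]
# Any condition false (here: vifs already plugged): no wait, no bug.
assert events_to_wait_for(True, True, True, True, 300, ['vif-a']) == []
```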