I think this bug is pretty serious. Say we get a cinder error in driver.post_live_migration() (this specific example is taken from a customer bug):
ComputeManager._post_live_migration() does:

    ...
    self.driver.post_live_migration(ctxt, instance, block_device_info,
                                    migrate_data)
    ...
    self.compute_rpcapi.post_live_migration_at_destination(ctxt,
        instance, block_migration, dest)
The above code runs on the source compute. We update instance.host to the destination in post_live_migration_at_destination. Therefore, if driver.post_live_migration() above fails, we never call post_live_migration_at_destination, and we never update instance.host to point to the destination.
However, _post_live_migration is called via a callback from the driver *after* the migration has occurred. So at this point the VM is *actually running* on the destination, but Nova thinks it's still on the source. The instance will be in an error state, and a hard reboot at this point will cause it to start running again on the source, at which point it will be running on 2 compute hosts simultaneously.