OpenStack Compute (nova)

Comment 5 for bug 1628606


Matthew Booth (mbooth-9) wrote on 2018-08-03:

I think this bug is pretty serious. Say we get a cinder error in driver.post_live_migration() (this specific example is taken from a customer bug):

ComputeManager._post_live_migration() does:

  ...
  self.driver.post_live_migration(ctxt, instance, block_device_info,
                                  migrate_data)
  ...
  self.compute_rpcapi.post_live_migration_at_destination(ctxt,
          instance, block_migration, dest)

The above code runs on the source compute. We update instance.host to the destination in post_live_migration_at_destination. Therefore, if driver.post_live_migration() above fails, we never call post_live_migration_at_destination, and we never update instance.host to point to the destination.
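The ordering problem can be reduced to a toy model. This is a minimal Python sketch, not Nova's code: `CinderError`, `FakeDriver`, `run`, and the simplified signatures are all hypothetical, and they only mirror the call order in the excerpt (source-side driver cleanup first, then the call that updates the host).

```python
"""Toy model of the call ordering in _post_live_migration.

All names here are hypothetical stand-ins, not Nova's API.
"""


class CinderError(Exception):
    """Hypothetical stand-in for a volume-service failure."""


class Instance:
    def __init__(self, host):
        self.host = host  # the host Nova believes the guest is on


class FakeDriver:
    def __init__(self, fail):
        self.fail = fail

    def post_live_migration(self, instance):
        # Source-side cleanup; simulate a cinder failure when requested.
        if self.fail:
            raise CinderError("volume cleanup failed")


def post_live_migration_at_destination(instance, dest):
    # In this model, as in the report, the host is only updated here.
    instance.host = dest


def post_live_migration(driver, instance, dest):
    # Same order as the excerpt: an exception in the first call
    # means the second call, and therefore the host update, never runs.
    driver.post_live_migration(instance)
    post_live_migration_at_destination(instance, dest)


def run(fail):
    instance = Instance(host="src")
    try:
        post_live_migration(FakeDriver(fail), instance, dest="dst")
    except CinderError:
        pass  # the error surfaces, but the guest has already moved
    return instance.host


print(run(fail=False))  # dst
print(run(fail=True))   # src, although the guest is really on dst
```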

However, _post_live_migration is called via a callback from the driver *after* the migration has occurred. So at this point the VM is *actually running* on the destination, but Nova thinks it's still on the source. The instance will be in an error state, and a hard reboot at this point will cause it to start running again on the source, at which point it will be running on two compute hosts simultaneously.
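The resulting split-brain can be shown with an even smaller model. This is a hypothetical Python sketch: hosts are plain strings, `running` is an illustrative map of host to running guests, and `recorded_host` stands in for instance.host. It assumes, as described above, that the hard reboot is sent to the recorded host while the migrated guest keeps running on the destination.

```python
"""Toy model: hard reboot after a failed _post_live_migration.

Hypothetical sketch; none of these names are Nova's API.
"""


def simulate_hard_reboot_after_failed_migration():
    running = {"src": {"vm1"}, "dst": set()}
    recorded_host = "src"  # stand-in for instance.host in the database

    # The hypervisor completes the live migration: the guest moves to dst.
    running["src"].discard("vm1")
    running["dst"].add("vm1")

    # post_live_migration fails, so recorded_host is never set to "dst".

    # The hard reboot targets the recorded host and starts the guest there,
    # while the migrated guest on dst keeps running.
    running[recorded_host].add("vm1")

    return sorted(host for host, guests in running.items() if "vm1" in guests)


print(simulate_hard_reboot_after_failed_migration())  # ['dst', 'src']
```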