Evacuation fails if the source host returns while the migration is still in progress
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | Fix Released | Medium | Lee Yarwood |
Pike | Fix Committed | Medium | Lee Yarwood |
Queens | Fix Committed | Medium | Lee Yarwood |
Bug Description
Description
===========
If the source host comes back online while the evacuation's migration record is still in a 'pre-migrating' state, the source compute manager may not remove the evacuating instances in question during _destroy_
More importantly, the source host returning online early allows _init_instance to set instance.status to ERROR and instance.task_state to None, thanks to the following failed-rebuild logic:
As a result, the in-progress rebuild fails when it attempts to save the instance while expecting a specific task_state:
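The race described above can be modelled with a small, self-contained sketch. This is not nova code: `ToyInstanceRecord`, its `save()` signature, the local `UnexpectedTaskStateError` class and the task-state strings are all hypothetical stand-ins, chosen only to show why an unconditional reset by the returning source host makes a later guarded save fail.

```python
class UnexpectedTaskStateError(Exception):
    """Local stand-in for the task-state mismatch error seen in this report."""


class ToyInstanceRecord:
    """Hypothetical, simplified instance record (not nova's Instance object)."""

    def __init__(self, status, task_state):
        self.status = status
        self.task_state = task_state

    def save(self, status, task_state, expected_task_state=None):
        # Guarded update: only write if the stored task_state still matches
        # what the caller expects. Passing no expectation writes blindly.
        if expected_task_state is not None and self.task_state != expected_task_state:
            raise UnexpectedTaskStateError(
                f"expected {expected_task_state!r}, found {self.task_state!r}"
            )
        self.status = status
        self.task_state = task_state


def simulate_race():
    # The destination host has started rebuilding the evacuated instance.
    record = ToyInstanceRecord(status="ACTIVE", task_state="rebuilding")

    # The source host returns early and resets the instance without any
    # expectation, as _init_instance is described as doing in this report.
    record.save(status="ERROR", task_state=None)

    # The destination host now tries to advance the rebuild, still expecting
    # the task_state it set earlier (state names here are assumptions).
    try:
        record.save(
            status="ACTIVE",
            task_state="rebuild_spawning",
            expected_task_state="rebuilding",
        )
    except UnexpectedTaskStateError as exc:
        return f"rebuild failed: {exc}"
    return "rebuild succeeded"


print(simulate_race())
```

In this toy model the second save fails purely because of ordering: had the source host stayed down until the rebuild finished, the guarded save would have succeeded.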
This issue was originally reported downstream while testing an instance high-availability feature that uses a mixture of Pacemaker and instance evacuation to keep instances online:
Nova reports overcloud instance in error state after failed double compute failover instance-ha evacuation
https:/
This report includes an example UnexpectedTaskS
2018-04-17 11:11:12.999 1 ERROR nova.compute.
UnexpectedTaskS
The Rally-based tests for this feature happen to use the `b` sysrq-trigger, which reboots the host immediately, so the source host comes back online just in time to hit this race.
Steps to reproduce
==================
- Evacuate an instance
- Restart the source compute service before the instance is fully rebuilt
Expected result
===============
The source compute removes the instance and does not attempt to update the instance or task state.
Actual result
=============
The source compute does not attempt to remove the instance, and instead attempts to update the instance and task state before the rebuild is complete.
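The difference between the expected and actual results can be sketched as a startup check on the source host. This is only an illustration under stated assumptions, not the fix that was merged: `ToyMigration`, `ToyInstance`, `init_source_host` and the `EVACUATION_STATUSES` set are hypothetical names, and only 'pre-migrating' is taken from this report.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption: migration statuses treated as "evacuation under way or finished".
# Only 'pre-migrating' comes from this report; the others are illustrative.
EVACUATION_STATUSES = {"accepted", "pre-migrating", "done"}


@dataclass
class ToyMigration:
    instance_uuid: str
    source_host: str
    status: str


@dataclass
class ToyInstance:
    uuid: str
    status: str
    task_state: Optional[str]


def init_source_host(
    host: str,
    local_instances: List[ToyInstance],
    migrations: List[ToyMigration],
) -> List[ToyInstance]:
    """Return the instances that stay on `host` after startup."""
    # Instances being evacuated away from this host, including evacuations
    # whose migration record is still 'pre-migrating'.
    evacuated = {
        m.instance_uuid
        for m in migrations
        if m.source_host == host and m.status in EVACUATION_STATUSES
    }

    remaining = []
    for inst in local_instances:
        if inst.uuid in evacuated:
            # Expected result: drop the local copy and leave the instance
            # and task state alone, since the destination host owns them now.
            continue
        if inst.task_state == "rebuilding":
            # A rebuild interrupted on this host itself is marked as failed.
            inst.status = "ERROR"
            inst.task_state = None
        remaining.append(inst)
    return remaining
```

With this ordering, an instance whose evacuation is still in 'pre-migrating' is removed from the source host before any state reset can touch it, which is the behaviour the expected result describes.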
Environment
===========
1. Exact version of OpenStack you are running. See the following
88adde8bba39
2. Which hypervisor did you use?
(For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...)
What's the version of that?
Libvirt + KVM
3. Which storage type did you use?
(For example: Ceph, LVM, GPFS, ...)
What's the version of that?
Local, yet to test with shared storage.
4. Which networking type did you use?
(For example: nova-network, Neutron with OpenVSwitch, ...)
N/A
Logs & Configs
==============
Changed in nova:
assignee: Lee Yarwood (lyarwood) → Matt Riedemann (mriedem)
Changed in nova:
assignee: Matt Riedemann (mriedem) → Lee Yarwood (lyarwood)
Changed in nova:
importance: Undecided → Medium
Related fix proposed to branch: master
Review: https://review.openstack.org/562072