libvirt: post_live_migration failures to disconnect volumes result in the rollback of live migrations
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
OpenStack Compute (nova) | Fix Released | Medium | Lee Yarwood | | |
Bug Description
Description
===========
At present, any exception encountered during post_live_migration on the source after an instance has successfully migrated causes the overall migration to fail, and the instance is listed as running on the source while it is actually on the destination.
Any such errors should be logged but otherwise ignored, allowing the migration to complete and the instance to continue to be tracked correctly.
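The "log but otherwise ignore" behaviour described above can be sketched roughly as follows. This is only an illustration, not nova's actual code: `disconnect_volumes_best_effort` and the `disconnect` callable are hypothetical names, and the real source-side cleanup involves far more than a loop over volume ids.

```python
import logging

LOG = logging.getLogger(__name__)


def disconnect_volumes_best_effort(volume_ids, disconnect):
    """Call disconnect(volume_id) for every volume, logging any failure.

    Hypothetical helper: a failure for one volume is logged with its
    traceback and does not stop the remaining volumes from being
    processed. Returns the ids of the volumes that failed to disconnect.
    """
    failed = []
    for volume_id in volume_ids:
        try:
            disconnect(volume_id)
        except Exception:
            # The instance already runs on the destination, so the error
            # is recorded rather than propagated to the caller.
            LOG.exception(
                "Failed to disconnect volume %s on the source; continuing",
                volume_id,
            )
            failed.append(volume_id)
    return failed
```

Returning the failed ids, rather than raising, lets the caller finish the migration bookkeeping and still report which volumes need manual cleanup on the source.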
Steps to reproduce
==================
- Live migrate an instance from host A to host B, ensuring post_live_migration fails.
Expected result
===============
Any failures on the source encountered by post_live_migration are logged but the overall migration still completes successfully.
Actual result
=============
The instance and the overall migration are left in error states. Additionally, the instance is reported as residing on the source host while actually running on the destination.
Environment
===========
1. Exact version of OpenStack you are running. See the following
list for all releases: http://
ba3147420c0a
2. Which hypervisor did you use?
(For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...)
What's the version of that?
Libvirt + KVM
3. Which storage type did you use?
(For example: Ceph, LVM, GPFS, ...)
What's the version of that?
N/A
4. Which networking type did you use?
(For example: nova-network, Neutron with OpenVSwitch, ...)
N/A
Changed in nova:
assignee: Lee Yarwood (lyarwood) → Artom Lifshitz (notartom)
Changed in nova:
assignee: Artom Lifshitz (notartom) → Lee Yarwood (lyarwood)
Not surprised about this, since the _post_live_migration method and the post_live_migration_at_destination method that it calls are both huge and complicated. I've advocated for a long time now that we should break those giant methods down into smaller parts so that we can do error handling like this more correctly, but for a backportable fix we'd likely just need to handle the volume errors during post processing and refactor the code out later.
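As a rough illustration of the longer-term refactoring idea, a large post-migration method could be split into small named steps, each carrying its own error policy. The sketch below is an assumption about what that might look like, not code from nova: `run_post_migration_steps` and the `(name, func, fatal)` step tuples are hypothetical.

```python
import logging

LOG = logging.getLogger(__name__)


def run_post_migration_steps(steps):
    """Run (name, func, fatal) steps in order.

    Hypothetical runner: a failing step marked fatal re-raises its
    exception, while a failing non-fatal step is logged and skipped.
    Returns the names of the non-fatal steps that failed.
    """
    skipped = []
    for name, func, fatal in steps:
        try:
            func()
        except Exception:
            if fatal:
                raise
            LOG.exception("Post-migration step %s failed; continuing", name)
            skipped.append(name)
    return skipped
```

With this shape, source-side cleanup such as volume disconnection could be registered as non-fatal, while steps that must succeed for the instance to be tracked correctly stay fatal.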