Restarting destination compute manager during live-migration can cause instance data loss
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | Fix Released | High | David McNally |
Icehouse | Fix Released | Undecided | Unassigned |
Bug Description
During compute manager startup, init_host is called. One of its jobs is to delete instance data that doesn't belong to this host, i.e. _destroy_
If a live-migration is in progress and the destination compute manager is restarted, it will find that the migrating instance does not belong to the host and destroy it. This can result in two outcomes:
1. If the live-migration is still in progress, the source hypervisor hangs, so a rollback can be triggered by killing the job.
2. However, if live-migration is completed and the post-live-
2014-05-08 20:42:33.058 16724 WARNING nova.virt.
2014-05-08 20:43:33.370 16724 WARNING nova.virt.
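The startup cleanup described above can be illustrated with a minimal sketch. This is not nova's code and not necessarily the fix that was merged: `Instance`, `instances_to_destroy`, and the `"migrating"` task-state value are hypothetical names chosen for illustration. The sketch only shows why a check based on the owning host alone would destroy an inbound live-migration, and one possible guard against it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instance:
    # Hypothetical stand-in for an instance record, not nova's object model.
    uuid: str
    host: str                  # host the database says owns the instance
    task_state: Optional[str]  # assumed to be "migrating" during live-migration


def instances_to_destroy(local: List[Instance], this_host: str) -> List[Instance]:
    """Return locally defined instances considered safe to clean up at startup."""
    doomed = []
    for inst in local:
        if inst.host == this_host:
            continue  # owned by this host, keep it
        if inst.task_state == "migrating":
            # Still owned by the source host, but possibly an inbound
            # live-migration. Without this guard it would be destroyed.
            continue
        doomed.append(inst)
    return doomed
```

With the guard removed, an instance whose `host` still names the source would be returned for destruction on the destination, which matches the behaviour reported here.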
Steps to reproduce:
1. Start live-migration
2. Wait for pre-live-migration to define the destination VM
3. Restart destination compute manager
To see what happens in case 2 (live-migration has completed), put a breakpoint in init_host, wait until the instance is running on the destination, and then let nova-compute continue. In this case you'll end up with an instance directory like this:
ls -l 06ddbe13-
total 8
-rw-r--r-- 1 root root 89 May 8 19:59 disk.info
-rw-r--r-- 1 root root 1548 May 8 19:59 libvirt.xml
I verified this in a tripleo devtest environment.
tags: | added: compute |
Changed in nova: | |
assignee: | nobody → David McNally (dave-mcnally) |
Changed in nova: | |
importance: | Undecided → High |
tags: | added: icehouse-backport-potential |
Changed in nova: | |
milestone: | none → juno-2 |
status: | Fix Committed → Fix Released |
Changed in nova: | |
milestone: | juno-2 → 2014.2 |
Fix proposed to branch: master
Review: https://review.openstack.org/93903