libvirt migrate/resize on shared storage can cause data loss
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
OpenStack Compute (nova) | Fix Released | High | Dan Smith | | |
Grizzly | Fix Released | High | Rafi Khardalian | | |
Bug Description
When using shared storage across hypervisors, libvirt driver resize/migrate operations can result in a loss of instance data. This happens because many of the operations that create a copy of the instance run inside a try/except block. If any of them fails, execution falls into the exception handler, which does the following:
=== code ===
except Exception:
with excutils.
def _cleanup_
"""Used only for cleanup in case migrate_
try:
if os.path.
except Exception:
pass
=== end ===
It doesn't take long looking at this code to see why this is a problem with shared storage. There, the destination path is the same directory the source hypervisor sees, so in effect the last ssh operation in the block above blows away the original copy of the instance directory.
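To make the failure mode concrete, here is a minimal local sketch. It is not nova's code and not necessarily the fix that was released. The helpers `is_shared_path` and `cleanup_failed_copy` are hypothetical names, and both "hosts" are simulated with local directories instead of ssh. The idea is to check whether the destination directory is the same storage as the source before deleting anything during cleanup.

```python
import os
import shutil
import tempfile
import uuid


def is_shared_path(local_dir, remote_dir):
    """Hypothetical check: create a marker via the destination path and
    see whether it shows up via the source path."""
    marker = "shared-check-" + uuid.uuid4().hex
    marker_remote = os.path.join(remote_dir, marker)
    with open(marker_remote, "w"):
        pass
    try:
        return os.path.exists(os.path.join(local_dir, marker))
    finally:
        os.remove(marker_remote)


def cleanup_failed_copy(local_dir, remote_dir):
    """Remove the partial copy only when that cannot destroy the original.

    Returns True if the destination directory was removed."""
    if is_shared_path(local_dir, remote_dir):
        # Same storage: deleting the "copy" would delete the original.
        return False
    shutil.rmtree(remote_dir, ignore_errors=True)
    return True


def demo():
    base = tempfile.mkdtemp()
    try:
        original = os.path.join(base, "instance-0001")
        os.makedirs(original)
        with open(os.path.join(original, "disk"), "w") as f:
            f.write("data")

        # Shared storage: the destination resolves to the same directory.
        shared_removed = cleanup_failed_copy(original, original)
        disk_survived = os.path.exists(os.path.join(original, "disk"))

        # Separate storage: the destination is a different directory.
        separate = os.path.join(base, "dest-instance-0001")
        os.makedirs(separate)
        separate_removed = cleanup_failed_copy(original, separate)

        return (shared_removed, disk_survived,
                separate_removed, os.path.exists(separate))
    finally:
        shutil.rmtree(base, ignore_errors=True)
```

Calling `demo()` exercises both cases: with shared storage the cleanup is skipped and the disk file is left in place, while with separate storage the partial copy is removed. An unconditional cleanup, as described above, would have removed the original in the first case.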
The issue can be easily reproduced by issuing a resize of an instance with a large root disk. In the middle of the resize, kill the ssh process created from the following call (https:/
Changed in nova: | |
assignee: | Rafi Khardalian (rkhardalian) → Dan Smith (danms) |
Changed in nova: | |
milestone: | none → havana-1 |
status: | Fix Committed → Fix Released |
tags: | added: grizzly-backport-potential |
tags: | removed: grizzly-backport-potential in-stable-grizzly |
Changed in nova: | |
importance: | Undecided → High |
Changed in nova: | |
milestone: | havana-1 → 2013.2 |
I've got a patch ready to be submitted.