Live snapshot is corrupted (possibly race condition?)
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | Fix Released | Medium | Slawek Kaplonski |
Bug Description
We are using nova 2:12.0.
2015-12-31 01:40:33.304 16805 INFO nova.compute.
2015-12-31 01:40:33.410 16805 INFO nova.virt.
2015-12-31 01:40:34.964 16805 INFO nova.virt.
2015-12-31 01:40:37.029 16805 INFO nova.virt.
The entire operation completes in a couple of seconds, which is unexpected.
While testing, I added some sleep calls to the _live_snapshot function in virt/libvirt/
try:
# NOTE (rmk): blockRebase cannot be executed on persistent
# domains, so we need to temporarily undefine it.
# If any part of this block fails, the domain is
# re-defined regardless.
if guest.has_
# NOTE (rmk): Establish a temporary mirror of our root disk and
# issue an abort once we have a complete copy.
+ time.sleep(10.0)
while dev.wait_for_job():
- time.sleep(0.5)
+ time.sleep(5.0)
finally:
if require_quiesce:
The resulting log follows. It shows the snapshot taking considerably longer than the initial 10-second sleep alone. This suggests that, before the modification, wait_for_job returned false immediately; after the modification, it returns true following the initial sleep and appears to behave correctly:
2015-12-31 01:42:12.438 21232 INFO nova.compute.
2015-12-31 01:42:12.670 21232 INFO nova.virt.
2015-12-31 01:43:02.411 21232 INFO nova.virt.
2015-12-31 01:44:12.893 21232 INFO nova.virt.
Since sleeping for 10 seconds before polling wait_for_job seemed to resolve the problem, I suspect a race condition in which wait_for_job is called before the job started by the rebase call is fully initialized. I have not yet had a chance to explore that possibility further.
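To illustrate the suspected race, here is a minimal, self-contained simulation. It does not use nova or libvirt code: FakeBlockJob, JobInfo, naive_still_running and guarded_still_running are hypothetical names, and the assumption that a freshly started job briefly reports progress 0 of 0 is a guess about the failure mode, not something confirmed by the logs above.

```python
from dataclasses import dataclass


@dataclass
class JobInfo:
    cur: int  # units copied so far
    end: int  # total units to copy


class FakeBlockJob:
    """Hypothetical stand-in for a block job (not a libvirt object).

    Assumption: for the first few polls the job is not yet initialized
    and reports cur == end == 0.
    """

    def __init__(self, warmup_polls=2, total=4):
        self._polls = 0
        self._warmup = warmup_polls
        self._total = total

    def info(self):
        self._polls += 1
        if self._polls <= self._warmup:
            return JobInfo(cur=0, end=0)
        done = min(self._polls - self._warmup, self._total)
        return JobInfo(cur=done, end=self._total)


def naive_still_running(job):
    # Treats cur == end as "finished", so 0 == 0 looks like completion.
    info = job.info()
    return info.cur != info.end


def guarded_still_running(job):
    # Treats end == 0 as "not started yet" and keeps waiting.
    info = job.info()
    return info.end == 0 or info.cur != info.end


def count_polls(job, still_running, limit=100):
    # Counts how many times the loop body would run; limit avoids
    # spinning forever if the job never starts.
    polls = 0
    while polls < limit and still_running(job):
        polls += 1
    return polls


if __name__ == "__main__":
    print("naive:", count_polls(FakeBlockJob(), naive_still_running))
    print("guarded:", count_polls(FakeBlockJob(), guarded_still_running))
```

Under these assumptions the naive check leaves the loop before any data has been copied, which would match a snapshot that "completes" in a couple of seconds, while the guarded check keeps polling until the copy finishes. A sleep before the first poll, as in the patch above, would hide the same problem by giving the job time to initialize.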
description: | updated |
tags: | added: libvirt |
Changed in nova: | |
assignee: | nobody → Eli Qiao (taget-9) |
Changed in nova: | |
status: | Incomplete → New |
status: | New → Confirmed |
Changed in nova: | |
importance: | Undecided → Medium |
Do the libvirt logs show any errors? How about the domain logs for the instance that you're performing the live snapshot on?