Block migrate with attached volumes copies volumes to themselves
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| OpenStack Compute (nova) | | High | Pawel Koniszewski | |
| Juno | | Undecided | Unassigned | |
| libvirt (Ubuntu) | | High | Unassigned | |
| Trusty | | Undecided | Unassigned | |
| Utopic | | Undecided | Unassigned | |
| Vivid | | Undecided | Unassigned | |
| Wily | | High | Unassigned | |
| nova (Ubuntu) | | High | Unassigned | |
| Trusty | | Medium | Unassigned | |
| Utopic | | Undecided | Unassigned | |
| Vivid | | High | Unassigned | |
| Wily | | High | Unassigned | |
Bug Description
When an instance with attached Cinder volumes is block migrated, the Cinder volumes are block migrated along with it. If they exist on shared storage, then they end up being copied, over the network, from themselves to themselves. At a minimum, this is horribly slow and de-sparses a sparse volume; at worst, this could cause massive data corruption.
More details at http://
| Changed in nova: | |
| assignee: | nobody → Chris St. Pierre (stpierre) |
| status: | New → In Progress |
| OpenStack Infra (hudson-openstack) wrote : | #2 |
Fix proposed to branch: master
Review: https:/
Change abandoned by Chris St. Pierre (<email address hidden>) on branch: master
Review: https:/
| Dr. Jens Harbott (j-harbott) wrote : | #4 |
Instead of disabling live migration in this case, as proposed by your patch, it may be an option to set the volumes on shared storage as "shareable" in the libvirt definition. We have been using that approach for our RBD backed volumes for some months now quite successfully, see https:/
We did some basic performance comparison and there does not seem to be any major impact, though this may need some further analysis.
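For readers unfamiliar with the flag: in libvirt domain XML, "shareable" is expressed as a `<shareable/>` child of a `<disk>` element. The sketch below is not the patch referenced in this comment; it only illustrates the idea on a hypothetical domain definition. The sample XML, the volume name, and the helper names `mark_network_disks_shareable` and `shareable_targets` are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical domain XML: one local file-backed disk and one RBD-backed disk.
DOMAIN_XML = """
<domain type='kvm'>
  <devices>
    <disk type='file' device='disk'>
      <source file='/var/lib/nova/instances/example/disk'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <disk type='network' device='disk'>
      <source protocol='rbd' name='volumes/volume-example'/>
      <target dev='vdb' bus='virtio'/>
    </disk>
  </devices>
</domain>
"""


def mark_network_disks_shareable(domain_xml):
    """Return the domain XML with <shareable/> added to every network disk."""
    root = ET.fromstring(domain_xml)
    for disk in root.findall("./devices/disk"):
        # Only touch network-backed disks that are not already marked.
        if disk.get("type") == "network" and disk.find("shareable") is None:
            ET.SubElement(disk, "shareable")
    return ET.tostring(root, encoding="unicode")


def shareable_targets(domain_xml):
    """List the target device names of disks that carry <shareable/>."""
    root = ET.fromstring(domain_xml)
    return [
        disk.find("target").get("dev")
        for disk in root.findall("./devices/disk")
        if disk.find("shareable") is not None
    ]
```

Whether marking a volume this way is an appropriate use of the flag is exactly what the following comments dispute, so treat this as an illustration of the mechanism rather than a recommendation.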
| Chris St. Pierre (stpierre) wrote : | #5 |
I'd still be hesitant about that since Berrangé addressed that in his post to the ML: "Even that distinction [sharable vs. exclusive] is somewhat dubious and so not reliably what you would want."
I really think that at this point the important thing is to ensure that we don't copy volumes onto themselves over the network. Once we've removed the opportunity for extremely slow data corruption, then we can consider optional/possible ways to handle live migrations with volumes attached. But I think that we can demonstrate that, for now at least, the only solution that will work for everyone using libvirt is to disable these live migrations entirely.
| Loganathan Parthipan (parthipan) wrote : | #6 |
The proposed solution seems to block not just libvirt but all other hypervisors from being able to live-migrate with volumes. I feel that the solution has to be in the hypervisor/volume driver space.
I suggest a flag that enables your patch by default but gives people an opportunity to override if desired.
| tags: | added: libvirt |
Reviewed: https:/
Committed: https:/
Submitter: Jenkins
Branch: master
commit d667b6a63e80b2f
Author: Chris St. Pierre <email address hidden>
Date: Wed Dec 3 16:16:34 2014 -0600
libvirt: Fail when live block migrating instance with volumes
This raises an exception when attempting to live block migrate (nova
live-migration --block-migrate) an instance with attached volumes.
libvirt copies these volumes from themselves to themselves. At a
minimum, this is horribly slow and de-sparses a sparse volume; at
worst, this could cause massive data corruption.
Closes-Bug: 1398999
Change-Id: Ibcd423976bb9fe
| Changed in nova: | |
| status: | In Progress → Fix Committed |
Related fix proposed to branch: master
Review: https:/
Reviewed: https:/
Committed: https:/
Submitter: Jenkins
Branch: master
commit a390a2f257402d6
Author: Daniel P. Berrange <email address hidden>
Date: Tue Feb 17 17:10:49 2015 +0000
libvirt: switch LibvirtConnTestCase back to NoDBTestCase
The following commit changed LibvirtConnTestCase to inherit
from TestCase
commit d667b6a63e80b2f
Author: Chris St. Pierre <email address hidden>
Date: Wed Dec 3 16:16:34 2014 -0600
libvirt: Fail when live block migrating instance with volumes
This caused database setup to be performed once more, doubling
the test execution time.
Related-bug: #1398999
Change-Id: Ibad5bf4704a424
| Joe Gordon (jogo) wrote : | #10 |
I think there is a valid case for doing block migrate with a cinder volume attached to an instance:
* Cloud isn't using a shared filesystem for ephemeral storage
* Instance is booted from an image, and a volume is attached afterwards. An admin wants to take the box the instance is running on offline for maintenance with a minimal impact to the instances running on it.
The 'fix' was a workaround, not an actual fix. It sounds like a fix is needed in libvirt first.
http://
| Changed in nova: | |
| status: | Fix Committed → Confirmed |
| Chris Friesen (cbf123) wrote : | #11 |
Would this also affect an instance that is boot-from-volume but where the instance files are on local storage? Or do we even support that scenario?
Fix proposed to branch: stable/juno
Review: https:/
| Pavel Boldin (pboldin) wrote : | #13 |
Neither `libvirt' nor `qemu' copies block devices marked as `shared'. Either nova is misbehaving by not marking shared block devices as such, or libvirt has a bug and ignores the mark.
| Dr. Jens Harbott (j-harbott) wrote : | #14 |
@Pavel: The flag is called "shareable" and there was some discussion that ended up in some ppl claiming that was misusing this flag. We do run pretty well with a patch setting that flag in our local setup though (see comment #4), targeted only at the ceph/rbd case.
| Jacek Nykis (jacekn) wrote : | #15 |
Is there a bug we can track where root cause is being worked on?
| Jacek Nykis (jacekn) wrote : | #16 |
Sorry for 2nd comment. Will you update icehouse as well?
| Dr. Jens Harbott (j-harbott) wrote : | #17 |
https:/
Icehouse is pretty near to EOL and I don't think that this issue will be deemed critical enough for a backport even to Juno.
| Jacek Nykis (jacekn) wrote : | #18 |
The ubuntu wiki says icehouse will be supported for 4 more years:
https:/
If there is a chance of data loss I think it's completely justified to have the workaround backported to LTS
| Pavel Boldin (pboldin) wrote : | #19 |
@DrJens, I'm already working on an implementation for that bug.
I have a few tests to run before sending the patchset to the mailing list for review.
Yet, there is a problem that NBD tunnelled migration is not supported.
| Jacek Nykis (jacekn) wrote : | #20 |
I raised LP1449096 asking for Ubuntu nova package to get the workaround
Reviewed: https:/
Committed: https:/
Submitter: Jenkins
Branch: stable/juno
commit 98834ab9f745d53
Author: Chris St. Pierre <email address hidden>
Date: Wed Dec 3 16:16:34 2014 -0600
libvirt: Fail when live block migrating instance with volumes
This raises an exception when attempting to live block migrate (nova
live-migration --block-migrate) an instance with attached volumes.
libvirt copies these volumes from themselves to themselves. At a
minimum, this is horribly slow and de-sparses a sparse volume; at
worst, this could cause massive data corruption.
(cherry picked from commit d667b6a63e80b2f
Closes-Bug: 1398999
Change-Id: Ibcd423976bb9fe
| tags: | added: in-stable-juno |
| Launchpad Janitor (janitor) wrote : | #22 |
Status changed to 'Confirmed' because the bug affects multiple users.
| Changed in nova (Ubuntu): | |
| status: | New → Confirmed |
| Changed in nova (Ubuntu Vivid): | |
| status: | New → Triaged |
| Changed in nova (Ubuntu Wily): | |
| status: | Confirmed → Triaged |
| Changed in nova (Ubuntu Utopic): | |
| status: | New → Triaged |
| Changed in nova (Ubuntu Trusty): | |
| status: | New → Triaged |
| tags: | added: live-migrate |
| Pavel Boldin (pboldin) wrote : | #23 |
The libvirt code providing selective block migration in the case of NBD migration (the non-tunnelled one) has been merged: https:/
How can we progress on this issue?
| Tony Breeds (o-tony) wrote : | #24 |
@pboldin Thanks so much for doing that work.
I think we can now check the libvirt version and only raise the exception if libvirt < 1.2.17
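A minimal sketch of the version gate suggested here, assuming the libvirt version is available as a dotted string. This is not nova's actual code: the names `MigrationPreCheckError`, `parse_version`, and `check_block_migration` are hypothetical, and only the 1.2.17 threshold comes from this thread.

```python
# Threshold taken from the comment above; everything else is illustrative.
MIN_LIBVIRT_SELECTIVE_BLOCK_MIGRATION = (1, 2, 17)


class MigrationPreCheckError(Exception):
    """Raised when a block migration request cannot be honoured safely."""


def parse_version(version):
    """Turn a dotted version string such as '1.2.17' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def check_block_migration(libvirt_version, has_attached_volumes):
    """Refuse block migration with volumes when libvirt is too old."""
    too_old = parse_version(libvirt_version) < MIN_LIBVIRT_SELECTIVE_BLOCK_MIGRATION
    if has_attached_volumes and too_old:
        raise MigrationPreCheckError(
            "Cannot block migrate an instance with attached volumes: "
            "libvirt %s is older than 1.2.17" % libvirt_version
        )
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes such as "1.2.9" sorting after "1.2.17".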
| Changed in nova (Ubuntu Trusty): | |
| importance: | Undecided → High |
| Changed in nova (Ubuntu Wily): | |
| importance: | Undecided → High |
| Changed in nova (Ubuntu Vivid): | |
| importance: | Undecided → High |
| Changed in nova (Ubuntu Utopic): | |
| status: | Triaged → Won't Fix |
| Changed in libvirt (Ubuntu Utopic): | |
| status: | New → Won't Fix |
| Serge Hallyn (serge-hallyn) wrote : | #25 |
Looks like the patches to fix this (at
https:/
1.2.17, but not in 1.2.16 which is currently in wily.
| Changed in libvirt (Ubuntu Wily): | |
| assignee: | nobody → Serge Hallyn (serge-hallyn) |
| importance: | Undecided → High |
| status: | New → In Progress |
| Launchpad Janitor (janitor) wrote : | #26 |
This bug was fixed in the package libvirt - 1.2.16-2ubuntu9
---------------
libvirt (1.2.16-2ubuntu9) wily; urgency=medium
* Add upstream patches implementing a '--migrate-disks' option to virsh
migrate to specify block devices to migrate. (LP: #1398999)
-- Serge Hallyn <email address hidden> Fri, 04 Sep 2015 09:29:52 -0500
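As a rough illustration of the option named in this changelog entry, the sketch below assembles a `virsh migrate` argument list that restricts storage copying to selected disks. It only builds the command and does not run it; the helper name `build_virsh_migrate_command`, the domain name, and the destination URI are hypothetical.

```python
def build_virsh_migrate_command(domain, dest_uri, disks):
    """Build (but do not run) a virsh command for a live block migration.

    `disks` lists the target device names (for example "vda") whose
    storage should be copied; they are passed to --migrate-disks as a
    comma-separated list.
    """
    cmd = ["virsh", "migrate", "--live", "--copy-storage-all"]
    if disks:
        cmd += ["--migrate-disks", ",".join(disks)]
    cmd += [domain, dest_uri]
    return cmd


# Hypothetical usage: copy only the local root disk, leave volumes alone.
example = build_virsh_migrate_command(
    "instance-00000001", "qemu+ssh://dest-host/system", ["vda"]
)
```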
| Changed in libvirt (Ubuntu Wily): | |
| status: | In Progress → Fix Released |
| Bartosz Fic (bartosz-fic) wrote : | #27 |
I've tested this libvirt fix on a simple multinode devstack setup with 1 controller and 2 compute nodes.
Both compute nodes have libvirt in version 1.2.16.
After removing check which is in this patch https:/
block migration of vm with attached volume.
However, the same instance booted from image without any volume attached is successfully block migrated.
| Serge Hallyn (serge-hallyn) wrote : | #28 |
@bartosz-fic
So the libvirt bug for wily should still be marked as not fix released?
You said you are on "1.2.16" - to be sure, were you on 1.2.16-2ubuntu9 or later?
If so, do you have any idea which patches are still missing? The upstream patchset which was supposed to fix this was included with that release, so I wonder whether the bug is actually still present upstream.
| Dr. Jens Harbott (j-harbott) wrote : | #29 |
I think there is some confusion here. As I understand it, the part that was fixed in libvirt was changing the API so that it is now possible to define a subset of block devices to be copied during migration. To fix the original issue, another patch in nova will be needed that uses this extended API to avoid copying shared block devices onto themselves.
| Serge Hallyn (serge-hallyn) wrote : | #30 |
Ah right - thanks.
Fix proposed to branch: master
Review: https:/
| Changed in nova: | |
| assignee: | Chris St. Pierre (stpierre) → Bartosz Fic (bartosz-fic) |
| status: | Confirmed → In Progress |
| Launchpad Janitor (janitor) wrote : | #32 |
Status changed to 'Confirmed' because the bug affects multiple users.
| Changed in libvirt (Ubuntu Trusty): | |
| status: | New → Confirmed |
| Changed in libvirt (Ubuntu Vivid): | |
| status: | New → Confirmed |
| Changed in nova: | |
| importance: | Undecided → High |
| Bartosz Fic (bartosz-fic) wrote : | #34 |
The selective block device migration feature was backported to libvirt 1.2.16 for Ubuntu Wily.
This patch provides block live migration of a vm booted from image with attached devices on libvirt 1.2.16.
The attachment "Patch for Ubuntu willy with libvirt 1.2.16" seems to be a patch. If it isn't, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of the ~ubuntu-reviewers, unsubscribe the team.
[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issues please contact him.]
| tags: | added: patch |
| Changed in nova: | |
| assignee: | Bartosz Fic (bartosz-fic) → Pawel Koniszewski (pawel-koniszewski) |
| tags: |
added: live-migration removed: live-migrate |
| Changed in libvirt (Ubuntu): | |
| assignee: | Serge Hallyn (serge-hallyn) → nobody |
Related fix proposed to branch: master
Review: https:/
Reviewed: https:/
Committed: https:/
Submitter: Jenkins
Branch: master
commit f0d5fc61916f412
Author: Pawel Koniszewski <email address hidden>
Date: Fri Dec 11 03:28:50 2015 +0100
Get list of disks to copy early to avoid multiple DB hits
To support selective block migration we need to read block devices
from nova block device mappings instead of libvirt block info.
It means that in current implementation we would call
_live_
live_
To avoid that, this change gets disk paths early and passes them as
an additional parameter to the live migration monitor.
Change-Id: Ic894cfc7374ba0
Related-bug: #1398999
| Changed in libvirt (Ubuntu Wily): | |
| assignee: | Serge Hallyn (serge-hallyn) → nobody |
Reviewed: https:/
Committed: https:/
Submitter: Jenkins
Branch: master
commit 23fd0389f0e23e7
Author: Pawel Koniszewski <email address hidden>
Date: Wed Feb 10 13:09:44 2016 +0100
Allow block live migration of an instance with attached volumes
Since libvirt 1.2.17 it is possible to select which block devices
should be migrated to destination host. Block devices that are not
provided will not be migrated. It means that it is possible to
exclude volumes from block migration and therefore prevent volumes
from being copied to themselves.
This patch implements new check of libvirt version. If version is
higher or equal to 1.2.17 it is possible to block live migrate vm
with attached volumes.
Co-Authored-By: Bartosz Fic <email address hidden>
Change-Id: I8fcc3ef3cb5d9f
Closes-Bug: #1398999
Partially implements: blueprint block-live-
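The selection described in this commit message can be sketched as follows. This is a simplified illustration, not the nova implementation: the `BlockDevice` class, its fields, and `disks_to_migrate` are hypothetical stand-ins for nova's block device mappings.

```python
from dataclasses import dataclass


@dataclass
class BlockDevice:
    """Hypothetical stand-in for one entry of an instance's block devices."""
    target: str       # device name inside the guest, e.g. "vda"
    is_volume: bool   # True for an attached volume, False for local disks


def disks_to_migrate(devices):
    """Return the target names that should be copied during block migration.

    Attached volumes are excluded so that they are not copied onto
    themselves; only local disks are handed to the migration.
    """
    return [dev.target for dev in devices if not dev.is_volume]


# Hypothetical instance: local root disk plus one attached volume.
instance_devices = [
    BlockDevice(target="vda", is_volume=False),
    BlockDevice(target="vdb", is_volume=True),
]
selected = disks_to_migrate(instance_devices)
```

With libvirt older than 1.2.17 there is no way to pass such a list, which is why the earlier workaround rejected these migrations outright.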
| Changed in nova: | |
| status: | In Progress → Fix Released |
This issue was fixed in the openstack/nova 13.0.0.0b3 development milestone.
| Changed in nova (Ubuntu Vivid): | |
| status: | Triaged → Won't Fix |
| Changed in nova (Ubuntu Wily): | |
| status: | Triaged → Won't Fix |
Related fix proposed to branch: master
Review: https:/
Reviewed: https:/
Committed: https:/
Submitter: Jenkins
Branch: master
commit 1032c79238e8725
Author: Matt Riedemann <email address hidden>
Date: Mon Apr 24 09:54:21 2017 -0400
Enable test_iscsi_volume in live migration job
The block_migrate_
the libvirt driver doesn't support live migration with an attached
volume because of bug 1398999 where volumes live on a network share
like RBD. However, I8fcc3ef3cb5d9f
nova says that this is possible with libvirt >= 1.2.17. Since we are
using libvirt 2.5.0 from the Ubuntu Cloud Archive on Xenial nodes
now, we should be able to enable this test.
Change-Id: I7d7a708b231070
Related-Bug: #1398999
Related fix proposed to branch: master
Review: https:/
Reviewed: https:/
Committed: https:/
Submitter: Jenkins
Branch: master
commit 1328a50e2cd493f
Author: Matt Riedemann <email address hidden>
Date: Thu Sep 14 17:30:18 2017 +0000
Revert "Enable test_iscsi_volume in live migration job"
This reverts commit 1032c79238e8725
This wasn't actually ready to merge, and now that it has
we're seeing a spike in failures of test_iscsi_volume.
Change-Id: I74649dd63ef82a
Related-Bug: #1398999
| Changed in libvirt (Ubuntu Vivid): | |
| status: | Confirmed → Won't Fix |
| Changed in nova (Ubuntu Trusty): | |
| importance: | High → Medium |


Fix proposed to branch: master
Review: https://review.openstack.org/139085