libvirt: Data corruptor live migrating BFV instance with config disk

Bug #1719362 reported by Matthew Booth
12
This bug affects 2 people
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
High
Matthew Booth
Ocata
High
Matt Riedemann
Pike
High
Matthew Booth

Bug Description

When live migrating a BFV instance with a config disk, the API currently requires block migration to be specified due to the local storage requirement. This doesn't make sense on a number of levels.

Before calling migrateToURI3() in this case, the libvirt driver filters out all disks which it shouldn't migrate, which is both of them: the config drive because it's read-only and we already copied it with scp, and the root disk because it's a volume. It calls migrateToURI3() with an empty migrate_disks in params, and VIR_MIGRATE_NON_SHARED_INC in flags (because block-migration).

There's a quirk in the behaviour of the libvirt python bindings here: it doesn't distinguish between an empty migrate_disks list, and no migrate_disks list. Both use the default behaviour of "block migrate all writable disks". This will include the attached root volume. As the root volume is simultaneously attached to both ends of the migration, one of which is running guest, this a data corruptor.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/507202

Changed in nova:
assignee: nobody → Matthew Booth (mbooth-9)
status: New → In Progress
Revision history for this message
Stephen Finucane (stephenfinucane) wrote :

Should we target this to the upstream libvirt Python bindings too?

Revision history for this message
Matthew Booth (mbooth-9) wrote :

Good question. I asked Dan Berrangé who was helping me debug this and he says not, as it can't be fixed. The correct behaviour here is not to pass the flag if we don't want to block migrate anything, which makes sense anyway.

melanie witt (melwitt)
tags: added: libvirt volumes
melanie witt (melwitt)
Changed in nova:
importance: Undecided → High
Revision history for this message
Curtis Collicutt (6-curtis) wrote :

This bug is pretty important in NFV use cases where virtual machine images have no capacity for using metadata other than config drive. We have several clouds deployed where this is an issue, and of course the potential for data corruption is there as well. There must be many users of OpenStack out there not knowing they have this particular problem and potentially still doing live migration.

Revision history for this message
Stacy Véronneau (stacy-veronneau) wrote :

Can't comment more on the importance of this being fixed. Curtis's (6-curtis) point and UCs makes this a high importance bug. If I could +2 Curti's comment I would.

Changed in nova:
assignee: Matthew Booth (mbooth-9) → Matt Riedemann (mriedem)
Changed in nova:
assignee: Matt Riedemann (mriedem) → melanie witt (melwitt)
Changed in nova:
assignee: melanie witt (melwitt) → Matthew Booth (mbooth-9)
Changed in nova:
assignee: Matthew Booth (mbooth-9) → Matt Riedemann (mriedem)
Changed in nova:
assignee: Matt Riedemann (mriedem) → Matthew Booth (mbooth-9)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/507202
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=ea9bf5216be60fb3fa1704cf6257adca4e634cb1
Submitter: Zuul
Branch: master

commit ea9bf5216be60fb3fa1704cf6257adca4e634cb1
Author: Matthew Booth <email address hidden>
Date: Mon Sep 25 17:32:14 2017 +0100

    libvirt: Don't VIR_MIGRATE_NON_SHARED_INC without migrate_disks

    If we specify block migration, but there are no disks which actually
    require block migration we call libvirt's migrateToURI3() with
    VIR_MIGRATE_NON_SHARED_INC in flags and an empty migrate_disks in
    params. Libvirt interprets this to be the default block migration
    behaviour of "block migrate all writeable disks". However,
    migrate_disks may only be empty because we filtered attached volumes
    out of it, in which case libvirt will block migrate attached volumes.
    This is a data corruptor.

    This change addresses the issue at the point we call migrateToURI3().
    As we never want the default block migration behaviour, we can safely
    remove the flag if the list of disks to migrate is empty.

    Change-Id: I9b545ca8aa6dd7b41ddea2d333190c9fbed19bc1
    Resolves-bug: #1719362

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/519632

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/519636

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/pike)

Reviewed: https://review.openstack.org/519632
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=2486f34ec49ffdf381912d7f0fd5199ce0dfa013
Submitter: Zuul
Branch: stable/pike

commit 2486f34ec49ffdf381912d7f0fd5199ce0dfa013
Author: Matthew Booth <email address hidden>
Date: Mon Sep 25 17:32:14 2017 +0100

    libvirt: Don't VIR_MIGRATE_NON_SHARED_INC without migrate_disks

    If we specify block migration, but there are no disks which actually
    require block migration we call libvirt's migrateToURI3() with
    VIR_MIGRATE_NON_SHARED_INC in flags and an empty migrate_disks in
    params. Libvirt interprets this to be the default block migration
    behaviour of "block migrate all writeable disks". However,
    migrate_disks may only be empty because we filtered attached volumes
    out of it, in which case libvirt will block migrate attached volumes.
    This is a data corruptor.

    This change addresses the issue at the point we call migrateToURI3().
    As we never want the default block migration behaviour, we can safely
    remove the flag if the list of disks to migrate is empty.

    (cherry picked from commit ea9bf5216be60fb3fa1704cf6257adca4e634cb1)

    nova/tests/unit/virt/libvirt/test_driver.py:
      Explicitly asserts byte string destination_xml in
      _test_live_migration_block_migration_flags. Not required in master
      due to change I85cd9a90.

    Change-Id: I9b545ca8aa6dd7b41ddea2d333190c9fbed19bc1
    Resolves-bug: #1719362

tags: added: in-stable-pike
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/ocata)

Reviewed: https://review.openstack.org/519636
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=2cfdb9c056192ff38e44fd5f03c9ede33ed09bd9
Submitter: Zuul
Branch: stable/ocata

commit 2cfdb9c056192ff38e44fd5f03c9ede33ed09bd9
Author: Matthew Booth <email address hidden>
Date: Mon Sep 25 17:32:14 2017 +0100

    libvirt: Don't VIR_MIGRATE_NON_SHARED_INC without migrate_disks

    If we specify block migration, but there are no disks which actually
    require block migration we call libvirt's migrateToURI3() with
    VIR_MIGRATE_NON_SHARED_INC in flags and an empty migrate_disks in
    params. Libvirt interprets this to be the default block migration
    behaviour of "block migrate all writeable disks". However,
    migrate_disks may only be empty because we filtered attached volumes
    out of it, in which case libvirt will block migrate attached volumes.
    This is a data corruptor.

    This change addresses the issue at the point we call migrateToURI3().
    As we never want the default block migration behaviour, we can safely
    remove the flag if the list of disks to migrate is empty.

    (cherry picked from commit ea9bf5216be60fb3fa1704cf6257adca4e634cb1)

    nova/tests/unit/virt/libvirt/test_driver.py:
      Explicitly asserts byte string destination_xml in
      _test_live_migration_block_migration_flags. Not required in master
      due to change I85cd9a90.

    (cherry picked from commit 2486f34ec49ffdf381912d7f0fd5199ce0dfa013)

    Change-Id: I9b545ca8aa6dd7b41ddea2d333190c9fbed19bc1
    Resolves-bug: #1719362

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers