Can't live-migrate after "round-trip" volume-update

Bug #1691195 reported by Artom Lifshitz
Affects                   Status         Importance  Assigned to     Milestone
OpenStack Compute (nova)  Fix Released   High        Artom Lifshitz
Newton                    Fix Committed  High        Artom Lifshitz
Ocata                     Fix Committed  High        Artom Lifshitz

Bug Description

Description
===========

If an attached volume on an instance has been volume-updated twice in a "round-trip" (i.e., volume-update $vol1 $vol2, then volume-update $vol2 $vol1), the instance cannot be live-migrated.

Steps to reproduce
==================

1. Create two iscsi volumes.
   # cinder create --name test_vol1 --volume-type iscsi 1
   # cinder create --name test_vol2 --volume-type iscsi 1

   (--volume-type iscsi isn't mandatory - in my devstack environment there
   is no iscsi volume-type, but that doesn't stop me from reproducing this
   bug)

2. Boot an instance.
   # nova boot --flavor 1 --image $imageid --nic net-id=$netid testvm1

3. Attach one iscsi volume to testvm1.
   # nova volume-attach testvm1 $test_vol1

4. Run volume-update to swap the volume to the second one (first volume-update).
   # nova volume-update testvm1 $test_vol1 $test_vol2

5. Run volume-update again to swap the volume back to the first one (second
   volume-update).
   # nova volume-update testvm1 $test_vol2 $test_vol1

6. Live-migrate the instance to another compute node.
   # nova live-migration testvm1

Expected result
===============

Live migration succeeds.

Actual result
=============

Live migration fails with:

Apr 27 10:32:14 multi9h-3 nova-compute: File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1939, in migrateToURI3

Apr 27 10:32:14 multi9h-3 nova-compute: if ret == -1: raise libvirtError ('virDomainMigrateToURI3() failed', dom=self)

Apr 27 10:32:14 multi9h-3 nova-compute: libvirtError: missing source information for device vdb

Environment
===========

This was originally reported [1] against Red Hat OSP 9 (Mitaka) and is also reproducible on devstack master.

Additional information
======================

There are two things going on here.

1. When performing the volume-update, the libvirt driver calls virDomainBlockRebase without the VIR_DOMAIN_BLOCK_REBASE_COPY_DEV flag [2], so the device XML changes from <source dev=/dev/iscsi/lun> to <source file=/dev/iscsi/lun>. This is a problem because /dev/iscsi/lun isn't a regular file, and it causes the above error, but only after the "round-trip" volume-update. Why? Because:

2. The serial number isn't updated when doing volume-update, and there's a bit of live-migration code [3] that checks for serial numbers before updating the XML. If the serial numbers don't match, the XML isn't updated, and libvirt doesn't notice that /dev/iscsi/lun isn't a file.
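Point 2 can be illustrated with a minimal sketch. This is not nova's code: disks_to_update, the sample XML, and the serial values vol1/vol2 are all made up for illustration. It only models the gate described above, where a disk is considered for an XML update only when its <serial> matches a volume that nova currently tracks.

```python
import xml.etree.ElementTree as ET


def disks_to_update(domain_xml, known_serials):
    """Return target names of disks whose <serial> is in known_serials.

    Hypothetical helper: it models only the serial check, not the XML rewrite.
    """
    root = ET.fromstring(domain_xml)
    matched = []
    for disk in root.findall("./devices/disk"):
        serial = disk.findtext("serial")
        if serial is not None and serial in known_serials:
            matched.append(disk.find("target").get("dev"))
    return matched


# The domain XML keeps the serial of the originally attached volume (vol1),
# because volume-update does not refresh it.
DOMAIN_XML = """
<domain>
  <devices>
    <disk type="block" device="disk">
      <source file="/dev/iscsi/lun"/>
      <target dev="vdb" bus="virtio"/>
      <serial>vol1</serial>
    </disk>
  </devices>
</domain>
"""

# After the first volume-update, the tracked volume is vol2: no match.
after_first_swap = disks_to_update(DOMAIN_XML, {"vol2"})
# After the round trip, the tracked volume is vol1 again: vdb matches.
after_round_trip = disks_to_update(DOMAIN_XML, {"vol1"})

print(after_first_swap)
print(after_round_trip)
```

In this model the disk is skipped after a single volume-update and only selected again after the second one, which is consistent with the bug needing the round trip to show up.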

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1446446
[2] http://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBlockRebase
[3] https://github.com/openstack/nova/blob/master/nova/virt/libvirt/migration.py#L158

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/465205

Changed in nova:
assignee: nobody → Artom Lifshitz (notartom)
status: New → In Progress
description: updated
description: updated
Matt Riedemann (mriedem)
Changed in nova:
importance: Undecided → High
tags: added: libvirt live-migration volumes
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/471353

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/471356

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/465205
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=a8a4a8ea7b8e6c85273ddb02d34d6af1740b183f
Submitter: Jenkins
Branch: master

commit a8a4a8ea7b8e6c85273ddb02d34d6af1740b183f
Author: Artom Lifshitz <email address hidden>
Date: Wed May 17 00:22:34 2017 +0000

    Use VIR_DOMAIN_BLOCK_REBASE_COPY_DEV when rebasing

    Previously, in swap_volume, the VIR_DOMAIN_BLOCK_REBASE_COPY_DEV flag
    was not passed to virDomainBlockRebase. In the case of iSCSI-backed
    disks, this caused the XML to change from <source dev=/dev/iscsi/lun>
    to <source file=/dev/iscsi/lun>. This was a problem because
    /dev/iscsi/lun is not a regular file. This patch passes the
    VIR_DOMAIN_BLOCK_REBASE_COPY_DEV flag to virDomainBlockRebase, causing
    the correct <source dev=/dev/iscsi/lun> to be generated upon
    volume-update.

    Change-Id: I868a0dae0baf8cded9c7c5807ea63ffc5eec0c5e
    Closes-bug: 1691195
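
The effect described in this commit message can be modelled with a small sketch. It does not call libvirt. The two constants are stand-ins written in the bit-flag style of libvirt's rebase flags, and source_element is a hypothetical helper that only mimics how the copy destination is described (as a file by default, as a block device when the COPY_DEV flag is set).

```python
# Stand-in flag values for this demo; real code would read them from the
# libvirt Python binding instead of defining them here.
VIR_DOMAIN_BLOCK_REBASE_COPY = 1 << 3
VIR_DOMAIN_BLOCK_REBASE_COPY_DEV = 1 << 5


def source_element(path, flags):
    """Model which <source> attribute the copy destination ends up with."""
    attr = "dev" if flags & VIR_DOMAIN_BLOCK_REBASE_COPY_DEV else "file"
    return '<source %s="%s"/>' % (attr, path)


# Before the fix: only the copy flag is passed.
before_fix = source_element("/dev/iscsi/lun", VIR_DOMAIN_BLOCK_REBASE_COPY)

# After the fix: COPY_DEV is OR-ed into the flags.
after_fix = source_element(
    "/dev/iscsi/lun",
    VIR_DOMAIN_BLOCK_REBASE_COPY | VIR_DOMAIN_BLOCK_REBASE_COPY_DEV,
)

print(before_fix)  # <source file="/dev/iscsi/lun"/>
print(after_fix)   # <source dev="/dev/iscsi/lun"/>
```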

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 16.0.0.0b2

This issue was fixed in the openstack/nova 16.0.0.0b2 development milestone.

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/ocata)

Reviewed: https://review.openstack.org/471353
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=ef853e038d9a3e9bfe02287c7c01c80b7a022ed6
Submitter: Jenkins
Branch: stable/ocata

commit ef853e038d9a3e9bfe02287c7c01c80b7a022ed6
Author: Artom Lifshitz <email address hidden>
Date: Wed May 17 00:22:34 2017 +0000

    Use VIR_DOMAIN_BLOCK_REBASE_COPY_DEV when rebasing

    Previously, in swap_volume, the VIR_DOMAIN_BLOCK_REBASE_COPY_DEV flag
    was not passed to virDomainBlockRebase. In the case of iSCSI-backed
    disks, this caused the XML to change from <source dev=/dev/iscsi/lun>
    to <source file=/dev/iscsi/lun>. This was a problem because
    /dev/iscsi/lun is not a regular file. This patch passes the
    VIR_DOMAIN_BLOCK_REBASE_COPY_DEV flag to virDomainBlockRebase, causing
    the correct <source dev=/dev/iscsi/lun> to be generated upon
    volume-update.

    Conflicts:
          nova/tests/unit/virt/libvirt/test_driver.py
          nova/virt/libvirt/driver.py

    NOTE(mriedem): The conflicts are due to
    fbcf8d673342570a1518dbf8d88f289c2c39cd30 needing to translate
    the exception message in driver.py and for passing instance
    to disconnect_volume in test_driver, which was added in Pike with
    b66b7d4f9d63e7f45ebfc033696d06c632a33ff1.

    Change-Id: I868a0dae0baf8cded9c7c5807ea63ffc5eec0c5e
    Closes-bug: 1691195
    (cherry picked from commit a8a4a8ea7b8e6c85273ddb02d34d6af1740b183f)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 15.0.6

This issue was fixed in the openstack/nova 15.0.6 release.

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/newton)

Reviewed: https://review.openstack.org/471356
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=8b9aa3e00101b1258f9b3dedca66aac81f655778
Submitter: Zuul
Branch: stable/newton

commit 8b9aa3e00101b1258f9b3dedca66aac81f655778
Author: Artom Lifshitz <email address hidden>
Date: Wed May 17 00:22:34 2017 +0000

    Use VIR_DOMAIN_BLOCK_REBASE_COPY_DEV when rebasing

    Previously, in swap_volume, the VIR_DOMAIN_BLOCK_REBASE_COPY_DEV flag
    was not passed to virDomainBlockRebase. In the case of iSCSI-backed
    disks, this caused the XML to change from <source dev=/dev/iscsi/lun>
    to <source file=/dev/iscsi/lun>. This was a problem because
    /dev/iscsi/lun is not a regular file. This patch passes the
    VIR_DOMAIN_BLOCK_REBASE_COPY_DEV flag to virDomainBlockRebase, causing
    the correct <source dev=/dev/iscsi/lun> to be generated upon
    volume-update.

    Conflicts:
          nova/tests/unit/virt/libvirt/test_driver.py
          nova/virt/libvirt/driver.py
          nova/virt/libvirt/guest.py

    NOTE(mriedem): The conflicts are due to
    fbcf8d673342570a1518dbf8d88f289c2c39cd30 needing to translate
    the exception message in driver.py and for passing instance
    to disconnect_volume in test_driver, which was added in Pike with
    b66b7d4f9d63e7f45ebfc033696d06c632a33ff1.

    NOTE(artom): In stable/newton, the conflict in guest.py is due to a
    different docstring for the rebase() method.

    NOTE(artom): This backport squashes
    5d5c5a5d92458d530115b3d3b8b381524b1a3a90 to guard against older libvirt
    versions that don't have the VIR_DOMAIN_BLOCK_REBASE_COPY_DEV flag.

    Change-Id: I868a0dae0baf8cded9c7c5807ea63ffc5eec0c5e
    Closes-bug: 1691195
    (cherry picked from commit a8a4a8ea7b8e6c85273ddb02d34d6af1740b183f)
    (cherry picked from commit ef853e038d9a3e9bfe02287c7c01c80b7a022ed6)
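
One way to guard against a libvirt binding that lacks the flag is sketched below. This is an assumption about the general approach, not the content of the squashed commit. rebase_flags is a hypothetical helper, and the SimpleNamespace objects stand in for older and newer libvirt modules so the example runs without libvirt installed.

```python
from types import SimpleNamespace


def rebase_flags(libvirt_module, base_flags):
    """Add COPY_DEV to base_flags only if this binding defines the flag."""
    copy_dev = getattr(libvirt_module, "VIR_DOMAIN_BLOCK_REBASE_COPY_DEV", 0)
    return base_flags | copy_dev


# Stand-ins for the real module; the numeric values are demo placeholders.
old_libvirt = SimpleNamespace(VIR_DOMAIN_BLOCK_REBASE_COPY=8)
new_libvirt = SimpleNamespace(
    VIR_DOMAIN_BLOCK_REBASE_COPY=8,
    VIR_DOMAIN_BLOCK_REBASE_COPY_DEV=32,
)

old_flags = rebase_flags(old_libvirt, old_libvirt.VIR_DOMAIN_BLOCK_REBASE_COPY)
new_flags = rebase_flags(new_libvirt, new_libvirt.VIR_DOMAIN_BLOCK_REBASE_COPY)

print(old_flags)  # 8: flag unavailable, flags left unchanged
print(new_flags)  # 40: COPY_DEV added
```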

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 14.0.9

This issue was fixed in the openstack/nova 14.0.9 release.
