Can't live-migrate after "round-trip" volume-update

Bug #1691195 reported by Artom Lifshitz on 2017-05-16
This bug affects 1 person
Affects                   Importance  Assigned to
OpenStack Compute (nova)  High        Artom Lifshitz
Newton                    High        Artom Lifshitz
Ocata                     High        Artom Lifshitz

Bug Description

Description
===========

If a volume attached to an instance is volume-updated twice in a "round-trip" (i.e., volume-update $vol1 $vol2, then volume-update $vol2 $vol1), the instance cannot be live-migrated.

Steps to reproduce
==================

1. Create two iscsi volumes.
   # cinder create --name test_vol1 --volume-type iscsi 1
   # cinder create --name test_vol2 --volume-type iscsi 1

   (--volume-type iscsi isn't mandatory - in my devstack environment there
   is no iscsi volume-type, but that doesn't stop me from reproducing this
   bug)

2. Boot an instance.
   # nova boot --flavor 1 --image $imageid --nic net-id=$netid testvm1

3. Attach one iscsi volume to testvm1.
   # nova volume-attach testvm1 $test_vol1

4. Run volume-update to swap the attached volume for the second one (first volume-update).
   # nova volume-update testvm1 $test_vol1 $test_vol2

5. Run volume-update again to swap back to the first volume (second
   volume-update).
   # nova volume-update testvm1 $test_vol2 $test_vol1

6. Live-migrate the instance to another compute node.
   # nova live-migration testvm1

Expected result
===============

Live migration succeeds.

Actual result
=============

Live migration fails with:

Apr 27 10:32:14 multi9h-3 nova-compute: File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1939, in migrateToURI3

Apr 27 10:32:14 multi9h-3 nova-compute: if ret == -1: raise libvirtError ('virDomainMigrateToURI3() failed', dom=self)

Apr 27 10:32:14 multi9h-3 nova-compute: libvirtError: missing source information for device vdb

Environment
===========

This was originally reported [1] against Red Hat OSP 9 (Mitaka) and is reproducible on devstack master as well.

Additional information
======================

There are two things going on here.

1. When performing the volume-update, the libvirt driver calls virDomainBlockRebase without the VIR_DOMAIN_BLOCK_REBASE_COPY_DEV flag [2], so the device XML changes from <source dev=/dev/iscsi/lun> to <source file=/dev/iscsi/lun>. This is a problem because /dev/iscsi/lun isn't a regular file, and it is what causes the above error. However, the error is only triggered after the "round-trip" volume-update. Why? Because:

2. The serial number isn't updated when doing volume-update, and there's a bit of live-migration code [3] that checks for serial numbers before updating the XML. If the serial numbers don't match, the XML isn't updated, and libvirt doesn't notice that /dev/iscsi/lun isn't a file.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1446446
[2] http://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBlockRebase
[3] https://github.com/openstack/nova/blob/master/nova/virt/libvirt/migration.py#L158
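
The two points above can be illustrated with a small model. This is only a sketch and not nova's actual code: the domain XML, the "vol1"/"vol2" IDs, and both helper functions are hypothetical, and the serial check is reduced to "a disk is rewritten only if its <serial> matches a currently attached volume ID".

```python
import xml.etree.ElementTree as ET

# Hypothetical domain XML after a volume-update done without COPY_DEV: the
# source is recorded as file= although the path is a block device. The serial
# "vol1" stands in for the ID of the originally attached volume, which the
# report says is not updated by volume-update.
DOMAIN_XML = """
<domain>
  <devices>
    <disk device="disk">
      <source file="/dev/iscsi/lun"/>
      <target dev="vdb" bus="virtio"/>
      <serial>vol1</serial>
    </disk>
  </devices>
</domain>
"""


def suspicious_file_sources(domain_xml):
    """Return target names of disks whose <source file=...> lives under /dev.

    This is a heuristic for spotting point 1, not a libvirt-defined check.
    """
    root = ET.fromstring(domain_xml)
    found = []
    for disk in root.findall("./devices/disk"):
        source = disk.find("source")
        target = disk.find("target")
        path = source.get("file") if source is not None else None
        if path and path.startswith("/dev/"):
            found.append(target.get("dev") if target is not None else "?")
    return found


def disks_rewritten_on_migration(domain_xml, attached_volume_ids):
    """Simplified model of point 2: which disks pass the serial check."""
    root = ET.fromstring(domain_xml)
    rewritten = []
    for disk in root.findall("./devices/disk"):
        if disk.findtext("serial") in attached_volume_ids:
            rewritten.append(disk.find("target").get("dev"))
    return rewritten


print(suspicious_file_sources(DOMAIN_XML))
# After one swap the attached volume is vol2, but the serial is still vol1.
print(disks_rewritten_on_migration(DOMAIN_XML, {"vol2"}))
# After the round trip the attached volume is vol1 again, so the serial matches.
print(disks_rewritten_on_migration(DOMAIN_XML, {"vol1"}))
```

In this model the disk is skipped after a single swap and only picked up for rewriting after the round trip, which matches the observation that the second volume-update is needed to hit the error.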

Fix proposed to branch: master
Review: https://review.openstack.org/465205

Changed in nova:
assignee: nobody → Artom Lifshitz (notartom)
status: New → In Progress
description: updated
description: updated
Matt Riedemann (mriedem) on 2017-06-05
Changed in nova:
importance: Undecided → High
tags: added: libvirt live-migration volumes

Reviewed: https://review.openstack.org/465205
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=a8a4a8ea7b8e6c85273ddb02d34d6af1740b183f
Submitter: Jenkins
Branch: master

commit a8a4a8ea7b8e6c85273ddb02d34d6af1740b183f
Author: Artom Lifshitz <email address hidden>
Date: Wed May 17 00:22:34 2017 +0000

    Use VIR_DOMAIN_BLOCK_REBASE_COPY_DEV when rebasing

    Previously, in swap_volume, the VIR_DOMAIN_BLOCK_REBASE_COPY_DEV flag
    was not passed to virDomainBlockRebase. In the case of iSCSI-backed
    disks, this caused the XML to change from <source dev=/dev/iscsi/lun>
    to <source file=/dev/iscsi/lun>. This was a problem because
    /dev/iscsi/lun is not a regular file. This patch passes the
    VIR_DOMAIN_BLOCK_REBASE_COPY_DEV flag to virDomainBlockRebase, causing
    the correct <source dev=/dev/iscsi/lun> to be generated upon
    volume-update.

    Change-Id: I868a0dae0baf8cded9c7c5807ea63ffc5eec0c5e
    Closes-bug: 1691195
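
A minimal sketch of what passing the flag amounts to, not the patch itself. It assumes the rebase is a copy job (VIR_DOMAIN_BLOCK_REBASE_COPY, which COPY_DEV modifies); any other flags nova passes are not stated in this report. fake_libvirt, rebase_flags and RecordingDomain are hypothetical stand-ins so the example runs without libvirt installed, and the numeric flag values are placeholders rather than libvirt's real ones.

```python
from types import SimpleNamespace

# Stand-in for the libvirt Python module. Placeholder values only.
fake_libvirt = SimpleNamespace(
    VIR_DOMAIN_BLOCK_REBASE_COPY=0x1,
    VIR_DOMAIN_BLOCK_REBASE_COPY_DEV=0x2,
)


def rebase_flags(libvirt_mod, copy_dev):
    """Build the flag bitmask for a copy-mode block rebase."""
    flags = libvirt_mod.VIR_DOMAIN_BLOCK_REBASE_COPY
    if copy_dev:
        # Ask libvirt to treat the destination as a block device, so the
        # resulting XML uses <source dev=...> rather than <source file=...>.
        flags |= libvirt_mod.VIR_DOMAIN_BLOCK_REBASE_COPY_DEV
    return flags


class RecordingDomain:
    """Hypothetical domain object that only records blockRebase calls."""

    def __init__(self):
        self.calls = []

    def blockRebase(self, disk, base, bandwidth=0, flags=0):
        self.calls.append((disk, base, bandwidth, flags))
        return 0


domain = RecordingDomain()
domain.blockRebase("vdb", "/dev/iscsi/lun", 0,
                   rebase_flags(fake_libvirt, copy_dev=True))
print(domain.calls)
```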

Changed in nova:
status: In Progress → Fix Released

This issue was fixed in the openstack/nova 16.0.0.0b2 development milestone.

Reviewed: https://review.openstack.org/471353
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=ef853e038d9a3e9bfe02287c7c01c80b7a022ed6
Submitter: Jenkins
Branch: stable/ocata

commit ef853e038d9a3e9bfe02287c7c01c80b7a022ed6
Author: Artom Lifshitz <email address hidden>
Date: Wed May 17 00:22:34 2017 +0000

    Use VIR_DOMAIN_BLOCK_REBASE_COPY_DEV when rebasing

    Previously, in swap_volume, the VIR_DOMAIN_BLOCK_REBASE_COPY_DEV flag
    was not passed to virDomainBlockRebase. In the case of iSCSI-backed
    disks, this caused the XML to change from <source dev=/dev/iscsi/lun>
    to <source file=/dev/iscsi/lun>. This was a problem because
    /dev/iscsi/lun is not a regular file. This patch passes the
    VIR_DOMAIN_BLOCK_REBASE_COPY_DEV flag to virDomainBlockRebase, causing
    the correct <source dev=/dev/iscsi/lun> to be generated upon
    volume-update.

    Conflicts:
          nova/tests/unit/virt/libvirt/test_driver.py
          nova/virt/libvirt/driver.py

    NOTE(mriedem): The conflicts are due to
    fbcf8d673342570a1518dbf8d88f289c2c39cd30 needing to translate
    the exception message in driver.py and for passing instance
    to disconnect_volume in test_driver, which was added in Pike with
    b66b7d4f9d63e7f45ebfc033696d06c632a33ff1.

    Change-Id: I868a0dae0baf8cded9c7c5807ea63ffc5eec0c5e
    Closes-bug: 1691195
    (cherry picked from commit a8a4a8ea7b8e6c85273ddb02d34d6af1740b183f)

This issue was fixed in the openstack/nova 15.0.6 release.

Reviewed: https://review.openstack.org/471356
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=8b9aa3e00101b1258f9b3dedca66aac81f655778
Submitter: Zuul
Branch: stable/newton

commit 8b9aa3e00101b1258f9b3dedca66aac81f655778
Author: Artom Lifshitz <email address hidden>
Date: Wed May 17 00:22:34 2017 +0000

    Use VIR_DOMAIN_BLOCK_REBASE_COPY_DEV when rebasing

    Previously, in swap_volume, the VIR_DOMAIN_BLOCK_REBASE_COPY_DEV flag
    was not passed to virDomainBlockRebase. In the case of iSCSI-backed
    disks, this caused the XML to change from <source dev=/dev/iscsi/lun>
    to <source file=/dev/iscsi/lun>. This was a problem because
    /dev/iscsi/lun is not a regular file. This patch passes the
    VIR_DOMAIN_BLOCK_REBASE_COPY_DEV flag to virDomainBlockRebase, causing
    the correct <source dev=/dev/iscsi/lun> to be generated upon
    volume-update.

    Conflicts:
          nova/tests/unit/virt/libvirt/test_driver.py
          nova/virt/libvirt/driver.py
          nova/virt/libvirt/guest.py

    NOTE(mriedem): The conflicts are due to
    fbcf8d673342570a1518dbf8d88f289c2c39cd30 needing to translate
    the exception message in driver.py and for passing instance
    to disconnect_volume in test_driver, which was added in Pike with
    b66b7d4f9d63e7f45ebfc033696d06c632a33ff1.

    NOTE(artom): In stable/newton, the conflict in guest.py is due to a
    different docstring for the rebase() method.

    NOTE(artom): This backport squashes
    5d5c5a5d92458d530115b3d3b8b381524b1a3a90 to guard against older libvirt
    versions that don't have the VIR_DOMAIN_BLOCK_REBASE_COPY_DEV flag.

    Change-Id: I868a0dae0baf8cded9c7c5807ea63ffc5eec0c5e
    Closes-bug: 1691195
    (cherry picked from commit a8a4a8ea7b8e6c85273ddb02d34d6af1740b183f)
    (cherry picked from commit ef853e038d9a3e9bfe02287c7c01c80b7a022ed6)
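
One possible way to guard against bindings that lack the flag, shown only as an illustration. How the squashed commit actually implements the guard is not described in this report. old_bindings, new_bindings and copy_dev_flag are hypothetical names, and the flag value is a placeholder.

```python
from types import SimpleNamespace

# Stand-ins for libvirt Python bindings with and without the constant.
new_bindings = SimpleNamespace(VIR_DOMAIN_BLOCK_REBASE_COPY_DEV=0x2)
old_bindings = SimpleNamespace()


def copy_dev_flag(libvirt_mod):
    """Return the COPY_DEV flag, or 0 if the bindings do not define it."""
    return getattr(libvirt_mod, "VIR_DOMAIN_BLOCK_REBASE_COPY_DEV", 0)


print(copy_dev_flag(new_bindings))
print(copy_dev_flag(old_bindings))
```

Falling back to 0 keeps the rebase call working on older libvirt, at the cost of keeping the old <source file=...> behaviour there.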

This issue was fixed in the openstack/nova 14.0.9 release.
