evacuate on ceph backed volume fails

Bug #1249319 reported by Blane Bramble
This bug affects 16 people
Affects                   Status        Importance  Assigned to   Milestone
OpenStack Compute (nova)  Fix Released  High        Feilong Wang
Icehouse                  Fix Released  Undecided   Unassigned
Juno                      Fix Released  Undecided   Unassigned

Bug Description

When using nova evacuate to move an instance from one compute host to another, the command silently fails. The issue seems to be that the rebuild process builds an incorrect libvirt.xml file that no longer correctly references the ceph volume.

Specifically under the <disk> section I see:

<source protocol="rbd" name="volumes/instance-00000004_disk">

whereas in the original libvirt.xml the entry was:

<source protocol="rbd" name="volumes/volume-9e1a7835-b780-495c-a88a-4558be784dde">
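To check which RBD image a generated libvirt.xml points at, the disk sources can be listed with a short script. This is only a minimal sketch using the Python standard library: the helper names `rbd_disk_sources` and `looks_like_cinder_volume` and the trimmed sample XML are illustrative assumptions, not part of nova, and the `volume-` prefix test is a heuristic based on the image names quoted in this report.

```python
import xml.etree.ElementTree as ET


def rbd_disk_sources(domain_xml):
    """Return the 'name' attribute of every <source protocol="rbd"> element."""
    root = ET.fromstring(domain_xml)
    return [
        src.get("name")
        for src in root.iter("source")
        if src.get("protocol") == "rbd"
    ]


def looks_like_cinder_volume(source_name):
    """Heuristic: the image part of 'pool/image' starts with 'volume-'."""
    return source_name.split("/", 1)[-1].startswith("volume-")


# Hypothetical, heavily trimmed domain XML for illustration only.
SAMPLE = """
<domain type="kvm">
  <devices>
    <disk type="network" device="disk">
      <source protocol="rbd" name="volumes/instance-00000004_disk"/>
    </disk>
  </devices>
</domain>
"""

if __name__ == "__main__":
    for name in rbd_disk_sources(SAMPLE):
        status = "volume" if looks_like_cinder_volume(name) else "NOT a volume"
        print(name, status)
```

Run against the libvirt.xml on the destination host, a source name that does not match the `volume-<uuid>` pattern would point to the symptom described here.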

Blane Bramble (blane) wrote :

The problem seems to be as follows:

During the rebuild, prep_block_device calls _prep_block_device, which in turn calls DriverVolumeBlockDevice with the existing bdms as the argument. These bdms do not seem to have source_type or destination_type set, so the call fails.
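The effect of the missing fields can be illustrated with a toy model. This is a sketch under stated assumptions, not nova's actual code: `is_volume_bdm` and `volumes_to_attach` are hypothetical helpers, and the two dictionaries are invented examples of a legacy-style mapping (no source_type or destination_type) and a new-style mapping for the same volume.

```python
def is_volume_bdm(bdm):
    """Toy stand-in for a type check on a block device mapping.

    Assumption: a mapping counts as a volume only if both type fields
    are present and equal to "volume".
    """
    return (bdm.get("source_type") == "volume"
            and bdm.get("destination_type") == "volume")


# Hypothetical legacy-style mapping: the type fields are absent.
legacy_bdm = {
    "device_name": "/dev/vda",
    "volume_id": "9e1a7835-b780-495c-a88a-4558be784dde",
    "delete_on_termination": False,
}

# Hypothetical new-style mapping for the same volume.
new_bdm = dict(legacy_bdm, source_type="volume", destination_type="volume")


def volumes_to_attach(bdms):
    """Keep only the volume IDs whose mapping passes the type check."""
    return [b["volume_id"] for b in bdms if is_volume_bdm(b)]


if __name__ == "__main__":
    print(volumes_to_attach([legacy_bdm]))  # legacy mapping is skipped
    print(volumes_to_attach([new_bdm]))     # new-style mapping is kept
```

In this model the legacy mapping is never treated as a volume, which would be consistent with the rebuilt guest ending up with a plain instance disk instead of the Cinder volume.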

Blane Bramble (blane) wrote :

As a work-around, changing the relevant block_device_mapping_get_all_by_instance call to not use legacy mode seems to work:

Matt Thompson (mattt416) wrote :

I see the same thing on Havana -- Blane, which version are you running?

Blane Bramble (blane) wrote :

Hi Matt, yes this is with Havana.

Matt Riedemann (mriedem)
tags: added: libvirt volumes
tags: added: compute
removed: libvirt
melanie witt (melwitt)
Changed in nova:
importance: Undecided → High
status: New → Confirmed
Changed in nova:
assignee: nobody → Andres Isaac Benavides (andres-i-benavides)
Bart Wensley (bartwensley) wrote :

The same problem occurs when evacuating an instance on a volume with an NFS backend - the libvirt.xml file on the destination compute host is built incorrectly and references a disk image instead of the cinder volume. Testing done in Havana release.

I applied the patch Blane provided and that seems to fix the problem.

Andres Isaac Benavides (andres-i-benavides) wrote :

The patch that Blane provided also works with the Ceph backend.

Andres Isaac Benavides (andres-i-benavides) wrote :

Blane, just a question: could we use this patch as the solution for this bug?

Blane Bramble (blane) wrote :

Hi Andres, of course.

Bart Wensley (bartwensley) wrote :

Andres,

Before you fix this, it would be good to understand if any of the other calls to block_device_mapping_get_all_by_instance in nova.compute.manager.py that do not set legacy=False are also broken. If they are, they should also be fixed. I don't have a good enough understanding of this code myself to make that determination, but perhaps someone who is following this bug can comment?

Bart

Andres Isaac Benavides (andres-i-benavides) wrote :

Yes, Bart, I am going to investigate the other calls in nova.compute.manager.py.

Andres Isaac Benavides (andres-i-benavides) wrote :

I did not find anything unusual with respect to the other calls in nova.compute.manager. Can anyone who is following this bug comment?

Ruel Masalta (rmasalta) wrote :

Hi,

Can anyone help me with the details of how to apply the tweak?

--- old/manager.py 2013-11-12 15:21:23.824525122 +0000
+++ new/manager.py 2013-11-12 17:11:07.952673903 +0000
@@ -2021,7 +2021,7 @@
             if bdms is None:
                 bdms = self.conductor_api.\
                         block_device_mapping_get_all_by_instance(
-                            context, instance)
+                            context, instance, False)

             # NOTE(sirp): this detach is necessary b/c we will reattach the
             # volumes in _prep_block_devices below.

Ruel Masalta (rmasalta) wrote :

I've seen a lot of "if bdms is None:" blocks in the manager.py file. Do I have to modify all of them?

Ruel Masalta (rmasalta) wrote :

I'm having a hard time applying the workaround in Icehouse. Any help would be much appreciated.

Jakub Pavlik (pavlk-jakub) wrote :

I can confirm this bug. The patch also works for the IBM SVC driver.

tags: added: customer-found
Changed in mos:
assignee: nobody → MOS Nova (mos-nova)
milestone: none → 5.1.1
importance: Undecided → High
status: New → Confirmed
Dmitry Mescheryakov (dmitrymex) wrote :

The bug is mirrored in MOS here: https://bugs.launchpad.net/mos/+bug/1367610

no longer affects: mos
tags: removed: customer-found
Dmitry Borodaenko (angdraug) wrote :

There's a related (potentially duplicate) bug #1340411.

Changed in nova:
assignee: Andres Isaac Benavides (andres-i-benavides) → Fei Long Wang (flwang)
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/juno)

Fix proposed to branch: stable/juno
Review: https://review.openstack.org/131613

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/121745
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=91d3272b975572d9866b7d959547e438142dc4fb
Submitter: Jenkins
Branch: master

commit 91d3272b975572d9866b7d959547e438142dc4fb
Author: Fei Long Wang <email address hidden>
Date: Tue Sep 16 15:43:37 2014 +1200

    Fix nova evacuate issues for RBD

    For RBD scenario, there are some issues in Nova code
    now against evacuate function:

    1. Based on current implementation, nova evacuate and
    nova rebuild are sharing some code. When user enables
    the on_shared_storage option for nova evacuate, nova
    will check if the instance path is accessible. For
    the RBD scenario, the volume(block) is shared between
    different hosts, though the path isn't shared at the
    filesystem level. This patch fixes this issue and adds
    test cases for that.

    2. Missing the 'recreate' parameter for rebuild method.
    Though the libvirt driver doesn't implement rebuild
    method(only Ironic driver implements it), but we really
    need to set 'recreate' in kwargs so it gets passed to
    _rebuild_default_impl so we don't call driver.destroy
    on evacuate for shared filesystem/block storage cases.
    It is fixed in this patch and test case is added as well.

    Closes-Bug: 1249319
    Closes-Bug: 1340411

    Change-Id: Idc8c45b055e986cf85730235d5d25777632ad1c1
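The two points in the commit message above can be sketched in a few lines. This is a simplified illustration, not the merged patch: the function names, the `images_type` parameter, and the callables passed in are assumptions made for the example.

```python
import os


def instance_on_shared_storage(instance_path, images_type):
    """Toy version of point 1: treat RBD-backed disks as shared.

    Assumption: images_type names the image backend ("rbd", "qcow2", ...).
    """
    if images_type == "rbd":
        # The disk lives in Ceph and is reachable from every host, even
        # though the instance directory is not on a shared filesystem.
        return True
    return os.path.exists(instance_path)


def rebuild_default_impl(destroy, spawn, recreate=False):
    """Toy version of point 2: skip destroy when recreating (evacuate)."""
    if not recreate:
        destroy()
    spawn()


if __name__ == "__main__":
    calls = []
    rebuild_default_impl(lambda: calls.append("destroy"),
                         lambda: calls.append("spawn"),
                         recreate=True)
    print(calls)  # only "spawn" is recorded when recreate=True
```

With `recreate` left at its default, the same call would record "destroy" before "spawn", which is the behaviour the commit message says must be avoided on evacuate for shared filesystem or block storage.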

Changed in nova:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/icehouse)

Fix proposed to branch: stable/icehouse
Review: https://review.openstack.org/131629

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/juno)

Reviewed: https://review.openstack.org/131613
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=7920cfdab2fb10e01544eeb713a1e3bc79bc4996
Submitter: Jenkins
Branch: stable/juno

commit 7920cfdab2fb10e01544eeb713a1e3bc79bc4996
Author: Fei Long Wang <email address hidden>
Date: Tue Sep 16 15:43:37 2014 +1200

    Fix nova evacuate issues for RBD

    For RBD scenario, there are some issues in Nova code
    now against evacuate function:

    1. Based on current implementation, nova evacuate and
    nova rebuild are sharing some code. When user enables
    the on_shared_storage option for nova evacuate, nova
    will check if the instance path is accessible. For
    the RBD scenario, the volume(block) is shared between
    different hosts, though the path isn't shared at the
    filesystem level. This patch fixes this issue and adds
    test cases for that.

    2. Missing the 'recreate' parameter for rebuild method.
    Though the libvirt driver doesn't implement rebuild
    method(only Ironic driver implements it), but we really
    need to set 'recreate' in kwargs so it gets passed to
    _rebuild_default_impl so we don't call driver.destroy
    on evacuate for shared filesystem/block storage cases.
    It is fixed in this patch and test case is added as well.

    Closes-Bug: 1249319
    Closes-Bug: 1340411

    Change-Id: Idc8c45b055e986cf85730235d5d25777632ad1c1
    (cherry picked from commit 91d3272b975572d9866b7d959547e438142dc4fb)

tags: added: in-stable-juno
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/icehouse)

Reviewed: https://review.openstack.org/131629
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=3de3f1066fa47312b8c3075abf790631034d67a3
Submitter: Jenkins
Branch: stable/icehouse

commit 3de3f1066fa47312b8c3075abf790631034d67a3
Author: Fei Long Wang <email address hidden>
Date: Tue Sep 16 15:43:37 2014 +1200

    Fix nova evacuate issues for RBD

    For RBD scenario, there are some issues in Nova code
    now against evacuate function:

    1. Based on current implementation, nova evacuate and
    nova rebuild are sharing some code. When user enables
    the on_shared_storage option for nova evacuate, nova
    will check if the instance path is accessible. For
    the RBD scenario, the volume(block) is shared between
    different hosts, though the path isn't shared at the
    filesystem level. This patch fixes this issue and adds
    test cases for that.

    2. Missing the 'recreate' parameter for rebuild method.
    Though the libvirt driver doesn't implement rebuild
    method(only Ironic driver implements it), but we really
    need to set 'recreate' in kwargs so it gets passed to
    _rebuild_default_impl so we don't call driver.destroy
    on evacuate for shared filesystem/block storage cases.
    It is fixed in this patch and test case is added as well.

    Closes-Bug: 1249319
    Closes-Bug: 1340411

    Conflicts:
            nova/tests/compute/test_compute_mgr.py
            nova/tests/virt/libvirt/test_libvirt.py
            nova/virt/libvirt/driver.py

    Change-Id: Idc8c45b055e986cf85730235d5d25777632ad1c1
    (cherry picked from commit 91d3272b975572d9866b7d959547e438142dc4fb)
    (cherry picked from commit 7920cfdab2fb10e01544eeb713a1e3bc79bc4996)

tags: added: in-stable-icehouse
Thierry Carrez (ttx)
Changed in nova:
milestone: none → kilo-1
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: kilo-1 → 2015.1.0