Live block migration fails for instances whose glance images have been deleted

Bug #1270825 reported by Loganathan Parthipan
This bug affects 20 people
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: High
Assigned to: melanie witt

Bug Description

Once the glance image from which an instance was spawned is deleted, the instance can no longer be block migrated.

To recreate:

1. Boot an instance off a public image or snapshot
2. Delete the image from glance
3. Live block migrate this instance. It fails at the pre-live-migration stage because the image cannot be downloaded.
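
The three steps above can be sketched as command lines. This is only an illustration: it assumes the legacy `nova` and `glance` command-line clients, and the image ID, flavor, and server name are placeholders rather than values from this report. The helper just builds the argument lists; it does not run anything.

```python
def reproduction_commands(image_id, flavor, server_name, dest_host=None):
    """Return the CLI invocations for the three reproduction steps, in order.

    All arguments are placeholders; nothing is executed here.
    """
    # Step 1: boot an instance from the image.
    boot = ["nova", "boot", "--image", image_id, "--flavor", flavor, server_name]
    # Step 2: delete the image from glance once the instance is running.
    delete_image = ["glance", "image-delete", image_id]
    # Step 3: request a live block migration (destination host is optional).
    migrate = ["nova", "live-migration", "--block-migrate", server_name]
    if dest_host:
        migrate.append(dest_host)
    return [boot, delete_image, migrate]


if __name__ == "__main__":
    for cmd in reproduction_commands("IMAGE_ID", "m1.small", "test-vm"):
        print(" ".join(cmd))
```

The instance should be ACTIVE before the image is deleted, so the steps have to be run one at a time rather than as a batch.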

Tags: libvirt
Joe Gordon (jogo) wrote :

This makes sense as we pull down the base images from glance. We could fix this by pulling the image from the other machine, but this sounds like more of a feature request than a bug.

Changed in nova:
importance: Undecided → Critical
importance: Critical → Wishlist
status: New → Confirmed
Loganathan Parthipan (parthipan) wrote :

It should be possible, in theory, to handle the ImageNotFound exception from the glance client by rebasing the disk so that it has no backing file on the destination.

source node:
disk (qcow2) -> _base/backing_file

destination node:
disk (qcow2) - But no backing file

This would ensure that this particular instance keeps running and can be migrated again.

Not having a backing file in _base is acceptable in this case, since no other instance is going to be booted off this root disk again.
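
As a rough sketch of the idea, the snippet below reads the backing file of a qcow2 disk from the JSON printed by `qemu-img info --output=json` and builds the `qemu-img rebase -b ""` command that makes an image independent of its backing file. The function names are made up for illustration, and `qemu-img` should only modify an image that is not in use by a running guest, so this shows the mechanism rather than a live-migration procedure.

```python
import json


def backing_file_of(qemu_img_info_json):
    """Return the backing file recorded in `qemu-img info --output=json`
    output, or None when the image is standalone."""
    info = json.loads(qemu_img_info_json)
    return info.get("backing-filename")


def flatten_command(disk_path):
    """Build the command that rebases a qcow2 image onto no backing file.

    An empty backing file name tells qemu-img to copy the needed data into
    the image so that it no longer depends on the backing file.
    """
    return ["qemu-img", "rebase", "-b", "", disk_path]


if __name__ == "__main__":
    sample = '{"format": "qcow2", "backing-filename": "/path/to/_base/image"}'
    print(backing_file_of(sample))
    print(flatten_command("disk"))
```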

tags: added: libvirt
Loganathan Parthipan (parthipan) wrote :

A possible design to elaborate comment #2.

1. The source calls pre_live_migration on the destination.
2. The glance fetch returns ImageNotFound on the destination.
3. Handle the exception and create the disk, but keep it a single-layer qcow2 (i.e. without a backing file).
4. Return the call to the source.

5. Initiate live migration.

Now in step 5, an incremental block copy would not work since the destination disk file would look like the source overlay file. We need to tell libvirt/kvm to do a deep copy. I don't know if this is possible with the current libvirt API. However, since qemu-img lets you rebase to any other backing file or flatten, I believe the mechanism exists.
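
Step 3 of this design comes down to how the destination disk is created. The sketch below builds the two `qemu-img create` variants: an overlay with a backing file (the normal case) and a standalone single-layer qcow2 (the proposed fallback). The function name, the default `backing_fmt`, and the example paths are assumptions for illustration, not nova's actual code.

```python
def create_disk_command(disk_path, virtual_size, backing_file=None, backing_fmt="raw"):
    """Build a `qemu-img create` command for a qcow2 disk.

    With backing_file set, the disk is an overlay on top of that file.
    Without it, the disk is a standalone single-layer qcow2.
    Paths containing commas are not handled by this sketch.
    """
    cmd = ["qemu-img", "create", "-f", "qcow2"]
    if backing_file is not None:
        cmd += ["-o", f"backing_file={backing_file},backing_fmt={backing_fmt}"]
    cmd += [disk_path, virtual_size]
    return cmd


if __name__ == "__main__":
    # Normal case: overlay on a cached base image.
    print(create_disk_command("disk", "10G", backing_file="/path/to/_base/image"))
    # Fallback when the base image is unavailable: no backing file.
    print(create_disk_command("disk", "10G"))
```

A standalone destination disk only helps if the migration then copies the full disk contents rather than just the overlay, which is the open question raised in step 5.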

Mehdi Abaakouk (sileht)
Changed in nova:
assignee: nobody → Mehdi Abaakouk (sileht)
Mehdi Abaakouk (sileht) wrote :

@parthipan, unfortunately this solution won't work for instances booted from an AMI image that depends on AKI and ARI images: libvirt/kvm copies only the disks passed to kvm as disks, not the files passed via -kernel or -initrd.

I will try to submit something that uses the same logic as the migrate_disk_and_power_off code.
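
To make the distinction concrete, here is a small sketch that splits the files referenced by a libvirt domain XML into disk sources, which a block migration copies, and the kernel/initrd files, which it does not. It relies on the standard `<os><kernel>`, `<os><initrd>`, and `<disk><source file=...>` elements of the domain XML; the function name, the result keys, and the sample paths are invented for the example.

```python
import xml.etree.ElementTree as ET


def files_by_migration_coverage(domain_xml):
    """Split files referenced by a libvirt domain into disks and boot files."""
    root = ET.fromstring(domain_xml)
    # File-backed disks attached to the guest.
    disks = [
        src.get("file")
        for src in root.findall("./devices/disk/source")
        if src.get("file")
    ]
    # Kernel and initrd are direct-boot files, not guest disks.
    boot_files = [
        el.text.strip()
        for tag in ("kernel", "initrd")
        for el in root.findall(f"./os/{tag}")
        if el.text
    ]
    return {
        "copied_by_block_migration": disks,
        "must_be_copied_separately": boot_files,
    }


if __name__ == "__main__":
    sample = """
    <domain type='kvm'>
      <os>
        <kernel>/instances/example/kernel</kernel>
        <initrd>/instances/example/ramdisk</initrd>
      </os>
      <devices>
        <disk type='file' device='disk'>
          <source file='/instances/example/disk'/>
        </disk>
      </devices>
    </domain>
    """
    print(files_by_migration_coverage(sample))
```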

Openstack Gerrit (openstack-gerrit) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/90321

Changed in nova:
status: Confirmed → In Progress
George Shuklin (george-shuklin) wrote :

I'd like to note that non-live migration also fails because of this bug (so it is not a wishlist item!), but in a different way: the instance fails to start after migration. It can be worked around with reset-state followed by stop/start.

Thomas Herve (therve)
tags: added: icehouse-backport-potential
Warren Wang (w-emailme) wrote :

Wouldn't this be easier to resolve if a new image status was available in Glance? Something like an archive mode where new images may not be launched from the image, but the image may still be used as a reference for existing instances to do things like migrate, snapshot, etc. Private does not work, and obviously neither does delete.

melanie witt (melwitt) wrote :

Based on the comments in the proposed patch and the fact that this bug affects 11 people, I'm changing this from Wishlist to High to aid in tracking.

Changed in nova:
importance: Wishlist → High
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Sean Dague (<email address hidden>) on branch: master
Review: https://review.openstack.org/90321
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Changed in nova:
assignee: Mehdi Abaakouk (sileht) → melanie witt (melwitt)
Davanum Srinivas (DIMS) (dims-v) wrote :

Looks like Melanie picked this back up

Davanum Srinivas (DIMS) (dims-v) wrote :

Fix Proposed again.

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/90321
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=3345ca029fd2527eec8de365a37779fd37809398
Submitter: Jenkins
Branch: master

commit 3345ca029fd2527eec8de365a37779fd37809398
Author: Mehdi Abaakouk <email address hidden>
Date: Fri Apr 25 10:39:48 2014 +0200

    libvirt: Fix migration when image doesn't exist

    When nova is used without shared_storage and glance doesn't have the
    images required by the instance anymore, the live block migration code
    can't prepare the destination host properly.

    This patch catches the case when images are not found in glance, and
    copies the missing images from the source host, like the
    migrate_disk_and_power_off code already does.

    The KVM disk deep copy method is not used because it won't work for an AMI
    image that depends on AKI/ARI images. The kernel and initrd are not
    considered by the KVM disk migration because they are read-only files
    (from KVM's point of view) used only to boot the VM.

    Co-Authored-By: Sahid Orentino Ferdjaoui <email address hidden>
    Closes-bug: #1270825

    Change-Id: If81f8b1bbe3e738579ffe2d8f36807afb77560d8
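
The fallback described in the commit message can be sketched roughly as follows. This is not the merged nova code: `ImageNotFound` is a local stand-in for the exception raised when glance no longer has the image, and `fetch_from_glance` and `copy_from_source` are hypothetical callables supplied by the caller.

```python
class ImageNotFound(Exception):
    """Stand-in for the error raised when glance no longer has the image."""


def ensure_image_on_destination(image_id, target_path, fetch_from_glance, copy_from_source):
    """Place an image file at target_path on the destination host.

    Try glance first. If the image was deleted there, fall back to copying
    the file from the source host. Returns where the file came from.
    """
    try:
        fetch_from_glance(image_id, target_path)
        return "glance"
    except ImageNotFound:
        copy_from_source(target_path)
        return "source-host"


if __name__ == "__main__":
    def deleted_in_glance(image_id, target_path):
        raise ImageNotFound(image_id)

    copied = []
    origin = ensure_image_on_destination(
        "IMAGE_ID", "/path/to/_base/image", deleted_in_glance, copied.append
    )
    print(origin, copied)
```

Because the same fallback applies to every file the instance needs, it also covers the kernel and ramdisk of AMI-style images, which a disk-only deep copy would miss.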

Changed in nova:
status: In Progress → Fix Committed
Yaguang Tang (heut2008)
tags: removed: icehouse-backport-potential
Thierry Carrez (ttx)
Changed in nova:
milestone: none → kilo-3
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: kilo-3 → 2015.1.0
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/juno)

Fix proposed to branch: stable/juno
Review: https://review.openstack.org/189923

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/juno)

Change abandoned by Artom Lifshitz (<email address hidden>) on branch: stable/juno
Review: https://review.openstack.org/189923
