Unshelving a VM breaks instance metadata when using qcow2 backed images

Bug #1732428 reported by Kalle Happonen on 2017-11-15
This bug affects 7 people
Affects                    Importance   Assigned to
OpenStack Compute (nova)   Medium       Alexandre arents
Ocata                      Medium       Unassigned
Pike                       Medium       Unassigned

Bug Description

If you unshelve instances on compute nodes that use qcow2 backed instances, the instance image_ref will point to the original image the VM was launched from, while the base file for /var/lib/nova/instances/uuid/disk will be the snapshot which was used for shelving. This mismatch causes errors with e.g. resizes and migrations.

Steps to reproduce/what happens:
Have at least 2 compute nodes configured with the standard qcow2 backed images.

1) Launch an instance.
2) Shelve the instance. In the background this should in practice create a flattened snapshot of the VM.

3) Unshelve the instance. The instance will boot on one of the compute nodes. The /var/lib/nova/instances/uuid/disk should now have the snapshot as its base file. The instance metadata still claims that the image_ref is the original image which the VM was launched from, not the snapshot.

4) Resize/migrate the instance. /var/lib/nova/instances/uuid/disk should be copied to the other compute node. If you resize to a flavor with the same size disk, go to 5). If you resize to a flavor with a larger disk, it probably causes an error here when it tries to grow the disk.

5a) If the instance was running: When nova tries to start the VM, it will copy the original base image to the new compute node, not the snapshot base image. The instance can't boot, since it doesn't find its actual base file, and it goes to an ERROR state.

5b) If the instance was shut down: You can confirm the resize, but the VM won't start. The snapshot base file may be removed from the source machine, causing data loss.
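
The mismatch from step 3 can be checked directly on a compute node. The following is a minimal sketch, not part of the original report. It assumes the libvirt image cache names base files after the SHA-1 of the image id (worth verifying against your release), and the function names are hypothetical. It compares the backing file reported by `qemu-img info --output=json` with the file name expected for the instance's image_ref.

```python
import hashlib
import json
import os
import subprocess


def expected_base_name(image_ref):
    """File name the image cache is ASSUMED to use for an image id."""
    return hashlib.sha1(image_ref.encode("utf-8")).hexdigest()


def backing_matches_image_ref(qemu_img_info, image_ref):
    """Compare a parsed `qemu-img info --output=json` dict with image_ref.

    Returns True when the disk's backing file is the cached copy of
    image_ref, False when it is backed by something else (for example a
    shelve snapshot), and None when the disk has no backing file at all.
    """
    backing = qemu_img_info.get("backing-filename")
    if backing is None:
        return None
    return os.path.basename(backing) == expected_base_name(image_ref)


def inspect_disk(disk_path, image_ref):
    """Run qemu-img against a real disk; needs qemu-img and read access."""
    out = subprocess.check_output(
        ["qemu-img", "info", "--output=json", disk_path])
    return backing_matches_image_ref(json.loads(out), image_ref)
```

A result of False after an unshelve would be consistent with the behaviour described above. On a running instance, qemu-img may refuse to open the disk unless it is told to share the lock (the `-U` option in recent versions).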

What should have happened:
Either the instance image_ref should be updated to the snapshot image, or the snapshot image should be rebased to the original image, or it should force a raw only image after unshelve, or something else you smart people come up with.

Environment:
RDO Neutron with KVM

rpm -qa |grep nova
openstack-nova-common-14.0.6-1.el7.noarch
python2-novaclient-6.0.1-1.el7.noarch
python-nova-14.0.6-1.el7.noarch
openstack-nova-compute-14.0.6-1.el7.noarch

Also a big thank you to Toni Peltonen and Anton Aksola from nebula.fi for discovering and debugging this issue.

description: updated
Matt Riedemann (mriedem) on 2017-12-01
tags: added: shelve
Matt Riedemann (mriedem) wrote :

Nice analysis.

You're correct that when we shelve an instance, we create a snapshot image, starting in the API:

https://github.com/openstack/nova/blob/b6a245f0425a07be3871a976952646d2bdd44533/nova/compute/api.py#L3244

That snapshot image_id is passed down to the compute service to do the actual snapshot and upload from the virt driver:

https://github.com/openstack/nova/blob/b6a245f0425a07be3871a976952646d2bdd44533/nova/compute/manager.py#L4598

We then store that snapshot image_id in the instance system_metadata for later when it's unshelved:

https://github.com/openstack/nova/blob/b6a245f0425a07be3871a976952646d2bdd44533/nova/compute/manager.py#L4601

When we unshelve, we get that snapshot image from glance:

https://github.com/openstack/nova/blob/b6a245f0425a07be3871a976952646d2bdd44533/nova/conductor/manager.py#L641

We then use that to update the instance.image_ref field to point at the snapshot image:

https://github.com/openstack/nova/blob/b6a245f0425a07be3871a976952646d2bdd44533/nova/compute/manager.py#L4764

It looks like the problem is that we then reset the instance.image_ref to the old image id before we unshelved:

https://github.com/openstack/nova/blob/b6a245f0425a07be3871a976952646d2bdd44533/nova/compute/manager.py#L4796

I have no idea why we do that, and that's probably the bug.

Matt Riedemann (mriedem) wrote :

Looks like this was introduced with https://review.openstack.org/#/c/72407/ as maybe a way to pass the shelved image_id via the instance.image_ref field to the driver.spawn() method, but that doesn't make sense now.

Matt Riedemann (mriedem) wrote :

Looking at the review comments on https://review.openstack.org/#/c/72407/, it looks like this was intentional:

Nikola Dipanov
Feb 10, 2014

Patch Set 2: I would prefer that you didn't merge this

Looking at the code - I am not sure we actually want to do this.

The instance should keep its old image in the db once it has been unshelved, but it needs the new image because it will download it in the compute manager when it calls driver.spawn.

At the very least, we should put the actual image back to the instance once the unshelve is done, (so end of manager call).

Going forward, we might want to change what gets passed to spawn so that it can decide what to download. Keep in mind that right now we have the image as a block device in the db (even though we don't use it)

--

I have no idea why we should say the instance is backed by the original image ref rather than the snapshot image ref from the shelved offloaded instance, that's totally confusing and wrong IMO.

Changed in nova:
assignee: nobody → Matt Riedemann (mriedem)
status: New → Triaged
importance: Undecided → Medium
Matt Riedemann (mriedem) wrote :

Now I'm seeing this:

https://github.com/openstack/nova/blob/b6a245f0425a07be3871a976952646d2bdd44533/nova/compute/manager.py#L4797

So apparently we set the instance.image_ref back to the original image_ref because after we successfully spawn the guest during unshelve, we delete the snapshot image reference for some reason, I have no idea why.
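
Putting the comments so far together, the order of operations during unshelve can be modelled as below. This is a toy model for illustration only, not nova code: `FakeInstance`, `spawn` and `unshelve` are hypothetical stand-ins, the set of image ids stands in for glance, and the `shelved_image_id` key is assumed to be where the snapshot id is kept in system_metadata.

```python
class FakeInstance:
    """Stand-in for a nova Instance; only the fields this thread mentions."""

    def __init__(self, image_ref, shelved_image_id):
        self.image_ref = image_ref
        self.system_metadata = {"shelved_image_id": shelved_image_id}
        self.disk_backing_image = None  # what the qcow2 disk is based on


def spawn(instance):
    """Pretend driver.spawn(): the disk is built on top of image_ref."""
    instance.disk_backing_image = instance.image_ref


def unshelve(instance, glance_image_ids):
    """Replay the sequence described in the comments above."""
    original_ref = instance.image_ref
    snapshot_id = instance.system_metadata["shelved_image_id"]

    instance.image_ref = snapshot_id       # point at the shelve snapshot
    spawn(instance)                        # disk is now backed by the snapshot
    instance.image_ref = original_ref      # reset to the pre-shelve image
    glance_image_ids.discard(snapshot_id)  # snapshot image is deleted
    del instance.system_metadata["shelved_image_id"]
```

After `unshelve()` runs in this model, `image_ref` names the original image while `disk_backing_image` names a snapshot that no longer exists in the image store, which is the inconsistency the report describes for qcow2.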

Kalle Happonen (kalle-happonen) wrote :

Thanks for the activity on this Matt. I don't think there's an obvious solution with only upsides for this problem.

Changing the image ref to the original image has its upsides. It's end-user friendly, since their instance behaves like it was "shelved", not like it was snapshotted, killed and respawned. E.g. rebuilds and rescues of instances work better when they have the original image in the image_ref.

Not deleting the snapshot and having the image_ref as the snapshot might also have other negative side effects like possibly aggressively growing the glance image repo for a tenant. Then some cleanup logic would have to be added, which adds complexity and doesn't sound pretty.

If this only affects the qcow2 storage case (I'm not familiar with all storage backend options), a good compromise could be to

1) If the original image exists in glance, do a qemu-img rebase to that image. Then the VM is in the state we want.

2) If the original image can't be found, update the image-ref to the snapshot, and don't delete the snapshot.

But just my 2c from the admin side :). I'm not familiar enough with the nova codebase to know the feasibility of this.
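
The two-case compromise above can be written down as a small decision function. This is only a sketch of the proposal as stated, with hypothetical names. The `delete_snapshot` value in the first case is an assumption (it keeps the existing behaviour of removing the snapshot), since the comment does not say so explicitly.

```python
def plan_after_unshelve(original_image_id, snapshot_image_id, glance_image_ids):
    """Return the actions the two-case proposal would take after unshelve."""
    if original_image_id in glance_image_ids:
        # Case 1: the original image still exists, so rebase onto it.
        return {
            "rebase_disk_onto": original_image_id,
            "image_ref": original_image_id,
            "delete_snapshot": True,  # assumption, see lead-in
        }
    # Case 2: the original image is gone, so keep and reference the snapshot.
    return {
        "rebase_disk_onto": None,
        "image_ref": snapshot_image_id,
        "delete_snapshot": False,
    }
```

In both branches the image_ref and the disk's backing file end up describing the same image, which is the property the later steps (resize, migrate) depend on.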

Fix proposed to branch: master
Review: https://review.openstack.org/524726

Changed in nova:
status: Triaged → In Progress
Sun Mengyun (kmehxhcr) wrote :

I reproduced this bug and found that the path of the base file is stored in disk_info, which is used in the finish_migration function. So maybe we can modify this parameter to rebase to the original image.

Matt Riedemann (mriedem) wrote :

@Sun, unshelve does not call finish_migration() so I'm not sure what you're referring to. I guess because of the relationship between resizing a server *after* unshelving it?

Matt Riedemann (mriedem) wrote :

Removing myself as I'm not actively working on this.

Changed in nova:
assignee: Matt Riedemann (mriedem) → nobody
status: In Progress → Confirmed
Matt Riedemann (mriedem) wrote :

Just thought of this, but nova does store the originally requested image_id in the request_specs table in the nova_api database - so if we needed to find that for whatever reason, we could. And then leave the instance.image_ref pointing at the actual current image that the instance is using (which could be a shelved snapshot).
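
As an illustration of that idea, the sketch below pulls the requested image id out of a serialized request spec. It assumes the spec is stored as JSON in the versioned-object layout, with fields nested under `nova_object.data` and the image id under the image object's own `nova_object.data`. That layout is an assumption to check against a real row, and `requested_image_id` is a hypothetical helper, not a nova API.

```python
import json


def requested_image_id(spec_json):
    """Return the image id from a serialized request spec, or None."""
    spec = json.loads(spec_json)
    data = spec.get("nova_object.data", {})
    # The image field may be absent or null (e.g. no image was requested).
    image = data.get("image") or {}
    return image.get("nova_object.data", {}).get("id")
```

With something like this available, instance.image_ref could track the image actually backing the disk, and the originally requested image would still be recoverable when needed.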

Matt Riedemann (mriedem) wrote :

More thoughts in the ML on my idea from comment 11:

http://lists.openstack.org/pipermail/openstack-dev/2018-September/134855.html

Change abandoned by Matt Riedemann (<email address hidden>) on branch: master
Review: https://review.openstack.org/524726
Reason: I'm not pursuing this.

Fix proposed to branch: master
Review: https://review.opendev.org/696084

Changed in nova:
assignee: nobody → Alexandre arents (aarents)
status: Confirmed → In Progress

Reviewed: https://review.opendev.org/743537
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=ce2203456660083119cdbb7e73c1ad15e6e0a074
Submitter: Zuul
Branch: master

commit ce2203456660083119cdbb7e73c1ad15e6e0a074
Author: Alexandre Arents <email address hidden>
Date: Tue Nov 26 10:26:32 2019 +0000

    Make _rebase_with_qemu_img() generic

    Move volume_delete related logic away from this method, in order to make
    it generic and usable elsewhere.

    Change-Id: I17357d85f845d4160cb7c7784772530a1e92af76
    Related-Bug: #1732428

Reviewed: https://review.opendev.org/696084
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=8953a689467f8c3e996086392251de67953a45ba
Submitter: Zuul
Branch: master

commit 8953a689467f8c3e996086392251de67953a45ba
Author: Alexandre Arents <email address hidden>
Date: Tue Nov 26 10:26:32 2019 +0000

    Rebase qcow2 images when unshelving an instance

    During unshelve, the instance is spawned with the image created by
    shelve, which is deleted just after; instance.image_ref still points
    to the original instance build image.

    In a qcow2 environment, this is an issue because the instance backing
    file no longer matches instance.image_ref, and during
    live-migration/resize the target host will fetch the image
    corresponding to instance.image_ref, resulting in instance corruption.

    This change fetches the original image and rebases the instance disk
    on it. This avoids the image_ref mismatch and brings back the storage
    benefit of keeping a common image in cache.

    If the original image is no longer available in glance, the backing
    file is merged into the disk (flatten), ensuring instance integrity
    during the next live-migration/resize operation.

    Change-Id: I1a33fadf0b7439cf06c06cba2bc06df6cef0945b
    Closes-Bug: #1732428
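
The two outcomes in the commit message map onto two forms of `qemu-img rebase`. The sketch below is not the code from the commit. `rebase_command` is a hypothetical helper that only builds the argument list, and the default backing format of raw is an assumption about how the cached base image is stored.

```python
def rebase_command(disk_path, backing_path=None, backing_format="raw"):
    """Build the qemu-img call for the two cases in the commit message.

    With backing_path, the disk is re-pointed at that backing file.
    Without it, an empty backing file name asks qemu-img to merge the
    current backing file into the disk, leaving it standalone (flatten).
    """
    if backing_path:
        return ["qemu-img", "rebase", "-b", backing_path,
                "-F", backing_format, disk_path]
    return ["qemu-img", "rebase", "-b", "", disk_path]
```

The returned list can be passed to `subprocess.check_call()`. In its default (safe) mode, rebase reads the old backing file and copies any differing data into the disk, so the old backing file must still be present when the command runs.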

Changed in nova:
status: In Progress → Fix Released

Related fix proposed to branch: master
Review: https://review.opendev.org/749205

Change abandoned by yatin (<email address hidden>) on branch: master
Review: https://review.opendev.org/749035
Reason: In favor of https://review.opendev.org/#/c/749205/

Reviewed: https://review.opendev.org/749205
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=fe52b6c25bebdd1b459c7a59fbb8d9f6de200c9d
Submitter: Zuul
Branch: master

commit fe52b6c25bebdd1b459c7a59fbb8d9f6de200c9d
Author: Alexandre Arents <email address hidden>
Date: Tue Sep 1 08:26:25 2020 +0000

    Update image_base_image_ref during rebuild.

    In several places we assume system_metadata.image_base_image_ref
    exists, because it is set during instance creation in the method
    _populate_instance_for_create.

    But once an instance is rebuilt, all system_metadata image properties
    are dropped and replaced by the new image properties, without setting
    image_base_image_ref back.

    This change proposes to set image_base_image_ref during rebuild.

    In the specific case of shelve/unshelve with the qcow2 backend,
    image_base_image_ref is used to rebase the disk image, so we ensure this
    property is set, as the instance may have been rebuilt before the fix
    was applied.

    Related-Bug: #1732428
    Closes-Bug: #1893618
    Change-Id: Ia3031ea1f7db8b398f02d2080ca603ded8970200
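
The metadata handling described in this commit message can be modelled with a plain dict. This is a simplified sketch with a hypothetical function name, not the nova implementation. It assumes image properties live in system_metadata under an `image_` prefix, as the message implies.

```python
def rebuild_system_metadata(system_metadata, new_image_id, new_image_props):
    """Model the image-related system_metadata update on rebuild."""
    # Drop every image_* key inherited from the previous image.
    result = {key: value for key, value in system_metadata.items()
              if not key.startswith("image_")}
    # Add the new image's properties under the image_ prefix.
    for key, value in new_image_props.items():
        result["image_" + key] = value
    # The step the fix adds: record the new base image again.
    result["image_base_image_ref"] = new_image_id
    return result
```

Without the last assignment, a rebuilt instance would have no image_base_image_ref, and a later unshelve would have nothing to rebase onto.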
