Possible data loss from createImage action

Bug #1852106 reported by Brian Rosmaita on 2019-11-11
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Glance
Undecided
Unassigned
OpenStack Compute (nova)
High
Brian Rosmaita

Bug Description

Description
===========

When an instance is booted from a image created from an encrypted Cinder volume and several images are created from that instance, deleting any one of these images will render the remaining images unusable.

The scenario
============

1. User creates a volume V-1 of an encrypted type in Cinder; Cinder automatically stores the encryption key in Barbican with key_id c-1.

2. User uploads this volume as an image to Glance as I-1. Cinder stores the encryption key in Barbican with key_id c-2 and puts the metadata
  cinder_encryption_key_id: c-2
on the image.

The idea is to preserve a 1-1 relation between key_ids and resources so that when a resource is deleted, its key in Barbican can also be deleted with no potential for data loss. This prevents cruft from accumulating in the user's Barbican account. In Train, this deletion-from-Barbican process has been automated; if the metadata
  cinder_encryption_key_deletion_policy: on_image_deletion
is present on the image, Glance will delete the key from Barbican when the image is deleted. Beginning with Train, Cinder puts the deletion_policy metadata on all volumes uploaded as images to Glance.

3. User boots an instance from image I-1. Nova will store all the image metadata from the image.

4. User does the createImage action on the instance. Nova creates an image I-2 and copies over the image metadata, putting
  cinder_encryption_key_id: c-2
  cinder_encryption_key_deletion_policy: on_image_deletion
on the image.

5. If the user deletes I-2, key c-2 will be deleted from Barbican, thereby rendering image I-1 unusable. Similarly, if the user has created a bunch of images from the instance, deleting one of them will render all the remainder useless.

NOTE: if the user creates a volume from image I-1, Cinder will create a new Barbican secret for the resulting volume. So deleting I-1 (and hence key c-2) won't affect the usability of any volumes created from it.

This bug has been around for a while, but it required the user to manually delete the Barbican secret that multiple images depend on. The Cinder/Glance change in Train to automate the process makes this scenario much more likely to happen.

The immediate workaround is to add 'cinder_encryption_key_deletion_policy' to the non_inheritable_image_properties list in nova.conf.

The long term solution is for Nova to clone the encryption key in Barbican so that Nova always puts a unique key_id on the created image.

Matt Riedemann (mriedem) on 2019-11-11
tags: added: barbican encryption snapshot volumes
Lee Yarwood (lyarwood) wrote :

Okay, so while this is a valid bug I don't think the overall use case has ever worked.

cinder_encryption_key_id isn't used anywhere within openstack/nova [1]. For the libvirt driver at least we only `support` ephemeral encryption with the LVM imagebackend. There we use instance.ephemeral_key_uuid to track the key used to encrypt and then decrypt the encrypted disk(s). AFAICT this a fresh key created each time we launch an instance if CONF.ephemeral_storage_encryption.enabled is True [2], at no time do we lookup and/or use cinder_encryption_key_id from the image metadata.

As a result I can't see how we could ever launch (and importantly allow the guest OS to boot) an instance using an encrypted image that itself was created from an encrypted volume.

Do you know if this has ever been tested and if so where/how?

[1] http://codesearch.openstack.org/?q=cinder_encryption_key_id&i=nope&files=&repos=
[2] https://github.com/openstack/nova/blob/7aa88029bbf6311033457c32801963da01e88ecb/nova/compute/api.py#L1808-L1820

Brian Rosmaita (brian-rosmaita) wrote :

I haven't actually reproduced this, it came up when I was discussing some new tempest tests with the Glance team at the PTG.

I'll see if I can find someplace to test this scenario. But if Step #3 cannot occur, that certainly reduces the urgency of this bug!

Brian Rosmaita (brian-rosmaita) wrote :

I talked to Eric Harney, who did the original implementation of Cinder volume encryption, and he said that there was no support added to Nova to boot this kind of image (as it was not part of the intended use case).

He thinks the instance may go to ACTIVE status but be unusable. In such a case, what would happen if the createImage action were called on the instance? (Would be a pretty dumb thing to do, since you'd presumably get a snapshot that wasn't bootable, but this bug would happen when you deleted the unusable snapshot.)

Brian Rosmaita (brian-rosmaita) wrote :

Benny Kopilov at Red Hat was able to reproduce this in Train.

1. image1 is created by the Cinder upload-volume-to-image action.

2. instance1 is created from image1. instance1 is ACTIVE, but unusable (no OS inside).

3. image2 is created by the 'nova image-create' action on the instance. This image contains the same values for cinder_encryption_key_id and cinder_encryption_key_deletion_policy as image1.

4. deleting image2 will result in deleting the key for image1

Lee Yarwood (lyarwood) wrote :

So this is a Glance bug at this point, Nova has no concept of cinder_encryption_key_id and that's why the instance booted at step 2 isn't able to load an OS from the encrypted image.

Moving the bug to Glance so they can decide how to handle delete requests where the same cinder_encryption_key_id is referenced by multiple images.

affects: nova → glance
Brian Rosmaita (brian-rosmaita) wrote :

Added Nova to the bug. Quickest fix will be to modify the Nova non_inheritable_image_properties setting so that the problematic metadata are not inherited. Operators can make such a change without upgrading.

Glance fix will take longer and require an upgrade. (I'm not saying we shouldn't do it, just that it would be good to fix a possible data loss bug as fast as possible.)

Brian Rosmaita (brian-rosmaita) wrote :

Patch for the quick nova change:
https://review.opendev.org/#/c/706298/

Fix proposed to branch: master
Review: https://review.opendev.org/707738

Changed in nova:
assignee: nobody → Brian Rosmaita (brian-rosmaita)
status: New → In Progress

Change abandoned by Brian Rosmaita (<email address hidden>) on branch: master
Review: https://review.opendev.org/706298
Reason: Superseded by https://review.opendev.org/#/c/707738/

Reviewed: https://review.opendev.org/707738
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=963fd8c0f956bdf5c6c8987aa5d9f836386fd5ed
Submitter: Zuul
Branch: master

commit 963fd8c0f956bdf5c6c8987aa5d9f836386fd5ed
Author: Brian Rosmaita <email address hidden>
Date: Thu Feb 13 11:09:08 2020 -0500

    Reject boot request for unsupported images

    Nova has never supported direct booting of an image of an encrypted
    volume uploaded to Glance via the Cinder upload-volume-to-image
    process, but instead of rejecting such a request, an 'active' but
    unusable instance is created. This patch allows Nova to use image
    metadata to detect such an image and reject the boot request.

    Change-Id: Idf84ccff254d26fa13473fe9741ddac21cbcf321
    Related-bug: #1852106
    Closes-bug: #1863611

Reviewed: https://review.opendev.org/708126
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=bc290840127c3179227a662584404f9c0178d588
Submitter: Zuul
Branch: master

commit bc290840127c3179227a662584404f9c0178d588
Author: Brian Rosmaita <email address hidden>
Date: Thu Feb 13 11:09:08 2020 -0500

    Absolutely-non-inheritable image properties

    Inheritance of image properties from the image an instance was booted
    from to an image created from that instance is governed by the
    non_inheritable_image_properties configuration option. However, there
    are some image properties (for example, those used for image signature
    validation or to reference a cinder encryption key id) which it makes
    no sense to inherit under any circumstances. Additionally,
    misconfiguration of the non-inheritable properties can lead to data
    loss under the circumstances described in Bug #1852106. So it would
    be better if these properties were not subject to configuration.

    The initial set of absolutely non-inheritable image properties
    consists of those associated with cinder encryption keys and image
    signature validation.

    Change-Id: I4332b9c343b6c2b50226baa8f78396c2012dabd1
    Closes-bug: #1852106

Changed in nova:
status: In Progress → Fix Released
Changed in nova:
importance: Undecided → High
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers