Possible data loss from createImage action

Bug #1852106 reported by Brian Rosmaita
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Glance
Triaged
Undecided
Unassigned
OpenStack Compute (nova)
Fix Released
High
Brian Rosmaita
Queens
In Progress
High
Lee Yarwood
Rocky
In Progress
High
Lee Yarwood
Stein
In Progress
High
Lee Yarwood
Train
In Progress
High
Lee Yarwood

Bug Description

Description
===========

When an instance is booted from a image created from an encrypted Cinder volume and several images are created from that instance, deleting any one of these images will render the remaining images unusable.

The scenario
============

1. User creates a volume V-1 of an encrypted type in Cinder; Cinder automatically stores the encryption key in Barbican with key_id c-1.

2. User uploads this volume as an image to Glance as I-1. Cinder stores the encryption key in Barbican with key_id c-2 and puts the metadata
  cinder_encryption_key_id: c-2
on the image.

The idea is to preserve a 1-1 relation between key_ids and resources so that when a resource is deleted, its key in Barbican can also be deleted with no potential for data loss. This prevents cruft from accumulating in the user's Barbican account. In Train, this deletion-from-Barbican process has been automated; if the metadata
  cinder_encryption_key_deletion_policy: on_image_deletion
is present on the image, Glance will delete the key from Barbican when the image is deleted. Beginning with Train, Cinder puts the deletion_policy metadata on all volumes uploaded as images to Glance.

3. User boots an instance from image I-1. Nova will store all the image metadata from the image.

4. User does the createImage action on the instance. Nova creates an image I-2 and copies over the image metadata, putting
  cinder_encryption_key_id: c-2
  cinder_encryption_key_deletion_policy: on_image_deletion
on the image.

5. If the user deletes I-2, key c-2 will be deleted from Barbican, thereby rendering image I-1 unusable. Similarly, if the user has created a bunch of images from the instance, deleting one of them will render all the remainder useless.

NOTE: if the user creates a volume from image I-1, Cinder will create a new Barbican secret for the resulting volume. So deleting I-1 (and hence key c-2) won't affect the usability of any volumes created from it.

This bug has been around for a while, but it required the user to manually delete the Barbican secret that multiple images depend on. The Cinder/Glance change in Train to automate the process makes this scenario much more likely to happen.

The immediate workaround is to add 'cinder_encryption_key_deletion_policy' to the non_inheritable_image_properties list in nova.conf.

The long term solution is for Nova to clone the encryption key in Barbican so that Nova always puts a unique key_id on the created image.

Matt Riedemann (mriedem)
tags: added: barbican encryption snapshot volumes
Revision history for this message
Lee Yarwood (lyarwood) wrote :

Okay, so while this is a valid bug I don't think the overall use case has ever worked.

cinder_encryption_key_id isn't used anywhere within openstack/nova [1]. For the libvirt driver at least we only `support` ephemeral encryption with the LVM imagebackend. There we use instance.ephemeral_key_uuid to track the key used to encrypt and then decrypt the encrypted disk(s). AFAICT this a fresh key created each time we launch an instance if CONF.ephemeral_storage_encryption.enabled is True [2], at no time do we lookup and/or use cinder_encryption_key_id from the image metadata.

As a result I can't see how we could ever launch (and importantly allow the guest OS to boot) an instance using an encrypted image that itself was created from an encrypted volume.

Do you know if this has ever been tested and if so where/how?

[1] http://codesearch.openstack.org/?q=cinder_encryption_key_id&i=nope&files=&repos=
[2] https://github.com/openstack/nova/blob/7aa88029bbf6311033457c32801963da01e88ecb/nova/compute/api.py#L1808-L1820

Revision history for this message
Brian Rosmaita (brian-rosmaita) wrote :

I haven't actually reproduced this, it came up when I was discussing some new tempest tests with the Glance team at the PTG.

I'll see if I can find someplace to test this scenario. But if Step #3 cannot occur, that certainly reduces the urgency of this bug!

Revision history for this message
Brian Rosmaita (brian-rosmaita) wrote :

I talked to Eric Harney, who did the original implementation of Cinder volume encryption, and he said that there was no support added to Nova to boot this kind of image (as it was not part of the intended use case).

He thinks the instance may go to ACTIVE status but be unusable. In such a case, what would happen if the createImage action were called on the instance? (Would be a pretty dumb thing to do, since you'd presumably get a snapshot that wasn't bootable, but this bug would happen when you deleted the unusable snapshot.)

Revision history for this message
Brian Rosmaita (brian-rosmaita) wrote :

Benny Kopilov at Red Hat was able to reproduce this in Train.

1. image1 is created by the Cinder upload-volume-to-image action.

2. instance1 is created from image1. instance1 is ACTIVE, but unusable (no OS inside).

3. image2 is created by the 'nova image-create' action on the instance. This image contains the same values for cinder_encryption_key_id and cinder_encryption_key_deletion_policy as image1.

4. deleting image2 will result in deleting the key for image1

Revision history for this message
Lee Yarwood (lyarwood) wrote :

So this is a Glance bug at this point, Nova has no concept of cinder_encryption_key_id and that's why the instance booted at step 2 isn't able to load an OS from the encrypted image.

Moving the bug to Glance so they can decide how to handle delete requests where the same cinder_encryption_key_id is referenced by multiple images.

affects: nova → glance
Revision history for this message
Brian Rosmaita (brian-rosmaita) wrote :

Added Nova to the bug. Quickest fix will be to modify the Nova non_inheritable_image_properties setting so that the problematic metadata are not inherited. Operators can make such a change without upgrading.

Glance fix will take longer and require an upgrade. (I'm not saying we shouldn't do it, just that it would be good to fix a possible data loss bug as fast as possible.)

Revision history for this message
Brian Rosmaita (brian-rosmaita) wrote :

Patch for the quick nova change:
https://review.opendev.org/#/c/706298/

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/707738

Changed in nova:
assignee: nobody → Brian Rosmaita (brian-rosmaita)
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Brian Rosmaita (<email address hidden>) on branch: master
Review: https://review.opendev.org/706298
Reason: Superseded by https://review.opendev.org/#/c/707738/

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/708126

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.opendev.org/707738
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=963fd8c0f956bdf5c6c8987aa5d9f836386fd5ed
Submitter: Zuul
Branch: master

commit 963fd8c0f956bdf5c6c8987aa5d9f836386fd5ed
Author: Brian Rosmaita <email address hidden>
Date: Thu Feb 13 11:09:08 2020 -0500

    Reject boot request for unsupported images

    Nova has never supported direct booting of an image of an encrypted
    volume uploaded to Glance via the Cinder upload-volume-to-image
    process, but instead of rejecting such a request, an 'active' but
    unusable instance is created. This patch allows Nova to use image
    metadata to detect such an image and reject the boot request.

    Change-Id: Idf84ccff254d26fa13473fe9741ddac21cbcf321
    Related-bug: #1852106
    Closes-bug: #1863611

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/train)

Related fix proposed to branch: stable/train
Review: https://review.opendev.org/708577

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/stein)

Related fix proposed to branch: stable/stein
Review: https://review.opendev.org/708578

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/rocky)

Related fix proposed to branch: stable/rocky
Review: https://review.opendev.org/708608

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/queens)

Related fix proposed to branch: stable/queens
Review: https://review.opendev.org/708609

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/708126
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=bc290840127c3179227a662584404f9c0178d588
Submitter: Zuul
Branch: master

commit bc290840127c3179227a662584404f9c0178d588
Author: Brian Rosmaita <email address hidden>
Date: Thu Feb 13 11:09:08 2020 -0500

    Absolutely-non-inheritable image properties

    Inheritance of image properties from the image an instance was booted
    from to an image created from that instance is governed by the
    non_inheritable_image_properties configuration option. However, there
    are some image properties (for example, those used for image signature
    validation or to reference a cinder encryption key id) which it makes
    no sense to inherit under any circumstances. Additionally,
    misconfiguration of the non-inheritable properties can lead to data
    loss under the circumstances described in Bug #1852106. So it would
    be better if these properties were not subject to configuration.

    The initial set of absolutely non-inheritable image properties
    consists of those associated with cinder encryption keys and image
    signature validation.

    Change-Id: I4332b9c343b6c2b50226baa8f78396c2012dabd1
    Closes-bug: #1852106

Changed in nova:
status: In Progress → Fix Released
melanie witt (melwitt)
Changed in nova:
importance: Undecided → High
Revision history for this message
Brian Rosmaita (brian-rosmaita) wrote :

The nova fixes make this less urgent on the Glance side. Glance team consensus is that we should wait for the Barbican Secret Consumer API [0] to be implemented [1] before working on this in order to come up with a more thorough solution.

[0] https://specs.openstack.org/openstack/barbican-specs/specs/train/secret-consumers.html
[1] https://storyboard.openstack.org/#!/story/2005770

Changed in glance:
status: New → Triaged
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/train)

Reviewed: https://review.opendev.org/708577
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=240d0309023fcaf20df44f819e9b3e14af97f526
Submitter: Zuul
Branch: stable/train

commit 240d0309023fcaf20df44f819e9b3e14af97f526
Author: Brian Rosmaita <email address hidden>
Date: Thu Feb 13 11:09:08 2020 -0500

    Reject boot request for unsupported images

    Nova has never supported direct booting of an image of an encrypted
    volume uploaded to Glance via the Cinder upload-volume-to-image
    process, but instead of rejecting such a request, an 'active' but
    unusable instance is created. This patch allows Nova to use image
    metadata to detect such an image and reject the boot request.

    Change-Id: Idf84ccff254d26fa13473fe9741ddac21cbcf321
    Related-bug: #1852106
    Closes-bug: #1863611
    (cherry picked from commit 963fd8c0f956bdf5c6c8987aa5d9f836386fd5ed)

tags: added: in-stable-train
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/stein)

Reviewed: https://review.opendev.org/708578
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=6e71909b0b8ad8d20d0863e714cad2a7b120f658
Submitter: Zuul
Branch: stable/stein

commit 6e71909b0b8ad8d20d0863e714cad2a7b120f658
Author: Brian Rosmaita <email address hidden>
Date: Thu Feb 13 11:09:08 2020 -0500

    Reject boot request for unsupported images

    Nova has never supported direct booting of an image of an encrypted
    volume uploaded to Glance via the Cinder upload-volume-to-image
    process, but instead of rejecting such a request, an 'active' but
    unusable instance is created. This patch allows Nova to use image
    metadata to detect such an image and reject the boot request.

    Change-Id: Idf84ccff254d26fa13473fe9741ddac21cbcf321
    Related-bug: #1852106
    Closes-bug: #1863611
    (cherry picked from commit 963fd8c0f956bdf5c6c8987aa5d9f836386fd5ed)
    (cherry picked from commit 240d0309023fcaf20df44f819e9b3e14af97f526)

tags: added: in-stable-stein
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/rocky)

Reviewed: https://review.opendev.org/708608
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=02551d6d6a3baee49309cffbc3bef2508fcafe72
Submitter: Zuul
Branch: stable/rocky

commit 02551d6d6a3baee49309cffbc3bef2508fcafe72
Author: Brian Rosmaita <email address hidden>
Date: Thu Feb 13 11:09:08 2020 -0500

    Reject boot request for unsupported images

    Nova has never supported direct booting of an image of an encrypted
    volume uploaded to Glance via the Cinder upload-volume-to-image
    process, but instead of rejecting such a request, an 'active' but
    unusable instance is created. This patch allows Nova to use image
    metadata to detect such an image and reject the boot request.

    Conflicts:
            nova/compute/api.py

    NOTE(lyarwood): conflict as I06fad233006c7bab14749a51ffa226c3801f951b
    ("Add support for translating CPU policy extra specs, image meta") is
    not present in stable/rocky.

    Change-Id: Idf84ccff254d26fa13473fe9741ddac21cbcf321
    Related-bug: #1852106
    Closes-bug: #1863611
    (cherry picked from commit 963fd8c0f956bdf5c6c8987aa5d9f836386fd5ed)
    (cherry picked from commit 240d0309023fcaf20df44f819e9b3e14af97f526)
    (cherry picked from commit 6e71909b0b8ad8d20d0863e714cad2a7b120f658)

tags: added: in-stable-rocky
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/queens)

Reviewed: https://review.opendev.org/708609
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=418b9450e4b9bab15a4a0f92d013e53d18de3342
Submitter: Zuul
Branch: stable/queens

commit 418b9450e4b9bab15a4a0f92d013e53d18de3342
Author: Brian Rosmaita <email address hidden>
Date: Thu Feb 13 11:09:08 2020 -0500

    Reject boot request for unsupported images

    Nova has never supported direct booting of an image of an encrypted
    volume uploaded to Glance via the Cinder upload-volume-to-image
    process, but instead of rejecting such a request, an 'active' but
    unusable instance is created. This patch allows Nova to use image
    metadata to detect such an image and reject the boot request.

    Change-Id: Idf84ccff254d26fa13473fe9741ddac21cbcf321
    Related-bug: #1852106
    Closes-bug: #1863611
    (cherry picked from commit 963fd8c0f956bdf5c6c8987aa5d9f836386fd5ed)
    (cherry picked from commit 240d0309023fcaf20df44f819e9b3e14af97f526)
    (cherry picked from commit 6e71909b0b8ad8d20d0863e714cad2a7b120f658)
    (cherry picked from commit 02551d6d6a3baee49309cffbc3bef2508fcafe72)

tags: added: in-stable-queens
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.