no signature check for cached images

Bug #1793159 reported by Josephine Seifert
Affects                   Status  Importance  Assigned to  Milestone
OpenStack Compute (nova)  New     Undecided   Unassigned

Bug Description

Currently Nova only checks an image's signature directly after downloading it from Glance. The image is then cached on the corresponding compute node.

When Nova reads the image file from the cache and transfers it into the target storage while creating a server resource, the signature should be checked again, since the image might have been tampered with in the cache. This has to be done somewhere in nova/virt/libvirt/imagebackend.py.
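A minimal sketch of the requested check, under loud assumptions: a plain SHA-256 digest stands in for Glance's certificate-based signature verification, and `check_cached_image`, `sha256_of` and `CachedImageTampered` are hypothetical names that do not exist in Nova. The trusted digest would have to be recorded when the image was verified on download and kept somewhere other than the cache itself.

```python
import hashlib
import hmac
from pathlib import Path


class CachedImageTampered(Exception):
    """Raised when a cached image no longer matches its trusted digest."""


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large images are never held in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as image:
        for chunk in iter(lambda: image.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_cached_image(path: Path, trusted_digest: str) -> None:
    """Hypothetical hook to run before a cached image is copied to a disk."""
    actual = sha256_of(path)
    # compare_digest does a constant-time comparison of the two hex strings.
    if not hmac.compare_digest(actual, trusted_digest):
        raise CachedImageTampered(f"{path} does not match its trusted digest")
```

A digest only detects modification relative to a value the compute host already trusts; it does not replace the signature check against a certificate.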

Tags: image-cache
Matt Riedemann (mriedem) wrote :

Can you provide some more details regarding the statement, "since the image might have been tampered with in the cache"? Can you provide a recreate scenario for example? Otherwise this sounds like a whack-a-mole problem where we could justify needing to check the image signature at any point the image is referenced, which sounds expensive.

Matt Riedemann (mriedem) wrote :

Is this a duplicate of bug 1785668?

tags: added: image-cache
Markus Hentsch (mhen) wrote :

> Is this a duplicate of bug 1785668?

Yes, but it follows a different reasoning. The other report (bug 1785668) states that the image signature check should fail in their scenario, since the user spawning the instance should not be able to retrieve the corresponding validation certificate from Barbican due to the user's permissions.

In our case the reasoning is from a security standpoint regarding the cache. When an instance is spawned from an image, it is either A) downloaded (and cached) from Glance or B) already cached and simply re-used from the cache. In case A the image signature is validated and the process aborted if there's a validation error. In case B the image is blindly used from the cache. In both cases a transformation from image to instance happens.
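The two cases can be illustrated with a toy in-memory model. This is not Nova's cache code; `ImageCache`, `download` and `verify` are hypothetical names used only to show where the check does and does not run.

```python
from typing import Callable, Dict


class ImageCache:
    """Toy in-memory model of the flow; not Nova's actual cache code."""

    def __init__(self, download: Callable[[str], bytes],
                 verify: Callable[[str, bytes], None]) -> None:
        self._download = download
        self._verify = verify
        self._entries: Dict[str, bytes] = {}
        self.verifications = 0

    def fetch(self, image_id: str) -> bytes:
        if image_id not in self._entries:
            # Case A: cache miss, so download and verify before caching.
            data = self._download(image_id)
            self._verify(image_id, data)
            self.verifications += 1
            self._entries[image_id] = data
        # Case B: cache hit, the stored bytes are returned without a check.
        return self._entries[image_id]


cache = ImageCache(download=lambda image_id: b"disk",
                   verify=lambda image_id, data: None)
cache.fetch("img-1")
cache.fetch("img-1")
print(cache.verifications)  # 1
```

Two instances were built from the image, but verification ran only once, on the cache miss.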

In Nova's code "CONF.instances_path" (in conjunction with "CONF.image_cache_subdirectory_name") is used as the cache location. The configuration commentary for "instances_path" states:

> It can point to locally attached storage or a directory on NFS.

Furthermore, it states that it defaults to "$state_path/instances", and the commentary for "state_path" also states:

> In some scenarios (for example migrations) it makes sense to use a storage location which is shared between multiple compute hosts (for example via NFS).

Those statements imply that there are setups using shared network storage for the cache location in one way or another. That means the source of a cached image is not necessarily a local (and trusted) one. The (cached) image data on an NFS source may, for instance, have been tampered with from another system.
Given this scenario, in our opinion there should also be a validation of cached images' signatures when they are to be transformed into an instance.
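As a rough illustration of how the cache location follows from these options, the sketch below resolves the directory by hand. `image_cache_dir` is a hypothetical helper, the "_base" default for "image_cache_subdirectory_name" is stated from memory and should be checked against the deployed release, and the paths are examples only.

```python
import posixpath


def image_cache_dir(state_path: str,
                    instances_path: str = "$state_path/instances",
                    image_cache_subdirectory_name: str = "_base") -> str:
    """Resolve the cache directory the way the quoted options describe."""
    # Expand the "$state_path" reference used by the default value.
    instances = instances_path.replace("$state_path", state_path)
    return posixpath.join(instances, image_cache_subdirectory_name)


# Local state directory: the cache is local as well.
print(image_cache_dir("/var/lib/nova"))  # /var/lib/nova/instances/_base

# Shared state directory: the cache silently moves onto the share too.
print(image_cache_dir("/mnt/nfs/nova-state"))  # /mnt/nfs/nova-state/instances/_base
```

With the defaults, pointing "state_path" at a share is enough to place the image cache on that share, even if nobody set out to share the cache.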

Matthew Booth (mbooth-9) wrote :

The image cache is intended to be immutable. We assume this implicitly everywhere, and any alteration to it would certainly cause errors. It is not exposed externally anywhere.

If we assume that an attacker is able to write to modify host storage, how often do you propose we should check the image cache? Only at instance creation as proposed here? What if it is modified after that? Given that the instance storage uses the same storage as the image cache, how should we protect the instance itself against modification?

Markus Hentsch (mhen) wrote :

Thanks for your comment Matthew!

> If we assume that an attacker is able to write to modify host storage, how often do you propose we should check the image cache? Only at instance creation as proposed here?

Exactly. It is already checked when it enters the cache (downloaded from Glance) and it should be checked each time it is transformed into an instance.

> What if it is modified after that?

That is only relevant the next time it is transformed into a new instance again, at which point it is verified again as per our proposal. We're assuming a security focused environment here, where Nova uses encrypted LVM volumes as ephemeral storage backend, for example. This means there's no dynamic copy-on-write mechanism involved or anything that might make tampering with the image relevant to the instance after the transformation process.

> Given that the instance storage uses the same storage as the image cache, how should we protect the instance itself against modification?

As written above, we assume an environment where an instance's ephemeral storage (disk) itself is not located within "CONF.instances_path" but, for example, is an encrypted volume in the LVM backend of Nova. This means the storage of the instance cannot be tampered with via the same location that "CONF.state_path" points to.

John Garbutt (johngarbutt) wrote :

Seems this is about the case where:

* image cache is shared between hypervisors and remote
* ... but local ephemeral disks are encrypted

Generally if you have access to modify the image cache, all bets are off, so you get little protection from re-checking the image cache. Seems less clear cut in the above use case.

Feel like I need someone who understands the security aspects of this to comment on what extra protection we get here.

Would having the image cache encrypted with a separate key, shared with all the hypervisors, be better than a signature recheck? I am unsure.

Matthew Booth (mbooth-9) wrote :

There may be a valid use case with some additional work, but it's far from defined here, and would require significantly more work than is suggested here. Let me have a go:

1. The user would like Nova to provide assurance that the instance they create is initialised with an unmodified glance image. The user does not trust actors with access to compute host storage.

I think this is a fundamentally impossible proposition because Nova is a program on local storage. The user has no way to independently verify that their storage is encrypted, or that any checking was performed.

2. The user would like Nova to provide assurance that the instance they create is initialised with an unmodified glance image. The user trusts the system administrator of the compute host, but is concerned about casual (non-premeditated) access to user data, and unauthorised access.

The user trusts that the admin has taken reasonable efforts to make the Nova executable tamper-proof, e.g. by using a system like Tripwire. The user trusts that the admin and/or the admin's organisation, makes it difficult to casually circumvent this protection without detection.

As you alluded to above, the only protection Nova currently provides for ephemeral data at rest is encryption, and that's only supported by the LVM backend. The instance is encrypted using a key stored in Barbican. As mentioned above we're trusting the Nova executable in this scenario, so it's OK that Nova proxies the user's credentials to Barbican as long as it doesn't store either the credentials or fetched key anywhere. Let's declare the security of Barbican explicitly out of scope.

If the host is configured to use flat, qcow2, or unencrypted LVM they have no defence, as anybody with access to modify the image cache can equally access instance storage.

Unfortunately, 'encrypted' LVM as currently implemented is similarly vulnerable. When starting the instance we first create an LV for the disk which will store encrypted data. However, we then create a dm-crypt device which we initialise with the key we obtained from Barbican. This presents an unencrypted block device to the host, which we then present to the instance. Any attacker needs only use the dm-crypt device rather than the underlying device, both of which have the same access. The dm-crypt device is only removed if the instance is deleted, or implicitly if the compute host is rebooted.

We would need to address this issue first before it's worth implementing anything else, and given that LVM encryption is essentially useless we should probably do this anyway if any users believe they are gaining anything from this feature. I believe native qemu encryption for ephemeral devices would help here.

Once we have what we believe to be tamper-proof local storage, I think the only robust way to verify the glance image would be to copy it into the tamper-proof storage, and then verify the contents of the tamper-proof storage.
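The ordering matters: checking the cached file and then copying it leaves a window in which the source can change between check and copy. A minimal sketch of the copy-then-verify order follows, assuming regular files stand in for the tamper-proof storage and a SHA-256 digest stands in for the Glance signature; `copy_then_verify` is a hypothetical name.

```python
import hashlib
import os
import shutil


def copy_then_verify(cached_image: str, trusted_target: str,
                     expected_sha256: str) -> None:
    """Copy first, then check the copy, so the verified bytes are the used bytes."""
    shutil.copyfile(cached_image, trusted_target)
    digest = hashlib.sha256()
    with open(trusted_target, "rb") as target:
        for chunk in iter(lambda: target.read(1 << 20), b""):
            digest.update(chunk)
    if digest.hexdigest() != expected_sha256:
        # Discard the copy so a tampered image is never booted.
        os.remove(trusted_target)
        raise ValueError("copied image does not match the expected digest")
```

Because the digest is computed over the target, later changes to the cached source no longer affect what was verified.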

3. The user would like Nova to provide assurance that the instance they create is initialised with an unmodified glance image. The user trusts the compute host completely, but does not trust NFS storage because of its lack of security a...

Markus Hentsch (mhen) wrote :

Okay, let's summarize our proposed use case to get a better picture of the scenario:

The cloud provider has set "CONF.ephemeral_storage_encryption.enabled=True", "CONF.images_type=lvm" and set up the related options accordingly. As a result, Nova VM instances that aren't booted from a volume will use encrypted LVM volumes as their boot disk, located locally on the compute host.

Furthermore, the cloud provider wants to support VM migration scenarios and sets "CONF.state_path" to an NFS directory shared between a group of compute hosts that will support migration. "CONF.instances_path" is set to its default value ("$state_path/instances") to support this scenario. As a result, the image cache (which is always a subdirectory of "CONF.instances_path") is not a local source from the compute host's perspective anymore.

Finally, the cloud provider has set "CONF.verify_glance_signatures=True", so that images downloaded from Glance have their signature checked using a certificate that is retrieved on-demand from Barbican.
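The settings named so far might look like the fragment embedded below. This is a sketch, not a tested deployment: the section names ([libvirt], [glance]) are given from memory and should be checked against the configuration reference of the release in use, the NFS path is an example, and `configparser` is used only to keep the sketch runnable (Nova itself reads nova.conf through oslo.config).

```python
import configparser

# Hypothetical nova.conf excerpt matching the described environment.
NOVA_CONF = """
[DEFAULT]
# Shared NFS mount; instances_path keeps its default ($state_path/instances).
state_path = /mnt/nfs/nova-state

[libvirt]
images_type = lvm

[ephemeral_storage_encryption]
enabled = True

[glance]
verify_glance_signatures = True
"""

# Interpolation is disabled so "$" and "%" are treated as plain characters.
conf = configparser.ConfigParser(interpolation=None)
conf.read_string(NOVA_CONF)

encrypted_lvm = (conf["libvirt"]["images_type"] == "lvm"
                 and conf["ephemeral_storage_encryption"].getboolean("enabled"))
verifies_downloads = conf["glance"].getboolean("verify_glance_signatures")
print(encrypted_lvm, verifies_downloads)  # True True
```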

Given this environment, whenever an instance is created from an image, two major cases come into play:

A) The image is not cached on the corresponding compute host. It is downloaded into Nova's image cache from Glance. During this, the image signature is verified. An encrypted LVM volume is created. Finally, the image is transformed (essentially copied) into the encrypted LVM volume, which then acts as the boot block storage device for the VM instance.

B) The image is cached on the corresponding compute host. An encrypted LVM volume is created. The desired image located in the cache is directly transformed (copied) into the encrypted LVM volume, which then acts as the boot block storage device for the VM instance.

The problem in case B is that the signature is not checked again. When an image is transformed into an instance, its signature should be verified if the image originates from an external source (one not located on the compute host itself). In the environment described above, both the Glance server and the image cache (shared NFS) are external sources. Either source might have been tampered with even if the compute host itself is trusted.

Regarding the points of performance / computational costs:

As with many other security-related options (e.g. ephemeral storage encryption, signature checks), the additional signature verification for the image cache could be introduced as an optional mechanism that has to be explicitly activated via a setting in nova.conf.

Furthermore, when the transformation from image to instance disk happens, the image has to be read completely anyway. Based on this, another possibility would be not to complement the existing signature verification (case A, image -> cache) with an additional verification (case B, cache -> instance), but to move the signature verification mechanism closer to the transformation process altogether, so that it is used uniformly for both cases at the same point in the process (image data -> instance).
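A minimal sketch of that single-pass idea, assuming a SHA-256 digest as a stand-in for the signature and generic binary streams for the image source and the instance disk; `transform_with_verification` is a hypothetical name, not an existing Nova function.

```python
import hashlib
import io
from typing import BinaryIO


def transform_with_verification(source: BinaryIO, target: BinaryIO,
                                expected_sha256: str,
                                chunk_size: int = 1 << 20) -> bool:
    """Copy source to target and hash the same bytes in a single pass."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: source.read(chunk_size), b""):
        digest.update(chunk)
        target.write(chunk)
    # The caller must discard the target when this returns False.
    return digest.hexdigest() == expected_sha256


image = b"example image bytes"
disk = io.BytesIO()
ok = transform_with_verification(io.BytesIO(image), disk,
                                 hashlib.sha256(image).hexdigest())
print(ok)  # True
```

The function does not care whether the source stream comes from Glance or from the cache, which is what makes the check uniform for cases A and B. The result is only known once the whole image has been written, so a failed check has to be followed by destroying the target.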

Markus Hentsch (mhen) wrote :

Thanks for your input Matthew!

I was already writing the use case summary (comment #8) when you responded in the meantime, so it isn't a direct answer to your post. Let me catch up on that:

I agree that access to the dm-crypt endpoint is a serious issue. Cinder mitigated this by introducing native LUKS support for libvirt/QEMU. Similar mechanisms could be evaluated for Nova's LUKS-based ephemeral storage as well, but I think this is out of scope for this topic and requires a separate discussion.

From reading your post, our defined scenario seems most similar to point number 3 of comment #7. I'm curious about your related statement:

> This is trivial to implement: document that operators should not configure NFS storage.

What shared storage method should operators configure for migration scenarios instead that would make the signature check unnecessary?

Matthew Booth (mbooth-9) wrote :

From comment 8:

===
The cloud provider has set "CONF.ephemeral_storage_encryption.enabled=True", "CONF.images_type=lvm" and set up the related options accordingly. As a result, Nova VM instances that aren't booted from a volume will use encrypted LVM volumes as their boot disk, located locally on the compute host.

Furthermore, the cloud provider wants to support VM migration scenarios and sets "CONF.state_path" to an NFS directory shared between a group of compute hosts that will support migration. "CONF.instances_path" is set to its default value ("$state_path/instances") to support this scenario. As a result, the image cache (which is always a subdirectory of "CONF.instances_path") is not a local source from the compute host's perspective anymore.
===

There are some invalid assumptions here:

1. Instances on NFS storage is not required for live migration.
2. LVM doesn't use NFS storage, so would not be shared anyway.
3. Live migration of instances using LVM storage is not supported in any case.

If your only concern is that you don't trust NFS, then simply don't configure it and use images_type=flat or images_type=qcow2. Both of these can be live migrated without requiring NFS. This obviously requires a block migration as it doesn't use shared storage (and is therefore slow), but any solution involving LVM, even if live migration were supported, would also obviously involve a block migration.

If your concern is that you also don't trust local storage you're opening the can of worms I described in my point number 2.

From your comment 9, though, I think we can close this? It's interesting to discover that LVM encryption is useless, but not strictly related.

Markus Hentsch (mhen) wrote :

Thanks for bearing with me Matthew!

> 1. Instances on NFS storage is not required for live migration.
> 2. LVM doesn't use NFS storage, so would not be shared anyway.

I see now. This was the root of my misinterpretation of the configuration docs. I wrongly assumed that the "CONF.state_path" always needed to be shared in migration scenarios for state synchronization purposes, regardless of whether it actually contains the ephemeral disks (flat, qcow2) or not (lvm). I do understand now that this isn't the case.

> 3. Live migration of instances using LVM storage is not supported in any case.

Thanks for pointing that out! This is valuable information as well.

> From your comment 9, though, I think we can close this? It's interesting to discover that LVM encryption is useless, but not strictly related.

I agree. Due to the facts you pointed out, there seems to be no scenario where the proposed cache verification would provide any substantial benefit considering the big picture you described in paragraph 2 of comment #7.

melanie witt (melwitt) wrote :

I'm going to mark this as a duplicate of bug 1785668 for the sake of trying to clean up the bug backlog a bit. If you think this should really be separate, let me know and I will un-duplicate it.
