Comment 13 for bug 1990157

Brian Rosmaita (brian-rosmaita) wrote : Re: Malicious image data modification can happen when using COW

I guess that I'm not being clear about my position here. What I'm saying is:

1. This exploit is a straightforward implication of knowledge that has been discussed publicly (even if not a lot of people paid attention to it; my point is that it's "out there"). So I think it's important to publish an OSS{AN} (whichever the VMT thinks is appropriate) to clue in/remind operators that there is a known vulnerability they can act on immediately, to wit:
- deploy separate internal/external glance-api so that multiple locations are not shown to end users
- or, if that looks too destabilizing to do instantly in a deployment, restrict the methods of image broadcast (publicize, communitize, share) until the internal/external split is done
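A minimal sketch of what the internal/external split could look like, assuming the relevant knobs are glance-api's `show_multiple_locations` and `show_image_direct_url` options in `[DEFAULT]`. Check their semantics against the glance configuration reference for your release. The helper name `render_glance_api_conf` is hypothetical. The idea is that only the internal endpoint, used by nova and cinder for copy-on-write, exposes location data, while the endpoint end users talk to hides it.

```python
import configparser
import io


def render_glance_api_conf(internal: bool) -> str:
    """Hypothetical helper: render the [DEFAULT] options that would differ
    between an internal glance-api (for nova/cinder) and an external one."""
    conf = configparser.ConfigParser()
    conf["DEFAULT"] = {
        # Location data is exposed only on the internal endpoint, where
        # COW-capable consumers (e.g. Ceph-backed nova/cinder) need it.
        "show_multiple_locations": str(internal),
        "show_image_direct_url": str(internal),
    }
    buf = io.StringIO()
    conf.write(buf)
    return buf.getvalue()


print(render_glance_api_conf(internal=True))   # internal glance-api.conf fragment
print(render_glance_api_conf(internal=False))  # public glance-api.conf fragment
```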

2. As soon as we get the OSS{AN} published (which I think could happen next week), open this bug and use the PTG to discuss a long-term solution in public with any operators who care to attend. Since Red Hat in particular has an interest in OpenStack + Ceph configurations, we can reach out to some RH product managers who can attend and provide input, or will be able to get the word out to some large operators, who will hopefully provide direct input. There are some big tradeoffs here that we can't assess on our own. Right now, everything is aimed toward speed, and we need help assessing how much of a slowdown people are willing to accept (if any), and under what circumstances.

3. I personally have never liked this non-checksummed image creation and consumption, but it's what operators have been willing to accept for performance. What I particularly don't like is that the current situation makes it *impossible* under some configurations of nova/glance/cinder to guarantee a verification chain for an image. If you don't use Ceph or the cinder glance_store, you are guaranteed a hash check of sha512 (or stronger, if the operator has configured it) at the point when the image is consumed. (IMO, this is just as strong as image signature verification [0], with none of the hassle for end users.) But this isn't available for some configurations, and maybe that's OK; it's an operator (and their customers, who can vote with their feet) choice. But maybe not everyone is aware of this choice (which will hopefully be addressed by item 1 above). Note that this is independent of the exploit discussed by this bug, which is malicious image provision via manipulation of glance's location record. Even if we re-do locations so that only nova can set them, there's still the issue of image data substitution in the backend without modifying the location uri recorded in glance.
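To make the consumption-time hash check concrete, here is a sketch of the check in principle, assuming the consumer streams the image bytes and compares them against glance's multihash properties `os_hash_algo` and `os_hash_value`. This is not the actual nova or glance_store code, and `verify_multihash` is a hypothetical name. It also shows why COW configurations can't offer the check: if the consumer clones the image in the backend instead of downloading it, there are no bytes to feed the hasher.

```python
import hashlib
import hmac
from typing import Iterable


def verify_multihash(chunks: Iterable[bytes], os_hash_algo: str, os_hash_value: str) -> bool:
    """Recompute the image hash over the downloaded data and compare it
    with the value glance recorded at upload time."""
    hasher = hashlib.new(os_hash_algo)  # e.g. "sha512"
    for chunk in chunks:  # stream the data; never hold the whole image in memory
        hasher.update(chunk)
    # Constant-time comparison of the hex digests.
    return hmac.compare_digest(hasher.hexdigest(), os_hash_value)


recorded = hashlib.sha512(b"fake image bytes").hexdigest()
print(verify_multihash([b"fake image ", b"bytes"], "sha512", recorded))  # True
print(verify_multihash([b"tampered"], "sha512", recorded))               # False
```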

4. I think an acceptable compromise would be to rely on image signature verification for deployment configurations that allow non-checksummed images. This is not an immediate solution because signature verification is not supported in those configurations where it would be really useful. It imposes a speed penalty, but it's also right in your face that you are making a speed/security tradeoff, because you (the end user) are adding a bunch of image metadata specifically for this purpose.
  Additionally, the image-signature-verification implementation is inconsistent between glance, nova, and cinder [1]. This would be a good opportunity to get it sorted out to make it really useful.

Or there may be another way to solve this. But I think at this point we should get the OSS{AN} out right away so we can work on the solution in public with the community.

[0] Current glance multihash validation is just as good as image signature verification, because if you don't trust the infrastructure to perform the hash check correctly, it would also be pointless to trust the infrastructure to handle the signature verification correctly. At least that's my opinion. I think the only way to have surety about this in a public cloud would be to upload your own small image that is used to boot a vm, checks itself for integrity, then downloads a signed payload from some external location and verifies the payload, then rewrites the boot disk and reboots itself. There's probably already a service that does this somewhere.

[1] roughly:
- In glance, if you have all the img_signature* properties set before the data is uploaded, glance will verify the signature. If any are missing, glance doesn't do the verification. This is an intentional design so that you can sign images created by Nova or uploaded by Cinder (you download, compute the signature, and then set all the properties).
- In nova, if the verify_glance_signatures option is on and the img_signature* properties are incomplete or absent, the instance goes to an error state. Nova also has an option to check the validity of the image's signing cert; glance and cinder don't do this.
- In cinder, if verify_glance_signatures is enabled, cinder looks for the img_signature* properties; if any of them are present, it uses them to verify the image download (the volume goes to error state if verification fails); but unlike nova, if no img_signature* properties are present, cinder allows the volume creation.
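The differences listed in [1] can be summarized as a small decision model. This is a sketch of the behavior described above, not code taken from glance, nova, or cinder: the function names and the `Outcome` enum are hypothetical, nova's certificate-validation option is not modeled, and the `verify_glance_signatures` setting is reduced to a boolean for simplicity. The four property names are the `img_signature*` properties glance uses for signature metadata.

```python
from enum import Enum
from typing import List, Mapping

# The img_signature* image properties that carry signature metadata.
SIGNATURE_PROPS = (
    "img_signature",
    "img_signature_hash_method",
    "img_signature_key_type",
    "img_signature_certificate_uuid",
)


class Outcome(Enum):
    VERIFY = "verify the signature (error state if it fails)"
    SKIP = "proceed without verification"
    ERROR = "refuse: error state"


def _present(props: Mapping[str, str]) -> List[str]:
    return [p for p in SIGNATURE_PROPS if p in props]


def glance_upload(props: Mapping[str, str]) -> Outcome:
    # Glance verifies only if *all* properties were set before the upload.
    return Outcome.VERIFY if len(_present(props)) == len(SIGNATURE_PROPS) else Outcome.SKIP


def nova_boot(props: Mapping[str, str], verify_glance_signatures: bool) -> Outcome:
    if not verify_glance_signatures:
        return Outcome.SKIP
    # Nova refuses an image whose signature properties are incomplete or absent.
    return Outcome.VERIFY if len(_present(props)) == len(SIGNATURE_PROPS) else Outcome.ERROR


def cinder_create_volume(props: Mapping[str, str], verify_glance_signatures: bool) -> Outcome:
    if not verify_glance_signatures:
        return Outcome.SKIP
    # Cinder verifies if *any* property is present, but allows unsigned images.
    return Outcome.VERIFY if _present(props) else Outcome.SKIP


unsigned: Mapping[str, str] = {}
print(glance_upload(unsigned).name,
      nova_boot(unsigned, True).name,
      cinder_create_volume(unsigned, True).name)  # SKIP ERROR SKIP
```

With an unsigned image and verification enabled everywhere, nova refuses to boot it while cinder happily creates a volume from it, which is the inconsistency that would need sorting out before relying on signatures.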