0-size images allow unprivileged user to deplete glance resources

Bug #1401170 reported by George Shuklin on 2014-12-10
Affects                        Importance   Assigned to
Glance                         Wishlist     Unassigned
OpenStack Security Advisory    Undecided    Unassigned
OpenStack Security Notes       Undecided    Eric Brown

Bug Description

Glance allows the creation of 0-size images ('glance image-create' without parameters). These images consume no storage backend resources and do not hit any size limits, but they take up space in the database.

A malicious user can cause database resource depletion with an endless flood of 'image-create' requests. Because an empty request is small, it puts more strain on OpenStack than on the attacker.

Rate limiting API requests delays the consequences of the attack but does not prevent it.

Here is a simple script to run the attack:
while true;do curl -i -X POST -H 'X-Auth-Token: ***' http://glance-endpoint:9292/v1/images;done

My estimate for database growth is about 1 MB/minute (with an extra-slow shell-based attack; a specially crafted script could run at the full speed the rate limit allows).

description: updated
summary: - 0-sized images allow unpriveleged user to deplete glance resources
+ 0-size images allow unprivileged user to deplete glance resources
Changed in ossa:
status: New → Incomplete
Grant Murphy (gmurphy) wrote :

Thanks for the report, the OSSA task is set to incomplete pending additional security review from glance-coresec.

This looks like a valid avenue for denial of service attacks.

Jeremy Stanley (fungi) wrote :

Traditionally we've not considered this sort of exploit a security vulnerability. The lack of built-in quota for particular kinds of database entries isn't necessarily a design flaw, but even if it can/should be fixed it's likely not going to get addressed in stable backports, is not something for which we would issue a security advisory, and so doesn't need to be kept under secret embargo. Does anyone else disagree?

Flavio Percoco (flaper87) wrote :

Hey George,

I agree it sounds scary that you can create that many images without any limit, but that's by design. Therefore, I don't think this is a security issue. For example, what would be the difference between creating 0-sized images without data and 0-sized images with data?

I mean, it's a normal workflow for users to create the image and then upload the data:

$ glance image-create --name test
$ glance image-upload $IMAGE_ID < my_file.qcow2

Glance has never had any kind of rate limit and I highly doubt it will in the near future. Rate limits are easy to have outside Glance and they could also be shared across multiple services by using things like haproxy.

I agree with Jeremy here, I'll also mark Glance's bug as Invalid. I'm open to debate on the Glance side, though.

Changed in glance:
status: New → Invalid
George Shuklin (george-shuklin) wrote :

If an image has something inside, it will be accounted for and billed. Zero-sized images are not.

I ran a test for 15 minutes and got many funny problems with a few parts of OS: horizon almost dying, and glance-client simply cannot show the image list (after 10 minutes of 100% CPU for the MySQL server): https://bugs.launchpad.net/python-glanceclient/+bug/1401188

If this is not a security bug, it should at least be considered a normal bug, because in a multi-tenant environment one non-privileged user can cause problems for other users and/or administrators.

Jeremy Stanley (fungi) wrote :

It sounds like the concern raised is less about the rate at which images can be created (particularly if they contain no initial data), but rather the lack of a limit on the total number of images a tenant can create (as it's enforced only on aggregate size instead).
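To illustrate the distinction drawn above, here is a minimal sketch of why a quota on aggregate size never triggers for 0-size images while a quota on image count would. The function names and the limit values are hypothetical and are not Glance options.

```python
def within_size_quota(image_sizes, max_total_bytes):
    """Aggregate-size check: only the sum of image sizes matters."""
    return sum(image_sizes) <= max_total_bytes


def within_count_quota(image_sizes, max_images):
    """Count check: every image record counts, whatever its size."""
    return len(image_sizes) <= max_images


# 20,000 image records with no data uploaded (hypothetical flood).
flood = [0] * 20000

size_ok = within_size_quota(flood, max_total_bytes=10 * 1024 ** 3)
count_ok = within_count_quota(flood, max_images=1000)
print(size_ok, count_ok)  # True False
```

The size check passes no matter how many empty records exist, which is the gap discussed in this comment.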

Stuart McLaren (stuart-mclaren) wrote :

This is something I'd like to see addressed.

We (HP) actually carry a patch for this internally.

We limit the creation of images to x images over time period t.

The new parameters are image_create_ratelimit_cap and image_create_ratelimit_sec

The approach is a fairly simple one (in the db layer) -- I'd be interested to know if it would potentially form the base of an upstream patch?

def image_create(context, values):
    """Create an image from the values dictionary."""
    session = _get_session()
    if 'owner' in values:
        # We check that this user (tenant) hasn't exceeded
        # the per-user maximum number of images limit
        query = (session.query(models.Image).
                 filter_by(owner=values['owner']))
        # Count of the owner's non-deleted images (not used further in
        # this excerpt).
        num_images = query.filter_by(deleted=False).count()

        # Setting either option to 0 disables the rate limit.
        if (CONF.image_create_ratelimit_cap != 0 and
                CONF.image_create_ratelimit_sec != 0):
            delta = datetime.timedelta(seconds=CONF.image_create_ratelimit_sec)
            dt = timeutils.normalize_time(timeutils.utcnow() - delta)
            # Images created within the window, including deleted ones.
            count = query.filter(models.Image.created_at > dt).count()

            if count >= CONF.image_create_ratelimit_cap:
                # Interpolate after translation so the catalog lookup
                # uses the unformatted message.
                msg = (_("Maximum number of images (%s) per unit time "
                         "(%s seconds) exceeded") %
                       (CONF.image_create_ratelimit_cap,
                        CONF.image_create_ratelimit_sec))
                LOG.warning(msg + _(" owner: %s") % values['owner'])
                raise webob.exc.HTTPForbidden(explanation=msg)

    return _image_update(context, values, None, False)
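The windowed check in the snippet above can be tried in isolation. The following is a self-contained sketch of the same idea (at most "cap" creations per owner within the last "window_sec" seconds), assuming an in-memory store in place of the database query. The class and method names are hypothetical and not part of Glance.

```python
import datetime
from collections import defaultdict, deque


class ImageCreateRateLimiter:
    """Allow at most `cap` creations per owner within `window_sec` seconds."""

    def __init__(self, cap, window_sec):
        self.cap = cap
        self.window = datetime.timedelta(seconds=window_sec)
        self._created_at = defaultdict(deque)  # owner -> creation timestamps

    def check_and_record(self, owner, now):
        """Record the creation and return True if allowed, else return False."""
        # As in the patch, a value of 0 for either setting disables the limit.
        if self.cap == 0 or self.window.total_seconds() == 0:
            return True
        stamps = self._created_at[owner]
        cutoff = now - self.window
        # Keep only timestamps strictly newer than the cutoff.
        while stamps and stamps[0] <= cutoff:
            stamps.popleft()
        if len(stamps) >= self.cap:
            return False
        stamps.append(now)
        return True


limiter = ImageCreateRateLimiter(cap=2, window_sec=60)
t0 = datetime.datetime(2014, 12, 10, 12, 0, 0)
results = [
    limiter.check_and_record("tenant-a", t0 + datetime.timedelta(seconds=s))
    for s in (0, 1, 2, 61)
]
print(results)  # [True, True, False, True]
```

The third request is rejected because two creations already fall inside the 60-second window; the fourth succeeds once both have aged out. Unlike the database query, this sketch keeps state per process, so it only demonstrates the logic.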

George Shuklin (george-shuklin) wrote :

Yep. Lack of quota is the main problem. A rate limit helps to reduce the severity (down to an acceptable level).

Thierry Carrez (ttx) wrote :

Adding ossg-coresec

Thierry Carrez (ttx) wrote :

I think it's pretty clear a quota for this should be added.

What's under question here is if it's appropriate to add a feature in a stable release (the missing quota) to cover that case. Even if we get the backward compatibility right, it's still absent from the release documentation and any "Juno" book that was published so far. It's a trade-off between stability (and promises we make around stable branches) and security (and promises we make to keep you secure from known issues).

Is there a possible workaround that could be used by stable branch users?

For what it's worth, on a three-node Icehouse setup:

* a non-admin user is able to create an unlimited number of 0-sized images

* after about 20,000 images have been created, the admin user can no longer list images using "glance image-list --all-tenants"; the command fails after ~5 minutes with "maximum recursion depth exceeded"

* however, admin and other users are not affected when using "glance image-list" alone...

Using our latest classification (https://wiki.openstack.org/wiki/Vulnerability_Management#Incident_report_taxonomy) I suggest a Class B2 for this bug.

Jeremy Stanley (fungi) wrote :

I agree with B2. There is no amplification involved, the entries themselves would take a while to actually overrun the database itself, there's no implied impact on other tenants, and admins generally will have access to alternative means of deleting these entries which don't require them to be able to use the API to generate a list.

Jeremy Stanley (fungi) wrote :

Also, since the discussion surrounding this bug is already nearly public with an openstack-dev mailing list thread[1] linking to it directly, I think we should at the very least take it out of embargoed status.

[1] http://lists.openstack.org/pipermail/openstack-dev/2014-December/052750.html

Thierry Carrez (ttx) wrote :

Even if the general type of issue was unfortunately partially disclosed on the ML, I would argue that nobody can guess where a quota is actually missing in such an exploitable fashion... so I would rather keep this embargoed until we make a final decision on it.

Since this is easily exploitable (although there aren't that many Glance public servers), I would prefer we have some workaround that public Glance servers operators can opt to deploy before we disclose this. For example an "optional patch" that an OSSN on the issue could mention.

Robert Clark (robert-clark) wrote :

Amplification quotients need to be considered carefully. An attacker who can leverage this attack, which is predicated on having valid authentication credentials on the cloud, could trivially spin up N compute instances to generate the glance requests.

Flavio Percoco (flaper87) wrote :

I think I misunderstood the original request of this bug (sorry about that). I agree a quota for the number of images that can be created makes sense.

I think it'd be hard to provide a fix for this without adding a new config option to Glance. Probably, the smallest possible fix would be to just add a limit for the number of images that can be created (ignoring the rate-limit, therefore just 1 config option) and provide a sensible value that won't require most deployments to change config files as well.

Thoughts? It should be fairly straightforward to do so.

(Moved the bug back to New)

Changed in glance:
status: Invalid → New

Maybe just add a limit with the default value set to infinity? This will not change running installations in any way, but gives administrators a way to set a limit.

... And there is one more thing I didn't think of initially. Deleted images are not removed from the database, just marked as 'deleted'. That means a tenant can flood the database with create/delete requests for 0-size images. Because there are no backend operations (nothing to store to swift), it will just flood the database with new entries, and those entries will be invisible to everyone, except via select count(*) from images; in the glance database.
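The effect of soft deletion can be shown with a small sketch. It assumes a deliberately simplified, hypothetical "images" table (not Glance's real schema) in an in-memory SQLite database, where deleting a row only sets a flag.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for an images table with a soft-delete flag.
conn.execute(
    "CREATE TABLE images ("
    "id INTEGER PRIMARY KEY, owner TEXT, deleted INTEGER NOT NULL DEFAULT 0)"
)


def create_image(owner):
    """Insert an image row and return its id."""
    cur = conn.execute("INSERT INTO images (owner) VALUES (?)", (owner,))
    return cur.lastrowid


def delete_image(image_id):
    """Soft delete: mark the row as deleted instead of removing it."""
    conn.execute("UPDATE images SET deleted = 1 WHERE id = ?", (image_id,))


# Create and immediately delete 1000 images.
for _ in range(1000):
    delete_image(create_image("tenant-a"))

visible = conn.execute(
    "SELECT count(*) FROM images WHERE deleted = 0").fetchone()[0]
total = conn.execute("SELECT count(*) FROM images").fetchone()[0]
print(visible, total)  # 0 1000
```

No image is visible to a listing that filters on the flag, yet every row is still stored, which is why a quota on live images alone would not bound table growth.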

My proposal:

1) Forbid creation of zero-size images without 'location' field. Is any valid case for such requests? They stay in 'queued' state indefinitely.
2) Add field to config to set quota per tenant.

What to do with Create/Delete combo I just don't know...

I'll write a proof-of-concept script to flood the glance database (faster than bash+curl).

> 1) Forbid creation of zero-size images without 'location' field. Is any valid case for such requests? They stay in 'queued' state indefinitely.

That is too much of a change to the API in my opinion. For example it would break Nova. You could enter a bogus location anyway.

> What to do with Create/Delete combo I just don't know...

This is an (early) WIP patch to limit creation of images to x images over time period t:

 https://review.openstack.org/#/c/141071/

FWIW I previously pushed up a quota patch that would have prevented this issue, but it didn't get any traction:

https://review.openstack.org/#/c/42122

Are people currently more open to a simple db layer approach to quotas (as per https://review.openstack.org/#/c/42122)?

If so, I can resurrect that patch (adding support for 'simple' db as requested by John Bresnahan).
As far as I can tell that approach is consistent with the existing storage usage quota.

As regards backporting to stable: it may be a reasonable thing to do if quotas are disabled by default and the relevant code was in a single block which would not be entered unless quotas were configured.

Jeremy Stanley (fungi) wrote :

Though I believe historically if we've backported features which allow the admin to change configuration and mitigate an attack, we've not issued a security advisory (since the patch does not automatically protect an environment against this without additional intervention from the admin to enable it).

Changed in glance:
assignee: nobody → Stuart McLaren (stuart-mclaren)
status: New → In Progress

As there is already a change under public review in addition to the mailing list discussion, I propose we open this bug next Monday if nobody has objections...

And with no amplification and no impact on other tenants, would this still warrant an OSSN?

Thierry Carrez (ttx) wrote :

I still think this warrants an OSSN, but we'll need to offer a workaround (worst case, a non-stable-supported patch to add the new quota).

+1 for opening it since it's pretty clear from proposed patches and public discussion at this point.

information type: Private Security → Public Security

Here is a script to test the scale of the problem: melt.py (attached). All data (glance endpoint and token) is configured inside the script.

Two cases:
1) Just create many empty images
2) -d option: Create and delete (will ignore quotas for images and cause database growth).

I ran it with '-d' for 8 hours.

Create-and-delete is not a serious problem. A heavy attack (6 workers) grew the glance database from 30 MB to 338 MB (in 8 hours, with a single HDD for mysql).
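As a quick check of the figures above, the measured growth can be converted to a rate and compared with the earlier estimate of about 1 MB/minute. This is plain arithmetic on the reported numbers, assuming both are in megabytes; the per-day figure is a linear extrapolation, not a measurement.

```python
start_mb = 30   # database size before the run
end_mb = 338    # database size after the run
hours = 8       # duration of the run

growth_mb = end_mb - start_mb      # total growth over the run
mb_per_hour = growth_mb / hours    # average hourly growth
mb_per_minute = mb_per_hour / 60   # average per-minute growth
mb_per_day = mb_per_hour * 24      # linear extrapolation to one day

print(f"{mb_per_hour:.1f} MB/hour, {mb_per_minute:.2f} MB/minute, "
      f"{mb_per_day:.0f} MB/day")
# 38.5 MB/hour, 0.64 MB/minute, 924 MB/day
```

The measured rate of roughly 0.64 MB/minute is of the same order as the initial estimate.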

Thierry Carrez (ttx) wrote :

Looks like the impact is limited and could be mistaken for normal activity, so I propose class C1 (i.e. not issue an OSSA, even if some parties may decide to issue a CVE about it)

This is not considered a practical vulnerability (class C1), thus I mark the OSSA task as won't fix.

Changed in ossa:
status: Incomplete → Won't Fix

I think there is some confusion. The attack consists of two parts:

1. No quota on the number of images and no rate limit on image creation.
2. 'Leftovers' from each create/delete pair.

I found that the 'leftovers' are not a big deal. But overeating glance resources is. Right now there is no limit on how many images a user can create/delete/modify or how fast, and this is a serious problem.

@George, the Vulnerability Management Team deemed the overeating acceptable in terms of security. Thus we have removed the OSSA task (which means no advisory for this bug), though the bug is still open to hardening and should be fixed.

Change abandoned by Stuart McLaren (<email address hidden>) on branch: master
Review: https://review.openstack.org/141071
Reason: Abandoning. We should try to look at better rate limiting.

Grant Murphy (gmurphy) on 2015-09-02
Changed in ossn:
assignee: nobody → Grant Murphy (gmurphy)
Grant Murphy (gmurphy) on 2015-09-02
Changed in ossn:
status: New → Fix Committed
Eric Brown (ericwb) wrote :
Changed in ossn:
assignee: Grant Murphy (gmurphy) → Eric Brown (ericwb)
Changed in glance:
importance: Undecided → High
Nathan Kinder (nkinder) wrote :

This has been published as OSSN-0057:

  https://wiki.openstack.org/wiki/OSSN/OSSN-0057

Changed in ossn:
status: Fix Committed → Fix Released
Changed in glance:
status: In Progress → Won't Fix
importance: High → Wishlist
assignee: Stuart McLaren (stuart-mclaren) → nobody
Erno Kuvaja (jokke) wrote :

Rate limiting should be done in the load balancer rather than implemented in the service.

Mitigation instructions are in the related OSSN.
