OpenStack Security Advisory

0-size images allow unprivileged user to deplete glance resources

Bug #1401170 reported by George Shuklin on 2014-12-10

This bug affects 1 person

	Status	Importance	Assigned to
Glance	Won't Fix	Wishlist	Unassigned
OpenStack Security Advisory	Won't Fix	Undecided	Unassigned
OpenStack Security Notes	Fix Released	Undecided	Eric Brown

Bug Description

Glance allows creating 0-size images ('glance image-create' without parameters). These images do not consume storage backend resources and do not hit any size limits, but they take up space in the database.

A malicious user can cause database resource depletion with an endless flood of 'image-create' requests. Because an empty request is small, it puts more strain on OpenStack than on the attacker.

Rate limiting API requests delays the consequences of the attack, but does not prevent it.

Here is a simple script to run the attack:
while true;do curl -i -X POST -H 'X-Auth-Token: ***' http://glance-endpoint:9292/v1/images;done

My estimate for database growth is about 1Mb/minute (with an extra-slow shell-based attack; a specially crafted script could run at whatever speed the rate limit allows).

George Shuklin (george-shuklin) on 2014-12-10

description:	updated

George Shuklin (george-shuklin) on 2014-12-10

description:	updated
summary:	- 0-sized images allow unpriveleged user to deplete glance resources
	+ 0-size images allow unprivileged user to deplete glance resources

Tristan Cacqueray (tristan-cacqueray) on 2014-12-10

Changed in ossa:
status:	New → Incomplete

Grant Murphy (gmurphy) wrote on 2014-12-10:

Thanks for the report, the OSSA task is set to incomplete pending additional security review from glance-coresec.

Tristan Cacqueray (tristan-cacqueray) wrote on 2014-12-10:

The OSSA tasks have been set to incomplete pending review from glance-coresec.
This looks like a valid avenue for denial of service attacks.

Jeremy Stanley (fungi) wrote on 2014-12-10:

Traditionally we've not considered this sort of exploit a security vulnerability. The lack of built-in quota for particular kinds of database entries isn't necessarily a design flaw, but even if it can/should be fixed it's likely not going to get addressed in stable backports, is not something for which we would issue a security advisory, and so doesn't need to be kept under secret embargo. Does anyone else disagree?

Flavio Percoco (flaper87) wrote on 2014-12-10:

Hey George,

I agree it sounds scary that you can create that many images without any limit, but that's by design. Therefore, I don't think this is a security issue. For example, what would be the difference between creating 0-sized images without data and 0-sized images with data?

I mean, it's a normal workflow for users to create the image and then upload the data:

$ glance image-create --name test
$ glance image-upload $IMAGE_ID < my_file.qcow2

Glance has never had any kind of rate limit and I highly doubt it will in the near future. Rate limits are easy to set up outside Glance, and they can also be shared across multiple services by using things like haproxy.

I agree with Jeremy here, I'll also mark Glance's bug as Invalid. I'm open to debate on the Glance side, though.

Changed in glance:
status:	New → Invalid

George Shuklin (george-shuklin) wrote on 2014-12-10:

If an image has something inside, it will be accounted for and billed. Zero-sized images are not.

I ran a test for 15 minutes and got many funny problems with a few parts of OpenStack: horizon almost dying, and glance-client just cannot show the image list (after 10 minutes of 100% CPU for the mysql server): https://bugs.launchpad.net/python-glanceclient/+bug/1401188

If this is not a security bug, it should at least be considered a normal bug, because in a multi-tenant environment one non-privileged user can cause problems for other users and/or administrators.

Jeremy Stanley (fungi) wrote on 2014-12-11:

It sounds like the concern raised is less about the rate at which images can be created (particularly if they contain no initial data), but rather the lack of a limit on the total number of images a tenant can create (as it's enforced only on aggregate size instead).

Stuart McLaren (stuart-mclaren) wrote on 2014-12-11:

This is something I'd like to see addressed.

We (HP) actually carry a patch for this internally.

We limit the creation of images to x images over time period t.

The new parameters are image_create_ratelimit_cap and image_create_ratelimit_sec

The approach is a fairly simple one (in the db layer) -- I'd be interested to know if it would potentially form the base of an upstream patch?

import datetime

import webob.exc

# CONF, LOG, _, models, timeutils, _get_session and _image_update are
# provided by the surrounding Glance db API module.


def image_create(context, values):
    """Create an image from the values dictionary."""
    session = _get_session()
    if 'owner' in values:
        # Check that this owner (tenant) has not created more than
        # image_create_ratelimit_cap images in the last
        # image_create_ratelimit_sec seconds. Setting either option
        # to 0 disables the check.
        query = (session.query(models.Image).
                 filter_by(owner=values['owner']))

        if (CONF.image_create_ratelimit_cap != 0 and
                CONF.image_create_ratelimit_sec != 0):
            delta = datetime.timedelta(seconds=CONF.image_create_ratelimit_sec)
            dt = timeutils.normalize_time(timeutils.utcnow() - delta)
            count = query.filter(models.Image.created_at > dt).count()

            if count >= CONF.image_create_ratelimit_cap:
                msg = (_("Maximum number of images (%s) per unit time "
                         "(%s seconds) exceeded") %
                       (CONF.image_create_ratelimit_cap,
                        CONF.image_create_ratelimit_sec))
                LOG.warning(msg + _(" owner: %s") % values['owner'])
                raise webob.exc.HTTPForbidden(explanation=msg)

    return _image_update(context, values, None, False)
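
For readers without a Glance tree at hand, the windowed check in the patch above can be sketched in isolation. This is a minimal in-memory illustration, not the HP patch itself: the class name `ImageCreateRateLimiter` and its `clock` parameter are hypothetical, and a real deployment would count rows in the database as the patch does rather than keep timestamps in process memory.

```python
import time
from collections import defaultdict, deque


class ImageCreateRateLimiter:
    """Allow at most `cap` image creations per owner within `window_sec` seconds."""

    def __init__(self, cap, window_sec, clock=time.monotonic):
        self.cap = cap
        self.window_sec = window_sec
        self.clock = clock  # injectable so the window can be tested without sleeping
        self._created_at = defaultdict(deque)  # owner -> creation timestamps

    def allow(self, owner):
        # A zero value disables the limit, mirroring the "!= 0" checks in the patch.
        if self.cap == 0 or self.window_sec == 0:
            return True
        now = self.clock()
        stamps = self._created_at[owner]
        # Drop timestamps that have fallen out of the window.
        while stamps and stamps[0] <= now - self.window_sec:
            stamps.popleft()
        if len(stamps) >= self.cap:
            return False
        stamps.append(now)
        return True
```

With `cap=2` and `window_sec=10`, a third call to `allow()` for the same owner inside ten seconds returns `False`, while other owners are unaffected. Note that, like the patch, this bounds the creation rate only; it does not cap the total number of images an owner can accumulate.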

George Shuklin (george-shuklin) wrote on 2014-12-11:

Yep. Lack of quota is the main problem. A rate limit helps to reduce the severity (down to an acceptable level).

Thierry Carrez (ttx) wrote on 2014-12-11:

Adding ossg-coresec

Thierry Carrez (ttx) wrote on 2014-12-11:

#10

I think it's pretty clear a quota for this should be added.

What's under question here is if it's appropriate to add a feature in a stable release (the missing quota) to cover that case. Even if we get the backward compatibility right, it's still absent from the release documentation and any "Juno" book that was published so far. It's a trade-off between stability (and promises we make around stable branches) and security (and promises we make to keep you secure from known issues).

Is there a possible workaround that could be used by stable branch users?

Tristan Cacqueray (tristan-cacqueray) wrote on 2014-12-11:

#11

For what it's worth, on a three-node Icehouse setup:

* a non-admin user is able to create an unlimited number of 0-sized images

* after about 20,000 images have been created, the admin user can no longer list images using "glance image-list --all-tenants"; the command fails after ~5 minutes with "maximum recursion depth exceeded"

* however, admin and other users are not affected when using "glance image-list" alone...

Using our latest classification (https://wiki.openstack.org/wiki/Vulnerability_Management#Incident_report_taxonomy) I suggest a Class B2 for this bug.

Jeremy Stanley (fungi) wrote on 2014-12-11:

#12

I agree with B2. There is no amplification involved, the entries themselves would take a while to actually overrun the database itself, there's no implied impact on other tenants, and admins generally will have access to alternative means of deleting these entries which don't require them to be able to use the API to generate a list.

Jeremy Stanley (fungi) wrote on 2014-12-11:

#13

Also, since the discussion surrounding this bug is already nearly public with an openstack-dev mailing list thread[1] linking to it directly, I think we should at the very least take it out of embargoed status.

[1] http://lists.openstack.org/pipermail/openstack-dev/2014-December/052750.html

Thierry Carrez (ttx) wrote on 2014-12-12:

#14

Even if the general type of issue was unfortunately partially disclosed on the ML, I would argue that nobody can guess where a quota is actually missing in such an exploitable fashion... so I would rather keep this embargoed until we make a final decision on it.

Since this is easily exploitable (although there aren't that many Glance public servers), I would prefer we have some workaround that public Glance servers operators can opt to deploy before we disclose this. For example an "optional patch" that an OSSN on the issue could mention.

Robert Clark (robert-clark) wrote on 2014-12-12:

#15

Amplification quotients need to be considered carefully. An attacker who can leverage this attack, which is predicated on having valid authentication credentials on the cloud, could trivially spin up N compute instances to generate the glance requests.

Flavio Percoco (flaper87) wrote on 2014-12-12:

#16

I think I misunderstood the original request of this bug (sorry about that). I agree a quota for the number of images that can be created makes sense.

I think it'd be hard to provide a fix for this without adding a new config option to Glance. Probably, the smallest possible fix would be to just add a limit for the number of images that can be created (ignoring the rate-limit, therefore just 1 config option) and provide a sensible value that won't require most deployments to change config files as well.

Thoughts? It should be fairly straightforward to do so.

(Moved the bug back to New)

Changed in glance:
status:	Invalid → New

George Shuklin (george-shuklin) wrote on 2014-12-12:

#17

Maybe just add a limit with the default value set to infinity? This will not change running installations in any way, but gives administrators a way to set a limit.
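
A per-tenant count limit that defaults to infinity could look roughly like the sketch below. The names `check_image_quota`, `max_images` and `ImageQuotaExceeded` are made up for illustration and are not Glance options or exceptions; the point is only that an unlimited default leaves existing installations unchanged.

```python
import math


class ImageQuotaExceeded(Exception):
    """Raised when a tenant would exceed its image count limit (hypothetical)."""


def check_image_quota(current_count, max_images=math.inf):
    """Raise if creating one more image would exceed the per-tenant limit.

    `current_count` is the number of non-deleted images the tenant owns.
    The default of infinity means the check never fails unless an
    administrator configures a finite limit.
    """
    if current_count + 1 > max_images:
        raise ImageQuotaExceeded(
            "tenant already owns %d images (limit %s)" % (current_count, max_images)
        )
```

Calling `check_image_quota(1000000)` passes with the default, while `check_image_quota(5, max_images=5)` raises.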

... And there is one more thing I didn't think of initially. Deleted images are not removed from the database, just marked as 'deleted'. That means a tenant can flood the database with create/delete requests for 0-size images. Because there are no backend operations (nothing to store in swift), it will just flood the database with new entries, and those entries will be invisible to everyone, except via select count(*) from images; in the glance database.
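
The soft-delete effect described here can be reproduced with a toy table. The sketch below uses an in-memory SQLite database and a deliberately simplified `images` schema (three columns, not Glance's real one); `create_image` and `delete_image` are illustrative helpers, not Glance functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE images ("
    "id INTEGER PRIMARY KEY, owner TEXT, deleted INTEGER DEFAULT 0)"
)


def create_image(owner):
    """Insert an empty image record and return its id."""
    cur = conn.execute("INSERT INTO images (owner) VALUES (?)", (owner,))
    return cur.lastrowid


def delete_image(image_id):
    """Soft delete: the row is kept and only flagged as deleted."""
    conn.execute("UPDATE images SET deleted = 1 WHERE id = ?", (image_id,))


# Create and immediately delete 1000 empty images for one tenant.
for _ in range(1000):
    delete_image(create_image("tenant-a"))

visible = conn.execute(
    "SELECT count(*) FROM images WHERE deleted = 0"
).fetchone()[0]
total = conn.execute("SELECT count(*) FROM images").fetchone()[0]
print(visible, total)  # 0 1000
```

The tenant's visible image list stays empty, so a quota on non-deleted images would never trigger, yet every create/delete pair leaves a row behind.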

My proposal:

1) Forbid creation of zero-size images without a 'location' field. Is there any valid use case for such requests? They stay in the 'queued' state indefinitely.
2) Add a config option to set a per-tenant quota.

What to do with the create/delete combo, I just don't know...

I'll write a proof-of-concept script to flood the glance database (faster than bash+curl).

Stuart McLaren (stuart-mclaren) wrote on 2014-12-13:

#18

> 1) Forbid creation of zero-size images without 'location' field. Is any valid case for such requests? They stay in 'queued' state indefinitely.

That is too much of a change to the API in my opinion. For example it would break Nova. You could enter a bogus location anyway.

> What to do with Create/Delete combo I just don't know...

This is an (early) WIP patch to limit creation of images to x images over time period t:

https://review.openstack.org/#/c/141071/

FWIW I previously pushed up a quota patch that would have prevented this issue, but it didn't get any traction:

https://review.openstack.org/#/c/42122

Are people currently more open to a simple db layer approach to quotas (as per https://review.openstack.org/#/c/42122)?

If so, I can resurrect that patch (adding support for the 'simple' db as requested by John Bresnahan).
As far as I can tell that approach is consistent with the existing storage usage quota.

As regards backporting to stable: it may be a reasonable thing to do if quotas are disabled by default and the relevant code is in a single block which would not be entered unless quotas were configured.

Jeremy Stanley (fungi) wrote on 2014-12-13:

#19

Though I believe historically if we've backported features which allow the admin to change configuration and mitigate an attack, we've not issued a security advisory (since the patch does not automatically protect an environment against this without additional intervention from the admin to enable it).

Stuart McLaren (stuart-mclaren) wrote on 2014-12-14:

#20

https://review.openstack.org/#/c/141071

Changed in glance:
assignee:	nobody → Stuart McLaren (stuart-mclaren)
status:	New → In Progress

Tristan Cacqueray (tristan-cacqueray) wrote on 2014-12-16:

#21

As there is already a change under public review in addition to the mailing list discussion, I propose we open this bug next Monday if nobody has objections...

And with no amplification and no impact on other tenants, would this still warrant an OSSN?

Thierry Carrez (ttx) wrote on 2014-12-22:

#22

I still think this warrants an OSSN, but we'll need to offer a workaround (worst case, a non-stable-supported patch to add the new quota).

+1 for opening it since it's pretty clear from proposed patches and public discussion at this point.

information type:	Private Security → Public Security

George Shuklin (george-shuklin) wrote on 2014-12-24:

#23

Attachment: melt.py (1.4 KiB, text/x-python)

Here is a script to test the scale of the problem: melt.py (attached). All data (glance endpoint and token) is configured inside the script.

Two cases:
1) Just create many empty images
2) -d option: Create and delete (will ignore quotas for images and cause database growth).

George Shuklin (george-shuklin) wrote on 2014-12-25:

#24

I ran it in '-d' mode for 8 hours.

Create-and-delete is not a serious problem. A heavy attack (6 workers) grew the glance database from 30Mb to 338Mb (in 8 hours, with a single HDD for mysql).
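
As a quick sanity check on these figures, the average growth rate can be worked out directly. The snippet only restates the numbers from this comment, in the units the comment uses; the per-day figure is a linear extrapolation and assumes the rate stays constant.

```python
start_mb, end_mb, hours = 30, 338, 8

# Average growth over the 8-hour run, per minute.
growth_per_minute = (end_mb - start_mb) / (hours * 60)

# Linear extrapolation to a full day at the same rate.
growth_per_day = growth_per_minute * 60 * 24

print(round(growth_per_minute, 2), round(growth_per_day))  # 0.64 924
```

That is roughly 0.64Mb/minute, somewhat below the 1Mb/minute estimated in the original description, and a little under 1Gb per day if sustained.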

Thierry Carrez (ttx) wrote on 2015-01-05:

#25

Looks like the impact is limited and could be mistaken for normal activity, so I propose class C1 (i.e. not issue an OSSA, even if some parties may decide to issue a CVE about it)

Tristan Cacqueray (tristan-cacqueray) wrote on 2015-01-12:

#26

This is not considered a practical vulnerability (class C1), thus I mark the OSSA task as won't fix.

Changed in ossa:
status:	Incomplete → Won't Fix

George Shuklin (george-shuklin) wrote on 2015-01-12:

#27

I think there is some confusion. The attack consists of two parts:

1. No quota on the number of images and no rate limit on image creation.
2. 'Leftovers' from create/delete pairs.

I found that the 'leftovers' are not a big deal. But overconsumption of glance resources is. Right now there is no limit on how many images a user can create/delete/modify or how fast, and this is a serious problem.

Tristan Cacqueray (tristan-cacqueray) wrote on 2015-01-12:

#28

@George, the Vulnerability Management Team deemed the overconsumption acceptable in terms of security. Thus we have removed the OSSA task (which means no advisory for this bug), though the bug is still open for hardening and should be fixed.

OpenStack Infra (hudson-openstack) wrote on 2015-04-09: Change abandoned on glance (master)

#29

Change abandoned by Stuart McLaren (<email address hidden>) on branch: master
Review: https://review.openstack.org/141071
Reason: Abandoning. We should try to look at better rate limiting.

Grant Murphy (gmurphy) on 2015-09-02

Changed in ossn:
assignee:	nobody → Grant Murphy (gmurphy)

Grant Murphy (gmurphy) on 2015-09-02

Changed in ossn:
status:	New → Fix Committed

Eric Brown (ericwb) wrote on 2015-09-02:

#30

https://review.openstack.org/#/c/219901/

Changed in ossn:
assignee:	Grant Murphy (gmurphy) → Eric Brown (ericwb)

Nikhil Komawar (nikhil-komawar) on 2015-10-07

Changed in glance:
importance:	Undecided → High

Nathan Kinder (nkinder) wrote on 2015-10-15:

#31

This has been published as OSSN-0057:

https://wiki.openstack.org/wiki/OSSN/OSSN-0057

Changed in ossn:
status:	Fix Committed → Fix Released

Brian Rosmaita (brian-rosmaita) on 2017-09-15

Changed in glance:
status:	In Progress → Won't Fix
importance:	High → Wishlist
assignee:	Stuart McLaren (stuart-mclaren) → nobody

Erno Kuvaja (jokke) wrote on 2017-09-15:

#32

Rate limiting should be done in the load balancer rather than implemented in the service.

Brian Rosmaita (brian-rosmaita) wrote on 2017-09-15:

#33

Mitigation instructions are in the related OSSN.
