Images v2 api image-create vulnerability

Bug #1545092 reported by Brian Rosmaita on 2016-02-12
Affects: Glance -- Importance: Undecided -- Assigned to: Unassigned
Affects: OpenStack Security Advisory -- Importance: Undecided -- Assigned to: Unassigned
Affects: OpenStack Security Notes -- Importance: Undecided -- Assigned to: Luke Hinds

Bug Description

This report applies to all versions of Glance.

The POST v2/images call creates an image (record) in 'queued' status. There is no limit enforced in glance on the number of images a single tenant may create, just on the total amount of storage a single user may consume [0]. Thus a user could either maliciously or by mistake clog up multiple database tables (images, image_properties, image_tags, image_members) with useless image records, thereby causing a denial of service.

This is a concern because the approved 2016.0 DefCore specification requires the 'images-v2-index' capability [1, 2]. The tempest test for this capability functions by creating several image records and then checking the GET v2/images response to make sure all these records are returned [3]. Thus any cloud that wishes to qualify under 2016.01 must expose POST v2/images to all end users, thereby exposing such clouds to this vulnerability, which could otherwise be mitigated by restricting POST v2/images to trusted users.

[0] https://github.com/openstack/glance/blob/132906146dd74a2eeae67706e19e4fa44559bb8b/etc/glance-api.conf#L89
[1] https://github.com/openstack/defcore/blob/master/2016.01.json#L48
[2] https://github.com/openstack/defcore/blob/master/2016.01.json#L1391-L1412
[3] https://github.com/openstack/tempest/blob/df88737b9cdaabb5633b4fefb723676e71cd1af0/tempest/api/image/v2/test_images.py#L184-L191

CVE References

Since this report concerns a possible security risk, an incomplete security advisory task has been added while the core security reviewers for the affected project or projects confirm the bug and discuss the scope of any vulnerability along with potential solutions.

So is Glance missing a quota on the number of images a tenant can create? It sounds like a well-known fact; is there a reason why this is reported as Private?

description: updated
Brian Rosmaita (brian-rosmaita) wrote :

It's a knowable fact, but I'm not sure how well-known it is. I'm also not sure how many operators expose this call to end-users. I'm just being conservative -- I figure it's easy to go private -> public, but pretty much impossible to go the other direction.

Jeremy Stanley (fungi) wrote :

You're saying a malicious actor could "fill up" the database with queued image records? Have you seen this in practice or is the impact speculative for now? How quickly can a user theoretically add entries, how large is each entry, and at what point is their accumulation likely to cause impact to other tenants using the system in a typical deployment (let's consider a relatively small and therefore more vulnerable deployment for the sake of argument)? Is it an attack which can be accomplished in mere minutes? Hours? Days?

Changed in ossa:
status: New → Incomplete

Well, that's pretty effective: glance image-list can be slowed down beyond usability pretty fast if a user can create (empty) images.

Brian Rosmaita (brian-rosmaita) wrote :

@fungi: I have not observed this in practice. But ... the default config in Glance allows additional image properties; default limit on properties/image is 128; default limit on members/image is 128; default limit on tags/image is 128. So for each row added to the images table, you could have up to 128 rows added to image_properties and image_tags.

Here's from my devstack, creating an image record with some core image properties, 128 additional properties, 128 tags:
devVM! time curl -X POST -H "x-auth-token: $TK" -d @big-image-request.json http://localhost:9292/v2/images
{"prop104": "val104", "prop21": "val21", "prop49": "val49", "prop48": "val48", "prop41": "val41", "prop40": "val40", "prop43": "val43", "prop42": "val42", "prop45": "val45", "prop44": "val44", "prop47": "val47", "prop46": "val46", "prop74": "val74", "prop75": "val75", "prop76": "val76", "prop77": "val77", "prop70": "val70", "prop71": "val71", "prop72": "val72", "prop73": "val73", "prop78": "val78", "prop79": "val79", "name": "freaking-big-image", "architecture": "frank-lloyd-wright", "container_format": "bare", "min_ram": 1024, "prop63": "val63", "prop62": "val62", "prop61": "val61", "prop60": "val60", "prop67": "val67", "prop66": "val66", "prop65": "val65", "prop64": "val64", "prop69": "val69", "prop68": "val68", "prop109": "val109", "prop108": "val108", "os_distro": "custom", "prop82": "val82", "tags": ["tag125", "tag124", "tag127", "tag126", "tag121", "tag120", "tag123", "tag122", "tag23", "tag22", "tag21", "tag20", "tag27", "tag26", "tag25", "tag24", "tag29", "tag28", "tag110", "tag111", "tag112", "tag113", "tag114", "tag115", "tag116", "tag117", "tag118", "tag119", "tag38", "tag39", "tag34", "tag35", "tag36", "tag37", "tag30", "tag31", "tag32", "tag33", "tag4", "tag5", "tag6", "tag7", "tag0", "tag1", "tag2", "tag3", "tag8", "tag9", "tag109", "tag108", "tag107", "tag106", "tag105", "tag104", "tag103", "tag102", "tag101", "tag100", "tag49", "tag48", "tag41", "tag40", "tag43", "tag42", "tag45", "tag44", "tag47", "tag46", "tag58", "tag59", "tag52", "tag53", "tag50", "tag51", "tag56", "tag57", "tag54", "tag55", "tag67", "tag66", "tag65", "tag64", "tag63", "tag62", "tag61", "tag60", "tag69", "tag68", "tag70", "tag71", "tag72", "tag73", "tag74", "tag75", "tag76", "tag77", "tag78", "tag79", "tag89", "tag88", "tag85", "tag84", "tag87", "tag86", "tag81", "tag80", "tag83", "tag82", "tag98", "tag99", "tag96", "tag97", "tag94", "tag95", "tag92", "tag93", "tag90", "tag91", "tag16", "tag17", "tag14", "tag15", "tag12", "tag13", "tag10", "tag11", "tag18", "tag19"], "prop114": 
"val114", "prop115": "val115", "prop116": "val116", "prop117": "val117", "prop110": "val110", "prop111": "val111", "prop98": "val98", "prop99": "val99", "prop96": "val96", "prop97": "val97", "prop94": "val94", "prop95": "val95", "prop92": "val92", "prop93": "val93", "prop90": "val90", "prop91": "val91", "prop16": "val16", "prop17": "val17", "prop14": "val14", "prop15": "val15", "prop12": "val12", "prop13": "val13", "prop10": "val10", "prop11": "val11", "checksum": null, "prop18": "val18", "prop19": "val19", "prop4": "val4", "prop5": "val5", "prop6": "val6", "prop7": "val7", "prop0": "val0", "prop1": "val1", "prop...
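For reference, a request body that produces an image like the one above can be generated with a short script (a sketch: the file name `big-image-request.json` matches the curl invocation above, while the core properties here are arbitrary placeholders):

```shell
# Build a JSON body with 128 additional properties and 128 tags -- the
# default image_property_quota and image_tag_quota limits.
{
  printf '{"name": "freaking-big-image", "container_format": "bare", "disk_format": "raw"'
  for i in $(seq 0 127); do
    printf ', "prop%s": "val%s"' "$i" "$i"
  done
  printf ', "tags": ['
  for i in $(seq 0 127); do
    [ "$i" -gt 0 ] && printf ','
    printf '"tag%s"' "$i"
  done
  printf ']}\n'
} > big-image-request.json
```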


Oh well, the image-list slowdown is specific to glanceclient; openstack client is not affected since it paginates by default...
Here is my reproducer to create a lot of images (without metadata):

TOKEN=$(openstack token issue | awk '/\ id/ { print $4 }')
GLANCE=$(openstack catalog show image | awk '/publicURL/ { print $4 }')
idx=0
while true; do
   curl -X POST -H "X-Auth-Token: ${TOKEN}" -H "x-image-meta-name: dummy_${idx}" "${GLANCE}/v1/images";
   idx=$((idx + 1))
done

At least on 2015.1.1 there is no quota for image creation, and since the "glance image-list" execution time is proportional to the number of images, it doesn't take long before the client is unusable.

Jeremy Stanley (fungi) wrote :

Got it, so the impact is mostly that it can render image listing from clients unusable if the deployment allows untrusted users to upload images for "global" availability rather than merely for their own use?

Nikhil Komawar (nikhil-komawar) wrote :

I am confirming that this issue is real. Thanks for raising this Brian, the description looks great.

However, I would like to elaborate a bit more on the problem, probably answering a few of fungi's questions.

I think the intent was to express the effect of loosely coupled image record creation, that is almost a no-cost operation to the user with that of the relatively heavy weight image activation process that includes data uploads.

I see there are a few problems here:

1. Image records that are created are put in queued status and are query-able (visible to the user). Moreover, the user can set a limited set of properties, tags, and other metadata on these images -- at negligible cost.
2. glanceclient allows successful image creation without any parameters (glance image-create) -- the cost to the user is significantly reduced.
3. Although a default page size is imposed on the query, one may choose to supply an upper limit on the page size, thereby increasing the size and complexity of the image query. A large number of such images, each with a significant number of image properties, tags, and members, can result in slow responses, sometimes even in 500s.
4. Listing images through the Nova proxy Images API will be an even worse experience, as the image list is returned via Glance DB -> Glance registry -> Glance API -> glanceclient -> Nova API -> user/novaclient. The likelihood of 5xx responses is quite high in this case.
5. I am skeptical of the DoS here, as modern-day applications, especially those involving geo-spatial data, create far more data in the DB. The question (to me at least) is about the query-able data (or, in BigData terms, what you can think of as hot data).

I think we need to consider this as affecting the v1 Images API too, since that API is still required to be deployed alongside Nova, and if exposed to users it will have the same effect (for the sake of completeness of the security impact).

Though the experience of the query or client (as described in the comments above) can be improved by imposing stricter defaults in the image-list query.

Thoughts?

Brian Rosmaita (brian-rosmaita) wrote :

Thanks for the analysis, Nikhil.

Your point (5) is well-taken about the capabilities of modern DBMS, although I imagine that the glance DB servers on most clouds are probably not very beefy, as the Glance DB doesn't see anywhere near the amount of transactions as, say, the Nova DB.

My main point in bringing this up is the DefCore 2016.01 situation. To pass the 'images-v2-index' test, clouds that may not have exposed the v2 image-create call to end-users previously will have to expose it now. Even if it's not a complete DOS, as you explain in point (4), it can lead to a pretty bad user experience.

Changed in glance:
status: New → Confirmed

I agree with you on both accounts.

1. The Glance DB may not be the most transaction-efficient resource in OpenStack clouds.
2. And definitely agreed on the non-DefCore-friendly aspect of exposing the image-create calls via the v2 API, including the bad user experience and maybe even an issue for heavily used small private clouds.

My intent in pointing out the v1 issue was that, if this happens to be communicated as a risk to the community, then we should also confirm the issue with v1 -- either in the same description or explicitly as a v1 issue -- for completeness' sake.

Thanks for raising this Brian.

Some overall observations.

1) The glance server paginates list responses (the default number of images listed per response -- limit_param_default -- is 25), so the glance client also paginates image listings.
2) The maximum number of images per page is limited on the server side (the default limit -- api_limit_max -- is 1000; operators can reduce this).
3) By default, user X cannot add images to user Y's listing [1].
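The interaction of those two options can be sketched like this (a simplified model of the server's limit handling, not Glance's actual code; the option names are the real glance-api.conf settings):

```shell
# limit_param_default: page size used when the client sends no limit.
# api_limit_max: hard server-side cap on any requested page size.
limit_param_default=25
api_limit_max=1000

effective_limit() {
  requested="$1"
  if [ -z "$requested" ]; then
    echo "$limit_param_default"          # no limit supplied: use the default
  elif [ "$requested" -gt "$api_limit_max" ]; then
    echo "$api_limit_max"                # cap oversized requests
  else
    echo "$requested"
  fi
}

effective_limit ""       # -> 25
effective_limit 100000   # -> 1000
```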

So a user can just add (lots of) database records which won't be seen by other users.

Do we have hard data on how much user X can slow down listing (or other db intensive operations) for user Y?

Eg if we're using mysql, and user X creates 1,000,000 queued images, what is the effect -- in terms of slowdown -- on user Y's listing/other operations?

Is this a quota issue or a rate limit issue?

For example, would this quota patch be enough to mark as fixed? https://review.openstack.org/#/c/244573/

If a quota is in place a user can still create lots of database records by continuously creating and deleting queued images. Is that considered ok because the operator can purge deleted records?

[1] Marking images as public is admin-only by default. The v1 issue of user X spamming user Y with shared images has its own CVE.

> Do we have hard data on how much user X can slow down listing (or other db intensive operations) for user Y?

I don't have any "real life" data, but in devstack as user X, I added 1K images like the one above (128 additional properties, 128 tags), and when user X and user Y do simultaneous image-list requests, user X was getting a 500 and user Y was seeing the response time double for 2 public images + 1 private image. (But talk about a seriously underpowered database node!).

> Is this a quota issue or a rate limit issue?

As your create-and-delete example illustrates, it's probably both.

I didn't try this, but I bet if user X creates all those images and then adds Y as a member to all of them, with v2 user Y won't be spammed, but user Y's image list query may be slowed down a bit (although since you don't have to marshal all those rows into JSON as you would with v1, it might not be a big deal).

> user Y was seeing the response time double

Ok, so it does seem to affect third parties in some way. Out of curiosity, when it doubled was that an increase of seconds/microseconds?

>> Is this a quota issue or a rate limit issue?

> As your create-and-delete example illustrates, it's probably both.

On the rate limiting side...

I know of at least one ex-cloud that ported the Swift rate limiting code to Glance....

But I wonder if a better approach to rate limiting would be to not use python but rather something in front of the server, eg haproxy:

https://blog.codecentric.de/en/2014/12/haproxy-http-header-rate-limiting/

Here's some previous advocacy of haproxy for rate limiting:

http://www.gossamer-threads.com/lists/openstack/operators/28606
http://lists.openstack.org/pipermail/openstack-operators/2014-June/004611.html

Here's someone suggesting Repose:

http://lists.openstack.org/pipermail/openstack-operators/2014-June/004622.html

It's probably ok to suggest 'something else' for the rate limiting side of this. Although some example docs would be helpful to users.

> Ok, so it does seem to affect third parties in some way. Out of curiosity, when it doubled was that an increase of seconds/microseconds?

It went from around 1 sec to 2 sec. But like I said, that was on a devstack instance, so I don't know what kind of inferences we can draw. (I went back to my devstack instance to try to get some better data, but have managed to completely hose it for reasons that may have nothing to do with this issue.)

> It's probably ok to suggest 'something else' for the rate limiting side of this. Although some example docs would be helpful to users.

I agree. I don't think we want to put rate limiting into glance (or suggest deployers do so), when as you point out, there are some external products that are flexible and configurable. We've had pretty good experience with Repose.
https://repose.atlassian.net/wiki/display/REPOSE/Rate+Limiting+filter

> I don't think we want to put rate limiting into glance

I'd be ok with adding something like the rate limiting middleware that exists in other projects in the future -- it could be convenient for some use cases.

But I don't think it's needed to address the rate limiting side of this bug. Deployers can BYORL (bring your own rate limiter).

@Brian

What do you think is needed here to fix? Is https://review.openstack.org/#/c/244573 enough? (It limits the number of images a user can have, minus 'killed' and 'deleted' images.)

We could try to specifically limit 'queued' images. eg have a separate total for that. But I'm not 100% sure if that's useful or not. Users could still have lots of active zero size images.

@Stuart

I'm wondering whether the rate limiting itself could be a short-term fix, especially since it looks like we'll be doing quotas work in Newton, plus image import, which would allow operators to screen zero-size images (they wouldn't be valid in any format, I don't think).

My impression of the discussion so far is that people don't think this is a really big deal, possibly not even a security bug. If it's not a serious vulnerability, then maybe we don't need to rush a patch into place.

It's not a big deal since the abuse is obvious and easy to recover from (e.g. remove queued images from the db). However, it is annoying since there is no safe way to let users create images. Perhaps a quota on the number of images a tenant can create would do the trick?

As for the security impact, I'd like to triage this as a B1 type of bug according to the VMT taxonomy ( https://security.openstack.org/vmt-process.html#incident-report-taxonomy ). That way, a Security Note can be issued until a proper patch is proposed.

Flavio Percoco (flaper87) wrote :

Thanks to Brian for reporting this issue and to Nikhil and Stuart for following up.

I think, eventually, we want to:

1) Stop shipping our own wsgi container and let deployers pick their own
2) Recommend that deployers always use a rate limiter. I think, as Stuart also suggested, that third-party rate-limiting software should do the trick here.

As far as the database goes, I believe 244573 should help to prevent attackers from exploiting this issue.

Brian, would you agree with this?

I agree with recommending deployers to always use a rate limit.

My concern about 244573 is basically what Fei Long said on PS 18. The tenant that owns the public images for the cloud may need to have an extremely high image quota, and with 244573 the only way that tenant can have a high enough quota is for everyone else to also have that same high quota. So I'm not sure that 244573 will really help ... we need a real quota system. On the other hand, 244573 is better than nothing, and deployers could also drop the max properties/tags/members allowed on images to limit the amount of junk a single user could stuff into the database. On the third hand, I'm thinking that a very low rate limit by itself might do the trick here, and if we're going to do quotas in Newton, then maybe 244573 isn't going to be very useful.

That's probably no help at all. I'll have to think some more.

I've subscribed OSSG-coresec to discuss an eventual document about rate limiting.
If nobody objects, I'd like to close the OSSA task and remove the privacy setting by the end of this week.

I've switched the OSSA task to Opinion until we figure out how to coordinate this disclosure with a Security Note.

Changed in ossa:
status: Incomplete → Opinion

Just want to clarify what I said about the DefCore tempest test (way up top in the bug report). In addition to creating an image record, it does upload a small amount of "data", too.

I think it's OK to make this public, though I'm just speaking for myself here. We should definitely confirm with the Glance PTL. The mitigation to this attack is a combination of rate limiting the POST v2/images call plus vigilance on what's happening in the database.

It looks like the quota specs proposed near the end of Mitaka will not go into Newton, so an actual fix won't happen until at least the O release.

Thanks for your response Brian.

The risk you mention is more than merely creating the image records, for sure. I think image record creation by itself is already known to be simply a nuisance at this point; however, combined with image properties and members it could be seen as an issue. Nevertheless, I do feel that the list response can be handled by fine-tuning pagination to ensure the slowness is not significant.

The issue of the DB being filled with useless records is something that production clouds should carefully consider when choosing the size of the DB server.

Yes, the quota work isn't planned for Newton and is likely to not begin to materialize until Ocata. So, the rate limiting solutions and the size of DB server are some of the recommendations that can be given as a part of the security notice.

Overall, I think we have a good direction and set of small recommendations to make this a public discussion.

Travis McPeak (travis-mcpeak) wrote :

Any update on this? Seems like we're leaning towards a note. Should this be a public or private note?

It seems like everyone agreed this can be made public; however, there isn't much urgency either.
If it's not too cumbersome for you, would it be possible to approve the OSSN here prior to marking this bug public? That way, in the event of abuse, there will be mitigation readily available.

Good idea on creating an OSSN here prior to marking this bug public. We're all caught up in the Newton release, and getting to such convoluted bugs (that need more bandwidth) will take a bit more time.

It seems like a glance image-create OSSN may also cover bug 1546507 (image deletion through location re-use) and bug 1606495 (network scan through copy_from).

Luke Hinds (lhinds) on 2016-09-08
Changed in ossn:
assignee: nobody → Luke Hinds (lhinds)
Luke Hinds (lhinds) wrote :

OSSN-0076 reserved.

Luke Hinds (lhinds) wrote :

I am working on a draft note now. Seems the consensus is to suggest rate limiting, so here is a similar OSSN I created for Keystone: http://lists.openstack.org/pipermail/openstack-dev/2016-July/099776.html - perhaps something of the same ilk again?

I think this also adds incentive to expand the rate-limiting section in the API endpoint section of the security guide. Will make a note to do this after this OSSN is released.

Luke Hinds (lhinds) wrote :

Hello,

Please review / provide any comments....

Images v2 api image-create vulnerability
---

### Summary ###
No limits are enforced within the Glance image service `v2/images` API POST method for authenticated users, resulting in possible denial of service attacks through database table saturation.

### Affected Services / Software ###
All versions of Glance image service.

### Discussion ###
Within the Glance image service, calls to the POST method within v2/images create an image (record) in `queued` status. There is no limit enforced within the Glance API on the number of images a single tenant may create, just on the total amount of storage a single user may consume.

Therefore a user could either maliciously or unintentionally fill multiple database tables (images, image_properties, image_tags, image_members) with useless image records, thereby causing a denial of service by lengthening transaction response times in the Glance database.

### Recommended Actions ###
For all versions of Glance that expose the v2/images API, operators are recommended to deploy external rate-limiting proxies or web application firewalls, to provide a front layer of protection to glance.

The following solutions may be considered, however it is key that the operator carefully plans and considers the individual performance needs of users and services within their OpenStack cloud, when configuring any rate limiting functionality.

#### Repose ####
Repose provides a rate-limiting filter that can apply limits by IP, role (OpenStack Identity v3 filter), or header.

https://repose.atlassian.net/wiki/display/REPOSE/Rate+Limiting+Filter

#### NGINX ####
NGINX provides the limit_req_module, which can be used to provide a global rate limit. By means of a `map`, it can be limited to just the POST method.

Further details can be found on the nginx site:
http://nginx.org/en/docs/http/ngx_http_limit_req_module.html
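An untested configuration sketch of that approach (the zone name, rate, and backend are illustrative; the `map` emits an empty key for non-POST requests, which nginx does not count against the zone):

```
# http {} context -- count only POST requests against the zone.
map $request_method $glance_post_key {
    default "";
    POST    $binary_remote_addr;
}
limit_req_zone $glance_post_key zone=glance_post:10m rate=10r/m;

server {
    listen 9292;
    location /v2/images {
        limit_req zone=glance_post burst=5 nodelay;
        proxy_pass http://glance_api_backend;  # assumed upstream for the real glance-api
    }
}
```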

#### HAProxy ####
HAProxy can provide inherent rate-limiting using stick-tables with a General
Purpose Counter (gpc).

Further details can be found on the haproxy website:
http://blog.haproxy.com/2012/02/27/use-a-load-balancer-as-a-first-row-of-defense-against-ddos
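A hedged haproxy sketch of the stick-table approach (untested; it uses `http_req_rate` for simplicity rather than a gpc, and all names, addresses, and thresholds are illustrative):

```
frontend glance_fe
    bind *:9292
    stick-table type ip size 100k expire 10m store http_req_rate(60s)
    acl image_post  method POST
    acl image_path  path_beg /v1/images /v2/images
    # Track only image-create POSTs, then deny clients above ~30/min.
    http-request track-sc0 src if image_post image_path
    http-request deny if { sc0_http_req_rate gt 30 }
    default_backend glance_be

backend glance_be
    server glance1 10.0.0.10:9292   # assumed internal glance-api node
```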

#### Apache ####
A number of solutions can be explored here as follows.

##### mod_ratelimit #####
http://httpd.apache.org/docs/2.4/mod/mod_ratelimit.html

##### mod_qos #####
http://opensource.adnovum.ch/mod_qos/dos.html

##### mod_evasive #####
https://www.digitalocean.com/community/tutorials/how-to-protect-against-dos-and-ddos-with-mod_evasive-for-apache-on-centos-7

##### mod_security #####
https://www.modsecurity.org/

### Contacts / References ###
Author: Luke Hinds, Red Hat
This OSSN : https://wiki.openstack.org/wiki/OSSN/OSSN-0076
Original LaunchPad Bug : https://bugs.launchpad.net/ossn/+bug/1545092
OpenStack Security ML : <email address hidden>
OpenStack Security Group : https://launchpad.net/~openstack-ossg

Luke Hinds (lhinds) wrote :

Any feedback on the draft ^ ?

Note that I'm not a member of the ossg-coresec group, though the proposed security note looks good to me. Perhaps the recommended actions should also mention that image creation can be restricted to administrators only with the glance `add_image` policy.

Luke Hinds (lhinds) wrote :

Good call Tristan,

<snip>

For all versions of Glance that expose the v2/images API, operators are recommended to consider restricting image creation to administrators only with the glance `add_image` policy and to use an external rate-limiting proxy or web application firewall.

To restrict image creation to the admin role only, amend `/etc/glance/policy.json` as follows:

    "add_image": "role:admin",

Rate-limiting solutions may also be utilised, however it is key that the operator carefully plans and considers the individual performance needs of users and services within their OpenStack cloud, when configuring any rate limiting functionality.

#### Repose ####
<snip>

Luke Hinds (lhinds) wrote :

Sorry, that should be 'The following Rate-limiting...'

@Luke: sorry about the delay on this. I'm reading it now.

Some quick background:

When I filed this bug, I assumed that most people didn't expose the Images v1 API to end users. That assumption may be false. Even so, because the Compute API proxies many useful image-related calls, there's been no necessity to expose either of the Images APIs at all in order to have a fully functioning cloud. What prompted this report was the DefCore test that *required* some v2 Images API calls to be exposed; hence, operators who want to qualify their clouds as OpenStack Powered would have to expose this v2 call and thereby expose the vulnerability.

Comments:

(1) This issue applies to *both* v1 and v2, it's not a v2-only vulnerability.

(2) Changing the policy to admin-only is tricky, because if you do it on the Glance nodes that Nova uses, you won't be able to create snapshots from the Compute API. I'd suggest really emphasizing the rate limiting, because that would protect the operator from buggy scripts written by trusted users. Then you could mention that depending upon the operator's topology, the operator could consider restricting the "add_image" policy to trusted users identified by some particular role in their cloud, but this should only be done for those cases in which there are Glance nodes dedicated to end-user access only (that is, the nodes are not used by any openstack services).

(3) The Images v1 API is DEPRECATED in Newton, and Nova is now using the v2 API by default. I don't know whether it's worth pointing that out.

(4) The rate-limiting discussion looks good to me.

Luke Hinds (lhinds) wrote :

Thanks Brian, let me know if this is ok:

Glance Image service v1 and v2 api image-create vulnerability
---

### Summary ###
No limits are enforced within the Glance image service for either the v1 or
v2 `/images` API POST method for authenticated users, resulting in possible
denial of service attacks through database table saturation.

### Affected Services / Software ###
All versions of Glance image service.

### Discussion ###
Within the Glance image service, calls to the POST method within v1 or
v2/images create an image (record) in `queued` status. There is no limit
enforced within the Glance API on the number of images a single tenant may
create, just on the total amount of storage a single user may consume.

Therefore a user could either maliciously or unintentionally fill multiple
database tables (images, image_properties, image_tags, image_members) with
useless image records, thereby causing a denial of service by lengthening
transaction response times in the Glance database.

### Recommended Actions ###
For all versions of Glance that expose the v1 or v2 /images API,
operators are recommended to deploy external rate-limiting proxies or web
application firewalls, to provide a front layer of protection to glance.

The following solutions may be considered, however it is key that the operator
carefully plans and considers the individual performance needs of users and
services within their OpenStack cloud, when configuring any rate limiting
functionality.

#### Repose ####
Repose provides a rate-limiting filter that can apply limits by IP,
role (OpenStack Identity v3 filter), or header.

https://repose.atlassian.net/wiki/display/REPOSE/Rate+Limiting+Filter

#### NGINX ####
NGINX provides the limit_req_module, which can be used to provide a global rate
limit. By means of a `map`, it can be limited to just the POST method.

Further details can be found on the nginx site:
http://nginx.org/en/docs/http/ngx_http_limit_req_module.html

#### HAProxy ####
HAProxy can provide inherent rate-limiting using stick-tables with a General
Purpose Counter (gpc).

Further details can be found on the haproxy website:
http://blog.haproxy.com/2012/02/27/use-a-load-balancer-as-a-first-row-of-defense-against-ddos

#### Apache ####
A number of solutions can be explored here as follows.

##### mod_ratelimit #####
http://httpd.apache.org/docs/2.4/mod/mod_ratelimit.html

##### mod_qos #####
http://opensource.adnovum.ch/mod_qos/dos.html

##### mod_evasive #####
https://www.digitalocean.com/community/tutorials/how-to-protect-against-dos-and-ddos-with-mod_evasive-for-apache-on-centos-7

##### mod_security #####
https://www.modsecurity.org/

#### Limit `add_image` to admin role ####

Another possible mitigation is to restrict image creation to the admin role,
however this should only be done for those cases in which there are Glance nodes
dedicated to end-user access only. Restriction to admin only on Glance nodes
that serve OpenStack services will remove the ability to create snapshots from
the Compute API.

To restrict image creation to the role admin only, amend
`/etc/glance/policy.json` accordingly.

    "add_image": "role:admin",

### Contacts / Reference...


Luke Hinds (lhinds) wrote :

A few weird word wraps, but I will fix those and run `tox -e checkbuild` to clean up the whitespace/wrapping nits.

Thanks, Luke. I think it looks good. I'm asking another Glance sec team member to take a look for a sanity check.

I don't know whether this is worth mentioning or not, you can decide based on your usual practice.

The attack is escalated by creating images that have lots of properties or image members. These are the default settings:

allow_additional_image_properties = true
image_property_quota = 128
image_member_quota = 128
image_tag_quota = 128
image_location_quota = 10

Luke Hinds (lhinds) wrote :

Hi Brian,

So would setting those quotas help to negate the denial of service attack?

Not sure I'd say it would negate it, but it could maybe slow it down a bit. Under the default setting, an attacker could add one image with the following result:

images table: 1 new row
image_properties table: 128 new rows
image_tags table: 128 new rows

This would have to be via separate API requests:
image_members table: 128 new rows

The problem is that these are system-wide quotas, so restricting image_property_quota to 5, say, would mean that all users have a max of 5 image properties. So maybe it's not worth mentioning.
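For concreteness, the arithmetic under those defaults:

```shell
# Database rows added by one maximally-padded image under the default quotas
# (128 properties, 128 tags, 128 members; members require separate requests).
props=128; tags=128; members=128
rows_per_image=$((1 + props + tags + members))
echo "$rows_per_image"               # 385 rows per image
echo $((1000 * rows_per_image))      # 385000 rows for a 1K-image run like the earlier devstack test
```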

Luke Hinds (lhinds) wrote :

Hi Brian,

Any luck getting another Glance sec team member in to review this?

Luke

Luke, sorry about the delay, I'll follow up.

Luke's note looks good to me.
Regarding the image_properties and image_tags tables, +1 to what Brian mentioned. But the solution there is rate-limiting again. So, if we are recommending that people use rate-limiting in general, what Brian mentioned would be addressed by that (unless we are recommending rate-limiting specifically on image create, which we are not; so we should be okay).

Also, I wonder if we should mention explicitly that rate-limiting really doesn't eliminate the attack mentioned here. It only slows it down.

I think Hemanth has a good idea about calling out the rate-limiting as a mitigation, not a solution. Maybe something like this:

### Recommended Actions ### <-- add new sentences at the end of the first paragraph

For all versions of Glance that expose either the v1 or v2 /images API,
operators are recommended to deploy external rate-limiting proxies or web
application firewalls to provide a front layer of protection to Glance.
The Glance database should be monitored for abnormal growth. Although
rate-limiting does not eliminate this attack vector, it will slow it to
the point where you can react prior to a denial of service occurring.

#### Limit `add_image` to admin role #### <-- change last sentence

Another possible mitigation is to restrict image creation to the admin role;
however, this should only be done in cases where there are Glance nodes
dedicated to end-user access only. Restriction to admin only on Glance
nodes that serve OpenStack services will, for example, remove the ability to
create snapshots from the Compute API or to create bootable volumes from
Cinder.

Otherwise, looks good to me!

Luke Hinds (lhinds) wrote :

OK, one last round-up below; an ack is needed from the VMT and we are good to go.

Glance image service v1 and v2 API image-create vulnerability
---

### Summary ###
No limit is enforced within the Glance image service on the number of
images an authenticated user may create via the v1 and v2 `/images` POST
methods, resulting in possible denial of service attacks through database
table saturation.

### Affected Services / Software ###
All versions of the Glance image service.

### Discussion ###
Within the Glance image service, a call to the POST method of the v1 or
v2 `/images` API creates an image (record) in `queued` status. There is no
limit enforced within the Glance API on the number of images a single tenant
may create, just on the total amount of storage a single user may consume.

Therefore a user could either maliciously or unintentionally fill multiple
database tables (images, image_properties, image_tags, image_members) with
useless image records, thereby causing a denial of service by lengthening
transaction response times in the Glance database.

### Recommended Actions ###
For all versions of Glance that expose either the v1 or v2 /images API,
operators are recommended to deploy external rate-limiting proxies or web
application firewalls to provide a front layer of protection to Glance. The
Glance database should be monitored for abnormal growth. Although rate-limiting
does not eliminate this attack vector, it will slow it to the point where you
can react prior to a denial of service occurring.

The following solutions may be considered; however, it is essential that the
operator carefully plans for and considers the individual performance needs
of users and services within their OpenStack cloud when configuring any
rate-limiting functionality.

#### Repose ####
Repose provides a rate-limiting filter that can apply limits by IP, role
(via the OpenStack Identity v3 filter), or header.

https://repose.atlassian.net/wiki/display/REPOSE/Rate+Limiting+Filter

#### NGINX ####
NGINX provides the limit_req_module, which can be used to impose a global
rate limit. By means of a `map`, the limit can be applied to the POST method only.

Further details can be found on the nginx site:
http://nginx.org/en/docs/http/ngx_http_limit_req_module.html
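
As a rough illustration (not an endorsed configuration), the `map` /
`limit_req` combination might look like the following; the listen port,
zone name, rate, and backend address are all placeholders to be tuned per
deployment. The fragment belongs inside the `http` block:

```nginx
# Only POST requests get a non-empty key; requests whose key is empty
# are not counted against the limit.
map $request_method $glance_post_key {
    default "";
    POST    $binary_remote_addr;
}

# 10 MB shared zone, at most 1 image-create request per second per IP.
limit_req_zone $glance_post_key zone=glance_post:10m rate=1r/s;

server {
    listen 9292;                          # placeholder front-end port
    location /v2/images {
        limit_req zone=glance_post burst=5;
        proxy_pass http://127.0.0.1:9393; # placeholder Glance backend
    }
}
```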

#### HAProxy ####
HAProxy can provide rate-limiting using stick-tables with a General
Purpose Counter (gpc).

Further details can be found on the haproxy website:
http://blog.haproxy.com/2012/02/27/use-a-load-balancer-as-a-first-row-of-defense-against-ddos
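
A hypothetical sketch of the stick-table approach follows. The linked
article's gpc technique is more elaborate; this simplified version just
tracks the per-source HTTP request rate, and every address, name, and
threshold here is a placeholder:

```haproxy
frontend glance_api
    bind *:9292                       # placeholder front-end port
    # Track source IPs that POST to the images API and record their
    # request rate over a 60-second window.
    stick-table type ip size 100k expire 5m store http_req_rate(60s)
    acl is_post   method POST
    acl is_images path_beg /v1/images /v2/images
    http-request track-sc0 src if is_post is_images
    http-request deny if is_post is_images { sc0_http_req_rate gt 30 }
    default_backend glance

backend glance
    server glance1 127.0.0.1:9393 check   # placeholder backend address
```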

#### Apache ####
A number of solutions can be explored here as follows.

##### mod_ratelimit #####
http://httpd.apache.org/docs/2.4/mod/mod_ratelimit.html

##### mod_qos #####
http://opensource.adnovum.ch/mod_qos/dos.html

##### mod_evasive #####
https://www.digitalocean.com/community/tutorials/how-to-protect-against-dos-and-ddos-with-mod_evasive-for-apache-on-centos-7
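
A hypothetical mod_evasive fragment (the directive values are illustrative
only and would need tuning for a given deployment):

```apache
<IfModule mod_evasive24.c>
    DOSHashTableSize    3097
    DOSPageCount        10    # max requests for the same URI per interval
    DOSPageInterval     1     # interval (seconds) for the page count
    DOSSiteCount        100   # max total requests per client per interval
    DOSSiteInterval     1
    DOSBlockingPeriod   60    # seconds a blocked client stays blocked
</IfModule>
```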

##### mod_security #####
https://www.modsecurity.org/

#### Limit `add_image` to admin role ####

Another possible mitigation is to restrict image creation to the admin role,
however this should only be done for those cases in which there are Glance nodes
dedicated to end-user access only. Restriction to admin only on Glance
node...

Read more...

Thanks, Luke. LGTM except for two typos I noticed: in the ##### mod_evasive ##### and #### HAProxy #### sections, there's an extraneous ')' at the end of the link. The content is great.

Luke Hinds (lhinds) wrote :

Thanks Brian. Tristan and fellow VMT'ers, once OK with you I would like to send this out to downstream stakeholders.

LGTM.
Thanks, Luke.

I hate to be a pest, especially since some of the delay on this is my fault, but it would be helpful to be able to speak about this at the design summit next week. How close are we to issuing the OSSN?

Luke Hinds (lhinds) wrote :

Hi Brian,

I actually have it ready to go, but I am having issues finding someone who can show me the embargo advisory text needed for when the note goes to downstream stakeholders (I took over OSSNs a couple of months ago, but only handled publics up until now). I expect someone will reply to me very shortly.

Am I right that you would like to discuss this more at the summit before we send the note out?

Luke

@Luke I've forwarded you a copy of the previous advance notice for an OSSN. For the record, it's called a "pre-OSSN" and it includes a short introduction paragraph along with the OSSN.

Do we have a public disclosure date already?

Luke Hinds (lhinds) wrote :

@Tristan, we typically do 14 days from when I send, but it seems Brian would like to discuss this at the summit, so we could do a week later if there are no objections?

It's a pretty common 'you should use a rate limiter' type of OSSN, so it should not be anything very new to operators.

Luke Hinds (lhinds) wrote :

Had a chat with some other OSSP cores and the consensus was that we could be flexible. So, with the summit coming up and Brian wanting to discuss this, I will send this out and use a seven-day embargo period.

@Luke, sorry I wasn't clear, but I think you have the right idea. I was hoping the note could go out soon so that it would be OK to discuss this issue with the glance people working on image import (which will also expose this vulnerability). It most likely won't come up until the Friday afternoon work session, so a 7 day embargo would be fine if you can swing it. Thanks!

Luke Hinds (lhinds) wrote :

@Brian, the OSSN just went out, and I have no issue with seven days (and anyone who does has seven days to say so), so Friday is looking OK.

@Luke, sounds good, thanks!

Luke Hinds (lhinds) on 2016-10-27
Changed in ossn:
status: New → Fix Released
information type: Private Security → Public
Jeremy Stanley (fungi) on 2016-10-28
description: updated