Bug #1545092 “Images v2 api image-create vulnerability” : Bugs : OpenStack Security Notes

Revision history for this message

Tristan Cacqueray (tristan-cacqueray) wrote on 2016-02-12:

#1

Since this report concerns a possible security risk, an incomplete security advisory task has been added while the core security reviewers for the affected project or projects confirm the bug and discuss the scope of any vulnerability along with potential solutions.

So is Glance missing a quota on the number of image a tenant can create ? It sounds like a well known fact, is there a reason why this is reported as a Private ?

description:

updated

Revision history for this message

Brian Rosmaita (brian-rosmaita) wrote on 2016-02-12:

#2

It's a knowable fact, but I'm not sure how well-known it is. I'm also not sure how many operators expose this call to end-users. I'm just being conservative -- I figure it's easy to go private -> public, but pretty much impossible to go the other direction.

Revision history for this message

Jeremy Stanley (fungi) wrote on 2016-02-12:

#3

You're saying a malicious actor could "fill up" the database with queued image records? Have you seen this in practice or is the impact speculative for now? How quickly can a user theoretically add entries, how large is each entry, and at what point is their accumulation likely to cause impact to other tenants using the system in a typical deployment (let's consider a relatively small and therefore more vulnerable deployment for the sake of argument)? Is it an attack which can be accomplished in mere minutes? Hours? Days?

Changed in ossa:
status:	New → Incomplete

Revision history for this message

Tristan Cacqueray (tristan-cacqueray) wrote on 2016-02-12:

#4

Well that's pretty effective, glance image-list can be slowed down beyond usability pretty fast if an user can create (empty) image.

Revision history for this message

Brian Rosmaita (brian-rosmaita) wrote on 2016-02-12:

#5

Download full text (4.8 KiB)

@fungi: I have not observed this in practice. But ... the default config in Glance allows additional image properties; default limit on properties/image is 128; default limit on members/image is 128; default limit on tags/image is 128. So for each row added to the images table, you could have up to 128 rows added to image_properties and image_tags.

Here's from my devstack, creating an image record with some core image properties, 128 additional properties, 128 tags:
devVM! time curl -X POST -H "x-auth-token: $TK" -d @big-image-request.json http://localhost:9292/v2/images
{"prop104": "val104", "prop21": "val21", "prop49": "val49", "prop48": "val48", "prop41": "val41", "prop40": "val40", "prop43": "val43", "prop42": "val42", "prop45": "val45", "prop44": "val44", "prop47": "val47", "prop46": "val46", "prop74": "val74", "prop75": "val75", "prop76": "val76", "prop77": "val77", "prop70": "val70", "prop71": "val71", "prop72": "val72", "prop73": "val73", "prop78": "val78", "prop79": "val79", "name": "freaking-big-image", "architecture": "frank-lloyd-wright", "container_format": "bare", "min_ram": 1024, "prop63": "val63", "prop62": "val62", "prop61": "val61", "prop60": "val60", "prop67": "val67", "prop66": "val66", "prop65": "val65", "prop64": "val64", "prop69": "val69", "prop68": "val68", "prop109": "val109", "prop108": "val108", "os_distro": "custom", "prop82": "val82", "tags": ["tag125", "tag124", "tag127", "tag126", "tag121", "tag120", "tag123", "tag122", "tag23", "tag22", "tag21", "tag20", "tag27", "tag26", "tag25", "tag24", "tag29", "tag28", "tag110", "tag111", "tag112", "tag113", "tag114", "tag115", "tag116", "tag117", "tag118", "tag119", "tag38", "tag39", "tag34", "tag35", "tag36", "tag37", "tag30", "tag31", "tag32", "tag33", "tag4", "tag5", "tag6", "tag7", "tag0", "tag1", "tag2", "tag3", "tag8", "tag9", "tag109", "tag108", "tag107", "tag106", "tag105", "tag104", "tag103", "tag102", "tag101", "tag100", "tag49", "tag48", "tag41", "tag40", "tag43", "tag42", "tag45", "tag44", "tag47", "tag46", "tag58", "tag59", "tag52", "tag53", "tag50", "tag51", "tag56", "tag57", "tag54", "tag55", "tag67", "tag66", "tag65", "tag64", "tag63", "tag62", "tag61", "tag60", "tag69", "tag68", "tag70", "tag71", "tag72", "tag73", "tag74", "tag75", "tag76", "tag77", "tag78", "tag79", "tag89", "tag88", "tag85", "tag84", "tag87", "tag86", "tag81", "tag80", "tag83", "tag82", "tag98", "tag99", "tag96", "tag97", "tag94", "tag95", "tag92", "tag93", "tag90", "tag91", "tag16", "tag17", "tag14", "tag15", "tag12", "tag13", "tag10", "tag11", "tag18", "tag19"], "prop114": "val114", "prop115": "val115", "prop116": "val116", "prop117": "val117", "prop110": "val110", "prop111": "val111", "prop98": "val98", "prop99": "val99", "prop96": "val96", "prop97": "val97", "prop94": "val94", "prop95": "val95", "prop92": "val92", "prop93": "val93", "prop90": "val90", "prop91": "val91", "prop16": "val16", "prop17": "val17", "prop14": "val14", "prop15": "val15", "prop12": "val12", "prop13": "val13", "prop10": "val10", "prop11": "val11", "checksum": null, "prop18": "val18", "prop19": "val19", "prop4": "val4", "prop5": "val5", "prop6": "val6", "prop7": "val7", "prop0": "val0", "prop1": "val1", "prop...

@fungi: I have not observed this in practice.  But ... the default config in Glance allows additional image properties; default limit on properties/image is 128; default limit on members/image is 128; default limit on tags/image is 128.  So for each row added to the images table, you could have up to 128 rows added to image_properties and image_tags.

Here's from my devstack, creating an image record with some core image properties, 128 additional properties, 128 tags:
devVM! time curl -X POST -H "x-auth-token: $TK" -d @big-image-request.json http://localhost:9292/v2/images
{"prop104": "val104", "prop21": "val21", "prop49": "val49", "prop48": "val48", "prop41": "val41", "prop40": "val40", "prop43": "val43", "prop42": "val42", "prop45": "val45", "prop44": "val44", "prop47": "val47", "prop46": "val46", "prop74": "val74", "prop75": "val75", "prop76": "val76", "prop77": "val77", "prop70": "val70", "prop71": "val71", "prop72": "val72", "prop73": "val73", "prop78": "val78", "prop79": "val79", "name": "freaking-big-image", "architecture": "frank-lloyd-wright", "container_format": "bare", "min_ram": 1024, "prop63": "val63", "prop62": "val62", "prop61": "val61", "prop60": "val60", "prop67": "val67", "prop66": "val66", "prop65": "val65", "prop64": "val64", "prop69": "val69", "prop68": "val68", "prop109": "val109", "prop108": "val108", "os_distro": "custom", "prop82": "val82", "tags": ["tag125", "tag124", "tag127", "tag126", "tag121", "tag120", "tag123", "tag122", "tag23", "tag22", "tag21", "tag20", "tag27", "tag26", "tag25", "tag24", "tag29", "tag28", "tag110", "tag111", "tag112", "tag113", "tag114", "tag115", "tag116", "tag117", "tag118", "tag119", "tag38", "tag39", "tag34", "tag35", "tag36", "tag37", "tag30", "tag31", "tag32", "tag33", "tag4", "tag5", "tag6", "tag7", "tag0", "tag1", "tag2", "tag3", "tag8", "tag9", "tag109", "tag108", "tag107", "tag106", "tag105", "tag104", "tag103", "tag102", "tag101", "tag100", "tag49", "tag48", "tag41", "tag40", "tag43", "tag42", "tag45", "tag44", "tag47", "tag46", "tag58", "tag59", "tag52", "tag53", "tag50", "tag51", "tag56", "tag57", "tag54", "tag55", "tag67", "tag66", "tag65", "tag64", "tag63", "tag62", "tag61", "tag60", "tag69", "tag68", "tag70", "tag71", "tag72", "tag73", "tag74", "tag75", "tag76", "tag77", "tag78", "tag79", "tag89", "tag88", "tag85", "tag84", "tag87", "tag86", "tag81", "tag80", "tag83", "tag82", "tag98", "tag99", "tag96", "tag97", "tag94", "tag95", "tag92", "tag93", "tag90", "tag91", "tag16", "tag17", "tag14", "tag15", "tag12", "tag13", "tag10", "tag11", "tag18", "tag19"], "prop114": "val114", "prop115": "val115", "prop116": "val116", "prop117": "val117", "prop110": "val110", "prop111": "val111", "prop98": "val98", "prop99": "val99", "prop96": "val96", "prop97": "val97", "prop94": "val94", "prop95": "val95", "prop92": "val92", "prop93": "val93", "prop90": "val90", "prop91": "val91", "prop16": "val16", "prop17": "val17", "prop14": "val14", "prop15": "val15", "prop12": "val12", "prop13": "val13", "prop10": "val10", "prop11": "val11", "checksum": null, "prop18": "val18", "prop19": "val19", "prop4": "val4", "prop5": "val5", "prop6": "val6", "prop7": "val7", "prop0": "val0", "prop1": "val1", "prop2": "val2", "prop3": "val3", "prop8": "val8", "prop9": "val9", "prop112": "val112", "prop113": "val113", "prop121": "val121", "prop120": "val120", "prop123": "val123", "prop122": "val122", "prop89": "val89", "prop88": "val88", "prop85": "val85", "prop84": "val84", "prop87": "val87", "prop86": "val86", "prop81": "val81", "prop80": "val80", "prop83": "val83", "size": null, "virtual_size": null, "prop118": "val118", "disk_format": "raw", "prop119": "val119", "os_version": "1.2.3.4", "schema": "/v2/schemas/image", "instance_uuid": "96042ceb-6868-475c-a1cd-57d6d6cf905a", "visibility": "private", "min_disk": 1024, "prop30": "val30", "prop31": "val31", "prop32": "val32", "prop33": "val33", "prop34": "val34", "prop35": "val35", "prop36": "val36", "prop37": "val37", "prop38": "val38", "prop39": "val39", "prop105": "val105", "protected": false, "prop103": "val103", "prop50": "val50", "prop102": "val102", "prop101": "val101", "prop100": "val100", "updated_at": "2016-02-12T20:27:08Z", "file": "/v2/images/b932894f-4497-484e-84ed-3605af072f52/file", "id": "b932894f-4497-484e-84ed-3605af072f52", "prop29": "val29", "prop28": "val28", "prop27": "val27", "prop26": "val26", "prop25": "val25", "prop24": "val24", "prop23": "val23", "prop22": "val22", "owner": "9beb1ae94c0f46318dd4bf5d94723e94", "prop20": "val20", "prop107": "val107", "status": "queued", "prop106": "val106", "prop58": "val58", "prop59": "val59", "prop52": "val52", "prop53": "val53", "created_at": "2016-02-12T20:27:08Z", "prop51": "val51", "prop56": "val56", "prop57": "val57", "prop54": "val54", "prop55": "val55", "self": "/v2/images/b932894f-4497-484e-84ed-3605af072f52"}
real	0m0.580s
user	0m0.004s
sys	0m0.004s

So it looks like you can create a lot of images pretty quickly.

Revision history for this message

Tristan Cacqueray (tristan-cacqueray) wrote on 2016-02-12:

#6

Oh well the image-list slowdown is specific to glanceclient, openstack client is not affected since it paginate by default...
Here is my reproducer to create a lot of image (without metadata):

TOKEN=$(openstack token issue | awk '/\ id/ { print $4 }')
GLANCE=$(openstack catalog show image | awk '/publicURL/ { print $4 }')
while true; do
curl -X POST -H "X-Auth-Token: ${TOKEN}" -H "x-image-meta-name: dummy_$idx" ${GLANCE}/v1/images;
done

At least on 2015.1.1 there is no quota for image creation, and since the "glance image-list" execution time is proportional to the number of image, it doesn't take long before the client is not usable.

Revision history for this message

Jeremy Stanley (fungi) wrote on 2016-02-12:

#7

Got it, so the impact is mostly that it can render image listing from clients unusable if the deployment allows untrusted users to upload images for "global" availability rather than merely for their own use?

Revision history for this message

Nikhil Komawar (nikhil-komawar) wrote on 2016-02-12:

#8

I am confirming that this issue is real. Thanks for raising this Brian, the description looks great.

However, I would like to elaborate a bit more on the problem, probably answering a few of fungi's questions.

I think the intent was to express the effect of loosely coupled image record creation, that is almost a no-cost operation to the user with that of the relatively heavy weight image activation process that includes data uploads.

I see there are a few problems here:

1. Image records that are created are put in queued status and are query-able (visible to the user). All the more, user can set the limited set of properties, tags etc. metadata on these images -- at a negligible cost.
2. glanceclient allows successful image creation without any parameters (glance image-create) -- cost to user is significantly reduced.
3. Although, a default page size is imposed on the query one may choose to include a upper limit on the page size thereby increasing the size and complexity of the image query. A large number of such images each, with significant number of image properties, tags, members can result into slowness of the response, sometimes even resulting into 500s.
4. Listing of image through the Nova proxy Images API will be even worse experience (as the image list will be returned as Glance DB -> Glance registry -> Glance API -> glanceclient -> Nova API -> user/novaclient. Likelyhood of 5xx responses is quite high in this case.
5. I am skeptical on the DOS here as modern day applications especially those that include geo-spatial data create so much more data in the DB. The question (to me at least) is of the query-able data (or in BigData term you can think of as Hot data).

I think we need to consider this affecting the v1 Images API too as that is still required to be deployed parallel to Nova and if exposed to users it will have the same effect (for the sake of completeness of the security impact).

Though the experience of the query or client (as described in the comments above) can be improved by imposing stricter defaults in the image-list query.

Thoughts?

I am confirming that this issue is real. Thanks for raising this Brian, the description looks great.

However, I would like to elaborate a bit more on the problem, probably answering a few of fungi's questions.

I think the intent was to express the effect of loosely coupled image record creation, that is almost a no-cost operation to the user with that of the relatively heavy weight image activation process that includes data uploads.

I see there are a few problems here:

1. Image records that are created are put in queued status and are query-able (visible to the user). All the more, user can set the limited set of properties, tags etc. metadata on these images -- at a negligible cost.
2. glanceclient allows successful image creation without any parameters (glance image-create) -- cost to user is significantly reduced.
3. Although, a default page size is imposed on the query one may choose to include a upper limit on the page size thereby increasing the size and complexity of the image query. A large number of such images each, with significant number of image properties, tags, members can result into slowness of the response, sometimes even resulting into 500s.
4. Listing of image through the Nova proxy Images API will be even worse experience (as the image list will be returned as Glance DB -> Glance registry -> Glance API -> glanceclient -> Nova API -> user/novaclient. Likelyhood of 5xx responses is quite high in this case. 
5. I am skeptical on the DOS here as modern day applications especially those that include geo-spatial data create so much more data in the DB. The question (to me at least) is of the query-able data (or in BigData term you can think of as Hot data).

I think we need to consider this affecting the v1 Images API too as that is still required to be deployed parallel to Nova and if exposed to users it will have the same effect (for the sake of completeness of the security impact).

Though the experience of the query or client (as described in the comments above) can be improved by imposing stricter defaults in the image-list query.

Thoughts?

Revision history for this message

Brian Rosmaita (brian-rosmaita) wrote on 2016-02-13:

#9

Thanks for the analysis, Nikhil.

Your point (5) is well-taken about the capabilities of modern DBMS, although I imagine that the glance DB servers on most clouds are probably not very beefy, as the Glance DB doesn't see anywhere near the amount of transactions as, say, the Nova DB.

My main point in bringing this up is the DefCore 2016.01 situation. To pass the 'images-v2-index' test, clouds that may not have exposed the v2 image-create call to end-users previously will have to expose it now. Even if it's not a complete DOS, as you explain in point (4), it can lead to a pretty bad user experience.

Nikhil Komawar (nikhil-komawar) on 2016-02-13

Changed in glance:
status:	New → Confirmed

Revision history for this message

Nikhil Komawar (nikhil-komawar) wrote on 2016-02-13:

#10

I agree with you on both accounts.

1. Glance DB may not be the most transaction efficient resources in OpenStack clouds.
2. And definitely on the non-DefCore-friendly aspect of exposing the create image calls using v2 API including bad user experience and may be even a issue for heavily used small private clouds.

My intent on pointing out the v1 issue was that if this happens to be communicated as a risk to the community then we should also confirm the issue with v1 either in the same description or explicitly as v1 issue, for completeness sake.

Revision history for this message

Stuart McLaren (stuart-mclaren) wrote on 2016-02-15:

#11

Thanks for raising this Brian.

Some overall observations.

1) The glance server paginates list responses (default images listed per response -- limit_param_default -- = 25), so the glance client also paginates image listing.
2) The maximum number of images per page is limited on the server side (default limit -- api_limit_max -- is 1000; this can be reduced by operators)
3) By default, user X cannot add images to user Y's listing [1]

So a user can just add (lots of) database records which won't be seen by other users.

Do we have hard data on how much user X can slow down listing (or other db intensive operations) for user Y?

Eg if we're using mysql, and user X creates 1,000,000 queued images, what is the effect -- in terms of slowdown -- on user Y's listing/other operations?

Is this a quota issue or a rate limit issue?

For example, would this quota patch be enough to mark as fixed? https://review.openstack.org/#/c/244573/

If a quota is in place a user can still create lots of database records by continuously creating and deleting queued images. Is that considered ok because the operator can purge deleted records?

[1] Marking images as public is admin only by default. The v1 issue of user A spamming user Y with shared images has it's own CVE.

Revision history for this message

Brian Rosmaita (brian-rosmaita) wrote on 2016-02-15:

#12

> Do we have hard data on how much user X can slow down listing (or other db intensive operations) for user Y?

I don't have any "real life" data, but in devstack as user X, I added 1K images like the one above (128 additional properties, 128 tags), and when user X and user Y do simultaneous image-list requests, user X was getting a 500 and user Y was seeing the response time double for 2 public images + 1 private image. (But talk about a seriously underpowered database node!).

> Is this a quota issue or a rate limit issue?

As your create-and-delete example illustrates, it's probably both.

> I didn't try this, but I bet if user X creates all those images and than adds Y as a member to all of them, with v2 user Y won't be spammed, but user Y's image list query may be slowed down a bit (although since you don't have to marshall all those rows into JSON as you would with v1, it might not be a big deal).

Revision history for this message

Stuart McLaren (stuart-mclaren) wrote on 2016-02-15:

#13

> user Y was seeing the response time double

Ok, so it does seem to affect third parties in some way. Out of curiousity, when it doubled was that an increase of seconds/microseconds?

Revision history for this message

Stuart McLaren (stuart-mclaren) wrote on 2016-02-15:

#14

>> Is this a quota issue or a rate limit issue?

> As your create-and-delete example illustrates, it's probably both.

On the rate limiting side...

I know of at least one ex-cloud that ported the Swift rate limiting code to Glance....

But I wonder if a better approach to rate limiting would be to not use python but rather something in front of the server, eg haproxy:

https://blog.codecentric.de/en/2014/12/haproxy-http-header-rate-limiting/

Here's some previous advocation of haproxy for rate limiting:

http://www.gossamer-threads.com/lists/openstack/operators/28606
http://lists.openstack.org/pipermail/openstack-operators/2014-June/004611.html

Here's someone suggesting Repose:

http://lists.openstack.org/pipermail/openstack-operators/2014-June/004622.html

It's probably ok to suggest 'something else' for the rate limiting side of this. Although some example docs would be helpful to users.

Revision history for this message

Brian Rosmaita (brian-rosmaita) wrote on 2016-02-15:

#15

> Ok, so it does seem to affect third parties in some way. Out of curiousity, when it doubled was that an increase of seconds/microseconds?

It went from around 1 sec to 2 sec. But like I said, that was on a devstack instance, so I don't know what kind of inferences we can draw. (I went back to my devstack instance to try to get some better data, but have managed to completely hose it for reasons that may have nothing to do with this issue.)

> It's probably ok to suggest 'something else' for the rate limiting side of this. Although some example docs would be helpful to users.

I agree. I don't think we want to put rate limiting into glance (or suggest deployers do so), when as you point out, there are some external products that are flexible and configurable. We've had pretty good experience with Repose.
https://repose.atlassian.net/wiki/display/REPOSE/Rate+Limiting+filter

Revision history for this message

Stuart McLaren (stuart-mclaren) wrote on 2016-02-16:

#16

> I don't think we want to put rate limiting into glance

I'd be ok with adding something like the rate limiting middleware that exists in other projects in the future -- it could be convenient for some use cases.

But I don't think it's needed to address the rate limiting side of this bug. Deployers can BYORL (bring your own rate limiter).

@Brian

What do you think is needed here to fix? Is https://review.openstack.org/#/c/244573 enough? (It limits the number of images a user can have, minus 'killed' and 'deleted' images.)

We could try to specifically limit 'queued' images. eg have a separate total for that. But I'm not 100% sure if that's useful or not. Users could still have lots of active zero size images.

Revision history for this message

Brian Rosmaita (brian-rosmaita) wrote on 2016-02-16:

#17

@Stuart

I'm wondering whether the rate limiting itself could be a short-term fix, especially since it looks like we'll be doing quotas work in Newton, plus image import which would allow operators to screen zero size images (they wouldn't be valid in any format, i don't think).

My impression of the discussion so far is that people don't think this is a really big deal, possibly not even a security bug. If it's not a serious vulnerability, then maybe we don't need to rush a patch into place.

Revision history for this message

Tristan Cacqueray (tristan-cacqueray) wrote on 2016-02-17:

#18

It's not a big deal since the abuse is obvious and easy to recover (e.g. remove queued image from db). However this is annoying since there are no safe way to let user create images. Perhaps a quota on the number of images a tenant can create would do the trick ?

As for the security impact, I'd like to triage this as a B1 type of bug according to VMT taxonomy ( https://security.openstack.org/vmt-process.html#incident-report-taxonomy ). Like that a Security Note could be issued until a proper patch is proposed.

Revision history for this message

Flavio Percoco (flaper87) wrote on 2016-02-18:

#19

Thanks to Brian for reporting this issue and to Nikhil and Stuart for following up.

I think, eventually, we want to:

1) Stop shipping our own wsgi container and let deployers pick their own
2) Recommend deployers to always use a rate limit. I think, as Stuart also suggested, that a third party rate limit software should do the trick here.

As far as the database goes, I believe 244573 should help to prevent attackers from exploiting this issue.

Brian, would you agree with this?

Revision history for this message

Brian Rosmaita (brian-rosmaita) wrote on 2016-02-18:

#20

I agree with recommending deployers to always use a rate limit.

My concern about 244573 is basically what Fei Long said on PS 18. The tenant that owns the public images for the cloud may need to have an extremely high image quota, and with 244573 the only way that tenant can have a high enough quota is for everyone else to also have that same high quota. So I'm not sure that 244573 will really help ... we need a real quota system. On the other hand, 244573 is better than nothing, and deployers could also drop the max properties/tags/members allowed on images to limit the amount of junk a single user could stuff into the database. On the third hand, I'm thinking that a very low rate limit by itself might do the trick here, and if we're going to do quotas in Newton, then maybe 244573 isn't going to be very useful.

That's probably no help at all. I'll have to think some more.

Revision history for this message

Tristan Cacqueray (tristan-cacqueray) wrote on 2016-02-29:

#21

I've subscribed OSSG-coresec to discuss an eventual document about rate limiting.
If nobody objects, I'd like to close the OSSA task and remove the privacy setting by the end of this week.

Revision history for this message

Tristan Cacqueray (tristan-cacqueray) wrote on 2016-05-16:

#22

I've switched the OSSA task to Opinion until we figure out how to coordinate this disclosure with a Security Note.

Changed in ossa:
status:	Incomplete → Opinion

Revision history for this message

Brian Rosmaita (brian-rosmaita) wrote on 2016-05-31:

#23

Just want to clarify what I said about the DefCore tempest test (way up top in the bug report). In addition to creating an image record, it does upload a small amount of "data", too.

I think it's OK to make this public, though I'm just speaking for myself here. We should definitely confirm with the Glance PTL. The mitigation to this attack is a combination of rate limiting the POST v2/images call plus vigilance on what's happening in the database.

It looks like the quota specs proposed near the end of Mitaka will not go into Newton, so an actual fix won't happen until at least the O release.

Revision history for this message

Nikhil Komawar (nikhil-komawar) wrote on 2016-05-31:

#24

Thanks for your response Brian.

The risk you mention is more than merely creating the image records for sure. I think the image record creation is already a known issue to be simply a nuisance at this point however, combined with the image properties and members could be seen as a issue. Nevertheless, I do feel that the list response can be handled by fine tuning pagination to ensure slowness is not significant.

The issue of DB to be filled with useless records is something that production clouds should carefully consider when choosing the size of the DB server.

Yes, the quota work isn't planned for Newton and is likely to not begin to materialize until Ocata. So, the rate limiting solutions and the size of DB server are some of the recommendations that can be given as a part of the security notice.

Overall, I think we have a good direction and set of small recommendations to make this a public discussion.

Revision history for this message

Travis McPeak (travis-mcpeak) wrote on 2016-08-17:

#25

Any update on this? Seems like we're leaning towards a note. Should this be a public or private note?

Revision history for this message

Tristan Cacqueray (tristan-cacqueray) wrote on 2016-08-18:

#26

It seems like everyone agreed this can be made public, however there isn't much urgency too.
If it's not too cumbersome for you, would it be possible to approve the OSSN here prior to marking this bug public ? Like that, in the event of abuses there will be mitigation readily available.

Revision history for this message

Nikhil Komawar (nikhil-komawar) wrote on 2016-08-31:

#27

Good idea on creating a OSSN here prior to marking this bug public. We're all caught up in newton release and getting to such convoluted bugs (that need more b/w) will take a bit more time.

Revision history for this message

Tristan Cacqueray (tristan-cacqueray) wrote on 2016-09-01:

#28

It seems like a glance image-create ossn may also covers bug 1546507 (image deletion through location re-use) and 1606495 (network scan through copy_from).

Luke Hinds (lhinds) on 2016-09-08

Changed in ossn:
assignee:	nobody → Luke Hinds (lhinds)

Revision history for this message

Luke Hinds (lhinds) wrote on 2016-09-09:

#29

OSSN-0076 reserved.

Revision history for this message

Luke Hinds (lhinds) wrote on 2016-09-26:

#30

I am working on a draft note now. Seems the consensus is to suggest rate limiting, so here is a similar OSSN I created for Keystone: http://lists.openstack.org/pipermail/openstack-dev/2016-July/099776.html - perhaps something of the same ilk again?

I think this also adds incentive to expand the rate limiting section in the api endpoint section of the security guide. Will make a note do this, after this OSSN is released.

Revision history for this message

Luke Hinds (lhinds) wrote on 2016-09-29:

#31

Hello,

Please review / provide any comments....

Images v2 api image-create vulnerability
---

### Summary ###
No limits are enforced within the Glance image service `v2/images` API POST method for authenticated users, resulting in possible denial of service attacks through database table saturation.

### Affected Services / Software ###
All versions of Glance image service.

### Discussion ###
Within the Glance image service, calls to the POST method within v2/images creates an image (record) in `queued` status. There is no limit enforced within the Glance API on the number of images a single tenant may create, just on the total amount of storage a single user may consume.

Therefore a user could either maliciously or unintentionally fill multiple database tables (images, image_properties, image_tags, image_members) with useless image records, thereby causing a denial of service by lengthening transaction response times in the Glance database.

### Recommended Actions ###
For all versions of Glance that expose the v2/images API, operators are recommended to deploy external rate-limiting proxies or web application firewalls, to provide a front layer of protection to glance.

The following solutions may be considered, however it is key that the operator carefully plans and considers the individual performance needs of users and services within their OpenStack cloud, when configuring any rate limiting functionality.

#### Repose ####
Repose provides a rate limiting filter, that can utilise limits by IP, Role (OpenStack Identity v3 filter), or header.

https://repose.atlassian.net/wiki/display/REPOSE/Rate+Limiting+Filter

#### NGINX ####
NGINX provides the limit_req_module, which can be used to provide a global rate limit. By means of a `map`, it can be limited to just the POST method.

Further details can be found on the nginx site:
http://nginx.org/en/docs/http/ngx_http_limit_req_module.html

#### HAProxy ####
HAProxy can provide inherent rate-limiting using stick-tables with a General
Purpose Counter (gpc)

Further details can be found on the haproxy website:
http://blog.haproxy.com/2012/02/27/use-a-load-balancer-as-a-first-row-of-defense-against-ddos)

#### Apache ####
A number of solutions can be explored here as follows.

##### mod_ratelimit #####
http://httpd.apache.org/docs/2.4/mod/mod_ratelimit.html

##### mod_qos #####
http://opensource.adnovum.ch/mod_qos/dos.html

##### mod_evasive #####
https://www.digitalocean.com/community/tutorials/how-to-protect-against-dos-and-ddos-with-mod_evasive-for-apache-on-centos-7)

##### mod_security #####
https://www.modsecurity.org/

### Contacts / References ###
Author: Luke Hinds, Red Hat
This OSSN : https://wiki.openstack.org/wiki/OSSN/OSSN-0076
Original LaunchPad Bug : https://bugs.launchpad.net/ossn/+bug/1545092
OpenStack Security ML : <email address hidden>
OpenStack Security Group : https://launchpad.net/~openstack-ossg

Hello,

Please review / provide any comments....

Images v2 api image-create vulnerability
---

### Summary ###
No limits are enforced within the Glance image service `v2/images` API POST method for authenticated users, resulting in possible denial of service attacks through database table saturation.

### Affected Services / Software ###
All versions of Glance image service.

### Discussion ###
Within the Glance image service, calls to the POST method within v2/images creates an image (record) in `queued` status. There is no limit enforced within the Glance API on the number of images a single tenant may create, just on the total amount of storage a single user may consume.

Therefore a user could either maliciously or unintentionally fill multiple database tables (images, image_properties, image_tags, image_members) with useless image records, thereby causing a denial of service by lengthening transaction response times in the Glance database.

### Recommended Actions ###
For all versions of Glance that expose the v2/images API, operators are recommended to deploy external rate-limiting proxies or web application firewalls, to provide a front layer of protection to glance.

The following solutions may be considered, however it is key that the operator carefully plans and considers the individual performance needs of users and services within their OpenStack cloud, when configuring any rate limiting functionality.

#### Repose ####
Repose provides a rate limiting filter, that can utilise limits by IP, Role (OpenStack Identity v3 filter), or header.

https://repose.atlassian.net/wiki/display/REPOSE/Rate+Limiting+Filter

#### NGINX ####
NGINX provides the limit_req_module, which can be used to provide a global rate limit. By means of a `map`, it can be limited to just the POST method.

Further details can be found on the nginx site:
http://nginx.org/en/docs/http/ngx_http_limit_req_module.html

#### HAProxy ####
HAProxy can provide inherent rate-limiting using stick-tables with a General
Purpose Counter (gpc)

Further details can be found on the haproxy website:
http://blog.haproxy.com/2012/02/27/use-a-load-balancer-as-a-first-row-of-defense-against-ddos)

#### Apache ####
A number of solutions can be explored here as follows.

##### mod_ratelimit #####
http://httpd.apache.org/docs/2.4/mod/mod_ratelimit.html

##### mod_qos #####
http://opensource.adnovum.ch/mod_qos/dos.html

##### mod_evasive #####
https://www.digitalocean.com/community/tutorials/how-to-protect-against-dos-and-ddos-with-mod_evasive-for-apache-on-centos-7)

##### mod_security #####
https://www.modsecurity.org/

### Contacts / References ###
Author: Luke Hinds, Red Hat
This OSSN : https://wiki.openstack.org/wiki/OSSN/OSSN-0076 
Original LaunchPad Bug : https://bugs.launchpad.net/ossn/+bug/1545092 
OpenStack Security ML : openstack-security@lists.openstack.org 
OpenStack Security Group : https://launchpad.net/~openstack-ossg

Revision history for this message

Luke Hinds (lhinds) wrote on 2016-10-04:

#32

Any feedback on the draft ^ ?

Revision history for this message

Tristan Cacqueray (tristan-cacqueray) wrote on 2016-10-04:

#33

Note that I'm not a member of the ossg-coresec group, though the proposed security note looks good to me. Perhaps the recommended actions should also mention that image creation can be restricted to administrator only with the glance add_image policy.

Revision history for this message

Luke Hinds (lhinds) wrote on 2016-10-04:

#34

Good call Tristan,

<snip>

For all versions of Glance that expose the v2/images API, operators are recommended to consider restricting image creation to administrator only with the glance `add_image` policy and use an external rate-limiting proxy or web application firewall.

To restrict image creation to the role admin only, amend `/etc/glance/policy.json` as follows

"add_image": "role:admin",

Rate-limiting solutions may also be utilised, however it is key that the operator carefully plans and considers the individual performance needs of users and services within their OpenStack cloud, when configuring any rate limiting functionality.

#### Repose ####
<snip>

Revision history for this message

Luke Hinds (lhinds) wrote on 2016-10-04:

#35

Sorry, that should be 'The following Rate-limiting...'

Revision history for this message

Brian Rosmaita (brian-rosmaita) wrote on 2016-10-04:

#36

@Luke: sorry about the delay on this. I'm reading it now.

Revision history for this message

Brian Rosmaita (brian-rosmaita) wrote on 2016-10-04:

#37

Some quick background:

When I filed this bug, I assumed that most people didn't expose the Images v1 API to end users. That assumption may be false. Even so, because the Compute API proxies many useful image-related calls, there's been no necessity to expose either of the Images APIs at all and still have a full functioning cloud. What prompted this report was the DefCore test that *required* some v2 Images API calls to be exposed; hence, operators who want to qualify their clouds as OpenStack powered would have to expose this v2 call and thereby expose the vulnerability.

Comments:

(1) This issue applies to *both* v1 and v2, it's not a v2-only vulnerability.

(2) Changing the policy to admin-only is tricky, because if you do it on the Glance nodes that Nova uses, you won't be able to create snapshots from the Compute API. I'd suggest really emphasizing the rate limiting, because that would protect the operator from buggy scripts written by trusted users. Then you could mention that depending upon the operator's topology, the operator could consider restricting the "add_image" policy to trusted users identified by some particular role in their cloud, but this should only be done for those cases in which there are Glance nodes dedicated to end-user access only (that is, the nodes are not used by any openstack services).

(3) The Images v1 API is DEPRECATED in Newton, and Nova is now using the v2 API by default. I don't know whether it's worth pointing that out.

(4) The rate-limiting discussion looks good to me.

Revision history for this message

Luke Hinds (lhinds) wrote on 2016-10-05:

#38

Download full text (3.4 KiB)

Thanks Brian, let me know if this is ok:

Glance Image service v1 and v2 api image-create vulnerability
---

### Summary ###
No limits are enforced within the Glance image service for both v1 and
v2 `/images` API POST method for authenticated users, resulting in possible
denial of service attacks through database table saturation.

### Affected Services / Software ###
All versions of Glance image service.

### Discussion ###
Within the Glance image service, calls to the POST method within v1 or
v2/images creates an image (record) in `queued` status. There is no limit
enforced within the Glance API on the number of images a single tenant may
create, just on the total amount of storage a single user may consume.

Therefore a user could either maliciously or unintentionally fill multiple
database tables (images, image_properties, image_tags, image_members) with
useless image records, thereby causing a denial of service by lengthening
transaction response times in the Glance database.

### Recommended Actions ###
For all versions of Glance that expose either the v1 and v2/images API,
operators are recommended to deploy external rate-limiting proxies or web
application firewalls, to provide a front layer of protection to glance.

The following solutions may be considered, however it is key that the operator
carefully plans and considers the individual performance needs of users and
services within their OpenStack cloud, when configuring any rate limiting
functionality.

#### Repose ####
Repose provides a rate limiting filter, that can utilise limits by IP,
Role (OpenStack Identity v3 filter) or header.

https://repose.atlassian.net/wiki/display/REPOSE/Rate+Limiting+Filter

#### NGINX ####
NGINX provides the limit_req_module, which can be used to provide a global rate
limit. By means of a `map`, it can be limited to just the POST method.

Further details can be found on the nginx site:
http://nginx.org/en/docs/http/ngx_http_limit_req_module.html

#### HAProxy ####
HAProxy can provide inherent rate-limiting using stick-tables with a General
Purpose Counter (gpc)

Further details can be found on the haproxy website:
http://blog.haproxy.com/2012/02/27/use-a-load-balancer-as-a-first-row-of-defense-against-ddos)

#### Apache ####
A number of solutions can be explored here as follows.

##### mod_ratelimit #####
http://httpd.apache.org/docs/2.4/mod/mod_ratelimit.html

##### mod_qos #####
http://opensource.adnovum.ch/mod_qos/dos.html

##### mod_evasive #####
https://www.digitalocean.com/community/tutorials/how-to-protect-against-dos-and-ddos-with-mod_evasive-for-apache-on-centos-7)

##### mod_security #####
https://www.modsecurity.org/

#### Limit `add_image` to admin role ####

Another possible mitigation is to restrict image creation to the admin role,
however this should only be done for those cases in which there are Glance nodes
dedicated to end-user access only. Restriction to admin only on Glance nodes
that serve OpenStack services will remove the ability to create snapshots from
the Compute API.

To restrict image creation to the role admin only, amend
`/etc/glance/policy.json` accordingly.

"add_image": "role:admin",

### Contacts / Reference...

	Status	Importance	Assigned to
Glance	Opinion	Undecided	Unassigned
OpenStack Security Advisory	Opinion	Undecided	Unassigned
OpenStack Security Notes	Fix Released	Undecided	Luke Hinds

Changed in ossn:
status:	New → Fix Released
information type:	Private Security → Public

OpenStack Security Notes

Images v2 api image-create vulnerability

Bug Description

CVE References

Other bug subscribers

Remote bug watches