potential DOS with revoke by id or audit_id

Bug #1553324 reported by Guang Yee
This bug affects 1 person
Affects                          Status        Importance  Assigned to  Milestone
OpenStack Identity (keystone)    Fix Released  Undecided   Unassigned
OpenStack Security Advisory      Won't Fix     Undecided   Unassigned
OpenStack Security Notes         Fix Released  Undecided   Luke Hinds

Bug Description

With our default policy, token revocation can be self-served. Without any rate limiting or SIEM mechanism in place, any user can potentially flood the revocation_event table and cause significant performance degradation or, worse, a DoS. I've attached a simple script that continuously revokes one's own token and times token validation. On a vanilla devstack with dogpile cache enabled, token validation time goes from roughly 40ms to over 300ms with about 2K revocation events.
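For illustration, a flood-and-measure loop along the lines described would look roughly like the following. This is a sketch against the standard Keystone v3 token API, not the attached script; the endpoint URL, user name, password and project scope are assumptions for a devstack-like setup.

    # Illustrative only: repeatedly self-revoke tokens and time validation.
    # KEYSTONE, the credentials and the scope below are assumptions.
    import time
    import requests

    KEYSTONE = "http://127.0.0.1/identity/v3"   # assumed devstack endpoint
    AUTH = {
        "auth": {
            "identity": {
                "methods": ["password"],
                "password": {"user": {"name": "demo",
                                      "domain": {"id": "default"},
                                      "password": "secret"}},
            },
            "scope": {"project": {"name": "demo",
                                  "domain": {"id": "default"}}},
        }
    }

    def get_token():
        r = requests.post(f"{KEYSTONE}/auth/tokens", json=AUTH)
        r.raise_for_status()
        return r.headers["X-Subject-Token"]

    caller_token = get_token()           # token used to make the API calls
    for i in range(2000):
        victim = get_token()             # token we revoke (self-service)
        requests.delete(f"{KEYSTONE}/auth/tokens",
                        headers={"X-Auth-Token": caller_token,
                                 "X-Subject-Token": victim})
        probe = get_token()              # a fresh, still-valid token
        start = time.time()
        requests.get(f"{KEYSTONE}/auth/tokens",
                     headers={"X-Auth-Token": caller_token,
                              "X-Subject-Token": probe})
        print(i, "validation took %.1f ms" % ((time.time() - start) * 1000))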

We don't clean up revocation events until token expiration plus expiration_buffer, which is 30 minutes by default. With the default token TTL of 3600 seconds, a user could potentially accumulate at least a few thousand events during that window.

I know this may sound like a broken record, and yes, rate limiting or SIEM should be used. Perhaps we can add this to the security hardening guide or an OSSN?

In the long run, I think we need to rethink how we handle revoke-by-ID as self-service. Now that we have shadow user accounts, maybe we can implement something to suspend a user when we detect token revocation abuse?

Revision history for this message
Guang Yee (guang-yee) wrote :
Revision history for this message
Tristan Cacqueray (tristan-cacqueray) wrote :

Since this report concerns a possible security risk, an incomplete security advisory task has been added while the core security reviewers for the affected project or projects confirm the bug and discuss the scope of any vulnerability along with potential solutions.

Changed in ossa:
status: New → Incomplete
description: updated
Revision history for this message
Steve Martinelli (stevemar) wrote :

This is another bug that has come up due to the lack of rate limiting; we should really consider adding documentation that lists known limitations. This bug would definitely fall into that category.

Revision history for this message
Tristan Cacqueray (tristan-cacqueray) wrote :

I've subscribed OSSG-coresec to discuss a possible document about rate limiting.

Revision history for this message
Dolph Mathews (dolph) wrote :

> I know this may sound like a broken record

Yes, there are already open bugs and blog posts regarding the performance degradation caused by revocation events. A few existing references on this exact topic:

* Successive runs of identity tempest tests take more and more time to finish https://bugs.launchpad.net/keystone/+bug/1471665

* Keystone token revocations cripple validation performance http://www.mattfischer.com/blog/?p=672

* Old revocation events must be purged https://bugs.launchpad.net/keystone/+bug/1456797

I'd suggest this be closed as a duplicate of 1471665.

Revision history for this message
Guang Yee (guang-yee) wrote :

My biggest concern is that token revocation is a self-service operation, which means *any* user can potentially flood the revocation_event table without any kind of rate limiting or SIEM.

We are pruning revocation events on the next revocation call.

https://github.com/openstack/keystone/blob/master/keystone/revoke/backends/sql.py#L103

I am not worried about pruning old events. Rather, my concern is that any ordinary user can flood the table in a short amount of time, crippling performance.
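For context, the pruning referenced above can be modelled as dropping events older than the token TTL plus the buffer. A simplified sketch in plain Python, not the actual keystone SQL backend code; it only shows the rule, not the SQLAlchemy session handling:

    # Simplified model of the prune step: on each new revocation, drop events
    # that can no longer affect any token that is still within its lifetime.
    import datetime

    def prune_expired_events(events, token_ttl, expiration_buffer):
        """Return only events newer than (now - TTL - buffer)."""
        cutoff = (datetime.datetime.utcnow()
                  - datetime.timedelta(seconds=token_ttl + expiration_buffer))
        # Older events only describe tokens that have already expired on
        # their own, so keeping them adds lookup cost for no benefit.
        return [e for e in events if e["revoked_at"] >= cutoff]

The point of the bug, though, is that a flood happens entirely inside that window, so pruning never gets a chance to help.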

Revision history for this message
Morgan Fainberg (mdrnstm) wrote :

Basically this is still an issue / has always really been an issue (with the TRL in memcache, you could get the system to prevent new tokens from being issued or revoked in the same way; in SQL you can bloat the DB with tokens, and even though they can be flushed, lookups become very slow).

Disabling revocation events (and accepting that revocations are not happening) is likely the best course of action, or else implement a rate limiter.

This isn't really a "fixable" bug in keystone unless we implement rate limiting directly.
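As an illustration of the rate-limiter option Morgan mentions: keystone has no such built-in limiter, so everything below is an assumption about what operator-side tooling (a proxy or middleware) could do, namely cap self-service revocations per user over a sliding window.

    # Illustrative sliding-window limiter for revocation calls; not a
    # keystone feature, purely an assumed operator-side sketch.
    import time
    from collections import defaultdict, deque

    class RevocationRateLimiter:
        def __init__(self, max_calls=10, window=60.0):
            self.max_calls = max_calls       # revocations allowed per window
            self.window = window             # window length in seconds
            self.calls = defaultdict(deque)  # user_id -> recent timestamps

        def allow(self, user_id):
            now = time.time()
            q = self.calls[user_id]
            while q and now - q[0] > self.window:
                q.popleft()                  # forget calls outside the window
            if len(q) >= self.max_calls:
                return False                 # reject: too many revocations
            q.append(now)
            return True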

Revision history for this message
Dolph Mathews (dolph) wrote :

We could change the behavior of "delete token" (which currently operates by targeting a unique audit ID) to instead revoke ALL of a user's tokens issued prior to the event (by instead targeting the user ID). That way we only store a single revocation event that we can simply update repeatedly (with a more recent timestamp).
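Conceptually (an illustrative sketch, not keystone code), the difference is between appending a row for every revoked audit ID and keeping a single per-user row that is updated in place:

    # Sketch of the two storage strategies being compared (illustrative only).
    import datetime

    events_by_audit_id = {}   # current behaviour: one event per revoked token
    events_by_user_id = {}    # proposed behaviour: one event per user

    def revoke_by_audit_id(audit_id):
        # The table grows by one row for every "delete token" call.
        events_by_audit_id[audit_id] = datetime.datetime.utcnow()

    def revoke_all_for_user(user_id):
        # A single row per user; repeated calls only bump the timestamp, so
        # the table cannot be flooded, but every token issued before that
        # time becomes invalid (the drawback raised in the next comment).
        events_by_user_id[user_id] = datetime.datetime.utcnow()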

Revision history for this message
Morgan Fainberg (mdrnstm) wrote :

@Dolph,

We tried that before and it ended up causing a large number of failures, because long-running tasks (snapshots, etc.) all failed once their tokens were expired. That fix would need to wait until/if we fix not using the user's token to authorize long-running actions.

Revision history for this message
Tristan Cacqueray (tristan-cacqueray) wrote :

So is there a good use case for being able to revoke only one token?

Revision history for this message
Brant Knudson (blk-u) wrote :

Horizon gets tokens for the user and revokes them when the user logs out. A user might have several sessions going so it needs to revoke individual tokens.
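For illustration, a single-token revocation uses the standard Keystone v3 call with the target token in X-Subject-Token; the endpoint URL and function name below are assumptions:

    # Illustrative: revoke one session's token, leaving other sessions intact.
    import requests

    KEYSTONE = "http://127.0.0.1/identity/v3"   # assumed endpoint

    def revoke_session_token(caller_token, session_token):
        # DELETE /v3/auth/tokens revokes only the token in X-Subject-Token.
        r = requests.delete(f"{KEYSTONE}/auth/tokens",
                            headers={"X-Auth-Token": caller_token,
                                     "X-Subject-Token": session_token})
        r.raise_for_status()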

Revision history for this message
Travis McPeak (travis-mcpeak) wrote :

Agreed that in many cases rate limiting isn't handled well or at all in OpenStack and this would be a good reference to use for writing a rate limiting OSSN.

Revision history for this message
Robert Clark (robert-clark) wrote :

+1 OSSN

Revision history for this message
Tristan Cacqueray (tristan-cacqueray) wrote :

According to comment #5, this is already almost public; should we keep this report private to coordinate with the Security Note?

Or perhaps the OSSN can be done in the open too ?

Revision history for this message
Travis McPeak (travis-mcpeak) wrote :

Opening it publicly will definitely help speed along the OSSN process as it increases the pool of possible authors.

Revision history for this message
Tristan Cacqueray (tristan-cacqueray) wrote :

I'm opening the bug now, Travis. Thanks!

description: updated
information type: Private Security → Public
Luke Hinds (lhinds)
Changed in ossn:
assignee: nobody → Luke Hinds (lhinds)
Changed in ossa:
status: Incomplete → Won't Fix
Revision history for this message
Lance Bragstad (lbragstad) wrote :

Adam Young landed a patch that probably had an impact on the numbers reported here [0]. Guang, do you know if the numbers supplied in the description were obtained with this patch in master? If not, maybe it'd be worth recreating to see if there is a difference in performance/behavior?

[0] https://review.openstack.org/#/c/311652/

Revision history for this message
Lance Bragstad (lbragstad) wrote :

I would be in favor of marking this as a duplicate of 1471665.

Revision history for this message
Luke Hinds (lhinds) wrote :

There is an OSSN pending a +2 from docs core that addresses this:

https://review.openstack.org/#/c/313896/3/security-notes/OSSN-0068

If I still have my test VM, I will check if Adam's patch was in there.

Revision history for this message
Guang Yee (guang-yee) wrote :

Lance, I can retest with the latest master, but in theory Adam's patch won't help, as it is doing a linear search. For a small number of revocations it works fine, but if a user is flooding the revocation events table, linear search may get even worse. I'll test it out and see what the latest numbers look like.
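To illustrate the concern (a simplified model, not keystone's actual matching code): with a linear scan, every validation touches every stored event, so a flooded table slows every validation proportionally.

    # Simplified model of linear-search revocation checking: cost is
    # O(number of revocation events) for every single token validation.
    def is_revoked(token, events):
        for event in events:
            if event.get("audit_id") == token["audit_id"]:
                return True
            if (event.get("user_id") == token["user_id"]
                    and event.get("issued_before", 0) > token["issued_at"]):
                return True
        return False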

Changed in ossn:
assignee: Luke Hinds (lhinds) → Rahul U Nair (rahulunair)
Changed in ossn:
status: New → Confirmed
Changed in ossn:
assignee: Rahul U Nair (rahulunair) → nobody
assignee: nobody → Luke Hinds (lhinds)
status: Confirmed → Fix Released
Revision history for this message
Luke Hinds (lhinds) wrote :

Thanks Rahul

Revision history for this message
Steve Martinelli (stevemar) wrote :

https://review.openstack.org/#/q/topic:bug/1524030 should fix this; the time taken to process revocation events is now flat once there are 80 revocation events.

Changed in keystone:
status: New → Fix Released