potential DOS with revoke by id or audit_id

Bug #1553324 reported by Guang Yee
This bug affects 1 person
Affects                          Status        Importance  Assigned to  Milestone
OpenStack Identity (keystone)    Fix Released  Undecided   Unassigned
OpenStack Security Advisory      Won't Fix     Undecided   Unassigned
OpenStack Security Notes         Fix Released  Undecided   Luke Hinds

Bug Description

With our default policy, token revocation can be self-served. Without any rate limiting or SIEM mechanism in place, any user can potentially flood the revocation_event table and cause significant performance degradation or, worse, a DoS. I've attached a simple script that continuously revokes one's own token and times token validation. On a vanilla devstack with dogpile cache enabled, token validation time goes from roughly 40ms to over 300ms with about 2K revocation events.
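For illustration, a flood-and-measure loop along the lines described would look roughly like the following. This is a sketch against the standard Keystone v3 token API, not the attached script; the endpoint URL, user name, password and project scope are assumptions for a devstack-like setup.

    # Illustrative only: repeatedly self-revoke tokens and time validation.
    # KEYSTONE, the credentials and the scope below are assumptions.
    import time
    import requests

    KEYSTONE = "http://127.0.0.1/identity/v3"   # assumed devstack endpoint
    AUTH = {
        "auth": {
            "identity": {
                "methods": ["password"],
                "password": {"user": {"name": "demo",
                                      "domain": {"id": "default"},
                                      "password": "secret"}},
            },
            "scope": {"project": {"name": "demo",
                                  "domain": {"id": "default"}}},
        }
    }

    def get_token():
        r = requests.post(f"{KEYSTONE}/auth/tokens", json=AUTH)
        r.raise_for_status()
        return r.headers["X-Subject-Token"]

    caller_token = get_token()           # token used to make the API calls
    for i in range(2000):
        victim = get_token()             # token we revoke (self-service)
        requests.delete(f"{KEYSTONE}/auth/tokens",
                        headers={"X-Auth-Token": caller_token,
                                 "X-Subject-Token": victim})
        probe = get_token()              # a fresh, still-valid token
        start = time.time()
        requests.get(f"{KEYSTONE}/auth/tokens",
                     headers={"X-Auth-Token": caller_token,
                              "X-Subject-Token": probe})
        print(i, "validation took %.1f ms" % ((time.time() - start) * 1000))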

We don't clean up revocation events until token expiration plus expiration_buffer, which is 30 minutes by default. With the default token TTL of 3600 seconds, a user could potentially accumulate at least a few thousand events during that window.

I know this may sound like a broken record, and yes, rate limiting or SIEM should be used. Perhaps we can add this to the security hardening guide or an OSSN?

In the long run, I think we need to rethink how we handle revoke-by-ID as self-service. Now that we have shadow user accounts, maybe we can implement something to suspend a user when we detect token revocation abuse?

Revision history for this message
Guang Yee (guang-yee) wrote :
Revision history for this message
Tristan Cacqueray (tristan-cacqueray) wrote :

Since this report concerns a possible security risk, an incomplete security advisory task has been added while the core security reviewers for the affected project or projects confirm the bug and discuss the scope of any vulnerability along with potential solutions.

Changed in ossa:
status: New → Incomplete
description: updated
Revision history for this message
Steve Martinelli (stevemar) wrote :

This is another bug that has come up due to the lack of rate limiting; we should really consider adding documentation that lists known limitations. This bug would definitely fall into that category.

Revision history for this message
Tristan Cacqueray (tristan-cacqueray) wrote :

I've subscribed OSSG-coresec to discuss a possible document about rate limiting.

Revision history for this message
Dolph Mathews (dolph) wrote :

> I know this may sound like a broken record

Yes, there are already open bugs and blog posts regarding the performance degradation caused by revocation events. A few existing references on this exact topic:

* Successive runs of identity tempest tests take more and more time to finish https://bugs.launchpad.net/keystone/+bug/1471665

* Keystone token revocations cripple validation performance http://www.mattfischer.com/blog/?p=672

* Old revocation events must be purged https://bugs.launchpad.net/keystone/+bug/1456797

I'd suggest this be closed as a duplicate of 1471665.

Revision history for this message
Guang Yee (guang-yee) wrote :

My biggest concern is that token revocation is a self-service operation, which means *any* user can potentially flood the revocation_event table without any kind of rate limiting or SIEM.

We are pruning revocation events on the next revocation call.

https://github.com/openstack/keystone/blob/master/keystone/revoke/backends/sql.py#L103

I am not worried about pruning old events. Rather, my concern is that any ordinary user can flood the table in a short amount of time, crippling performance.
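For context, the pruning referenced above can be modelled as dropping events older than the token TTL plus the buffer. A simplified sketch in plain Python, not the actual keystone SQL backend code; it only shows the rule, not the SQLAlchemy session handling:

    # Simplified model of the prune step: on each new revocation, drop events
    # that can no longer affect any token that is still within its lifetime.
    import datetime

    def prune_expired_events(events, token_ttl, expiration_buffer):
        """Return only events newer than (now - TTL - buffer)."""
        cutoff = (datetime.datetime.utcnow()
                  - datetime.timedelta(seconds=token_ttl + expiration_buffer))
        # Older events only describe tokens that have already expired on
        # their own, so keeping them adds lookup cost for no benefit.
        return [e for e in events if e["revoked_at"] >= cutoff]

The point of the bug, though, is that a flood happens entirely inside that window, so pruning never gets a chance to help.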

Revision history for this message
Morgan Fainberg (mdrnstm) wrote :

Basically this is still an issue / has always really been an issue (with the TRL in memcache, you could get the system to prevent new tokens from being issued or revoked in the same way; in SQL you can bloat the DB with tokens, and even though they can be flushed, lookups become very slow).

Disabling revocation events (and accepting that revocations are not happening) is likely the best course of action, or else implement a rate limiter.

This isn't really a "fixable" bug in keystone unless we implement rate limiting directly.
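As an illustration of the rate-limiter option Morgan mentions: keystone has no such built-in limiter, so everything below is an assumption about what operator-side tooling (a proxy or middleware) could do, namely cap self-service revocations per user over a sliding window.

    # Illustrative sliding-window limiter for revocation calls; not a
    # keystone feature, purely an assumed operator-side sketch.
    import time
    from collections import defaultdict, deque

    class RevocationRateLimiter:
        def __init__(self, max_calls=10, window=60.0):
            self.max_calls = max_calls       # revocations allowed per window
            self.window = window             # window length in seconds
            self.calls = defaultdict(deque)  # user_id -> recent timestamps

        def allow(self, user_id):
            now = time.time()
            q = self.calls[user_id]
            while q and now - q[0] > self.window:
                q.popleft()                  # forget calls outside the window
            if len(q) >= self.max_calls:
                return False                 # reject: too many revocations
            q.append(now)
            return True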

Revision history for this message
Dolph Mathews (dolph) wrote :

We could change the behavior of "delete token" (which currently operates by targeting a unique audit ID) to instead revoke ALL of a user's tokens issued prior to the event (by instead targeting the user ID). That way we only store a single revocation event that we can simply update repeatedly (with a more recent timestamp).
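Conceptually (an illustrative sketch, not keystone code), the difference is between appending a row for every revoked audit ID and keeping a single per-user row that is updated in place:

    # Sketch of the two storage strategies being compared (illustrative only).
    import datetime

    events_by_audit_id = {}   # current behaviour: one event per revoked token
    events_by_user_id = {}    # proposed behaviour: one event per user

    def revoke_by_audit_id(audit_id):
        # The table grows by one row for every "delete token" call.
        events_by_audit_id[audit_id] = datetime.datetime.utcnow()

    def revoke_all_for_user(user_id):
        # A single row per user; repeated calls only bump the timestamp, so
        # the table cannot be flooded, but every token issued before that
        # time becomes invalid (the drawback raised in the next comment).
        events_by_user_id[user_id] = datetime.datetime.utcnow()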

Revision history for this message
Morgan Fainberg (mdrnstm) wrote :

@Dolph,

We tried that before and it ended up causing a large number of failures, because long-running tasks (snapshots, etc.) all failed once their tokens were expired. That fix would need to wait until/if we fix not using the user's token to authorize long-running actions.

Revision history for this message
Tristan Cacqueray (tristan-cacqueray) wrote :

So is there a good use case for being able to revoke only one token?

Revision history for this message
Brant Knudson (blk-u) wrote :

Horizon gets tokens for the user and revokes them when the user logs out. A user might have several sessions going so it needs to revoke individual tokens.
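For illustration, a single-token revocation uses the standard Keystone v3 call with the target token in X-Subject-Token; the endpoint URL and function name below are assumptions:

    # Illustrative: revoke one session's token, leaving other sessions intact.
    import requests

    KEYSTONE = "http://127.0.0.1/identity/v3"   # assumed endpoint

    def revoke_session_token(caller_token, session_token):
        # DELETE /v3/auth/tokens revokes only the token in X-Subject-Token.
        r = requests.delete(f"{KEYSTONE}/auth/tokens",
                            headers={"X-Auth-Token": caller_token,
                                     "X-Subject-Token": session_token})
        r.raise_for_status()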

Revision history for this message
Travis McPeak (travis-mcpeak) wrote :

Agreed that in many cases rate limiting isn't handled well or at all in OpenStack and this would be a good reference to use for writing a rate limiting OSSN.

Revision history for this message
Robert Clark (robert-clark) wrote :

+1 OSSN

Revision history for this message
Tristan Cacqueray (tristan-cacqueray) wrote :

According to comment #5, this is already almost public; should we keep this report private to coordinate with the Security Note?

Or perhaps the OSSN can be done in the open too ?

Revision history for this message
Travis McPeak (travis-mcpeak) wrote :

Opening it publicly will definitely help speed along the OSSN process as it increases the pool of possible authors.

Revision history for this message
Tristan Cacqueray (tristan-cacqueray) wrote :

I'm opening the bug now, Travis. Thanks!

description: updated
information type: Private Security → Public
Luke Hinds (lhinds)
Changed in ossn:
assignee: nobody → Luke Hinds (lhinds)
Changed in ossa:
status: Incomplete → Won't Fix
Revision history for this message
Lance Bragstad (lbragstad) wrote :

Adam Young landed a patch that probably had an impact on the numbers reported here [0]. Guang, do you know if the numbers supplied in the description were obtained with this patch in master? If not, maybe it'd be worth recreating to see if there is a difference in performance/behavior?

[0] https://review.openstack.org/#/c/311652/

Revision history for this message
Lance Bragstad (lbragstad) wrote :

I would be in favor of marking this as a duplicate of 1471665.

Revision history for this message
Luke Hinds (lhinds) wrote :

There is an OSSN pending a +2 from docs core that addresses this:

https://review.openstack.org/#/c/313896/3/security-notes/OSSN-0068

If I still have my test VM, I will check if Adam's patch was in there.

Revision history for this message
Guang Yee (guang-yee) wrote :

Lance, I can retest with the latest master, but in theory Adam's patch won't help, as it is doing a linear search. For a small number of revocations it works fine, but if a user is flooding the revocation events table, linear search may get even worse. I'll test it out and see what the latest numbers look like.
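To illustrate the concern (a simplified model, not keystone's actual matching code): with a linear scan, every validation touches every stored event, so a flooded table slows every validation proportionally.

    # Simplified model of linear-search revocation checking: cost is
    # O(number of revocation events) for every single token validation.
    def is_revoked(token, events):
        for event in events:
            if event.get("audit_id") == token["audit_id"]:
                return True
            if (event.get("user_id") == token["user_id"]
                    and event.get("issued_before", 0) > token["issued_at"]):
                return True
        return False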

Changed in ossn:
assignee: Luke Hinds (lhinds) → Rahul U Nair (rahulunair)
Changed in ossn:
status: New → Confirmed
Changed in ossn:
assignee: Rahul U Nair (rahulunair) → nobody
assignee: nobody → Luke Hinds (lhinds)
status: Confirmed → Fix Released
Revision history for this message
Luke Hinds (lhinds) wrote :

Thanks Rahul

Revision history for this message
Steve Martinelli (stevemar) wrote :

https://review.openstack.org/#/q/topic:bug/1524030 should fix this; the time taken to process revocation events is now flat once there are 80 revocation events.

Changed in keystone:
status: New → Fix Released