restarting memcached loses revoked token list

Bug #1182920 reported by Adam Young on 2013-05-22
Affects | Importance | Assigned to
OpenStack Identity (keystone) | High | Unassigned
OpenStack Security Notes | Undecided | Unassigned

Bug Description

With the memcached backend to tokens, the revoked token list only lasts as long as the memcached server is up and running. Thus, if the Keystone server is restarted, all token revocations are dropped, and they will not show up in later token revocation list requests.

To reproduce:

0. Set up keystone using the memcached backend to tokens
1. Create a token and ensure that it can be used via auth_token middleware
2. Revoke the token
3. Fetch the revocation list and ensure that the token is listed
4. Restart Keystone
5. Fetch the revocation list and see that the token is no longer listed
6. Wait until the cached revocation list has timed out, then reissue the request to a different remote service; the revoked token will be processed as valid
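The failure mode in the steps above can be simulated in a few lines of Python. This is a hypothetical sketch, not Keystone's actual code: the class names and the `pki_token_valid` helper are illustrative stand-ins for a memcached-backed revocation store and a PKI validator.

```python
# Hypothetical sketch: why a memcached-backed revocation list
# "forgets" revocations when the service restarts.

class VolatileRevocationStore:
    """Stands in for memcached: contents live only in RAM."""
    def __init__(self):
        self._revoked = set()

    def revoke(self, token_id):
        self._revoked.add(token_id)

    def revocation_list(self):
        return set(self._revoked)

    def restart(self):
        # memcached keeps everything in memory, so a restart drops it all
        self._revoked.clear()

def pki_token_valid(token_id, store):
    # A PKI validator checks the signature offline; the only online
    # check is membership in the revocation list.
    return token_id not in store.revocation_list()

store = VolatileRevocationStore()
store.revoke("tok-123")
assert "tok-123" in store.revocation_list()      # step 3: token is listed

store.restart()                                  # step 4: restart
assert "tok-123" not in store.revocation_list()  # step 5: revocation lost

print(pki_token_valid("tok-123", store))  # True: revoked token accepted
```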

There are some mitigating factors. Just submitting the revoked token to the same service after keystone has restarted will not show the error if the service is using memcached to hold the revocation list. Restarting the memcache instance used by the remote server will dump the revocation list, and the problem will then appear.

I have not yet verified this last behavior, but it follows from how memcache works.

Keystone uses a local memcache instance. If it were to be set up to use a remote memcache service, it would take restarting memcache to trigger the problem, and not keystone.

This affects PKI tokens only. UUID tokens are not affected: restarting memcached drops them all too, but they will then be reported as invalid tokens, so the failure is closed rather than open.
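The asymmetry between the two token formats can be illustrated with a simplified sketch (hypothetical helpers, not Keystone code): a UUID token is only valid if it is still present in the store, while a PKI token is valid unless it appears in the revocation list.

```python
# Simplified illustration: UUID tokens fail *closed* after a cache
# wipe, PKI tokens fail *open*.

def uuid_valid(token_id, token_store):
    # UUID tokens are opaque: validity means "present in the store".
    return token_id in token_store

def pki_valid(token_id, revoked):
    # PKI tokens are self-validating: the store only holds revocations.
    return token_id not in revoked

token_store, revoked = {"tok-1"}, {"tok-1"}  # tok-1 issued, then revoked
token_store.clear(); revoked.clear()         # memcached restart wipes both

print(uuid_valid("tok-1", token_store))  # False: restart invalidates UUIDs
print(pki_valid("tok-1", revoked))       # True: revoked PKI token accepted
```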

The KVS backend is affected as well.

Thierry Carrez (ttx) wrote :

Adding keystone-core for patching

Changed in keystone:
importance: Undecided → High
status: New → Confirmed
Thierry Carrez (ttx) on 2013-05-24
Changed in ossa:
status: New → Incomplete
Thierry Carrez (ttx) wrote :

That sounds like a vulnerability, but it seems to derive from the very concept of storing revocation lists on volatile storage... So is it an implementation issue or an architectural issue? (Can we fix it?)

To make sure I got it right... this means that revocation lists are valid only as long as you don't restart storage on both middleware and keystone servers... and this affects PKI tokens with either memcache or KVS backends?

Changed in ossa:
assignee: nobody → Thierry Carrez (ttx)
Dolph Mathews (dolph) wrote :

Agree; I'm not sure what the solution to this would be, other than not using volatile storage. Perhaps issuing guidance to use a much shorter default [token] expiration (current default is 86400 seconds) in combination with PKI tokens?
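In configuration terms, that guidance would look something like the fragment below (illustrative values; the `expiration` option lives in the `[token]` section of keystone.conf, and the right lifetime depends on your deployment):

```ini
# keystone.conf -- illustrative values, adjust to your deployment
[token]
# Default is 86400 (24 hours); a shorter lifetime narrows the window
# in which a revoked-but-forgotten PKI token remains usable after a
# memcached restart.
expiration = 3600
```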

Thierry Carrez (ttx) wrote :

@Adam: is this actually fixable? Or should we go with a security note with Dolph's advice?

Adam Young (ayoung) wrote :

I think a security note is most appropriate.

There might be ways to mitigate this in the future, such as running memcached on multiple machines to ensure redundancy should the Keystone server fail. In a clustered environment, it will only be an issue if all of the memcached machines shut down.

Memcachedb might also be a potential way to mitigate.

http://memcachedb.org/

It might also be possible to record the revocation list upon update, and to read that value back in on startup. Whether that has a negative performance impact depends on how common revocation events are, but I suspect it would be negligible.
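That write-on-update, read-on-startup idea can be sketched as follows (hypothetical class and file names, not Keystone code; a JSON file stands in for whatever durable medium a real implementation would use):

```python
# Sketch: persist the revocation list to disk on every update and
# restore it on startup, so a restart no longer loses revocations.
import json
import os
import tempfile

class DurableRevocationStore:
    def __init__(self, path):
        self._path = path
        self._revoked = set()
        if os.path.exists(path):          # startup: restore prior state
            with open(path) as f:
                self._revoked = set(json.load(f))

    def revoke(self, token_id):
        self._revoked.add(token_id)
        with open(self._path, "w") as f:  # update: persist immediately
            json.dump(sorted(self._revoked), f)

    def revocation_list(self):
        return set(self._revoked)

path = os.path.join(tempfile.mkdtemp(), "revocations.json")
DurableRevocationStore(path).revoke("tok-123")

restarted = DurableRevocationStore(path)  # simulate a restart
print("tok-123" in restarted.revocation_list())  # True: revocation survives
```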

Thierry Carrez (ttx) wrote :

Adding Rob Clark to discuss the opportunity of solving this one using a security note (OSSN).

Robert Clark (robert-clark) wrote :

Seems appropriate for an OSSN, I'll get it started once this is public (I'll be asking another OSSG member to run with it).

Thierry Carrez (ttx) wrote :

OK, will open to public in a few unless someone objects.

Thierry Carrez (ttx) on 2013-06-24
no longer affects: ossa
information type: Private Security → Public
Robert Clark (robert-clark) wrote :

Restarting memcached loses revoked token list
----

### Summary ###
When a cloud is deployed using Memcache as a backend for Keystone tokens, there is a security concern: restarting Memcached will lose the list of revoked tokens, potentially allowing revoked tokens (and the users holding them) to access the system after revocation.

### Affected Services / Software ###
Keystone, Memcache

### Discussion ###
There might be ways to mitigate this in the future, such as running memcached on multiple machines to ensure redundancy should the Keystone server fail. In a clustered environment, it will only be an issue if all of the memcached machines shut down.

Memcachedb might also be a potential way to mitigate. http://memcachedb.org/

NOTE: Some deployments may intentionally flush Memcached in response to https://bugs.launchpad.net/ossn/+bug/1179955 - please exercise caution when considering how to approach this problem.

### Recommended Actions ###
This is a fundamental problem with using in-memory ephemeral storage for security information. If your deployment has strong security requirements or relies on up-to-date revoked-token information, we suggest you consider using an on-disk database such as MySQL or PostgreSQL, or perhaps look into Memcachedb.

### Contacts / References ###
This OSSN : https://bugs.launchpad.net/ossn/+bug/1182920
OpenStack Security ML : openstack-security at lists.openstack.org
OpenStack Security Group : https://launchpad.net/~openstack-ossg

Changed in ossn:
status: New → In Progress
Morgan Fainberg (mdrnstm) wrote :

A new blueprint for addressing this issue has been registered: https://blueprints.launchpad.net/keystone/+spec/revocation-backend

The plan is to split the token persistence driver from the revocation list to ensure the revocation list uses a stable-storage backend instead of the volatile memcached/kvs backends.
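The split the blueprint describes can be sketched as follows (hypothetical class names, not the blueprint's actual design): tokens stay in whatever volatile cache the deployment uses, while revocations go to stable storage, with sqlite standing in here for a real database backend.

```python
# Sketch: separate the volatile token cache from a stable-storage
# revocation backend, so revocations survive a cache restart.
import os
import sqlite3
import tempfile

class SqliteRevocationBackend:
    def __init__(self, path):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS revoked (token_id TEXT PRIMARY KEY)")

    def revoke(self, token_id):
        self._db.execute(
            "INSERT OR IGNORE INTO revoked VALUES (?)", (token_id,))
        self._db.commit()

    def revocation_list(self):
        return {row[0]
                for row in self._db.execute("SELECT token_id FROM revoked")}

class TokenService:
    """Token cache may be memcached/KVS; revocations live elsewhere."""
    def __init__(self, token_cache, revocation_backend):
        self.tokens = token_cache              # volatile, safe to lose
        self.revocations = revocation_backend  # must survive restarts

path = os.path.join(tempfile.mkdtemp(), "revocations.db")
svc = TokenService({}, SqliteRevocationBackend(path))
svc.revocations.revoke("tok-9")

svc2 = TokenService({}, SqliteRevocationBackend(path))  # after a restart
print("tok-9" in svc2.revocations.revocation_list())  # True
```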

tags: added: blueprint
Robert Clark (robert-clark) wrote :

Restarting memcached loses revoked token list
----

### Summary ###
When a cloud is deployed using Memcache as a backend for Keystone tokens, there is a security concern: restarting Memcached will lose the list of revoked tokens, potentially allowing revoked tokens (and the users holding them) to access the system after revocation.

### Affected Services / Software ###
Keystone, Memcache

### Discussion ###
There might be ways to mitigate this in the future, such as running memcached on multiple machines to ensure redundancy should the Keystone server fail. In a clustered environment, it will only be an issue if all of the memcached machines shut down.

Memcachedb might also be a potential way to mitigate. http://memcachedb.org/

NOTE: Some deployments may intentionally flush Memcached in response to https://bugs.launchpad.net/ossn/+bug/1179955 - please exercise caution when considering how to approach this problem.

### Recommended Actions ###
This is a fundamental problem with using in-memory ephemeral storage for security information. If your deployment has strong security requirements or relies on up-to-date revoked-token information, we suggest you consider using an on-disk database such as MySQL or PostgreSQL, or perhaps look into Memcachedb.

### Contacts / References ###
This OSSN : https://bugs.launchpad.net/ossn/+bug/1182920
Blueprint : https://blueprints.launchpad.net/keystone/+spec/revocation-backend
OpenStack Security ML : openstack-security at lists.openstack.org
OpenStack Security Group : https://launchpad.net/~openstack-ossg

Changed in ossn:
status: In Progress → Fix Released
Changed in keystone:
status: Confirmed → Triaged
Morgan Fainberg (mdrnstm) wrote :

With regards to using "memcached" backend, there isn't a whole lot we can do to prevent this. There are initiatives to remove the persistence of tokens completely (revocation events, and non-persistent tokens).

I don't see how Keystone can otherwise address a fundamental issue with the persistent store for tokens / revocation list.

I am marking this as Invalid as it is not something we can directly address. It would be possible (with the new dogpile system for KVS token backends) to leverage Redis or other storage systems that have disk-backed persistence to restore from when the external service restarts.

Changed in keystone:
status: Triaged → Invalid
Morgan Fainberg (mdrnstm) wrote :

OpenStack also addressed this with a Security Note.
