Revocation event caching is broken across processes

Bug #1607553 reported by Lance Bragstad
This bug affects 2 people
Affects: OpenStack Identity (keystone)
Status: New
Importance: High
Assigned to: Unassigned
Milestone: none

Bug Description

It seems the caching of revocation events is broken. I have a devstack stood up with fernet tokens enabled.

If I run tempest.api.identity.admin.v3.test_tokens.TokensV3TestJSON.test_tokens with revocation event caching and fernet enabled, the test fails consistently [0]. This is just one example of a failing test. Several of the tests in the tempest.api.identity suite fail with the same symptoms.

If I disable revocation event caching in the keystone.conf (CONF.revoke.caching=False), the test passes consistently.
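For reference, the equivalent keystone.conf stanza (using the standard oslo.config section/option mapping for CONF.revoke.caching) looks like this:

[revoke]
caching = False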

[0] http://cdn.pasteraw.com/gx49zwqv5stjmixqbcgoscexcn28w32

Tags: fernet revoke
Revision history for this message
Lance Bragstad (lbragstad) wrote :

I also have a feeling this has something to do with request-local caching in keystone, because I get different results when I run the tempest tests serially. In that case I usually get *fewer* failures than when I run them concurrently.

Revision history for this message
Lance Bragstad (lbragstad) wrote :

I attempted to make some progress on debugging this. To do so, I've added a number of logging statements in keystone to give me a little more information about the cache [0]. I have also set the following config values in my keystone.conf.

[DEFAULT]
default_log_levels = oslo_cache=DEBUG
[cache]
debug_cache_backend = true
[token]
provider = fernet

The default_log_levels setting is what tells us when things hit or miss the cache, as well as when specific keys are removed [1].

When I run the tempest test from the description above, the keystone.log gives me pretty verbose cache logging [2]. The test output also gives the token_id that failed the test [3]. From the test output you can get the token_id, and using the log output you can trace it to the appropriate cache key.

According to the logs, the token is cached twice: once by the _validate_token() method [4] and once by the validate_non_persistent_token() method [5].

The validate_non_persistent_token() key is: 4764d7ee10ba82a2933bb8b55609364fea7b9d1b
The _validate_token() key is: 21f8fb149ffc2c01b1d46d878f1a6117e195e14a
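As a hedged illustration of why a single token ends up under two different keys: keystone's caching goes through oslo.cache / dogpile.cache, and dogpile's default key generator folds the decorated function's name into the cache key before any key mangling is applied. A minimal, self-contained sketch (the in-memory region and return values are illustrative, not keystone's actual wiring):

from dogpile.cache import make_region
from dogpile.cache.util import sha1_mangle_key

# Illustrative in-memory region; keystone configures its regions through
# oslo.cache, typically backed by memcached.
region = make_region(key_mangler=sha1_mangle_key).configure(
    'dogpile.cache.memory')

@region.cache_on_arguments()
def _validate_token(token_id):
    # stand-in for the real provider method
    return {'token': token_id}

@region.cache_on_arguments()
def validate_non_persistent_token(token_id):
    return {'token': token_id}

# The default key generator includes the function name, so the same
# token_id is stored under two distinct sha1-mangled keys, one per
# decorated method, which lines up with the two keys listed above.
_validate_token('abc123')
validate_non_persistent_token('abc123')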

We can see that the cache invalidation is doing its job because both keys are deleted from the cache when the token is revoked:

CACHE_DELETE: "'4764d7ee10ba82a2933bb8b55609364fea7b9d1b'"
CACHE_DELETE: "'21f8fb149ffc2c01b1d46d878f1a6117e195e14a'"

Everything looks fine up to this point. The keys have been removed from the token cache and a revocation event should have been persisted. The next time the token from the test is validated, the validation will rebuild the entire authorization context from the database and then compare it to the list of revocation events. Revocation events are cached out of the box in keystone, and the revocation cache region should have been invalidated when a new revocation event was added to the list [6].

But this isn't the case. For some reason, the revocation API continues to return the old revocation list to the token provider API when validating the now-deleted token. I have a feeling this has something to do with keystone's approach to request-local caching. If I set [revoke]caching=False, the tests pass consistently every time. Devstack deploys keystone with 5 processes and 1 thread for both the public and admin endpoints. When I switch the process count to 1 and the threads to 1, the test passes consistently with revocation caching enabled.

I have a feeling that when multiple processes are running, the revocation cache isn't getting invalidated across all processes, which would be consistent with what Morgan stated [7].

[0] http://cdn.pasteraw.com/o35mn2ipac41f1hx68zz37alj3bwob
[1] https://github.com/openstack/oslo.cache/blob/d2a5e1d041c504ade2352e6449397645fb2449d3/oslo_cache/core.py#L65-L100
[2] http://cdn.pasteraw.com/ms750ntgl1nv00z6a16jfew5jft23ml
[3] http://cdn.pasteraw.com/67vnkdoo4n07f2xsqyruv6qlj9fi0xc
[4] https://github.com/openstack/keystone/blob/627eeeaef421867bf501f450cd207c3fb94d56ee/keystone/token/provider.py#L299-L300
[5] https://github.com/openstack/keystone/blob/627eeeaef421867bf501f450cd207c3fb94d56ee/keystone/token/provider.py#L296-L297
[6] https://github.com/openstack/keystone/blob/627eeeaef4...
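To make the suspected failure mode concrete, here is a minimal sketch, under the assumption that the revoke region behaves like a stock dogpile.cache region (names are illustrative, not keystone's actual code):

from dogpile.cache import make_region

# Each keystone worker process constructs its own CacheRegion object,
# even when all of them share the same memcached backend.
REGION = make_region().configure('dogpile.cache.memory')

REVOCATION_EVENTS = []

@REGION.cache_on_arguments()
def list_events():
    # expensive database query in real keystone; cached in the revoke region
    return list(REVOCATION_EVENTS)

def revoke(event):
    REVOCATION_EVENTS.append(event)
    # dogpile's default invalidation strategy only records a timestamp on
    # *this* process's region object; nothing is written to the shared
    # backend, so the other worker processes never see the invalidation
    # and keep serving their previously cached (stale) revocation list.
    REGION.invalidate()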


Changed in keystone:
importance: Undecided → High
milestone: none → newton-3
Revision history for this message
Lance Bragstad (lbragstad) wrote :

From the debugging that I have done and the comments Morgan has made in the code, it looks like we will have to be able to invalidate cache regions across processes before we can make fernet the default, or else disable revocation event caching until that happens.
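One possible shape for such cross-process invalidation (a sketch only, not necessarily what any proposed patch does) is a custom dogpile.cache RegionInvalidationStrategy that persists the invalidation timestamp in the shared backend instead of in process-local memory:

import time

from dogpile.cache import make_region
from dogpile.cache.region import RegionInvalidationStrategy

class SharedBackendInvalidation(RegionInvalidationStrategy):
    """Store the region's invalidation time where every process can see it.

    timestamp_region is a small helper region sharing the same backend
    (memcached in a typical deployment) but never itself invalidated,
    which avoids any circularity.
    """

    def __init__(self, timestamp_region, key='revoke_invalidated_at'):
        self._region = timestamp_region
        self._key = key

    def invalidate(self, hard=True):
        # visible to every worker that shares the backend
        self._region.set(self._key, time.time())

    def _last_invalidated(self):
        value = self._region.get(self._key)
        return value or None   # region.get() returns a falsy NO_VALUE sentinel

    def is_invalidated(self, timestamp):
        last = self._last_invalidated()
        return last is not None and timestamp < last

    # Only "hard" invalidation is modelled in this sketch.
    def is_hard_invalidated(self, timestamp):
        return self.is_invalidated(timestamp)

    def was_hard_invalidated(self):
        return self._last_invalidated() is not None

    def is_soft_invalidated(self, timestamp):
        return False

    def was_soft_invalidated(self):
        return False

# Wiring sketch: in practice both regions would point at memcached.
timestamps = make_region().configure('dogpile.cache.memory')
REVOKE_REGION = make_region().configure(
    'dogpile.cache.memory',
    region_invalidator=SharedBackendInvalidation(timestamps))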

summary: - Revocation event caching is broken
+ Revocation event caching is broken across processes
Revision history for this message
David Stanek (dstanek) wrote :

I think this is a duplicate of 1590779

Revision history for this message
Lance Bragstad (lbragstad) wrote :

Looks like Boris has a patch up for fixing the region invalidation across processes [0]. For comparison: if I run tempest.api.identity against a devstack'd keystone (at commit 8a669faba), I get an average of 24 failures out of 187 total tests run.

With Boris' patch (as of patch set 13), I get an average of 5 failures per test run. With the patch, it does seem that more tests pass as the revocation table fills up (which slows down the revocation API).

The failures are still race condition failures; Boris' patch just reduces how often they occur.

[0] https://review.openstack.org/#/c/327885/13

Changed in keystone:
milestone: newton-3 → none