Race condition when rapidly deleting and creating tokens

Bug #1099966 reported by Jay Pipes
This bug affects 1 person
Affects: OpenStack Identity (keystone)
Status: Won't Fix
Importance: Medium
Assigned to: Unassigned

Bug Description

The token backend is SQL and PKI is enabled. This is a multi-node setup with the database on a separate node from the keystone server.

The symptom looks like this:

http://paste.openstack.org/show/29472/

The failure hits different tests at random within Tempest's identity admin tests, but it shows up consistently when the tests are executed in a PKI environment with the database server on a separate node. It apparently does not occur when using devstack, which has a local MySQL instance (and may have some caching enabled?).

The race condition happens like so:

Thread 1:

POST /tokens with auth data, passing the token matching the PKI CMS record for a user

Hits this block of code:

https://github.com/openstack/keystone/blob/master/keystone/token/controllers.py#L124 [1]

The call to token_api.create_token() fails with an IntegrityError from SQLAlchemy. This is a planned-for event, apparently, as the code on line 132 [2] catches Exception, with the following in-line code comment:

            # an identical token may have been created already.
            # if so, return the token_data as it is also identical

Now in Thread 2:

A call to DELETE /tokens (or possibly some token expiration code?) proceeds to delete the same token for the user that just resulted in the IntegrityError raised in thread 1.

Back in Thread 1:

The call to token_api.get_token() now fails with a NotFound exception, which causes the original exception (IntegrityError) to be re-raised and sent back across the wire to the end-user.
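
The interleaving above can be sketched deterministically. This is only an illustration under assumptions: InMemoryTokenStore, IntegrityError, TokenNotFound, and the between_create_and_get hook are hypothetical stand-ins written for this example, not keystone's or SQLAlchemy's actual classes. The hook marks the point where Thread 2's delete lands.

```python
class IntegrityError(Exception):
    """Hypothetical stand-in for the duplicate-key error from the SQL backend."""


class TokenNotFound(Exception):
    """Hypothetical stand-in for keystone's NotFound exception."""


class InMemoryTokenStore:
    """Toy token backend: a dict with a uniqueness constraint on token_id."""

    def __init__(self):
        self._tokens = {}

    def create_token(self, token_id, data):
        if token_id in self._tokens:
            raise IntegrityError(token_id)
        self._tokens[token_id] = data
        return data

    def get_token(self, token_id):
        if token_id not in self._tokens:
            raise TokenNotFound(token_id)
        return self._tokens[token_id]

    def delete_token(self, token_id):
        self._tokens.pop(token_id, None)


def authenticate(store, token_id, data, between_create_and_get=lambda: None):
    """Mimics the create / catch / get / re-raise pattern described above."""
    try:
        return store.create_token(token_id, data)
    except Exception as original:
        # An identical token may already exist, so try to return it.
        between_create_and_get()  # the point where "Thread 2" can run
        try:
            return store.get_token(token_id)
        except TokenNotFound:
            # The token vanished in between: the original error escapes.
            raise original


store = InMemoryTokenStore()
store.create_token("abc", {"user": "demo"})

# No interleaving: the duplicate is tolerated and the existing token returned.
ok = authenticate(store, "abc", {"user": "demo"})

# Delete lands between the failed create and the get: IntegrityError escapes.
try:
    authenticate(store, "abc", {"user": "demo"},
                 between_create_and_get=lambda: store.delete_token("abc"))
    outcome = "ok"
except IntegrityError:
    outcome = "IntegrityError"
```

With the delete injected at the hook, the caller sees the IntegrityError even though the token no longer exists and a fresh create would have succeeded.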

Proposed Solution:

Rather than re-raising the original exception on line 139 [3], drop into a simple loop with a randomized timeout that calls create_token() again with the token ID and token data from line 125.
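
A rough sketch of what such a loop might look like, not a patch against the real controller. The helper name create_token_with_retry, the attempts and max_sleep parameters, and the RacyStore test double are all assumptions made up for this example. The sleep function is injectable so the example runs without real delays.

```python
import random
import time


class TokenNotFound(Exception):
    """Hypothetical stand-in for keystone's NotFound exception."""


def create_token_with_retry(token_api, token_id, data,
                            attempts=5, max_sleep=0.05, sleep=time.sleep):
    """Retry create_token() with a randomized pause instead of re-raising."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return token_api.create_token(token_id, data)
        except Exception as exc:  # e.g. a duplicate-key error
            last_error = exc
        try:
            # An identical token may already exist; if so, return it.
            return token_api.get_token(token_id)
        except TokenNotFound:
            # Deleted in between: back off for a random interval, then retry.
            sleep(random.uniform(0, max_sleep))
    raise last_error


class RacyStore:
    """Test double: the first create collides and the row is gone before get."""

    def __init__(self):
        self.tokens = {}
        self.collide_once = True

    def create_token(self, token_id, data):
        if self.collide_once:
            self.collide_once = False
            raise RuntimeError("duplicate key")  # stands in for IntegrityError
        self.tokens[token_id] = data
        return data

    def get_token(self, token_id):
        if token_id not in self.tokens:
            raise TokenNotFound(token_id)
        return self.tokens[token_id]


delays = []  # records the pauses instead of actually sleeping
result = create_token_with_retry(RacyStore(), "abc", {"user": "demo"},
                                 sleep=delays.append)
```

Here the first attempt collides and finds nothing to return, the loop pauses once, and the second create_token() call succeeds. The attempt cap keeps a persistent failure from looping forever; the last error is raised once the attempts run out.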

[1] Same block in Folsom: https://github.com/openstack/keystone/blob/stable/folsom/keystone/service.py#L437
[2] Line 445 in Folsom code.
[3] Line 452 in Folsom code.

Dolph Mathews (dolph)
Changed in keystone:
status: New → Triaged
importance: Undecided → Medium
Ante Karamatić (ivoks) wrote :

How about checking whether the token exists before creating the same one?

try:
    self.token_api.get_token(context=context,
                             token_id=token_id)
except exception.TokenNotFound:
    self.token_api.create_token(...

In that case, if the token exists, everything is fine (it might even get deleted just after we fetch it).
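
For illustration, a self-contained version of that check-first ordering. DictTokenApi, DuplicateToken, and get_or_create_token are hypothetical names invented for this sketch, and the real token_api signatures may differ.

```python
class TokenNotFound(Exception):
    """Hypothetical stand-in for exception.TokenNotFound."""


class DuplicateToken(Exception):
    """Hypothetical stand-in for the backend's duplicate-key error."""


class DictTokenApi:
    """Toy token API backed by a dict, for demonstration only."""

    def __init__(self):
        self._tokens = {}

    def get_token(self, token_id):
        if token_id not in self._tokens:
            raise TokenNotFound(token_id)
        return self._tokens[token_id]

    def create_token(self, token_id, data):
        if token_id in self._tokens:
            raise DuplicateToken(token_id)
        self._tokens[token_id] = data
        return data


def get_or_create_token(token_api, token_id, data):
    """Look the token up first; create it only when it is missing."""
    try:
        return token_api.get_token(token_id)
    except TokenNotFound:
        return token_api.create_token(token_id, data)


api = DictTokenApi()
first = get_or_create_token(api, "abc", {"user": "demo"})    # creates
second = get_or_create_token(api, "abc", {"user": "other"})  # returns existing
```

One caveat with this ordering: two concurrent callers can both miss on get_token() and both call create_token(), so the duplicate-key error can still occur and still needs handling.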

Morgan Fainberg (mdrnstm) wrote :

We have included microsecond data (should be unique per token except in some fairly narrow scenarios). Further improvements are likely to require a lot of work for not a lot of benefit.

At this point I don't think we're seeing much of this error in either test or real deployments, so I'm marking this as "Won't Fix". We can revisit it later if the problem resurfaces.

Changed in keystone:
status: Triaged → Won't Fix