Poor performance on delete_tokens due to missing indexes

Bug #1332666 reported by Haneef Ali
24
This bug affects 5 people
Affects Status Importance Assigned to Milestone
OpenStack Identity (keystone)
Fix Released
Medium
Justin Shepherd

Bug Description

Due to user deletion (e.g. when heat deletes a temporary user) all tokens in the active token list need to be scanned to match the user_id. Similarly, tokens that need to match the trust_id require a scan of all active tokens.

This causes an issue as we select all the columns (especially the extra column that contains a significant amount of data). At 15000 active tokens it is not impossible to load 15000x10k of token data for the scan. Under some circumstances this will cause a lot of work to be done, fill up buffers, and return 0 active tokens. Indexes on the user_id and trust_id columns in the token table solve this issue.

The issue primarily is seen when it locks up either an eventlet worker or a mod_wsgi process (as the DB queries do not yield to other coroutines in eventlet when using MySQLDB or consume the resources for the mod_wsgi worker until the results are returned).

The query looks like:

SELECT token.id AS token_id, token.expires AS token_expires, token.extra AS token_extra, token.valid AS token_valid, token.user_id AS token_user_id, token.trust_id AS token_trust_id FROM token WHERE token.valid = 1 AND token.expires > '2014-06-19 23:18:48.196884' AND token.user_id = 'f6d9db238d084998aaef92ce425edff0';

This query most of the time uses the index "idx_token_expires" which results in too many rows. Some times depending on the load using this index matches more than 50000 rows in our performance run which is as good as full table scan.

As the queries in question use "user_id" in the where clause, the above query can be optimized by adding index on user_id. The same performance run after adding the index on user_id doesn't show any degradation.

Revision history for this message
Dolph Mathews (dolph) wrote :

As an aside, the poll frequency for the revocation list is configurable. In addition, I hope you're purging expired tokens from your backend (e.g. keystone-manage token_flush) to reduce the number of persisted tokens.

I believe such an index was proposed in the past, and rejected... although I don't recall why (I'm sure the answer is buried in gerrit somewhere). I can't think of any reason why it wouldn't be an improvement?

tags: added: performance
Changed in keystone:
importance: Undecided → Medium
status: New → Triaged
tags: added: sql
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to keystone (master)

Fix proposed to branch: master
Review: https://review.openstack.org/102041

Changed in keystone:
assignee: nobody → Justin Shepherd (jshepher)
status: Triaged → In Progress
summary: - Kesytone token poor performance. Need index on user_id
+ Keystone token poor performance. Need index on user_id
Changed in keystone:
assignee: Justin Shepherd (jshepher) → Dolph Mathews (dolph)
Dolph Mathews (dolph)
Changed in keystone:
assignee: Dolph Mathews (dolph) → Justin Shepherd (jshepher)
Revision history for this message
Morgan Fainberg (mdrnstm) wrote : Re: Keystone token poor performance. Need index on user_id

I don't think this bug is specifically related to the /v2.0/revoked call. The revoked call itself is very specifically listing just expired tokens and would not directly benefit from an index on the user_id column:

 query = session.query(TokenModel.id, TokenModel.expires)
        query = query.filter(TokenModel.expires > now)
        token_references = query.filter_by(valid=False)

I am looking into the root cause of this query, it might be something we can optimize without needing to do a query for user_id in the WHERE clause for each call to retrieve the revoked list.

I am *very* concerned about performing any migrations on the token table that are not needed.

Revision history for this message
Morgan Fainberg (mdrnstm) wrote :

There is in-fact only a couple of specific cases we would generate the WHERE clause and lookup based upon user id in the token backend and all of these revolve around the use of delete_tokens and list_tokens, which should not be in the critical path for /v2.0/revoked. There is something else going on here.

Revision history for this message
Dolph Mathews (dolph) wrote :

The query in the bug report is likely caused by middleware, but it's not GET /v2.0/revoked -- it looks like UUID token validation (GET /v2.0/tokens/{token_id} or GET /v3/auths/tokens).

Revision history for this message
Morgan Fainberg (mdrnstm) wrote :

@Dolph, I don't think it's in the token validation line either, we only filter on user_id during a token_delete (for user, for project, for trust, consumer):

https://github.com/openstack/keystone/blob/master/keystone/token/backends/sql.py#L95
https://github.com/openstack/keystone/blob/master/keystone/token/backends/sql.py#L144
https://github.com/openstack/keystone/blob/master/keystone/token/backends/sql.py#L160

Those should only be called by:

https://github.com/openstack/keystone/blob/master/keystone/token/core.py#L338
https://github.com/openstack/keystone/blob/master/keystone/token/core.py#L173

UUID validation (and direct PKI validation) still do a token .get on the token_id which is an indexed PK, not filtered by user_id.

Revision history for this message
Dolph Mathews (dolph) wrote :

Morgan: you broke your own links above in your recent refactor to create a token.persistence package :) Use permalinks!

Revision history for this message
Morgan Fainberg (mdrnstm) wrote :

@dolph, dang it! I did, i usually try to do that but missed this time :P

Changed in keystone:
milestone: none → juno-rc1
Dolph Mathews (dolph)
Changed in keystone:
milestone: juno-rc1 → none
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on keystone (master)

Change abandoned by Morgan Fainberg (<email address hidden>) on branch: master
Review: https://review.openstack.org/102041
Reason: Abandoning this change as the conversation has not continued. Feel free to reopen the change, hit me up in IRC, or continue with the bug so we can help identify the reason you're seeing the high incidence of the where user_id query. Once we know the source of these queries it will be possible to either look at fixing the underlying issue or re-evaluate adding this index.

Revision history for this message
Morgan Fainberg (mdrnstm) wrote : Re: Keystone token poor performance. Need index on user_id

I'm not seeing where the call that would be spamming keystone for token lookup by user_id is. If we can track this down we can explore fixing that query or adding the index.

Changed in keystone:
status: In Progress → Incomplete
assignee: Justin Shepherd (jshepher) → nobody
Revision history for this message
Morgan Fainberg (mdrnstm) wrote :

After doing work with the Triple-O team looking into issues with builds failing, it looks like the source of the where clauses for user_id is heat deleting temporary users that have no tokens.

These queries can take a significant time depending on token data due to needing to scan all active tokens for the specific user.

Changed in keystone:
status: Incomplete → In Progress
Changed in keystone:
milestone: none → juno-rc1
Changed in keystone:
assignee: nobody → Morgan Fainberg (mdrnstm)
summary: - Keystone token poor performance. Need index on user_id
+ Poor performance on delete_tokens due to missing indexes
description: updated
Changed in keystone:
assignee: Morgan Fainberg (mdrnstm) → Justin Shepherd (jshepher)
tags: added: icehouse-backport-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to keystone (master)

Reviewed: https://review.openstack.org/102041
Committed: https://git.openstack.org/cgit/openstack/keystone/commit/?id=4a462a47fc95635ac542aa4259e7b3c461e72408
Submitter: Jenkins
Branch: master

commit 4a462a47fc95635ac542aa4259e7b3c461e72408
Author: galstrom21 <email address hidden>
Date: Mon Jun 23 17:30:03 2014 -0500

    Adding an index on token.user_id and token.trust_id

    This is a performance update to ensure that we are scanning the fewest
    number of rows on a user delete (causing token revocations). Without
    these indexes it is possible to scan all valid tokens, causing
    significant overhead, to find the user or trust matching tokens.

    Due to selecting the extra column (needed for other matches in
    some cases) this can also cause issues with buffer pool sizes.

    Change-Id: I202b5c87a221d8dba99d16b0a1baa7546fef093b
    Closes-Bug: 1332666

Changed in keystone:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in keystone:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in keystone:
milestone: juno-rc1 → 2014.2
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.