OpenStack Identity (keystone)

Poor performance on delete_tokens due to missing indexes

Bug #1332666 reported by Haneef Ali on 2014-06-20

This bug affects 5 people

Affects		Status	Importance	Assigned to	Milestone
	OpenStack Identity (keystone)	Fix Released	Medium	Justin Shepherd	OpenStack Identity (keystone) 2014.2 "juno"

Bug Description

Due to user deletion (e.g. when heat deletes a temporary user) all tokens in the active token list need to be scanned to match the user_id. Similarly, tokens that need to match the trust_id require a scan of all active tokens.

This causes an issue as we select all the columns (especially the extra column that contains a significant amount of data). At 15000 active tokens it is not impossible to load 15000x10k of token data for the scan. Under some circumstances this will cause a lot of work to be done, fill up buffers, and return 0 active tokens. Indexes on the user_id and trust_id columns in the token table solve this issue.

The issue primarily is seen when it locks up either an eventlet worker or a mod_wsgi process (as the DB queries do not yield to other coroutines in eventlet when using MySQLDB or consume the resources for the mod_wsgi worker until the results are returned).

The query looks like:

SELECT token.id AS token_id, token.expires AS token_expires, token.extra AS token_extra, token.valid AS token_valid, token.user_id AS token_user_id, token.trust_id AS token_trust_id FROM token WHERE token.valid = 1 AND token.expires > '2014-06-19 23:18:48.196884' AND token.user_id = 'f6d9db238d084998aaef92ce425edff0';

This query most of the time uses the index "idx_token_expires" which results in too many rows. Some times depending on the load using this index matches more than 50000 rows in our performance run which is as good as full table scan.

As the queries in question use "user_id" in the where clause, the above query can be optimized by adding index on user_id. The same performance run after adding the index on user_id doesn't show any degradation.

See original description

Tags:

Revision history for this message

Dolph Mathews (dolph) wrote on 2014-06-23:

As an aside, the poll frequency for the revocation list is configurable. In addition, I hope you're purging expired tokens from your backend (e.g. keystone-manage token_flush) to reduce the number of persisted tokens.

I believe such an index was proposed in the past, and rejected... although I don't recall why (I'm sure the answer is buried in gerrit somewhere). I can't think of any reason why it wouldn't be an improvement?

tags:	added: performance
Changed in keystone:
importance:	Undecided → Medium
status:	New → Triaged
tags:	added: sql

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2014-06-23: Fix proposed to keystone (master)

Fix proposed to branch: master
Review: https://review.openstack.org/102041

Changed in keystone:
assignee:	nobody → Justin Shepherd (jshepher)
status:	Triaged → In Progress

Lance Bragstad (lbragstad) on 2014-07-02

summary:

- Kesytone token poor performance. Need index on user_id
+ Keystone token poor performance. Need index on user_id

OpenStack Infra (hudson-openstack) on 2014-07-03

Changed in keystone:
assignee:	Justin Shepherd (jshepher) → Dolph Mathews (dolph)

Dolph Mathews (dolph) on 2014-07-03

Changed in keystone:
assignee:	Dolph Mathews (dolph) → Justin Shepherd (jshepher)

Revision history for this message

Morgan Fainberg (mdrnstm) wrote on 2014-07-22: Re: Keystone token poor performance. Need index on user_id

I don't think this bug is specifically related to the /v2.0/revoked call. The revoked call itself is very specifically listing just expired tokens and would not directly benefit from an index on the user_id column:

query = session.query(TokenModel.id, TokenModel.expires)
query = query.filter(TokenModel.expires > now)
token_references = query.filter_by(valid=False)

I am looking into the root cause of this query, it might be something we can optimize without needing to do a query for user_id in the WHERE clause for each call to retrieve the revoked list.

I am *very* concerned about performing any migrations on the token table that are not needed.

Revision history for this message

Morgan Fainberg (mdrnstm) wrote on 2014-07-22:

There is in-fact only a couple of specific cases we would generate the WHERE clause and lookup based upon user id in the token backend and all of these revolve around the use of delete_tokens and list_tokens, which should not be in the critical path for /v2.0/revoked. There is something else going on here.

Revision history for this message

Dolph Mathews (dolph) wrote on 2014-07-22:

The query in the bug report is likely caused by middleware, but it's not GET /v2.0/revoked -- it looks like UUID token validation (GET /v2.0/tokens/{token_id} or GET /v3/auths/tokens).

Revision history for this message

Morgan Fainberg (mdrnstm) wrote on 2014-07-22:

@Dolph, I don't think it's in the token validation line either, we only filter on user_id during a token_delete (for user, for project, for trust, consumer):

https://github.com/openstack/keystone/blob/master/keystone/token/backends/sql.py#L95
https://github.com/openstack/keystone/blob/master/keystone/token/backends/sql.py#L144
https://github.com/openstack/keystone/blob/master/keystone/token/backends/sql.py#L160

Those should only be called by:

https://github.com/openstack/keystone/blob/master/keystone/token/core.py#L338
https://github.com/openstack/keystone/blob/master/keystone/token/core.py#L173

UUID validation (and direct PKI validation) still do a token .get on the token_id which is an indexed PK, not filtered by user_id.

Revision history for this message

Dolph Mathews (dolph) wrote on 2014-07-29:

Morgan: you broke your own links above in your recent refactor to create a token.persistence package :) Use permalinks!

Revision history for this message

Morgan Fainberg (mdrnstm) wrote on 2014-07-29:

@dolph, dang it! I did, i usually try to do that but missed this time :P

Lance Bragstad (lbragstad) on 2014-09-04

Changed in keystone:
milestone:	none → juno-rc1

Dolph Mathews (dolph) on 2014-09-08

Changed in keystone:
milestone:	juno-rc1 → none

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2014-09-17: Change abandoned on keystone (master)

Change abandoned by Morgan Fainberg (<email address hidden>) on branch: master
Review: https://review.openstack.org/102041
Reason: Abandoning this change as the conversation has not continued. Feel free to reopen the change, hit me up in IRC, or continue with the bug so we can help identify the reason you're seeing the high incidence of the where user_id query. Once we know the source of these queries it will be possible to either look at fixing the underlying issue or re-evaluate adding this index.

Revision history for this message

Morgan Fainberg (mdrnstm) wrote on 2014-09-21: Re: Keystone token poor performance. Need index on user_id

#10

I'm not seeing where the call that would be spamming keystone for token lookup by user_id is. If we can track this down we can explore fixing that query or adding the index.

Changed in keystone:
status:	In Progress → Incomplete
assignee:	Justin Shepherd (jshepher) → nobody

Revision history for this message

Morgan Fainberg (mdrnstm) wrote on 2014-09-25:

#11

After doing work with the Triple-O team looking into issues with builds failing, it looks like the source of the where clauses for user_id is heat deleting temporary users that have no tokens.

These queries can take a significant time depending on token data due to needing to scan all active tokens for the specific user.

Changed in keystone:
status:	Incomplete → In Progress

Morgan Fainberg (mdrnstm) on 2014-09-25

Changed in keystone:
milestone:	none → juno-rc1

OpenStack Infra (hudson-openstack) on 2014-09-25

Changed in keystone:
assignee:	nobody → Morgan Fainberg (mdrnstm)

Morgan Fainberg (mdrnstm) on 2014-09-25

summary:

- Keystone token poor performance. Need index on user_id
+ Poor performance on delete_tokens due to missing indexes

Morgan Fainberg (mdrnstm) on 2014-09-25

description:	updated
Changed in keystone:
assignee:	Morgan Fainberg (mdrnstm) → Justin Shepherd (jshepher)
tags:	added: icehouse-backport-potential

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2014-09-26: Fix merged to keystone (master)

#12

Reviewed: https://review.openstack.org/102041
Committed: https://git.openstack.org/cgit/openstack/keystone/commit/?id=4a462a47fc95635ac542aa4259e7b3c461e72408
Submitter: Jenkins
Branch: master

commit 4a462a47fc95635ac542aa4259e7b3c461e72408
Author: galstrom21 <email address hidden>
Date: Mon Jun 23 17:30:03 2014 -0500

Adding an index on token.user_id and token.trust_id

    This is a performance update to ensure that we are scanning the fewest
    number of rows on a user delete (causing token revocations). Without
    these indexes it is possible to scan all valid tokens, causing
    significant overhead, to find the user or trust matching tokens.

Due to selecting the extra column (needed for other matches in
some cases) this can also cause issues with buffer pool sizes.

Change-Id: I202b5c87a221d8dba99d16b0a1baa7546fef093b
Closes-Bug: 1332666

Changed in keystone:
status:	In Progress → Fix Committed

Thierry Carrez (ttx) on 2014-09-30

Changed in keystone:
status:	Fix Committed → Fix Released

Thierry Carrez (ttx) on 2014-10-16

Changed in keystone:
milestone:	juno-rc1 → 2014.2

Report a bug

This report contains Public information

Everyone can see this information.

Duplicates of this bug

Bug #1277298

You are

Subscribing...

Edit bug mail

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.