flush_expired_tokens API fails when more than 200k tokens in the token table

Bug #1609511 reported by Sam Leong
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Identity (keystone)
In Progress
Low
Sam Leong

Bug Description

When there are over 200k tokens need to be flushed, it results the following error in mysql error.log -

2016-07-29 02:00:18 40452 [Warning] WSREP: transaction size limit (1073741824) exceeded: 1073774592
2016-07-29 02:00:19 40452 [ERROR] WSREP: rbr write fail, data_len: 0, 2

In the current code, it triggers a series of DELETE for a batch of 1000 tokens until all the expired tokens are deleted. However, all the DELETE transactions are not yet committed until the method exits.

For big size of transactions like that, we definitely cannot reply on the auto-commit in the end. Instead, explicit commit should be triggered for every single transaction.

The following is the error in keystone.log -

289518 27060 (keystone): 2016-07-29 01:15:25,397 CRITICAL log logging_excepthook OperationalError: (_mysql_exceptions.OperationalError) (1180, 'Got error 5 during COMMIT')
1289519 Traceback (most recent call last):
1289520 File "/opt/stack/service/keystone-20160719T001534Z/venv/bin//keystone-manage", line 10, in <module>
1289521 sys.exit(main())
1289522 File "/opt/stack/venv/keystone-20160719T001534Z/lib/python2.7/site-packages/keystone/cmd/manage.py", line 47, in main
1289523 cli.main(argv=sys.argv, config_files=config_files)
1289524 File "/opt/stack/venv/keystone-20160719T001534Z/lib/python2.7/site-packages/keystone/cmd/cli.py", line 1010, in main
1289525 CONF.command.cmd_class.main()
1289526 File "/opt/stack/venv/keystone-20160719T001534Z/lib/python2.7/site-packages/keystone/cmd/cli.py", line 562, in main
1289527 token_manager.flush_expired_tokens()
1289528 File "/opt/stack/venv/keystone-20160719T001534Z/lib/python2.7/site-packages/keystone/token/persistence/backends/sql.py", line 286, in flush_expired_tokens
1289529 LOG.info(_LI('Total expired tokens removed: %d'), total_removed)
1289530 File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
1289531 self.gen.next()
1289532 File "/opt/stack/venv/keystone-20160719T001534Z/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 760, in _transaction_scope
1289533 yield resource
1289534 File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
1289535 self.gen.next()
1289536 File "/opt/stack/venv/keystone-20160719T001534Z/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 495, in _session
1289537 self._end_session_transaction(self.session)
1289538 File "/opt/stack/venv/keystone-20160719T001534Z/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 516, in _end_session_transaction
1289539 session.commit()
1289540 File "/opt/stack/venv/keystone-20160719T001534Z/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 801, in commit
1289541 self.transaction.commit()
1289542 File "/opt/stack/venv/keystone-20160719T001534Z/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 396, in commit
1289543 t[1].commit()
1289544 File "/opt/stack/venv/keystone-20160719T001534Z/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1574, in commit
1289545 self._do_commit()
1289546 File "/opt/stack/venv/keystone-20160719T001534Z/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1605, in _do_commit
1289547 self.connection._commit_impl()
1289548 File "/opt/stack/venv/keystone-20160719T001534Z/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 690, in _commit_impl
1289549 self._handle_dbapi_exception(e, None, None, None, None)
1289550 File "/opt/stack/venv/keystone-20160719T001534Z/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1337, in _handle_dbapi_exception
1289551 util.raise_from_cause(newraise, exc_info)
1289552 File "/opt/stack/venv/keystone-20160719T001534Z/lib/python2.7/site-packages/sqlalchemy/util/compat.py", line 200, in raise_from_cause
1289553 reraise(type(exception), exception, tb=exc_tb, cause=cause)
1289554 File "/opt/stack/venv/keystone-20160719T001534Z/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 688, in _commit_impl
1289555 self.engine.dialect.do_commit(self.connection)
1289556 File "/opt/stack/venv/keystone-20160719T001534Z/lib/python2.7/site-packages/sqlalchemy/dialects/mysql/base.py", line 2514, in do_commit
1289557 dbapi_connection.commit()
1289558 OperationalError: (_mysql_exceptions.OperationalError) (1180, 'Got error 5 during COMMIT')

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to keystone (master)

Fix proposed to branch: master
Review: https://review.openstack.org/351428

Changed in keystone:
assignee: nobody → Sam Leong (chio-fai-sam-leong)
status: New → In Progress
Revision history for this message
Steve Martinelli (stevemar) wrote :

How did the token table get that large? Was the cron job to prune old tokens not running?

Revision history for this message
Steve Martinelli (stevemar) wrote :

Not trying to challenge the validity of the bug, just trying to gauge it's severity

Revision history for this message
Sam Leong (chio-fai-sam-leong) wrote :

Steve, we are running a perf testing on this.

The test case is to create 8k VMs, each VM creation will need roughly 20 tokens and it will end up about 160k tokens in a day at some points. We run this as a cron job once a day and if we were to remove that many tokens, it will break the cron job and the number of tokens will keep accumulated.

Revision history for this message
Steve Martinelli (stevemar) wrote :

With a move toward non-persistence and the fact that this was brought up in a performance test, I'm lowering the severity of the bug. I'm sure someone in production could hit the error, but if that is the case, I'd recommend they use fernet tokens or run the cron job more frequently.

Changed in keystone:
importance: Undecided → Wishlist
importance: Wishlist → Low
Revision history for this message
Steve Martinelli (stevemar) wrote :

Looks like someone hit this in a production environment: https://bugs.launchpad.net/keystone/+bug/1649616

Revision history for this message
Lance Bragstad (lbragstad) wrote :

Could we track one with the other?

Revision history for this message
Steve Martinelli (stevemar) wrote :

@Lance, yes, we can, let's track that instead.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on keystone (master)

Change abandoned by Lance Bragstad (<email address hidden>) on branch: master
Review: https://review.openstack.org/351428
Reason: This was addressed with https://review.openstack.org/#/c/454351/

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.