Keystone Token Flush job does not complete in HA deployed environment

Bug #1649616 reported by Alex Krzos on 2016-12-13
This bug affects 6 people

Affects                          Importance  Assigned to
OpenStack Identity (keystone)    Medium      Peter Sabaini
  Newton                         Medium      Raildo Mascena de Sousa Filho
  Ocata                          Medium      Raildo Mascena de Sousa Filho
Ubuntu Cloud Archive             Undecided   Jorge Niedbalski
  Mitaka                         Medium      Jorge Niedbalski
  Newton                         Medium      Jorge Niedbalski
  Ocata                          Medium      Jorge Niedbalski
puppet-keystone                  Medium      Juan Antonio Osorio Robles
tripleo                          Medium      Juan Antonio Osorio Robles
keystone (Ubuntu)                High        Jorge Niedbalski
  Xenial                         High        Jorge Niedbalski
  Yakkety                        High        Jorge Niedbalski
  Zesty                          High        Jorge Niedbalski

Bug Description

[Impact]

 * The Keystone token flush job can get into a state where it will never complete, because the transaction size exceeds the MySQL Galera transaction size limit, wsrep_max_ws_size (1073741824).

[Test Case]

1. Authenticate many times to accumulate expired tokens
2. Observe that the keystone token flush job runs for a very long time depending on disk speed (>20 hours in my environment)
3. Observe errors in mysql.log indicating a transaction that is too large

Actual results:
Expired tokens are not actually flushed from the database, and keystone.log shows no errors. Errors appear only in mysql.log.

Expected results:
Expired tokens are removed from the database.

[Additional info]

It is likely that you can demonstrate this with fewer than 1 million tokens, since the >1 million row token table is larger than 13 GiB while the maximum transaction size is 1 GiB; my token benchmarking Browbeat job creates more tokens than needed.
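To make the mismatch concrete, here is a back-of-the-envelope calculation using the figures from this report. The per-row cost is derived from those figures and is only an estimate, not a measured Galera write-set size:

```python
# Rough, illustrative arithmetic for when a single-transaction token flush
# exceeds Galera's wsrep_max_ws_size. The table size and row count come from
# this report; bytes-per-row is a derived estimate, not a measured value.

WSREP_MAX_WS_SIZE = 1073741824     # 1 GiB, the limit seen in mysqld.log
TABLE_SIZE_BYTES = 13 * 1024**3    # >13 GiB token table reported above
TOKEN_ROWS = 1_000_000             # >1 million rows reported above

bytes_per_row = TABLE_SIZE_BYTES / TOKEN_ROWS             # ~13.6 KiB per row
rows_per_transaction = WSREP_MAX_WS_SIZE // bytes_per_row

print(f"~{bytes_per_row / 1024:.1f} KiB per row")
print(f"a 1 GiB write-set covers roughly {int(rows_per_transaction):,} rows")
```

By this estimate a single delete transaction can cover only on the order of ~77k rows before hitting the limit, while over a million expired rows had accumulated, so a one-shot flush cannot succeed.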

Once the token flush job can no longer complete, the token table will never decrease in size, and eventually the cloud will run out of disk space.

Furthermore, the flush job consumes disk I/O resources. This was demonstrated on slow disks (a single 7.2K SATA disk). On faster disks you have more capacity to generate tokens, so you can exceed the transaction size limit even faster.

Log evidence:
[root@overcloud-controller-0 log]# grep " Total expired" /var/log/keystone/keystone.log
2016-12-08 01:33:40.530 21614 INFO keystone.token.persistence.backends.sql [-] Total expired tokens removed: 1082434
2016-12-09 09:31:25.301 14120 INFO keystone.token.persistence.backends.sql [-] Total expired tokens removed: 1084241
2016-12-11 01:35:39.082 4223 INFO keystone.token.persistence.backends.sql [-] Total expired tokens removed: 1086504
2016-12-12 01:08:16.170 32575 INFO keystone.token.persistence.backends.sql [-] Total expired tokens removed: 1087823
2016-12-13 01:22:18.121 28669 INFO keystone.token.persistence.backends.sql [-] Total expired tokens removed: 1089202
[root@overcloud-controller-0 log]# tail mysqld.log
161208 1:33:41 [Warning] WSREP: transaction size limit (1073741824) exceeded: 1073774592
161208 1:33:41 [ERROR] WSREP: rbr write fail, data_len: 0, 2
161209 9:31:26 [Warning] WSREP: transaction size limit (1073741824) exceeded: 1073774592
161209 9:31:26 [ERROR] WSREP: rbr write fail, data_len: 0, 2
161211 1:35:39 [Warning] WSREP: transaction size limit (1073741824) exceeded: 1073774592
161211 1:35:40 [ERROR] WSREP: rbr write fail, data_len: 0, 2
161212 1:08:16 [Warning] WSREP: transaction size limit (1073741824) exceeded: 1073774592
161212 1:08:17 [ERROR] WSREP: rbr write fail, data_len: 0, 2
161213 1:22:18 [Warning] WSREP: transaction size limit (1073741824) exceeded: 1073774592
161213 1:22:19 [ERROR] WSREP: rbr write fail, data_len: 0, 2

The disk utilization graph is attached. The job in that graph starts at the first spike in disk utilization (~5:18 UTC) and culminates in about 90 minutes of pegging the disk (between 1:09 UTC and 2:43 UTC).

[Regression Potential]
* Not identified

Related branches

Steve Martinelli (stevemar) wrote :

Hi Alex,

So someone finally hit bug https://bugs.launchpad.net/keystone/+bug/1609511 in a production environment.

A few questions so I can gauge the severity properly.

  - Is it possible to run the cron job that flushes tokens more frequently?

  - What is preventing you from moving to fernet tokens? They are not persisted.

Lastly, Sam Leong had a patch for this issue, it's available here: https://review.openstack.org/#/c/351428/

Unfortunately, Sam has stopped working on keystone these days.

Alex Krzos (akrzos) wrote :

Hi Steve,

Actually this was not on a production cloud but a performance testing cloud. This bug is likely a duplicate of the one you linked me to. One major difference: unlike the previous bug, there are no errors in the keystone log file, though there is still an error in the mysql log file.

To answer your questions (in the context of performance testing):

  - Is it possible to run the cron job that flushes tokens more frequently?

I suppose we can do this; to benchmark more "deterministically" we should probably clean out the token table prior to each iteration (357 iterations during this test).

  - What is preventing you from moving to fernet tokens? They are not persisted.

I am actually running the same benchmarks with Fernet tokens as well.

As you already pointed out, there are several options to address this:
* Increase frequency of token flushing (at expense of the resource consumption demanded by token flushes)
* Switch to Fernet tokens (No need for a flush, but different performance characteristics)
* Increase size of allowed mysql transaction (increase the size of the flush)

Depending upon the workload in the cloud, some options might be better I suppose.
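For the first option, a rough way to size the cron frequency: keep the number of rows that accumulate between flushes under the per-transaction row budget. All inputs below are assumptions to plug your own numbers into; the 1 GiB limit comes from this report and the per-row figure is the estimate derived from it:

```python
# Back-of-the-envelope helper for the "flush more frequently" option:
# how often must token_flush run so that a single-transaction delete stays
# under the Galera write-set limit?  Illustrative only; measure your own
# token creation rate and per-row cost before relying on the result.

def max_flush_interval_hours(tokens_per_hour,
                             bytes_per_row=13959,          # ~13.6 KiB estimate
                             wsrep_max_ws_size=1073741824):  # 1 GiB limit
    """Longest interval (hours) between flushes that keeps one delete
    transaction under wsrep_max_ws_size."""
    rows_budget = wsrep_max_ws_size / bytes_per_row
    return rows_budget / tokens_per_hour

# e.g. a cloud minting 10,000 tokens per hour
print(f"flush at least every {max_flush_interval_hours(10_000):.1f} hours")
```

This also shows why a daily cron is risky for busy clouds: at ~10k tokens/hour the budget is exhausted in well under a day, which matches the later moves to twice-daily and then hourly flushes.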

Lance Bragstad (lbragstad) wrote :

Alex - is the only difference between this report and https://bugs.launchpad.net/keystone/+bug/1609511 the fact that 1609511 has a more detailed error in the logs? What logging level was set when you ran into this issue?

Changed in keystone:
importance: Undecided → Medium

What should someone do in the case this is hit in production?

Fernet could be used, but the migration can take some time. What should be done in the meantime?

On the other hand, nobody has hit this in production yet (at least not reported publicly).
Should we mark this as won't fix for now rather than working on a fix that is not useful for anyone now?

Alex Krzos (akrzos) wrote :

Lance,

I reviewed the log from the other bug (1609511). Another question here is why, on that cloud, the error was logged in keystone.log, whereas on my cloud the error showed up in mysql.log but keystone.log was silent. I will re-run on another setup I have to see if this is still the case, or if I just missed the fact that keystone does log it, or if something is configured to silence that issue.

Alex Krzos (akrzos) wrote :

I reran to confirm on a similar setup but with a slightly better disk layout (RAID 1 Mirror rather than single SATA disk) and the results are the same. I reviewed the Keystone log files and did find the same stack trace as the other bug. So this is logged correctly with the deployment that I am using.

Aside from the tokens never clearing and the job running for almost 24 hours, when this job runs and doesn't complete, the cloud still pays the same penalty in disk utilization attempting to remove the expired tokens (every day thereafter).

Here is the log output related to the unsuccessful token flush. I will also attach another disk I/O %util graph for the controller whose database processes handled the requests from the token flush process.

2017-01-06 03:43:05.810 1013422 INFO keystone.token.persistence.backends.sql [-] Total expired tokens removed: 1003291 [25/213]
2017-01-06 04:06:23.540 1013422 ERROR oslo_db.sqlalchemy.exc_filters [-] DBAPIError exception wrapped from (pymysql.err.InternalError) (1180, u'Got error 5 during COMMIT')
2017-01-06 04:06:23.540 1013422 ERROR oslo_db.sqlalchemy.exc_filters Traceback (most recent call last):
2017-01-06 04:06:23.540 1013422 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 688, in _commit_impl
2017-01-06 04:06:23.540 1013422 ERROR oslo_db.sqlalchemy.exc_filters self.engine.dialect.do_commit(self.connection)
2017-01-06 04:06:23.540 1013422 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib64/python2.7/site-packages/sqlalchemy/dialects/mysql/base.py", line 2514, in do_commit
2017-01-06 04:06:23.540 1013422 ERROR oslo_db.sqlalchemy.exc_filters dbapi_connection.commit()
2017-01-06 04:06:23.540 1013422 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 719, in commit
2017-01-06 04:06:23.540 1013422 ERROR oslo_db.sqlalchemy.exc_filters self._read_ok_packet()
2017-01-06 04:06:23.540 1013422 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 698, in _read_ok_packet
2017-01-06 04:06:23.540 1013422 ERROR oslo_db.sqlalchemy.exc_filters pkt = self._read_packet()
2017-01-06 04:06:23.540 1013422 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 906, in _read_packet
2017-01-06 04:06:23.540 1013422 ERROR oslo_db.sqlalchemy.exc_filters packet.check_error()
2017-01-06 04:06:23.540 1013422 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 367, in check_error
2017-01-06 04:06:23.540 1013422 ERROR oslo_db.sqlalchemy.exc_filters err.raise_mysql_exception(self._data)
2017-01-06 04:06:23.540 1013422 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib/python2.7/site-packages/pymysql/err.py", line 120, in raise_mysql_exception
2017-01-06 04:06:23.540 1013422 ERROR oslo_db.sqlalchemy.exc_filters _check_mysql_exception(errinfo)
2017-01-06 04:06:23.540 1013422 ERROR oslo_db.sqlalchemy.exc_filters File "...


Alex Krzos (akrzos) wrote :

Do note that this is at the "end" of the token flush job. The actual job ran for much longer, but this end part pegs the disk out attempting to commit.

tags: added: canonical-bootstack
Peter Sabaini (peter-sabaini) wrote :

We seem to be hitting this in production.

Mysql log has:

170309 9:06:23 [Warning] WSREP: transaction size limit (1073774592) exceeded: 1073807360
170309 9:06:23 [ERROR] WSREP: rbr write fail, data_len: 0, 2

The leftover GRA log files contain only DELETE statements for keystone.token afaict.

Keystone log contains:

(oslo_db.sqlalchemy.exc_filters): 2017-03-09 09:07:06,576 ERROR DBAPIError exception wrapped from (pymysql.err.InternalError) (1180, u'Got error 5 during COMMIT')
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 688, in _commit_impl
    self.engine.dialect.do_commit(self.connection)
  File "/usr/lib/python2.7/dist-packages/sqlalchemy/dialects/mysql/base.py", line 2514, in do_commit
    dbapi_connection.commit()
  File "/usr/lib/python2.7/dist-packages/pymysql/connections.py", line 758, in commit
    self._read_ok_packet()
  File "/usr/lib/python2.7/dist-packages/pymysql/connections.py", line 737, in _read_ok_packet
    pkt = self._read_packet()
  File "/usr/lib/python2.7/dist-packages/pymysql/connections.py", line 966, in _read_packet
    packet.check_error()
  File "/usr/lib/python2.7/dist-packages/pymysql/connections.py", line 394, in check_error
    err.raise_mysql_exception(self._data)
  File "/usr/lib/python2.7/dist-packages/pymysql/err.py", line 120, in raise_mysql_exception
    _check_mysql_exception(errinfo)
  File "/usr/lib/python2.7/dist-packages/pymysql/err.py", line 115, in _check_mysql_exception
    raise InternalError(errno, errorvalue)
InternalError: (1180, u'Got error 5 during COMMIT')

The token table has ~500k entries.

Lance Bragstad (lbragstad) wrote :

What if it were possible to give `keystone-manage token_flush` a batch size as an argument? This would give deployments with massive amounts of expired tokens the ability to clean them out in smaller batches.

It's certainly more intensive for the operator and really only makes sense for recovery situations like this. Thoughts?

Peter Sabaini (peter-sabaini) wrote :

Lance, token_flush already batches fine; batching was introduced with the fix for bug #1188378.

The problem, afaict, is just that the whole flush is done in a single transaction, with the result that it can again break replication.
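The fix that eventually landed commits between batches instead of deleting everything in one transaction. A minimal, self-contained sketch of that idea, using sqlite3 and a toy schema for illustration (this is not keystone's actual code, and the table and column names are made up):

```python
import sqlite3
import time

def flush_expired_tokens(conn, batch_size=1000):
    """Delete expired rows in batches, committing after each batch so that
    no single transaction's write-set grows unbounded (the core idea of the
    keystone fix, sketched here against a toy sqlite3 schema)."""
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM token WHERE id IN ("
            "  SELECT id FROM token WHERE expires < ? LIMIT ?)",
            (time.time(), batch_size))
        conn.commit()   # commit per batch: each Galera write-set stays small
        total += cur.rowcount
        if cur.rowcount < batch_size:
            return total

# Toy demo: 2500 expired rows flushed in batches of 1000.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE token (id INTEGER PRIMARY KEY, expires REAL)")
conn.executemany("INSERT INTO token (expires) VALUES (?)",
                 [(time.time() - 60,) for _ in range(2500)])
conn.commit()
print(flush_expired_tokens(conn))   # → 2500
```

The per-batch commit is what distinguishes this from the earlier behavior Peter describes: the deletes were already batched, but a single enclosing transaction made the total write-set exceed wsrep_max_ws_size anyway.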

Fix proposed to branch: master
Review: https://review.openstack.org/454351

Changed in keystone:
assignee: nobody → Peter Sabaini (peter-sabaini)
status: New → In Progress
Changed in puppet-keystone:
assignee: nobody → Juan Antonio Osorio Robles (juan-osorio-robles)
Changed in tripleo:
assignee: nobody → Juan Antonio Osorio Robles (juan-osorio-robles)

Reviewed: https://review.openstack.org/456086
Committed: https://git.openstack.org/cgit/openstack/puppet-keystone/commit/?id=bce06b24ac9439329fd8973a477f1ff9aca7be15
Submitter: Jenkins
Branch: master

commit bce06b24ac9439329fd8973a477f1ff9aca7be15
Author: Juan Antonio Osorio Robles <email address hidden>
Date: Wed Apr 12 10:09:28 2017 +0300

    Run token flush cron job twice a day

    Running the token flush once a day is seldom for larger deployments.
    This changes the frequency to be twice a day instead of once.

    Change-Id: Ia0b0fb422318712f4b0f4d023cbb3a61d40bb85d
    Related-Bug: #1649616

Changed in tripleo:
milestone: none → pike-2
Changed in puppet-keystone:
status: New → Triaged
Changed in tripleo:
status: New → Triaged
Changed in puppet-keystone:
importance: Undecided → Medium
Changed in tripleo:
importance: Undecided → Medium

Reviewed: https://review.openstack.org/456182
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=65e643aca2202f031db94f1ccd3d44e195e5e772
Submitter: Jenkins
Branch: master

commit 65e643aca2202f031db94f1ccd3d44e195e5e772
Author: Juan Antonio Osorio Robles <email address hidden>
Date: Wed Apr 12 14:31:53 2017 +0300

    Run token flush cron job hourly by default

    Running this job once a day has proven problematic for large
    deployments as seen in the bug report. Setting it to run hourly
    would be an improvement to the current situation, as the flushes
    wouldn't need to process as much data.

    Note that this only affects people using UUID as the token provider.

    Change-Id: I462e4da2bfdbcba0403ecde5d613386938e2283a
    Related-Bug: #1649616

Reviewed: https://review.openstack.org/457553
Committed: https://git.openstack.org/cgit/openstack/puppet-keystone/commit/?id=f694b5551f896042df6aeb751c65986ef3342f54
Submitter: Jenkins
Branch: master

commit f694b5551f896042df6aeb751c65986ef3342f54
Author: Juan Antonio Osorio Robles <email address hidden>
Date: Tue Apr 18 13:13:27 2017 +0300

    Change keystone token flush to run hourly

    In a recent commit [1] the keystone token flush cron job was changed to
    run twice a day. However, this change was not enough for big
    deployments.

    After getting some customer feedback and looking at what other projects
    are doing [2] [3] [4]. It seems that running this job hourly is the way
    to go.

    [1] Ia0b0fb422318712f4b0f4d023cbb3a61d40bb85d
    [2] https://www.ibm.com/support/knowledgecenter/en/SSB27U_6.4.0/com.ibm.zvm.v640.hcpo4/exptoken.htm
    [3] https://review.openstack.org/#/c/88670/8
    [4] https://github.com/openstack/charm-keystone/blob/master/templates/keystone-token-flush

    Change-Id: I6ec7ec8111bd93e5638cfe96189e36f0e0691d65
    Related-Bug: #1649616

Reviewed: https://review.openstack.org/454351
Committed: https://git.openstack.org/cgit/openstack/keystone/commit/?id=dc7f81083180eeb5233f7007e3d2514cc0d7c6d3
Submitter: Jenkins
Branch: master

commit dc7f81083180eeb5233f7007e3d2514cc0d7c6d3
Author: Peter Sabaini <email address hidden>
Date: Thu Apr 6 23:06:29 2017 +0200

    Make flushing tokens more robust

    Commit token flushes between batches in order to lower resource
    consumption and make flushing more robust for replication

    Change-Id: I9be37e420353a336a8acd820eadd47d4bcf7324f
    Closes-Bug: #1649616

Changed in keystone:
status: In Progress → Fix Released

Reviewed: https://review.openstack.org/458002
Committed: https://git.openstack.org/cgit/openstack/puppet-keystone/commit/?id=90ffc7f6008370e9d9893ab69f31683352c854c3
Submitter: Jenkins
Branch: stable/ocata

commit 90ffc7f6008370e9d9893ab69f31683352c854c3
Author: Juan Antonio Osorio Robles <email address hidden>
Date: Tue Apr 18 13:13:27 2017 +0300

    Change keystone token flush to run hourly

    In a recent commit [1] the keystone token flush cron job was changed to
    run twice a day. However, this change was not enough for big
    deployments.

    After getting some customer feedback and looking at what other projects
    are doing [2] [3] [4]. It seems that running this job hourly is the way
    to go.

    [1] Ia0b0fb422318712f4b0f4d023cbb3a61d40bb85d
    [2] https://www.ibm.com/support/knowledgecenter/en/SSB27U_6.4.0/com.ibm.zvm.v640.hcpo4/exptoken.htm
    [3] https://review.openstack.org/#/c/88670/8
    [4] https://github.com/openstack/charm-keystone/blob/master/templates/keystone-token-flush

    Conflicts:
     manifests/cron/token_flush.pp
     spec/acceptance/keystone_federation_identity_provider_spec.rb
     spec/acceptance/keystone_federation_shibboleth_spec.rb
     spec/acceptance/keystone_wsgi_apache_spec.rb
     spec/classes/keystone_cron_token_flush_spec.rb

    (cherry picked from commit f694b5551f896042df6aeb751c65986ef3342f54)
    Change-Id: I6ec7ec8111bd93e5638cfe96189e36f0e0691d65
    Related-Bug: #1649616

tags: added: in-stable-ocata

Reviewed: https://review.openstack.org/458005
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=c1fc74c0f3a8ba34032ac40ee67ef3bc4b7c9d3e
Submitter: Jenkins
Branch: stable/ocata

commit c1fc74c0f3a8ba34032ac40ee67ef3bc4b7c9d3e
Author: Juan Antonio Osorio Robles <email address hidden>
Date: Wed Apr 12 14:31:53 2017 +0300

    Run token flush cron job hourly by default

    Running this job once a day has proven problematic for large
    deployments as seen in the bug report. Setting it to run hourly
    would be an improvement to the current situation, as the flushes
    wouldn't need to process as much data.

    Note that this only affects people using UUID as the token provider.

    Change-Id: I462e4da2bfdbcba0403ecde5d613386938e2283a
    Related-Bug: #1649616
    (cherry picked from commit 65e643aca2202f031db94f1ccd3d44e195e5e772)

tags: added: sts
Changed in keystone (Ubuntu Xenial):
importance: Undecided → Medium
Changed in keystone (Ubuntu Yakkety):
importance: Undecided → Medium
Changed in keystone (Ubuntu Zesty):
importance: Undecided → Medium
Changed in keystone (Ubuntu Xenial):
status: New → Triaged
Changed in keystone (Ubuntu Yakkety):
status: New → Triaged
Changed in keystone (Ubuntu Zesty):
status: New → Triaged
Changed in cloud-archive:
assignee: nobody → Jorge Niedbalski (niedbalski)
status: New → In Progress
Changed in keystone (Ubuntu):
assignee: nobody → Jorge Niedbalski (niedbalski)
importance: Undecided → High
status: New → In Progress
Changed in keystone (Ubuntu Xenial):
assignee: nobody → Jorge Niedbalski (niedbalski)
status: Triaged → In Progress
Changed in keystone (Ubuntu Yakkety):
assignee: nobody → Jorge Niedbalski (niedbalski)
importance: Medium → High
status: Triaged → In Progress
Changed in keystone (Ubuntu Zesty):
assignee: nobody → Jorge Niedbalski (niedbalski)
importance: Medium → High
status: Triaged → In Progress
Changed in keystone (Ubuntu Xenial):
importance: Medium → High
Eric Desrochers (slashd) on 2017-05-31
tags: added: sts-sru-needed
description: updated
James Page (james-page) wrote :

Uploaded to xenial-proposed for SRU team review.

James Page (james-page) wrote :

Uploaded to yakkety and zesty proposed for SRU team review.

Eric Desrochers (slashd) on 2017-06-08
tags: added: sts-sru-done
removed: sts-sru-needed
Edward Hope-Morley (hopem) wrote :

@slashd this sru is not finished yet so i'll re-instate the -needed tag (that we use for tracking purposes)

tags: added: sts-sru-needed
removed: sts-sru-done
Eric Desrochers (slashd) on 2017-06-08
tags: added: sts-sru-needd
removed: sts-sru-needed
tags: added: sts-sru-needed
removed: sts-sru-needd
Changed in tripleo:
milestone: pike-2 → pike-3

This issue was fixed in the openstack/keystone 12.0.0.0b2 development milestone.

Hello Alex, or anyone else affected,

Accepted keystone into zesty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/keystone/2:11.0.2-0ubuntu1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in keystone (Ubuntu Zesty):
status: In Progress → Fix Committed
tags: added: verification-needed
James Page (james-page) wrote :

Hello Alex, or anyone else affected,

Accepted keystone into ocata-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:ocata-proposed
  sudo apt-get update

Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-ocata-needed to verification-ocata-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-ocata-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-ocata-needed

Reviewed: https://review.openstack.org/469069
Committed: https://git.openstack.org/cgit/openstack/keystone/commit/?id=6074166b29a0546c92e3a1c152370d5b39c1dafe
Submitter: Jenkins
Branch: stable/ocata

commit 6074166b29a0546c92e3a1c152370d5b39c1dafe
Author: Peter Sabaini <email address hidden>
Date: Thu Apr 6 23:06:29 2017 +0200

    Make flushing tokens more robust

    Commit token flushes between batches in order to lower resource
    consumption and make flushing more robust for replication

    Change-Id: I9be37e420353a336a8acd820eadd47d4bcf7324f
    Closes-Bug: #1649616
    (cherry picked from commit dc7f81083180eeb5233f7007e3d2514cc0d7c6d3)

As part of a recent change in the Stable Release Update verification policy we would like to inform that for a bug to be considered verified for a given release a verification-done-$RELEASE tag needs to be added to the bug where $RELEASE is the name of the series the package that was tested (e.g. verification-done-xenial). Please note that the global 'verification-done' tag can no longer be used for this purpose.

Thank you!

Hello Alex, or anyone else affected,

Accepted keystone into yakkety-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/keystone/2:10.0.1-0ubuntu2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-yakkety to verification-done-yakkety. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-yakkety. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in keystone (Ubuntu Yakkety):
status: In Progress → Fix Committed
tags: added: verification-needed-yakkety
Changed in keystone (Ubuntu Xenial):
status: In Progress → Fix Committed
tags: added: verification-needed-xenial
Andy Whitcroft (apw) wrote :

Hello Alex, or anyone else affected,

Accepted keystone into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/keystone/2:9.3.0-0ubuntu2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

James Page (james-page) wrote :

Hello Alex, or anyone else affected,

Accepted keystone into mitaka-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:mitaka-proposed
  sudo apt-get update

Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-mitaka-needed to verification-mitaka-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-mitaka-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-mitaka-needed
James Page (james-page) wrote :

Hello Alex, or anyone else affected,

Accepted keystone into newton-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:newton-proposed
  sudo apt-get update

Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-newton-needed to verification-newton-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-newton-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-newton-needed
Chris Halse Rogers (raof) wrote :

Is testing for this bug part of the standard test-suite run for the point-release stable update (https://bugs.launchpad.net/ubuntu/+source/swift/+bug/1696139)?

If so, please mention it so that this can be released.

If not, please test this so that the point-releases can be released :).

Changed in tripleo:
status: Triaged → In Progress

Reviewed: https://review.openstack.org/480287
Committed: https://git.openstack.org/cgit/openstack/keystone/commit/?id=0b5c5c03ecb6cd261ec06b4e2465c8d88b8c1725
Submitter: Jenkins
Branch: master

commit 0b5c5c03ecb6cd261ec06b4e2465c8d88b8c1725
Author: Raildo Mascena <email address hidden>
Date: Tue Jul 4 14:10:16 2017 -0300

    Fixing flushing tokens workflow

    During a backport patch [0] for this fix
    it was found some problems in the previous
    approach like, It didn't enabled back the
    session.autocommit. Another comment was we should
    create a new session and commit on it instead of
    disable/enable autocommit.

    After this, we should backport this change in order
    to fix the previous releases, instead of the other
    one.

    [0] https://review.openstack.org/#/c/469514

    Change-Id: Ifc024ba0e86bb71f4ab8b019917782bc5bf3be7a
    Closes-Bug: #1649616

Reviewed: https://review.openstack.org/482552
Committed: https://git.openstack.org/cgit/openstack/puppet-keystone/commit/?id=c1bda5f81e35ec597f8d684a35c2c1446c8b1527
Submitter: Jenkins
Branch: stable/newton

commit c1bda5f81e35ec597f8d684a35c2c1446c8b1527
Author: Juan Antonio Osorio Robles <email address hidden>
Date: Tue Apr 18 13:13:27 2017 +0300

    Change keystone token flush to run hourly

    In a recent commit [1] the keystone token flush cron job was changed to
    run twice a day. However, this change was not enough for big
    deployments.

    After getting some customer feedback and looking at what other projects
    are doing [2] [3] [4]. It seems that running this job hourly is the way
    to go.

    [1] Ia0b0fb422318712f4b0f4d023cbb3a61d40bb85d
    [2] https://www.ibm.com/support/knowledgecenter/en/SSB27U_6.4.0/com.ibm.zvm.v640.hcpo4/exptoken.htm
    [3] https://review.openstack.org/#/c/88670/8
    [4] https://github.com/openstack/charm-keystone/blob/master/templates/keystone-token-flush

    Conflicts:
     manifests/cron/token_flush.pp
     spec/acceptance/keystone_federation_identity_provider_spec.rb
     spec/acceptance/keystone_federation_shibboleth_spec.rb
     spec/acceptance/keystone_wsgi_apache_spec.rb
     spec/classes/keystone_cron_token_flush_spec.rb

    (cherry picked from commit f694b5551f896042df6aeb751c65986ef3342f54)
    Change-Id: I6ec7ec8111bd93e5638cfe96189e36f0e0691d65
    Related-Bug: #1649616
    (cherry picked from commit 90ffc7f6008370e9d9893ab69f31683352c854c3)

tags: added: in-stable-newton
Jorge Niedbalski (niedbalski) wrote :

This is the verification that I ran through:

1) juju run --application keystone "sudo test -f /etc/cron.d/keystone-token-flush && echo 'Exists'" --format=yaml

2) Commented /etc/cron.d/keystone-token-flush

3) Killed all the mysql threads

  *) SELECT trx_mysql_thread_id as ID, trx_id, trx_query, trx_started, trx_state FROM information_schema.innodb_trx WHERE trx_state = "RUNNING" AND trx_query LIKE "DELETE FROM token%" ORDER BY trx_started;

  *) KILL thread_id;
  *) KILL CONNECTION thread_id;

4) Manually run the token_flush command

  *) sudo -H -u keystone bash -c "/usr/bin/keystone-manage token_flush > /tmp/token-flush.log"

After running this, provide the following details as attachments to this case. (Please connect to the database using the previously mentioned method.)

5) Re-enable the cronjob, and perform the following queries on the database to check
if there were any locks or stuck transactions.

SHOW open tables WHERE In_use > 0;
SELECT * FROM information_schema.innodb_locks;
SELECT * FROM information_schema.innodb_trx ORDER BY trx_started;
SHOW ENGINE innodb status;

Jorge Niedbalski (niedbalski) wrote :

For the case of xenial/mitaka, the results for step 5) are at http://pastebin.ubuntu.com/25082901/,
which indeed show the problem as solved.

Marking the verification as done on X/M.

tags: added: verification-done-xenial verification-mitaka-done
removed: verification-mitaka-needed verification-needed-xenial
tags: added: verification-needed-zesty
Jorge Niedbalski (niedbalski) wrote :

Same results for yakkety/newton for 5) http://pastebin.ubuntu.com/25084654/

Marking the verification as done for Y/N

tags: added: verification-done-yakkety verification-newton-done
removed: verification-needed-yakkety verification-newton-needed
Jorge Niedbalski (niedbalski) wrote :

Same results for ocata (http://pastebin.ubuntu.com/25085019/)

tags: added: verification-done verification-done-zesty verification-ocata-done
removed: verification-needed verification-needed-zesty verification-ocata-needed

Reviewed: https://review.openstack.org/482601
Committed: https://git.openstack.org/cgit/openstack/keystone/commit/?id=83fccfaf8dc2ae2f4b1d47b09d44cb5c995f4d52
Submitter: Jenkins
Branch: stable/ocata

commit 83fccfaf8dc2ae2f4b1d47b09d44cb5c995f4d52
Author: Raildo Mascena <email address hidden>
Date: Tue Jul 4 14:10:16 2017 -0300

    Fixing flushing tokens workflow

    During a backport [0] of this fix, some problems
    were found in the previous approach; for example,
    it didn't re-enable session.autocommit. Another
    review comment was that we should create a new
    session and commit on it instead of disabling and
    re-enabling autocommit.

    After this, we should backport this change to fix
    the previous releases, instead of the other one.

    [0] https://review.openstack.org/#/c/469514

    Change-Id: Ifc024ba0e86bb71f4ab8b019917782bc5bf3be7a
    Closes-Bug: #1649616
    (cherry picked from commit 0b5c5c03ecb6cd261ec06b4e2465c8d88b8c1725)
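The batched-commit approach this fix describes can be sketched as follows. This is a minimal illustration using SQLAlchemy, not keystone's actual code: the table handle, batch size, and session handling are assumptions. The point is that each batch of deletes commits in its own small transaction, so no single transaction can exceed the Galera write-set limit (wsrep_max_ws_size).

```python
import datetime

import sqlalchemy as sa


def flush_expired_tokens(engine, batch_size=1000):
    """Delete expired rows from the token table in small batches,
    committing after each batch (illustrative sketch, not keystone's
    real implementation)."""
    now = datetime.datetime.utcnow()
    # Lightweight table handle; the real keystone model has more columns.
    token = sa.table('token', sa.column('id'), sa.column('expires'))
    with engine.connect() as conn:
        while True:
            # Each loop iteration is its own (small) transaction.
            with conn.begin():
                rows = conn.execute(
                    sa.select(token.c.id)
                    .where(token.c.expires < now)
                    .limit(batch_size)).fetchall()
                if not rows:
                    break
                conn.execute(
                    sa.delete(token)
                    .where(token.c.id.in_([r[0] for r in rows])))
```

With per-batch commits, a crash or a killed thread mid-flush only loses the current batch, and replication never has to apply one giant write set.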

Reviewed: https://review.openstack.org/482606
Committed: https://git.openstack.org/cgit/openstack/keystone/commit/?id=058ea4262459be90f0721891b6ac2d29600e77c1
Submitter: Jenkins
Branch: stable/newton

commit 058ea4262459be90f0721891b6ac2d29600e77c1
Author: Raildo Mascena <email address hidden>
Date: Tue Jul 4 14:10:16 2017 -0300

    Fixing flushing tokens workflow

    During a backport [0] of this fix, some problems
    were found in the previous approach; for example,
    it didn't re-enable session.autocommit. Another
    review comment was that we should create a new
    session and commit on it instead of disabling and
    re-enabling autocommit.

    After this, we should backport this change to fix
    the previous releases, instead of the other one.

    [0] https://review.openstack.org/#/c/469514

    Change-Id: Ifc024ba0e86bb71f4ab8b019917782bc5bf3be7a
    Closes-Bug: #1649616
    (cherry picked from commit 0b5c5c03ecb6cd261ec06b4e2465c8d88b8c1725)

Change abandoned by Steve Martinelli (<email address hidden>) on branch: stable/newton
Review: https://review.openstack.org/469514
Reason: see referred patch in previous comment

The verification of the Stable Release Update for keystone has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package keystone - 2:11.0.2-0ubuntu1

---------------
keystone (2:11.0.2-0ubuntu1) zesty; urgency=medium

  [ Jorge Niedbalski ]
  * d/p/0001-Make-flushing-tokens-more-robust.patch: Commit token flushes
    between batches in order to lower resource consumption and
    make flushing more robust for replication (LP: #1649616).

  [ James Page ]
  * New upstream stable release for OpenStack Ocata (LP: #1696139).

 -- James Page <email address hidden> Wed, 07 Jun 2017 16:01:45 +0100

Changed in keystone (Ubuntu Zesty):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package keystone - 2:10.0.1-0ubuntu2

---------------
keystone (2:10.0.1-0ubuntu2) yakkety; urgency=high

  * d/p/0001-Make-flushing-tokens-more-robust.patch: Commit token flushes
    between batches in order to lower resource consumption and
    make flushing more robust for replication (LP: #1649616).

 -- Jorge Niedbalski <email address hidden> Wed, 07 Jun 2017 13:07:50 +0100

Changed in keystone (Ubuntu Yakkety):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package keystone - 2:9.3.0-0ubuntu2

---------------
keystone (2:9.3.0-0ubuntu2) xenial; urgency=high

  * d/p/0001-Make-flushing-tokens-more-robust.patch: Commit
    token flushes between batches in order to lower resource
    consumption and make flushing more robust for replication
    (LP: #1649616).

 -- Jorge Niedbalski <email address hidden> Wed, 07 Jun 2017 10:33:50 +0100

Changed in keystone (Ubuntu Xenial):
status: Fix Committed → Fix Released
James Page (james-page) wrote :

This bug was fixed in the package keystone - 2:11.0.2-0ubuntu1~cloud0
---------------

 keystone (2:11.0.2-0ubuntu1~cloud0) xenial-ocata; urgency=medium
 .
   * New upstream release for the Ubuntu Cloud Archive.
 .
 keystone (2:11.0.2-0ubuntu1) zesty; urgency=medium
 .
   [ Jorge Niedbalski ]
   * d/p/0001-Make-flushing-tokens-more-robust.patch: Commit token flushes
     between batches in order to lower resource consumption and
     make flushing more robust for replication (LP: #1649616).
 .
   [ James Page ]
   * New upstream stable release for OpenStack Ocata (LP: #1696139).

Reviewed: https://review.openstack.org/482585
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=6299a169b4c17c4cc3946ebe4f66342ac2054d6b
Submitter: Jenkins
Branch: stable/newton

commit 6299a169b4c17c4cc3946ebe4f66342ac2054d6b
Author: Harry Rybacki <email address hidden>
Date: Wed Jul 12 13:25:30 2017 +0000

    Run token flush cron job hourly by default

    Running this job once a day has proven problematic for large
    deployments as seen in the bug report. Setting it to run hourly
    would be an improvement to the current situation, as the flushes
    wouldn't need to process as much data.

    Note that this only affects people using UUID as the token provider.

    Change-Id: I462e4da2bfdbcba0403ecde5d613386938e2283a
    Related-Bug: #1649616
    (cherry picked from commit 65e643aca2202f031db94f1ccd3d44e195e5e772)
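For reference, the hourly flush this change configures amounts to a cron entry along these lines. This is an illustrative sketch only; the user, paths, and logging are assumptions, and the actual file is generated by the deployment tooling (compare the charm-keystone template linked earlier).

```shell
# /etc/cron.d/keystone-token-flush -- illustrative sketch, not the
# generated template. Runs the expired-token flush once an hour as
# the keystone user.
PATH=/usr/bin:/bin:/usr/sbin:/sbin
0 * * * * keystone /usr/bin/keystone-manage token_flush >> /var/log/keystone/keystone-token-flush.log 2>&1
```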

James Page (james-page) wrote :

This bug was fixed in the package keystone - 2:9.3.0-0ubuntu2~cloud0
---------------

 keystone (2:9.3.0-0ubuntu2~cloud0) trusty-mitaka; urgency=medium
 .
   * New update for the Ubuntu Cloud Archive.
 .
 keystone (2:9.3.0-0ubuntu2) xenial; urgency=high
 .
   * d/p/0001-Make-flushing-tokens-more-robust.patch: Commit
     token flushes between batches in order to lower resource
     consumption and make flushing more robust for replication
     (LP: #1649616).

James Page (james-page) wrote :

This bug was fixed in the package keystone - 2:10.0.1-0ubuntu2~cloud0
---------------

 keystone (2:10.0.1-0ubuntu2~cloud0) xenial-newton; urgency=medium
 .
   * New update for the Ubuntu Cloud Archive.
 .
 keystone (2:10.0.1-0ubuntu2) yakkety; urgency=high
 .
   * d/p/0001-Make-flushing-tokens-more-robust.patch: Commit token flushes
     between batches in order to lower resource consumption and
     make flushing more robust for replication (LP: #1649616).

This issue was fixed in the openstack/keystone 11.0.3 release.

This issue was fixed in the openstack/keystone 10.0.3 release.

This issue was fixed in the openstack/keystone 12.0.0.0b3 development milestone.

Changed in tripleo:
milestone: pike-3 → pike-rc1
Changed in tripleo:
milestone: pike-rc1 → queens-1
Changed in tripleo:
status: In Progress → Fix Released
tags: added: sts-sru-done
removed: sts-sru-needed
Changed in keystone (Ubuntu):
status: In Progress → Invalid
Changed in cloud-archive:
status: In Progress → Invalid
Changed in puppet-keystone:
status: Triaged → Fix Released