Keystone Token Flush job does not complete in HA deployed environment
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
OpenStack Identity (keystone) | Fix Released | Medium | Peter Sabaini | |
Newton | Won't Fix | Medium | Raildo Mascena de Sousa Filho | |
Ocata | Fix Released | Medium | Raildo Mascena de Sousa Filho | |
Ubuntu Cloud Archive | Invalid | Undecided | Jorge Niedbalski | |
Mitaka | Fix Released | Medium | Jorge Niedbalski | |
Newton | Fix Released | Medium | Jorge Niedbalski | |
Ocata | Fix Released | Medium | Jorge Niedbalski | |
puppet-keystone | Fix Released | Medium | Juan Antonio Osorio Robles | |
tripleo | Fix Released | Medium | Juan Antonio Osorio Robles | |
keystone (Ubuntu) | Invalid | High | Jorge Niedbalski | |
Xenial | Fix Released | High | Jorge Niedbalski | |
Yakkety | Fix Released | High | Jorge Niedbalski | |
Zesty | Fix Released | High | Jorge Niedbalski | |
Bug Description
[Impact]
* The Keystone token flush job can get into a state where it will never complete, because the size of its transaction exceeds the MySQL Galera maximum transaction size, wsrep_max_ws_size (1073741824 bytes).
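To illustrate why a single large delete is the problem, here is a minimal sketch of flushing expired rows in small batches, each committed as its own transaction. It is an illustration only, not necessarily how keystone implements its flush: the `token` table layout, the `flush_expired_tokens` helper, and the batch size are hypothetical, and an in-memory SQLite database stands in for the Galera-backed MySQL cluster.

```python
import sqlite3
from datetime import datetime, timedelta


def flush_expired_tokens(conn, now, batch_size=1000):
    """Delete expired rows in small transactions; return the total removed."""
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM token WHERE id IN ("
            "SELECT id FROM token WHERE expires <= ? LIMIT ?)",
            (now, batch_size),
        )
        conn.commit()  # one commit per batch keeps each transaction small
        if cur.rowcount == 0:
            return total
        total += cur.rowcount


def demo():
    # Hypothetical two-column table; the real keystone schema differs.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE token (id TEXT PRIMARY KEY, expires TEXT)")
    now = datetime(2016, 12, 8, 1, 33, 40)
    rows = []
    for i in range(2500):
        # Even ids are already expired, odd ids are still valid.
        delta = timedelta(hours=-1) if i % 2 == 0 else timedelta(hours=1)
        rows.append((f"tok-{i}", (now + delta).isoformat()))
    conn.executemany("INSERT INTO token VALUES (?, ?)", rows)
    conn.commit()
    removed = flush_expired_tokens(conn, now.isoformat(), batch_size=400)
    remaining = conn.execute("SELECT COUNT(*) FROM token").fetchone()[0]
    return removed, remaining
```

Calling `demo()` removes the expired half of the rows in batches of at most 400, so no single transaction has to cover the whole table.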
[Test Case]
1. Authenticate many times
2. Observe that the keystone token flush job runs for a very long time, depending on disk speed (>20 hours in my environment)
3. Observe errors in mysql.log indicating a transaction that is too large
Actual results:
Expired tokens are not flushed from the database, and no errors appear in keystone.log. Errors appear only in mysql.log.
Expected results:
Expired tokens to be removed from the database
[Additional info:]
This can likely be demonstrated with fewer than 1 million tokens: a token table holding more than 1 million tokens is larger than 13GiB, while the max transaction size is 1GiB. My token benchmarking Browbeat job creates more tokens than needed.
Once the token flush job cannot complete, the token table will never decrease in size and eventually the cloud will run out of disk space.
Furthermore, the flush job consumes disk I/O capacity. This was demonstrated on slow disks (a single 7.2K SATA disk). On faster disks there is more capacity to generate tokens, so the number of tokens needed to exceed the transaction size is reached even faster.
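As a rough back-of-the-envelope check on the figures quoted above, the following sketch estimates how many tokens fit under the 1GiB limit. It assumes exactly 1,000,000 tokens in exactly 13GiB (the report gives both only as lower bounds) and that the replicated size of a deleted row is comparable to its on-disk size, which is an assumption rather than a measured value. The function name is hypothetical.

```python
GIB = 1024 ** 3


def tokens_per_transaction(table_bytes, token_count, max_ws_bytes):
    """Estimate how many average-sized tokens fit in one transaction."""
    # Integer arithmetic avoids floating-point rounding in the estimate.
    return max_ws_bytes * token_count // table_bytes


# Assumed inputs: 13 GiB table, 1,000,000 tokens, 1 GiB transaction limit.
estimate = tokens_per_transaction(13 * GIB, 1_000_000, 1 * GIB)
print(estimate)  # 76923
```

Under these assumptions the limit is reached at roughly 77,000 expired tokens, which is consistent with the statement that far fewer than 1 million tokens should be enough to reproduce the failure.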
Log evidence:
[root@overcloud
2016-12-08 01:33:40.530 21614 INFO keystone.
2016-12-09 09:31:25.301 14120 INFO keystone.
2016-12-11 01:35:39.082 4223 INFO keystone.
2016-12-12 01:08:16.170 32575 INFO keystone.
2016-12-13 01:22:18.121 28669 INFO keystone.
[root@overcloud
161208 1:33:41 [Warning] WSREP: transaction size limit (1073741824) exceeded: 1073774592
161208 1:33:41 [ERROR] WSREP: rbr write fail, data_len: 0, 2
161209 9:31:26 [Warning] WSREP: transaction size limit (1073741824) exceeded: 1073774592
161209 9:31:26 [ERROR] WSREP: rbr write fail, data_len: 0, 2
161211 1:35:39 [Warning] WSREP: transaction size limit (1073741824) exceeded: 1073774592
161211 1:35:40 [ERROR] WSREP: rbr write fail, data_len: 0, 2
161212 1:08:16 [Warning] WSREP: transaction size limit (1073741824) exceeded: 1073774592
161212 1:08:17 [ERROR] WSREP: rbr write fail, data_len: 0, 2
161213 1:22:18 [Warning] WSREP: transaction size limit (1073741824) exceeded: 1073774592
161213 1:22:19 [ERROR] WSREP: rbr write fail, data_len: 0, 2
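The warning lines above carry both the configured limit and the size that was reached. A small sketch for extracting those numbers from mysql.log follows; the regular expression is written against the message format shown in this report only, and the helper name is hypothetical.

```python
import re

# Matches the WSREP warning format shown in the log excerpt above.
WSREP_LIMIT_RE = re.compile(
    r"WSREP: transaction size limit \((\d+)\) exceeded: (\d+)"
)


def parse_overage(line):
    """Return (limit, actual, overage) in bytes, or None if no match."""
    match = WSREP_LIMIT_RE.search(line)
    if match is None:
        return None
    limit, actual = int(match.group(1)), int(match.group(2))
    return limit, actual, actual - limit


sample = (
    "161208 1:33:41 [Warning] WSREP: transaction size limit "
    "(1073741824) exceeded: 1073774592"
)
print(parse_overage(sample))  # (1073741824, 1073774592, 32768)
```

In every warning logged here, the reported size is 32768 bytes over the 1073741824-byte limit.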
A graph of the disk utilization issue is attached. In that graph, the job starts at the first spike in disk utilization (~5:18 UTC) and culminates in about 90 minutes of pegging the disk (between 1:09 UTC and 2:43 UTC).
[Regression Potential]
* Not identified
Related branches
- Drew Freiberger (community): Approve
- Jill Rouleau (community): Needs Information
Diff: 52 lines (+46/-0), 1 file modified: bootstack-ops/lp1649616-keystone-manage (+46/-0)
Changed in keystone: | |
importance: | Undecided → Medium |
tags: | added: canonical-bootstack |
Changed in puppet-keystone: | |
assignee: | nobody → Juan Antonio Osorio Robles (juan-osorio-robles) |
Changed in tripleo: | |
assignee: | nobody → Juan Antonio Osorio Robles (juan-osorio-robles) |
Changed in tripleo: | |
milestone: | none → pike-2 |
Changed in puppet-keystone: | |
status: | New → Triaged |
Changed in tripleo: | |
status: | New → Triaged |
Changed in puppet-keystone: | |
importance: | Undecided → Medium |
Changed in tripleo: | |
importance: | Undecided → Medium |
tags: | added: sts |
Changed in keystone (Ubuntu Xenial): | |
importance: | Undecided → Medium |
Changed in keystone (Ubuntu Yakkety): | |
importance: | Undecided → Medium |
Changed in keystone (Ubuntu Zesty): | |
importance: | Undecided → Medium |
Changed in keystone (Ubuntu Xenial): | |
status: | New → Triaged |
Changed in keystone (Ubuntu Yakkety): | |
status: | New → Triaged |
Changed in keystone (Ubuntu Zesty): | |
status: | New → Triaged |
Changed in cloud-archive: | |
assignee: | nobody → Jorge Niedbalski (niedbalski) |
status: | New → In Progress |
Changed in keystone (Ubuntu): | |
assignee: | nobody → Jorge Niedbalski (niedbalski) |
importance: | Undecided → High |
status: | New → In Progress |
Changed in keystone (Ubuntu Xenial): | |
assignee: | nobody → Jorge Niedbalski (niedbalski) |
status: | Triaged → In Progress |
Changed in keystone (Ubuntu Yakkety): | |
assignee: | nobody → Jorge Niedbalski (niedbalski) |
importance: | Medium → High |
status: | Triaged → In Progress |
Changed in keystone (Ubuntu Zesty): | |
assignee: | nobody → Jorge Niedbalski (niedbalski) |
importance: | Medium → High |
status: | Triaged → In Progress |
tags: | added: sts-sru-needed |
description: | updated |
tags: |
added: sts-sru-done removed: sts-sru-needed |
tags: |
added: sts-sru-needd removed: sts-sru-needed |
tags: |
added: sts-sru-needed removed: sts-sru-needd |
Changed in tripleo: | |
milestone: | pike-2 → pike-3 |
Changed in tripleo: | |
status: | Triaged → In Progress |
tags: | added: verification-needed-zesty |
Changed in tripleo: | |
milestone: | pike-3 → pike-rc1 |
Changed in tripleo: | |
milestone: | pike-rc1 → queens-1 |
Changed in tripleo: | |
status: | In Progress → Fix Released |
tags: |
added: sts-sru-done removed: sts-sru-needed |
Changed in keystone (Ubuntu): | |
status: | In Progress → Invalid |
Changed in cloud-archive: | |
status: | In Progress → Invalid |
Changed in puppet-keystone: | |
status: | Triaged → Fix Released |
Hi Alex,
So someone finally hit bug https://bugs.launchpad.net/keystone/+bug/1609511 in a production environment.
A few questions so I can gauge the severity properly.
- Is it possible to run the cron job that flushes tokens more frequently?
- What is preventing you from moving to fernet tokens? They are not persisted.
Lastly, Sam Leong had a patch for this issue. It's available here: https://review.openstack.org/#/c/351428/
Unfortunately, Sam has stopped working on keystone these days.