Keystone ends up in an error state when revoking a large number of tokens at once

Bug #1571626 reported by Mikhail Chernik
This bug affects 1 person
Affects                 Status                      Importance  Assigned to
Fuel for OpenStack      Fix Committed               High        Dmitry Ilyin
  Mitaka                Fix Released                High        Dmitry Ilyin
  Newton                Fix Committed               High        Dmitry Ilyin
Mirantis OpenStack      Status tracked in 10.0.x
  10.0.x                Fix Committed               High        Dmitry Ilyin

Bug Description

Environment:
Reproduced on the Rackspace lab: 3 controllers, 197 computes, VXLAN+DVR, MOS 9.0 ISO 188

Detailed description:
Keystone caches the whole revoke tree, which can exceed memcached's 1 MB object size limit if a huge number of tokens are revoked at the same time (details: https://github.com/lericson/pylibmc/issues/184).

From the Keystone admin log file: http://paste.openstack.org/show/494384/
After that, Keystone stops working and the cluster is not usable.

Keystone error:

2016-04-18 09:44:57.484 33105 ERROR keystone.common.wsgi File "/usr/lib/python2.7/dist-packages/keystone/revoke/core.py", line 225, in check_token
2016-04-18 09:44:57.484 33105 ERROR keystone.common.wsgi if self._get_revoke_tree().is_revoked(token_values):

Steps to reproduce:
1. Set backend = dogpile.cache.pylibmc in the [cache] section of /etc/keystone/keystone.conf.
2. Run the Rally tests. All Rally tests failed except three (the results of those three tests are attached: rally_report.html).
3. Found the following bug: https://bugs.launchpad.net/mos/+bug/1571626
4. Tried http://paste.openstack.org/show/494498/ plus [revoke] caching = False in keystone.conf. The error no longer appears, but requests take more than 60 seconds and return 504, even for "openstack user list".

Run the Rally scenario KeystoneBasic.add_and_remove_user_role against a large cluster. Example scenario:
{
  "kw": {
    "runner": {
      "type": "constant",
      "concurrency": 20,
      "times": 1970
    },
    "sla": {
      "failure_rate": {
        "max": 0
      }
    },
    "context": {
      "api_versions": {
        "keystone": {
          "version": 2
        }
      }
    }
  },
  "name": "KeystoneBasic.add_and_remove_user_role",
  "pos": 0
}
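The scenario above looks like an entry copied from a Rally report (keys kw, name, pos). To rerun it, Rally's task input format is assumed here to be the v1 layout, which maps each scenario name to a list of configurations (runner, context, sla, and optionally args). The sketch below converts one form into the other; the function name report_entry_to_task is hypothetical and not part of Rally.

```python
import json


def report_entry_to_task(entry):
    """Convert a Rally report entry ({"kw": ..., "name": ..., "pos": ...})
    into a task dict keyed by scenario name (assumes the Rally v1 task format)."""
    # "kw" already holds runner, sla and context (plus args, if any).
    config = dict(entry["kw"])
    return {entry["name"]: [config]}


# The scenario from this report, as a Python dict.
report_entry = {
    "kw": {
        "runner": {"type": "constant", "concurrency": 20, "times": 1970},
        "sla": {"failure_rate": {"max": 0}},
        "context": {"api_versions": {"keystone": {"version": 2}}},
    },
    "name": "KeystoneBasic.add_and_remove_user_role",
    "pos": 0,
}

if __name__ == "__main__":
    # Print a task file that can be saved and passed to "rally task start".
    print(json.dumps(report_entry_to_task(report_entry), indent=2))
```

The "pos" field only records the scenario's position in the report, so it is dropped from the task.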

diagnostic snapshot: http://mos-scale-share.mirantis.com/fuel-snapshot-2016-04-17_17-47-57.tar.xz
etc and log folders from controller nodes: http://mos-scale-share.mirantis.com/controller-data.tar.xz

Alexander Makarov (amakarov) wrote :

At first glance: the revocation tree grows larger than 1 MB, which is too large to cache in memcached, i.e. memcached does not accept items of that size.
The problem appears to be well known in the community, and the suggested workaround is simply to turn revocation caching off.
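To get a feel for how quickly the cached revocation data crosses memcached's default 1 MB item limit, the rough sketch below pickles a flat list of revocation events and compares the result with 1 MB. The event shape and the helper names are hypothetical, a flat list stands in for Keystone's revoke tree, and pickle stands in for whatever serializer the cache client actually uses, so the numbers are only indicative.

```python
import pickle
import uuid

# memcached's default maximum item size (the -I option) is 1 MB.
MEMCACHED_DEFAULT_ITEM_LIMIT = 1024 * 1024


def fake_revocation_event():
    # Hypothetical event shape for illustration only; real Keystone
    # revocation events have more fields and are stored in a tree.
    return {
        "user_id": uuid.uuid4().hex,
        "project_id": uuid.uuid4().hex,
        "role_id": uuid.uuid4().hex,
        "issued_before": "2016-04-18T09:44:57.000000Z",
    }


def serialized_size(n_events):
    # Pickle is a stand-in for the cache client's serializer.
    events = [fake_revocation_event() for _ in range(n_events)]
    return len(pickle.dumps(events, protocol=pickle.HIGHEST_PROTOCOL))


def fits_in_memcached(n_events, limit=MEMCACHED_DEFAULT_ITEM_LIMIT):
    return serialized_size(n_events) <= limit


if __name__ == "__main__":
    for n in (100, 1000, 10000, 50000):
        print(n, serialized_size(n), fits_in_memcached(n))
```

With roughly a hundred bytes per event, tens of thousands of revocations (for example from a large Rally run that adds and removes role assignments) are enough to exceed the limit.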

Alexander Makarov (amakarov) wrote :

In the /etc/keystone/keystone.conf:

[revoke]
caching = False
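Keystone documents [revoke] caching as having no effect unless global caching is enabled, so whether the workaround matters depends on both sections. The sketch below reads keystone.conf text and reports whether revocation caching would be active. It assumes [cache] enabled defaults to False and [revoke] caching defaults to True; verify those defaults for your release. The function name is hypothetical.

```python
import configparser


def revoke_caching_effective(conf_text):
    """Return True if revocation caching would be active for this config text.

    Assumed defaults: [cache] enabled = False, [revoke] caching = True.
    """
    # interpolation=None: keystone.conf values may contain "%" characters.
    # strict=False: tolerate duplicate sections or options.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    cache_enabled = parser.getboolean("cache", "enabled", fallback=False)
    revoke_caching = parser.getboolean("revoke", "caching", fallback=True)
    return cache_enabled and revoke_caching


if __name__ == "__main__":
    with open("/etc/keystone/keystone.conf") as f:
        print(revoke_caching_effective(f.read()))
```

The check is read-only on purpose: rewriting keystone.conf through configparser would drop its comments.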

Boris Bobrov (bbobrov)
Changed in mos:
assignee: Boris Bobrov (bbobrov) → MOS Keystone (mos-keystone)
Dina Belova (dbelova)
tags: added: area-keystone
Changed in mos:
status: New → Confirmed
importance: Undecided → High
milestone: none → 9.0
Bug Checker Bot (bug-checker) wrote : Autochecker

(This check was performed automatically.)
Please make sure that the bug description contains the following sections, filled in with appropriate data related to the bug you are describing:

actual result

expected result

For more detailed information on the contents of each of the listed sections see https://wiki.openstack.org/wiki/Fuel/How_to_contribute#Here_is_how_you_file_a_bug

tags: added: need-info
description: updated
description: updated
description: updated
description: updated
description: updated
Davanum Srinivas (DIMS) (dims-v) wrote :

memcached.conf can take a "-I" parameter. Can we please bump it from 1 MB to 10 MB?

-I <size>
Override the default size of each slab page. Default is 1m, minimum is 1k, max is 128m.

Probably like this?
http://paste.openstack.org/show/494633/

Thanks,
Dims
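The -I value is a size string such as "1m" or "10m". The sketch below parses it into bytes using the bounds quoted above (1k to 128m; newer memcached releases may allow more) and checks whether a serialized value of a given length would fit. The suffix handling (none, k, or m) and both helper names are assumptions for illustration, and per-item overhead (key and item header) is ignored, so treat the result as an upper bound.

```python
import re

KIB = 1024
MIB = 1024 * KIB

# Bounds as quoted from the memcached man page in this comment.
MIN_ITEM_SIZE = 1 * KIB
MAX_ITEM_SIZE = 128 * MIB


def parse_item_size(value):
    """Parse a memcached -I value such as "1m", "10m" or "512k" into bytes."""
    match = re.fullmatch(r"(\d+)([kKmM]?)", value.strip())
    if match is None:
        raise ValueError("unrecognised size: %r" % value)
    number, unit = int(match.group(1)), match.group(2).lower()
    size = number * {"": 1, "k": KIB, "m": MIB}[unit]
    if not MIN_ITEM_SIZE <= size <= MAX_ITEM_SIZE:
        raise ValueError("size out of range: %r" % value)
    return size


def fits(payload_bytes, item_size="1m"):
    # Ignores key and item-header overhead, so this is an upper bound.
    return payload_bytes <= parse_item_size(item_size)


if __name__ == "__main__":
    print(fits(2 * MIB, "1m"), fits(2 * MIB, "10m"))
```

A 2 MB revocation blob is rejected at the 1m default but fits once -I is raised to 10m.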

tags: added: blocker-for-qa
Boris Bobrov (bbobrov)
Changed in mos:
assignee: MOS Keystone (mos-keystone) → MOS Puppet Team (mos-puppet)
Dmitry Ilyin (idv1985) wrote :

https://review.openstack.org/#/c/308525/
Here is the fix, but I haven't tested it yet.

Changed in mos:
assignee: MOS Puppet Team (mos-puppet) → Dmitry Ilyin (idv1985)
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/308525
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=252f675c82f6663cd893a9a82054019ea334ffdb
Submitter: Jenkins
Branch: master

commit 252f675c82f6663cd893a9a82054019ea334ffdb
Author: Dmitry Ilyin <email address hidden>
Date: Wed Apr 20 21:49:49 2016 +0300

    Increase memcached item size to 10m

    1M is not enough to store the entire revocation list

    Change-Id: Ifd4b8b788e87c3ab129f4ea877394e4be4e76755
    Closes-Bug: 1571626

Changed in fuel:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/310254

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (stable/mitaka)

Reviewed: https://review.openstack.org/310254
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=6d5a2afd090191ccb53db931cbd77ca8e786a6f4
Submitter: Jenkins
Branch: stable/mitaka

commit 6d5a2afd090191ccb53db931cbd77ca8e786a6f4
Author: Dmitry Ilyin <email address hidden>
Date: Wed Apr 20 21:49:49 2016 +0300

    Increase memcached item size to 10m

    1M is not enough to store the entire revocation list

    Change-Id: Ifd4b8b788e87c3ab129f4ea877394e4be4e76755
    Closes-Bug: 1571626

Changed in mos:
status: Confirmed → Fix Committed
Andrew Kalach (akndex)
Changed in mos:
status: Fix Committed → Fix Released