cached tokens break Liberty to Mitaka upgrade

Bug #1592169 reported by Matt Fischer on 2016-06-13
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Identity (keystone)
High
Colleen Murphy
Mitaka
High
Colleen Murphy
Newton
High
Colleen Murphy

Bug Description

Sequence of events.

- Fernet tokens (didnt test with UUID)
- Running cluster with Liberty from about 6 weeks ago, so close to stable.
- Upgrade Keystone to Mitaka (automated)
- Tokens fail to issue for about 5 minutes, after this time, all the cached tokens are gone
- Everything works after that. See also Work-around at bottom.

Annotated logs:

Token call works to this point.

db_sync is running here, but code is still Liberty, DB now Mitaka:
An unexpected error prevented the server from fulfilling your request. (HTTP 500) (Request-ID: req-04dcb954-ae4e-41fa-b235-aa0b05ac8b44)
An unexpected error prevented the server from fulfilling your request. (HTTP 500) (Request-ID: req-d27eee3a-723a-412e-a7b0-37ffd511c221)
An unexpected error prevented the server from fulfilling your request. (HTTP 500) (Request-ID: req-265b6261-bcac-44f1-a806-8696b455ff5a)

Puppet bounces Keystone, the restarted code is Mitaka:
Discovering versions from the identity service failed when creating the password plugin. Attempting to determine version from URL.

Tokens fail to generate here due to the caching format changing. This will continue for about 5 minutes or so, I suspect it depends on whats in the cache and timeouts.
An unexpected error prevented the server from fulfilling your request. (HTTP 500) (Request-ID: req-8b835f67-4a21-42d3-9030-b4dbfd820238)
An unexpected error prevented the server from fulfilling your request. (HTTP 500) (Request-ID: req-b92bcd56-87da-4977-b82e-c717c7120f4f)
An unexpected error prevented the server from fulfilling your request. (HTTP 500) (Request-ID: req-a787163f-20c1-493f-9b34-82708dea4191)
An unexpected error prevented the server from fulfilling your request. (HTTP 500) (Request-ID: req-e2ab7bf1-3483-438e-8425-06e5cfbf2e37)

Keystone log is full of this:

2016-06-13 21:37:58.947 35 ERROR keystone.common.wsgi Traceback (most recent call last):
2016-06-13 21:37:58.947 35 ERROR keystone.common.wsgi File "/venv/local/lib/python2.7/site-packages/keystone/common/wsgi.py", line 249, in __call__
2016-06-13 21:37:58.947 35 ERROR keystone.common.wsgi result = method(context, **params)
2016-06-13 21:37:58.947 35 ERROR keystone.common.wsgi File "/venv/local/lib/python2.7/site-packages/oslo_log/versionutils.py", line 165, in wrapped
2016-06-13 21:37:58.947 35 ERROR keystone.common.wsgi return func_or_cls(*args, **kwargs)
2016-06-13 21:37:58.947 35 ERROR keystone.common.wsgi File "/venv/local/lib/python2.7/site-packages/keystone/token/controllers.py", line 100, in authenticate
2016-06-13 21:37:58.947 35 ERROR keystone.common.wsgi context, auth)
2016-06-13 21:37:58.947 35 ERROR keystone.common.wsgi File "/venv/local/lib/python2.7/site-packages/keystone/token/controllers.py", line 310, in _authenticate_local
2016-06-13 21:37:58.947 35 ERROR keystone.common.wsgi user_id, tenant_id)
2016-06-13 21:37:58.947 35 ERROR keystone.common.wsgi File "/venv/local/lib/python2.7/site-packages/keystone/token/controllers.py", line 391, in _get_project_roles_and_ref
2016-06-13 21:37:58.947 35 ERROR keystone.common.wsgi user_id, tenant_id)
2016-06-13 21:37:58.947 35 ERROR keystone.common.wsgi File "/venv/local/lib/python2.7/site-packages/keystone/common/manager.py", line 124, in wrapped
2016-06-13 21:37:58.947 35 ERROR keystone.common.wsgi __ret_val = __f(*args, **kwargs)
2016-06-13 21:37:58.947 35 ERROR keystone.common.wsgi File "/venv/local/lib/python2.7/site-packages/dogpile/cache/region.py", line 1053, in decorate
2016-06-13 21:37:58.947 35 ERROR keystone.common.wsgi should_cache_fn)
2016-06-13 21:37:58.947 35 ERROR keystone.common.wsgi File "/venv/local/lib/python2.7/site-packages/dogpile/cache/region.py", line 657, in get_or_create
2016-06-13 21:37:58.947 35 ERROR keystone.common.wsgi async_creator) as value:
2016-06-13 21:37:58.947 35 ERROR keystone.common.wsgi File "/venv/local/lib/python2.7/site-packages/dogpile/core/dogpile.py", line 158, in __enter__
2016-06-13 21:37:58.947 35 ERROR keystone.common.wsgi return self._enter()
2016-06-13 21:37:58.947 35 ERROR keystone.common.wsgi File "/venv/local/lib/python2.7/site-packages/dogpile/core/dogpile.py", line 98, in _enter
2016-06-13 21:37:58.947 35 ERROR keystone.common.wsgi generated = self._enter_create(createdtime)
2016-06-13 21:37:58.947 35 ERROR keystone.common.wsgi File "/venv/local/lib/python2.7/site-packages/dogpile/core/dogpile.py", line 149, in _enter_create
2016-06-13 21:37:58.947 35 ERROR keystone.common.wsgi created = self.creator()
2016-06-13 21:37:58.947 35 ERROR keystone.common.wsgi File "/venv/local/lib/python2.7/site-packages/dogpile/cache/region.py", line 625, in gen_value
2016-06-13 21:37:58.947 35 ERROR keystone.common.wsgi created_value = creator()
2016-06-13 21:37:58.947 35 ERROR keystone.common.wsgi File "/venv/local/lib/python2.7/site-packages/dogpile/cache/region.py", line 1049, in creator
2016-06-13 21:37:58.947 35 ERROR keystone.common.wsgi return fn(*arg, **kw)
2016-06-13 21:37:58.947 35 ERROR keystone.common.wsgi File "/venv/local/lib/python2.7/site-packages/keystone/assignment/core.py", line 149, in get_roles_for_user_and_project
2016-06-13 21:37:58.947 35 ERROR keystone.common.wsgi user_id=user_id, project_id=tenant_id, effective=True)
2016-06-13 21:37:58.947 35 ERROR keystone.common.wsgi File "/venv/local/lib/python2.7/site-packages/keystone/common/manager.py", line 124, in wrapped
2016-06-13 21:37:58.947 35 ERROR keystone.common.wsgi __ret_val = __f(*args, **kwargs)
2016-06-13 21:37:58.947 35 ERROR keystone.common.wsgi File "/venv/local/lib/python2.7/site-packages/keystone/assignment/core.py", line 977, in list_role_assignments
2016-06-13 21:37:58.947 35 ERROR keystone.common.wsgi strip_domain_roles)
2016-06-13 21:37:58.947 35 ERROR keystone.common.wsgi File "/venv/local/lib/python2.7/site-packages/keystone/assignment/core.py", line 886, in _list_effective_role_assignments
2016-06-13 21:37:58.947 35 ERROR keystone.common.wsgi refs = self._strip_domain_roles(refs)
2016-06-13 21:37:58.947 35 ERROR keystone.common.wsgi File "/venv/local/lib/python2.7/site-packages/keystone/assignment/core.py", line 721, in _strip_domain_roles
2016-06-13 21:37:58.947 35 ERROR keystone.common.wsgi if _role_is_global(ref['role_id']):
2016-06-13 21:37:58.947 35 ERROR keystone.common.wsgi File "/venv/local/lib/python2.7/site-packages/keystone/assignment/core.py", line 717, in _role_is_global
2016-06-13 21:37:58.947 35 ERROR keystone.common.wsgi return (ref['domain_id'] is None)
2016-06-13 21:37:58.947 35 ERROR keystone.common.wsgi KeyError: 'domain_id'
2016-06-13 21:37:58.947 35 ERROR keystone.common.wsgi

Work-around: run flush_all in memcache (telnet localhost 11211) every few seconds during the upgrade

Matt Fischer (mfisch) on 2016-06-13
summary: - liberty to mitaka upgrade (stable) has broken caching
+ cached tokens break liberty to Mitaka upgrade
summary: - cached tokens break liberty to Mitaka upgrade
+ cached tokens break Liberty to Mitaka upgrade
Matt Fischer (mfisch) on 2016-06-13
description: updated
description: updated
tags: added: fernet
Matt Fischer (mfisch) on 2016-06-13
description: updated
Dolph Mathews (dolph) on 2016-06-13
Changed in keystone:
importance: Undecided → High
tags: added: mitaka-backport-potential
Changed in keystone:
status: New → Triaged
Matt Fischer (mfisch) wrote :

So per some IRC discussion, I don't expect that cached Liberty tokens will be valid in Mitaka, however, I do expect that the mere presence of cached Liberty tokens won't break new token issuance in Mitaka. I think this is a fair expectation and this has worked in every release since Havana when I started working on openstack.

Changed in keystone:
milestone: none → newton-3

Fix proposed to branch: master
Review: https://review.openstack.org/347543

Changed in keystone:
assignee: nobody → Colleen Murphy (krinkle)
status: Triaged → In Progress

Reviewed: https://review.openstack.org/347543
Committed: https://git.openstack.org/cgit/openstack/keystone/commit/?id=bc99dc76775d22eca01b818f37de35a76ece9d72
Submitter: Jenkins
Branch: master

commit bc99dc76775d22eca01b818f37de35a76ece9d72
Author: Colleen Murphy <email address hidden>
Date: Tue Jul 26 13:02:42 2016 -0700

    Add dummy domain_id column to cached role

    When token caching is turned on, upgrading from stable/liberty to
    stable/mitaka or master causes tokens to fail to be issued for the
    time-to-live of the cache. This is because as part of the token
    issuance the token's role is looked up, and the cached version of the
    role immediately after upgrade does not have a domain_id field, even
    though that column was successfully added to the role database. This
    patch hacks around that by artificially adding a null domain_id value
    to the role reference.

    This must be done in the manager, as opposed to the driver, because it
    is the manager that is caching the value and so modifying the value
    returned by the driver has no effect.

    Change-Id: I55c791486f2a26ae995f693370b016895176a16f
    Closes-bug: #1592169

Changed in keystone:
status: In Progress → Fix Released

Reviewed: https://review.openstack.org/357415
Committed: https://git.openstack.org/cgit/openstack/keystone/commit/?id=3a1e41fecfe9e4b7dfa8391476202c3113456c89
Submitter: Jenkins
Branch: stable/mitaka

commit 3a1e41fecfe9e4b7dfa8391476202c3113456c89
Author: Colleen Murphy <email address hidden>
Date: Tue Jul 26 13:02:42 2016 -0700

    Add dummy domain_id column to cached role

    When token caching is turned on, upgrading from stable/liberty to
    stable/mitaka or master causes tokens to fail to be issued for the
    time-to-live of the cache. This is because as part of the token
    issuance the token's role is looked up, and the cached version of the
    role immediately after upgrade does not have a domain_id field, even
    though that column was successfully added to the role database. This
    patch hacks around that by artificially adding a null domain_id value
    to the role reference.

    This must be done in the manager, as opposed to the driver, because it
    is the manager that is caching the value and so modifying the value
    returned by the driver has no effect.

    Change-Id: I55c791486f2a26ae995f693370b016895176a16f
    Closes-bug: #1592169
    (cherry picked from commit bc99dc76775d22eca01b818f37de35a76ece9d72)

This issue was fixed in the openstack/keystone 10.0.0.0b3 development milestone.

This issue was fixed in the openstack/keystone 9.2.0 release.

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers