memcache pool reap issue

Bug #1570158 reported by Matt Fischer
This bug affects 2 people
Affects                        Status       Importance  Assigned to   Milestone
OpenStack Identity (keystone)  Triaged      Medium      Unassigned    -
keystone (liberty)             In Progress  Medium      David Stanek  -

Bug Description

There seems to be a code error in the memcache pool cleanup. I'm seeing this on stable/liberty (built as of two weeks ago). I don't have a specific reproducer for this; it just seems to happen. Looking at the code, I don't really understand how this can happen.

2016-04-13 20:50:29.124 46 ERROR keystone.common.wsgi [req-30e23a7d-754d-4a89-904f-1804a20029f2 - - - - -] deque index out of range
2016-04-13 20:50:29.124 46 ERROR keystone.common.wsgi Traceback (most recent call last):
2016-04-13 20:50:29.124 46 ERROR keystone.common.wsgi File "/venv/local/lib/python2.7/site-packages/keystone/common/wsgi.py", line 248, in __call__
2016-04-13 20:50:29.124 46 ERROR keystone.common.wsgi result = method(context, **params)
2016-04-13 20:50:29.124 46 ERROR keystone.common.wsgi File "/venv/local/lib/python2.7/site-packages/keystone/common/controller.py", line 138, in inner
2016-04-13 20:50:29.124 46 ERROR keystone.common.wsgi context['subject_token_id']))
2016-04-13 20:50:29.124 46 ERROR keystone.common.wsgi File "/venv/local/lib/python2.7/site-packages/keystone/token/provider.py", line 188, in validate_token
2016-04-13 20:50:29.124 46 ERROR keystone.common.wsgi token = self._validate_token(unique_id)
2016-04-13 20:50:29.124 46 ERROR keystone.common.wsgi File "/venv/local/lib/python2.7/site-packages/dogpile/cache/region.py", line 1053, in decorate
2016-04-13 20:50:29.124 46 ERROR keystone.common.wsgi should_cache_fn)
2016-04-13 20:50:29.124 46 ERROR keystone.common.wsgi File "/venv/local/lib/python2.7/site-packages/dogpile/cache/region.py", line 657, in get_or_create
2016-04-13 20:50:29.124 46 ERROR keystone.common.wsgi async_creator) as value:
2016-04-13 20:50:29.124 46 ERROR keystone.common.wsgi File "/venv/local/lib/python2.7/site-packages/dogpile/core/dogpile.py", line 158, in __enter__
2016-04-13 20:50:29.124 46 ERROR keystone.common.wsgi return self._enter()
2016-04-13 20:50:29.124 46 ERROR keystone.common.wsgi File "/venv/local/lib/python2.7/site-packages/dogpile/core/dogpile.py", line 91, in _enter
2016-04-13 20:50:29.124 46 ERROR keystone.common.wsgi value = value_fn()
2016-04-13 20:50:29.124 46 ERROR keystone.common.wsgi File "/venv/local/lib/python2.7/site-packages/dogpile/cache/region.py", line 610, in get_value
2016-04-13 20:50:29.124 46 ERROR keystone.common.wsgi value = self.backend.get(key)
2016-04-13 20:50:29.124 46 ERROR keystone.common.wsgi File "/venv/local/lib/python2.7/site-packages/dogpile/cache/backends/memcached.py", line 161, in get
2016-04-13 20:50:29.124 46 ERROR keystone.common.wsgi value = self.client.get(key)
2016-04-13 20:50:29.124 46 ERROR keystone.common.wsgi File "/venv/local/lib/python2.7/site-packages/keystone/common/cache/backends/memcache_pool.py", line 36, in _run_method
2016-04-13 20:50:29.124 46 ERROR keystone.common.wsgi return getattr(client, __name)(*args, **kwargs)
2016-04-13 20:50:29.124 46 ERROR keystone.common.wsgi File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
2016-04-13 20:50:29.124 46 ERROR keystone.common.wsgi self.gen.next()
2016-04-13 20:50:29.124 46 ERROR keystone.common.wsgi File "/venv/local/lib/python2.7/site-packages/keystone/common/cache/_memcache_pool.py", line 132, in acquire
2016-04-13 20:50:29.124 46 ERROR keystone.common.wsgi self._drop_expired_connections()
2016-04-13 20:50:29.124 46 ERROR keystone.common.wsgi File "/venv/local/lib/python2.7/site-packages/keystone/common/cache/_memcache_pool.py", line 166, in _drop_expired_connections
2016-04-13 20:50:29.124 46 ERROR keystone.common.wsgi while self.queue and self.queue[0].ttl < now:
2016-04-13 20:50:29.124 46 ERROR keystone.common.wsgi IndexError: deque index out of range
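
My best guess from the traceback is a check-then-index race on the shared deque: under concurrent access, another thread/greenthread can pop the last connection between the truthiness check and the `self.queue[0]` read. A minimal sketch of that pattern and one defensive variant (names beyond what the traceback shows are made up, and this is not the actual keystone code or patch):

    import collections
    import time

    # Hypothetical pool item; the real pool stores connections with a ttl.
    _PoolItem = collections.namedtuple('_PoolItem', ['ttl', 'connection'])

    class ConnectionPool(object):
        def __init__(self):
            self.queue = collections.deque()

        def _drop_expired_connections_racy(self):
            # Pattern from the traceback: if another consumer pops the last
            # item after "self.queue" is checked but before "self.queue[0]"
            # is read, the index access raises
            # "IndexError: deque index out of range".
            now = time.time()
            while self.queue and self.queue[0].ttl < now:
                self.queue.popleft()

        def _drop_expired_connections_defensive(self):
            # One defensive variant: tolerate the deque being drained
            # concurrently instead of trusting the earlier check.
            now = time.time()
            try:
                while self.queue and self.queue[0].ttl < now:
                    self.queue.popleft()
            except IndexError:
                # Another consumer emptied the deque; nothing left to reap.
                pass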

Revision history for this message
Matt Fischer (mfisch) wrote :

Update:

This occurs under extremely high load. We were serving 150x our normal number of requests on that node. Nobody on my team has confessed to generating that traffic yet, but even under that load this shouldn't happen.

Revision history for this message
Lance Bragstad (lbragstad) wrote :

Matt, are you able to recreate this with master or Mitaka?

summary: - memcache pool reap issue (stable/liberty)
+ memcache pool reap issue
Revision history for this message
Matt Fischer (mfisch) wrote :

I can't reproduce this except on a larger cluster than I can reasonably run in my virtual environment. I will be upgrading to Mitaka after the summit. The code looks the same in Mitaka, however.

Is there a reason why we clean up expired memcache connections in the same context as token validation rather than elsewhere? It seems like that would slow things down.
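
Purely for illustration (not something keystone or oslo.cache does today, and all names here are hypothetical): moving the reap out of the acquire path into a periodic background worker would look roughly like this, so a token validation request never pays for the cleanup.

    import threading
    import time

    class BackgroundReaper(object):
        """Hypothetical periodic reaper, so acquire() never blocks on cleanup."""

        def __init__(self, pool, interval=60):
            self._pool = pool          # object exposing _drop_expired_connections()
            self._interval = interval  # seconds between reap passes
            self._thread = threading.Thread(target=self._run)
            self._thread.daemon = True

        def start(self):
            self._thread.start()

        def _run(self):
            while True:
                time.sleep(self._interval)
                self._pool._drop_expired_connections()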

Dolph Mathews (dolph)
Changed in keystone:
importance: Undecided → Medium
status: New → Triaged
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to keystone (stable/liberty)

Fix proposed to branch: stable/liberty
Review: https://review.openstack.org/345551

Revision history for this message
David Stanek (dstanek) wrote :

This looks to be the same bug as https://bugs.launchpad.net/oslo.cache/+bug/1523379

Is it possible for you to use the backend from oslo.cache instead of the one included with Liberty? In anticipation of that actually working, I posted a WIP backport from oslo.cache.
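
In case it helps anyone trying this, a minimal sketch of pointing a dogpile region at the pooled backend that oslo.cache registers (`oslo_cache.memcache_pool`). The expiration and server values are placeholders, and keystone itself wires this up through keystone.conf rather than code like this:

    from dogpile.cache import make_region

    # Assumes oslo.cache is installed so the 'oslo_cache.memcache_pool'
    # dogpile backend is available via its entry point.
    region = make_region().configure(
        'oslo_cache.memcache_pool',
        expiration_time=300,                      # placeholder TTL in seconds
        arguments={'url': ['127.0.0.1:11211']},   # placeholder memcached servers
    )

    region.set('some-key', 'some-value')
    print(region.get('some-key'))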

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on keystone (stable/liberty)

Change abandoned by David Stanek (<email address hidden>) on branch: stable/liberty
Review: https://review.openstack.org/345551
Reason: This was just to show how to solve the issue for someone using Liberty. It was not intended to merge.

Revision history for this message
Steve Martinelli (stevemar) wrote :

Matt, this looks like a dupe of bug 1523379, which has a fix merged to oslo.cache master. Let me know if you need it backported to Mitaka or Liberty.
