Downtime of one memcache instance leads to failures after recovery

Bug #1406547 reported by Sergey Yudin
This bug affects 3 people
Affects             Status  Importance  Assigned to                Milestone
Fuel for OpenStack  New     High        Fuel Library (Deprecated)
5.0.x               New     Undecided   Fuel Library (Deprecated)
5.1.x               New     High        Fuel Library (Deprecated)
6.0.x               New     Undecided   Fuel Library (Deprecated)
6.1.x               New     High        Fuel Library (Deprecated)

Bug Description

Steps to reproduce:

1) Restart one memcache instance.
2) Spawn a dozen VMs.
3) Investigate the failed instances and find that keystone failed to authenticate some tokens for some services,

like this:

<188>Dec 30 14:57:23 node-1 keystone-all 2014-12-30 14:57:23.402 3910 WARNING keystone.common.wsgi [-] Could not find token, b6749f3eb5ea4d24886e2c2cf074db36

ERROR cinder.scheduler.filter_scheduler [req-8db91d3c-2f3f-4115-ab9f-c419f2074651 95a31653ffc342e2b275629104b75eb5 69b89f2718db433e9f647b6fd5bd69b9 - - -] Error scheduling None from last vol-service: cinder : [u'Traceback (most recent call last):\n', u' File "/usr/lib/python2.7/dist-packages/taskflow/engines/action_engine/executor.py", line 36, in _execute_task\n result = task.execute(**arguments)\n', u' File "/usr/lib/python2.7/dist-packages/cinder/volume/flows/manager/create_volume.py", line 265, in execute\n image_id),\n', u' File "/usr/lib/python2.7/dist-packages/cinder/image/glance.py", line 243, in get_location\n _reraise_translated_image_exception(image_id)\n', u' File "/usr/lib/python2.7/dist-packages/cinder/image/glance.py", line 241, in get_location\n image_meta = client.call(context, \'get\', image_id)\n', u' File "/usr/lib/python2.7/dist-packages/cinder/image/glance.py", line 158, in call\n return getattr(client.images, method)(*args, **kwargs)\n', u' File "/usr/lib/python2.7/dist-packages/glanceclient/v2/images.py", line 79, in get\n resp, body = self.http_client.json_request(\'GET\', url)\n', u' File "/usr/lib/python2.7/dist-packages/glanceclient/common/http.py", line 266, in json_request\n resp, body_iter = self._http_request(url, method, **kwargs)\n', u' File "/usr/lib/python2.7/dist-packages/glanceclient/common/http.py", line 249, in _http_request\n raise exc.from_response(resp, body_str)\n', u'ImageNotAuthorized: Not authorized for image fed30bc9-91cc-4a18-b76c-e2ff209d219e.\n']

Revision history for this message
Sergey Yudin (tsipa740) wrote :

An easier way to reproduce:

1) On the 1st screen, on one of the controllers, run:
/etc/init.d/keystone stop ; /etc/init.d/memcached stop ; sleep 1m ; /etc/init.d/memcached start ; /etc/init.d/keystone start

2) On the 2nd screen, on one of the controllers, run in parallel:
while true ; do date ; for f in `seq 1 20` ; do ( out=$(glance --debug image-list 2>&1) || echo $out ) & done ; wait ; done

3) enjoy random failures almost infinitely
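
The one-liner from step 2 can be unrolled into a small function that also counts the failures in each batch. This is only a sketch: run_batch is a hypothetical helper name, not part of any OpenStack tool, and it assumes a POSIX shell with mktemp available.

```shell
# run_batch N CMD [ARGS...]
# Start N copies of CMD in parallel, print the output of every copy that
# failed, then print a one-line summary. Returns non-zero if any copy failed.
run_batch() {
    n=$1
    shift
    tmpdir=$(mktemp -d) || return 1
    i=1
    while [ "$i" -le "$n" ]; do
        # Keep the output of a job only when the command fails.
        ( out=$("$@" 2>&1) || printf '%s\n' "$out" > "$tmpdir/fail.$i" ) &
        i=$((i + 1))
    done
    wait
    failed=0
    for f in "$tmpdir"/fail.*; do
        [ -e "$f" ] || continue
        failed=$((failed + 1))
        cat "$f"
    done
    rm -rf "$tmpdir"
    echo "$(date): $failed of $n requests failed"
    [ "$failed" -eq 0 ]
}
```

To mirror the original loop, it could be called as: while true ; do run_batch 20 glance --debug image-list ; done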

summary: - Downtime of on memcache instance leads to failures after recovery
+ Downtime of one memcache instance leads to failures after recovery
Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

Please provide the version of Fuel used, the keystone config and, if possible, a logs snapshot.

Changed in fuel:
status: New → Incomplete
importance: Undecided → High
milestone: none → 6.1
assignee: nobody → Fuel Library Team (fuel-library)
Revision history for this message
Stanislaw Bogatkin (sbogatkin) wrote :

Any updates?

Revision history for this message
Stanislaw Bogatkin (sbogatkin) wrote :

This bug was incomplete for more than 4 weeks. We cannot investigate it further so we are setting the status to Invalid. If you think it is not correct, please feel free to provide requested information and reopen the bug, and we will look into it further.

Revision history for this message
Michael Polenchuk (mpolenchuk) wrote :

In order to prevent excessive effort spent validating tokens, some services cache previously-seen tokens in-process (or in memcached) for N seconds (default is 300s).
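
For services that validate tokens through the keystonemiddleware auth_token middleware, the cache lifetime and backend are normally set by the token_cache_time and memcached_servers options. The sketch below prints whatever a given config file sets for them. It is an illustration under assumptions: show_token_cache_settings is a hypothetical helper name, the file path in the usage line is only an example, and the function does not check which section the options appear in.

```shell
# show_token_cache_settings FILE
# Print the lines of FILE that set token_cache_time or memcached_servers.
# If neither option is set explicitly, say so instead.
show_token_cache_settings() {
    conf_file=$1
    grep -E '^[[:space:]]*(token_cache_time|memcached_servers)[[:space:]]*=' "$conf_file" \
        || echo "no explicit token cache settings in $conf_file"
}
```

Example call (the path is an assumption): show_token_cache_settings /etc/glance/glance-api.conf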

Revision history for this message
Sergey Yudin (tsipa740) wrote :

I'd like to actually raise this issue again.

I tried to reproduce it on 5.1.2, but this issue must surely be in upstream too. Could you please tell me which logs you actually need?

Revision history for this message
Bogdan Dobrelya (bogdando) wrote :