Activity log for bug #1370324

Date Who What changed Old value New value Message
2014-09-17 04:09:59 Alexander Ignatov bug added bug
2014-09-17 04:14:06 Alexander Ignatov mos: status New Confirmed
2014-09-17 04:14:08 Alexander Ignatov mos: milestone 6.0
2014-09-17 04:14:32 Alexander Ignatov tags keystone
2014-09-17 04:14:50 Alexander Ignatov mos: assignee Roman Podoliaka (rpodolyaka)
2014-09-17 04:16:22 Alexander Ignatov tags keystone keystone memcached
2014-09-17 17:58:34 Roman Podoliaka summary "keystone tenant-list" hangs sometimes Keystone hangs trying to set a lock in Memcache
2014-09-17 18:15:11 Roman Podoliaka description
Old value:
Due to incorrect logic in python-memcached, keystone tries to write data to a dead memcached backend, skipping alive backends. This happens because the backend traversal logic is randomized and can miss alive servers in the pool. When most servers in the pool are dead, the probability of failure is relatively high. In practice the issue shows up during deployment, when keystone is used in an environment where some controllers have not been deployed yet. The issue is a heisenbug and depends on the randomly generated data.
New value:
Preconditions:
1. Keystone is configured to use the Memcache backend:
backend=keystone.cache.memcache_pool
backend_argument=url:10.108.12.3,10.108.12.5,10.108.12.6
backend_argument=pool_maxsize:100
2. Memcached is deployed on each of the 3 controllers.
3. 2 of the 3 memcached servers are down (only the one on the primary controller is up: 10.108.12.3 is up, 10.108.12.5 and 10.108.12.6 are down).
Result: the keystone API hangs when a user runs something like "keystone tenant-list".
Debugging showed that the keystone-all process is stuck while trying to set a lock: http://xsnippet.org/360179/ . The lock itself is implemented by setting a key in memcache. The problem is that python-memcached shards keys among the configured memcached instances in a way that *can* fail to find an available server. This is due to how the retry logic is implemented: https://github.com/linsomniac/python-memcached/blob/master/memcache.py#L381-L396 . For a particular key (e.g. _lockusertokens-ee7e5f7374a8488bb2087e106a8834f7) 10 attempts won't be enough, and the loop within _get_server() will iterate over unavailable memcache servers. This can be easily reproduced by http://xsnippet.org/360181/
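The failure mode described in this entry can be illustrated with a small, self-contained model. This is not the python-memcached code itself, only a sketch of the same pattern under stated assumptions: the key is hashed onto one of the configured buckets and, on a miss, the hash is perturbed and the selection retried a bounded number of times, so every attempt can land on a dead server even though a live one exists. The server list mirrors the addresses above, while the crc32-based hashing, the ALIVE set and the pick_server() helper are illustrative. In the real client the loop blocks on connection attempts to the dead servers, which is why keystone appears to hang; here a miss simply returns None.

    # Simplified model of hash-and-retry server selection, in the spirit of
    # python-memcached's _get_server(); hashing, server list and helper name
    # are illustrative assumptions, not the library's actual code.
    import zlib

    SERVERS = ["10.108.12.3:11211",   # up (primary controller)
               "10.108.12.5:11211",   # down
               "10.108.12.6:11211"]   # down
    ALIVE = {"10.108.12.3:11211"}
    MAX_RETRIES = 10                  # the bounded retry count mentioned above

    def pick_server(key):
        """Return the server chosen for `key`, or None if every attempt hit a dead one."""
        h = zlib.crc32(key.encode())
        for attempt in range(MAX_RETRIES):
            server = SERVERS[h % len(SERVERS)]
            if server in ALIVE:
                return server
            # On a miss, perturb the hash and try again; the traversal is
            # effectively randomized, so the live server may never be picked.
            h = zlib.crc32("{}{}".format(h, attempt).encode())
        return None

    print(pick_server("_lockusertokens-ee7e5f7374a8488bb2087e106a8834f7"))
    # May print None even though 10.108.12.3 is alive.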
2014-09-17 18:16:51 Roman Podoliaka description
Old value: the description set at 2014-09-17 18:15:11 above.
New value: the same description, with the sentence about _get_server() revised to read: "For a particular key (e.g. _lockusertokens-ee7e5f7374a8488bb2087e106a8834f7) 10 attempts won't be enough, and the loop within _get_server() will yield *only* unavailable memcache servers."
2014-09-17 18:18:03 Roman Podoliaka mos: assignee Roman Podoliaka (rpodolyaka)
2014-09-17 18:25:58 Roman Podoliaka mos: status Confirmed Triaged
2014-09-17 21:19:26 Alexander Ignatov mos: status Triaged In Progress
2014-09-18 16:58:43 Bogdan Dobrelya nominated for series mos/5.1.x
2014-09-18 16:58:43 Bogdan Dobrelya bug task added mos/5.1.x
2014-09-18 16:59:03 Bogdan Dobrelya nominated for series mos/6.0.x
2014-09-18 16:59:03 Bogdan Dobrelya bug task added mos/6.0.x
2014-09-18 16:59:16 Bogdan Dobrelya mos/5.1.x: milestone 6.0 5.1.1
2014-09-18 16:59:23 Bogdan Dobrelya mos/6.0.x: status New Confirmed
2014-09-18 16:59:30 Bogdan Dobrelya mos/6.0.x: importance Undecided High
2014-09-18 17:00:08 Bogdan Dobrelya mos/5.1.x: status In Progress Confirmed
2014-09-18 17:00:18 Bogdan Dobrelya mos/6.0.x: milestone 6.0
2014-09-18 17:58:49 Roman Podoliaka description
Old value: the description set at 2014-09-17 18:16:51 above.
New value:
Preconditions:
1. Keystone is configured to use the Memcache backend:
backend=keystone.cache.memcache_pool
backend_argument=url:10.108.12.3,10.108.12.5,10.108.12.6
backend_argument=pool_maxsize:100
2. Memcached is deployed on each of the 3 controllers.
3. 2 of the 3 memcached servers are down (only the one on the primary controller is up: 10.108.12.3 is up, 10.108.12.5 and 10.108.12.6 are down).
Result: the keystone API hangs when a user runs something like "keystone tenant-list". haproxy will drop the connection after a 60s timeout. strace shows that keystone tries to connect to the unavailable servers in a loop, ignoring the available one.
Debugging showed that the keystone-all process is stuck while trying to set a lock: http://xsnippet.org/360179/ . The lock itself is implemented by setting a key in memcache. The problem is that python-memcached shards keys among the configured memcached instances in a way that *can* fail to find an available server. This is due to how the retry logic is implemented: https://github.com/linsomniac/python-memcached/blob/master/memcache.py#L381-L396 . For a particular key (e.g. _lockusertokens-ee7e5f7374a8488bb2087e106a8834f7) 10 attempts won't be enough, and the loop within _get_server() will yield *only* unavailable memcache servers. This can be easily reproduced by http://xsnippet.org/360181/
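The earlier description's claim that the probability of failure is relatively high when most of the pool is dead can be checked with a rough back-of-the-envelope estimate. The sketch below treats each of the 10 attempts as an independent, uniform pick over the 3 configured servers, which is only an approximation of the deterministic rehash chain in python-memcached; the numbers are illustrative and are not taken from the reproduction snippet linked above.

    # Rough estimate, assuming each of the 10 attempts is an independent
    # uniform pick over 3 servers, of which only 1 is alive.
    p_miss_one_key = (2 / 3) ** 10     # all 10 attempts hit a dead server
    print("per-key miss probability: {:.2%}".format(p_miss_one_key))   # ~1.7%

    # If a deployment touches many distinct lock keys, the chance that at
    # least one of them gets stuck grows quickly with the number of keys.
    for n_keys in (10, 100, 500):
        p_any = 1 - (1 - p_miss_one_key) ** n_keys
        print("{} keys: {:.1%} chance of at least one stuck key".format(n_keys, p_any))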
2014-09-18 18:30:29 Dmitry Mescheryakov nominated for series mos/5.0.x
2014-09-18 18:30:29 Dmitry Mescheryakov bug task added mos/5.0.x
2014-09-18 18:30:33 Dmitry Mescheryakov mos/5.0.x: status New Incomplete
2014-09-18 18:30:34 Dmitry Mescheryakov mos/5.0.x: status Incomplete Confirmed
2014-09-18 18:30:36 Dmitry Mescheryakov mos/5.0.x: importance Undecided High
2014-09-18 18:30:39 Dmitry Mescheryakov mos/5.0.x: milestone 5.0.3
2014-09-22 23:46:00 Bogdan Dobrelya mos/5.0.x: assignee MOS Keystone (mos-keystone)
2014-09-22 23:46:09 Bogdan Dobrelya mos/5.1.x: assignee MOS Keystone (mos-keystone)
2014-09-22 23:46:17 Bogdan Dobrelya mos/6.0.x: assignee MOS Keystone (mos-keystone)
2014-09-24 10:33:53 Alexander Makarov mos/5.0.x: assignee MOS Keystone (mos-keystone) Alexander Makarov (amakarov)
2014-09-24 10:33:56 Alexander Makarov mos/5.1.x: assignee MOS Keystone (mos-keystone) Alexander Makarov (amakarov)
2014-09-24 10:33:59 Alexander Makarov mos/5.0.x: assignee Alexander Makarov (amakarov)
2014-09-24 10:34:06 Alexander Makarov mos/5.0.x: assignee Alexander Makarov (amakarov)
2014-09-24 10:34:09 Alexander Makarov mos/6.0.x: assignee MOS Keystone (mos-keystone) Alexander Makarov (amakarov)
2014-09-25 15:04:30 Alexander Makarov attachment added get_server_fix.patch https://bugs.launchpad.net/mos/+bug/1370324/+attachment/4214980/+files/get_server_fix.patch
2014-11-13 17:00:00 Alexander Makarov mos/5.0.x: status Confirmed Fix Committed
2014-11-13 17:00:02 Alexander Makarov mos/5.1.x: status Confirmed Fix Committed
2014-11-13 17:00:10 Alexander Makarov mos/6.0.x: status Confirmed Fix Committed