Activity log for bug #1370324

Date Who What changed Old value New value Message
2014-09-17 04:09:59 Alexander Ignatov bug added bug
2014-09-17 04:14:06 Alexander Ignatov mos: status New Confirmed
2014-09-17 04:14:08 Alexander Ignatov mos: milestone 6.0
2014-09-17 04:14:32 Alexander Ignatov tags keystone
2014-09-17 04:14:50 Alexander Ignatov mos: assignee Roman Podoliaka (rpodolyaka)
2014-09-17 04:16:22 Alexander Ignatov tags keystone keystone memcached
2014-09-17 17:58:34 Roman Podoliaka summary "keystone tenant-list" hangs sometimes Keystone hangs trying to set a lock in Memcache
2014-09-17 18:15:11 Roman Podoliaka description
Old value:
Due to incorrect logic in python-memcached, keystone tries to write data to a dead memcached backend, skipping alive backends. This happens because the backend traversal logic is randomized and can miss alive servers in the pool. When most servers in the pool are dead, the probability of failure is relatively high. In practice the issue shows up during deployment, when keystone is used in an environment where some controllers have not been deployed yet. The issue is a heisenbug and depends on the randomly generated data.
New value:
Preconditions:
1. Keystone is configured to use the Memcache backend:
backend=keystone.cache.memcache_pool
backend_argument=url:10.108.12.3,10.108.12.5,10.108.12.6
backend_argument=pool_maxsize:100
2. Memcached is deployed on each of the 3 controllers.
3. 2 of the 3 memcached servers are down (only the one on the primary controller is up: 10.108.12.3 is up, 10.108.12.5 and 10.108.12.6 are down).
Result: the keystone API hangs when a user runs something like "keystone tenant-list".
Debugging showed that the keystone-all process is stuck while trying to set a lock: http://xsnippet.org/360179/ . The lock itself is implemented by setting a key in memcache. The problem is that python-memcached shards keys among the configured memcached instances in a way that *can* fail to find an available server. This is due to how the retry logic is implemented: https://github.com/linsomniac/python-memcached/blob/master/memcache.py#L381-L396 . For a particular key (e.g. _lockusertokens-ee7e5f7374a8488bb2087e106a8834f7) 10 attempts won't be enough, and the loop within _get_server() will iterate over unavailable memcache servers. This can be easily reproduced by http://xsnippet.org/360181/
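The failure mode described in this entry can be illustrated with a small, self-contained model. This is not the python-memcached code itself, only a sketch of the same pattern under stated assumptions: the key is hashed onto one of the configured buckets and, on a miss, the hash is perturbed and the selection retried a bounded number of times, so every attempt can land on a dead server even though a live one exists. The server list mirrors the addresses above, while the crc32-based hashing, the ALIVE set and the pick_server() helper are illustrative. In the real client the loop blocks on connection attempts to the dead servers, which is why keystone appears to hang; here a miss simply returns None.

    # Simplified model of hash-and-retry server selection, in the spirit of
    # python-memcached's _get_server(); hashing, server list and helper name
    # are illustrative assumptions, not the library's actual code.
    import zlib

    SERVERS = ["10.108.12.3:11211",   # up (primary controller)
               "10.108.12.5:11211",   # down
               "10.108.12.6:11211"]   # down
    ALIVE = {"10.108.12.3:11211"}
    MAX_RETRIES = 10                  # the bounded retry count mentioned above

    def pick_server(key):
        """Return the server chosen for `key`, or None if every attempt hit a dead one."""
        h = zlib.crc32(key.encode())
        for attempt in range(MAX_RETRIES):
            server = SERVERS[h % len(SERVERS)]
            if server in ALIVE:
                return server
            # On a miss, perturb the hash and try again; the traversal is
            # effectively randomized, so the live server may never be picked.
            h = zlib.crc32("{}{}".format(h, attempt).encode())
        return None

    print(pick_server("_lockusertokens-ee7e5f7374a8488bb2087e106a8834f7"))
    # May print None even though 10.108.12.3 is alive.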
2014-09-17 18:16:51 Roman Podoliaka description
Old value: the description set at 2014-09-17 18:15:11 above.
New value: the same description, with the sentence about _get_server() revised to read: "For a particular key (e.g. _lockusertokens-ee7e5f7374a8488bb2087e106a8834f7) 10 attempts won't be enough, and the loop within _get_server() will yield *only* unavailable memcache servers."
2014-09-17 18:18:03 Roman Podoliaka mos: assignee Roman Podoliaka (rpodolyaka)
2014-09-17 18:25:58 Roman Podoliaka mos: status Confirmed Triaged
2014-09-17 21:19:26 Alexander Ignatov mos: status Triaged In Progress
2014-09-18 16:58:43 Bogdan Dobrelya nominated for series mos/5.1.x
2014-09-18 16:58:43 Bogdan Dobrelya bug task added mos/5.1.x
2014-09-18 16:59:03 Bogdan Dobrelya nominated for series mos/6.0.x
2014-09-18 16:59:03 Bogdan Dobrelya bug task added mos/6.0.x
2014-09-18 16:59:16 Bogdan Dobrelya mos/5.1.x: milestone 6.0 5.1.1
2014-09-18 16:59:23 Bogdan Dobrelya mos/6.0.x: status New Confirmed
2014-09-18 16:59:30 Bogdan Dobrelya mos/6.0.x: importance Undecided High
2014-09-18 17:00:08 Bogdan Dobrelya mos/5.1.x: status In Progress Confirmed
2014-09-18 17:00:18 Bogdan Dobrelya mos/6.0.x: milestone 6.0
2014-09-18 17:58:49 Roman Podoliaka description
Old value: the description set at 2014-09-17 18:16:51 above.
New value:
Preconditions:
1. Keystone is configured to use the Memcache backend:
backend=keystone.cache.memcache_pool
backend_argument=url:10.108.12.3,10.108.12.5,10.108.12.6
backend_argument=pool_maxsize:100
2. Memcached is deployed on each of the 3 controllers.
3. 2 of the 3 memcached servers are down (only the one on the primary controller is up: 10.108.12.3 is up, 10.108.12.5 and 10.108.12.6 are down).
Result: the keystone API hangs when a user runs something like "keystone tenant-list". haproxy will drop the connection after a 60s timeout. strace shows that keystone tries to connect to the unavailable servers in a loop, ignoring the available one.
Debugging showed that the keystone-all process is stuck while trying to set a lock: http://xsnippet.org/360179/ . The lock itself is implemented by setting a key in memcache. The problem is that python-memcached shards keys among the configured memcached instances in a way that *can* fail to find an available server. This is due to how the retry logic is implemented: https://github.com/linsomniac/python-memcached/blob/master/memcache.py#L381-L396 . For a particular key (e.g. _lockusertokens-ee7e5f7374a8488bb2087e106a8834f7) 10 attempts won't be enough, and the loop within _get_server() will yield *only* unavailable memcache servers. This can be easily reproduced by http://xsnippet.org/360181/
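The earlier description's claim that the probability of failure is relatively high when most of the pool is dead can be checked with a rough back-of-the-envelope estimate. The sketch below treats each of the 10 attempts as an independent, uniform pick over the 3 configured servers, which is only an approximation of the deterministic rehash chain in python-memcached; the numbers are illustrative and are not taken from the reproduction snippet linked above.

    # Rough estimate, assuming each of the 10 attempts is an independent
    # uniform pick over 3 servers, of which only 1 is alive.
    p_miss_one_key = (2 / 3) ** 10     # all 10 attempts hit a dead server
    print("per-key miss probability: {:.2%}".format(p_miss_one_key))   # ~1.7%

    # If a deployment touches many distinct lock keys, the chance that at
    # least one of them gets stuck grows quickly with the number of keys.
    for n_keys in (10, 100, 500):
        p_any = 1 - (1 - p_miss_one_key) ** n_keys
        print("{} keys: {:.1%} chance of at least one stuck key".format(n_keys, p_any))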
2014-09-18 18:30:29 Dmitry Mescheryakov nominated for series mos/5.0.x
2014-09-18 18:30:29 Dmitry Mescheryakov bug task added mos/5.0.x
2014-09-18 18:30:33 Dmitry Mescheryakov mos/5.0.x: status New Incomplete
2014-09-18 18:30:34 Dmitry Mescheryakov mos/5.0.x: status Incomplete Confirmed
2014-09-18 18:30:36 Dmitry Mescheryakov mos/5.0.x: importance Undecided High
2014-09-18 18:30:39 Dmitry Mescheryakov mos/5.0.x: milestone 5.0.3
2014-09-22 23:46:00 Bogdan Dobrelya mos/5.0.x: assignee MOS Keystone (mos-keystone)
2014-09-22 23:46:09 Bogdan Dobrelya mos/5.1.x: assignee MOS Keystone (mos-keystone)
2014-09-22 23:46:17 Bogdan Dobrelya mos/6.0.x: assignee MOS Keystone (mos-keystone)
2014-09-24 10:33:53 Alexander Makarov mos/5.0.x: assignee MOS Keystone (mos-keystone) Alexander Makarov (amakarov)
2014-09-24 10:33:56 Alexander Makarov mos/5.1.x: assignee MOS Keystone (mos-keystone) Alexander Makarov (amakarov)
2014-09-24 10:33:59 Alexander Makarov mos/5.0.x: assignee Alexander Makarov (amakarov)
2014-09-24 10:34:06 Alexander Makarov mos/5.0.x: assignee Alexander Makarov (amakarov)
2014-09-24 10:34:09 Alexander Makarov mos/6.0.x: assignee MOS Keystone (mos-keystone) Alexander Makarov (amakarov)
2014-09-25 15:04:30 Alexander Makarov attachment added get_server_fix.patch https://bugs.launchpad.net/mos/+bug/1370324/+attachment/4214980/+files/get_server_fix.patch
2014-11-13 17:00:00 Alexander Makarov mos/5.0.x: status Confirmed Fix Committed
2014-11-13 17:00:02 Alexander Makarov mos/5.1.x: status Confirmed Fix Committed
2014-11-13 17:00:10 Alexander Makarov mos/6.0.x: status Confirmed Fix Committed