keystone behavior when one memcache backend is down

Bug #1332058 reported by Sergii Golovatiuk
This bug affects 4 people
Affects Status Importance Assigned to Milestone
Mirantis OpenStack
Fix Committed
Critical
Yuriy Taraday
OpenStack Identity (keystone)
Fix Released
Medium
Yuriy Taraday
keystonemiddleware
Fix Released
Medium
Morgan Fainberg

Bug Description

Hi,

Our implementation uses dogpile.cache.memcached as a backend for tokens. Recently, I found interesting behavior when one of the memcache backends went down: there is a 3-6 second delay when I try to get a token. If I have 2 backends, the delay is 6-12 seconds. It's very easy to test.

Test connection using

for i in {1..20}; do (time keystone token-get >> log2) 2>&1 | grep real | awk '{print $2}'; done

Block one memcache backend using

iptables -I INPUT -p tcp --dport 11211 -j DROP (simulating a power outage of the node)

Test the speed using

for i in {1..20}; do (time keystone token-get >> log2) 2>&1 | grep real | awk '{print $2}'; done

Also I straced keystone process with

strace -tt -s 512 -o /root/log1 -f -p PID

and got

26872 connect(9, {sa_family=AF_INET, sin_port=htons(11211), sin_addr=inet_addr("10.108.2.3")}, 16) = -1 EINPROGRESS (Operation now in progress)

though this IP is down

Also I checked the code

https://github.com/openstack/keystone/blob/master/keystone/common/kvs/core.py#L210-L237
https://github.com/openstack/keystone/blob/master/keystone/common/kvs/core.py#L285-L289
https://github.com/openstack/keystone/blob/master/keystone/common/kvs/backends/memcached.py#L96

and was not able to find any details on how keystone handles a backend when it's down.

There should be logic that temporarily removes a backend when it's not accessible. After a timeout period, the backend should be probed (without blocking get/set operations on the remaining backends), and if the connection is successful it should be put back into operation. Here is a sample of how it could be implemented:

http://dogpilecache.readthedocs.org/en/latest/usage.html#changing-backend-behavior
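The eject-and-probe idea proposed above can be sketched in a few lines of plain Python. This is a hypothetical illustration only, not keystone or dogpile code; the class and method names are invented:

```python
import time

class DeadHostTracker:
    """Track hosts that failed recently and skip them until a retry window passes."""

    def __init__(self, dead_retry=30):
        self.dead_retry = dead_retry       # seconds a host stays ejected
        self._dead_until = {}              # host -> timestamp when it may be retried

    def mark_dead(self, host):
        """Eject a host after a failed operation."""
        self._dead_until[host] = time.time() + self.dead_retry

    def is_alive(self, host):
        """Return True if the host may be used (or probed again)."""
        until = self._dead_until.get(host)
        if until is None:
            return True
        if time.time() >= until:
            # Retry window elapsed: allow one probe attempt again.
            del self._dead_until[host]
            return True
        return False
```

A client wrapper would consult `is_alive()` before each get/set and call `mark_dead()` on connection errors, so dead backends add no per-request delay until the retry window elapses.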

Tags: ha
Revision history for this message
Morgan Fainberg (mdrnstm) wrote :

This behavior is the way the python memcache clients themselves work. This isn't specific to dogpile, keystone, or anything else.

The basic behavior is 'try and wait for a timeout'. I'm not sure what the best solution to this will be in the short term. In the long term, the real solution will be non-persistent tokens (no need to store them), which would eliminate the need for memcache in this regard.

Revision history for this message
Meg McRoberts (dreidellhasa) wrote :

Documented as "Known Issue" in 5.0.1 Release Notes

Revision history for this message
Dolph Mathews (dolph) wrote :

5.0.1 of what?

Revision history for this message
Dolph Mathews (dolph) wrote :

Based on Morgan's comment, and the fact that this is a "known issue" somewhere, it doesn't sound like there's anything for us to do in Keystone?

Changed in keystone:
status: New → Incomplete
Revision history for this message
Sergii Golovatiuk (sgolovatiuk) wrote :

Currently, keystone is not ready for high availability and doesn't give any control over HA options for memcached.
I think there should be specific options in keystone.conf where operators can tune libmemcached or pylibmc beyond the default settings.

[cache]
backend_behaviors=

This option should specify the behavior for the backend (http://sendapatch.se/projects/pylibmc/behaviors.html#failover)

    regions.behaviors = {
        "tcp_nodelay": False,
        "ketama": True,
        "failure_limit": 2,
        "_retry_timeout": 30,
        "_auto_eject_hosts": True
    }

and pass all these settings during dogpile backend registration as specified at https://pypi.python.org/pypi/dogpile.cache
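The proposed merging of operator-supplied behaviors with sane defaults could look like the following plain-Python sketch. Only the behavior keys come from pylibmc's documentation; the function itself is a hypothetical illustration, not existing keystone code:

```python
# Default failover behaviors, as suggested above; operators would override
# any subset via a hypothetical [cache] backend_behaviors option.
DEFAULT_BEHAVIORS = {
    "tcp_nodelay": False,
    "ketama": True,
    "failure_limit": 2,
    "_retry_timeout": 30,
    "_auto_eject_hosts": True,
}

def build_behaviors(overrides=None):
    """Return the behavior dict a pylibmc-style client would be configured with."""
    behaviors = dict(DEFAULT_BEHAVIORS)
    behaviors.update(overrides or {})
    return behaviors
```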

Changed in keystone:
status: Incomplete → Confirmed
Revision history for this message
Bogdan Dobrelya (bogdando) wrote :
Changed in mos:
milestone: none → 5.1
importance: Undecided → High
assignee: nobody → MOS Keystone (mos-keystone)
status: New → Confirmed
tags: added: ha
Revision history for this message
Alexei Kornienko (alexei-kornienko) wrote :

I need to know the type of memcached connector that is used by keystone. The best option is pylibmc.

Revision history for this message
Sergii Golovatiuk (sgolovatiuk) wrote : Re: [Bug 1332058] Re: keystone behavior when one memcache backend is down

Currently, Fuel uses python-memcached, which lacks HA features. pylibmc is
supported by dogpile, though pylibmc behaviors are not tunable through keystone.
make_region().configure() should be invoked with the HA options specified.


Changed in mos:
assignee: MOS Keystone (mos-keystone) → Yuriy Taraday (yorik-sar)
Changed in mos:
assignee: Yuriy Taraday (yorik-sar) → Alexei Kornienko (alexei-kornienko)
Changed in mos:
importance: High → Critical
Revision history for this message
Dolph Mathews (dolph) wrote :

Based on comment #5 it sounds like the keystone side of this is just looking for keystone.conf [cache] backend_argument - which obviously already exists.

Changed in keystone:
status: Confirmed → Invalid
Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

@Dolph, when I try to use

backend=dogpile.cache.pylibmc
and
backend_argument=behaviors:tcp_nodelay:False

I receive an error from keystone:
ERROR: __init__() got an unexpected keyword argument 'behaviors' (HTTP 400)

Changed in keystone:
status: Invalid → New
Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

Well, it could be because of the old version I use (http://sendapatch.se/projects/pylibmc/index.html)...

Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

As far as I can see, the Fuel puppet manifests (keystone_config) should be adjusted as well.

Changed in keystone:
status: New → Invalid
Changed in fuel:
assignee: nobody → Bogdan Dobrelya (bogdando)
status: New → Triaged
importance: Undecided → High
milestone: none → 5.1
Revision history for this message
Vladimir Kuklin (vkuklin) wrote :

Deployment part of this bug is tracked here:

https://bugs.launchpad.net/fuel/+bug/1340657

no longer affects: fuel
Changed in mos:
assignee: Alexei Kornienko (alexei-kornienko) → Yuriy Taraday (yorik-sar)
Revision history for this message
Tomasz 'Zen' Napierala (tzn) wrote :

If this is being worked on, please change the status to "In Progress" just to keep the situation clear.

Changed in mos:
status: Confirmed → In Progress
Revision history for this message
Dmitry Mescheryakov (dmitrymex) wrote :

The bug is fixed in MOS by this commit: https://gerrit.mirantis.com/#/c/21408/8

Changed in mos:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to keystone (master)

Fix proposed to branch: master
Review: https://review.openstack.org/119452

Changed in keystone:
assignee: nobody → Yuriy Taraday (yorik-sar)
status: Invalid → In Progress
Changed in keystone:
assignee: Yuriy Taraday (yorik-sar) → Morgan Fainberg (mdrnstm)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to keystonemiddleware (master)

Fix proposed to branch: master
Review: https://review.openstack.org/119774

Changed in keystonemiddleware:
assignee: nobody → Yuriy Taraday (yorik-sar)
status: New → In Progress
Changed in keystone:
assignee: Morgan Fainberg (mdrnstm) → Yuriy Taraday (yorik-sar)
Dolph Mathews (dolph)
Changed in keystone:
milestone: none → juno-rc1
importance: Undecided → Medium
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to keystone (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/121166

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on keystone (master)

Change abandoned by Morgan Fainberg (<email address hidden>) on branch: master
Review: https://review.openstack.org/121166
Reason: The parent needs a rebase will regenerate the config there.

Changed in keystone:
assignee: Yuriy Taraday (yorik-sar) → Morgan Fainberg (mdrnstm)
Changed in keystone:
assignee: Morgan Fainberg (mdrnstm) → Yuriy Taraday (yorik-sar)
Changed in keystone:
assignee: Yuriy Taraday (yorik-sar) → Morgan Fainberg (mdrnstm)
Changed in keystone:
assignee: Morgan Fainberg (mdrnstm) → Yuriy Taraday (yorik-sar)
Changed in keystone:
assignee: Yuriy Taraday (yorik-sar) → Morgan Fainberg (mdrnstm)
Changed in keystonemiddleware:
milestone: none → 1.2.0
importance: Undecided → Medium
Changed in keystonemiddleware:
assignee: Yuriy Taraday (yorik-sar) → Morgan Fainberg (mdrnstm)
Changed in keystone:
assignee: Morgan Fainberg (mdrnstm) → Yuriy Taraday (yorik-sar)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to keystone (master)

Reviewed: https://review.openstack.org/119452
Committed: https://git.openstack.org/cgit/openstack/keystone/commit/?id=0010803288748fcd3ce7dba212a54bffe7a61a0c
Submitter: Jenkins
Branch: master

commit 0010803288748fcd3ce7dba212a54bffe7a61a0c
Author: Yuriy Taraday <email address hidden>
Date: Thu Aug 28 14:27:58 2014 +0400

    Add a pool of memcached clients

    This patchset adds a pool of memcache clients. This pool allows for reuse of
    a client object, prevents too many client objects from being instantiated, and
    maintains proper tracking of dead servers so as to limit delays
    when a server (or all servers) becomes unavailable.

    The new memcache pool backend is available either by being set as the memcache
    backend or by using keystone.token.persistence.backends.memcache_pool.Token for
    the Token memcache persistence driver.

    [memcache]
    servers = 127.0.0.1:11211
    dead_retry = 300
    socket_timeout = 3
    pool_maxsize = 10
    pool_unused_timeout = 60

    Where:
    - servers - comma-separated list of host:port pairs (was already there);
    - dead_retry - number of seconds memcached server is considered dead
      before it is tried again;
    - socket_timeout - timeout in seconds for every call to a server;
    - pool_maxsize - max total number of open connections in the pool;
    - pool_unused_timeout - number of seconds a connection is held unused in
      the pool before it is closed;

    The new memcache pool backend can be used as the driver for the Keystone
    caching layer. To use it as caching driver, set
    'keystone.cache.memcache_pool' as the value of the [cache]\backend option,
    the other options are the same as above, but with 'memcache_' prefix:

    [cache]
    backend = keystone.cache.memcache_pool
    memcache_servers = 127.0.0.1:11211
    memcache_dead_retry = 300
    memcache_socket_timeout = 3
    memcache_pool_maxsize = 10
    memcache_pool_unused_timeout = 60

    Co-Authored-By: Morgan Fainberg <email address hidden>
    Closes-bug: #1332058
    Closes-bug: #1360446
    Change-Id: I3544894482b30a47fcd4fac8948d03136fd83f14
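The pooling behavior the commit message describes (bounded size, closing connections idle past `pool_unused_timeout`) can be illustrated with a minimal sketch. This is hypothetical code, not the merged keystone implementation; `ConnectionPool` and `factory` are invented names:

```python
import queue
import time

class ConnectionPool:
    """Minimal sketch: bounded pool that replaces connections idle too long."""

    def __init__(self, factory, maxsize=10, unused_timeout=60):
        self._factory = factory            # callable that creates a new client
        self._unused_timeout = unused_timeout
        self._pool = queue.Queue(maxsize)  # holds (client, returned_at) pairs

    def get(self):
        """Reuse a pooled client if fresh enough, else create a new one."""
        try:
            conn, returned_at = self._pool.get_nowait()
            if time.time() - returned_at > self._unused_timeout:
                conn = self._factory()     # sat unused too long: replace it
            return conn
        except queue.Empty:
            return self._factory()

    def put(self, conn):
        """Return a client to the pool; drop it if the pool is already full."""
        try:
            self._pool.put_nowait((conn, time.time()))
        except queue.Full:
            pass                           # over pool_maxsize: discard
```

Combined with dead-server tracking (`dead_retry`, `socket_timeout`), this is what bounds the delay when a memcached server goes away.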

Changed in keystone:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to keystonemiddleware (master)

Reviewed: https://review.openstack.org/119774
Committed: https://git.openstack.org/cgit/openstack/keystonemiddleware/commit/?id=045cddcea2ecefccecbb40d4249b915c3f1faae3
Submitter: Jenkins
Branch: master

commit 045cddcea2ecefccecbb40d4249b915c3f1faae3
Author: Morgan Fainberg <email address hidden>
Date: Sun Sep 21 13:20:35 2014 -0700

    Add an optional advanced pool of memcached clients

    This patchset adds an advanced eventlet safe pool of memcache clients. This
    allows the deployer to configure auth_token middleware to utilize the new
    pool by simply setting 'memcache_use_advanced_pool' to true. Optional
    tunables for the memcache pool have also been added.

    Co-Authored-By: Morgan Fainberg <email address hidden>
    Closes-bug: #1332058
    Closes-bug: #1360446
    Change-Id: I08082b46ce692cf4df449d48dac94718f1e98a6c

Changed in keystonemiddleware:
status: In Progress → Fix Committed
Dolph Mathews (dolph)
Changed in keystonemiddleware:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in keystone:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in keystone:
milestone: juno-rc1 → 2014.2
Revision history for this message
OSCI Robot (oscirobot) wrote :

RPM package keystone has been built for project openstack/keystone
Package version == 2014.1.1, package release == fuel5.0.3.mira8.git.de075cb.7361afa

Changeset: https://review.fuel-infra.org/561
project: openstack/keystone
branch: openstack-ci/fuel-5.0.3/2014.1.1
author: Alexander Makarov
committer: Alexander Makarov
subject: Update a pool of memcached clients from upstream
status: patchset-created

Files placed on repository:
openstack-keystone-2014.1.1-fuel5.0.3.mira8.git.de075cb.7361afa.noarch.rpm
openstack-keystone-doc-2014.1.1-fuel5.0.3.mira8.git.de075cb.7361afa.noarch.rpm
python-keystone-2014.1.1-fuel5.0.3.mira8.git.de075cb.7361afa.noarch.rpm

NOTE: Changeset is not merged, created temporary package repository.
RPM repository URL: http://osci-obs.vm.mirantis.net:82/centos-fuel-5.0.3-stable-561/centos

Revision history for this message
OSCI Robot (oscirobot) wrote :

DEB package keystone has been built for project openstack/keystone
Package version == 2014.1.1, package release == fuel5.0.3~mira8+git.de075cb.7361afa

Changeset: https://review.fuel-infra.org/561
project: openstack/keystone
branch: openstack-ci/fuel-5.0.3/2014.1.1
author: Alexander Makarov
committer: Alexander Makarov
subject: Update a pool of memcached clients from upstream
status: patchset-created

Files placed on repository:
keystone-doc_2014.1.1-fuel5.0.3~mira8+git.de075cb.7361afa_all.deb
keystone_2014.1.1-fuel5.0.3~mira8+git.de075cb.7361afa_all.deb
python-keystone_2014.1.1-fuel5.0.3~mira8+git.de075cb.7361afa_all.deb

NOTE: Changeset is not merged, created temporary package repository.
DEB repository URL: http://osci-obs.vm.mirantis.net:82/ubuntu-fuel-5.0.3-stable-561/ubuntu

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to keystone (master)

Reviewed: https://review.opendev.org/737579
Committed: https://git.openstack.org/cgit/openstack/keystone/commit/?id=bb0393623ca8687714342d2b0cc73cc6c126ecde
Submitter: Zuul
Branch: master

commit bb0393623ca8687714342d2b0cc73cc6c126ecde
Author: Lance Bragstad <email address hidden>
Date: Tue Jun 23 11:37:06 2020 -0500

    Write a symptom for checking memcache connections

    This makes it easier for operators to troubleshoot connection issues to
    Memcached.

    Related-Bug: 1332058

    Change-Id: I6e67363822480314b93608bb1eae3514f1480f6d
