client connection leak to memcached under eventlet due to threadlocal

Bug #1360446 reported by Aleksandr Shaposhnikov
This bug affects 2 people
Affects                         Importance  Assigned to
OpenStack Identity (keystone)   Medium      Yuriy Taraday
Icehouse                        Medium      Unassigned
keystonemiddleware              Medium      Morgan Fainberg

Bug Description

When Keystone is configured with memcached as its backend and token storage, it does not reuse connections to memcached and starts to fail once it has more than 500 connections open to it.

Steps to reproduce:

1. Configure keystone with memcached as backend.
2. Put a reasonably heavy load on keystone (creating VMs opens a lot of connections) and watch the connections to memcached using netstat, e.g. netstat -an |grep -c ":11211"

Expected behavior:
The number of connections should be reasonable and no more than the number of connections to keystone (ideally :)

Observed behavior:
The number of connections keeps growing, and it seems that:
1. They are not reused at all.
2. The lifetime of some connections is 600 seconds.
3. Not all connections appear to stay open for 600 seconds.
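To see whether connections are being reused or merely piling up, it can help to break the netstat count down by TCP state instead of taking a single total. The following is a minimal sketch, assuming Linux-style `netstat -an` output (proto, recv-q, send-q, local address, foreign address, state); the function name and the sample text are made up for illustration.

```python
from collections import Counter


def count_by_state(netstat_output, port=11211):
    """Count TCP rows that involve the given port, grouped by state."""
    suffix = ":%d" % port
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers and non-TCP rows (assumed Linux column layout).
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        # fields[3] is the local address, fields[4] the foreign address.
        if fields[3].endswith(suffix) or fields[4].endswith(suffix):
            states[fields[5]] += 1
    return states


# Hypothetical sample, not taken from the reporter's system.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 127.0.0.1:11211         0.0.0.0:*               LISTEN
tcp        0      0 127.0.0.1:40112         127.0.0.1:11211         ESTABLISHED
tcp        0      0 127.0.0.1:40113         127.0.0.1:11211         ESTABLISHED
tcp        0      0 127.0.0.1:40090         127.0.0.1:11211         TIME_WAIT
tcp        0      0 127.0.0.1:5000          127.0.0.1:51000         ESTABLISHED
"""

if __name__ == "__main__":
    for state, count in sorted(count_by_state(SAMPLE).items()):
        print(state, count)
```

A steadily growing ESTABLISHED count under constant load would point at connections that are opened but never reused.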

<UPDATE from MorganFainberg>
This is specific to deploying under eventlet and to the python-memcached library and its explicit/unavoidable use of threadlocal. Use of threadlocal under eventlet causes the client connections to leak until the GC / kernel cleans up the connections. This was confirmed to only affect eventlet with threading patched.

Keystone deployed under apache is not affected.

All services deployed with keystonemiddleware that utilize eventlet and memcache for token cache are also affected.
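The mechanism described in the update can be illustrated without memcached or eventlet. The sketch below is an assumption-laden stand-in: `FakeClient` is a hypothetical class that keeps its "socket" in thread-local state by subclassing `threading.local`, and plain short-lived threads play the role of the per-request greenthreads that eventlet would create when threading is patched. It is not the python-memcached code.

```python
import threading


class FakeClient(threading.local):
    """Hypothetical client that stores its socket in thread-local state."""

    opened = 0                 # class attribute, shared across all threads
    _lock = threading.Lock()

    def __init__(self):
        # threading.local runs __init__ again the first time each new
        # thread touches the instance, so every thread starts without a socket.
        self.sock = None

    def get_socket(self):
        if self.sock is None:
            with FakeClient._lock:
                FakeClient.opened += 1   # stands in for opening a TCP connection
            self.sock = object()
        return self.sock


def simulate(requests):
    """Serve each request in its own short-lived thread, sharing one client."""
    FakeClient.opened = 0
    client = FakeClient()

    def handle_request():
        client.get_socket()
        client.get_socket()   # reused, but only within this thread

    for _ in range(requests):
        worker = threading.Thread(target=handle_request)
        worker.start()
        worker.join()
    return FakeClient.opened


if __name__ == "__main__":
    print(simulate(5))
```

Even though a single client object is shared, each new thread sees empty thread-local state and opens its own connection, so the count grows with the number of requests rather than staying at one.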

description: updated
description: updated
tags: added: memcached
Dolph Mathews (dolph)
Changed in keystone:
status: New → Triaged
importance: Undecided → Medium
Dolph Mathews (dolph) wrote :

Is this using kvs/dogpile or memcached directly? And icehouse or master?

Aleksandr Shaposhnikov (alashai8) wrote :

I found this on Icehouse and it was using dogpile with memcache as backend.

Dolph Mathews (dolph) wrote :

Morgan: is there anything we can do in keystone to fix this? or is it up to the dogpile backend?

Changed in keystone:
status: Triaged → Incomplete
tags: added: dogpile.cache
Morgan Fainberg (mdrnstm) wrote :

This is up to fixing the dogpile backend. That being said it isn't hard to add this type of work to the dogpile backend. We already do some of this with the in-memory backend we use for testing.

If we have a clear solution to this (I think I have an idea on how to do this, similar to the pool in auth_token), this could easily be implemented in the course of an hour or so.

This is also likely something we can work with Mike Bayer and include upstream in dogpile directly.

So let's do this:

Work on a short term fix here that will land for Juno and is back portable (ensuring compat with older dogpile packaging). Then contribute the longer term fix to upstream dogpile.

I'll discuss this in a little more depth on IRC when I'm back off phone-only access.

Changed in keystone:
milestone: none → juno-rc1
Mike Bayer (zzzeek) wrote :

hey @morganfainberg, looked for you on IRC but think you left already. Let me know what the nature of this is, e.g. if this has to do with the memcached implicit threadlocal thing they do, someone was asking me about this the other day.

Morgan Fainberg (mdrnstm) wrote :

@zzzeek Just missed you on IRC, but I am not sure about the threadlocal issue (doesn't ring a bell). This appears to just be lack of re-use of the actual client connection object to memcached. We got around some of this issue in keystoneclient by using a memcachepool implementation that creates up to the number of worker threads worth of memcache connections and when it's done with the connection it puts it back into the pool (with a little logic to handle dead connections).

In this case I'm guessing they have a lot of activity and some connections aren't being closed due to either GC delay or other memcached client library oddities, so under load you end up with stale connections which cause this related error case.
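The pooling idea mentioned above (a bounded number of clients that are handed out and put back after use) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the keystoneclient implementation: `ClientPool`, `factory`, and `acquire` are hypothetical names, and "dead connection" handling is reduced to discarding a client whenever the caller's block raises.

```python
import contextlib
import queue


class ClientPool:
    """Hand out at most `maxsize` clients and reuse them across callers."""

    def __init__(self, factory, maxsize):
        self._factory = factory
        # LIFO so the most recently returned client is reused first.
        self._slots = queue.LifoQueue(maxsize)
        for _ in range(maxsize):
            self._slots.put(None)   # None marks a free slot with no client yet

    @contextlib.contextmanager
    def acquire(self, timeout=None):
        # Blocks while every slot is in use; raises queue.Empty on timeout.
        client = self._slots.get(timeout=timeout)
        try:
            if client is None:
                client = self._factory()   # create lazily, on first use of a slot
            yield client
        except Exception:
            client = None   # drop a possibly broken client; slot refills lazily
            raise
        finally:
            self._slots.put(client)   # always give the slot back


if __name__ == "__main__":
    created = []

    def make_client():
        created.append(1)
        return object()   # stand-in for a real memcached client

    pool = ClientPool(make_client, maxsize=3)
    for _ in range(100):
        with pool.acquire() as client:
            pass   # a real caller would issue get/set calls here
    print(len(created))
```

Because the slot count is fixed up front, the number of live clients can never exceed `maxsize`, regardless of how many threads or greenthreads call `acquire`.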

Morgan Fainberg (mdrnstm) wrote :

Socket limits, FD limits, etc.

Morgan Fainberg (mdrnstm) wrote :

After further research, this looks to be an eventlet + memcache client (due to threadlocal implementation) issue.

Morgan Fainberg (mdrnstm) wrote :

Under apache deployment this issue will not occur. It appears as if standard thread deployment under eventlet will also prevent this issue.

Based on a conversation in IRC today, we should have some patches posted soon to address the thread local issue (pending review/updates to make them supportable) for Juno and potentially backported to Icehouse. The longer term fix will include some new libraries and updates to dogpile.cache.

The immediate fix is to deploy keystone under apache instead of eventlet (or not use the Memcache backends).

Changed in keystone:
status: Incomplete → Triaged
Morgan Fainberg (mdrnstm) wrote :

This affects any server using keystonemiddleware that runs under eventlet and uses memcache for token caching. The fix for keystonemiddleware will likely be based on the Keystone fix.

Changed in keystonemiddleware:
importance: Undecided → High
status: New → Triaged
tags: added: icehouse-backport-potential
summary: - Keystone server didn't reuse connections to memcached
+ client connection leak to memcached under eventlet due to threadlocal
description: updated
OpenStack Infra (hudson-openstack) wrote : Fix proposed to keystone (master)

Fix proposed to branch: master
Review: https://review.openstack.org/119452

Changed in keystone:
assignee: nobody → Yuriy Taraday (yorik-sar)
status: Triaged → In Progress
Changed in keystone:
assignee: Yuriy Taraday (yorik-sar) → Morgan Fainberg (mdrnstm)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to keystonemiddleware (master)

Fix proposed to branch: master
Review: https://review.openstack.org/119774

Changed in keystonemiddleware:
assignee: nobody → Yuriy Taraday (yorik-sar)
status: Triaged → In Progress
Changed in keystone:
assignee: Morgan Fainberg (mdrnstm) → Razumovsky Peter (prazumovsky)
Changed in keystone:
assignee: Razumovsky Peter (prazumovsky) → Yuriy Taraday (yorik-sar)
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to keystone (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/121166

OpenStack Infra (hudson-openstack) wrote : Change abandoned on keystone (master)

Change abandoned by Morgan Fainberg (<email address hidden>) on branch: master
Review: https://review.openstack.org/121166
Reason: The parent needs a rebase; will regenerate the config there.

Changed in keystone:
assignee: Yuriy Taraday (yorik-sar) → Morgan Fainberg (mdrnstm)
Changed in keystone:
assignee: Morgan Fainberg (mdrnstm) → Yuriy Taraday (yorik-sar)
Changed in keystone:
assignee: Yuriy Taraday (yorik-sar) → Morgan Fainberg (mdrnstm)
Changed in keystone:
assignee: Morgan Fainberg (mdrnstm) → Yuriy Taraday (yorik-sar)
Changed in keystone:
assignee: Yuriy Taraday (yorik-sar) → Morgan Fainberg (mdrnstm)
Changed in keystonemiddleware:
milestone: none → 1.2.0
importance: High → Medium
Changed in keystonemiddleware:
assignee: Yuriy Taraday (yorik-sar) → Morgan Fainberg (mdrnstm)
Changed in keystone:
assignee: Morgan Fainberg (mdrnstm) → Yuriy Taraday (yorik-sar)
OpenStack Infra (hudson-openstack) wrote : Fix merged to keystone (master)

Reviewed: https://review.openstack.org/119452
Committed: https://git.openstack.org/cgit/openstack/keystone/commit/?id=0010803288748fcd3ce7dba212a54bffe7a61a0c
Submitter: Jenkins
Branch: master

commit 0010803288748fcd3ce7dba212a54bffe7a61a0c
Author: Yuriy Taraday <email address hidden>
Date: Thu Aug 28 14:27:58 2014 +0400

    Add a pool of memcached clients

    This patchset adds a pool of memcache clients. This pool allows for reuse of
    a client object, prevents too many client objects from being instantiated, and
    maintains proper tracking of dead servers so as to limit delays
    when a server (or all servers) become unavailable.

    The new memcache pool backend is available either by being set as the memcache
    backend or by using keystone.token.persistence.backends.memcache_pool.Token for
    the Token memcache persistence driver.

    [memcache]
    servers = 127.0.0.1:11211
    dead_retry = 300
    socket_timeout = 3
    pool_maxsize = 10
    pool_unused_timeout = 60

    Where:
    - servers - comma-separated list of host:port pairs (was already there);
    - dead_retry - number of seconds memcached server is considered dead
      before it is tried again;
    - socket_timeout - timeout in seconds for every call to a server;
    - pool_maxsize - max total number of open connections in the pool;
    - pool_unused_timeout - number of seconds a connection is held unused in
      the pool before it is closed;

    The new memcache pool backend can be used as the driver for the Keystone
    caching layer. To use it as caching driver, set
    'keystone.cache.memcache_pool' as the value of the [cache]\backend option,
    the other options are the same as above, but with 'memcache_' prefix:

    [cache]
    backend = keystone.cache.memcache_pool
    memcache_servers = 127.0.0.1:11211
    memcache_dead_retry = 300
    memcache_socket_timeout = 3
    memcache_pool_maxsize = 10
    memcache_pool_unused_timeout = 60

    Co-Authored-By: Morgan Fainberg <email address hidden>
    Closes-bug: #1332058
    Closes-bug: #1360446
    Change-Id: I3544894482b30a47fcd4fac8948d03136fd83f14
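To make the meaning of `pool_unused_timeout` more concrete, here is a rough sketch of how idle-connection expiry could work. It is not the code from the commit above: `IdleExpiringPool` and its parameters are hypothetical, the pool is not thread-safe, and the `pool_maxsize` limit is left out to keep the example short. An injectable clock is assumed so the behavior can be checked without sleeping.

```python
import time


class IdleExpiringPool:
    """Reuse returned clients, closing any that sat unused for too long."""

    def __init__(self, factory, close, unused_timeout, clock=time.monotonic):
        self._factory = factory
        self._close = close
        self._unused_timeout = unused_timeout
        self._clock = clock
        self._idle = []   # (client, time returned), oldest first

    def get(self):
        self._drop_expired()
        if self._idle:
            client, _ = self._idle.pop()   # reuse the most recently returned client
            return client
        return self._factory()             # nothing idle, open a new one

    def put(self, client):
        self._idle.append((client, self._clock()))

    def _drop_expired(self):
        # Close every client that was returned before the cutoff.
        cutoff = self._clock() - self._unused_timeout
        while self._idle and self._idle[0][1] < cutoff:
            client, _ = self._idle.pop(0)
            self._close(client)


if __name__ == "__main__":
    now = [0.0]        # fake clock, in seconds
    closed = []
    pool = IdleExpiringPool(object, closed.append, unused_timeout=60,
                            clock=lambda: now[0])
    first = pool.get()
    pool.put(first)
    now[0] = 30.0
    print(pool.get() is first)   # still within the unused timeout
```

With a 60 second timeout, a client returned at t=30 is reused at t=80 but closed and replaced if nothing asks for it until t=100.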

Changed in keystone:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix merged to keystonemiddleware (master)

Reviewed: https://review.openstack.org/119774
Committed: https://git.openstack.org/cgit/openstack/keystonemiddleware/commit/?id=045cddcea2ecefccecbb40d4249b915c3f1faae3
Submitter: Jenkins
Branch: master

commit 045cddcea2ecefccecbb40d4249b915c3f1faae3
Author: Morgan Fainberg <email address hidden>
Date: Sun Sep 21 13:20:35 2014 -0700

    Add an optional advanced pool of memcached clients

    This patchset adds an advanced eventlet safe pool of memcache clients. This
    allows the deployer to configure auth_token middleware to utilize the new
    pool by simply setting 'memcache_use_advanced_pool' to true. Optional
    tunables for the memcache pool have also been added.

    Co-Authored-By: Morgan Fainberg <email address hidden>
    Closes-bug: #1332058
    Closes-bug: #1360446
    Change-Id: I08082b46ce692cf4df449d48dac94718f1e98a6c

Changed in keystonemiddleware:
status: In Progress → Fix Committed
Dolph Mathews (dolph)
Changed in keystonemiddleware:
status: Fix Committed → Fix Released
Dolph Mathews (dolph) wrote :

Backporting this to stable/icehouse looks like it's going to take some care rebasing, if anyone wants to tackle that :)

Lance Bragstad (lbragstad) wrote :

Looks like Brant had some comments on the patch right before it merged that were never addressed. Should a separate patch be pushed to address those concerns in master before back-porting both to icehouse?

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to keystone (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/124443

Thierry Carrez (ttx)
Changed in keystone:
status: Fix Committed → Fix Released
OpenStack Infra (hudson-openstack) wrote : Related fix merged to keystone (master)

Reviewed: https://review.openstack.org/124443
Committed: https://git.openstack.org/cgit/openstack/keystone/commit/?id=bdc0f68210a29e7ad02734d11fb88e5c31930cd8
Submitter: Jenkins
Branch: master

commit bdc0f68210a29e7ad02734d11fb88e5c31930cd8
Author: Lance Bragstad <email address hidden>
Date: Fri Sep 26 15:31:24 2014 +0000

    Address some late comments for memcache clients

    This change addresses some late comments from the following review:
    https://review.openstack.org/#/c/119452/31

    Change-Id: I031620f9085ff914aa9b99c21387f953b6ada171
    Related-Bug: #1360446

Thierry Carrez (ttx)
Changed in keystone:
milestone: juno-rc1 → 2014.2
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to keystone (feature/hierarchical-multitenancy)

Related fix proposed to branch: feature/hierarchical-multitenancy
Review: https://review.openstack.org/129376

OpenStack Infra (hudson-openstack) wrote : Related fix merged to keystone (feature/hierarchical-multitenancy)

Reviewed: https://review.openstack.org/129376
Committed: https://git.openstack.org/cgit/openstack/keystone/commit/?id=6f806bdc9b58206ecccf29f79df1257e737e9f5b
Submitter: Jenkins
Branch: feature/hierarchical-multitenancy

commit fdbad9f530ea4478d96437b021c9b5cc6d338901
Author: Nathan Kinder <email address hidden>
Date: Wed Oct 15 16:21:01 2014 -0700

    Restrict certain APIs to cloud admin in domain-aware policy

    Some of the APIs in the domain-aware policy file are currently
    allowed by any "admin" user, when they should really be locked
    down to the cloud admin. Without this, users who are a project
    admin will be allowed to do things like manage regions, IdPs,
    and other objects that they should not be allowed to touch.

    Change-Id: Ifca8bc2fffd2d8c1bf02373d1fadd459a77f836c
    Closes-bug: #1381809

commit 062786bc53533edf78a24e35688d7183c0b57175
Author: Brad Topol <email address hidden>
Date: Mon Sep 8 11:28:02 2014 -0500

    Clean up federated identity audit code

    Change-Id: I110eb40c83f1de25bff9215b0490269f5941316a

commit 1056f9abfb283abb083538b7588a006c1b242d1b
Author: wanghong <email address hidden>
Date: Thu Oct 9 15:39:27 2014 +0800

    obsolete deployment docs

    Now we use 'database' section instead, but the doc does not synchronize.

    Change-Id: Ie73ec8225ce1290a4b8fdbb5b9db4c566b5ada22
    Closes-Bug: #1377101

commit 1b2fc1e10469bf5ff97b8a825ba404dd8f602320
Author: David Stanek <email address hidden>
Date: Thu Sep 4 17:59:58 2014 +0000

    Fixes a spelling error in hacking tests

    bp more-code-style-automation

    Change-Id: I9159aba128415d6e3a1f9ee9147c7cba19abeffe

commit 2520502724c549fb7ad846203ed60eb86c21aed3
Author: OpenStack Proposal Bot <email address hidden>
Date: Tue Oct 7 19:12:29 2014 +0000

    Updated from global requirements

    Change-Id: If2d591bba119998e41f109f4099ba4147821171e

commit 8af522af96c4bc0f6d0f7de48f6433fd19115d54
Author: Henry Nash <email address hidden>
Date: Tue Oct 7 10:01:47 2014 +0100

    Remove deprecated KVS trust backend.

    The trust backend is one of the KVS backends that was marked as
    deprecated, for removal in Kilo. This patch removes it.

    Partially implements: bp removed-as-of-kilo

    Change-Id: Ib67cd33419d09e219d90ab8c50d375964a12640c

commit a96b20238919037837156e238e708abff415cade
Author: Steve Martinelli <email address hidden>
Date: Fri Sep 26 14:40:22 2014 -0400

    Add v3 openstackclient CLI examples

    Add some notes about authenticating with v3 keystone and
    openstackclient. Also add some examples that don't exist in v2.0,
    like domains and groups.

    Change-Id: I92f9f9ab3ed4657f0771ad284ee6c4c613eca27c

commit 495b44ae0ed3e69e21022ccfc9e2d67ba4d0a97e
Author: Steve Martinelli <email address hidden>
Date: Thu Sep 25 12:08:15 2014 -0400

    Update the CLI examples to also use openstackclient

    In the CLI example section, use openstackclient examples and
    keystoneclient examples.

    Change-Id: Ia13730fbac5900998993c56d9a792b392a1ba3ac

commit 4f9add8029de5f9463b9bd9ca4f933f1be79c021
Author: Steve Martinelli <stevemar@c...


Piyush Srivastava (piyush0101) wrote :

Does this happen if you use unix domain sockets to connect to memcache?
