keystonemiddleware connections to memcached from neutron-server grow beyond configured values

Bug #1883659 reported by Justinas Balciunas on 2020-06-16
This bug affects 6 people
Affects                       Importance   Assigned to
OpenStack Security Advisory   Undecided    Unassigned
keystonemiddleware            Undecided    Unassigned
oslo.cache                    Undecided    Unassigned

Bug Description

Using: keystone-17.0.0, Ussuri

I've noticed a very odd behaviour of keystone_authtoken with memcached and neutron-server. The connection count to memcached grows over time, ignoring the settings of memcache_pool_maxsize and memcache_pool_unused_timeout. The keystone_authtoken middleware configuration is as follows:

[keystone_authtoken]
www_authenticate_uri = http://keystone_vip:5000
auth_url = http://keystone_vip:35357
auth_type = password
project_domain_id = default
user_domain_id = default
project_name = service
username = neutron
password = neutron_password_here
cafile =
memcache_security_strategy = ENCRYPT
memcache_secret_key = secret_key_here
memcached_servers = memcached_server_1:11211,memcached_server_2:11211,memcached_server_3:11211
memcache_pool_maxsize = 100
memcache_pool_unused_timeout = 600
token_cache_time = 3600

Commenting out the memcached settings under [keystone_authtoken] and restarting neutron-server drops the connection count in memcached to normal levels, i.e. to hundreds rather than the thousands seen when neutron-server is using memcached. The Neutron team (slaweq) suggested this is a Keystone issue because, quote: "Neutron is just using keystonemiddleware as one of the middlewares in the pipeline".

Grafana memcached connection graphs: https://ibb.co/p3TCJqC and https://ibb.co/nmmvvH4

The drops in the graphs correspond to restarts of neutron-server. I am not sure whether this is expected behaviour, a configuration issue, or a bug.
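
For a rough sense of what "normal levels" should look like if the pool settings were honoured, here is a back-of-envelope ceiling. The model (one bounded client pool per worker process, one connection per memcached server per client) and the host/worker counts are assumptions for illustration only; three neutron-server nodes are mentioned later in the thread, and api_workers is a hypothetical value.

# Rough per-node ceiling if memcache_pool_maxsize were respected: each worker
# process keeps at most memcache_pool_maxsize pooled clients, and each client
# holds one connection to each memcached server.
neutron_server_hosts = 3      # assumption: three neutron-server nodes
api_workers_per_host = 8      # hypothetical api_workers value
memcache_pool_maxsize = 100   # from the [keystone_authtoken] config above

ceiling_per_memcached_node = (
    neutron_server_hosts * api_workers_per_host * memcache_pool_maxsize
)
print(ceiling_per_memcached_node)  # 2400; a count that keeps climbing without
                                   # ever plateauing suggests the limit is ignored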

summary:
- keystonemiddleware connections to memcached from neutron-server grows beyond configured values
+ keystonemiddleware connections to memcached from neutron-server grow beyond configured values
Gage Hugo (gagehugo) wrote :

Added keystonemiddleware

Gage Hugo (gagehugo) wrote :

Added oslo.cache, not 100% sure which is affected yet.

no longer affects: keystone

A few additions:
1) the situation is not noticeable immediately, so automated tests do not trigger it; the whole setup (three memcached nodes, three neutron-servers with keystone_authtoken configured to use memcached) needs to run for a while before the memcached connection count exceeds the defined limits;
2) it was also observed that only two of the three memcached nodes are hit by the uncontrolled growth in the number of connections, i.e. one memcached node takes most of the load, the second trails it by 30-40%, and the third serves the usual connection count;
3) the open connection count rises until the limits in the memcached configuration are reached (25k per memcached node in my case); a quick per-node check is sketched below.
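
As a quick way to check point 3 per node without Grafana, the sketch below asks each memcached server for its "stats" output over the plain text protocol and extracts curr_connections. It uses only the Python standard library; the host names are the placeholders from the configuration in the bug description.

import socket

def curr_connections(host, port=11211, timeout=5):
    """Return memcached's curr_connections stat for one node (text protocol)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"stats\r\n")
        data = b""
        while b"END\r\n" not in data:
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk
    for line in data.decode().splitlines():
        parts = line.split()  # e.g. "STAT curr_connections 24873"
        if len(parts) == 3 and parts[1] == "curr_connections":
            return int(parts[2])
    return None

# Placeholder host names from the bug description:
for host in ("memcached_server_1", "memcached_server_2", "memcached_server_3"):
    print(host, curr_connections(host))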

Pierre Riteau (priteau) wrote :

I can confirm that I am seeing this issue with neutron-server, using three memcached servers through keystonemiddleware. This is with the Train release deployed on CentOS 8 with Kolla, which uses the following RDO packages:

openstack-neutron-15.1.0-1.el8.noarch
python3-keystonemiddleware-7.0.1-2.el8.noarch
python3-oslo-cache-1.37.0-2.el8.noarch

Pierre Riteau (priteau) wrote :

I am able to make the problem go away with this extra setting in neutron.conf:

[keystone_authtoken]
memcache_use_advanced_pool = True

This is the documentation for this setting:

# (Optional) Use the advanced (eventlet safe) memcached client pool. The
# advanced pool will only work under python 2.x. (boolean value)

This description dates from 2016. So far I have not seen any issues enabling this setting under Python 3.
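
For context on why this option matters: the advanced pool keeps a small, bounded set of shared memcached clients, instead of letting each request path end up with its own client and therefore its own sockets. The snippet below only illustrates that bounding idea; it is not keystonemiddleware's or oslo.cache's actual implementation, and it assumes the python-memcached package (imported as memcache) is installed.

import queue
import threading

import memcache  # python-memcached, assumed installed


class BoundedMemcachePool:
    """Illustration only: at most `maxsize` clients (and so at most `maxsize`
    TCP connections per memcached server) ever exist; callers block and reuse
    clients instead of opening new connections per request."""

    def __init__(self, servers, maxsize=100):
        self._servers = servers
        self._maxsize = maxsize
        self._created = 0
        self._lock = threading.Lock()
        self._free = queue.Queue()

    def acquire(self):
        try:
            return self._free.get_nowait()
        except queue.Empty:
            with self._lock:
                if self._created < self._maxsize:
                    self._created += 1
                    return memcache.Client(self._servers)
        return self._free.get()  # pool exhausted: wait for a returned client

    def release(self, client):
        self._free.put(client)


# Usage sketch with a placeholder server name:
pool = BoundedMemcachePool(["memcached_server_1:11211"], maxsize=100)
client = pool.acquire()
try:
    client.get("some-token-cache-key")
finally:
    pool.release(client)

Without such bounding on the non-advanced path, the number of clients, and hence connections, can keep growing with concurrency, which would be consistent with the growth described in this bug.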

Radosław Piliszek (yoctozepto) wrote :

It affects keystonemiddleware, but I guess the fix is needed in oslo.cache.

Changed in keystonemiddleware:
status: New → Confirmed
Changed in oslo.cache:
status: New → Confirmed
Herve Beraud (herveberaud) wrote :

Hello,

If I understood this topic correctly, you are saying that the connections grow beyond what the given config allows, right?

A few weeks ago another bug was opened [1]; it was due to `flush_on_reconnect`, which can cause an exponential rise in the number of connections to memcached servers.

IIRC this option was mostly introduced for keystone's needs.

The submitted patch [1] moves flush_on_reconnect from the code into the oslo.cache config block so that it becomes configurable.

It could be worth following this track a bit: maybe try turning off flush_on_reconnect manually and then observe the behaviour in your context.

So you can either edit the code to remove this option, or try applying the patch [1] to disable it via config.

Please let me know if it helps you.

[1] https://review.opendev.org/#/c/742193/
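
To make this concrete: flush_on_reconnect is an option of the python-memcached Client itself, so disabling it (by patching the code that sets it, or via the config option added in [1]) ultimately comes down to how the underlying client is constructed. A minimal sketch, assuming python-memcached is installed; the server name is a placeholder.

import memcache  # python-memcached

servers = ["memcached_server_1:11211"]

# With flush_on_reconnect enabled, a client that marks a server dead and later
# reconnects issues flush_all against it; across many worker processes this can
# amplify load and connection churn.
noisy_client = memcache.Client(servers, flush_on_reconnect=1)

# With it disabled, a reconnect is just a reconnect and cached data is kept.
quiet_client = memcache.Client(servers, flush_on_reconnect=0)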


Radosław Piliszek (yoctozepto) wrote :

There is something linking this issue to https://bugs.launchpad.net/neutron/+bug/1864418 (neutron unable to run behind mod_wsgi). I sense a threading issue. Could be just me. :-)

Jeremy Stanley (fungi) wrote :

It looks like this may be the same as public security bug 1892852 and bug 1888394.

Changed in ossa:
status: New → Incomplete
information type: Public → Public Security