Lack of free connections to memcached causes keystone middleware to stall
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
fuel-ccp | Fix Released | High | Fuel CCP Bug Team |
Bug Description
Deployment: bare-metal, 200 nodes (http://
Software: Ubuntu 16.04, k8s by Kargo, services deployed in HA configuration (3 replicas), memcached on a separate node.
*******
It is observed that the duration of requests to API services is distributed as follows: about 2/3 take 10-20 ms, while 1/3 take slightly more than 3 seconds (see the attached chart). Even if the same request is repeated many times, it may finish fast or take long. The issue affects even requests that do not make any DB calls:
vagrant@node1:~$ time curl -g -iv -X GET http://
Note: Unnecessary use of -X or --request, GET is already inferred.
* Trying 10.224.233.196...
* Connected to neutron-server.ccp (10.224.233.196) port 9696 (#0)
> GET /v2.0/ HTTP/1.1
> Host: neutron-
> User-Agent: python-
> Accept: application/json
> X-Auth-Token: gAAAAABYYr8WzhW
>
< HTTP/1.1 200 OK
HTTP/1.1 200 OK
< Content-Type: application/json
Content-Type: application/json
< Content-Length: 536
Content-Length: 536
< X-Openstack-
X-Openstack-
< Date: Tue, 27 Dec 2016 20:10:51 GMT
Date: Tue, 27 Dec 2016 20:10:51 GMT
<
* Connection #0 to host neutron-server.ccp left intact
{"resources": [{"links": [{"href": "http://
real 0m3.069s
user 0m0.012s
sys 0m0.004s
*******
In the logs there is a 3-second gap:
2016-12-27 20:11:04.488 37 DEBUG keystonemiddlew
ddleware/
2016-12-27 20:11:07.516 37 DEBUG keystoneauth.
7b4c3e0bb65cb1f
/var/lib/
The issue affects not only Neutron but any project that uses Keystone middleware.
*******
Part of API service config:
[keystone_authtoken]
auth_uri = http://
auth_url = http://
auth_type = password
project_domain_id = default
user_domain_id = default
project_name = service
username = neutron
password = password
memcached_servers = memcached.
Changed in fuel-ccp: | |
status: | New → Triaged |
importance: | Undecided → High |
assignee: | nobody → Fuel CCP Bug Team (fuel-ccp-bugs) |
Keystone middleware uses a memcached-based cache. However, the memcached client pool is threading.local, meaning it is not shared between different greenlets. The client pool size is not limited (https://github.com/openstack/keystonemiddleware/blob/stable/newton/keystonemiddleware/auth_token/_cache.py#L71-L86) and grows without bound. Taking into account that there are many workers (defaulting to the number of cores, 48 here) and many greenlets per worker, it is very easy to go beyond memcached's default connection limit of 1024 (https://github.com/memcached/memcached/wiki/ConfiguringServer#connection-limit).
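The growth mechanism can be sketched as follows. This is an illustrative model, not the actual keystonemiddleware code: a `threading.local` pool with no size cap. Under eventlet, `threading.local` becomes greenlet-local after monkey-patching, so every greenlet that finds its own pool empty opens a brand-new connection, and nothing in the process ever bounds the total.

```python
import threading


class UnboundedMemcachePool(threading.local):
    """Hypothetical sketch of a per-thread (per-greenlet, under
    eventlet) connection pool whose size is never capped."""

    def __init__(self):
        # Re-initialized in every thread/greenlet that touches the pool,
        # so clients opened elsewhere are invisible here.
        self._clients = []

    def acquire(self, connect):
        # Reuse an idle client only if *this* thread/greenlet has one...
        if self._clients:
            return self._clients.pop()
        # ...otherwise open a new connection; nothing limits how many
        # connections the whole process accumulates across greenlets.
        return connect()

    def release(self, client):
        self._clients.append(client)
```

With 48 workers each running many greenlets, this pattern reaches memcached's 1024-connection default quickly.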
When the connection limit is reached, memcached stops accepting new connections. On the client side the connection attempt lasts 3 seconds (https://github.com/linsomniac/python-memcached/blob/master/memcache.py#L106), after which the connection fails, the token is not retrieved from the cache, and the middleware goes directly to Keystone (this is observed in the logs).
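A minimal sketch of that failure mode, assuming the 3-second value from python-memcached's socket timeout; the helper name and fallback are illustrative, not keystonemiddleware's real API. When memcached hits its connection limit it stops calling accept(), the listen backlog eventually fills, and the client's connect() stalls until the timeout fires:

```python
import socket

# 3 seconds, matching python-memcached's hard-coded socket timeout
_SOCKET_TIMEOUT = 3


def fetch_token_from_cache(host, port):
    """Hypothetical helper: try the cache, and on any connection
    failure return None so the caller falls back to validating the
    token against Keystone directly (the 3-second stall seen above)."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(_SOCKET_TIMEOUT)
    try:
        s.connect((host, port))
        # A real client would send "get <token-key>\r\n" here.
        return "cache reachable"
    except OSError:
        # Timeout, refused, or reset: treated as a cache miss.
        return None
    finally:
        s.close()
```

This explains the bimodal latency: requests that get a pooled connection finish in milliseconds, while requests that must open a new connection against a saturated memcached pay the full 3-second timeout before falling back.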
Memcached tracks the number of rejected connections (the counter is far beyond the limit):
root@memcached-288280083-gadd7:/# echo "stats" | nc localhost 11211
...
STAT listen_disabled_num 38200
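For monitoring, the same counter can be read programmatically. A small sketch using memcached's text protocol ("stats" is a standard command; the function name is made up for this example). `listen_disabled_num` counts how many times the listener was disabled because the connection limit was reached:

```python
import socket


def memcached_stat(host, port, key):
    """Query memcached's text "stats" command and return one counter
    (e.g. listen_disabled_num) as an int, or None if absent."""
    with socket.create_connection((host, port), timeout=3) as s:
        s.sendall(b"stats\r\n")
        buf = b""
        # The stats dump is terminated by an "END\r\n" line.
        while b"END\r\n" not in buf:
            chunk = s.recv(4096)
            if not chunk:
                break
            buf += chunk
    for line in buf.decode("ascii", "replace").splitlines():
        # Lines look like: "STAT listen_disabled_num 38200"
        parts = line.split()
        if len(parts) == 3 and parts[0] == "STAT" and parts[1] == key:
            return int(parts[2])
    return None
```

A steadily growing `listen_disabled_num` is the signature of this bug: clients piling up past the `-c` connection limit.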