Comment 12 for bug 1621541

Roman Podoliaka (rpodolyaka) wrote :

Eugene is absolutely right: the problem is that two of the three memcached servers are down, and we have to wait $socket_timeout to find that out, because the nodes themselves are down and can't reply with 'connection refused'.

Please note the timings:

root@node-2:~# time nova list
+----+------+--------+------------+-------------+----------+
| ID | Name | Status | Task State | Power State | Networks |
+----+------+--------+------------+-------------+----------+
+----+------+--------+------------+-------------+----------+

real 0m8.651s
user 0m0.590s
sys 0m0.109s

http://paste.openstack.org/show/569331/

^ As you can see in the haproxy logs, both Keystone and Nova slowed down, as they both use memcached: Keystone uses it directly, and in nova-api it's keystone_authmiddleware that stores validated tokens in memcached.

strace'ing nova-api shows that we try to connect to a memcached server and wait $socket_timeout (3s by default):

http://paste.openstack.org/show/569327/

until we finally find a server that works.

The memcache client we use allows setting $socket_timeout and $dead_retry (how long to ignore a particular memcache server after we fail to connect to it). However, this dead-server bookkeeping is done per process, and we run multiple forks of nova-api and keystone to handle concurrent requests, so you'll need to "warm up" each of those first. Once you do, you'll see that nova-api responds quickly:

http://paste.openstack.org/show/570138/
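
To illustrate the "warm up" effect, here is a rough toy model in Python. It is not the real client code: the FakeWorker class, the server names and the simulated clock are all made up for this sketch, and it simply tries servers in order, whereas the real client picks a server by hashing the key. It only assumes what is described above: a down host costs one $socket_timeout, after which the same process skips it for $dead_retry seconds (the 30s value below is an assumption for the example).

```python
# Toy model of per-process dead-server tracking; NOT the real client code.
SOCKET_TIMEOUT = 3.0   # seconds spent waiting on a silent (down) host
DEAD_RETRY = 30.0      # assumed: seconds a failed server is skipped afterwards


class FakeWorker:
    """One forked API worker with its own private view of dead servers."""

    def __init__(self, servers, down):
        self.servers = list(servers)
        self.down = set(down)      # hosts that never answer
        self.dead_until = {}       # server -> simulated time when we retry it
        self.now = 0.0             # simulated clock, in seconds

    def request(self):
        """Return the simulated seconds spent finding a live server."""
        start = self.now
        for server in self.servers:
            if self.dead_until.get(server, 0.0) > self.now:
                continue                      # still marked dead: skip, no wait
            if server in self.down:
                self.now += SOCKET_TIMEOUT    # wait for the connect to time out
                self.dead_until[server] = self.now + DEAD_RETRY
                continue
            break                             # found a server that works
        return self.now - start


servers = ["mc-a", "mc-b", "mc-c"]            # hypothetical server names
workers = [FakeWorker(servers, down={"mc-a", "mc-b"}) for _ in range(2)]

first = [w.request() for w in workers]    # every worker pays the timeouts once
second = [w.request() for w in workers]   # warmed up: dead servers are skipped
print(first, second)
```

In this model each worker pays for both dead servers on its first request and nothing on the second, which is why every fork has to be hit at least once before responses get fast. Once $dead_retry expires, a worker will try the dead servers again and pay the timeout again.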