Comment 15 for bug 1405549

Boris Bobrov (bbobrov) wrote :

One of our suspects is the large number of keystone processes.

Information about dead memcache hosts is not shared between processes. This results in the following workflow:

We have 3 controllers, and each controller runs, say, 6 keystone processes, so 18 keystone processes overall.

1. A user goes to Horizon, enters their credentials and presses "sign-in".
2. Horizon sends ~10 queries to keystone (this can be seen in the Keystone logs); each query depends on the previous one.
3. Query 1 goes to keystone-1.
4. keystone-1 doesn't know yet about the dead memcache host. It probes that host, times out after 3 seconds and marks the host as dead. The time spent signing in is 3 seconds so far.
5. Horizon receives the reply from keystone-1 and makes another query.
6. Most likely, this request goes to keystone-2. keystone-2, being a separate process, doesn't know yet about the dead memcache host. It probes that host, times out after 3 seconds and marks the host as dead. This adds 3 more seconds to the sign-in, 6 seconds overall.
7. Horizon receives the reply from keystone-2 and makes another query.
8. Most likely, this request goes to keystone-3. keystone-3, being a separate process, doesn't know yet about the dead memcache host. [...] 9 seconds overall.

So signing in takes 10 queries to keystone. If each query lands on a process that has not seen the dead host yet, each one takes 3 seconds, 30 seconds overall.
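
To make the arithmetic above easier to check, here is a rough Python sketch of the workflow. It is only a model, not keystone or memcache client code: KeystoneProcess, sign_in_delay and worst_case_delay are hypothetical names, the random routing of queries to processes is an assumption, and the 3-second timeout and the 18 processes / 10 queries are the figures from this comment.

```python
import random

PROBE_TIMEOUT = 3.0  # seconds a probe of the dead memcache host takes (figure from this comment)


class KeystoneProcess:
    """Hypothetical stand-in for one keystone process with its own memcache client."""

    def __init__(self):
        # Per-process knowledge; nothing here is shared with other processes.
        self.knows_host_is_dead = False

    def handle_query(self):
        """Return the extra delay (seconds) the dead host adds to this query."""
        if self.knows_host_is_dead:
            return 0.0
        # First contact: probe the host, time out, then mark it as dead.
        self.knows_host_is_dead = True
        return PROBE_TIMEOUT


def sign_in_delay(processes, queries, rng):
    """Total delay of one sign-in; queries are sequential, so delays add up.

    Assumption: each query is routed to a uniformly random process.
    """
    return sum(rng.choice(processes).handle_query() for _ in range(queries))


def worst_case_delay(n_processes, queries):
    """Delay when every query hits a process that has not seen the dead host."""
    return min(n_processes, queries) * PROBE_TIMEOUT


# 3 controllers x 6 processes, ~10 sequential queries per sign-in.
processes = [KeystoneProcess() for _ in range(18)]
delay = sign_in_delay(processes, 10, random.Random(0))
```

In this model worst_case_delay(18, 10) gives the 30 seconds from the list above, while a single process would pay the 3 seconds only once. It also suggests that later sign-ins keep paying until every one of the 18 processes has marked the host as dead on its own.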

What do you folks think?