[upgrade_levels]compute=auto grinds the API response times when a cell is down
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | Confirmed | Medium | Unassigned |
Bug Description
A lot of my notes are in https:/
To simulate a down cell, I changed the database_connection value for the cell1 cell to be an invalid IP (192.0.0.1) and then restarted <email address hidden>.
With the default configs in devstack, the service was hanging trying to respond to a simple GET / request to list versions. It looks like the problem is because each nova.compute.
This is a snip of the API log while waiting for the GET / response:
http://
As a result I got this unhelpful client side error:
http://
I know that's where the failure was because I was also getting this:
Feb 13 00:09:57 downcell <email address hidden>[14623]: DEBUG nova.compute.rpcapi [None req-53ebccae-
The minimum nova-compute service version isn't cached in nova-api when running under uwsgi anyway, which I reported as bug 1815692.
The way I worked around the issue was by setting [upgrade_
Also note that the database max_retries and retry_interval options both default to 10, so each API object that hits this takes 100 seconds to time out, per route handler, per API worker. I count 31 route handlers that create an API object, so by default that is 3100 seconds, or about 52 minutes, per worker on startup.
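For anyone who wants to plug in their own values, here is a rough sketch of that arithmetic. It assumes the retries run strictly one after another and that each attempt costs exactly one retry_interval, so it ignores any extra connect timeout per attempt. The function name is made up for illustration and is not part of nova.

```python
def worst_case_startup_delay(max_retries=10, retry_interval=10, route_handlers=31):
    """Estimate the worst-case delay, in seconds, when a cell database is unreachable.

    Assumption: each route handler that creates an API object waits for the
    full retry budget (max_retries * retry_interval) before giving up, and
    the handlers are initialized serially within one API worker.
    """
    per_handler = max_retries * retry_interval
    per_worker = per_handler * route_handlers
    return per_handler, per_worker


per_handler, per_worker = worst_case_startup_delay()
print(f"per handler: {per_handler} s")
print(f"per worker:  {per_worker} s (~{per_worker / 60:.1f} min)")
```

Lowering either value shrinks the total linearly. For example, `worst_case_startup_delay(max_retries=2)` gives 20 seconds per handler under the same assumptions.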
Thanks for reporting this. I change my max_retries value in devstack during testing to avoid long waits.