Comment 0 for bug 2045168

Rodrigo Barbieri (rodrigo-barbieri2010) wrote :

Focal-Ussuri customer environment with a large number of resources.

When trying to load the Project > Instances page, if loading the data takes more than 26 seconds in total, the page enters a reload loop that continues until the browser times out after 5 minutes.

The 26-second figure was obtained as follows:

1) A 5-minute browser timeout was observed when trying to load the page.
2) I inspected the logs and noticed that some queries were taking very long, for example Glance ~12 seconds and Neutron ~8 seconds. Queries to Nova took at most 3 seconds.
3) In a separate environment with zero resources, where the page loads instantly, I added a time.sleep call to the api/glance.py file where Glance is invoked for images (Glance is invoked multiple times when loading the Instances page). With a 14-second sleep the page times out after 5 minutes, but with a 13-second sleep it does not time out and loads quickly. When it timed out with the 14-second sleep, I tailed the logs and noticed the same group of requests being repeated for a while, always starting with the flavors request. With the 13-second sleep, the requests did not repeat.
4) I removed the sleep from the api/glance.py file and added a 26-second sleep in the get_data method of project/instances/views.py, right after:

image_dict, flavor_dict, volume_dict = futurist_utils.call_functions_parallel(self._get_images, self._get_flavors, self._get_volumes)

With a 26-second sleep, the page neither times out nor repeats the requests; it loads fine. With a 27-second sleep, it times out after 5 minutes and the logs show the requests repeating.
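The timing experiment in steps 3 and 4 can be reproduced in miniature with plain Python. This is a minimal sketch, not Horizon code: with_delay is a hypothetical helper standing in for the time.sleep added by hand, call_parallel is a stdlib (concurrent.futures) stand-in assumed to behave like futurist_utils.call_functions_parallel (run each callable concurrently and return the results as a tuple), and timed_get_data is a hypothetical, heavily simplified model of the start of get_data. It illustrates that the parallel phase costs roughly as long as its slowest call, and that a sleep placed after it adds directly to the total.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def with_delay(seconds):
    """Hypothetical helper: sleep `seconds` before running the wrapped call,
    mimicking a time.sleep() injected into an API wrapper such as api/glance.py."""
    def decorator(func):
        def wrapper(*args, **kwargs):
            time.sleep(seconds)
            return func(*args, **kwargs)
        return wrapper
    return decorator


def call_parallel(*funcs):
    """Stand-in assumed to behave like futurist_utils.call_functions_parallel:
    run each zero-argument callable in its own thread and return the results
    as a tuple in argument order."""
    with ThreadPoolExecutor(max_workers=len(funcs)) as pool:
        futures = [pool.submit(func) for func in funcs]
        return tuple(future.result() for future in futures)


def timed_get_data(get_images, get_flavors, get_volumes, extra_sleep=0.0):
    """Simplified model of the start of get_data(): fetch the three dicts in
    parallel, then apply the artificial sleep used in step 4.
    Returns (results, elapsed_seconds)."""
    start = time.monotonic()
    results = call_parallel(get_images, get_flavors, get_volumes)
    time.sleep(extra_sleep)  # the injected delay from step 4
    return results, time.monotonic() - start


if __name__ == "__main__":
    # Scaled-down timings (tenths of a second instead of seconds).
    images = with_delay(0.3)(lambda: {"img-1": "cirros"})
    flavors = with_delay(0.1)(lambda: {"m1.small": 1})
    (image_dict, flavor_dict, volume_dict), elapsed = timed_get_data(
        images, flavors, lambda: {}, extra_sleep=0.2)
    # Roughly 0.3 s (slowest fetch) + 0.2 s (extra sleep), not the 0.6 s sum.
    print(f"get_data took {elapsed:.2f}s")
```

Scaling the delays down keeps the experiment fast while preserving the relationship between the slowest backend call, the injected sleep, and the total time the page has to wait.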

My conclusion is that the get_data method does not tolerate taking longer than 26 seconds to finish loading the page. Beyond that limit, the page "reloads" itself, entering a loop that never finishes if the page cannot be loaded in under 26 seconds.
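The conclusion can be expressed as a toy model. This is an illustrative sketch under stated assumptions, not a description of the actual mechanism, which has not been identified here: reload_after stands for the observed ~26-second internal limit, browser_timeout for the observed 5-minute browser timeout, and each attempt that exceeds the limit is assumed to be abandoned and restarted after reload_after seconds. Because every restarted attempt needs the same time as the first, none can succeed, so the loop only ends when the browser gives up.

```python
def simulate_page_load(load_seconds, reload_after=26.0, browser_timeout=300.0):
    """Toy model of the reload loop described in this report.

    Returns (loaded, elapsed_seconds, attempts). All parameters are model
    assumptions: reload_after is the hypothetical internal limit, and each
    attempt that would exceed it is assumed to restart after reload_after s.
    """
    elapsed = 0.0
    attempts = 0
    while elapsed < browser_timeout:
        attempts += 1
        if load_seconds <= reload_after:
            # The data arrives before the internal limit: the page renders.
            return True, elapsed + load_seconds, attempts
        # Too slow: the attempt is abandoned and the page reloads, repeating
        # the same group of requests (flavors first, as seen in the logs).
        elapsed += reload_after
    # Every attempt takes equally long, so the loop only ends at the
    # browser's own timeout.
    return False, browser_timeout, attempts


if __name__ == "__main__":
    print(simulate_page_load(26))  # (True, 26.0, 1)
    print(simulate_page_load(27))  # (False, 300.0, 12)
```

The key property of the model is that a deterministic load time above the limit can never converge: retrying does not make the backend queries any faster.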

Ideally, the internal timeout that triggers this reload loop should be configurable, with a more tolerant default.
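A minimal sketch of what "configurable, with a more tolerant default" could look like, assuming Horizon-style Python settings such as local_settings.py. The option name INSTANCES_PAGE_LOAD_TIMEOUT, the helper instances_page_load_timeout, and the 120-second default are all hypothetical, chosen only for illustration; nothing in this report shows that such an option exists today.

```python
from types import SimpleNamespace

# Hypothetical default, chosen only to sit well above the ~26 s observed limit.
DEFAULT_INSTANCES_PAGE_LOAD_TIMEOUT = 120.0


def instances_page_load_timeout(settings):
    """Return the page-load limit in seconds from a settings object.

    INSTANCES_PAGE_LOAD_TIMEOUT is a hypothetical option name; when it is
    absent, the more tolerant default above is used.
    """
    value = float(getattr(settings, "INSTANCES_PAGE_LOAD_TIMEOUT",
                          DEFAULT_INSTANCES_PAGE_LOAD_TIMEOUT))
    if value <= 0:
        raise ValueError("INSTANCES_PAGE_LOAD_TIMEOUT must be positive")
    return value


if __name__ == "__main__":
    # No override: fall back to the default.
    print(instances_page_load_timeout(SimpleNamespace()))  # 120.0
    # An operator with slow Glance/Neutron responses raises the limit.
    print(instances_page_load_timeout(
        SimpleNamespace(INSTANCES_PAGE_LOAD_TIMEOUT=300)))  # 300.0
```

Validating the value up front means a misconfigured limit fails loudly at startup instead of silently reintroducing a reload loop.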