Focal-ussuri customer env with lots of resources.
When loading the Project > Instances page, if loading the data takes more than 26 seconds in total, the page enters a reload loop until the browser times out after 5 minutes.
The 26-second figure was obtained as follows:
1) A 5-minute browser timeout was observed when trying to load the page.
2) The logs were inspected and showed that some queries were taking very long, e.g. glance ~12 seconds and neutron ~8 seconds. Queries to nova take at most 3 seconds.
3) In a separate env with zero resources, where the page would load instantly, I added a time.sleep in the api/glance.py file when invoking glance for images (glance is invoked multiple times when loading the instances page). Sleeping 14 seconds causes the 5-minute timeout, but sleeping 13 seconds does not time out and the page loads quickly. When it timed out with 14 seconds, I tailed the logs and noticed that the same group of requests was being repeated for a while, always starting with the flavors request. With the 13-second sleep the requests did not repeat.
4) I removed the sleep from the api/glance.py file and added a sleep of 26 seconds in the get_data method of the project/instances/views.py file, right after:
    image_dict, flavor_dict, volume_dict = futurist_utils.call_functions_parallel(self._get_images, self._get_flavors, self._get_volumes)
With a 26-second sleep it neither times out nor repeats the requests, and the page loads fine. But with a 27-second sleep it times out after 5 minutes and keeps repeating the requests in the logs.
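For anyone who wants to reason about the timing without a Horizon env, here is a minimal standalone sketch of the parallel step in get_data. It assumes that call_functions_parallel runs its arguments concurrently, as its name suggests, so the step takes roughly as long as its slowest call. The helper names (make_delayed_call, run_parallel) and the delays are made up for illustration and are not Horizon code.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def make_delayed_call(label, delay_s, log):
    """Return a zero-argument stand-in for one backend query.

    delay_s plays the role of the injected time.sleep; the measured
    duration is appended to log as (label, seconds).
    """
    def call():
        start = time.monotonic()
        time.sleep(delay_s)
        log.append((label, time.monotonic() - start))
        return label
    return call


def run_parallel(calls):
    """Run the callables concurrently; return (results, total elapsed seconds)."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=len(calls)) as pool:
        futures = [pool.submit(call) for call in calls]
        results = [future.result() for future in futures]
    return results, time.monotonic() - start


# Illustrative delays only (hypothetical, not measured values).
log = []
calls = [
    make_delayed_call("images", 0.3, log),
    make_delayed_call("flavors", 0.1, log),
    make_delayed_call("volumes", 0.2, log),
]
results, elapsed = run_parallel(calls)
for label, seconds in sorted(log):
    print(f"{label}: {seconds:.2f}s")
print(f"parallel step total: {elapsed:.2f}s")
```

Under that assumption, the total is close to the slowest call rather than the sum, which is why a single slow backend is enough to push the page past the limit.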
My conclusion is that the get_data method does not tolerate taking longer than 26 seconds to finish loading the page and "reloads" itself, entering a loop that never finishes if the page cannot be loaded within 26 seconds.
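As a sanity check, the glance numbers from step 3 and the get_data numbers from step 4 can be compared with a few lines of arithmetic. This sketch assumes that the sleeping glance call is hit exactly twice in sequence per page load; the report only says "multiple times", so the factor of 2 is a guess. The function name is hypothetical.

```python
THRESHOLD_S = 26  # longest get_data delay that still loaded (step 4)


def predicted_outcome(sleep_s, sequential_calls=2, threshold_s=THRESHOLD_S):
    """Predict the page behaviour from the total injected delay.

    sequential_calls=2 is an assumption, not something the logs confirmed.
    """
    total = sleep_s * sequential_calls
    return "loads" if total <= threshold_s else "reload loop"


for sleep_s in (13, 14):
    print(sleep_s, sleep_s * 2, predicted_outcome(sleep_s))
# prints:
# 13 26 loads
# 14 28 reload loop
```

If the assumption holds, both experiments point at the same limit: 2 x 13 = 26 seconds loads, while 2 x 14 = 28 seconds and a single 27-second sleep both exceed it.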
Ideally this internal timeout that causes a reload loop should be configurable and more tolerant by default.
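To make the request concrete, here is a rough sketch of what a configurable limit could look like. Everything in it is hypothetical: the setting name INSTANCES_PAGE_LOAD_TIMEOUT, the 120-second default and the helper function do not exist in Horizon as far as I know, and I have not identified where the current 26-second limit actually comes from.

```python
from types import SimpleNamespace

# Hypothetical, more tolerant default (not an existing Horizon value).
DEFAULT_TIMEOUT_S = 120


def get_page_load_timeout(settings):
    """Read the hypothetical INSTANCES_PAGE_LOAD_TIMEOUT setting, in seconds.

    Falls back to DEFAULT_TIMEOUT_S when the setting is absent and rejects
    non-numeric or non-positive values.
    """
    value = getattr(settings, "INSTANCES_PAGE_LOAD_TIMEOUT", DEFAULT_TIMEOUT_S)
    if isinstance(value, bool) or not isinstance(value, (int, float)) or value <= 0:
        raise ValueError("INSTANCES_PAGE_LOAD_TIMEOUT must be a positive number")
    return float(value)


# SimpleNamespace stands in for a settings object such as django.conf.settings.
print(get_page_load_timeout(SimpleNamespace()))
print(get_page_load_timeout(SimpleNamespace(INSTANCES_PAGE_LOAD_TIMEOUT=300)))
```

An operator with a large env could then raise the value in their local settings instead of patching the code.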