StarlingX

Bug #1833730
Comment #13

Bart Wensley (bartwensley) wrote on 2019-07-19:

I have done some testing in the WP_1-2 lab, where the issue was originally raised, with a load from July 16th. I am not able to reproduce the 12-minute "openstack server list" outage. This is likely due to several fixes that have gone in since this LP was raised, including one that reduces the time before pods are evicted when a node becomes unavailable.

However, the "openstack server list" command still fails for over a minute when the standby controller is rebooted while the nova-api-proxy pod is running on that controller. I tested with 2 replicas of the nova-api-proxy running (as per Tao's suggestion above), and this improves the time significantly: the "openstack server list" command now fails for about 30 seconds when the standby controller is rebooted. Note that there will always be some failure time when a standby AIO-DX controller is rebooted, because MariaDB goes down on any AIO-DX controller reboot, due to the way we handle redundancy for MariaDB.
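
For anyone who wants to repeat this kind of measurement, here is a rough sketch of how the failure window could be timed. It is not the tooling used for the numbers above. The helper names (probe, longest_outage, monitor), the one-second polling interval and the ten-second command timeout are all assumptions chosen for illustration. It assumes the openstack client is on the PATH with credentials already sourced.

```python
import subprocess
import time


def probe(cmd=("openstack", "server", "list"), timeout=10.0):
    """Return True if the command exits 0 within the timeout (assumed 10 s)."""
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (subprocess.TimeoutExpired, OSError):
        # A hang or a missing client both count as a failed probe.
        return False
    return result.returncode == 0


def longest_outage(samples):
    """Longest failure window in seconds.

    samples is a time-ordered iterable of (timestamp_seconds, ok) pairs.
    A window runs from the first failed probe to the next successful one;
    a window still open at the end is measured up to the last sample.
    """
    longest = 0.0
    start = None
    last_ts = None
    for ts, ok in samples:
        last_ts = ts
        if not ok and start is None:
            start = ts
        elif ok and start is not None:
            longest = max(longest, ts - start)
            start = None
    if start is not None:
        longest = max(longest, last_ts - start)
    return longest


def monitor(duration=300.0, interval=1.0):
    """Poll for `duration` seconds and return the longest failure window."""
    samples = []
    end = time.monotonic() + duration
    while time.monotonic() < end:
        samples.append((time.monotonic(), probe()))
        time.sleep(interval)
    return longest_outage(samples)
```

Starting monitor() on a host that stays up (not the controller being rebooted) just before the reboot gives a number comparable to the ones quoted above. The resolution is limited by the polling interval plus the command's own run time.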

I am going to use this LP to make the change to replicate the nova-api-proxy.