OpenStack Horizon not available on the floating IP or the active node's IP for 4-5 mins when the standby node is rebooted or shut down during HA testing

Bug #1852395 reported by Akshay
This bug affects 1 person
Affects: StarlingX
Status: Won't Fix
Importance: Low
Assigned to: chen haochuan

Bug Description

Brief Description
-----------------
Setup: I have deployed StarlingX R2 on bare metal in duplex mode.

Test Case: While testing HA, I rebooted or switched off the standby node.

Issue: When I tried to access OpenStack Horizon on the floating IP or the active node's IP, Horizon was unavailable for 4-5 mins, starting as soon as I rebooted or switched off the standby node.

I repeated this test many times with the same result.
Is this the expected behavior? If not, please guide me to find the real issue.

Severity
--------

Critical

Steps to Reproduce
------------------
1. Deploy Bare Metal StarlingX R2 duplex mode.
2. Reboot/Switch off standby node.
3. Access OpenStack horizon.

Expected Behavior
------------------
Horizon should always remain available, at least on the floating IP.

Actual Behavior
----------------
Horizon becomes unavailable for 4-5 mins.
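
To put a number on the outage rather than estimating it by hand, a small probe like the following could be used. This is only a sketch: the function names `probe_once` and `longest_outage` and the variable `HORIZON_URL` are hypothetical and not part of StarlingX, and the probe treats any curl failure (connection error, timeout, or HTTP status of 400 and above) as "DOWN".

```shell
# Probe a URL once and print "<epoch> UP" or "<epoch> DOWN".
# Assumption: any curl error or HTTP status >= 400 counts as DOWN.
probe_once() {
    if curl -k -s -f -o /dev/null --max-time 5 "$1"; then
        echo "$(date +%s) UP"
    else
        echo "$(date +%s) DOWN"
    fi
}

# Read "<epoch> UP|DOWN" lines on stdin and print the longest outage in
# seconds, measured from the first DOWN sample to the next UP sample.
# An outage that is still ongoing at the end of the input is not counted.
longest_outage() {
    awk '
        $2 == "DOWN" && !down { down = 1; start = $1 }
        $2 == "UP" && down    { down = 0; d = $1 - start; if (d > max) max = d }
        END { print max + 0 }
    '
}

# Example usage (HORIZON_URL is a placeholder for the floating IP URL):
#   while true; do probe_once "$HORIZON_URL"; sleep 5; done | tee horizon.log
#   longest_outage < horizon.log
```

Starting the loop before rebooting the standby node and stopping it after Horizon is reachable again gives a log from which the outage length can be read, with a resolution limited by the 5 second polling interval.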

Reproducibility
---------------
Reproducible

System Configuration
--------------------
Two node system

Last Pass
---------
NO

Akshay (yadavakshay58) wrote :

Also, if this is the expected behavior, why does it take this long? Do all the pods/containers of the OpenStack services get re-initialized, or is something else happening?

ANIRUDH GUPTA (anyrude10) wrote :

Below is the behavior I observed for this issue.

When both controller nodes are up and running, I can see 2 pods of each service in my system:

controller-0:~$ kubectl get po -n openstack | grep glance
glance-api-55fd4664c5-8jgpn 1/1 Running 0 23h
glance-api-55fd4664c5-rklv6 1/1 Running 0 89m

Case 1: When one controller is rebooted

One of the pods remains in the Running state.
The pod on the node that goes down enters the Terminating state, and a replacement pod is created, which stays in the Pending state:

controller-0:~$ kubectl get po -n openstack | grep glance
glance-api-55fd4664c5-8jgpn 1/1 Running 0 23h
glance-api-55fd4664c5-d42kg 0/1 Pending 0 49s
glance-api-55fd4664c5-rklv6 1/1 Terminating 0 92m

Once the node has rebooted successfully, there are again only 2 Running pods.

Case 2: When one controller is powered off

One of the pods remains in the Running state.
The pod on the node that goes down enters the Terminating state, and a replacement pod is created, which stays in the Pending state:

controller-0:~$ kubectl get po -n openstack | grep glance
glance-api-55fd4664c5-8jgpn 1/1 Running 0 23h
glance-api-55fd4664c5-d42kg 0/1 Pending 0 49s
glance-api-55fd4664c5-rklv6 1/1 Terminating 0 92m

The pods stay in this state.

In both cases, all the services on the running node are also inaccessible for around 4-5 mins.
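
The pod states listed in the two cases above can be tallied per status with a small filter over the `kubectl get po` output. This is a minimal sketch: `count_pod_states` is a hypothetical helper name, and it assumes the default column layout (NAME READY STATUS RESTARTS AGE) shown in the listings.

```shell
# Tally pods by the STATUS column of `kubectl get po` output read on stdin.
# Assumes the default columns: NAME READY STATUS RESTARTS AGE.
# The header line, if present, is skipped.
count_pod_states() {
    awk 'NF >= 3 && $1 != "NAME" { n[$3]++ } END { for (s in n) print s, n[s] }' | sort
}

# Example usage on the active controller:
#   kubectl get po -n openstack | grep glance | count_pod_states
```

Running it without the `grep` gives a quick view of how many pods in the whole namespace are Pending or Terminating while the standby node is down.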

yong hu (yhu6)
tags: added: stx.2.0
tags: added: stx.distro.openstack
ANIRUDH GUPTA (anyrude10) wrote :

Hi Yong,

As discussed in yesterday's distro OpenStack call, I am sharing the "collect" logs from Controller-0, which was the active controller when the standby controller was rebooted.

Test Scenario:

Controller-0 is active and Controller-1 is standby.

When the standby node Controller-1 is rebooted, the services on the active controller still stop for around 4-5 mins.
I cannot access any OpenStack service, and Horizon is not accessible from either the Controller-0 OAM IP or the floating IP.

Changed in starlingx:
assignee: nobody → chen haochuan (martin1982)
chen haochuan (martin1982) wrote :

Not reproduced on the latest image. Will check the R2 release.

yong hu (yhu6)
Changed in starlingx:
importance: Undecided → Low
yong hu (yhu6) wrote :

This LP is similar to https://bugs.launchpad.net/starlingx/+bug/1855474
The root cause there was that the "mariadb-server" pods were not recovered in a timely manner.
Until the solution is worked out for the 3.x maintenance release, please do NOT run these "rebooting controller" tests.
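
To check whether the "mariadb-server" pods have recovered after a controller reboot, a filter such as the following could list the pods that are not yet fully ready. This is only a sketch: `not_ready_pods` is a hypothetical helper, and the choice of the `openstack` namespace in the usage line is an assumption carried over from the `kubectl` commands earlier in this report.

```shell
# List pods from `kubectl get po` output (stdin) that are not fully ready:
# either the READY column "x/y" has x != y, or STATUS is not "Running".
# Prints NAME, READY and STATUS for each such pod; prints nothing if all
# pods are ready.
not_ready_pods() {
    awk 'NF >= 3 && $1 != "NAME" {
        split($2, r, "/")
        if (r[1] != r[2] || $3 != "Running") print $1, $2, $3
    }'
}

# Example usage (namespace assumed to be "openstack"):
#   kubectl get po -n openstack | grep mariadb-server | not_ready_pods
```

An empty result means every matching pod reports Running with all of its containers ready.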

Ghada Khalil (gkhalil)
Changed in starlingx:
status: New → Triaged
Ghada Khalil (gkhalil) wrote :

Closing as stx.2.0 is EOL as of Aug 2020

Changed in starlingx:
status: Triaged → Won't Fix