OAM Floating IP wasn't accessible after the initial controller swact
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
StarlingX | Won't Fix | Low | John Kung |
Bug Description
Brief Description
-----------------
Initially controller-0 was ACTIVE
After the OAM floating IP was changed, the new floating IP was not accessible until controller-0 was made ACTIVE again. The usual procedure after an OAM floating IP change is to lock/unlock the standby (controller-1), swact controller-0 so that controller-1 becomes ACTIVE, and then lock/unlock the new standby (controller-0). At this stage I would expect the new floating IP to be accessible even with controller-1 as the ACTIVE controller, but it was not. The new floating IP became accessible only after controller-0 was made ACTIVE again.
The behavior is the same whether the change is made through the GUI or the CLI.
Severity
--------
Major - can't access the floating IP
Steps to Reproduce
-------------------
1) Controller-0 ACTIVE and Controller-1 STANDBY
2) Change the OAM Floating IP through GUI or CLI
[NOTE] On my standard lab I swapped the two addresses: the old controller-0 fixed IP became the new floating IP, and the old floating IP became the controller-0 fixed IP
2) Lock & Unlock STANDBY controller-1
3) Swact controller-0 to make controller-1 as ACTIVE
4) Lock & Unlock STANDBY controller-0
5) Expect new floating IP to be accessible via GUI or CLI
[NOTE] The ping to the new floating IP works but neither SSH nor http works.
6) Performed an additional lock/unlock of controller-0 (the new floating IP was still not accessible)
7) Swact controller-1 so that controller-0 becomes ACTIVE again
8) The new floating IP is now accessible
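For anyone repeating the CLI variant, the change, lock/unlock, and swact steps above can be sketched as a small dry-run script. This is only a sketch under assumptions: it assumes the StarlingX `system` CLI is on the path of an authenticated session, that `oam-modify` accepts `key=value` pairs named after the properties shown in the OAM table later in this report, and the function names are made up for illustration. It does not wait for hosts to return to service between steps, which a real run would need to do.

```python
import shlex
import subprocess


def swact_repro_commands(new_floating_ip, new_c0_ip=None,
                         active="controller-0", standby="controller-1"):
    """Return, in order, the CLI commands for the change/lock/unlock/swact steps.

    The oam-modify keys are assumed to match the property names in the
    report's OAM table (oam_floating_ip, oam_c0_ip).
    """
    modify = ["system", "oam-modify", f"oam_floating_ip={new_floating_ip}"]
    if new_c0_ip is not None:
        # The reporter swapped the floating IP with the controller-0 fixed IP.
        modify.append(f"oam_c0_ip={new_c0_ip}")
    return [
        modify,
        ["system", "host-lock", standby],
        ["system", "host-unlock", standby],
        ["system", "host-swact", active],    # standby becomes ACTIVE
        ["system", "host-lock", active],     # old active is the new standby
        ["system", "host-unlock", active],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # NOTE: no waiting for host availability between steps.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(swact_repro_commands("128.224.151.244", new_c0_ip="128.224.151.243"))
```

By default the script only prints the commands, so the sequence can be reviewed before anything is run against a lab.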
Expected Behavior
------------------
Expect the new floating IP to be accessible even after step [5] per above
Actual Behavior
------------------
The new floating IP becomes accessible only after step [7] above.
Reproducibility
------------------
Tested once so far
System Configuration
------------------
standard with dedicated storage
Branch/Pull Time/Commit
-------
STX-BUILD-ID = 2019-10-17_20-00-00
Last Pass
---------
Didn't verify on the previous builds
Timestamp/Logs
--------------
1. Logs attached
2. Brief timeline
# The IPs were assigned as below *before* starting this test
128.224.151.243 (floating)
128.224.151.244 (c0) - Active
128.224.150.205 (c1) - Standby
# Changed to (from GUI)
128.224.151.243 (c0)
128.224.151.244 (floating)
128.224.150.205 (c1) - Standby
# From horizon.log
2019-10-25 05:38:22,572 [INFO] horizon.
Locked & Unlocked C1
2019-10-25 05:40:11,788 [INFO] starlingx_
Swact to make C1 active
2019-10-25 05:49:17,415 [INFO] starlingx_
Locked C0 - [NOTE] At this time C1 went to config_out_of_date again and the new floating IP was NOT accessible
Then logged in to C1 (the new Active) via its original IP (128.224.150.205) and issued a system host-unlock on controller-0. controller-0 unlocked successfully, but the new floating IP 128.224.151.244 still wasn't accessible. However, 128.224.151.244 is reachable (a ping works, but HTTPS and SSH do not)
[2019-10-25 11:56.51] ~
[VVeldand.
ssh_exchange_
─
[2019-10-25 11:58.21] ~
[VVeldand.
Pinging 128.224.151.244 with 32 bytes of data:
Reply from 128.224.151.244: bytes=32 time=543ms TTL=58
Reply from 128.224.151.244: bytes=32 time=526ms TTL=58
Reply from 128.224.151.244: bytes=32 time=462ms TTL=58
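Since ping succeeds while SSH fails, it may help to check whether the address answers at the service level at all. The sketch below is a hypothetical helper (not part of StarlingX) that opens a TCP connection to port 22 and reads the SSH identification line. A return value of None means either the connection failed or the peer accepted it without sending an SSH banner. Port and timeout are assumptions.

```python
import socket


def ssh_banner(host, port=22, timeout=3.0):
    """Return the server's SSH identification line, or None if none arrives.

    None covers both cases: the TCP connection could not be made, or it was
    accepted but closed (or left silent) before an "SSH-" banner was sent.
    """
    try:
        # create_connection applies the timeout to the connect and the recv.
        with socket.create_connection((host, port), timeout=timeout) as sock:
            data = sock.recv(256)
    except OSError:
        return None
    line = data.split(b"\n", 1)[0].strip()
    if not line.startswith(b"SSH-"):
        return None
    return line.decode("ascii", "replace")


if __name__ == "__main__":
    # Compare the floating IP against the active controller's own IP.
    for address in ("128.224.151.244", "128.224.150.205"):
        print(address, ssh_banner(address))
```

Running it against both the floating IP and the active controller's fixed IP shows whether only the floating address is affected.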
However, the GUI does show the new floating IP.
No specific alarms related to config_out_of_date were noted at this time
However there was an ALARM related to "openstack" application failure that has been "set"
| 750.002 | Application Apply Failure | k8s_application
| | | openstack | | 50:12.486414 |
[sysadmin@
+------
| Alarm ID | Reason Text | Entity ID | Severity | Time Stamp |
+------
| 750.002 | Application Apply Failure | k8s_application
| | | openstack | | 50:12.486414 |
| | | | | |
| 200.001 | compute-0 was administratively locked to take it out-of-service. | host=compute-0 | warning | 2019-10-23T19: |
| | | | | 31:12.233918 |
GUI Shows the new IP
<attached the image>
# Additional Lock/Unlock performed on controller-0
# Then, I swacted back controller-1 to controller-0 (c0 will be the new Active Controller)
2019-10-
Success: Then I was able to connect to both SSH & Horizon using the new Floating IP 128.224.151.244
To start the CLI test, I restored the original floating IP as shown below
[sysadmin@
+------
| Property | Value |
+------
| created_at | 2019-10-
| isystem_uuid | 36ee8ab9-
| oam_c0_ip | 128.224.151.244 |
| oam_c1_ip | 128.224.150.205 |
| oam_floating_ip | 128.224.151.243 |
| oam_gateway_ip | 128.224.150.1 |
| oam_subnet | 128.224.150.0/23 |
| updated_at | None |
| uuid | 8104824e-
+------
[2019-10-25 13:15.16] ~
[VVeldand.
ssh_exchange_
## The new floating IP became accessible only after controller-0 was made ACTIVE again.
Logs attached