AIO-DX: mtcAgent did not recover after power cycling both controllers
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
StarlingX | Fix Released | Medium | Eric MacDonald |
Bug Description
Brief Description
-----------------
On an AIO-DX Distributed Cloud system controller, after powering both system controller nodes off and back on, the ssh connection was lost for 50 mins.
Investigation of the above issue (bug 1868604) revealed that one of the failures was that controller-1 failed to go active because the mtcAgent could not get the cluster IP.
Severity
--------
Major
Steps to Reproduce
------------------
In Distributed Cloud, power off and then power on both (AIO-DX) system controller nodes, then check the ssh connection.
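The "check ssh connection" step can be timed with a small polling script. The following is only a minimal sketch and not the tooling used in sanity: the function name `wait_for_ssh`, its parameters, and the default port 22 are assumptions made for illustration. It measures how long it takes until a TCP connection to the ssh port succeeds, which shows the port is reachable but does not prove that login works.

```python
import socket
import time


def wait_for_ssh(host, port=22, timeout_s=3600.0, interval_s=5.0,
                 connect=socket.create_connection,
                 clock=time.monotonic, sleep=time.sleep):
    """Poll host:port until a TCP connection succeeds.

    Returns the elapsed seconds until the first successful connection,
    or None if timeout_s passes without one. The connect, clock and
    sleep arguments exist only so the loop can be exercised without a
    real network (hypothetical hooks, not part of any StarlingX tool).
    """
    start = clock()
    while True:
        try:
            # A successful connect only means the port accepts
            # connections; it does not verify authentication.
            with connect((host, port), timeout=interval_s):
                return clock() - start
        except OSError:
            # Refused, unreachable, or timed out: treat as "not up yet".
            pass
        if clock() - start >= timeout_s:
            return None
        sleep(interval_s)
```

Started right after the power cycle, a call such as `wait_for_ssh("192.0.2.10")` (a placeholder documentation address) returns the recovery time in seconds, which can then be compared against the expected recovery window.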
Expected Behavior
------------------
The ssh connection should resume within 5 mins after the nodes boot up.
Actual Behavior
----------------
The ssh connection was re-established only after 50 mins.
Reproducibility
---------------
Unknown. This is the first time the issue has been seen in sanity; will monitor.
System Configuration
-------
DC system (AIO-DX system controller)
Lab-name: DC-3
Branch/Pull Time/Commit
-------
2020-03-20_00-10-00
Last Pass
---------
Last passed on same system with following load:
Load: 2020-03-14_04-10-00
Timestamp/Logs
--------------
See bug 1868604
Test Activity
-------------
Sanity
Changed in starlingx:
assignee: nobody → Eric MacDonald (rocksolidmtce)
tags: added: stx.metal
Changed in starlingx:
status: Triaged → In Progress
stx.4.0 / medium priority - issue w/ dead office recovery