AIO Plus computes don't get heartbeat enabled over a DOR
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
StarlingX | Triaged | Low | Eric MacDonald | |
Bug Description
Brief Description:
------------------
AIO Plus computes don't get heartbeat enabled over a Dead Office Recovery (DOR). When the Plus feature was added to the AIO System, maintenance was never retrofitted to handle the DOR case for the 'Plus' (compute) nodes.
Severity:
---------
Major: No heartbeat fault detection of AIO Plus compute nodes following a Dead Office Recovery (DOR) until the node is locked and unlocked.
Steps to Reproduce:
-------------------
Power off and then back on a fully unlocked-enabled AIO Plus system, including all controllers and Plus nodes.
Expected Behavior:
------------------
All nodes recover unlocked-enabled with maintenance heartbeat monitoring.
Actual Behavior:
----------------
All nodes recover but maintenance heartbeat is not enabled for plus (compute) nodes.
Reproducibility:
----------------
Reproducible 100% of the time.
System Configuration:
-------
AIO Plus system
Branch/Pull Time/Commit:
-------
BUILD_DATE=
Last Pass:
----------
Test escape. There is no CLI command to display the nodes that are being heartbeated; the only way to check is to look at the hbsAgent logs, which is not convenient. A command for this should be considered.
Timestamp/Logs:
---------------
Run 'sudo pkill -usr2 hbsAgent' and then look in hbsAgent.log for the following:
2021-12-
2021-12-
2021-12-
2021-12-
2021-12-
2021-12-
2021-12-
2021-12-
2021-12-
2021-12-
2021-12-
2021-12-
2021-12-
2021-12-
2021-12-
2021-12-
2021-12-
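The manual check above can be wrapped in a small helper so that only the lines written after the signal are shown. This is a sketch, not part of the StarlingX tooling: the function names `new_lines_since` and `dump_hbs_state` are hypothetical, the default log path `/var/log/hbsAgent.log` is an assumption (override it with `HBS_LOG`), and the two-second pause is a guess at how long the dump takes to be written.

```shell
# Hypothetical helper: print only the lines appended to a log file
# after a previously recorded line count.
#   $1 = log file
#   $2 = number of lines the file had before the dump was requested
new_lines_since() {
    tail -n "+$(( $2 + 1 ))" "$1"
}

# Hypothetical wrapper around the manual steps from this report.
# HBS_LOG is an assumed location; set it if your log lives elsewhere.
dump_hbs_state() {
    log="${HBS_LOG:-/var/log/hbsAgent.log}"
    before=$(wc -l < "$log")      # remember where the log ended
    sudo pkill -usr2 hbsAgent     # same command as quoted in the report
    sleep 2                       # assumed delay for the dump to land
    new_lines_since "$log" "$before"
}
```

Calling `dump_hbs_state` on the active controller would then print just the freshly dumped lines, which can be inspected for the compute hosts.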
Alarms:
-------
None, and that's part of the issue.
Test Activity:
--------------
Debug of other issue.
Workaround:
-----------
Lock and unlock compute hosts
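As a rough sketch, the workaround could be scripted with the `system host-lock` and `system host-unlock` commands. The function name `relock_hosts`, the `SYSTEM_CMD` and `LOCK_WAIT` variables, and the fixed pause between lock and unlock are all assumptions made for illustration; in practice you would likely confirm that the host reports as locked (for example with `system host-show`) before unlocking it.

```shell
# Hypothetical helper: lock and then unlock each host named on the
# command line. SYSTEM_CMD defaults to the StarlingX `system` CLI; it is
# a variable only so the loop can be dry-run with `echo`.
relock_hosts() {
    for host in "$@"; do
        ${SYSTEM_CMD:-system} host-lock "$host" || return 1
        # Assumption: a fixed pause is long enough for the lock to finish.
        sleep "${LOCK_WAIT:-60}"
        ${SYSTEM_CMD:-system} host-unlock "$host" || return 1
    done
}
```

For a dry run that only prints the commands, use `SYSTEM_CMD=echo LOCK_WAIT=0 relock_hosts compute-0 compute-1`; the host names here are examples, not taken from the report.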
Changed in starlingx:
assignee: nobody → Eric MacDonald (rocksolidmtce)
screening: stx.7.0 / medium - specific scenario/config related to AIO-DX+ and DOR; workaround exists. Should be fixed in the stx master branch, but not required for stx.6.0.