Worker node is stuck in degraded yet there is no alarm
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
StarlingX |
Fix Released
|
Low
|
Eric MacDonald |
Bug Description
Brief Description
-----------------
While the system was idle, a worker node became degraded and stayed in this state for a while without any alarms.
Severity
--------
Major
Steps to Reproduce
------------------
Soak a standard system with a large number app pods
Note: this system exhibits periodic activation of MNFA (https:/
Expected Behavior
------------------
Worker node should regain "available" state as soon as its heartbeat becomes regular.
Actual Behavior
----------------
Most nodes show the expected behavior but occasionally a node would get stuck in degraded state until the next MNFA is activated.
Reproducibility
---------------
Intermittent
System Configuration
-------
IPv6 standard system
Branch/Pull Time/Commit
-------
BUILD_TARGET="Host Installer"
BUILD_TYPE="Formal"
BUILD_ID=
SRC_BUILD_ID="15"
Last Pass
---------
N/A - this appears to be a race condition
Timestamp/Logs
--------------
See logs attached
Test Activity
-------------
System Test
minor (missing alarm), but would be nice to fix if not risky