Pmond seen to leave stuck 200.006 process failure alarm if clear attempt fails
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
StarlingX | Fix Released | Low | Eric MacDonald |
Bug Description
The Maintenance Process Monitor (pmond) tries to clear any existing process failure alarms on process startup.
If the system Fault Manager (FM) is not running when pmond starts up, the attempt to query/clear existing alarms fails. Any existing 200.006 process failure alarm then remains stuck asserted until the same process fails and recovers again, or until pmond itself restarts while FM is running.
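One way to avoid losing a failed clear is to keep it pending and retry it from a periodic audit. The following is a minimal sketch of that idea, not the fix that was actually released. `FakeFM`, `FaultManagerDown`, `ProcessMonitor` and the process name `sshd` are hypothetical stand-ins and do not reflect the real pmond or FM interfaces.

```python
class FaultManagerDown(Exception):
    """Raised by the hypothetical FM client when FM cannot be reached."""


class FakeFM:
    """Stand-in for the Fault Manager; not the real FM API."""

    def __init__(self, running=True):
        self.running = running
        self.alarms = set()  # (alarm_id, process) pairs currently asserted

    def clear(self, alarm_id, process):
        if not self.running:
            raise FaultManagerDown("FM is not running")
        self.alarms.discard((alarm_id, process))


class ProcessMonitor:
    """Toy monitor that remembers alarm clears it could not deliver."""

    ALARM_ID = "200.006"

    def __init__(self, fm, processes):
        self.fm = fm
        # At startup every monitored process is owed one clear attempt.
        self.pending_clears = set(processes)

    def audit(self):
        """Retry outstanding clears; intended to be called periodically."""
        for process in sorted(self.pending_clears):
            try:
                self.fm.clear(self.ALARM_ID, process)
            except FaultManagerDown:
                return False  # FM still down; leave the rest pending
            self.pending_clears.discard(process)
        return True


fm = FakeFM(running=False)
fm.alarms.add(("200.006", "sshd"))
monitor = ProcessMonitor(fm, ["sshd"])
monitor.audit()      # FM is down: the clear stays pending
fm.running = True
monitor.audit()      # FM is back: the pending clear is delivered
```

The point of the design is that a failed clear is treated as unfinished work rather than a one-time error, so the alarm is removed as soon as FM becomes reachable.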
Severity: Minor
Impact: The stuck alarm has no service impact. It can be deleted manually, or cleared automatically by locking and unlocking the host.
Steps to Reproduce
------------------
1. Kill a pmond-monitored process and allow the alarm to be asserted.
2. Kill FM and prevent it from restarting.
3. Kill pmond and allow the failed monitored process to recover while FM is down.
4. Observe that pmond is unable to clear the alarm, so the alarm gets stuck.
Expected Behavior: Alarm should be cleared
Actual Behavior: Alarm is stuck asserted
Reproducibility: 100% when the conditions described above are met.
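The reproduction sequence can be modelled with a toy simulation, shown here only to make the failure mode concrete. `ToyFM`, `pmond_startup` and the process name `monitored-proc` are hypothetical and are not taken from the pmond or FM source. The sketch assumes pmond makes a single clear attempt at startup and does not retry.

```python
class ToyFM:
    """Hypothetical stand-in for FM: stores alarms, rejects requests when down."""

    def __init__(self):
        self.running = True
        self.alarms = set()

    def set_alarm(self, process):
        if self.running:
            self.alarms.add(process)

    def clear_alarm(self, process):
        if not self.running:
            return False  # request fails while FM is down
        self.alarms.discard(process)
        return True


def pmond_startup(fm, processes):
    """One clear attempt per process at startup, with no retry (assumed)."""
    return {p: fm.clear_alarm(p) for p in processes}


fm = ToyFM()
fm.set_alarm("monitored-proc")                   # monitored process killed, alarm asserted
fm.running = False                               # FM killed and kept down
results = pmond_startup(fm, ["monitored-proc"])  # pmond restarts, clear attempt fails
fm.running = True                                # FM returns later
stuck = "monitored-proc" in fm.alarms            # alarm is still asserted
```

Because nothing re-issues the clear after FM returns, the alarm stays in `fm.alarms` even though the monitored process has recovered.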
System Configuration: Any
Branch/Pull Time/Commit: Aug 25, 2020
Last Pass: Never seen. Test Escape
Timestamp/Logs: Issue understood. No logs required
Test Activity: Robustness testing
Workaround: Manually delete alarm or lock/unlock host
stx.5.0 / low priority - stuck alarm; minor impact and workaround exists.