mtcAgent and/or hwmond connecting to the BMC over process restart failed intermittently
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
StarlingX | Fix Released | Medium | Alexander Kozyrev | |
Bug Description
Brief Description
-----------------
The mtcAgent and/or hwmond intermittently failed to connect to the BMC after a process restart. This triggered alarms, which in turn caused several Sanity test cases to fail. The alarms took the form:
200.010 | controller-1 access to board management module has failed.
Severity
--------
Major
Steps to Reproduce
------------------
Swact the active controllers or force-reboot a node with running VMs
Expected Behavior
------------------
No alarms remained uncleared after 5 minutes
Actual Behavior
----------------
Some alarms remained uncleared after 5 minutes, e.g.:
200.010 | controller-0 access to board management module has failed
200.010 | controller-1 access to board management module has failed
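The check described above (no 200.010 alarms left after the 5-minute window) could be automated roughly as follows. This is only a minimal sketch: it assumes the alarm lines have already been captured as plain text in the "ID | reason" form shown in this report, and the function name find_bmc_access_alarms is hypothetical, not part of any StarlingX tool.

```python
# Alarm ID reported in this bug for failed access to the board management module.
BMC_ACCESS_ALARM_ID = "200.010"


def find_bmc_access_alarms(alarm_lines):
    """Return (alarm_id, reason) pairs for lines carrying the BMC access alarm ID.

    Each line is assumed to look like "<alarm id> | <reason text>";
    lines without a "|" separator are ignored.
    """
    matches = []
    for line in alarm_lines:
        alarm_id, sep, reason = line.partition("|")
        if sep and alarm_id.strip() == BMC_ACCESS_ALARM_ID:
            matches.append((alarm_id.strip(), reason.strip()))
    return matches


# Sample input copied from the alarms quoted in this report.
sample = [
    "200.010 | controller-0 access to board management module has failed",
    "200.010 | controller-1 access to board management module has failed",
]
remaining = find_bmc_access_alarms(sample)
for alarm_id, reason in remaining:
    print(alarm_id, reason)
```

A non-empty result after the 5-minute window would correspond to the failure described here.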
Reproducibility
---------------
Intermittent
System Configuration
-------
Found on a two-node system, but may also exist on other lab types
Branch/Pull Time/Commit
-------
CentOS7.6
Timestamp/Logs
--------------
2019-02-27 22:44:47
tags: added: stx.2.0 removed: stx.2019.05
tags: added: stx.retestneeded
Marking as release gating; may be related to the code changes introducing Barbican