Acknowledge old SEL entries without clearing them
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
hw-health-charm | Fix Released | High | Unassigned | | |
Bug Description
The current IPMI check alerts whenever there are entries in the SEL, regardless of their age. This forces us to clear the SEL once issues have been fixed, which prevents us from using the SEL as a historical record of previous failures.
I have filed https:/
I would suggest we implement an ack-sel action that would perform the following:
* store the value of a 'date' parameter in the unit-state db. If the date parameter is not specified, it defaults to now
* trigger a config-changed hook
The config-changed hook would then need to be extended to:
* read the stored date threshold parameter
* append to the check_ipmi_sensor command line --seloptions '--date-
The above would effectively make the IPMI check stateful, so that it only alerts on SEL entries that are more recent than the last time the ack-sel action was run.
If the action is never run (or is run with a very old date value), the behavior remains unchanged.
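The proposal above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the charm's actual code: a JSON file stands in for the unit-state db, the function names, the `sel_ack_date` key and the `YYYY-MM-DD` date format are hypothetical, and because the exact value passed to `--seloptions` is truncated in this report, the sketch takes it as a caller-supplied template rather than guessing it.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical date format; the report does not say which one the action expects.
DATE_FORMAT = "%Y-%m-%d"


def store_ack_date(state_file, date=None):
    """ack-sel step: persist the threshold date, defaulting to now."""
    if date is None:
        date = datetime.now(timezone.utc).strftime(DATE_FORMAT)
    # Raises ValueError early if the supplied date is malformed.
    datetime.strptime(date, DATE_FORMAT)
    # A JSON file stands in for the unit-state db here (assumption).
    Path(state_file).write_text(json.dumps({"sel_ack_date": date}))
    return date


def read_ack_date(state_file):
    """config-changed step: read the stored threshold, or None if never acked."""
    path = Path(state_file)
    if not path.exists():
        return None
    return json.loads(path.read_text()).get("sel_ack_date")


def build_check_command(base_cmd, ack_date, sel_option_template):
    """Append --seloptions only when a threshold has been stored.

    sel_option_template is a placeholder supplied by the caller; it must
    contain '{date}' and expand to whatever SEL filter option is wanted.
    """
    if ack_date is None:
        # Action never run: the command line, and the behavior, are unchanged.
        return list(base_cmd)
    return list(base_cmd) + ["--seloptions", sel_option_template.format(date=ack_date)]
```

With no stored date the command is returned untouched, which matches the requirement that the check behaves as before until ack-sel is first run.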
Related branches
- 🤖 prod-jenkaas-bootstack (community): Needs Fixing (continuous-integration)
- Joe Guo (community): Needs Fixing
- BootStack Reviewers: Pending requested
- Diff: 490 lines (+305/-10), 9 files modified
.gitmodules (+1/-1)
src/README.md (+41/-0)
src/actions.yaml (+10/-0)
src/actions/ack-sel (+1/-0)
src/actions/actions.py (+60/-1)
src/actions/unack-sel (+1/-0)
src/lib/hwhealth/tools.py (+44/-8)
src/tests/functional/test_hwhealth.py (+94/-0)
src/tests/unit/test_ipmi_sensor.py (+53/-0)
- James Troup (community): Needs Fixing
- Diff: 213 lines (+100/-12), 5 files modified
src/README.md (+14/-0)
src/actions.yaml (+7/-0)
src/actions/ack-sel (+1/-0)
src/actions/actions.py (+69/-11)
src/lib/hwhealth/tools.py (+9/-1)
Changed in charm-hw-health:
status: New → Confirmed
importance: Undecided → High
Changed in charm-hw-health:
assignee: nobody → Facundo Ciccioli (fandanbango)
Changed in charm-hw-health:
status: Confirmed → Fix Committed
assignee: Facundo Ciccioli (fandanbango) → nobody
milestone: none → 21.04
Changed in charm-hw-health:
status: Fix Committed → Fix Released
cron_ipmi_sensors.py encapsulates the check_ipmi_sensor command line.
cron_ipmi_sensors.py is executed by the cron job /etc/cron.d/hwhealth_ipmi. The 'ack-sel' action needs to trigger a config-changed hook so that the 'date' parameter is read and the cron job's command-line options are updated.