Acknowledge old SEL entries without clearing them

Bug #1901735 reported by Andrea Ieri
This bug affects 4 people
Affects          Status        Importance  Assigned to  Milestone
hw-health-charm  Fix Released  High        Unassigned

Bug Description

The current IPMI check alerts whenever there are entries in the SEL, regardless of their age. This forces us to clear the SEL whenever issues have been fixed, which prevents us from using the SEL as a historic record of previous failures.

I have filed https://github.com/thomas-krenn/check_ipmi_sensor_v3/issues/28 to address this in the upstream check_ipmi_sensor code, but on second thought, this could also be handled idiomatically within the charm code.

I would suggest we implement an ack-sel action that would do the following:
* store the value of a 'date' parameter in the unit-state db; if the date parameter is not specified, it defaults to now
* trigger a config-changed hook
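A minimal sketch of the proposed ack-sel action, under stated assumptions: the 'date' parameter is taken to arrive as an ISO 8601 string (YYYY-MM-DD), and the unit-state db is modelled as a plain dict. In the charm it would presumably be the charm's own key/value store (for example charmhelpers' `unitdata.kv()`). The function name `ack_sel` and the key `ack_sel_date` are hypothetical, and triggering config-changed is omitted because it depends on the charm framework.

```python
from datetime import date, datetime, timezone

# Hypothetical key under which the acknowledgement date is kept.
ACK_KEY = "ack_sel_date"


def ack_sel(store, date_param=None):
    """Record the SEL acknowledgement date in `store` and return it.

    `store` is any dict-like key/value store standing in for the
    unit-state db. `date_param` is an optional YYYY-MM-DD string
    (assumed format); when omitted, today's UTC date is used.
    """
    if date_param:
        # Raises ValueError for malformed input, which the action can
        # report back to the operator as a failure.
        ack = date.fromisoformat(date_param)
    else:
        ack = datetime.now(timezone.utc).date()
    # Store as an ISO string so the value serialises cleanly.
    store[ACK_KEY] = ack.isoformat()
    # Triggering config-changed would follow here (charm-specific, omitted).
    return store[ACK_KEY]
```

For example, `ack_sel(state, "2021-03-01")` stores `"2021-03-01"`, while `ack_sel(state)` stores today's date. Keeping only the date rather than a full timestamp is a design choice that makes the value easy to turn into a day-based date range.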

The config-changed hook would then need to be extended to:
* read the stored date threshold parameter
* append --seloptions '--date-range=$date-now' to the check_ipmi_sensor command line
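The config-changed extension could be sketched as a helper that turns the stored date into the extra `--seloptions` argument. This assumes the date was stored as YYYY-MM-DD and that ipmi-sel accepts MM/DD/YYYY in `--date-range` (worth verifying against the FreeIPMI documentation); the helper name `sel_options` is hypothetical.

```python
from datetime import date


def sel_options(ack_iso):
    """Extra check_ipmi_sensor arguments for a stored acknowledgement date.

    `ack_iso` is the YYYY-MM-DD string saved by ack-sel, or None if the
    action has never been run.
    """
    if not ack_iso:
        # No acknowledgement stored: leave the command line unchanged.
        return []
    ack = date.fromisoformat(ack_iso)
    # MM/DD/YYYY is an assumed ipmi-sel date format; verify against the
    # FreeIPMI documentation before relying on it.
    date_range = "--date-range={}-now".format(ack.strftime("%m/%d/%Y"))
    # Returned as separate argv items, so no shell quoting is needed here.
    return ["--seloptions", date_range]
```

The hook could then append the result to whatever argument list it already builds for check_ipmi_sensor; an empty list means the check runs exactly as before.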

The above would effectively make the IPMI check stateful, so that it only alerts on SEL entries more recent than the last time the ack-sel action was run.
If the action is never run (or is run with a very old date value), the behavior remains unchanged.
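To make the intended semantics concrete, the filtering can be modelled in plain Python. This is only an illustration of the behaviour described above, not how check_ipmi_sensor or ipmi-sel implement it; the day granularity and the inclusive lower bound are assumptions, and `entries_to_alert` is a hypothetical name.

```python
from datetime import date, datetime


def entries_to_alert(entries, ack_iso=None):
    """Return the SEL entries that should still raise an alert.

    `entries` is an iterable of (datetime, message) pairs; `ack_iso` is
    the YYYY-MM-DD date stored by ack-sel, or None if never acknowledged.
    """
    entries = list(entries)
    if not ack_iso:
        # Action never run: every entry alerts, as today.
        return entries
    ack = date.fromisoformat(ack_iso)
    # Day-granular and inclusive of the acknowledgement day (an assumption
    # about how the date range is applied).
    return [(ts, msg) for ts, msg in entries if ts.date() >= ack]
```

With entries from 10 February, 1 March and 5 March 2021, acknowledging on 2021-03-01 leaves the last two alerting, while a very old date such as 2000-01-01 leaves all three, matching the unchanged behaviour. One consequence of day granularity is that entries logged earlier on the acknowledgement day itself would still alert.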


Linda Guo (lihuiguo)
Changed in charm-hw-health:
status: New → Confirmed
importance: Undecided → High
Linda Guo (lihuiguo) wrote:

cron_ipmi_sensors.py wraps the check_ipmi_sensor command line.

cron_ipmi_sensors.py is executed by the cron job in /etc/cron.d/hwhealth_ipmi, so the 'ack-sel' action needs to trigger a config-changed hook that reads the 'date' parameter and updates the command-line options in that cron job.
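Following this comment, config-changed could regenerate the cron file with the new options. The sketch below assumes cron_ipmi_sensors.py forwards extra arguments to check_ipmi_sensor, which the bug does not state, and that the job runs as root. The function name, schedule and script path are made up; only the general /etc/cron.d line layout (schedule, user, command) and cron's special handling of '%' are standard.

```python
import shlex


def render_cron_file(schedule, script, extra_args):
    """Render a one-line /etc/cron.d file that runs the IPMI wrapper.

    `schedule`, `script` and `extra_args` are hypothetical inputs; the
    charm's real values are not given in the bug.
    """
    # Quote each argument for the shell that cron uses to run the command.
    command = " ".join(shlex.quote(arg) for arg in [script, *extra_args])
    # cron turns an unescaped '%' into a newline, so escape it.
    command = command.replace("%", "\\%")
    # cron.d entries have a user field (assumed root) before the command.
    return "{} root {}\n".format(schedule, command)
```

For example, `render_cron_file("*/5 * * * *", "/opt/example/cron_ipmi_sensors.py", ["--seloptions", "--date-range=03/01/2021-now"])` yields a single line ending in `--seloptions --date-range=03/01/2021-now`. Rewriting the whole file on every run, rather than editing it in place, keeps config-changed idempotent.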

Changed in charm-hw-health:
assignee: nobody → Facundo Ciccioli (fandanbango)
Xav Paice (xavpaice)
Changed in charm-hw-health:
status: Confirmed → Fix Committed
assignee: Facundo Ciccioli (fandanbango) → nobody
milestone: none → 21.04
Celia Wang (ziyiwang)
Changed in charm-hw-health:
status: Fix Committed → Fix Released