Activity log for bug #1993977

Date Who What changed Old value New value Message
2022-10-24 03:11:10 Linda Guo bug added bug
2022-10-24 03:11:51 Linda Guo description
Old value:
The `ack-sel` action supports filtering out SEL entries older than a specific date. We found that some hardware reports disk failures in ipmi-sensors rather than in the IPMI SEL, for example:
$ sudo ipmi-sensors | grep DISK12
149 | DISK12 | Drive Slot | N/A | N/A | 'Drive Presence' 'Predictive Failure'
The hw-health charm currently doesn't support ignoring an IPMI sensor entry on a single unit, so the IPMI alert cannot be cleared until the hardware issue is fixed. There is an `ipmi_check_options` setting in the hw-health config; a sensor number can be ignored by setting something like:
ipmi_check_options="-O --exclude-record-ids=50"
But this applies to all hw-health units. It would be better to add an action like `ack-sensor <record-id>` to ignore the IPMI sensor record.
New value:
The `ack-sel` action supports filtering out SEL entries older than a specific date. We found that some hardware reports disk failures in ipmi-sensors rather than in the IPMI SEL, for example:
$ sudo ipmi-sensors | grep DISK12
149 | DISK12 | Drive Slot | N/A | N/A | 'Drive Presence' 'Predictive Failure'
The hw-health charm currently doesn't support ignoring an IPMI sensor entry on a single unit, so the IPMI alert cannot be cleared until the hardware issue is fixed. There is an `ipmi_check_options` setting in the hw-health config; a sensor number can be ignored by setting something like:
ipmi_check_options="-O --exclude-record-ids=149"
But this applies to all hw-health units. It would be better to add an action like `ack-sensor <record-id>` to ignore the IPMI sensor record.
2022-10-24 03:12:31 Linda Guo summary
Old value: add an action 'ack-sensor'
New value: add an action 'ack-sensor' to ignore bad sensor record
2022-11-15 02:49:03 Linda Guo description
Old value:
The `ack-sel` action supports filtering out SEL entries older than a specific date. We found that some hardware reports disk failures in ipmi-sensors rather than in the IPMI SEL, for example:
$ sudo ipmi-sensors | grep DISK12
149 | DISK12 | Drive Slot | N/A | N/A | 'Drive Presence' 'Predictive Failure'
The hw-health charm currently doesn't support ignoring an IPMI sensor entry on a single unit, so the IPMI alert cannot be cleared until the hardware issue is fixed. There is an `ipmi_check_options` setting in the hw-health config; a sensor number can be ignored by setting something like:
ipmi_check_options="-O --exclude-record-ids=149"
But this applies to all hw-health units. It would be better to add an action like `ack-sensor <record-id>` to ignore the IPMI sensor record.
New value:
The `ack-sel` action supports filtering out SEL entries older than a specific date. We found that some hardware reports disk failures in ipmi-sensors rather than in the IPMI SEL, for example:
$ sudo ipmi-sensors | grep DISK12
149 | DISK12 | Drive Slot | N/A | N/A | 'Drive Presence' 'Predictive Failure'
The hw-health charm currently doesn't support ignoring an IPMI sensor entry on a single unit, so the IPMI alert cannot be cleared until the hardware issue is fixed. If there is a further hardware failure, we won't be able to receive an alert for it.
There is an `ipmi_check_options` setting in the hw-health config; a sensor number can be ignored by setting something like:
ipmi_check_options="-O --exclude-record-ids=149"
But this applies to all hw-health units. It would be better to add an action like `ack-sensor <record-id>` to ignore the IPMI sensor record.
2022-12-30 23:17:28 Andrea Ieri charm-hw-health: status New Won't Fix
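
For illustration only, the per-unit filtering requested in the description could look roughly like the sketch below. It is not code from the hw-health charm: the function names `parse_record_id` and `filter_acked` and the idea of a per-unit set of acknowledged record IDs are hypothetical, and the sketch assumes only the pipe-separated row layout shown in the `ipmi-sensors` example above.

```python
def parse_record_id(line):
    """Return the leading record ID of a pipe-separated sensor row, or None.

    Hypothetical helper: rows that do not start with a numeric ID
    (for example a header row) yield None.
    """
    first_field = line.split("|", 1)[0].strip()
    return int(first_field) if first_field.isdigit() else None


def filter_acked(lines, acked_ids):
    """Drop rows whose record ID was acknowledged on this unit.

    `acked_ids` is an assumed per-unit set of record IDs; rows without
    a numeric ID are kept unchanged.
    """
    return [line for line in lines if parse_record_id(line) not in acked_ids]


# The row quoted in the bug description.
row = "149 | DISK12 | Drive Slot | N/A | N/A | 'Drive Presence' 'Predictive Failure'"

# With record 149 acknowledged on this unit, the row is filtered out.
remaining = filter_acked([row], {149})
```

Keeping the acknowledged IDs per unit, rather than in the charm-wide `ipmi_check_options` setting, is the point of the request: other units would continue to alert on the same record ID.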