2022-10-24 03:11:10 |
Linda Guo |
bug |
|
|
added bug |
2022-10-24 03:11:51 |
Linda Guo |
description |
The `ack-sel` action supports filtering out SEL entries older than a specific date. We found that some hardware reports disk failures in ipmi-sensors rather than in the IPMI SEL, for example:
$ sudo ipmi-sensors |grep DISK12
149 | DISK12 | Drive Slot | N/A | N/A | 'Drive Presence' 'Predictive Failure'
The hw-health charm currently doesn't support ignoring an IPMI sensor entry on a single unit, so the IPMI alert cannot be cleared until the hardware issue is fixed. There is an `ipmi_check_options` setting in the hw-health config; a sensor record can be ignored by setting something like:
ipmi_check_options="-O --exclude-record-ids=50"
But this applies to all hw-health units. It would be better to add an action like `ack-sensor <record-id>` to ignore the IPMI sensor record. |
The `ack-sel` action supports filtering out SEL entries older than a specific date. We found that some hardware reports disk failures in ipmi-sensors rather than in the IPMI SEL, for example:
$ sudo ipmi-sensors |grep DISK12
149 | DISK12 | Drive Slot | N/A | N/A | 'Drive Presence' 'Predictive Failure'
The hw-health charm currently doesn't support ignoring an IPMI sensor entry on a single unit, so the IPMI alert cannot be cleared until the hardware issue is fixed. There is an `ipmi_check_options` setting in the hw-health config; a sensor record can be ignored by setting something like:
ipmi_check_options="-O --exclude-record-ids=149"
But this applies to all hw-health units. It would be better to add an action like `ack-sensor <record-id>` to ignore the IPMI sensor record. |
|
2022-10-24 03:12:31 |
Linda Guo |
summary |
add an action 'ack-sensor' |
add an action 'ack-sensor' to ignore bad sensor record |
|
2022-11-15 02:49:03 |
Linda Guo |
description |
The `ack-sel` action supports filtering out SEL entries older than a specific date. We found that some hardware reports disk failures in ipmi-sensors rather than in the IPMI SEL, for example:
$ sudo ipmi-sensors |grep DISK12
149 | DISK12 | Drive Slot | N/A | N/A | 'Drive Presence' 'Predictive Failure'
The hw-health charm currently doesn't support ignoring an IPMI sensor entry on a single unit, so the IPMI alert cannot be cleared until the hardware issue is fixed. There is an `ipmi_check_options` setting in the hw-health config; a sensor record can be ignored by setting something like:
ipmi_check_options="-O --exclude-record-ids=149"
But this applies to all hw-health units. It would be better to add an action like `ack-sensor <record-id>` to ignore the IPMI sensor record. |
The `ack-sel` action supports filtering out SEL entries older than a specific date. We found that some hardware reports disk failures in ipmi-sensors rather than in the IPMI SEL, for example:
$ sudo ipmi-sensors |grep DISK12
149 | DISK12 | Drive Slot | N/A | N/A | 'Drive Presence' 'Predictive Failure'
The hw-health charm currently doesn't support ignoring an IPMI sensor entry on a single unit, so the IPMI alert cannot be cleared until the hardware issue is fixed. If a further hardware failure occurs in the meantime, we won't receive a new alert. There is an `ipmi_check_options` setting in the hw-health config; a sensor record can be ignored by setting something like:
ipmi_check_options="-O --exclude-record-ids=149"
But this applies to all hw-health units. It would be better to add an action like `ack-sensor <record-id>` to ignore the IPMI sensor record. |
|
2022-12-30 23:17:28 |
Andrea Ieri |
charm-hw-health: status |
New |
Won't Fix |
|