IPMI sensors alert on Add-On slots missing cards

Bug #1891915 reported by Drew Freiberger on 2020-08-17

Bug Description

The updated ipmi checks appear to alert on Presence sensors that report "Entity Absent".

While investigating this issue, I found that the presence sensors report, for every PCI slot, whether or not a card is present.

As an example:

The following sensor raised an alert on every unit of a cloud where PCI slot 3 was not filled with a card (which is a valid config):
ubuntu@os-s003:/etc/cron.d$ sudo /usr/sbin/ipmi-sensors -r 50
ID | Name | Type | Reading | Units | Event
50 | Presence | Entity Presence | N/A | N/A | 'Entity Absent'
ubuntu@os-s003:/etc/cron.d$ sudo /usr/sbin/ipmi-sensors -r 50 --entity-sensor-names
ID | Name | Type | Reading | Units | Event
50 | Add-in Card 3 Presence | Entity Presence | N/A | N/A | 'Entity Absent'

This sensor belongs to a slot with a card present and does not alert:
ubuntu@os-s003:/etc/cron.d$ sudo /usr/sbin/ipmi-sensors -r 49 --entity-sensor-names
ID | Name | Type | Reading | Units | Event
49 | Add-in Card 1 Presence | Entity Presence | N/A | N/A | 'Entity Present'

As a workaround, the alert can be suppressed with:

ipmi_check_options="-O --exclude-record-ids=50"

This tells the ipmi-sensors command to ignore sensor record 50. This works, but perhaps "Entity Absent" reports on Presence sensors should be automatically binned with "--ignore-not-available-sensors" and ignored by the ipmi scripts. Unfortunately, this is a catch-22: getting a notification that a fan or power sensor is missing is important, but a missing Add-in Card is not necessarily critical.
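
One way to keep alerts for fans and power supplies while skipping empty card slots would be to build the exclusion list from the sensor names instead of hard-coding record 50. The following is only a rough sketch, not how charm-hw-health or the ipmi check scripts work: the helper names `parse_sensors` and `ignorable_absent_ids` are hypothetical, and the sample text is the `--entity-sensor-names` output quoted above rather than a live call to ipmi-sensors.

```python
# Hypothetical helpers for illustration; not part of charm-hw-health.
# SAMPLE reuses the ipmi-sensors output quoted in this report.
SAMPLE = """\
ID | Name | Type | Reading | Units | Event
49 | Add-in Card 1 Presence | Entity Presence | N/A | N/A | 'Entity Present'
50 | Add-in Card 3 Presence | Entity Presence | N/A | N/A | 'Entity Absent'
"""


def parse_sensors(text):
    """Parse pipe-delimited ipmi-sensors output into a list of dicts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = [col.strip() for col in lines[0].split("|")]
    rows = []
    for ln in lines[1:]:
        fields = [f.strip() for f in ln.split("|")]
        rows.append(dict(zip(header, fields)))
    return rows


def ignorable_absent_ids(rows, name_prefix="Add-in Card"):
    """Return record IDs of absent sensors whose name starts with name_prefix.

    Absent sensors with any other name (for example a fan or power
    supply) are deliberately left out so they would still alert.
    """
    ids = []
    for row in rows:
        absent = "Entity Absent" in row["Event"]
        if absent and row["Name"].startswith(name_prefix):
            ids.append(row["ID"])
    return ids


rows = parse_sensors(SAMPLE)
ids = ignorable_absent_ids(rows)
if ids:
    # ipmi-sensors accepts a comma-separated list of record IDs here.
    print("--exclude-record-ids=" + ",".join(ids))
```

The "Add-in Card" prefix is an assumption based on the sensor names shown on this hardware; other vendors may name their slot presence sensors differently.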

Drew Freiberger (afreiberger) wrote :

The code may need to add "--noentityabsent" to avoid false positives across all environments for PCI slots missing cards.

Jeremy Lounder (jldev) on 2020-08-18
Changed in charm-hw-health:
importance: Undecided → High
Changed in charm-hw-health:
assignee: nobody → David O Neill (dmzoneill)
status: New → In Progress
Changed in charm-hw-health:
assignee: David O Neill (dmzoneill) → nobody
status: In Progress → New
Xav Paice (xavpaice) wrote :

It's possible to work around this with:

juju config hw-health ipmi_check_options='--noentityabsent'

Jose Guedez (jfguedez) on 2021-02-15
Changed in charm-hw-health:
status: New → Triaged