IPMI sensors alert on Add-On slots missing cards

Bug #1891915 reported by Drew Freiberger
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
hw-health-charm | Won't Fix | Low | Unassigned |

Bug Description

The updated IPMI checks appear to alert whenever a Presence sensor reports "Entity Absent".

When investigating this issue, I found that the presence check covers every PCI slot, whether or not a card is installed in it.

As an example:

This sensor raised an alert on every unit of a cloud where PCI slot 3 had no card installed (which is a valid configuration):
ubuntu@os-s003:/etc/cron.d$ sudo /usr/sbin/ipmi-sensors -r 50
ID | Name | Type | Reading | Units | Event
50 | Presence | Entity Presence | N/A | N/A | 'Entity Absent'
ubuntu@os-s003:/etc/cron.d$ sudo /usr/sbin/ipmi-sensors -r 50 --entity-sensor-names
ID | Name | Type | Reading | Units | Event
50 | Add-in Card 3 Presence | Entity Presence | N/A | N/A | 'Entity Absent'

This card is present and does not alert:
ubuntu@os-s003:/etc/cron.d$ sudo /usr/sbin/ipmi-sensors -r 49 --entity-sensor-names
ID | Name | Type | Reading | Units | Event
49 | Add-in Card 1 Presence | Entity Presence | N/A | N/A | 'Entity Present'

As a workaround, the sensor can be excluded with:

ipmi_check_options="-O --exclude-record-ids=50"

This tells the ipmi-sensors command to ignore sensor record 50. It works, but perhaps "Entity Absent" events on Presence sensors should be handled like "--ignore-not-available-sensors" and ignored automatically by the IPMI scripts. Unfortunately this is a catch-22: being notified that a fan or power sensor is missing is important, but a missing Add-in Card is not necessarily critical.
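The distinction described above could be sketched as a filter over the pipe-delimited ipmi-sensors output. This is only an illustration, not the charm's code: the function names, the "Add-in Card" prefix rule, and the "Fan 1 Presence" sample row are assumptions made for the example, and the names only carry the slot label when ipmi-sensors is run with --entity-sensor-names.

```python
# Hypothetical sketch: ignore "Entity Absent" only for add-in card slots,
# while still alerting on other absent entities such as fans.

# Rows 49 and 50 are taken from the report; row 51 is an invented example.
SAMPLE_OUTPUT = """\
ID | Name | Type | Reading | Units | Event
49 | Add-in Card 1 Presence | Entity Presence | N/A | N/A | 'Entity Present'
50 | Add-in Card 3 Presence | Entity Presence | N/A | N/A | 'Entity Absent'
51 | Fan 1 Presence | Entity Presence | N/A | N/A | 'Entity Absent'
"""


def parse_sensors(text):
    """Turn pipe-delimited ipmi-sensors output into a list of dicts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = [col.strip() for col in lines[0].split("|")]
    rows = []
    for ln in lines[1:]:
        fields = [f.strip() for f in ln.split("|")]
        if len(fields) != len(header):
            continue  # skip lines that do not match the header layout
        rows.append(dict(zip(header, fields)))
    return rows


def absent_alerts(rows, ignorable_prefixes=("Add-in Card",)):
    """Return absent presence sensors that should still raise an alert."""
    alerts = []
    for row in rows:
        if row["Type"] != "Entity Presence":
            continue
        if "Entity Absent" not in row["Event"]:
            continue
        if row["Name"].startswith(ignorable_prefixes):
            continue  # an empty add-in card slot is a valid configuration
        alerts.append(row)
    return alerts


if __name__ == "__main__":
    for sensor in absent_alerts(parse_sensors(SAMPLE_OUTPUT)):
        print(f"ALERT: sensor {sensor['ID']} ({sensor['Name']}) is absent")
```

With the sample above, only the invented fan sensor would be reported; the empty Add-in Card 3 slot is skipped without having to hard-code record ID 50 per machine.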

Tags: bseng-481
Drew Freiberger (afreiberger) wrote :

The code may need to add "--noentityabsent" to avoid false positives in every environment where PCI slots have no cards installed.

Jeremy Lounder (jldev)
Changed in charm-hw-health:
importance: Undecided → High
Changed in charm-hw-health:
assignee: nobody → David O Neill (dmzoneill)
status: New → In Progress
Changed in charm-hw-health:
assignee: David O Neill (dmzoneill) → nobody
status: In Progress → New
Xav Paice (xavpaice) wrote :

It's possible to work around this with:

juju config hw-health ipmi_check_options='--noentityabsent'

Jose Guedez (jfguedez)
Changed in charm-hw-health:
status: New → Triaged
Eric Chen (eric-chen)
tags: added: bseng-481
Andrea Ieri (aieri) wrote :

Reprioritizing as Low since we have a workaround (comment #2).

Changed in charm-hw-health:
importance: High → Low
Eric Chen (eric-chen) wrote :

This charm is no longer being actively maintained. Please consider using the new hardware-observer-operator instead. (https://github.com/canonical/hardware-observer-operator)
This issue is not critical, so I am marking it as "Won't Fix".

Changed in charm-hw-health:
status: Triaged → Won't Fix