hw-health reporting memory error on healthy system

Bug #1905523 reported by Michał Ajduk
This bug affects 1 person
Affects: hw-health-charm
Status: Won't Fix
Importance: Medium
Assigned to: Unassigned
Milestone: (none)

Bug Description

hw-health reports a Nagios memory error on a healthy system.

Root cause:
the /usr/sbin/ipmimonitoring tool used by hw-health reports:
122 | Mem_Stat_C01S03 | Memory | Critical | N/A | N/A | 'Presence detected' 'Configuration error'
123 | Mem_Stat_C01S04 | Memory | Critical | N/A | N/A | 'Presence detected' 'Configuration error'
130 | Mem_Stat_C01S11 | Memory | Critical | N/A | N/A | 'Presence detected' 'Configuration error'
131 | Mem_Stat_C01S12 | Memory | Critical | N/A | N/A | 'Presence detected' 'Configuration error'
134 | Mem_Stat_C02S03 | Memory | Critical | N/A | N/A | 'Configuration error'
135 | Mem_Stat_C02S04 | Memory | Critical | N/A | N/A | 'Configuration error'
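
To see which events drive the Critical state, the rows above can be split on "|". A minimal sketch, assuming the seven-column ipmimonitoring layout visible in this output (ID | Name | Type | State | Reading | Units | Event); the helper names parse_line and critical_memory_sensors are hypothetical and are not how hw-health itself implements its check.

```python
import re

# The six rows reported above, copied verbatim from the ipmimonitoring output.
SAMPLE = """\
122 | Mem_Stat_C01S03 | Memory | Critical | N/A | N/A | 'Presence detected' 'Configuration error'
123 | Mem_Stat_C01S04 | Memory | Critical | N/A | N/A | 'Presence detected' 'Configuration error'
130 | Mem_Stat_C01S11 | Memory | Critical | N/A | N/A | 'Presence detected' 'Configuration error'
131 | Mem_Stat_C01S12 | Memory | Critical | N/A | N/A | 'Presence detected' 'Configuration error'
134 | Mem_Stat_C02S03 | Memory | Critical | N/A | N/A | 'Configuration error'
135 | Mem_Stat_C02S04 | Memory | Critical | N/A | N/A | 'Configuration error'
"""


def parse_line(line):
    """Split one row into the assumed seven columns (hypothetical helper)."""
    record_id, name, sensor_type, state, reading, units, event = (
        field.strip() for field in line.split("|")
    )
    return {
        "id": int(record_id),
        "name": name,
        "type": sensor_type,
        "state": state,
        # Each event is a single-quoted string; one row may carry several.
        "events": re.findall(r"'([^']*)'", event),
    }


def critical_memory_sensors(text):
    """Return parsed rows whose type is Memory and whose state is Critical."""
    rows = (parse_line(line) for line in text.splitlines() if line.strip())
    return [r for r in rows if r["type"] == "Memory" and r["state"] == "Critical"]


if __name__ == "__main__":
    for row in critical_memory_sensors(SAMPLE):
        print(row["id"], row["name"], row["events"])
```

In this sample every Critical row carries 'Configuration error', while 'Presence detected' only ever appears alongside it, which suggests the Critical state follows from the configuration-error event rather than from presence.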

However, the memory status shown in iLO (see attached screenshot) is healthy.

Michał Ajduk (majduk) wrote:

Details from the ipmi-sensors debug output:

=====================================================
Get Sensor Reading Request
=====================================================
[ 2Dh] = cmd[ 8b]
[ 13h] = sensor_number[ 8b]
=====================================================
Get Sensor Reading Response
=====================================================
[ 2Dh] = cmd[ 8b]
[ 0h] = comp_code[ 8b]
[ 0h] = sensor_reading[ 8b]
[ 0h] = reserved1[ 5b]
[ 0h] = reading_state[ 1b]
[ 1h] = sensor_scanning[ 1b]
[ 1h] = all_event_messages[ 1b]
[ 80h] = sensor_event_bitmask1[ 8b]
[ 0h] = sensor_event_bitmask2[ 7b]
[ 1h] = reserved2[ 1b]
135 | Mem_Stat_C02S04 | Memory | N/A | N/A | 'Configuration error'
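
The response above can be decoded by hand: sensor_event_bitmask1 is 80h, so only bit 7 is set. A minimal sketch, assuming the sensor-specific offsets for sensor type 0Ch (Memory) from the IPMI v2.0 sensor type code table, where offset 07h is "Configuration error" and offset 06h is "Presence detected". The function name decode_memory_events is hypothetical, and the event names are shortened.

```python
# Sensor-specific event offsets for IPMI sensor type 0Ch (Memory),
# per the IPMI v2.0 sensor type code table (names shortened).
MEMORY_EVENT_OFFSETS = {
    0x00: "Correctable ECC",
    0x01: "Uncorrectable ECC",
    0x02: "Parity",
    0x03: "Memory Scrub Failed",
    0x04: "Memory Device Disabled",
    0x05: "Correctable ECC logging limit reached",
    0x06: "Presence detected",
    0x07: "Configuration error",
    0x08: "Spare",
    0x09: "Memory Automatically Throttled",
    0x0A: "Critical Overtemperature",
}


def decode_memory_events(bitmask1, bitmask2=0):
    """Map Get Sensor Reading event bitmask bytes to event names.

    Bit n of bitmask1 is offset n (00h-07h); the low 7 bits of bitmask2
    are offsets 08h-0Eh. Undefined offsets are reported by number.
    """
    combined = (bitmask1 & 0xFF) | ((bitmask2 & 0x7F) << 8)
    return [
        MEMORY_EVENT_OFFSETS.get(offset, f"unknown offset {offset:#04x}")
        for offset in range(15)
        if combined & (1 << offset)
    ]


if __name__ == "__main__":
    # Values from the debug dump for Mem_Stat_C02S04.
    print(decode_memory_events(0x80, 0x00))  # ['Configuration error']
    # Hypothetical reading with bits 6 and 7 set, for comparison.
    print(decode_memory_events(0xC0, 0x00))  # ['Presence detected', 'Configuration error']
```

This matches the 'Configuration error' string ipmi-sensors prints, i.e. the BMC itself asserts the configuration-error offset; the tooling is only reporting it.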

James Troup (elmo) wrote:

What version of iLO is in use? It looks like this could be a firmware bug.

Michał Ajduk (majduk) wrote:

It's the latest Gen10 iLO (iLO 5):

Product Name: ProLiant DL380 Gen10
Product ID: P19717-B21
System ROM: U30 v2.34 (04/08/2020)
System ROM Date: 04/08/2020
Redundant System ROM: U30 v2.34 (04/08/2020)
iLO Advanced
iLO Firmware Version: 2.30 Aug 24 2020

Edin S (exsdev)
Changed in charm-hw-health:
importance: Undecided → Medium
Eric Chen (eric-chen) wrote:

This issue has been pending for a while, and this charm is no longer actively maintained. Please consider using the new hardware-observer-operator instead (https://github.com/canonical/hardware-observer-operator).

This issue is not critical, so I am marking it as "Won't Fix".

Changed in charm-hw-health:
status: New → Won't Fix