hw-health reporting memory error on healthy system
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
hw-health-charm | Won't Fix | Medium | Unassigned |
Bug Description
hw-health reports a Nagios memory error on a healthy system.
Root cause:
/usr/sbin/
122 | Mem_Stat_C01S03 | Memory | Critical | N/A | N/A | 'Presence detected' 'Configuration error'
123 | Mem_Stat_C01S04 | Memory | Critical | N/A | N/A | 'Presence detected' 'Configuration error'
130 | Mem_Stat_C01S11 | Memory | Critical | N/A | N/A | 'Presence detected' 'Configuration error'
131 | Mem_Stat_C01S12 | Memory | Critical | N/A | N/A | 'Presence detected' 'Configuration error'
134 | Mem_Stat_C02S03 | Memory | Critical | N/A | N/A | 'Configuration error'
135 | Mem_Stat_C02S04 | Memory | Critical | N/A | N/A | 'Configuration error'
However, the memory status shown in iLO (see attached screenshot) is perfectly fine.
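To see how rows like these can turn into a Nagios alert, here is a minimal Python sketch of a check that flags every sensor whose state column reads "Critical". It is not the charm's actual implementation. It assumes the pipe-delimited column order of the rows above (ID | Name | Type | State | Reading | Units | Event). The names `SensorRow`, `parse_row` and `critical_rows` are hypothetical.

```python
from typing import List, NamedTuple


class SensorRow(NamedTuple):
    record_id: int
    name: str
    sensor_type: str
    state: str
    events: List[str]


def parse_row(line: str) -> SensorRow:
    # Assumed column order: ID | Name | Type | State | Reading | Units | Event
    record_id, name, sensor_type, state, _reading, _units, event = (
        field.strip() for field in line.split("|")
    )
    # Event strings are single-quoted and space-separated, e.g.
    # 'Presence detected' 'Configuration error'
    events = [e for e in event.split("'") if e.strip()]
    return SensorRow(int(record_id), name, sensor_type, state, events)


def critical_rows(lines: List[str]) -> List[SensorRow]:
    # Hypothetical check: any row whose state column is "Critical" is alerted on
    return [row for row in map(parse_row, lines) if row.state == "Critical"]


# Two rows copied from the output above
SAMPLE = [
    "122 | Mem_Stat_C01S03 | Memory | Critical | N/A | N/A | 'Presence detected' 'Configuration error'",
    "135 | Mem_Stat_C02S04 | Memory | Critical | N/A | N/A | 'Configuration error'",
]

for row in critical_rows(SAMPLE):
    print(row.record_id, row.name, row.events)
```

The parser expects the state column to be present. A row printed without it, such as the six-column line at the end of the debug details, would need a different column layout.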
Changed in charm-hw-health: importance: Undecided → Medium
Details from the ipmi-sensors debug output:
=======
Get Sensor Reading Request
=======
[ 2Dh] = cmd[ 8b]
[ 13h] = sensor_number[ 8b]
=======
Get Sensor Reading Response
=======
[ 2Dh] = cmd[ 8b]
[ 0h] = comp_code[ 8b]
[ 0h] = sensor_reading[ 8b]
[ 0h] = reserved1[ 5b]
[ 0h] = reading_state[ 1b]
[ 1h] = sensor_scanning[ 1b]
[ 1h] = all_event_messages[ 1b]
[ 80h] = sensor_event_bitmask1[ 8b]
[ 0h] = sensor_event_bitmask2[ 7b]
[ 1h] = reserved2[ 1b]
135 | Mem_Stat_C02S04 | Memory | N/A | N/A | 'Configuration error'
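The event bitmask in the response above can be decoded by hand. Below is a minimal Python sketch that rests on two assumptions. First, it assumes the IPMI Get Sensor Reading layout for a discrete sensor: bitmask1 carries event offsets 0-7, and the low 7 bits of bitmask2 carry offsets 8-14. Second, it assumes the reading belongs to a Memory-type sensor (sensor type 0Ch). The function name `asserted_memory_events` is hypothetical.

With 80h/0h, only offset 07h is set, which is "Configuration error". That is consistent with record 135. If this request does correspond to a memory status sensor, it would suggest the BMC itself is asserting that state bit over IPMI.

```python
from typing import List

# Sensor-specific event offsets for sensor type 0Ch (Memory), as listed in
# the IPMI v2.0 specification, Table 42-3.
MEMORY_OFFSETS = {
    0x00: "Correctable ECC",
    0x01: "Uncorrectable ECC",
    0x02: "Parity",
    0x03: "Memory Scrub Failed",
    0x04: "Memory Device Disabled",
    0x05: "Correctable ECC logging limit reached",
    0x06: "Presence detected",
    0x07: "Configuration error",
    0x08: "Spare",
    0x09: "Memory Automatically Throttled",
    0x0A: "Critical Overtemperature",
}


def asserted_memory_events(bitmask1: int, bitmask2: int) -> List[str]:
    # bitmask1 holds offsets 0-7; the low 7 bits of bitmask2 hold offsets 8-14
    combined = (bitmask1 & 0xFF) | ((bitmask2 & 0x7F) << 8)
    # Report each set bit by name, falling back to its raw offset if unknown
    return [
        MEMORY_OFFSETS.get(bit, f"offset {bit:02X}h")
        for bit in range(15)
        if combined & (1 << bit)
    ]


# Values taken from the Get Sensor Reading Response above
print(asserted_memory_events(0x80, 0x00))
```

Under the same assumptions, the rows that also report "Presence detected" would correspond to bits 6 and 7 both set (C0h).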