cannot display sensor name when its owner is lun1

Bug #2076173 reported by Gabriel B. Sant'Anna

This bug report will be marked for expiration in 29 days if no further activity occurs.

This bug affects 1 person
Affects: ipmitool (Ubuntu)
Status: Incomplete
Importance: Undecided
Assigned to: Unassigned

Bug Description

ipmitool sel does not correctly display the sensor's name if its owner is set to lun1.
Upstream bug: https://codeberg.org/IPMITool/ipmitool/issues/8

We were asked to track this in order to enable new hardware from a partner.
It was reported in v1.8.19 (used in Noble), but likely affects previous versions as well.

Steps to reproduce, copied from the upstream report:

1. Use the AMI BMC to generate a sensor error event. The sensors belong to LUN1:
GPU1_MEM | 10h | ok | 11.1 | Uncorrectable ECC
GPU2_MEM | 11h | ok | 11.2 | Uncorrectable ECC
GPU3_MEM | 12h | ok | 11.3 | Uncorrectable ECC
GPU4_MEM | 13h | ok | 11.4 | Uncorrectable ECC
GPU5_MEM | 14h | ok | 11.5 | Uncorrectable ECC
GPU6_MEM | 15h | ok | 11.6 | Uncorrectable ECC
GPU7_MEM | 16h | ok | 11.7 | Uncorrectable ECC
GPU8_MEM | 17h | ok | 11.8 | Uncorrectable ECC

2. Run `ipmitool sel elist`

3. Observe the abnormal reply:
c5 | 2023/08/02 | 17:17:24 CST | Memory | Uncorrectable ECC | Asserted
c6 | 2023/08/02 | 17:18:29 CST | Memory | Uncorrectable ECC | Asserted
c7 | 2023/08/02 | 17:18:29 CST | Memory | Uncorrectable ECC | Asserted
c8 | 2023/08/02 | 17:18:30 CST | Memory | Uncorrectable ECC | Asserted
c9 | 2023/08/02 | 17:18:30 CST | Memory | Uncorrectable ECC | Asserted
ca | 2023/08/02 | 17:19:34 CST | Memory | Uncorrectable ECC | Asserted
cb | 2023/08/02 | 17:19:34 CST | Memory INTEGRAL_DIMM | Uncorrectable ECC | Asserted
cc | 2023/08/02 | 17:19:34 CST | Memory | Uncorrectable ECC | Asserted
The sensor name is empty or wrong (expected: GPU1_MEM).
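
As a rough illustration of the failure mode described above (not ipmitool's actual code), the Python sketch below resolves a sensor name for a SEL event in two ways: by sensor number alone, and by the pair (LUN, sensor number). The table contents and the `LUN0_SENSOR` name are hypothetical. The LUN is assumed to sit in bits [1:0] of the second generator ID byte, following the SEL event record layout in the IPMI 2.0 specification.

```python
from typing import Dict, Optional, Tuple

# Hypothetical sensor table keyed by (owner LUN, sensor number).
# Two sensors share number 0x10 but live on different LUNs.
SENSORS: Dict[Tuple[int, int], str] = {
    (0, 0x10): "LUN0_SENSOR",  # made-up name for a LUN0 sensor
    (1, 0x10): "GPU1_MEM",     # sensor from the report, owned by LUN1
}


def generator_lun(generator_id_byte2: int) -> int:
    """Return the LUN from the second generator ID byte (assumed bits [1:0])."""
    return generator_id_byte2 & 0x03


def lookup_by_number_only(sensor_number: int) -> Optional[str]:
    """Lookup that ignores the LUN: returns the first sensor with this number."""
    for (_lun, number), name in SENSORS.items():
        if number == sensor_number:
            return name
    return None


def lookup_with_lun(generator_id_byte2: int, sensor_number: int) -> Optional[str]:
    """Lookup that matches both the owner LUN and the sensor number."""
    return SENSORS.get((generator_lun(generator_id_byte2), sensor_number))


if __name__ == "__main__":
    # Event generated by the sensor numbered 0x10 on LUN1.
    print("number only:", lookup_by_number_only(0x10))
    print("with LUN:   ", lookup_with_lun(0x01, 0x10))
```

In this sketch, the number-only lookup picks up the LUN0 entry, while the LUN-aware lookup finds `GPU1_MEM`. Whether the real tool returns a wrong name or an empty one depends on what the sensor repository holds for LUN0.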

A fix was proposed upstream but is yet to be merged: https://codeberg.org/IPMITool/ipmitool/pulls/39

Tags: patch
Gabriel B. Sant'Anna (baioc) wrote :
description: updated
Ubuntu Foundations Team Bug Bot (crichton) wrote :

The attachment "Upstream proposed patch" seems to be a patch. If it isn't, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of the ~ubuntu-reviewers, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issues please contact him.]

tags: added: patch
Gabriel B. Sant'Anna (baioc) wrote :

I wasn't able to test the specific case covered by this (sensor with lun1 owner), but in case anyone's interested in trying it out, I've built patched versions for Noble and Jammy in ppa:baioc/ipmitool-sru-sel-sensor-lun1
I'll attach the debdiffs here as well.

Gabriel B. Sant'Anna (baioc) wrote :
Sergio Durigan Junior (sergiodj) wrote :

Hi Gabriel,

Thanks for the bug report. IMHO, I believe it's best to wait for upstream's comments regarding the proposed patch before going forward with the upload. Also, it seems that one needs specific hardware to reproduce the issue...? Is there any way to reproduce it without such hardware?

Thanks.

Changed in ipmitool (Ubuntu):
status: New → Incomplete
Gabriel B. Sant'Anna (baioc) wrote :

Hi Sergio,

Correct, I haven't been able to reproduce it without specific hardware.
Someone pointed me to <https://github.com/wrouesnel/openipmi/tree/master/lanserv>, which contains an IPMI simulator, but after an afternoon of experiments I'm still not able to generate events with the right lun.

Mitchell Dzurick (mitchdz) wrote :

Thanks Gabriel. I agree with Sergio, let's wait for comment from upstream.

Joao Andre Simioni (jasimioni) wrote :

Early tests with the provided PPA for Jammy worked as expected. The report follows:

We've confirmed successful results with the PPA ipmitool. Previously, the SEL events for SPD_FAN_[1..4]_5 plus SPD_FAN_4_[1..4] were not properly reported. For example, the IDs for these fans were 0x1a0-0x1a7, and only the lower byte was being used.
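
To make the "only the lower byte" remark concrete, here is a small Python sketch that splits the reported IDs into a LUN and a sensor number. It assumes, in line with the bug title, that the bits above the low byte carry the owner LUN; `split_sensor_id` is a hypothetical helper, not an ipmitool function.

```python
from typing import Tuple


def split_sensor_id(sensor_id: int) -> Tuple[int, int]:
    """Split a combined ID into (LUN, sensor number).

    Assumption: the low byte is the sensor number and the next two bits
    are the owner LUN, so 0x1a0 means sensor 0xa0 on LUN1.
    """
    lun = (sensor_id >> 8) & 0x03
    number = sensor_id & 0xFF
    return lun, number


if __name__ == "__main__":
    for sensor_id in range(0x1A0, 0x1A8):
        lun, number = split_sensor_id(sensor_id)
        # Dropping the LUN leaves only `number`, which can also be the
        # number of an unrelated sensor on LUN0.
        print(f"id={sensor_id:#x} lun={lun} number={number:#x}")
```

Under that assumption, a lookup that keeps only the low byte treats ID 0x1a0 the same as sensor 0xa0 on LUN0, which fits the misreported fan events described here.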

# ipmitool -V

ipmitool version 1.8.18

ipmitool sel elist log as below:

| 08/07/2024 | 09:41:47 | Fan SPD_FAN_1_5_R | Lower Critical going low | Asserted | Reading 0 < Threshold 2200 RPM

| 08/07/2024 | 09:41:47 | Fan SPD_FAN_1_5_F | Lower Critical going low | Asserted | Reading 0 < Threshold 2464 RPM

| 08/07/2024 | 09:42:14 | Fan SPD_FAN_1_5_R | Lower Critical going low | Deasserted | Reading 4840 < Threshold 2200 RPM

| 08/07/2024 | 09:42:14 | Fan SPD_FAN_1_5_F | Lower Critical going low | Deasserted | Reading 5720 < Threshold 2464 RPM

| 08/07/2024 | 09:42:31 | Fan SPD_FAN_2_5_R | Lower Critical going low | Asserted | Reading 0 < Threshold 2200 RPM

| 08/07/2024 | 09:42:31 | Fan SPD_FAN_2_5_F | Lower Critical going low | Asserted | Reading 0 < Threshold 2464 RPM

| 08/07/2024 | 09:43:00 | Fan SPD_FAN_2_5_R | Lower Critical going low | Deasserted | Reading 3520 < Threshold 2200 RPM

| 08/07/2024 | 09:43:00 | Fan SPD_FAN_2_5_F | Lower Critical going low | Deasserted | Reading 4224 < Threshold 2464 RPM

| 08/07/2024 | 09:43:23 | Fan SPD_FAN_3_5_R | Lower Critical going low | Asserted | Reading 0 < Threshold 2200 RPM

| 08/07/2024 | 09:43:23 | Fan SPD_FAN_3_5_F | Lower Critical going low | Asserted | Reading 0 < Threshold 2464 RPM

| 08/07/2024 | 09:43:58 | Fan SPD_FAN_3_5_R | Lower Critical going low | Deasserted | Reading 3520 < Threshold 2200 RPM

| 08/07/2024 | 09:44:03 | Fan SPD_FAN_3_5_F | Lower Critical going low | Deasserted | Reading 5632 < Threshold 2464 RPM

| 08/07/2024 | 09:44:14 | Fan SPD_FAN_4_1_R | Lower Critical going low | Asserted | Reading 0 < Threshold 2200 RPM

| 08/07/2024 | 09:44:16 | Fan SPD_FAN_4_1_F | Lower Critical going low | Asserted | Reading 0 < Threshold 2464 RPM

| 08/07/2024 | 09:44:37 | Fan SPD_FAN_4_1_R | Lower Critical going low | Deasserted | Reading 3608 < Threshold 2200 RPM

| 08/07/2024 | 09:44:37 | Fan SPD_FAN_4_1_F | Lower Critical going low | Deasserted | Reading 4312 < Threshold 2464 RPM

| 08/07/2024 | 09:44:59 | Fan SPD_FAN_4_2_R | Lower Critical going low | Asserted | Reading 0 < Threshold 2200 RPM

| 08/07/2024 | 09:45:00 | Fan SPD_FAN_4_2_F | Lower Critical going low | Asserted | Reading 0 < Threshold 2464 RPM

| 08/07/2024 | 09:45:34 | Fan SPD_FAN_4_2_R | Lower Critical going low | Deasserted | Reading 3520 < Threshold 2200 RPM

| 08/07/2024 | 09:45:34 | Fan SPD_FAN_4_2_F | Lower Critical going low | Deasserted | Reading 4136 < Threshold 2464 RPM

| 08/07/2024 | 09:45:49 | Fan SPD_FAN_4_3_R | Lower Critical going low | Asserted | Reading 0 < Threshold 2200 RPM

| 08/07/2024 | 09:45:49 | Fan SPD_FAN_4_3_F | Lower Critical going low | Asserted | Reading 0 < Threshold 2464 RPM

| 08/07/2024 | 09:46:09 | Fan SPD_FAN_4_3_R | Lower Critical going low | Deasserted | Reading 3168 < Threshold 2200 RPM

| 08/07/2024 | 09:46:09 | Fan SPD_FAN_4_3_F | Lower Critical going low | D...
