cannot display sensor name when its owner is lun1
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
ipmitool (Ubuntu) | Status tracked in Questing | | |
Jammy | Fix Released | Undecided | Mitchell Augustin |
Noble | Fix Released | Undecided | Mitchell Augustin |
Oracular | Fix Released | Undecided | Mitchell Augustin |
Plucky | Fix Released | Undecided | Mitchell Augustin |
Questing | Fix Released | Undecided | Mitchell Augustin |
Bug Description
SRU Justification:
[ Impact ]
ipmitool sel does not correctly display the sensor's name if its owner is set to lun1.
Upstream bug: https:/
We were asked to track this in order to enable new hardware from a partner.
It was reported in v1.8.19 (used in Noble), but likely affects previous versions as well.
Steps to reproduce, copied verbatim from the upstream report:
1. Using AMI/BMC to generate a sensor error event. The sensor belong to LUN1
GPU1_MEM | 10h | ok | 11.1 | Uncorrectable ECC
GPU2_MEM | 11h | ok | 11.2 | Uncorrectable ECC
GPU3_MEM | 12h | ok | 11.3 | Uncorrectable ECC
GPU4_MEM | 13h | ok | 11.4 | Uncorrectable ECC
GPU5_MEM | 14h | ok | 11.5 | Uncorrectable ECC
GPU6_MEM | 15h | ok | 11.6 | Uncorrectable ECC
GPU7_MEM | 16h | ok | 11.7 | Uncorrectable ECC
GPU8_MEM | 17h | ok | 11.8 | Uncorrectable ECC
2. Run `ipmitool sel elist`
3. Observe the abnormal reply:
c5 | 2023/08/02 | 17时17分24秒 CST | Memory | Uncorrectable ECC | Asserted
c6 | 2023/08/02 | 17时18分29秒 CST | Memory | Uncorrectable ECC | Asserted
c7 | 2023/08/02 | 17时18分29秒 CST | Memory | Uncorrectable ECC | Asserted
c8 | 2023/08/02 | 17时18分30秒 CST | Memory | Uncorrectable ECC | Asserted
c9 | 2023/08/02 | 17时18分30秒 CST | Memory | Uncorrectable ECC | Asserted
ca | 2023/08/02 | 17时19分34秒 CST | Memory | Uncorrectable ECC | Asserted
cb | 2023/08/02 | 17时19分34秒 CST | Memory INTEGRAL_DIMM | Uncorrectable ECC | Asserted
cc | 2023/08/02 | 17时19分34秒 CST | Memory | Uncorrectable ECC | Asserted
The sensor name is empty or wrong (expected: GPU1_MEM).
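To make the symptom easier to spot in a long event log, the check can be scripted. The following is only a rough sketch: the helper names are hypothetical, it assumes the `|`-separated column layout seen in the output above (sensor type and name in the fourth column), and the `fixed` line is an assumed example of what a corrected entry might look like, not captured output.

```python
from typing import Iterable, List


def sensor_field(line: str) -> str:
    """Return the fourth '|'-separated column (sensor type and name), stripped."""
    parts = [p.strip() for p in line.split("|")]
    return parts[3] if len(parts) >= 6 else ""


def has_known_name(line: str, known_names: Iterable[str]) -> bool:
    """True if the sensor column is, or ends with, one of the expected names."""
    field = sensor_field(line)
    return any(field == n or field.endswith(" " + n) for n in known_names)


def lines_missing_name(lines: Iterable[str], known_names: Iterable[str]) -> List[str]:
    """Collect the non-empty lines whose sensor column lacks an expected name."""
    names = list(known_names)
    return [ln for ln in lines if ln.strip() and not has_known_name(ln, names)]


# Sensor names taken from the sensor listing in step 1.
GPU_NAMES = [f"GPU{i}_MEM" for i in range(1, 9)]

# Two lines copied from the abnormal reply in step 3.
broken = "c5 | 2023/08/02 | 17时17分24秒 CST | Memory | Uncorrectable ECC | Asserted"
wrong = "cb | 2023/08/02 | 17时19分34秒 CST | Memory INTEGRAL_DIMM | Uncorrectable ECC | Asserted"
# Hypothetical line showing the expected sensor name (assumption, not real output).
fixed = "c5 | 2023/08/02 | 17时17分24秒 CST | Memory GPU1_MEM | Uncorrectable ECC | Asserted"

for bad in lines_missing_name([broken, wrong, fixed], GPU_NAMES):
    print("missing or wrong sensor name:", bad)
```

Matching on the end of the column rather than splitting on spaces is deliberate, since sensor types such as "Power Supply" contain spaces themselves.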
A fix was proposed upstream but is yet to be merged: https:/
[ Test Plan ]
I have confirmed that this cleanly applies to the latest Plucky ipmitool and prepared a test PPA: https:/
I tested for regressions by running `ipmitool sel elist` on our DGX A100 and did not observe any; the results were the same as with the current Plucky ipmitool.
I am asking Nvidia to confirm that this works for them, since we do not currently have the hardware to test the new functionality internally. (However, they have already confirmed that this patch, when applied to jammy ipmitool, works as expected.)
[ Fix ]
The change adds a check of the SEL Generator ID byte 2 LUN bits [1:0] when comparing against the SDR LUN field, so that the correct SDR string is displayed for the SEL event.
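The idea behind the fix can be illustrated with a small Python model. This is not the patch itself (ipmitool is written in C): `SensorRecord`, `find_sensor`, and the `check_lun` switch are hypothetical names, the owner ID `0x20` and the `LUN0_SENSOR` entry are made-up sample values, and the lookup without the LUN check is only my reading of the behavior described above.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class SensorRecord:
    """Hypothetical, simplified stand-in for the key fields of an SDR sensor record."""
    owner_id: int   # Sensor Owner ID
    owner_lun: int  # Sensor Owner LUN, bits [1:0]
    number: int     # Sensor Number
    name: str       # ID string shown to the user


def find_sensor(records: Sequence[SensorRecord],
                gen_id_byte1: int,
                gen_id_byte2: int,
                sensor_number: int,
                check_lun: bool = True) -> Optional[SensorRecord]:
    """Return the first record matching the event's generator and sensor number.

    With check_lun=False the LUN is ignored, which models a lookup that can
    pick a sensor on the wrong LUN when sensor numbers repeat across LUNs.
    """
    lun = gen_id_byte2 & 0x03  # Generator ID byte 2, LUN bits [1:0]
    for rec in records:
        if rec.owner_id != gen_id_byte1 or rec.number != sensor_number:
            continue
        if check_lun and rec.owner_lun != lun:
            continue
        return rec
    return None


# Two sensors share number 0x10 but live on different LUNs (sample data).
records = [
    SensorRecord(owner_id=0x20, owner_lun=0, number=0x10, name="LUN0_SENSOR"),
    SensorRecord(owner_id=0x20, owner_lun=1, number=0x10, name="GPU1_MEM"),
]

without_lun = find_sensor(records, 0x20, 0x01, 0x10, check_lun=False)
with_lun = find_sensor(records, 0x20, 0x01, 0x10)
print(without_lun.name, with_lun.name)
```

In this model the LUN-aware lookup resolves the event to `GPU1_MEM`, while the lookup that ignores the LUN returns whichever record with that sensor number comes first.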
[ Where problems could occur ]
If upstream ever accepts a different version of this patch that conflicts with our sauce patch, we may need to revert ours and apply their version. However, upstream has not responded to the merge request in over 7 months [0][1]. Since this functionality is still required by our users, and since the change only adjusts a part of ipmitool to match the IPMI specification, it seems appropriate for a sauce patch.
The regression risk should be low since this just adds a check for a field that should already be present in any hardware-generated event records according to the IPMI spec[2] (section 32.1 SEL Event Records), and this check is only done/used in a function that is specifically for printing sensor-generated event records.
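For reference, the field in question can be located in a raw event record as follows. This is a sketch based on my reading of the 16-byte record layout in section 32.1 of the IPMI spec; `SelEvent` and `parse_sel_event` are hypothetical names, and the sample bytes are made up to resemble the events in this report rather than captured from hardware.

```python
import struct
from typing import NamedTuple


class SelEvent(NamedTuple):
    record_id: int
    record_type: int
    timestamp: int
    generator_addr: int   # Generator ID byte 1, bits [7:1]
    is_software_id: bool  # Generator ID byte 1, bit 0
    channel: int          # Generator ID byte 2, bits [7:4]
    lun: int              # Generator ID byte 2, bits [1:0]
    sensor_type: int
    sensor_number: int


def parse_sel_event(raw: bytes) -> SelEvent:
    """Decode the header fields of a 16-byte system event record."""
    if len(raw) != 16:
        raise ValueError("a SEL event record is 16 bytes long")
    # Record ID (2), Record Type (1), Timestamp (4), Generator ID (2),
    # EvM Rev (1), Sensor Type (1), Sensor # (1), Event Dir/Type (1),
    # Event Data 1-3 (3); multi-byte fields are little-endian.
    (rec_id, rec_type, ts, gen1, gen2, _evm_rev,
     s_type, s_num, _dir_type, _data) = struct.unpack("<HBIBBBBBB3s", raw)
    return SelEvent(
        record_id=rec_id,
        record_type=rec_type,
        timestamp=ts,
        generator_addr=gen1 >> 1,
        is_software_id=bool(gen1 & 0x01),
        channel=(gen2 >> 4) & 0x0F,
        lun=gen2 & 0x03,  # only meaningful when byte 1 holds a slave address
        sensor_type=s_type,
        sensor_number=s_num,
    )


# Made-up record: ID 0x00c5, system event (0x02), generator 0x20 on LUN 1,
# sensor type 0x0c (Memory), sensor number 0x10.
sample = bytes([0xC5, 0x00, 0x02, 0x00, 0x00, 0x00, 0x00,
                0x20, 0x01, 0x04, 0x0C, 0x10, 0x6F, 0x01, 0xFF, 0xFF])
event = parse_sel_event(sample)
print(f"sensor {event.sensor_number:#04x} on LUN {event.lun}")
```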
[0]:
Patch: https:/
PR: https:/
Related branches
- Lena Voytek (community): Approve
- Ubuntu Sponsors: Pending requested
- git-ubuntu import: Pending requested
- Diff: 79 lines (+57/-0), 3 files modified
debian/changelog (+7/-0)
debian/patches/selsdr-fix-SEL-cannot-display-sensor-name-when-owner-lun1.patch (+49/-0)
debian/patches/series (+1/-0)
- Lena Voytek (community): Approve
- Ubuntu Sponsors: Pending requested
- git-ubuntu import: Pending requested
- Diff: 79 lines (+57/-0), 3 files modified
debian/changelog (+7/-0)
debian/patches/selsdr-fix-SEL-cannot-display-sensor-name-when-owner-lun1.patch (+49/-0)
debian/patches/series (+1/-0)
- Lena Voytek (community): Approve
- Ubuntu Sponsors: Pending requested
- git-ubuntu import: Pending requested
- Diff: 79 lines (+57/-0), 3 files modified
debian/changelog (+7/-0)
debian/patches/selsdr-fix-SEL-cannot-display-sensor-name-when-owner-lun1.patch (+49/-0)
debian/patches/series (+1/-0)
- Lena Voytek (community): Approve
- Ubuntu Sponsors: Pending requested
- Diff: 93 lines (+59/-1), 4 files modified
debian/changelog (+7/-0)
debian/control (+2/-1)
debian/patches/selsdr-fix-SEL-cannot-display-sensor-name-when-owner-lun1.patch (+49/-0)
debian/patches/series (+1/-0)
- Lena Voytek (community): Needs Fixing
- Ubuntu Sponsors: Pending requested
- Diff: 93 lines (+59/-1), 4 files modified
debian/changelog (+7/-0)
debian/control (+2/-1)
debian/patches/selsdr-fix-SEL-cannot-display-sensor-name-when-owner-lun1.patch (+49/-0)
debian/patches/series (+1/-0)
Changed in ipmitool (Ubuntu): | |
status: | Expired → In Progress |
assignee: | nobody → Mitchell Augustin (mitchellaugustin) |
description: | updated |
Changed in ipmitool (Ubuntu Plucky): | |
assignee: | nobody → Mitchell Augustin (mitchellaugustin) |
Changed in ipmitool (Ubuntu Oracular): | |
assignee: | nobody → Mitchell Augustin (mitchellaugustin) |
Changed in ipmitool (Ubuntu Noble): | |
assignee: | nobody → Mitchell Augustin (mitchellaugustin) |
Changed in ipmitool (Ubuntu Jammy): | |
assignee: | nobody → Mitchell Augustin (mitchellaugustin) |
Changed in ipmitool (Ubuntu Oracular): | |
status: | New → In Progress |
Changed in ipmitool (Ubuntu Noble): | |
status: | New → In Progress |
Changed in ipmitool (Ubuntu Jammy): | |
status: | New → In Progress |
The attachment "Upstream proposed patch" seems to be a patch. If it isn't, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of the ~ubuntu-reviewers, unsubscribe the team.
[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issues please contact him.]