NMI: IOCK error on HP MicroServer Gen8 after Focal -> Jammy upgrade

Bug #2012366 reported by Diego Vitali
This bug affects 1 person
Affects          Status     Importance  Assigned to  Milestone
linux (Ubuntu)   Confirmed  Undecided   Unassigned

Bug Description

After the distro upgrade from 20.04 LTS to the new 22.04 LTS, the HP MicroServer shows a critical NMI event in iLO:

Unrecoverable System Error (NMI) has occurred. System Firmware will log additional details in a separate IML entry if possible

No HP modules are loaded except hpilo:

lsmod |grep hp
hpilo 24576 0
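
The same check can be scripted. This is only a minimal sketch: it assumes the text comes from /proc/modules (where the module name is the first field on each line), and the list of names to look for is an assumption made up of hpilo from the output above and hpwdt, the HP watchdog module mentioned later in this report. The function name is hypothetical.

```python
from typing import List

# Assumed list of HP-specific modules worth checking for:
# hpilo appears in the lsmod output, hpwdt is discussed in the comments.
HP_MODULES = ("hpilo", "hpwdt")


def loaded_hp_modules(proc_modules_text: str) -> List[str]:
    """Return the HP modules that appear in /proc/modules-style text."""
    loaded = {
        line.split()[0]              # first field is the module name
        for line in proc_modules_text.splitlines()
        if line.strip()              # skip empty lines
    }
    return [name for name in HP_MODULES if name in loaded]


# Usage on a real system (not executed here):
#     with open("/proc/modules") as f:
#         print(loaded_hp_modules(f.read()))
```

If hpwdt shows up in the result, the watchdog driver is loaded and could be involved in handling the NMI; with the lsmod output above only hpilo would be reported.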

Although the server is flashing red, the system is functional. A few minutes after boot the kernel logs 4-5 of these NMI errors (see full log) but continues to run normally; the NMI flag in iLO appears at the same time as this kernel error. This issue was not present before the distro upgrade.

[ 739.516183] NMI: IOCK error (debug interrupt?) for reason 61 on CPU 0.
[ 739.516190] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.15.0-67-generic #74-Ubuntu
[ 739.516194] Hardware name: HP ProLiant MicroServer Gen8, BIOS J06 04/04/2019
[ 739.516196] RIP: 0010:mwait_idle_with_hints.constprop.0+0x4f/0xa0
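
To make the "reason" value in the log easier to read, here is a rough decoding sketch. It assumes the value is hexadecimal and is the byte read from the legacy NMI status port 0x61, which is how the x86 NMI handler appears to report it; the bit descriptions follow the conventional PC layout of that port and should be treated as an assumption, not as HP-specific documentation. The function and table names are hypothetical.

```python
from typing import List

# Conventional bit layout of the PC NMI status/control port (0x61).
# Assumption: this matches what the kernel prints as "reason".
NMI_STATUS_BITS = {
    0x80: "SERR (PCI system error) reported",
    0x40: "IOCHK (I/O channel check) reported",
    0x20: "timer 2 output high",
    0x10: "refresh toggle",
    0x08: "IOCHK reporting disabled",
    0x04: "SERR reporting disabled",
    0x02: "speaker data enabled",
    0x01: "timer 2 gate enabled",
}


def decode_nmi_reason(reason_hex: str) -> List[str]:
    """Return a description for every bit set in the reason byte."""
    value = int(reason_hex, 16)  # the log prints the byte in hex
    return [desc for mask, desc in NMI_STATUS_BITS.items() if value & mask]


for line in decode_nmi_reason("61"):
    print(line)
```

Under these assumptions, reason 61 has the IOCHK bit (0x40) set, which is consistent with the "IOCK error" wording; the remaining set bits describe timer state rather than an error source.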

ProblemType: Bug
DistroRelease: Ubuntu 22.04
Package: linux-image-5.15.0-67-generic 5.15.0-67.74
ProcVersionSignature: Ubuntu 5.15.0-67.74-generic 5.15.85
Uname: Linux 5.15.0-67-generic x86_64
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Mar 20 23:54 seq
 crw-rw---- 1 root audio 116, 33 Mar 20 23:54 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.20.11-0ubuntu82.3
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CasperMD5CheckResult: pass
Date: Tue Mar 21 09:20:20 2023
InstallationDate: Installed on 2022-02-16 (397 days ago)
InstallationMedia: Ubuntu-Server 20.04.3 LTS "Focal Fossa" - Release amd64 (20210824)
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
MachineType: HP ProLiant MicroServer Gen8
PciMultimedia:

ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_GB.UTF-8
 SHELL=/bin/bash
ProcFB: 0 mgag200drmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-5.15.0-67-generic root=UUID=ce9a2e21-54ae-4461-8086-c62bbcaa1481 ro
RelatedPackageVersions:
 linux-restricted-modules-5.15.0-67-generic N/A
 linux-backports-modules-5.15.0-67-generic N/A
 linux-firmware 20220329.git681281e4-0ubuntu3.10
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UpgradeStatus: Upgraded to jammy on 2023-03-20 (0 days ago)
dmi.bios.date: 04/04/2019
dmi.bios.vendor: HP
dmi.bios.version: J06
dmi.chassis.type: 7
dmi.chassis.vendor: HP
dmi.ec.firmware.release: 2.82
dmi.modalias: dmi:bvnHP:bvrJ06:bd04/04/2019:efr2.82:svnHP:pnProLiantMicroServerGen8:pvr:cvnHP:ct7:cvr:sku712317-421:
dmi.product.family: ProLiant
dmi.product.name: ProLiant MicroServer Gen8
dmi.product.sku: 712317-421
dmi.sys.vendor: HP
---
ProblemType: Bug
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Jun 19 16:38 seq
 crw-rw---- 1 root audio 116, 33 Jun 19 16:38 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.20.11-0ubuntu82.5
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CasperMD5CheckResult: pass
CurrentDmesg:
 Error: command ['pkexec', 'dmesg'] failed with exit code 127: Error executing command as another user: Not authorized

 This incident has been reported.
DistroRelease: Ubuntu 22.04
InstallationDate: Installed on 2022-02-16 (488 days ago)
InstallationMedia: Ubuntu-Server 20.04.3 LTS "Focal Fossa" - Release amd64 (20210824)
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
MachineType: HP ProLiant MicroServer Gen8
Package: linux (not installed)
PciMultimedia:

ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_GB.UTF-8
 SHELL=/bin/bash
ProcFB: 0 mgag200drmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-5.19.0-45-generic root=UUID=ce9a2e21-54ae-4461-8086-c62bbcaa1481 ro
ProcVersionSignature: Ubuntu 5.19.0-45.46~22.04.1-generic 5.19.17
RelatedPackageVersions:
 linux-restricted-modules-5.19.0-45-generic N/A
 linux-backports-modules-5.19.0-45-generic N/A
 linux-firmware 20220329.git681281e4-0ubuntu3.13
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
Tags: jammy
Uname: Linux 5.19.0-45-generic x86_64
UnreportableReason: This report is about a package that is not installed.
UpgradeStatus: Upgraded to jammy on 2023-03-20 (91 days ago)
UserGroups: adm cdrom dip docker lxd plugdev sudo
_MarkForUpload: False
dmi.bios.date: 04/04/2019
dmi.bios.vendor: HP
dmi.bios.version: J06
dmi.chassis.type: 7
dmi.chassis.vendor: HP
dmi.ec.firmware.release: 2.82
dmi.modalias: dmi:bvnHP:bvrJ06:bd04/04/2019:efr2.82:svnHP:pnProLiantMicroServerGen8:pvr:cvnHP:ct7:cvr:sku712317-421:
dmi.product.family: ProLiant
dmi.product.name: ProLiant MicroServer Gen8
dmi.product.sku: 712317-421
dmi.sys.vendor: HP

Diego Vitali (artoo80) wrote :
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Paride Legovini (paride)
summary: - NMI event on
+ NMI: IOCK error on HP MicroServer Gen8 after Focal -> Jammy upgrade
tags: added: regression-release
Paride Legovini (paride) wrote :

Hi dv and thanks for your bug report. I adjusted the bug summary, I hope I got it right. I also tagged the bug regression-release (see [1]).

It would be interesting to know if this bug is present in the Jammy HWE kernel. Is it possible for you to install linux-image-generic-hwe-22.04, reboot and report back? To be clear: if the problem goes away this is not to be considered a fix, but it's a useful data point and possibly a workaround for you.

You mentioned in a side channel that this problem "came back". Do you remember which Ubuntu release was affected by it, or (even better) if there was a bug report about it?

Note: this is more of a drive-by comment than a real triage, which I'm leaving to the kernel team.

[1] https://wiki.ubuntu.com/Bugs/Tags

Diego Vitali (artoo80)
description: updated
Diego Vitali (artoo80) wrote :

Hi Paride,

In that channel I was referring to the same issue on this server ~2 years ago, after a first install of Debian. The workaround in Debian at the time was to blacklist the HP watchdog timer module (hpwdt). However, when I replaced Debian with Focal on this same machine, the NMI issue completely disappeared and the server ran very smoothly for almost two years. That is what I meant by saying the same issue has "come back".

With the suggested kernel:

uname -a
Linux delta 5.19.0-35-generic #36~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Fri Feb 17 15:17:25 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

I got the same issue:

[ 339.985218] NMI: IOCK error (debug interrupt?) for reason 71 on CPU 0.
[ 339.985225] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.19.0-35-generic #36~22.04.1-Ubuntu
[ 339.985229] Hardware name: HP ProLiant MicroServer Gen8, BIOS J06 04/04/2019
[ 339.985231] RIP: 0010:mwait_idle_with_hints.constprop.0+0x48/0xa0
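
For comparing kernels, it may help to pull these events out of a saved dmesg log automatically. The following is only a sketch: the regular expression is written against the exact message format quoted in this report, the reason is assumed to be printed in hex, and the names NmiEvent and find_nmi_events are hypothetical.

```python
import re
from typing import List, NamedTuple


class NmiEvent(NamedTuple):
    uptime_s: float  # seconds since boot, from the dmesg timestamp
    reason: int      # reason byte, parsed as hex (assumption)
    cpu: int


# Matches lines such as:
# [ 339.985218] NMI: IOCK error (debug interrupt?) for reason 71 on CPU 0.
NMI_LINE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+NMI: IOCK error \(debug interrupt\?\) "
    r"for reason ([0-9a-fA-F]+) on CPU (\d+)\."
)


def find_nmi_events(log_text: str) -> List[NmiEvent]:
    """Extract every IOCK NMI event from dmesg-style text."""
    return [
        NmiEvent(float(ts), int(reason, 16), int(cpu))
        for ts, reason, cpu in NMI_LINE.findall(log_text)
    ]


sample = "[ 339.985218] NMI: IOCK error (debug interrupt?) for reason 71 on CPU 0."
print(find_nmi_events(sample))
```

Running this over the log of each kernel gives the number of events per boot and when they occurred, so an empty result on a newer kernel would suggest the NMI no longer fires there.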

Diego Vitali (artoo80) wrote :

Just for reference, and following Paride's suggestion: there was an old bug which looks very similar to the one I am posting here.

The summary is familiar:

HP Proliant Servers Advices for Ubuntu Linux (cmdline, panics, firmware options)

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1417580

and the content seems to match. Is this a regression?

In the current case (22.04) the OS does not initiate the reboot sequence: after the kernel logs the NMI event, the system takes no further action. In fact, if it were not for iLO I would not have noticed the issue.

Diego Vitali (artoo80) wrote :

Just updated to:

5.19.0-38-generic #39~22.04.1-Ubuntu

The same issue persists.

Diego Vitali (artoo80) wrote : CRDA.txt

apport information

tags: added: apport-collected
description: updated
Diego Vitali (artoo80) wrote : Lspci.txt

apport information

Diego Vitali (artoo80) wrote : Lspci-vt.txt

apport information

Diego Vitali (artoo80) wrote : Lsusb.txt

apport information

Diego Vitali (artoo80) wrote : Lsusb-t.txt

apport information

Diego Vitali (artoo80) wrote : Lsusb-v.txt

apport information

Diego Vitali (artoo80) wrote : ProcCpuinfoMinimal.txt

apport information

Diego Vitali (artoo80) wrote : ProcInterrupts.txt

apport information

Diego Vitali (artoo80) wrote : ProcModules.txt

apport information

Diego Vitali (artoo80) wrote : UdevDb.txt

apport information

Diego Vitali (artoo80) wrote : WifiSyslog.txt

apport information

Diego Vitali (artoo80) wrote : acpidump.txt

apport information

Diego Vitali (artoo80) wrote :

After updating the kernel to:

5.19.0-45-generic #46~22.04.1

The NMI problem has (so far) not shown up: no alarm in iLO, and a blue light instead of the flashing red light on the server.

I submitted the update from the apport CLI on the server. I am not sure why the updates were posted like this; it seems like a lot of unnecessary update comments. Maybe I did something wrong.
