Comment 0 for bug 1771467

Ryan Finnie (fo0bar) wrote: Reboot/shutdown kernel panic on HP DL360 Gen9 w/ bionic 4.15.0

Verified on multiple DL360 Gen9 servers with up-to-date firmware. Just before reboot or shutdown, the following panic occurs:

[ 289.093083] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
[ 289.093085] {1}[Hardware Error]: event severity: fatal
[ 289.093087] {1}[Hardware Error]: Error 0, type: fatal
[ 289.093088] {1}[Hardware Error]: section_type: PCIe error
[ 289.093090] {1}[Hardware Error]: port_type: 4, root port
[ 289.093091] {1}[Hardware Error]: version: 1.16
[ 289.093093] {1}[Hardware Error]: command: 0x6010, status: 0x0143
[ 289.093094] {1}[Hardware Error]: device_id: 0000:00:01.0
[ 289.093095] {1}[Hardware Error]: slot: 0
[ 289.093096] {1}[Hardware Error]: secondary_bus: 0x03
[ 289.093097] {1}[Hardware Error]: vendor_id: 0x8086, device_id: 0x2f02
[ 289.093098] {1}[Hardware Error]: class_code: 040600
[ 289.093378] {1}[Hardware Error]: bridge: secondary_status: 0x2000, control: 0x0003
[ 289.093380] {1}[Hardware Error]: Error 1, type: fatal
[ 289.093381] {1}[Hardware Error]: section_type: PCIe error
[ 289.093382] {1}[Hardware Error]: port_type: 4, root port
[ 289.093383] {1}[Hardware Error]: version: 1.16
[ 289.093384] {1}[Hardware Error]: command: 0x6010, status: 0x0143
[ 289.093386] {1}[Hardware Error]: device_id: 0000:00:01.0
[ 289.093386] {1}[Hardware Error]: slot: 0
[ 289.093387] {1}[Hardware Error]: secondary_bus: 0x03
[ 289.093388] {1}[Hardware Error]: vendor_id: 0x8086, device_id: 0x2f02
[ 289.093674] {1}[Hardware Error]: class_code: 040600
[ 289.093676] {1}[Hardware Error]: bridge: secondary_status: 0x2000, control: 0x0003
[ 289.093678] Kernel panic - not syncing: Fatal hardware error!
[ 289.093745] Kernel Offset: 0x1cc00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 289.105835] ERST: [Firmware Warn]: Firmware does not respond in time.

The system does eventually restart after this. During the subsequent POST, the following warning appears:

Embedded RAID 1 : Smart Array P440ar Controller - (2048 MB, V6.30) 7 Logical
Drive(s) - Operation Failed
 - 1719-Slot 0 Drive Array - A controller failure event occurred prior
   to this power-up. (Previous lock up code = 0x13) Action: Install the
   latest controller firmware. If the problem persists, replace the
   controller.

The symptoms of the latter are described in https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-c04805565, but the storage controller is already running firmware much newer than the version that document gives as the resolution.

Neither of these problems occurs during shutdown/reboot on the xenial kernel.

FWIW, when running an older P89 BIOS, 1.50 (07/20/2015) rather than the current 2.56 (01/22/2018), the shutdown failure mode was instead a loop like this:

[529151.035267] NMI: IOCK error (debug interrupt?) for reason 75 on CPU 0.
[529153.222883] Uhhuh. NMI received for unknown reason 25 on CPU 0.
[529153.222884] Do you have a strange power saving mode enabled?
[529153.222884] Dazed and confused, but trying to continue
[529153.554447] Uhhuh. NMI received for unknown reason 25 on CPU 0.
[529153.554448] Do you have a strange power saving mode enabled?
[529153.554449] Dazed and confused, but trying to continue
[529153.554450] Uhhuh. NMI received for unknown reason 25 on CPU 0.
[529153.554451] Do you have a strange power saving mode enabled?
[529153.554452] Dazed and confused, but trying to continue
[529153.554452] Uhhuh. NMI received for unknown reason 25 on CPU 0.
[529153.554453] Do you have a strange power saving mode enabled?
[529153.554454] Dazed and confused, but trying to continue
[529153.554454] Uhhuh. NMI received for unknown reason 35 on CPU 0.
[529153.554455] Do you have a strange power saving mode enabled?
[529153.554456] Dazed and confused, but trying to continue
[529153.554457] Uhhuh. NMI received for unknown reason 25 on CPU 0.
[529153.554458] Do you have a strange power saving mode enabled?
[529153.554458] Dazed and confused, but trying to continue
[529153.554459] Uhhuh. NMI received for unknown reason 25 on CPU 0.
[529153.554460] Do you have a strange power saving mode enabled?
[529153.554460] Dazed and confused, but trying to continue
[529154.953916] Uhhuh. NMI received for unknown reason 25 on CPU 0.
[529154.953917] Do you have a strange power saving mode enabled?
[529154.953918] Dazed and confused, but trying to continue

Upgrading to 2.56 turned that loop into the kernel panic shown above.

ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: linux-signed-image-generic 4.15.0.21.22
ProcVersionSignature: Ubuntu 4.15.0-21.22-generic 4.15.17
Uname: Linux 4.15.0-21-generic x86_64
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 May 15 23:11 seq
 crw-rw---- 1 root audio 116, 33 May 15 23:11 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
ApportVersion: 2.20.9-0ubuntu7
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
Date: Wed May 16 00:17:53 2018
HibernationDevice: RESUME=UUID=696e8063-c668-4c89-a478-bfc23a450369
InstallationDate: Installed on 2016-06-01 (713 days ago)
InstallationMedia: Ubuntu-Server 14.04.5 LTS "Trusty Tahr" - Beta amd64 (20160527)
MachineType: HP ProLiant DL360 Gen9
PciMultimedia:

ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 mgadrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-21-generic root=UUID=6e6d422d-8ffb-4db3-b8c7-6c81e320b1b2 ro console=tty0 console=ttyS1,38400 nosplash console=ttyS1,38400 console=tty0 nosplash
RelatedPackageVersions:
 linux-restricted-modules-4.15.0-21-generic N/A
 linux-backports-modules-4.15.0-21-generic N/A
 linux-firmware 1.173
RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
SourcePackage: linux
UpgradeStatus: Upgraded to bionic on 2018-05-09 (6 days ago)
dmi.bios.date: 01/22/2018
dmi.bios.vendor: HP
dmi.bios.version: P89
dmi.board.name: ProLiant DL360 Gen9
dmi.board.vendor: HP
dmi.chassis.type: 23
dmi.chassis.vendor: HP
dmi.modalias: dmi:bvnHP:bvrP89:bd01/22/2018:svnHP:pnProLiantDL360Gen9:pvr:rvnHP:rnProLiantDL360Gen9:rvr:cvnHP:ct23:cvr:
dmi.product.family: ProLiant
dmi.product.name: ProLiant DL360 Gen9
dmi.sys.vendor: HP