Radeon Pro W5500 in passthrough with vfio generates spurious NMI reason 25
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux-signed-hwe-5.13 (Ubuntu) | New | Undecided | Unassigned |
Bug Description
First encountered on the 5.4 kernel, but still present on the 5.13 HWE kernel.
Description: Ubuntu 20.04.4 LTS
Release: 20.04
We have three of these cards, one in each of three identical HP DL325 Gen10 servers (AMD EPYC 7302P).
ruben@alpha:~$ cat /proc/cmdline
BOOT_IMAGE=
dmesg excerpt (with vendor-reset):
[ 412.868799] vfio-pci 0000:86:00.0: enabling device (0142 -> 0143)
[ 412.868980] vfio-pci 0000:86:00.0: AMD_NAVI14: version 1.1
[ 412.868982] vfio-pci 0000:86:00.0: AMD_NAVI14: performing pre-reset
[ 412.888842] vfio-pci 0000:86:00.0: AMD_NAVI14: performing reset
[ 412.925218] ATOM BIOS: 113-D3250100-102
[ 412.925221] vendor-reset-drm: atomfirmware: bios_scratch_
[ 413.171020] vfio-pci 0000:86:00.0: AMD_NAVI14: bus reset disabled? yes
[ 413.171028] vfio-pci 0000:86:00.0: AMD_NAVI14: SMU response reg: 0, sol reg: 0, mp1 intr enabled? no, bl ready? yes
[ 413.171035] vfio-pci 0000:86:00.0: AMD_NAVI14: performing post-reset
[ 413.208794] vfio-pci 0000:86:00.0: AMD_NAVI14: reset result = 0
[ 413.208971] vfio-pci 0000:86:00.0: vfio_ecap_init: hiding ecap 0x19@0x270
[ 413.208985] vfio-pci 0000:86:00.0: vfio_ecap_init: hiding ecap 0x1b@0x2d0
[ 413.208990] vfio-pci 0000:86:00.0: vfio_ecap_init: hiding ecap 0x25@0x400
[ 413.208992] vfio-pci 0000:86:00.0: vfio_ecap_init: hiding ecap 0x26@0x410
[ 413.208994] vfio-pci 0000:86:00.0: vfio_ecap_init: hiding ecap 0x27@0x440
[ 413.228798] vfio-pci 0000:86:00.1: enabling device (0140 -> 0142)
[ 413.296899] vfio-pci 0000:86:00.0: AMD_NAVI14: version 1.1
[ 413.296904] vfio-pci 0000:86:00.0: AMD_NAVI14: performing pre-reset
[ 413.297096] vfio-pci 0000:86:00.0: AMD_NAVI14: performing reset
[ 413.333349] ATOM BIOS: 113-D3250100-102
[ 413.333351] vendor-reset-drm: atomfirmware: bios_scratch_
[ 413.579787] vfio-pci 0000:86:00.0: AMD_NAVI14: bus reset disabled? yes
[ 413.579793] vfio-pci 0000:86:00.0: AMD_NAVI14: SMU response reg: 0, sol reg: 0, mp1 intr enabled? no, bl ready? yes
[ 413.579797] vfio-pci 0000:86:00.0: AMD_NAVI14: performing post-reset
[ 413.616795] vfio-pci 0000:86:00.0: AMD_NAVI14: reset result = 0
[ 419.766917] Uhhuh. NMI received for unknown reason 25 on CPU 0.
[ 419.766919] Do you have a strange power saving mode enabled?
[ 419.766920] Dazed and confused, but trying to continue
[ 436.498601] Uhhuh. NMI received for unknown reason 25 on CPU 0.
[ 436.498604] Do you have a strange power saving mode enabled?
[ 436.498605] Dazed and confused, but trying to continue
[ 454.306951] Uhhuh. NMI received for unknown reason 25 on CPU 0.
[ 454.306955] Do you have a strange power saving mode enabled?
[ 454.306955] Dazed and confused, but trying to continue
[ 456.237162] Uhhuh. NMI received for unknown reason 25 on CPU 0.
[ 456.237165] Do you have a strange power saving mode enabled?
[ 456.237166] Dazed and confused, but trying to continue
[ 457.800596] Uhhuh. NMI received for unknown reason 25 on CPU 0.
[ 457.800598] Do you have a strange power saving mode enabled?
[ 457.800599] Dazed and confused, but trying to continue
[ 474.068911] Uhhuh. NMI received for unknown reason 25 on CPU 0.
[ 474.068914] Do you have a strange power saving mode enabled?
[ 474.068915] Dazed and confused, but trying to continue
This happens both with and without the vendor-reset workaround (https:/
I will now move one of these GPUs into an older Intel system and run it on bare metal, because we need a student to work on it. I'll also test passthrough on that machine to see whether it shows the same behaviour.
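The kernel prints the NMI reason byte in hexadecimal (format `%02x`), so "reason 25" in the log above is 0x25. That byte has neither the SERR bit (0x80) nor the IOCHK bit (0x40) set. This is why the kernel reports the NMI as "unknown" once no registered NMI handler has claimed it. The following is a minimal Python sketch for extracting these events from a dmesg dump and classifying them, so the three servers can be compared. It assumes the log format shown above. The bit values are copied from the kernel header `arch/x86/include/asm/mach_traps.h`. The helpers `parse_nmis`, `classify` and `intervals` are hypothetical names for illustration and are not part of any existing tool.

```python
import re

# Bit values from the Linux x86 header arch/x86/include/asm/mach_traps.h.
# The kernel reads the reason byte from legacy I/O port 0x61 (NMI_REASON_PORT).
NMI_REASON_SERR = 0x80   # system error (PCI SERR#)
NMI_REASON_IOCHK = 0x40  # I/O channel check
NMI_REASON_MASK = NMI_REASON_SERR | NMI_REASON_IOCHK

# Matches e.g. "[ 419.766917] Uhhuh. NMI received for unknown reason 25 on CPU 0."
NMI_LINE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+Uhhuh\. NMI received for unknown reason "
    r"(?P<reason>[0-9a-fA-F]{2}) on CPU (?P<cpu>\d+)\."
)


def classify(reason: int) -> str:
    """Mirror the SERR/IOCHK check made before an NMI is declared unknown."""
    if reason & NMI_REASON_SERR:
        return "pci-serr"
    if reason & NMI_REASON_IOCHK:
        return "io-check"
    return "unknown"


def parse_nmis(dmesg: str) -> list[tuple[float, int, int]]:
    """Return (timestamp, reason, cpu) for every 'unknown reason' NMI line."""
    events = []
    for m in NMI_LINE.finditer(dmesg):
        # The kernel formats the reason with %02x, so "25" means 0x25.
        events.append((float(m["ts"]), int(m["reason"], 16), int(m["cpu"])))
    return events


def intervals(events: list[tuple[float, int, int]]) -> list[float]:
    """Seconds between consecutive NMIs, rounded to the millisecond."""
    return [round(b[0] - a[0], 3) for a, b in zip(events, events[1:])]


if __name__ == "__main__":
    sample = """\
[ 419.766917] Uhhuh. NMI received for unknown reason 25 on CPU 0.
[ 436.498601] Uhhuh. NMI received for unknown reason 25 on CPU 0.
[ 454.306951] Uhhuh. NMI received for unknown reason 25 on CPU 0.
"""
    events = parse_nmis(sample)
    for ts, reason, cpu in events:
        print(f"{ts:.6f} reason=0x{reason:02x} cpu={cpu} -> {classify(reason)}")
    print("intervals (s):", intervals(events))
```

To use it, feed it the output of `dmesg` from each server. A reason byte with bit 0x80 or 0x40 set would instead point at a PCI SERR# or I/O check NMI. Such an NMI would normally be reported by a different kernel message.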
ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: linux-image-
ProcVersionSign
Uname: Linux 5.13.0-30-generic x86_64
ApportVersion: 2.20.11-
Architecture: amd64
CasperMD5CheckR
Date: Mon Mar 7 10:26:25 2022
InstallationDate: Installed on 2022-01-05 (60 days ago)
InstallationMedia: Ubuntu-Server 18.04.6 LTS "Bionic Beaver" - Release amd64 (20210915)
ProcEnviron:
TERM=xterm-
PATH=(custom, no user)
XDG_RUNTIME_
LANG=en_US.UTF-8
SHELL=/bin/bash
SourcePackage: linux-signed-
UpgradeStatus: Upgraded to focal on 2022-01-17 (48 days ago)