Intel PET not available on recent kernel causing QEMU VM crashes

Bug #2015455 reported by MRATT
This bug affects 2 people
Affects          Status      Importance   Assigned to   Milestone
linux (Ubuntu)   Confirmed   Undecided    Unassigned

Bug Description

Hi

Following a recent kernel update on Ubuntu Server 22.04.2 x86_64 to 5.19.0-35 (and 5.19.0-38), QEMU (via LXD) Windows Server 2022 VMs are crashing every day.

The CPU has the Intel EPT feature, but I've had to disable tdp_mmu using modprobe to stabilise the VMs.

The platform:
-------------
Linux 5.19.0-38-generic #39~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Fri Mar 17 21:16:15 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

The CPU tech specs on Dell R620:
https://www.intel.com/content/www/us/en/products/sku/75277/intel-xeon-processor-e52680-v2-25m-cache-2-80-ghz/specifications.html

The work-around (success with modprobe):
----------------------------------------
https://pve.proxmox.com/wiki/Upgrade_from_6.x_to_7.0#KVM:_entry_failed.2C_hardware_error_0x80000021
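
For reference, the modprobe workaround described on the Proxmox page above amounts to setting the kvm module's tdp_mmu parameter to N. A minimal sketch follows. The helper function names and the kvm-disable-tdp-mmu.conf file name are illustrative choices, and the /sys path assumes the kvm module is loaded.

```shell
# Sketch only: function names and the .conf file name are illustrative.

# Write the module option into a modprobe.d directory.
# Writing to the default /etc/modprobe.d needs root.
write_tdp_mmu_override() {
    conf_dir="${1:-/etc/modprobe.d}"
    printf 'options kvm tdp_mmu=N\n' > "$conf_dir/kvm-disable-tdp-mmu.conf"
}

# Print the running value of the parameter (Y or N),
# or "unknown" if the parameter file is not readable (e.g. kvm not loaded).
show_tdp_mmu() {
    param="${1:-/sys/module/kvm/parameters/tdp_mmu}"
    if [ -r "$param" ]; then
        cat "$param"
    else
        echo unknown
    fi
}

# Typical use as root, followed by a reboot so the kvm module picks up the option:
#   write_tdp_mmu_override
#   show_tdp_mmu
```

After the reboot, show_tdp_mmu should report N if the option took effect.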

LXD Issue:
----------
https://github.com/lxc/lxd/issues/11520

The QEMU log:
-------------
someadmin@us2204-iph-lxd03:/home/someadmin# cat /var/snap/lxd/common/lxd/logs/mw2022-ivm-test01/qemu.log.old
qemu-system-x86_64: Issue while setting TUNSETSTEERINGEBPF: Invalid argument with fd: 48, prog_fd: -1
KVM: entry failed, hardware error 0x80000021

If you're running a guest on an Intel machine without unrestricted mode
support, the failure can be most likely due to the guest entering an invalid
state for Intel VT. For example, the guest maybe running in big real mode
which is not supported on less recent Intel processors.

EAX=00000008 EBX=00040ee0 ECX=800003ac EDX=00000000
ESI=32e2f000 EDI=32e26040 EBP=813d2810 ESP=813d2790
EIP=00008000 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=1 HLT=0
ES =0000 00000000 ffffffff 00809300
CS =8000 7ff80000 ffffffff 00809300
SS =0000 00000000 ffffffff 00809300
DS =0000 00000000 ffffffff 00809300
FS =0000 00000000 ffffffff 00809300
GS =0000 00000000 ffffffff 00809300
LDT=0000 00000000 00000000 00000000
TR =0040 ff2a0000 00000067 00008b00
GDT= ff2a1fb0 00000057
IDT= 00000000 00000000
CR0=00050032 CR2=7c3fa0b0 CR3=001ae002 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=qemu-system-x86_64: ../hw/core/cpu-sysemu.c:77: cpu_asidx_from_attrs: Assertion `ret < cpu->num_ases && ret >= 0' failed.
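
To see which VMs hit the same failure, the LXD log directory can be searched for the KVM error line shown above. This is a rough sketch, assuming the per-instance layout seen in the path above (one directory per instance containing qemu.log files). The function name is made up for illustration.

```shell
# List QEMU log files that contain the VM-entry failure signature.
# Assumes logs live under <logs_dir>/<instance>/qemu.log* as in the path above.
find_entry_failures() {
    logs_dir="${1:-/var/snap/lxd/common/lxd/logs}"
    # -l prints only the names of matching files; -F treats the pattern as a fixed string.
    grep -lF 'KVM: entry failed, hardware error 0x80000021' "$logs_dir"/*/qemu.log* 2>/dev/null
}

# Example (reading the snap's log directory may need root):
#   find_entry_failures
```

The function exits non-zero when no log contains the signature, so it can also be used in a conditional.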

The CPU via lscpu:
------------------
Architecture: x86_64
  CPU op-mode(s): 32-bit, 64-bit
  Address sizes: 46 bits physical, 48 bits virtual
  Byte Order: Little Endian
CPU(s): 40
  On-line CPU(s) list: 0-39
Vendor ID: GenuineIntel
  Model name: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
    CPU family: 6
    Model: 62
    Thread(s) per core: 2
    Core(s) per socket: 10
    Socket(s): 2
    Stepping: 4
    CPU max MHz: 3600.0000
    CPU min MHz: 1200.0000
    BogoMIPS: 5599.96
    Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm cpuid_fault pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts md_clear flush_l1d
Virtualization features:
  Virtualization: VT-x
Caches (sum of all):
  L1d: 640 KiB (20 instances)
  L1i: 640 KiB (20 instances)
  L2: 5 MiB (20 instances)
  L3: 50 MiB (2 instances)
NUMA:
  NUMA node(s): 2
  NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
  NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
Vulnerabilities:
  Itlb multihit: KVM: Mitigation: Split huge pages
  L1tf: Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
  Mds: Mitigation; Clear CPU buffers; SMT vulnerable
  Meltdown: Mitigation; PTI
  Mmio stale data: Unknown: No mitigations
  Retbleed: Not affected
  Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2: Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling, PBRSB-eIBRS Not affected
  Srbds: Not affected
  Tsx async abort: Not affected
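
The ept entry in the Flags line above is what shows the host CPU reporting EPT. The same check can be made on a running host with the sketch below. It reads the first flags line from /proc/cpuinfo. The function name and the optional file argument are illustrative.

```shell
# Succeeds (exit 0) if the given CPU flag appears on the first "flags" line.
has_cpu_flag() {
    flag="$1"
    cpuinfo="${2:-/proc/cpuinfo}"
    # Split the flags line into one token per line, then match the flag exactly.
    grep -m1 '^flags' "$cpuinfo" | tr ' \t' '\n\n' | grep -qxF "$flag"
}

# Example:
#   has_cpu_flag ept && echo "EPT reported by the CPU"
```

Matching whole tokens avoids false positives from flags that merely contain the searched name as a substring.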

QEMU version:
I have not been able to determine this yet; it is whatever is bundled with version 5.12-c63881f of the LXD snap from the latest/stable channel. I will update this report when I find out.

Thanks
Mark
---
ProblemType: Bug
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Mar 30 18:35 seq
 crw-rw---- 1 root audio 116, 33 Mar 30 18:35 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.20.11-0ubuntu82.3
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: N/A
CasperMD5CheckResult: pass
DistroRelease: Ubuntu 22.04
InstallationDate: Installed on 2022-05-28 (312 days ago)
InstallationMedia: Ubuntu-Server 20.04.4 LTS "Focal Fossa" - Release amd64 (20220223.1)
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
MachineType: Dell Inc. PowerEdge R620
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
Package: linux (not installed)
PciMultimedia:

ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 mgag200drmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-5.19.0-38-generic root=/dev/mapper/vg00-vg00--lv00 ro
ProcVersionSignature: Ubuntu 5.19.0-38.39~22.04.1-generic 5.19.17
RelatedPackageVersions:
 linux-restricted-modules-5.19.0-38-generic N/A
 linux-backports-modules-5.19.0-38-generic N/A
 linux-firmware 20220329.git681281e4-0ubuntu3.11
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
Tags: jammy uec-images
Uname: Linux 5.19.0-38-generic x86_64
UpgradeStatus: Upgraded to jammy on 2022-08-10 (239 days ago)
UserGroups: N/A
_MarkForUpload: True
dmi.bios.date: 12/06/2019
dmi.bios.release: 2.9
dmi.bios.vendor: Dell Inc.
dmi.bios.version: 2.9.0
dmi.board.name: 0KCKR5
dmi.board.vendor: Dell Inc.
dmi.board.version: A02
dmi.chassis.type: 23
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvr2.9.0:bd12/06/2019:br2.9:svnDellInc.:pnPowerEdgeR620:pvr:rvnDellInc.:rn0KCKR5:rvrA02:cvnDellInc.:ct23:cvr:skuSKU=NotProvided;ModelName=PowerEdgeR620:
dmi.product.name: PowerEdge R620
dmi.product.sku: SKU=NotProvided;ModelName=PowerEdge R620
dmi.sys.vendor: Dell Inc.

MRATT (mrmail)
summary: - Intel PET not available on recent kernel causing KVM VM crashes
+ Intel PET not available on recent kernel causing QEMU VM crashes
description: updated
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 2015455

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
MRATT (mrmail) wrote : CurrentDmesg.txt

apport information

tags: added: apport-collected jammy uec-images
description: updated
MRATT (mrmail) wrote : Lspci.txt

apport information

MRATT (mrmail) wrote : Lspci-vt.txt

apport information

MRATT (mrmail) wrote : Lsusb.txt

apport information

MRATT (mrmail) wrote : Lsusb-t.txt

apport information

MRATT (mrmail) wrote : Lsusb-v.txt

apport information

MRATT (mrmail) wrote : ProcCpuinfo.txt

apport information

MRATT (mrmail) wrote : ProcCpuinfoMinimal.txt

apport information

MRATT (mrmail) wrote : ProcInterrupts.txt

apport information

MRATT (mrmail) wrote : ProcModules.txt

apport information

MRATT (mrmail) wrote : UdevDb.txt

apport information

MRATT (mrmail) wrote : WifiSyslog.txt

apport information

MRATT (mrmail) wrote : acpidump.txt

apport information

MRATT (mrmail)
Changed in linux (Ubuntu):
status: Incomplete → New
status: New → Confirmed