KVM: after upgrading the kernel to 5.15.0-75, VM hangs after migration.

Bug #2024500 reported by mloza1
This bug affects 1 person
Affects: linux-hwe-5.15 (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

I have a weird situation when live migrating a VM between two nodes: the VM hangs after migrating from an EPYC3 (Milan) node to an EPYC1 (Naples) node.

First node (EPYC3, Milan):

CPU(s): AMD EPYC 7713 64-Core Processor (2 sockets)
Linux compute81 5.15.0-75-generic #82~20.04.1-Ubuntu SMP Wed Jun 7 19:37:37 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Second node (EPYC1, Naples):

CPU(s): AMD EPYC 7401 24-Core Processor (2 sockets)
Linux compute37 5.15.0-75-generic #82~20.04.1-Ubuntu SMP Wed Jun 7 19:37:37 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Migrating from the second node type to the first (Naples to Milan) works.
Migrating from the first node type to the second (Milan to Naples) makes the VM hang.

No issues with the following kernels:

5.15.99 from upstream
linux-image-5.15.0-25-generic_5.15.0-25.25_amd64

I'm able to reproduce the issue using the following kernels:

linux-image-5.15.0-67-generic_5.15.0-67.74_amd64.deb
linux-image-5.15.0-68-generic_5.15.0-68.75_amd64.deb
linux-image-5.15.0-69-generic_5.15.0-69.76_amd64.deb
linux-image-5.15.0-70-generic_5.15.0-70.77_amd64.deb
linux-image-5.15.0-72-generic_5.15.0-72.79_amd64.deb
linux-image-5.15.0-73-generic_5.15.0-73.80_amd64.deb
linux-image-5.15.0-74-generic_5.15.0-74.81_amd64.deb
linux-image-5.15.0-75-generic_5.15.0-75.82_amd64.deb
linux-image-5.15.0-77-generic_5.15.0-77.84_amd64.deb

---
ProblemType: Bug
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Jun 20 21:15 seq
 crw-rw---- 1 root audio 116, 33 Jun 20 21:15 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.20.11-0ubuntu27.27
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: [Errno 2] No such file or directory: 'fuser'
CasperMD5CheckResult: skip
DistroRelease: Ubuntu 20.04
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
MachineType: Supermicro AS -2124BT-HNTR
Package: linux (not installed)
PciMultimedia:

ProcEnviron:
 TERM=screen-256color
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 astdrmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-5.15.0-75-generic root=UUID=178395aa-ca05-47a1-9f4a-0696787bb100 ro rootflags=subvol=@ console=tty1 console=ttyS1,115200n8
ProcVersionSignature: Ubuntu 5.15.0-75.82~20.04.1-generic 5.15.99
RelatedPackageVersions:
 linux-restricted-modules-5.15.0-75-generic N/A
 linux-backports-modules-5.15.0-75-generic N/A
 linux-firmware 1.187.39
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
Tags: focal
Uname: Linux 5.15.0-75-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: N/A
_MarkForUpload: True
dmi.bios.date: 09/23/2022
dmi.bios.release: 5.22
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 2.5
dmi.board.asset.tag: To be filled by O.E.M.
dmi.board.name: H12DST-B
dmi.board.vendor: Supermicro
dmi.board.version: 1.00A
dmi.chassis.asset.tag: To be filled by O.E.M.
dmi.chassis.type: 1
dmi.chassis.vendor: Supermicro
dmi.chassis.version: 0123456789
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr2.5:bd09/23/2022:br5.22:svnSupermicro:pnAS-2124BT-HNTR:pvr0123456789:rvnSupermicro:rnH12DST-B:rvr1.00A:cvnSupermicro:ct1:cvr0123456789:skuTobefilledbyO.E.M.:
dmi.product.family: To be filled by O.E.M.
dmi.product.name: AS -2124BT-HNTR
dmi.product.sku: To be filled by O.E.M.
dmi.product.version: 0123456789
dmi.sys.vendor: Supermicro

Revision history for this message
mloza1 (mloza1) wrote :

compute81 apport

description: updated
Revision history for this message
mloza1 (mloza1) wrote:

compute37 apport

description: updated
Revision history for this message
mloza1 (mloza1) wrote:

libvirt version

(nova-libvirt)[root@compute81 /]# dpkg -l | grep libvirt

ii libvirt-clients 6.0.0-0ubuntu8.16 amd64 Programs for the libvirt library
ii libvirt-daemon 6.0.0-0ubuntu8.16 amd64 Virtualization daemon
ii libvirt-daemon-driver-qemu 6.0.0-0ubuntu8.16 amd64 Virtualization daemon QEMU connection driver
ii libvirt-daemon-system 6.0.0-0ubuntu8.16 amd64 Libvirt daemon configuration files
ii libvirt-daemon-system-systemd 6.0.0-0ubuntu8.16 amd64 Libvirt daemon configuration files (systemd)
ii libvirt0:amd64 6.0.0-0ubuntu8.16 amd64 library for interfacing with different virtualization systems

qemu version

(nova-libvirt)[root@compute81 /]# dpkg -l | grep qemu

ii ipxe-qemu 1.0.0+git-20190109.133f4c4-0ubuntu3.2 all PXE boot firmware - ROM images for qemu
ii ipxe-qemu-256k-compat-efi-roms 1.0.0+git-20150424.a25a16d-0ubuntu4 all PXE boot firmware - Compat EFI ROM images for qemu
ii libvirt-daemon-driver-qemu 6.0.0-0ubuntu8.16 amd64 Virtualization daemon QEMU connection driver
ii qemu-block-extra:amd64 1:4.2-3ubuntu6.24 amd64 extra block backend modules for qemu-system and qemu-utils
ii qemu-kvm 1:4.2-3ubuntu6.24 amd64 QEMU Full virtualization on x86 hardware
ii qemu-slof 20191209+dfsg-1 all Slimline Open Firmware -- QEMU PowerPC version
ii qemu-system 1:4.2-3ubuntu6.24 amd64 QEMU full system emulation binaries
ii qemu-system-arm 1:4.2-3ubuntu6.24 amd64 QEMU full system emulation binaries (arm)
ii qemu-system-common 1:4.2-3ubuntu6.24 amd64 QEMU full system emulation binaries (common files)
ii qemu-system-data 1:4.2-3ubuntu6.24 all QEMU full system emulation (data files)
ii qemu-system-mips 1:4.2-3ubuntu6.24 amd64 QEMU full system emulation binaries (mips)
ii qemu-system-misc 1:4.2-3ubuntu6.24 amd64 QEMU full system emulation binaries (miscellaneous)
ii qemu-system-ppc 1:4.2-3ubuntu6.24 amd64 QEMU full system emulation binaries (ppc)
ii qemu-system-s390x 1:4.2-3ubuntu6.24 amd64 QEMU full system emulation binaries (s390x)
ii qemu-system-sparc 1:4.2-3ubuntu6.24 amd64 QEMU full system emulation binaries (sparc)
ii qemu-system-x86 1:4.2-3ubuntu6.24 amd64 QEMU full system emulation binaries (x86)

summary: - KVM: after upgrading the kernel to 5.15.0-75, Guest hangs after migration.
+ KVM: after upgrading the kernel to 5.15.0-75, VM hangs after migration.
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 2024500

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Revision history for this message
mloza1 (mloza1) wrote : CRDA.txt

apport information

tags: added: apport-collected focal
description: updated
Revision history for this message
mloza1 (mloza1) wrote : CurrentDmesg.txt

apport information

Revision history for this message
mloza1 (mloza1) wrote : Lspci.txt

apport information

Revision history for this message
mloza1 (mloza1) wrote : Lspci-vt.txt

apport information

Revision history for this message
mloza1 (mloza1) wrote : Lsusb.txt

apport information

Revision history for this message
mloza1 (mloza1) wrote : Lsusb-t.txt

apport information

Revision history for this message
mloza1 (mloza1) wrote : Lsusb-v.txt

apport information

Revision history for this message
mloza1 (mloza1) wrote : ProcCpuinfo.txt

apport information

Revision history for this message
mloza1 (mloza1) wrote : ProcCpuinfoMinimal.txt

apport information

Revision history for this message
mloza1 (mloza1) wrote : ProcInterrupts.txt

apport information

Revision history for this message
mloza1 (mloza1) wrote : ProcModules.txt

apport information

Revision history for this message
mloza1 (mloza1) wrote : UdevDb.txt

apport information

Revision history for this message
mloza1 (mloza1) wrote : acpidump.txt

apport information

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
mloza1 (mloza1)
description: updated
mloza1 (mloza1)
affects: linux (Ubuntu) → linux-hwe-5.15 (Ubuntu)
mloza1 (mloza1)
description: updated
Revision history for this message
Khaled El Mously (kmously) wrote :

Hello @mloza1, are you using AMD SEV here by any chance?

Revision history for this message
mloza1 (mloza1) wrote:

Not using SEV. These are the CPU flags:

EPYC3 flags:

flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 invpcid_single hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm

EPYC1 flags:

flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid amd_dcm aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb hw_pstate ssbd ibpb vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 xsaves clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov succor smca
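To see which flags the Milan host reports but the Naples host does not, a set difference over the two lists is enough. A minimal bash sketch, assuming the flag strings are copied from `/proc/cpuinfo` as above; the function name `missing_flags` and the variables `EPYC3_FLAGS`/`EPYC1_FLAGS` are hypothetical:

```shell
#!/usr/bin/env bash
# Print CPU flags present in the first list but missing from the second.
# Usage: missing_flags "<source host flags>" "<destination host flags>"
missing_flags() {
  # comm(1) needs sorted input with one item per line; the C locale keeps
  # sort and comm in agreement about ordering. Empty lines are dropped.
  LC_ALL=C comm -23 \
    <(printf '%s\n' "$1" | tr ' ' '\n' | sed '/^$/d' | LC_ALL=C sort -u) \
    <(printf '%s\n' "$2" | tr ' ' '\n' | sed '/^$/d' | LC_ALL=C sort -u)
}

# Example (hypothetical variables holding the two flag strings above):
# missing_flags "$EPYC3_FLAGS" "$EPYC1_FLAGS"
```

On the two lists above it prints, among others, `pcid`, `x2apic`, `erms`, `pku` and `vaes`.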

Revision history for this message
Khaled El Mously (kmously) wrote :

Thanks @mloza1. This looks different from the issue I am investigating, but maybe I can still help.

There are tens of thousands of changes between linux-image-5.15.0-25 and linux-image-5.15.0-67, so it's difficult to know what caused this. I think bisecting the changes is our best bet. Could you try to narrow down which Ubuntu kernel version first fails this use case? Knowing exactly which version introduced the failure would be helpful.
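Because the kernel versions form an ordered list, the narrowing can be done as a binary search instead of testing every version. A minimal bash sketch, assuming every version before the first failing one passes and every version after it fails. `first_bad_kernel` and the test command passed to it are hypothetical; in practice "testing" a version means installing and booting it on both hosts and live-migrating a VM from the Milan node to the Naples node:

```shell
#!/usr/bin/env bash
# Binary search for the first version in an ordered list whose test fails.
# $1: name of a test command that returns 0 when migration works (hypothetical).
# Remaining args: kernel versions, oldest first.
# Prints the first failing version, or an empty line if none fails.
first_bad_kernel() {
  local test_cmd=$1; shift
  local kernels=("$@")
  local lo=0 hi=$(( ${#kernels[@]} - 1 )) mid first_bad=""
  while (( lo <= hi )); do
    mid=$(( (lo + hi) / 2 ))
    if "$test_cmd" "${kernels[mid]}"; then
      lo=$(( mid + 1 ))          # still good: the regression is later
    else
      first_bad=${kernels[mid]}  # bad: remember it and look earlier
      hi=$(( mid - 1 ))
    fi
  done
  printf '%s\n' "$first_bad"
}
```

For example, `first_bad_kernel check_migration 5.15.0-25 5.15.0-33 5.15.0-41`, where `check_migration` is a hypothetical wrapper around the manual migration test, needs only two test runs for three versions.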

Revision history for this message
mloza1 (mloza1) wrote :

@kmously I'll rerun my tests and get back to you with the results.

Revision history for this message
mloza1 (mloza1) wrote:

List of 5.15 kernels available in the repo, with test results:

root@compute81:~# apt search linux-image-5.15.*-generic

linux-image-5.15.0-25-generic/now 5.15.0-25.25 amd64 [installed,local] <--- everything works

linux-image-5.15.0-33-generic/focal-updates,focal-security,now 5.15.0-33.34~20.04.1 amd64 [installed] <-- VM hangs

linux-image-5.15.0-41-generic/focal-updates,focal-security,now 5.15.0-41.44~20.04.1 amd64 [installed] <-- VM hangs

Here's a diff of the kernel config files (5.15.0-25 vs 5.15.0-33):

root@compute81:~# diff -Naur /boot/config-5.15.0-25-generic /boot/config-5.15.0-33-generic
--- /boot/config-5.15.0-25-generic 2022-03-30 15:28:11.000000000 +0000
+++ /boot/config-5.15.0-33-generic 2022-05-19 14:04:01.000000000 +0000
@@ -1,23 +1,22 @@
 #
 # Automatically generated file; DO NOT EDIT.
-# Linux/x86 5.15.0-25-generic Kernel Configuration
+# Linux/x86 5.15.0-33-generic Kernel Configuration
 #
-CONFIG_CC_VERSION_TEXT="gcc (Ubuntu 11.2.0-19ubuntu1) 11.2.0"
+CONFIG_CC_VERSION_TEXT="gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0"
 CONFIG_CC_IS_GCC=y
-CONFIG_GCC_VERSION=110200
+CONFIG_GCC_VERSION=90400
 CONFIG_CLANG_VERSION=0
 CONFIG_AS_IS_GNU=y
-CONFIG_AS_VERSION=23800
+CONFIG_AS_VERSION=23400
 CONFIG_LD_IS_BFD=y
-CONFIG_LD_VERSION=23800
+CONFIG_LD_VERSION=23400
 CONFIG_LLD_VERSION=0
 CONFIG_CC_CAN_LINK=y
 CONFIG_CC_CAN_LINK_STATIC=y
 CONFIG_CC_HAS_ASM_GOTO=y
-CONFIG_CC_HAS_ASM_GOTO_OUTPUT=y
 CONFIG_CC_HAS_ASM_INLINE=y
 CONFIG_CC_HAS_NO_PROFILE_FN_ATTR=y
-CONFIG_PAHOLE_VERSION=122
+CONFIG_PAHOLE_VERSION=121
 CONFIG_IRQ_WORK=y
 CONFIG_BUILDTIME_TABLE_SORT=y
 CONFIG_THREAD_INFO_IN_TASK=y
@@ -47,7 +46,7 @@
 CONFIG_KERNEL_ZSTD=y
 CONFIG_DEFAULT_INIT=""
 CONFIG_DEFAULT_HOSTNAME="(none)"
-CONFIG_VERSION_SIGNATURE="Ubuntu 5.15.0-25.25-generic 5.15.30"
+CONFIG_VERSION_SIGNATURE="Ubuntu 5.15.0-33.34~20.04.1-generic 5.15.30"
 CONFIG_SWAP=y
 CONFIG_SYSVIPC=y
 CONFIG_SYSVIPC_SYSCTL=y
@@ -508,6 +507,7 @@
 # CONFIG_LEGACY_VSYSCALL_NONE is not set
 # CONFIG_CMDLINE_BOOL is not set
 CONFIG_MODIFY_LDT_SYSCALL=y
+# CONFIG_STRICT_SIGALTSTACK_SIZE is not set
 CONFIG_HAVE_LIVEPATCH=y
 CONFIG_LIVEPATCH=y
 # end of Processor type and features
@@ -841,6 +841,7 @@
 CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
 CONFIG_ARCH_HAS_ELFCORE_COMPAT=y
 CONFIG_ARCH_HAS_PARANOID_L1D_FLUSH=y
+CONFIG_DYNAMIC_SIGFRAME=y

 #
 # GCOV-based kernel profiling
@@ -10282,7 +10283,25 @@
 CONFIG_EROFS_FS_SECURITY=y
 CONFIG_EROFS_FS_ZIP=y
 CONFIG_VBOXSF_FS=m
-# CONFIG_AUFS_FS is not set
+CONFIG_AUFS_FS=m
+CONFIG_AUFS_BRANCH_MAX_127=y
+# CONFIG_AUFS_BRANCH_MAX_511 is not set
+# CONFIG_AUFS_BRANCH_MAX_1023 is not set
+# CONFIG_AUFS_BRANCH_MAX_32767 is not set
+CONFIG_AUFS_SBILIST=y
+# CONFIG_AUFS_HNOTIFY is not set
+CONFIG_AUFS_EXPORT=y
+CONFIG_AUFS_INO_T_64=y
+CONFIG_AUFS_XATTR=y
+# CONFIG_AUFS_FHSM is not set
+# CONFIG_AUFS_RDU is not set
+CONFIG_AUFS_DIRREN=y
+# CONFIG_AUFS_SHWH is not set
+# CONFIG_AUFS_BR_RAMFS is not set
+# CONFIG_AUFS_BR_FUSE is not set
+CONFIG_AUFS_BR_HFSPLUS=y
+CONFIG_AUFS_BDEV_LOOP=y
+# CONFIG_AUFS_DEBUG is not set
 CONFIG_NETWORK_FILESYSTEMS=y
 CONFIG_NFS_FS=m
 CONFIG_NFS_V2=m
@@ -10428,7 +10447,7 @@
 CONFIG_ENCRYPTED_KEYS=y
 CONFIG_KEY_DH_OPERATIONS=y
 CONFIG_KEY_NOTIFICATIONS=y
-CO...

Revision history for this message
mloza1 (mloza1) wrote:

I installed 5.15.30-051530-generic from https://kernel.ubuntu.com/~kernel-ppa/mainline/ and everything works with it.

To recap:

linux-image-5.15.0-25-generic - everything works
linux-image-5.15.0-33-generic - VM hangs
linux-image-5.15.0-41-generic - VM hangs
5.15.30-051530-generic - everything works

Do you think it has something to do with one of the options enabled in the kernel config for HWE 5.15.0-33-generic and later?
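To narrow down which config options actually changed, it can help to list only options whose value differs, treating `# CONFIG_FOO is not set` as `n` and ignoring comments. A minimal bash/awk sketch (`config_changes` is a hypothetical name, and it assumes the first file is non-empty); the Linux source tree also ships `scripts/diffconfig` for a similar purpose:

```shell
#!/usr/bin/env bash
# Print "NAME: OLD -> NEW" for every option that differs between two configs.
# Usage: config_changes OLD_CONFIG NEW_CONFIG
config_changes() {
  awk '
    # Parse one line into the globals name/val; return 1 if it is an option.
    function parse(line) {
      if (line ~ /^CONFIG_/) {
        eq = index(line, "=")  # split at the first "=" only; values may contain "="
        name = substr(line, 1, eq - 1); val = substr(line, eq + 1)
        return 1
      }
      if (line ~ /^# CONFIG_[A-Za-z0-9_]+ is not set$/) {
        split(line, w, " "); name = w[2]; val = "n"
        return 1
      }
      return 0
    }
    FNR == NR { if (parse($0)) old[name] = val; next }  # first file
    { if (parse($0)) cur[name] = val }                  # second file
    END {
      for (n in old) if (!(n in cur)) print n ": " old[n] " -> (absent)"
      for (n in cur) {
        if (!(n in old)) print n ": (absent) -> " cur[n]
        else if (old[n] != cur[n]) print n ": " old[n] " -> " cur[n]
      }
    }' "$1" "$2" | LC_ALL=C sort
}
```

For example, `config_changes /boot/config-5.15.0-25-generic /boot/config-5.15.0-33-generic`. On the excerpt above it would report entries such as `CONFIG_AUFS_FS: n -> m` and `CONFIG_CC_HAS_ASM_GOTO_OUTPUT: y -> (absent)`, alongside toolchain entries such as `CONFIG_GCC_VERSION`, which reflect the compiler change rather than a feature option.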
