5.15.0-76: Qemu live migration regression due to PKRU leakage

Bug #2025987 reported by Martin Friedrich
This bug affects 1 person
Affects          Status    Importance   Assigned to   Milestone
linux (Ubuntu)   Expired   Undecided    Unassigned
qemu (Ubuntu)    Expired   Undecided    Unassigned

Bug Description

I am trying to live-migrate QEMU VMs between AMD Zen3 and Zen2 servers.

What we expect:
VMs continue to run after live migration.

What happened:
VMs instantly get stuck after migration with 100% CPU usage.

All Nodes are running Ubuntu 22.04.
We have tested Kernel releases back to 5.15.0-70.

Nodes running Ubuntu 5.19.0-43.44~22.04.1-generic 5.19.17 :
Zen2 -> Zen3 migration: OK
Zen3 -> Zen2 migration: OK

Nodes running Ubuntu 5.15.0-76.83-generic 5.15.99 :
Zen2 -> Zen3 migration: OK
Zen3 -> Zen2 migration: NOT OK

Mixed Kernel versions:
Zen2 5.15.0-76.83-generic -> Zen3 5.19.0-43.44~22.04.1-generic -> OK
Zen3 5.19.0-43.44~22.04.1-generic -> Zen2 5.15.0-76.83-generic -> OK

Zen2 5.19.0-43.44~22.04.1-generic -> Zen3 5.15.0-76.83-generic -> OK
Zen3 5.15.0-76.83-generic -> Zen2 5.19.0-43.44~22.04.1-generic -> NOT OK

We tested with both freshly started and previously live-migrated VMs and got the same results.
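
The results above can be written down as data to make the common factor explicit. This is only a sketch: the tuple layout and the helper names `failing_sources` and `always_fails` are made up for illustration, and the table contains nothing beyond the eight runs listed in this report.

```python
# Each entry: (source CPU, source kernel, destination CPU, destination kernel, OK?)
# Values are copied from the migration results in this report.
K515 = "5.15.0-76.83"
K519 = "5.19.0-43.44~22.04.1"

RESULTS = [
    ("Zen2", K519, "Zen3", K519, True),
    ("Zen3", K519, "Zen2", K519, True),
    ("Zen2", K515, "Zen3", K515, True),
    ("Zen3", K515, "Zen2", K515, False),
    ("Zen2", K515, "Zen3", K519, True),
    ("Zen3", K519, "Zen2", K515, True),
    ("Zen2", K519, "Zen3", K515, True),
    ("Zen3", K515, "Zen2", K519, False),
]


def failing_sources(results):
    """Return the set of (source CPU, source kernel) pairs seen in failed runs."""
    return {(cpu, kernel) for cpu, kernel, _, _, ok in results if not ok}


def always_fails(results, src_cpu, src_kernel):
    """True if every recorded run from this source failed (and at least one exists)."""
    runs = [ok for cpu, kernel, _, _, ok in results
            if (cpu, kernel) == (src_cpu, src_kernel)]
    return bool(runs) and not any(runs)


if __name__ == "__main__":
    for cpu, kernel in sorted(failing_sources(RESULTS)):
        print(cpu, kernel, "always fails:", always_fails(RESULTS, cpu, kernel))
```

In this data set the only failing source is a Zen3 host on the 5.15.0-76.83 kernel, regardless of the kernel on the Zen2 destination, which suggests the problem lies on the sending side.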

It's probably related to:
KVM: SVM: fix tsc scaling cache logic
https://linux.googlesource.com/linux/kernel/git/torvalds/linux/+/11d39e8cc43e1c6737af19ca9372e590061b5ad2

Tags: kernel-bug
Martin Friedrich (npanic) wrote :
summary: - 5.15.0-76: Qemu live migration only works with hwe kernel
+ 5.15.0-76: Qemu live migration causes VMs to crash but works with HWE
+ kernel
Martin Friedrich (npanic) wrote : Re: 5.15.0-76: Qemu live migration causes VMs to crash but works with HWE kernel
description: updated
description: updated
Paride Legovini (paride) wrote :

Marking the Qemu task as Incomplete, given that the problem goes away by using a different kernel, and there's nothing specific pointing to a Qemu bug.

Changed in qemu (Ubuntu):
status: New → Incomplete
description: updated
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 2025987

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Markus Schade (lp-markusschade) wrote : Re: 5.15.0-76: Qemu live migration causes VMs to crash but works with HWE kernel

It looks like this bug is caused by the PKRU bit leaking into the migrated guest state.
Applying the Proxmox patch mentioned in
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2036675
resolves the issue.
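
A minimal sketch for checking whether a host advertises protection keys at all, which may help when comparing the two ends of a migration. It assumes the Linux kernel lists the feature as `pku` on the `flags` line of `/proc/cpuinfo`; the expectation that Zen3 hosts show the flag and Zen2 hosts do not is an assumption, not something verified in this report. The helper names `cpu_flags` and `host_has_pku` are hypothetical.

```python
def cpu_flags(cpuinfo_text):
    """Return the CPU feature flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def host_has_pku(path="/proc/cpuinfo"):
    """True if the local host lists the 'pku' flag (assumed flag name, see above)."""
    with open(path) as f:
        return "pku" in cpu_flags(f.read())


# Usage on a Linux host:
#   print("pku flag present:", host_has_pku())
```

Running this on both the source and the destination node shows whether the two hosts differ in this feature, which is the situation a leaked PKRU state bit would be sensitive to.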

summary: - 5.15.0-76: Qemu live migration causes VMs to crash but works with HWE
- kernel
+ 5.15.0-76: Qemu live migration regression due to PKRU leakage
Launchpad Janitor (janitor) wrote :

[Expired for qemu (Ubuntu) because there has been no activity for 60 days.]

Changed in qemu (Ubuntu):
status: Incomplete → Expired
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired