2024-01-08 10:22:29 |
Jan Horstmann |
bug |
|
|
added bug |
2024-02-26 01:32:08 |
William Grant |
bug |
|
|
added subscriber William Grant |
2024-04-02 13:23:40 |
Mauricio Faria de Oliveira |
bug |
|
|
added subscriber Mauricio Faria de Oliveira |
2024-04-02 17:54:15 |
Junien F |
bug |
|
|
added subscriber The Canonical Sysadmins |
2024-04-02 21:25:18 |
Launchpad Janitor |
qemu (Ubuntu): status |
New |
Confirmed |
|
2024-04-10 15:27:17 |
Sergio Durigan Junior |
qemu (Ubuntu): assignee |
|
Sergio Durigan Junior (sergiodj) |
|
2024-04-10 15:27:23 |
Sergio Durigan Junior |
tags |
|
server-todo |
|
2024-04-10 15:27:29 |
Sergio Durigan Junior |
bug |
|
|
added subscriber Ubuntu Server |
2024-04-22 13:59:42 |
Sven Kieske |
bug |
|
|
added subscriber Sven Kieske |
2024-04-22 18:20:15 |
Mauricio Faria de Oliveira |
qemu (Ubuntu): status |
Confirmed |
Incomplete |
|
2024-04-24 11:30:50 |
Mauricio Faria de Oliveira |
bug task added |
|
nova (Ubuntu) |
|
2024-04-24 11:30:59 |
Mauricio Faria de Oliveira |
qemu (Ubuntu): status |
Incomplete |
Invalid |
|
2024-04-24 11:31:09 |
Mauricio Faria de Oliveira |
nova (Ubuntu): status |
New |
Triaged |
|
2024-04-24 11:31:14 |
Mauricio Faria de Oliveira |
nova (Ubuntu): importance |
Undecided |
Medium |
|
2024-04-24 11:31:17 |
Mauricio Faria de Oliveira |
nova (Ubuntu): assignee |
|
Mauricio Faria de Oliveira (mfo) |
|
2024-04-24 11:31:20 |
Mauricio Faria de Oliveira |
qemu (Ubuntu): assignee |
Sergio Durigan Junior (sergiodj) |
|
|
2024-05-08 15:13:12 |
Bryce Harrington |
tags |
server-todo |
|
|
2024-07-12 09:46:58 |
Mauricio Faria de Oliveira |
nominated for series |
|
Ubuntu Focal |
|
2024-07-12 09:46:58 |
Mauricio Faria de Oliveira |
bug task added |
|
qemu (Ubuntu Focal) |
|
2024-07-12 09:46:58 |
Mauricio Faria de Oliveira |
bug task added |
|
nova (Ubuntu Focal) |
|
2024-07-12 09:47:23 |
Mauricio Faria de Oliveira |
qemu (Ubuntu Focal): status |
New |
Invalid |
|
2024-07-12 09:47:27 |
Mauricio Faria de Oliveira |
nova (Ubuntu Focal): status |
New |
Triaged |
|
2024-07-12 09:47:30 |
Mauricio Faria de Oliveira |
nova (Ubuntu Focal): importance |
Undecided |
Medium |
|
2024-07-12 09:47:34 |
Mauricio Faria de Oliveira |
nova (Ubuntu Focal): assignee |
|
Mauricio Faria de Oliveira (mfo) |
|
2024-07-12 09:47:38 |
Mauricio Faria de Oliveira |
nova (Ubuntu): status |
Triaged |
Fix Released |
|
2024-07-12 09:47:43 |
Mauricio Faria de Oliveira |
nova (Ubuntu): importance |
Medium |
Undecided |
|
2024-07-12 09:47:46 |
Mauricio Faria de Oliveira |
nova (Ubuntu): assignee |
Mauricio Faria de Oliveira (mfo) |
|
|
2024-07-12 13:26:20 |
Mauricio Faria de Oliveira |
description |
The upstream Linux kernel disabled XSAVES on AMD EPYC Rome CPUs ([1]). Upstream qemu shortly followed with a patch adding a CPU model version of EPYC-Rome without XSAVES ([2]).
The change in the kernel has been backported to Ubuntu Focal ([3]).
Without further workarounds or the adapted CPU model in qemu, virtual machines with an EPYC-Rome CPU model created on hypervisors with newer EPYC CPUs will have the XSAVES flag enabled, which prevents live migration to hypervisors with EPYC Rome CPUs where XSAVES is no longer available.
Therefore I would like to argue that the patch adapting the CPU model in qemu should also be backported to Ubuntu Focal.
[1]
https://lore.kernel.org/all/20230307174643.1240184-1-andrew.cooper3@citrix.com/
[2]
https://patchew.org/QEMU/20230524213748.8918-1-davydov-max@yandex-team.ru/
[3]
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2023420 |
[ Impact ]
* Live migration is increasingly being impacted by changes to CPU flags
(e.g., 'xsaves' disabled on AMD EPYC; PKRU/'xsave' behavior changes),
which prevent migration between otherwise identical hypervisors whose
only difference is a CPU flag (i.e., source hypervisor still has flag
enabled; destination hypervisor had flag disabled on a kernel update).
* These CPU flag updates require changes to CPU model definitions in
several places (qemu, libvirt, and nova if openstack is being used),
which is a lot of overhead for each subtle variation that may appear.
* Fortunately, it's possible to reduce the changes required by allowing
nova to customize CPU flags to enable/disable _on top_ of a CPU model
definition (e.g., the same AMD EPYC CPU model with 'xsaves' disabled).
* This change is present in Jammy and later, and is backward compatible
with the existing config files, as the (new) enable/disable operators
are an optional prefix to existing flags (e.g., '-xsaves' or '+xsaves').
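As an illustration of the prefix handling described above, here is a minimal sketch in Python. It is not nova's actual code: the function names `parse_cpu_flag` and `parse_extra_flags` are hypothetical, and the mapping of a bare or '+' flag to policy 'require' and a '-' flag to policy 'disable' is an assumption based on the before/after XML shown in the test plan.

```python
def parse_cpu_flag(flag):
    """Split one flag entry into (name, policy).

    Hypothetical helper: '-' means disable, '+' or no prefix means require.
    """
    flag = flag.strip().lower()
    if flag.startswith("-"):
        return flag[1:], "disable"
    if flag.startswith("+"):
        return flag[1:], "require"
    # No prefix: keep the pre-existing behavior (backward compatible).
    return flag, "require"


def parse_extra_flags(value):
    """Parse a comma-separated cpu_model_extra_flags value."""
    return [parse_cpu_flag(f) for f in value.split(",") if f.strip()]


def feature_xml(name, policy):
    """Render a libvirt-style <feature> element as a string."""
    return f"<feature policy='{policy}' name='{name}'/>"


if __name__ == "__main__":
    for name, policy in parse_extra_flags("-xsaves"):
        print(feature_xml(name, policy))
```

Because an unprefixed flag is treated the same as a '+' flag in this sketch, a config file written before the change would parse to the same result as before.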
[ Test Plan ]
* Deploy OpenStack with 2 hypervisors (or more), and configure nova.conf
with a cpu_model and cpu_model_extra_flags to disable/enable, for example:
# grep cpu_model /etc/nova/nova.conf
cpu_model = EPYC-Rome
cpu_model_extra_flags = -xsaves
* Start a VM before/after the package upgrade (focal-proposed), checking
the VM XML for that flag (e.g., policy change from require to disable);
for example:
Before:
# virsh dumpxml instance-<number> | grep xsaves
<feature policy='require' name='xsaves'/>
After:
# virsh dumpxml instance-<number> | grep xsaves
<feature policy='disable' name='xsaves'/>
* Ensure that nova is able to start *with* and *without* enable/disable
cpu flag changes.
* Ensure live migration works both ways across the 2 hypervisors
*with* and *without* enable/disable cpu flag changes.
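The XML check in the steps above could also be scripted. The following is a minimal sketch, assuming the output of `virsh dumpxml` has been captured as a string; the helper name `feature_policy` and the trimmed sample domain XML are hypothetical and only illustrate the lookup.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical domain XML; a real dump contains many more elements.
SAMPLE_XML = """
<domain type='kvm'>
  <name>instance-example</name>
  <cpu mode='custom' match='exact'>
    <model>EPYC-Rome</model>
    <feature policy='disable' name='xsaves'/>
  </cpu>
</domain>
"""


def feature_policy(domain_xml, feature_name):
    """Return the policy of a CPU feature in the domain XML, or None if absent."""
    root = ET.fromstring(domain_xml)
    # Only look at <feature> elements directly under the domain's <cpu> element.
    for feature in root.findall("./cpu/feature"):
        if feature.get("name") == feature_name:
            return feature.get("policy")
    return None


if __name__ == "__main__":
    print(feature_policy(SAMPLE_XML, "xsaves"))
```

With the sample above, the lookup for 'xsaves' yields 'disable', which corresponds to the "After" state of the test plan.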
[ Regression Potential ]
* Regressions would likely manifest in the areas modified by the patches,
i.e., parsing the config file's cpu flags (on nova startup), generating
a VM's XML file (on nova VM start/creation), and also live migration.
* The patched packages have been evaluated and running in production for
2-3 months now, and live migrations have been performed without any issues.
[ Other Info ]
* The code changes had their callee-paths reviewed, and potential issues
were not identified.
* The patches are already present in Jammy and later.
[ Original Bug Description ]
The upstream Linux kernel disabled XSAVES on AMD EPYC Rome CPUs ([1]). Upstream qemu shortly followed with a patch adding a CPU model version of EPYC-Rome without XSAVES ([2]).
The change in the kernel has been backported to Ubuntu Focal ([3]).
Without further workarounds or the adapted CPU model in qemu, virtual machines with an EPYC-Rome CPU model created on hypervisors with newer EPYC CPUs will have the XSAVES flag enabled, which prevents live migration to hypervisors with EPYC Rome CPUs where XSAVES is no longer available.
Therefore I would like to argue that the patch adapting the CPU model in qemu should also be backported to Ubuntu Focal.
[1]
https://lore.kernel.org/all/20230307174643.1240184-1-andrew.cooper3@citrix.com/
[2]
https://patchew.org/QEMU/20230524213748.8918-1-davydov-max@yandex-team.ru/
[3]
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2023420 |
|
2024-07-12 13:29:51 |
Mauricio Faria de Oliveira |
nova (Ubuntu Focal): status |
Triaged |
In Progress |
|