Activity log for bug #2048517

Date | Who | What changed | Old value | New value | Message
2024-01-08 10:22:29 | Jan Horstmann | bug | | | added bug
2024-02-26 01:32:08 | William Grant | bug | | | added subscriber William Grant
2024-04-02 13:23:40 | Mauricio Faria de Oliveira | bug | | | added subscriber Mauricio Faria de Oliveira
2024-04-02 17:54:15 | Junien F | bug | | | added subscriber The Canonical Sysadmins
2024-04-02 21:25:18 | Launchpad Janitor | qemu (Ubuntu): status | New | Confirmed |
2024-04-10 15:27:17 | Sergio Durigan Junior | qemu (Ubuntu): assignee | | Sergio Durigan Junior (sergiodj) |
2024-04-10 15:27:23 | Sergio Durigan Junior | tags | | server-todo |
2024-04-10 15:27:29 | Sergio Durigan Junior | bug | | | added subscriber Ubuntu Server
2024-04-22 13:59:42 | Sven Kieske | bug | | | added subscriber Sven Kieske
2024-04-22 18:20:15 | Mauricio Faria de Oliveira | qemu (Ubuntu): status | Confirmed | Incomplete |
2024-04-24 11:30:50 | Mauricio Faria de Oliveira | bug task added | | nova (Ubuntu) |
2024-04-24 11:30:59 | Mauricio Faria de Oliveira | qemu (Ubuntu): status | Incomplete | Invalid |
2024-04-24 11:31:09 | Mauricio Faria de Oliveira | nova (Ubuntu): status | New | Triaged |
2024-04-24 11:31:14 | Mauricio Faria de Oliveira | nova (Ubuntu): importance | Undecided | Medium |
2024-04-24 11:31:17 | Mauricio Faria de Oliveira | nova (Ubuntu): assignee | | Mauricio Faria de Oliveira (mfo) |
2024-04-24 11:31:20 | Mauricio Faria de Oliveira | qemu (Ubuntu): assignee | Sergio Durigan Junior (sergiodj) | |
2024-05-08 15:13:12 | Bryce Harrington | tags | server-todo | |
2024-07-12 09:46:58 | Mauricio Faria de Oliveira | nominated for series | | Ubuntu Focal |
2024-07-12 09:46:58 | Mauricio Faria de Oliveira | bug task added | | qemu (Ubuntu Focal) |
2024-07-12 09:46:58 | Mauricio Faria de Oliveira | bug task added | | nova (Ubuntu Focal) |
2024-07-12 09:47:23 | Mauricio Faria de Oliveira | qemu (Ubuntu Focal): status | New | Invalid |
2024-07-12 09:47:27 | Mauricio Faria de Oliveira | nova (Ubuntu Focal): status | New | Triaged |
2024-07-12 09:47:30 | Mauricio Faria de Oliveira | nova (Ubuntu Focal): importance | Undecided | Medium |
2024-07-12 09:47:34 | Mauricio Faria de Oliveira | nova (Ubuntu Focal): assignee | | Mauricio Faria de Oliveira (mfo) |
2024-07-12 09:47:38 | Mauricio Faria de Oliveira | nova (Ubuntu): status | Triaged | Fix Released |
2024-07-12 09:47:43 | Mauricio Faria de Oliveira | nova (Ubuntu): importance | Medium | Undecided |
2024-07-12 09:47:46 | Mauricio Faria de Oliveira | nova (Ubuntu): assignee | Mauricio Faria de Oliveira (mfo) | |
2024-07-12 13:26:20 Mauricio Faria de Oliveira description

Old value:

The Linux kernel upstream disabled XSAVES on AMD EPYC Rome CPUs ([1]). Upstream qemu shortly followed with a patch adding a CPU model version of EPYC-Rome without XSAVES ([2]). The change in the kernel has been backported to Ubuntu Focal ([3]).

Without further workarounds or the adapted CPU model in qemu, this will lead to a situation where virtual machines with an EPYC-Rome CPU model created on hypervisors with newer EPYC CPUs will have the XSAVES flag enabled, thus preventing live migration to hypervisors with EPYC Rome CPUs where XSAVES is no longer available.

Therefore I would like to argue that the patch adapting the CPU model in qemu should also be backported to Ubuntu Focal.

[1] https://lore.kernel.org/all/20230307174643.1240184-1-andrew.cooper3@citrix.com/
[2] https://patchew.org/QEMU/20230524213748.8918-1-davydov-max@yandex-team.ru/
[3] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2023420

New value:

[ Impact ]

 * Live migration is increasingly being impacted by changes to CPU flags (e.g., 'xsaves' disabled on AMD EPYC; PKRU/'xsave' behavior changes), which prevent migration between otherwise identical hypervisors whose only difference is a CPU flag (i.e., source hypervisor still has flag enabled; destination hypervisor had flag disabled on a kernel update).

 * These CPU flag updates require changes to CPU model definitions in several places (qemu, libvirt, and nova if OpenStack is being used), which is a lot of overhead for each subtle variation that may appear.

 * Fortunately, it's possible to reduce the changes required by allowing nova to customize CPU flags to enable/disable _on top_ of a CPU model definition (e.g., the same AMD EPYC CPU model with 'xsaves' disabled).

 * This change is present in Jammy and later, and is backward compatible with the existing config files, as the (new) enable/disable operators are an optional prefix to existing flags (e.g., '-xsaves' or '+xsaves').

[ Test Plan ]

 * Deploy OpenStack with 2 hypervisors (or more), and configure nova.conf with a cpu_model and cpu_model_extra_flags to disable/enable, for example:

   # grep cpu_model /etc/nova/nova.conf
   cpu_model = EPYC-Rome
   cpu_model_extra_flags = -xsaves

 * Start a VM before/after the package upgrade (focal-proposed), checking the VM XML for that flag (e.g., policy change from require to disable); for example:

   Before:

   # virsh dumpxml instance-<number> | grep xsaves
   <feature policy='require' name='xsaves'/>

   After:

   # virsh dumpxml instance-<number> | grep xsaves
   <feature policy='disable' name='xsaves'/>

 * Ensure that nova is able to start *with* and *without* enable/disable CPU flag changes.

 * Ensure live migration works both ways across the 2 hypervisors *with* and *without* enable/disable CPU flag changes.

[ Regression Potential ]

 * Regressions would likely manifest in the areas modified by the patches, i.e., parsing the config file's CPU flags (on nova startup), generating a VM's XML file (on nova VM start/creation), and also live migration.

 * The patched packages have been evaluated/running in production for 2-3 months now, and live migrations have been performed without any issues.

[ Other Info ]

 * The code changes had their callee-paths reviewed, and no potential issues were identified.

 * The patches are already present in Jammy and later.

[ Original Bug Description ]

The Linux kernel upstream disabled XSAVES on AMD EPYC Rome CPUs ([1]). Upstream qemu shortly followed with a patch adding a CPU model version of EPYC-Rome without XSAVES ([2]). The change in the kernel has been backported to Ubuntu Focal ([3]).

Without further workarounds or the adapted CPU model in qemu, this will lead to a situation where virtual machines with an EPYC-Rome CPU model created on hypervisors with newer EPYC CPUs will have the XSAVES flag enabled, thus preventing live migration to hypervisors with EPYC Rome CPUs where XSAVES is no longer available.

Therefore I would like to argue that the patch adapting the CPU model in qemu should also be backported to Ubuntu Focal.

[1] https://lore.kernel.org/all/20230307174643.1240184-1-andrew.cooper3@citrix.com/
[2] https://patchew.org/QEMU/20230524213748.8918-1-davydov-max@yandex-team.ru/
[3] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2023420
2024-07-12 13:29:51 | Mauricio Faria de Oliveira | nova (Ubuntu Focal): status | Triaged | In Progress |
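
Illustrative note (not part of the Launchpad log): the description recorded at 2024-07-12 13:26:20 says that the enable/disable operators are an optional prefix on existing flags ('+xsaves', '-xsaves', or plain 'xsaves'), and that the resulting VM XML carries policy='require' or policy='disable'. The sketch below shows how such a setting could be interpreted. It is not nova's code: the function names parse_extra_flags and feature_xml are hypothetical, and it assumes the value is a comma-separated list and that an unprefixed flag means "require".

```python
def parse_extra_flags(value):
    """Map each flag name to a libvirt-style feature policy.

    Hypothetical helper, not nova's implementation. Assumes a
    comma-separated value where '-flag' disables the flag and
    '+flag' or a bare 'flag' requires it.
    """
    policies = {}
    for raw in value.split(","):
        flag = raw.strip().lower()
        if not flag:
            continue  # tolerate empty items such as a trailing comma
        if flag.startswith("-"):
            policies[flag[1:]] = "disable"
        elif flag.startswith("+"):
            policies[flag[1:]] = "require"
        else:
            policies[flag] = "require"  # backward-compatible default
    return policies


def feature_xml(policies):
    """Render <feature> elements in the form shown in the Test Plan."""
    return [
        f"<feature policy='{policy}' name='{name}'/>"
        for name, policy in sorted(policies.items())
    ]


if __name__ == "__main__":
    for line in feature_xml(parse_extra_flags("-xsaves")):
        print(line)  # <feature policy='disable' name='xsaves'/>
```

With the value '-xsaves' from the Test Plan, the sketch produces the "After" line shown there; with an unprefixed 'xsaves' it produces the "Before" line, which is why existing config files keep working unchanged.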