Comment 0 for bug 2106791

DUFOUR Olivier (odufourc) wrote : Support for Emerald Rapids is missing, related to TSX

[Environment]
Tested platform and environments :
* Ubuntu Jammy 22.04 LTS : 8.0.0-1ubuntu7.10 and 10.0.0-2ubuntu8.5~cloud0 (Caracal UCA) with HWE kernel (6.8)
* Ubuntu Noble 24.04 LTS : 10.0.0-2ubuntu8.6
* Ubuntu Oracular 24.10 : 10.6.0-1ubuntu3.2

Hardware :
* HPE DL360 with Intel Xeon Gold 6542Y

[Issue]
The CPU is recognised by libvirt as Broadwell, so the guest CPU model lacks the features introduced with Skylake, Cascadelake, Icelake and SapphireRapids.
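
For reference, a minimal sketch of how the detected model can be read on an affected host. It assumes `virsh` is installed and that the first `<model>` element in the `virsh capabilities` output is the host CPU model; `host_cpu_model` is a helper name made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print the first <model>...</model> value found in
# libvirt capabilities XML read from stdin.
# Assumption: the host CPU model is the first <model> element in the output;
# later <model> elements (for example under <secmodel>) are ignored.
host_cpu_model() {
    sed -n 's:.*<model>\(.*\)</model>.*:\1:p' | head -n 1
}
```

On a libvirt host it would be used as `virsh capabilities | host_cpu_model`.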

[Impact]
This affects any customer deployment running Openstack with Nova on a recent Intel CPU such as Icelake, Sapphire Rapids, Emerald Rapids or Granite Rapids. Guests are prevented from using any instruction introduced after Broadwell.

[Root cause]
1. For SapphireRapids profile :
 a. on 8.0.0-1ubuntu7.10 --> the CPU doesn't match x86_SapphireRapids-noTSX.xml because the feature "taa-no" is missing (like the "hle" and "rtm" features, it is absent because TSX is off in the kernel)
 b. from 10.0.0-2ubuntu8.6 onwards, and even upstream --> there isn't any "-noTSX.xml" profile variant available.

   If copying the profile from x86_SapphireRapids.xml with hle, rtm and taa-no, the CPU gets recognised without issue by libvirt.

 c. TSX is actually disabled by default on Ubuntu kernels. Enabling it with "tsx=on" or "tsx=auto" on the kernel boot command line allowed libvirt to recognise the hle, rtm and taa-no features.

2. For Skylake, Cascadelake, Icelake :
 The CPU cannot be recognised because of the missing feature "mpx". Upstream actually has a fix for this : https://gitlab.com/libvirt/libvirt/-/commit/fa5459517848f333743c771e90eb01faeced3dae
 Theoretically, this should also affect Icelake CPUs on Jammy 22.04 and Noble 24.04 LTS, meaning no current Ubuntu LTS recognises Icelake, Sapphire Rapids or Emerald Rapids correctly.
 Recognition of both Icelake and SapphireRapids CPUs as Icelake-Server-noTSX works only with the libvirt version from Oracular 24.10.
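
The TSX state described in 1.c can be checked with a minimal sketch like the one below. It only looks at the "hle" and "rtm" flags in /proc/cpuinfo and at the "tsx=" boot parameter in /proc/cmdline; "taa-no" is not covered because it is not listed as a cpuinfo flag. The function names `tsx_flags` and `tsx_cmdline` are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: report whether the kernel exposes the hle and rtm flags.
# Usage: tsx_flags [cpuinfo-file]   (defaults to /proc/cpuinfo)
tsx_flags() {
    cpuinfo="${1:-/proc/cpuinfo}"
    for flag in hle rtm; do
        # Only the first "flags" line is inspected; -w matches whole words.
        if grep '^flags' "$cpuinfo" | head -n 1 | grep -qw "$flag"; then
            echo "$flag: present"
        else
            echo "$flag: absent"
        fi
    done
}

# Hypothetical helper: print the tsx= boot parameter, if one was given.
# Usage: tsx_cmdline [cmdline-file]   (defaults to /proc/cmdline)
tsx_cmdline() {
    cmdline="${1:-/proc/cmdline}"
    tr ' ' '\n' < "$cmdline" | grep '^tsx=' \
        || echo "tsx= not set on the command line"
}
```

Calling `tsx_flags` and `tsx_cmdline` without arguments reads the live system. When no "tsx=" parameter is given, the kernel falls back to its build-time CONFIG_X86_INTEL_TSX_MODE_* default.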

[Potential improvements]
1. Backport the upstream fix that removes the "mpx" feature to the current LTS releases to allow proper support for recent Intel CPUs
 --> it would at least fix the support for Icelake and allow newer CPUs like Sapphire Rapids to not end up using a CPU feature set as old as Broadwell
 --> this is confirmed to work with Ubuntu 24.10, but that is not an LTS release and is not viable for production/customer environments.

2. Decide what to do with TSX
 a. it is currently disabled by default on Ubuntu's kernels and not even set to auto
 This can be checked quickly by looking at the Ubuntu kernel config, as below :
  $ grep TSX /boot/config-6.8.0-57-generic
    CONFIG_X86_INTEL_TSX_MODE_OFF=y
    # CONFIG_X86_INTEL_TSX_MODE_ON is not set
    # CONFIG_X86_INTEL_TSX_MODE_AUTO is not set

 b. Ubuntu's libvirt packages from Noble 10.0.0-2ubuntu8.6 onwards, and even upstream, don't include any noTSX profile for Rapids CPUs
 --> meaning that even if we retrieve the current cpu_maps from upstream, the Sapphire/Emerald/Granite Rapids CPUs will never be recognised properly by libvirt in Ubuntu, now or in the future

3. Add dedicated noTSX profiles for Sapphire Rapids and newer on Ubuntu packages and upstream
 a. if noTSX profiles are created for Sapphire Rapids and newer, we should make sure the feature "taa-no" is removed as well since it will not be recognised with tsx=off in Ubuntu's kernels
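
Whether a given model has a noTSX variant installed can be checked with a minimal sketch like this one. It assumes the cpu_map files live in /usr/share/libvirt/cpu_map and follow the x86_MODEL-noTSX naming seen above; `has_notsx_variant` is a helper name made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a noTSX profile exists for the given model.
# Usage: has_notsx_variant MODEL [cpu_map-dir]
# Assumption: the default cpu_map directory is /usr/share/libvirt/cpu_map.
has_notsx_variant() {
    model="$1"
    dir="${2:-/usr/share/libvirt/cpu_map}"
    # Matches x86_MODEL-noTSX.xml and suffixed variants such as -noTSX-IBRS.
    ls "$dir" 2>/dev/null | grep -q "^x86_${model}-noTSX"
}
```

For example, `has_notsx_variant SapphireRapids || echo "no noTSX profile"` would print the message on a system where point 2.b applies.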