Comment 1 for bug 1853200

Christian Ehrhardt (paelzer) wrote:

Hi Dave,
IIRC OpenStack either tries to determine the least common denominator (in CPU features) or uses whatever you pass to it; in your case that was:
[libvirt]
cpu_mode = custom
cpu_model = Skylake-Server-IBRS

And your guest definition won't change after the initial definition. Even if you had run host-model instead of a named type, it would (in the past) have determined the hle and rtm features, and the guest now can't start with them.

But Skylake-Server-IBRS is a name for a defined set of features, and it would be a bug to change "Skylake-Server-IBRS" to now contain other features.

As you have spotted yourself, people could set tsx=on on the kernel command line or, for whatever probably non-smart reason, run with a kernel without the fixes.

Therefore changing the existing "Skylake-Server-IBRS" is a no-go as an SRU; let's consider other options.

---

Upstream did create these new custom names with the -IBRS suffix when the first security issues hit. But as you know, many issues followed that one, like L1TF, MDS, ....
Upstream realized quickly that continuing this way would cause a massive proliferation of types that grows further with every new issue.

Also, back then these types were defined in qemu, not libvirt; you can see them with
  $ qemu-system-x86_64 -cpu ?
Libvirt only tracks names and features of those in /usr/share/libvirt/cpu_map*.
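
To check what libvirt records for a given name, the feature list can be read from the map file. A minimal sketch, assuming the layout of a `<cpus>` root with `<model>` elements holding `<feature name=.../>` entries; the inline sample is an abbreviated illustration and not the real Skylake-Server-IBRS definition, and `model_features` is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET

# Abbreviated, illustrative fragment (assumption): the real file lists many
# more features and extra elements.
SAMPLE = """
<cpus>
  <model name='Skylake-Server-IBRS'>
    <vendor name='Intel'/>
    <feature name='hle'/>
    <feature name='rtm'/>
    <feature name='spec-ctrl'/>
  </model>
</cpus>
"""

def model_features(cpu_map_xml, model_name):
    """Return the sorted feature names listed directly under the named model."""
    root = ET.fromstring(cpu_map_xml)
    for model in root.iter("model"):
        if model.get("name") == model_name:
            return sorted(f.get("name") for f in model.findall("feature"))
    return []  # model name not present in this map

print(model_features(SAMPLE, "Skylake-Server-IBRS"))
```

On a real system the string would come from reading the matching file under /usr/share/libvirt/cpu_map*; the exact file name depends on the libvirt version.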

---

Interestingly, for all the dangers and drawbacks of host-passthrough, in these cases such setups would not care, as they would just pass fewer features. But modelling the features in libvirt or OpenStack made them explicit, and libvirt is now correctly telling us that it can't provide them.

---

Back when the first set of Spectre mitigations hit, Daniel made a great post summarizing how configuring models & features works, including modifying the named models to your needs.
=> https://www.berrange.com/posts/2018/06/29/cpu-model-configuration-for-qemu-kvm-on-x86-hosts/
An example for your "Skylake-Server-IBRS" would now look like this in libvirt:
   <cpu mode='custom'>
       <model>Skylake-Server-IBRS</model>
       <feature name="hle" policy="disable"/>
       <feature name="rtm" policy="disable"/>
   </cpu>
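
If many guests need this change, the edit above can be scripted. A minimal sketch, assuming the guest XML is available as a string (for example from `virsh dumpxml --inactive`); `disable_cpu_features` is a hypothetical helper, not part of libvirt:

```python
import xml.etree.ElementTree as ET

def disable_cpu_features(domain_xml, features=("hle", "rtm")):
    """Return the definition with each feature set to policy='disable'."""
    root = ET.fromstring(domain_xml)
    # Accept either a full <domain> definition or a bare <cpu> element.
    cpu = root if root.tag == "cpu" else root.find("cpu")
    if cpu is None:
        raise ValueError("no <cpu> element in the definition")
    for name in features:
        for feat in cpu.findall("feature"):
            if feat.get("name") == name:
                feat.set("policy", "disable")  # override an existing entry
                break
        else:
            # No entry for this feature yet, so add one.
            ET.SubElement(cpu, "feature", {"name": name, "policy": "disable"})
    return ET.tostring(root, encoding="unicode")

guest = ("<domain><cpu mode='custom'>"
         "<model>Skylake-Server-IBRS</model></cpu></domain>")
print(disable_cpu_features(guest))
```

The result would then be fed back with `virsh define`; for OpenStack-managed guests the definition is regenerated by nova, so the change has to be made on that level instead.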

Therefore, from libvirt's perspective there isn't much we can/should do IMHO. I'll double-check whether upstream qemu/libvirt decided otherwise and again defined new types or other quirks. But looking at L1TF, MDS and such, I expect that using individual features is the intended approach.

---

Let's summarize the options we have right now:

a) You can define your own types for libvirt in /usr/share/libvirt/cpu_map, that seems tempting at first, but
  a1) you'd still need to change the type in every guest, so you gained nothing
  a2) those are not meant to be edited, e.g. they are no conffiles and will
      be overwritten on upgrades of libvirt0
b) Define a new type in qemu and then libvirt, like the -IBRS types
  b1) as I said, recent security fixes didn't do this anymore; I don't expect this to be different
  b2) this needs to be in sync with others (upstream and distros) or proliferation
      and confusion get even worse
c) Start to define your guests based on features and not (only) on names
  c1) that is what most recent security fixes used
  c2) that mechanism will work for any known features without even changing libvirt
  c3) backports of new feature flags are usually attempted, but often those flags are
      not required and are only further optimizations

a-c) and yes, for any of the above you'll need to
 1. Touch the guest definition before you can start it again after booting into the fixed kernel
    Probably worth prepping that before the reboot to reduce downtime
 2. Accept that the missing features are a guest-visible change with a low but real chance of
    things that used to work now failing (as with any feature takeaway)
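
The prep in step 1 could start with a check of which guest definitions still need touching. A rough sketch under stated assumptions: it does not know which named models include hle/rtm, so it conservatively flags every custom-mode guest that does not explicitly disable them; `features_still_enabled` is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET

def features_still_enabled(domain_xml, removed=("hle", "rtm")):
    """List the removed features a custom-mode guest does not switch off."""
    cpu = ET.fromstring(domain_xml).find("cpu")
    # A <cpu> without a mode attribute is treated as custom (assumption).
    if cpu is None or cpu.get("mode", "custom") != "custom":
        return []
    off = {f.get("name") for f in cpu.findall("feature")
           if f.get("policy") in ("disable", "forbid")}
    return [name for name in removed if name not in off]

untouched = ("<domain><cpu mode='custom'>"
             "<model>Skylake-Server-IBRS</model></cpu></domain>")
print(features_still_enabled(untouched))
```

Guests for which the list is non-empty are candidates for editing before the reboot; whether the named model really carries the feature still has to be checked against the model definition.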

@security - the USN should probably get a note about doing such prep prior to rebooting into the new kernel - opinions? Assigning to you to get this answer while I check whether, a week after TSX/TAA, upstream decided to add types this time; after all, taking away old features is a first in all of this.