cpu features hle and rtm disabled for security are present in /usr/share/libvirt/cpu_map.xml

Bug #1853200 reported by David Coronel on 2019-11-19

This bug affects 5 people

Affects            Status        Importance  Assigned to
libvirt (Ubuntu)   Fix Released  High        Christian Ehrhardt
  Bionic           Confirmed     Undecided   Ubuntu Security Team
  Eoan             Won't Fix     Undecided   Ubuntu Security Team
qemu (Ubuntu)      Fix Released  Undecided   Unassigned
  Bionic           Confirmed     Undecided   Ubuntu Security Team
  Eoan             Won't Fix     Undecided   Ubuntu Security Team

Bug Description

When trying to launch an instance in OpenStack Queens on Ubuntu 18.04 with the new kernels, this error happens:

Error: Failed to perform requested operation on instance "david", the instance has an error status: Please try again later [Error: Exceeded maximum number of retries. Exceeded max scheduling attempts 3 for instance bf8dc8b8-37dd-43fa-ace0-90fe18c1d63b. Last exception: the CPU is incompatible with host CPU: Host CPU does not provide required features: hle, rtm].

This seems to be caused by the new kernels disabling the tsx cpu feature as per https://wiki.ubuntu.com/SecurityTeam/KnowledgeBase/TAA_MCEPSC_i915

Disabling tsx also disables hle and rtm, and /usr/share/libvirt/cpu_map.xml has hle and rtm configured for many cpu models:

ubuntu@cloud3:~$ grep -e "model name" -e hle -e rtm -e tsx /usr/share/libvirt/cpu_map.xml
[...]
    <model name='Haswell'>
      <feature name='hle'/>
      <feature name='rtm'/>
    <model name='Haswell-IBRS'>
      <feature name='hle'/>
      <feature name='rtm'/>
[...]
    <model name='Broadwell'>
      <feature name='hle'/>
      <feature name='rtm'/>
    <model name='Broadwell-IBRS'>
      <feature name='hle'/>
      <feature name='rtm'/>
    <model name='Skylake-Client'>
      <feature name='hle'/>
      <feature name='rtm'/>
    <model name='Skylake-Client-IBRS'>
      <feature name='hle'/>
      <feature name='rtm'/>
    <model name='Skylake-Server'>
      <feature name='hle'/>
      <feature name='rtm'/>
    <model name='Skylake-Server-IBRS'>
      <feature name='hle'/>
      <feature name='rtm'/>
[...]
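
The same check can be scripted instead of read off grep output. The following is a minimal sketch, assuming the cpu_map.xml layout implied by the output above (model elements that contain feature children). The helper name models_with_tsx and the inline SAMPLE fragment are illustrative and not taken from libvirt, and features inherited from a parent model are not resolved.

```python
import xml.etree.ElementTree as ET

# Features that disappear from the host when the kernel disables TSX.
TSX_FEATURES = {"hle", "rtm"}


def models_with_tsx(xml_text):
    """Map each CPU model name to the TSX features it lists directly."""
    root = ET.fromstring(xml_text)
    result = {}
    for model in root.iter("model"):
        found = sorted(
            feature.get("name")
            for feature in model.findall("feature")
            if feature.get("name") in TSX_FEATURES
        )
        if found:
            result[model.get("name")] = found
    return result


# Hypothetical fragment shaped like the grep output above.
SAMPLE = """
<cpus>
  <arch name='x86'>
    <model name='Haswell-noTSX'>
      <feature name='avx2'/>
    </model>
    <model name='Skylake-Server-IBRS'>
      <feature name='hle'/>
      <feature name='rtm'/>
      <feature name='spec-ctrl'/>
    </model>
  </arch>
</cpus>
"""

if __name__ == "__main__":
    print(models_with_tsx(SAMPLE))
```

To run it against a real host, the text of /usr/share/libvirt/cpu_map.xml would be passed in place of SAMPLE.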

This only happens when configuring cpu_mode and cpu_model in /etc/nova/nova.conf:

[libvirt]
cpu_mode = custom
cpu_model = Skylake-Server-IBRS

In my case, this was done by setting the cpu-mode and cpu-model nova-compute charm options.
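
To find compute nodes that are exposed to this, one could check whether nova.conf pins a named model. This is a minimal sketch, assuming nova.conf can be read as a plain INI file; the helper name pinned_cpu_model and the SAMPLE text are illustrative only.

```python
import configparser


def pinned_cpu_model(conf_text):
    """Return the [libvirt] cpu_model when cpu_mode is 'custom', else None."""
    # interpolation=None keeps literal '%' characters in values from raising
    # errors; strict=False tolerates repeated sections or options.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    if not parser.has_section("libvirt"):
        return None
    if parser.get("libvirt", "cpu_mode", fallback=None) != "custom":
        return None
    return parser.get("libvirt", "cpu_model", fallback=None)


# Hypothetical configuration mirroring the snippet above.
SAMPLE = """
[DEFAULT]
debug = false

[libvirt]
cpu_mode = custom
cpu_model = Skylake-Server-IBRS
"""

if __name__ == "__main__":
    print(pinned_cpu_model(SAMPLE))
```

A non-None result means the node requests a fixed named model and may hit the error above if that model includes hle or rtm.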

[Additional info]

I see this issue with the following kernel and libvirt versions:

Linux cloud3 4.15.0-70-generic #79-Ubuntu SMP Tue Nov 12 10:36:11 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

ubuntu@cloud3:~$ dpkg -l | grep -e libvirt -e nova
ii libvirt-clients 4.0.0-1ubuntu8.13 amd64 Programs for the libvirt library
ii libvirt-daemon 4.0.0-1ubuntu8.13 amd64 Virtualization daemon
ii libvirt-daemon-driver-storage-rbd 4.0.0-1ubuntu8.13 amd64 Virtualization daemon RBD storage driver
ii libvirt-daemon-system 4.0.0-1ubuntu8.13 amd64 Libvirt daemon configuration files
ii libvirt0:amd64 4.0.0-1ubuntu8.13 amd64 library for interfacing with different virtualization systems
ii nova-common 2:17.0.11-0ubuntu1 all OpenStack Compute - common files
ii nova-compute 2:17.0.11-0ubuntu1 all OpenStack Compute - compute node base
ii nova-compute-kvm 2:17.0.11-0ubuntu1 all OpenStack Compute - compute node (KVM)
ii nova-compute-libvirt 2:17.0.11-0ubuntu1 all OpenStack Compute - compute node libvirt support
ii python-libvirt 4.0.0-1 amd64 libvirt Python bindings
ii python-nova 2:17.0.11-0ubuntu1 all OpenStack Compute Python libraries
ii python-novaclient 2:9.1.1-0ubuntu1 all client library for OpenStack Compute API - Python 2.7

ubuntu@cloud3:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.3 LTS
Release: 18.04
Codename: bionic

[Workaround]

A workaround is to remove the cpu_mode and cpu_model lines in the libvirt section of /etc/nova/nova.conf.

This can be done with juju like this:

juju config nova-compute-kvm --reset cpu-model
juju config nova-compute-kvm --reset cpu-mode

Apparently another workaround would be to re-enable the TSX CPU feature on the host by adding tsx=on to the kernel boot command line, but I have not tested that workaround.
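
Before picking a workaround, it can help to confirm that the host really lacks the two flags. This is a minimal sketch, assuming the usual "flags : ..." line format of /proc/cpuinfo on x86; the helper names host_flags and missing_features are illustrative.

```python
import os


def host_flags(cpuinfo_text):
    """Return the CPU flags of the first processor entry as a set."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_features(cpuinfo_text, required=("hle", "rtm")):
    """List the required features that the host does not advertise."""
    flags = host_flags(cpuinfo_text)
    return [name for name in required if name not in flags]


if __name__ == "__main__":
    path = "/proc/cpuinfo"
    if os.path.exists(path):
        with open(path) as handle:
            print(missing_features(handle.read()))
```

An empty list means both flags are present, so a named model that requires them should still be accepted on that host.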

Related branches

~paelzer/ubuntu/+source/libvirt:lp-1867460-fix-domcapabilities-focal

Merged into ubuntu/+source/libvirt:ubuntu/focal-devel at revision 6702d5fcd45475e8f5e0a5acb5fec2916b99b964

Rafael David Tinoco (community): Approve on 2020-03-25

Canonical Server: Pending requested 2020-03-20

Canonical Server packageset reviewers: Pending requested 2020-03-20

Diff: 5704 lines (+5514/-0)

31 files modified

debian/changelog (+13/-0)
debian/patches/series (+29/-0)
debian/patches/stable/lp-1868539-bhyve-command-remove-unused-includes.patch (+41/-0)
debian/patches/stable/lp-1868539-daemon-set-default-memlock-limit-for-systemd-service.patch (+94/-0)
debian/patches/stable/lp-1868539-m4-libxl-properly-fail-when-libxl-is-required.patch (+47/-0)
debian/patches/stable/lp-1868539-qemu-Don-t-compare-local-and-remote-hostnames-on-mig.patch (+62/-0)
debian/patches/stable/lp-1868539-qemu-Stop-domain-on-failed-restore.patch (+104/-0)
debian/patches/stable/lp-1868539-qemu-Use-g_autoptr-for-qemuDomainSaveCookie.patch (+140/-0)
debian/patches/stable/lp-1868539-qemu-do-not-revert-to-NULL-bandwidth.patch (+45/-0)
debian/patches/stable/lp-1868539-qemu-preserve-error-on-bandwidth-rollback.patch (+59/-0)
debian/patches/stable/lp-1868539-qemu-save-restore-original-error-when-recovering-fro.patch (+60/-0)
debian/patches/stable/lp-1868539-qemu-use-correct-backendType-when-checking-memfd-cap.patch (+46/-0)
debian/patches/stable/lp-1868539-qemuDomainGetStatsIOThread-Don-t-leak-array-with-0-i.patch (+49/-0)
debian/patches/stable/lp-1868539-qemuDomainSaveImageStartVM-Use-VIR_AUTOCLOSE-for-int.patch (+50/-0)
debian/patches/stable/lp-1868539-qemuDomainSaveImageStartVM-Use-g_autoptr-for-virComm.patch (+40/-0)
debian/patches/stable/lp-1868539-qemuTestParseCapabilitiesArch-Free-binary.patch (+52/-0)
debian/patches/stable/lp-1868539-security-Try-harder-to-run-transactions.patch (+97/-0)
debian/patches/stable/lp-1868539-tests-fix-double-unlock-of-monitor-in-hotplug-test.patch (+64/-0)
debian/patches/stable/lp-1868539-testutils-check-return-value-of-g_setenv.patch (+39/-0)
debian/patches/stable/lp-1868539-testutilsxen-error-out-on-initialization-failure.patch (+42/-0)
debian/patches/stable/lp-1868539-virDomainFSDefFree-Unref-private-data.patch (+52/-0)
debian/patches/stable/lp-1868539-virsystemdtest-do-not-leak-socket-path.patch (+55/-0)
debian/patches/stable/lp-1868539-vz-Fix-return-value-in-error-path.patch (+49/-0)
debian/patches/ubuntu/lp-1853200-cpu_map-Add-decode-element-to-x86-CPU-model-definiti.patch (+741/-0)
debian/patches/ubuntu/lp-1853200-cpu_map-Add-more-noTSX-x86-CPU-models.patch (+695/-0)
debian/patches/ubuntu/lp-1853200-cpu_map-Don-t-use-new-noTSX-models-for-host-model-CP.patch (+129/-0)
debian/patches/ubuntu/lp-1853200-cpu_x86-Honor-CPU-models-decode-element.patch (+59/-0)
debian/patches/ubuntu/lp-1853200-cputest-Add-data-for-Intel-R-Core-TM-i7-8550U-CPU-wi.patch (+2022/-0)
debian/patches/ubuntu/lp-1867460-qemu-fixing-auto-detecting-binary-in-domain-capabili.patch (+115/-0)
debian/patches/ubuntu/lp-1867460-qemu_capabilities-Rework-domain-caps-cache.patch (+325/-0)
debian/patches/ubuntu/lp-1868528-util-virhostcpu-Fail-when-fetching-CPU-Stats-for-inv.patch (+99/-0)

CVE References

CVE-2019-11135

David Coronel (davecore) on 2019-11-19

description: updated

Christian Ehrhardt  (paelzer) wrote on 2019-11-20 (last edit on 2023-08-14):

Hi Dave,
IIRC OpenStack either tries to determine the lowest common denominator (in CPU features) or uses whatever you pass to it; in your case that was:
[libvirt]
cpu_mode = custom
cpu_model = Skylake-Server-IBRS

Your guest definition won't change after the initial definition. Even if you ran host-model instead of a named type, it would (in the past) have detected the hle and rtm features, and the guest now can't start with them.

But Skylake-Server-IBRS is a name for a defined set of features, and it would be a bug to change "Skylake-Server-IBRS" to now contain other features.

As you have spotted yourself, people could set tsx=on on the kernel command line or, for whatever probably non-smart reason, run a kernel without the fixes.

Therefore changing the existing "Skylake-Server-IBRS" is a no-go as an SRU; let's consider other options.

---

Upstream did create these new custom names with the -IBRS suffix when the first security issues hit. But as you know there were many issues following that one like L1TF, MDS, ....
Upstream realized quickly that this would be a massive type proliferation that grows even further every now and then.

Also these types back then got defined in qemu not libvirt, you can see them with
$ qemu-system-x86_64 -cpu ?
Libvirt only tracks names and features of those in /usr/share/libvirt/cpu_map*.

---

Interestingly, for all the dangers and drawbacks of host-passthrough, setups using it would not care in these cases, as they would just pass fewer features. But modelling the features in libvirt or OpenStack made them explicit, and it is now correctly telling us that it can't provide those.

---

Back when the first set of Spectre mitigations hit, Daniel made a great post summarizing how configuring models and features works, including modifying the named models to your needs.
=> https://www.berrange.com/posts/2018/06/29/cpu-model-configuration-for-qemu-kvm-on-x86-hosts/
An example for your "Skylake-Server-IBRS" would now be in libvirt like:
   <cpu mode='custom'>
       <model>Skylake-Server-IBRS</model>
       <feature name="hle" policy="disable"/>
       <feature name="rtm" policy="disable"/>
   </cpu>

Therefore from libvirt's perspective there isn't much we can/should do IMHO, I'll double check if upstream on qemu/libvirt considered otherwise and again defined new types or other quirks. But looking at L1TF, MDS and such I'm expecting that using individual features is what is expected.
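
The per-guest override shown above can also be generated rather than hand-edited. This is a minimal sketch using Python's standard ElementTree module; the helper name cpu_element is illustrative, and applying the result to a real domain definition is left out.

```python
import xml.etree.ElementTree as ET


def cpu_element(model, disabled):
    """Build a <cpu mode='custom'> element that disables the given features."""
    cpu = ET.Element("cpu", {"mode": "custom"})
    ET.SubElement(cpu, "model").text = model
    for name in disabled:
        ET.SubElement(cpu, "feature", {"name": name, "policy": "disable"})
    return cpu


if __name__ == "__main__":
    element = cpu_element("Skylake-Server-IBRS", ["hle", "rtm"])
    print(ET.tostring(element, encoding="unicode"))
```

Keeping the named model and listing only the removed features makes the difference from the original definition explicit.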

---

Let's summarize the options we have right now:

a) You can define your own types for libvirt in /usr/share/libvirt/cpu_map, that seems tempting at first, but
  a1) you'd still need to change the type in every guest, so you gained nothing
  a2) those are not meant to be edited, e.g. they are no conffiles and will
      be overwritten on upgrades of libvirt0
b) Define a new type in qemu and then libvirt as the -IBRS types
  b1) as I said recent security fixes didn't do this anymore, I don't expect this to be different
  b2) this needs to be in sync with others (upstream and distros) or proliferation
      and confusion gets even worse
c) Start to define your guests based on feature and not (only) on names
  c1) that is what most recent security fixes ...

a-c) And yes, for any of the above:
 1. You'll need to touch the guest definition before you can start it again after booting into the fixed kernel.
    It is probably worth preparing that before the reboot to reduce downtime.
 2. The removal of the features will be a guest-visible change with a low but real chance of
    things that used to work now failing (as with any feature takeaway).

@security - the USN should probably get a note about doing such prep prior to rebooting into the new kernel - opinions? Assigning to you to get this answer while I look into whether, a week after TSX/TAA, upstream decided to add types this time - after all, taking away old features is a first in all of this.

Changed in libvirt (Ubuntu):
assignee:	nobody → Ubuntu Security Team (ubuntu-security)
importance:	Undecided → High
status:	New → Confirmed

Christian Ehrhardt  (paelzer) wrote on 2019-11-20:

I found no changes since the CRD nor a discussion on the ML of qemu or libvirt.
There are the backported fixes that add taa-no and pschange-mc-no to qemu - but nothing touches hle/rtm yet.

The virt stack's general answer to avoiding type proliferation is versioned CPU models.
=> https://git.qemu.org/?p=qemu.git;a=commit;h=aa5b969287125d1924d74648b378d4abba544465
=> https://git.qemu.org/?p=qemu.git;a=commit;h=d86a708815c3bec0b934760e6bdab7eb647087b8

But that is a feature that will be in qemu >4.1 and newer libvirt, so look towards Ubuntu 20.04 and UCA derived from that for this feature.
It seems way too big and invasive for an SRU.
But even if you had that, you'd either have
a) unversioned type without consistency (might change on update, which isn't what you want either)
b) versioned types, which stay static and behave exactly like what we have now

@security - were there discussions about how to handle these feature losses in the "closed circle" around the TSX/TAA bug?

Christian Ehrhardt  (paelzer) wrote on 2019-11-20:

@UCA-Team
I'm also subscribing the UCA Team as I personally only know about [1] which allows you to use named CPU models.

I expect there already is some general pattern established to handle e.g. the older MDS which if you look at "Configuration as a Hypervisor" at [2] also needs such <feature...> entries to be configured through libvirt.
Eventually dropping hle/rtm is no different than adding md-clear back then.

@UCATeam - In an OS/Charm world, how would one add or disable individual CPU features?
Could you outline how this is handled today so we might consider adapting the same for hle/rtm?

[1]: https://wiki.openstack.org/wiki/LibvirtXMLCPUModel
[2]: https://wiki.ubuntu.com/SecurityTeam/KnowledgeBase/MDS

Corey Bryant (corey.bryant) wrote on 2019-11-20:

There's a related fix for enabling CPU feature flags that landed in upstream nova and charm-nova-compute via LP: #1750829 as a result of Meltdown.

There's mention in the upstream fix [1] that a future patch will allow disabling of CPU feature flags, but I'm not sure if that has landed. I'll dig some more to see.

[1] https://review.opendev.org/#/c/534384/

FYI, the config option for the nova-compute charm that corresponds to this change is cpu-model-extra-flags, defined as a space-delimited list of specific CPU flags for libvirt.

Corey Bryant (corey.bryant) wrote on 2019-11-20:

I chatted with Kashyap in #openstack-nova and he has a blueprint up for disabling CPU feature flags at: https://blueprints.launchpad.net/nova/+spec/allow-disabling-cpu-flags

Kashyap mentioned an option for now: One (very valid) 'workaround' is to have QEMU add new "named CPU models" that remove the said flags. For that, upstream QEMU folks must add them... please file a QEMU "RFE" bug on Launchpad for it.

@Cpaelzer, thoughts on that ^ ?

Steve Beattie (sbeattie) wrote on 2019-11-20:

@Cpaelzer there were alas no discussions pre-disclosure beyond adding the basic taa-no and pschange-mc-no flag support to qemu.

We can definitely add text to the USN about this; let's get it in the KnowledgeBase article first since that's easier to modify.

Kashyap Chamarthy (kashyapc) wrote on 2019-11-20:

A small addendum to what Corey said: Upstream QEMU will most likely be providing new named CPU models with the 'hle' and 'rtm' CPU flags turned off.

Keep an eye on the upstream 'qemu-devel' list :-)

Corey Bryant (corey.bryant) wrote on 2019-11-20:

Patches are in progress for new CPU models with TSX disabled: https://lists.gnu.org/archive/html/qemu-devel/2019-11/msg03323.html (Thanks Cpaelzer and Kashyap)

Christian Ehrhardt  (paelzer) wrote on 2019-11-22:

This was now accepted upstream in qemu.
[1] is the merge commit containing the new names discussed here (02fa60d1 / 9ab2237f), but also tsx-ctrl, which seems to be part of the overall TSX handling and was not yet part of last week's patches (2a9758c5).

@security - are you gonna take a look at picking those up?

P.S. I haven't seen similar updates to libvirt CPU maps yet.

[1]: https://git.qemu.org/?p=qemu.git;a=commit;h=2061735ff09f9d5e67c501a96227b470e7de69b1

David Coronel (davecore) wrote on 2019-11-22:

#10

Subscribed ~field-medium

Christian Ehrhardt  (paelzer) wrote on 2019-11-22:

#11

FYI: I got a message from sbeattie that security will take a look at the backports for these qemu fixes.

For libvirt we can check again then if the types got updated, and if no one did I can send something there as well.

Nobuto Murata (nobuto) wrote on 2019-12-08:

#12

I'm not sure I'm following the discussion here, but I see hle and rtm flags with the latest security update for bionic GA kernel. So looks like the issue is not reproducible any longer. Am I missing something?

$ dpkg -l | grep linux-image
ii linux-image-4.15.0-72-generic 4.15.0-72.81 amd64 Signed kernel image generic
ii linux-image-generic 4.15.0.72.74 amd64 Generic Linux kernel image

$ uname -a
Linux <hostname> 4.15.0-72-generic #81-Ubuntu SMP Tue Nov 26 12:20:02 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
...
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Gold 6150 CPU @ 2.70GHz
Stepping: 4
...
Virtualization: VT-x
...
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_pkg_req pku ospke md_clear flush_l1d

Christian Ehrhardt  (paelzer) wrote on 2019-12-09:

#13

You said this might have been resolved differently anyway, with the newest kernel having hle/rtm enabled again - I haven't heard about it, but that would probably be even better.
Let's look at the kernel side.
- Fixes for CVE-2019-11135 got added in 4.15.0-69.78
- This was reported against 4.15.0-70
- Wondering about 4.15.0-72 being ok again

Reading the latest state of Documentation/admin-guide/hw-vuln/tsx_async_abort.rst shows:
https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/tsx_async_abort.html#mitigation-control-on-the-kernel-command-line

Maybe the initial take was tsx=off, which would switch off those flags. But is it now one of the tsx=on variants with full mitigations? I'm guessing at this point.
I have not found a clear kernel change since then (not until 4.15.0-73.82, but even less so between .70 and .72) that would change these.
The only related "- x86/speculation/taa: Fix printing of TAA_MSG_SMT on IBRS_ALL CPUs" seems to only affect print output, but not change behavior.

Furthermore none of the systems I have has got hle/rtm back since then.

@Nobuto - has your system any of the above kernel parameters set manually?

I haven't heard about this from sbeattie or others after my last update.
Let's ping security to be sure this hasn't been forgotten.
(I have done that on IRC as well)
@Security - any updates on this from your side?

tsx=on     tsx_async_abort=full         The system will use VERW to clear CPU
                                        buffers. Cross-thread attacks are still
                                        possible on SMT machines.
tsx=on     tsx_async_abort=full,nosmt   As above, cross-thread attacks on SMT
                                        mitigated.
tsx=on     tsx_async_abort=off          The system is vulnerable.
tsx=off    tsx_async_abort=full         TSX might be disabled if microcode
                                        provides a TSX control MSR. If so,
                                        system is not vulnerable.
tsx=off    tsx_async_abort=full,nosmt   Ditto
tsx=off    tsx_async_abort=off          ditto
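
To see which row of this table a host falls into, the two parameters can be pulled from the running kernel's command line. This is a minimal sketch, assuming the command line is available from /proc/cmdline; the helper name tsx_params is illustrative. A parameter missing from the result only means it was not set explicitly, so the kernel's built-in default applies.

```python
import os


def tsx_params(cmdline):
    """Extract explicitly set tsx= and tsx_async_abort= values from a command line."""
    wanted = ("tsx", "tsx_async_abort")
    found = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key in wanted:
            found[key] = value
    return found


if __name__ == "__main__":
    path = "/proc/cmdline"
    if os.path.exists(path):
        with open(path) as handle:
            print(tsx_params(handle.read()))
```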

Nobuto Murata (nobuto) wrote on 2019-12-09:

#14

> @Nobuto - has your system any of the above kernel parameters set manually?

No, we don't have any flags in kernel parameters related to tsx or similar.

FWIW, I haven't tested any older kernel to check if those flags are available. But we are using Intel(R) Xeon(R) Gold 6150 CPU @ 2.70GHz.

Christian Ehrhardt  (paelzer) wrote on 2019-12-09:

#15

@Nobuto:
Interesting, what is the output of:
grep . /sys/devices/system/cpu/vulnerabilities/*

Nobuto Murata (nobuto) wrote on 2019-12-09:

#16

Here you are:

$ grep . /sys/devices/system/cpu/vulnerabilities/*
/sys/devices/system/cpu/vulnerabilities/itlb_multihit:KVM: Mitigation: Split huge pages
/sys/devices/system/cpu/vulnerabilities/l1tf:Mitigation: PTE Inversion; VMX: conditional cache flushes, SMT vulnerable
/sys/devices/system/cpu/vulnerabilities/mds:Mitigation: Clear CPU buffers; SMT vulnerable
/sys/devices/system/cpu/vulnerabilities/meltdown:Mitigation: PTI
/sys/devices/system/cpu/vulnerabilities/spec_store_bypass:Mitigation: Speculative Store Bypass disabled via prctl and seccomp
/sys/devices/system/cpu/vulnerabilities/spectre_v1:Mitigation: usercopy/swapgs barriers and __user pointer sanitization
/sys/devices/system/cpu/vulnerabilities/spectre_v2:Mitigation: Full generic retpoline, IBPB: conditional, IBRS_FW, STIBP: conditional, RSB filling
/sys/devices/system/cpu/vulnerabilities/tsx_async_abort:Mitigation: Clear CPU buffers; SMT vulnerable

Christian Ehrhardt  (paelzer) wrote on 2019-12-09:

#17

Oh, so you seem to have a combination of HW/FW/kernel that gets along with it. But for e.g. a hosting environment, please be aware of the "SMT vulnerable" status unless you have SMT disabled anyway.
Since different systems (HW/FW/Kernel) will behave differently I think this issue isn't resolved yet @Nobuto.

@Security please let us know what your plan is on that - backporting the named types with the features removed to qemu/libvirt or did this change overall.
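
The "SMT vulnerable" caveat can be checked across hosts by scanning the same grep output. This is a minimal sketch, assuming input in the "path:status" form shown in comment #16; the helper name smt_vulnerable is illustrative and the SAMPLE lines are copied from that comment.

```python
def smt_vulnerable(report_text):
    """Return the vulnerability names whose status mentions 'SMT vulnerable'."""
    names = []
    for line in report_text.splitlines():
        # Split on the first colon only: the status text contains colons too.
        path, sep, status = line.partition(":")
        if sep and "SMT vulnerable" in status:
            names.append(path.rsplit("/", 1)[-1])
    return names


SAMPLE = """\
/sys/devices/system/cpu/vulnerabilities/l1tf:Mitigation: PTE Inversion; VMX: conditional cache flushes, SMT vulnerable
/sys/devices/system/cpu/vulnerabilities/meltdown:Mitigation: PTI
/sys/devices/system/cpu/vulnerabilities/tsx_async_abort:Mitigation: Clear CPU buffers; SMT vulnerable
"""

if __name__ == "__main__":
    print(smt_vulnerable(SAMPLE))
```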

Janåke Rönnblom (jan-ake) wrote on 2019-12-09:

#18

Hi,

There is also the microcode package where Intel has published and updated these flags. So this in combination with the kernel might cause these errors.

-J

Christian Ehrhardt  (paelzer) wrote on 2020-01-14:

#19

FYI: I pinged the security Team again on this one to be sure it doesn't fall through the cracks

Marc Deslauriers (mdeslaur) wrote on 2020-02-11:

#20

These look like the list of commits to support recent kernel/microcode feature updates:

qemu:
https://git.qemu.org/?p=qemu.git;a=commit;h=7fac38635e1cc5ebae34eb6530da1009bd5808e4 (taa)
https://git.qemu.org/?p=qemu.git;a=commit;h=0723cc8a5558c94388db75ae1f4991314914edd3 (vmx)
https://git.qemu.org/?p=qemu.git;a=commit;h=2a9758c51e2c2d13fc3845c3d603c11df98b8823 (tsx)
https://git.qemu.org/?p=qemu.git;a=commit;h=9ab2237f1979f31de228b2a73b56925dbde938d1 (tsx)
https://git.qemu.org/?p=qemu.git;a=commit;h=02fa60d10137ed2ef17534718d7467e0d2170142 (tsx)
https://git.qemu.org/?p=qemu.git;a=commit;h=c6f3215ffa3064fd04e00d2b159c3b90c3c9b1a5 (vmx)

libvirt:
https://libvirt.org/git/?p=libvirt.git;a=commit;h=07aaced4e6ea6db8b27f44636f51cafa6f1847a8 (taa)
https://libvirt.org/git/?p=libvirt.git;a=commit;h=f411b7ef68221e82dec0129aaf2f2a26a8987504 (tsx)

Marc Deslauriers (mdeslaur) wrote on 2020-02-12:

#21

I don't know what the way forward is to resolve this issue. While upstream qemu has added some new CPU models, "Skylake-Client-noTSX-IBRS", "Skylake-Server-noTSX-IBRS", etc, libvirt has not. If I do add these to libvirt, we will need to carry them forward as a delta to upstream possibly forever.

Even adding those new CPU models is just a workaround that makes manually changing the CPU model easier to do, but does not present a solution for this issue. There is no automatic way to fix this issue that wouldn't cause migration failures.

Christian Ehrhardt  (paelzer) on 2020-02-26

Changed in libvirt (Ubuntu):
status:	Confirmed → Won't Fix

Christian Ehrhardt  (paelzer) wrote on 2020-03-04:

#22

FYI: bug 1861643 might be a new symptom of the same root cause. If that is confirmed we might bump this for re-considering our options ... again (at least going forward we might want to add Skylake-Client-noTSX-IBRS and such to qemu in Focal).

Christian Ehrhardt  (paelzer) wrote on 2020-03-04:

#23

.. or adding the types to libvirt, whatever is needed to at least limit the impact of dropping hle/rtm in 20.04 ...

Christian Ehrhardt  (paelzer) wrote on 2020-03-04:

#24

Upstream agrees, we need to add new types to libvirt.
The discussion is still ongoing, but I'm prepping a submission of those types ...

Christian Ehrhardt  (paelzer) wrote on 2020-03-04:

#25

List of types to consider adding by comparing qemu qmp output of cpu probing with cpu_maps of libvirt:

-Cascadelake-Server-noTSX
-Icelake-Client-noTSX
-Icelake-Server-noTSX
-Skylake-Server-noTSX-IBRS
-Skylake-Client-noTSX-IBRS
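
The comparison behind this list is a set difference. This is a minimal sketch, assuming the two lists of model names have already been collected (for example from QEMU's QMP query-cpu-definitions command and from virsh cpu-models x86_64); the helper name missing_in_libvirt and the sample names in the usage part are illustrative and are not the actual probe output.

```python
def missing_in_libvirt(qemu_models, libvirt_models):
    """Return model names known to QEMU but absent from libvirt, sorted."""
    known = set(libvirt_models)
    return sorted(name for name in set(qemu_models) if name not in known)


if __name__ == "__main__":
    # Hypothetical inputs; real lists would come from the two tools.
    qemu = ["Skylake-Server-IBRS", "Skylake-Server-noTSX-IBRS", "Icelake-Client-noTSX"]
    libvirt = ["Skylake-Server-IBRS"]
    print(missing_in_libvirt(qemu, libvirt))
```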

Christian Ehrhardt  (paelzer) wrote on 2020-03-06:

#26

Submitted to upstream libvirt as:
=> https://www.redhat.com/archives/libvir-list/2020-March/msg00175.html

Christian Ehrhardt  (paelzer) wrote on 2020-03-10:

#27

FYI: discussion goes on, v2 now submitted to the thread on the ML

Christian Ehrhardt  (paelzer) wrote on 2020-03-11:

#28

No reply yet on v2:
https://www.redhat.com/archives/libvir-list/2020-March/msg00296.html

Christian Ehrhardt  (paelzer) wrote on 2020-03-13:

#29

FYI I pinged on the list, as I see that "auto-detect other cpu types" is not linked to the submission I made. But OTOH we can't go ahead and integrate this in Ubuntu ahead of time while it is not accepted, or we would risk diverging and making things even worse ...

Christian Ehrhardt  (paelzer) wrote on 2020-03-13:

#30

diverge ... in the details of named CPU types I mean, breaking cross-system and cross-release behavior

Christian Ehrhardt  (paelzer) wrote on 2020-03-23:

#31

Updated task state to reflect my work with upstream to get libvirt to know about this change.

Changed in libvirt (Ubuntu):
assignee:	Ubuntu Security Team (ubuntu-security) → Christian Ehrhardt  (paelzer)
status:	Won't Fix → Triaged

Christian Ehrhardt  (paelzer) wrote on 2020-03-23:

#32

PPA: https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/3986
MP: https://code.launchpad.net/~paelzer/ubuntu/+source/libvirt/+git/libvirt/+merge/380942

Christian Ehrhardt  (paelzer) wrote on 2020-03-23:

#33

Tests with the PPA build using the patches sent by Jiri (and my addition of the noTSX types):

#1 virsh capabilities
Before:
Broadwell-noTSX-IBRS + 33 features

After:
Skylake-Client-noTSX-IBRS +24 features

=> good

#2 virsh domcapabilities
Before:
Skylake-Client-IBRS + 16 features

After:
Skylake-Client-IBRS + 16 features (unchanged as intended for compatibility)

#3 usable models
Before:
only older types

After:
now added "Skylake-Client-noTSX-IBRS" which is a more modern IBRS type than the others I had

My system isn't new enough to get the others added, but that is fine as a test.
Also the type "Skylake-Server-noTSX-IBRS" worked, auto-disabling the avx features my chip is missing.

I started a guest with such a type through libvirt and it looks as expected:
-cpu Skylake-Client-noTSX-IBRS

#4 and finally the adapted tests still ran fine at build time.

Changed in libvirt (Ubuntu):
status:	Triaged → Fix Committed

Christian Ehrhardt  (paelzer) wrote on 2020-03-25:

#34

FYI Regression test is fine on this build, but waiting a bit to give it a chance to be upstream committed

Christian Ehrhardt  (paelzer) wrote on 2020-03-25:

#35

Since tests are good I asked if they are ready to be committed and got:
[13:31] <jdenemar> cpaelzer_: I'll commit it later today

With that I can later on replace the preliminary patches with the final ones before an upload.

Christian Ehrhardt  (paelzer) wrote on 2020-03-26:

#36

Upstream committed as:
dd17a4eba8 cpu_map: Add more -noTSX x86 CPU models
f4914045c2 cpu_map: Add <decode> element to x86 CPU model definitions
7cd896ef31 cpu_x86: Honor CPU models' <decode> element
17cdefe5f1 cpu_map: Don't use new noTSX models for host-model CPUs

I have replaced my patches with the final ones which comes down to replacing the link to the upstream commit from an ML entry to a commit.

Launchpad Janitor (janitor) wrote on 2020-03-26:

#37

This bug was fixed in the package libvirt - 6.0.0-0ubuntu6

---------------
libvirt (6.0.0-0ubuntu6) focal; urgency=medium

  * d/p/ubuntu/lp-1867460-*: fix domcapabilities before capabilities
    and binary autodetection in general (LP: #1867460)
  * d/p/stable/lp-1868539-*: stabilize libvirt by backporting upstream
    fixes (LP: #1868539)
  * d/p/ubuntu/lp-1853200*: add cpu models without hle/rtm features to have
    modern types on kernels with recent security fixes (LP: #1853200)
  * d/p/ubuntu/lp-1868528-*: Fail when fetching CPU Status for invalid CPU
    (LP: #1868528)

-- Christian Ehrhardt <email address hidden> Fri, 20 Mar 2020 10:34:19 +0100

Changed in libvirt (Ubuntu):
status:	Fix Committed → Fix Released

Janåke Rönnblom (jan-ake) wrote on 2020-03-27:

#38

Will these changes go into 18.04.x also?

-J

Christian Ehrhardt  (paelzer) wrote on 2020-03-27:

#39

A while back Marc checked the case and realized that backporting the qemu changes without anything in libvirt would make no sense - comment #21.
Now the things in libvirt exist, which is a step forward.

This still will be no transparent solution; people will have to switch types if they can't run the old types. And it appears that anyone willing to change the cpu-model could also add feature-disable entries for hle/rtm with the same effort.

The special case is for situations in which people can select cpu-models but not define custom features - for those backporting these would help to mitigate the impact of the CVE related TSX/TAA kernel changes that started all of this.

I'd suggest to let this mature in focal for a few days, see if people or tests run into issues. And then ask Marc to re-evaluate again.
I'll add qemu/libvirt backport tasks and assign them to Ubuntu security - so that they can comment on what they think (now with the libvirt changes existing).

Changed in qemu (Ubuntu):
status:	New → Fix Released
Changed in libvirt (Ubuntu Bionic):
assignee:	nobody → Ubuntu Security Team (ubuntu-security)
Changed in libvirt (Ubuntu Eoan):
assignee:	nobody → Ubuntu Security Team (ubuntu-security)
Changed in qemu (Ubuntu Bionic):
assignee:	nobody → Ubuntu Security Team (ubuntu-security)
Changed in qemu (Ubuntu Eoan):
assignee:	nobody → Ubuntu Security Team (ubuntu-security)

panticz.de (panticz.de) wrote on 2020-05-26:

#40

A quick workaround for those who need hle and rtm CPU flags back is to set the tsx=on kernel boot parameter:

# /etc/default/grub
...
GRUB_CMDLINE_LINUX_DEFAULT="... tsx=on"
...

update-grub

Tested on Ubuntu 20.04 with kernel 5.4.0-31-generic:
# uname -a
Linux com1-dev 5.4.0-31-generic #35-Ubuntu SMP Thu May 7 20:20:34 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

# lscpu | grep Flags
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm arat pln pts pku ospke avx512_vnni md_clear flush_l1d arch_capabilities

Brian Murray (brian-murray) wrote on 2020-08-18:

#41

The Eoan Ermine has reached end of life, so this bug will not be fixed for that release

Changed in libvirt (Ubuntu Eoan):
status:	New → Won't Fix
Changed in qemu (Ubuntu Eoan):
status:	New → Won't Fix

Launchpad Janitor (janitor) wrote on 2020-10-08:

#43

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in libvirt (Ubuntu Bionic):
status:	New → Confirmed
Changed in qemu (Ubuntu Bionic):
status:	New → Confirmed
