[FFe/SRU] Add/Backport EPYC-v3 and EPYC-Rome CPU model

Bug #1887490 reported by Markus Schade on 2020-07-14
This bug affects 1 person
Affects            Importance  Assigned to
libvirt (Ubuntu)   Undecided   Unassigned
  Focal            Undecided   Unassigned
linux (Ubuntu)     Undecided   Unassigned
  Focal            Medium      Unassigned
qemu (Ubuntu)      Undecided   Unassigned
  Focal            High        Unassigned

Bug Description

[Impact]

 * CPU definitions are added to libvirt as these CPUs are known
   and added to qemu for execution.
   And due to that over time some are considered missing in
   former releases.

 * To really benefit from the new features of these chips
   they have to be known, therefore new type additions done by
   upstream should be backported if they generally apply and do
   not depend on SRU-critical changes.

 * This backports three upstream fixes that just add definitions
   (no control flow changes)

[Test Case]

 * Check if it has an EPYC-Rome entry in
   /usr/share/libvirt/cpu_map/index.xml and the file included
   there exists.

 * Define a guest like:
   <cpu mode='custom' match='exact' check='partial'>
     <model fallback='forbid'>EPYC-Rome</model>
   </cpu>
   You can only "really" start this on a system with the
   matching HW. But even on others it will change from:
     error: internal error: Unknown CPU model EPYC-Rome
   to being unable to start for some features missing.

 * libvirt probes a system if a named cpu can be used, after the
   fix this should include EPYC-Rome
   $ virsh domcapabilities | grep EPYC
      <model usable='no'>EPYC-IBPB</model>
      <model usable='no'>EPYC</model>
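
The first test step (index.xml entry plus existence of the included file) could be scripted roughly as below. This is only a sketch: the helper names are made up for illustration, and it assumes the included file follows the `<arch>_<model>.xml` naming seen for x86_EPYC.xml.

```python
import os
import xml.etree.ElementTree as ET


def find_model_include(index_xml_text, model):
    """Return the filename that index.xml includes for `model`, or None.

    Assumption: map files are named <arch>_<model>.xml, so we match on
    the "_<model>.xml" suffix of each <include filename="..."/> entry.
    """
    wanted_suffix = "_" + model + ".xml"
    for include in ET.fromstring(index_xml_text).iter("include"):
        filename = include.get("filename", "")
        if filename.endswith(wanted_suffix):
            return filename
    return None


def model_is_installed(model, cpu_map_dir="/usr/share/libvirt/cpu_map"):
    """True if index.xml lists the model and the included file exists."""
    with open(os.path.join(cpu_map_dir, "index.xml")) as index_file:
        filename = find_model_include(index_file.read(), model)
    if filename is None:
        return False
    return os.path.isfile(os.path.join(cpu_map_dir, filename))
```

On a fixed system `model_is_installed("EPYC-Rome")` would be expected to return True; before the fix the entry is missing and it returns False.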

[Regression Potential]

 * Usually these type additions are safe unless they add control flow
   changes (e.g. to handle yet unknown types of registers or such) but
   that isn't the case here.
   A regression if any is to be expected on systems that are close to the
   newly added type(s). Those will after the update be detected as such
   if e.g. host-model is used. If then running on a mixed cluster of
   updated/non-updated systems migrations will only work if the target is
   updated as well.

[Other Info]

 * This is the first build since glibc 2.32 arrived in groovy, hence we
   need to be careful of the fix done for bug 1892826.
   It has to be checked if the linking is fine after the rebuild.
   The workload still works in groovy despite 2.32 being present (I'd have
   expected it not to), so we will keep the revert as-is for now.
   To be sure that adds two tests that shall be done:
   - check the linking to point to libtirpc instead of glibc
     $ eu-readelf -a /usr/lib/libvirt/libvirt_lxc | grep xdr_uint64 | grep GLOBAL
     Was pointing to glibc, does it still and if so does it work (see
     below)?
   - run the autopkgtest cases as the LXC tests would trigger an issue if
     there is one

----

## Qemu SRU ##

[Impact]

 * CPU definitions are added to qemu as these CPUs are known.
   As a result, over time they end up missing in former releases.

 * To really benefit from the new features of these chips
   they have to be known, therefore new type additions done by
   upstream should be backported if they generally apply and do
   not depend on SRU-critical changes.

 * This backports two upstream fixes that just add definitions (no
   control flow changes)

[Test Case]

 * Probe qemu for the known CPU types (works on all HW)
   $ qemu-system-x86_64 -cpu ? | grep EPYC
   Focal without fix:
   x86 EPYC (alias configured by machine type)
   x86 EPYC-IBPB (alias of EPYC-v2)
   x86 EPYC-v1 AMD EPYC Processor
   x86 EPYC-v2 AMD EPYC Processor (with IBPB)
   Focal with fix also adds:
   x86 EPYC-Rome (alias configured by machine type)
   x86 EPYC-Rome-v1 AMD EPYC-Rome Processor
   x86 EPYC-v3 AMD EPYC Processor

 * Given such HW is available start a KVM guest using those new types
   Since we don't have libvirt support (yet) do so directly in qemu
   commandline like (bootloader is enough)
   $ qemu-system-x86_64 -cpu EPYC-Rome -machine pc-q35-focal,accel=kvm -nographic
   $ qemu-system-x86_64 -cpu EPYC-v3 -machine pc-q35-focal,accel=kvm -nographic
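
For the first check, comparing the `-cpu ?` listing before and after the update can be automated. A minimal sketch, assuming each relevant line starts with the architecture tag followed by the model name (as in the output quoted above); the function name is hypothetical:

```python
def parse_cpu_models(cpu_help_output, arch="x86"):
    """Collect model names from `qemu-system-x86_64 -cpu ?` style output."""
    models = set()
    for line in cpu_help_output.splitlines():
        fields = line.split()
        # Expected shape: "<arch> <model> <free-form description>"
        if len(fields) >= 2 and fields[0] == arch:
            models.add(fields[1])
    return models


# Listings taken from the test case text above.
BEFORE = """\
x86 EPYC (alias configured by machine type)
x86 EPYC-IBPB (alias of EPYC-v2)
x86 EPYC-v1 AMD EPYC Processor
x86 EPYC-v2 AMD EPYC Processor (with IBPB)
"""

AFTER = BEFORE + """\
x86 EPYC-Rome (alias configured by machine type)
x86 EPYC-Rome-v1 AMD EPYC-Rome Processor
x86 EPYC-v3 AMD EPYC Processor
"""

added = sorted(parse_cpu_models(AFTER) - parse_cpu_models(BEFORE))
print(added)  # ['EPYC-Rome', 'EPYC-Rome-v1', 'EPYC-v3']
```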

[Regression Potential]

 * This adds new CPU types to the list of known CPUs defining their name
   and features. Generally the changes are contained to those new types
   and only active when selected - and usually only selectable on such new
   machines. Therefore not a lot should change for other users.
   One thing though: if a user selected an unversioned type (which in this
   case only can be "EPYC") by default it will pick the latest subversion
   that applies. In this case the behavior will change and pick EPYC-v3
   after the fix. But this is the whole purpose of versioned (stay as is)
   and unversioned (move with updates) CPU types - so that should be ok.
   The EPYC-Rome type didn't exist in Focal before, so it can't "change"
   for users.

[Other Info]

 * Depends on the new kernel 5.4.0-49 or later (Currently in
   focal-proposed)

---

Qemu in focal already has support for most (except amd-stibp) flags of this model.

Please backport the following patches:

https://github.com/qemu/qemu/commit/a16e8dbc043720abcb37fc7dca313e720b4e0f0c

https://github.com/qemu/qemu/commit/143c30d4d346831a09e59e9af45afdca0331e819

description: updated
Changed in qemu (Ubuntu Focal):
status: New → Triaged
importance: Undecided → Medium

Hello Markus,

Thanks for the report. I'm subscribing @ubuntu-virt here so @paelzer can take a look at this. He is the current maintainer for qemu.

$ git describe --tags a16e8dbc043720abcb37fc7dca313e720b4e0f0c
v4.2.0-2476-ga16e8dbc04

$ git describe --tags 143c30d4d346831a09e59e9af45afdca0331e819
v4.2.0-2477-g143c30d4d3

$ git tag --contains a16e8dbc043720abcb37fc7dca313e720b4e0f0c | sort -n
v5.0.0
v5.0.0-rc0

Looks like those were added in v5.0.0-rc0 and that we could add EPYC extra CPU features and the EPYC 2nd gen CPU model to Focal indeed.
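
For reference, the `git describe` strings above read as "<nearest tag>-<commits since tag>-g<abbreviated hash>", so both commits sit well after v4.2.0 and are not part of the 4.2 base in Focal. A small sketch of that reading (the helper name is made up for illustration):

```python
import re

# git describe output: <tag>-<commits since tag>-g<abbreviated hash>
_DESCRIBE_RE = re.compile(r"^(?P<tag>.+)-(?P<count>\d+)-g(?P<sha>[0-9a-f]+)$")


def parse_describe(text):
    """Split `git describe --tags` output into (tag, commits_since, sha).

    An exact tag match has no suffix; it is returned as (tag, 0, None).
    """
    match = _DESCRIBE_RE.match(text.strip())
    if match is None:
        return text.strip(), 0, None
    return match.group("tag"), int(match.group("count")), match.group("sha")


print(parse_describe("v4.2.0-2476-ga16e8dbc04"))
```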

Markus, I'm afraid this will take around a week or more for feedback, as @paelzer is out and there are some ongoing fixes landing in qemu this week.

Thank you!

Changed in qemu (Ubuntu):
status: New → Triaged
status: Triaged → Fix Released
Changed in qemu (Ubuntu Focal):
importance: Medium → High
Markus Schade (lp-markusschade) wrote :

Hi Rafael,

thanks for the feedback. I would consider the patches to be non-invasive, but important enough to justify an SRU. There is a similar set of patches to add the missing bits to libvirt as well, except for the EPYC-Rome model. I'll try to submit a patch upstream to add this model.

# Qemu

We have done similar in the past for new CPU Models - and the changes apply cleanly to the qemu 4.2 in Focal.

Preliminary I have opened a PPA with the fix
  https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/4161
And a branch containing the (trivial) backport
  https://code.launchpad.net/~paelzer/ubuntu/+source/qemu/+git/qemu/+ref/bug-1887490-EPYC-v3

That PPA can already be used for preliminary testing - feedback is welcome.

But there are a few extras to consider here which I'll post in following comments:

# Libvirt

Depending on the case sometimes libvirt also needs changes - Focal already has /usr/share/libvirt/cpu_map/x86_EPYC.xml which contains the base version and v3 will plug in there. But for "EPYC-Rome" there isn't anything upstream in libvirt.

I think this will therefore be non-selectable through libvirt - it will still help qemu if called directly, but if you want this we might consider poking libvirt upstream together later on (after testing if it really is missing to us in the backports).

Changed in libvirt (Ubuntu):
status: New → Incomplete
Changed in libvirt (Ubuntu Focal):
status: New → Incomplete
Changed in linux (Ubuntu Focal):
status: New → Confirmed
Changed in linux (Ubuntu):
status: New → Confirmed

# Kernel

It lists a bunch of depending kernel changes
    40bc47b08b6e ("kvm: x86: Enumerate support for CLZERO instruction")
    504ce1954fba ("KVM: x86: Expose XSAVEERPTR to the guest")
    6d61e3c32248 ("kvm: x86: Expose RDPID in KVM_GET_SUPPORTED_CPUID")
    52297436199d ("kvm: svm: Update svm_xsaves_supported")

504ce1954fba is odd as it is actually committed as 41cd02c6f7f6e66e7abf02a4379e355a7db89f78.
All but 52297436199d are in Focal already.

But that means to be fully working we need to ask the Kernel team to consider this into the next Kernel they build for Focal.

@Kernel team - what do you think about adding the following patch to the focal kernel?
=> https://github.com/torvalds/linux/commit/52297436199d

On Qemu I'm waiting on:
a) Kernel Teams statement on backporting the commit mentioned
b) some testing (by the reporter if possible) of the referred PPA builds of Qemu.

Setting to incomplete until we have that.

Changed in qemu (Ubuntu Focal):
status: Triaged → Incomplete
Markus Schade (lp-markusschade) wrote :

Plain qemu with the EPYC-Rome model works without any problems. I already get xsave, xsaveopt, xsavec and xsaveerptr with the current focal kernel, which also means that gcc finally uses znver1 (for v3) and znver2 (for Rome) optimizations.

Libvirt would require cherry-picking a number of commits to enable support for all CPU flags currently supported by focal qemu and for the additional AMD flags from the mentioned patches.
I currently run a private build in production with the following patches added:

https://github.com/libvirt/libvirt/commits/master/src/cpu_map:

- cpu_map: Distinguish Cascadelake-Server from Skylake-Server (unrelated, but recommended)
- cpu_map: Add pschange-mc-no bit in IA32_ARCH_CAPABILITIES MSR
- cpu_map: Request test files update when adding x86 features
- cpu_map: Add missing x86 features in 0x7 CPUID leaf
- cpu_map: Add missing x86 features in 0x80000008 CPUID leaf

Plus the attached patch which defines an EPYC-Rome type (without any tests), so not quite ready for upstream

Markus Schade (lp-markusschade) wrote :

Is there anything I can do, regarding the backport of 52297436199d into focal kernel?

Thanks for the reminder Markus, it might have missed their triage completely somehow.
I'll ping a few people directly about it and hopefully they will insert it into the kernel queue and state here on the bug about it.

Stefan Bader (smb) wrote :

If I understand the comments correctly this should be irrelevant for groovy/devel using a v5.8 base.

Changed in linux (Ubuntu Focal):
importance: Undecided → Medium
Changed in linux (Ubuntu):
status: Confirmed → Invalid
Changed in linux (Ubuntu Focal):
status: Confirmed → Fix Committed

Correct Stefan, this is for Focal

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-focal
tags: added: verification-done-focal
removed: verification-needed-focal

With the 5.4.0-49 from focal-proposed the xsaves flag can now be passed into instances.

Thanks for the test of the new kernel Markus!

I have respun the qemu build (the old one was outdated by a massive stable update - too bad this wasn't part of 4.2.1 already) with the patches we had before applied on top.

Further I have reviewed the suggested libvirt changes and added a build of that.
There is another AMD SVM change which makes sense in the same context and a few more.
All those are in libvirt >=6.5 and therefore already in Groovy.
But in the past it turned out useful for users to get all those changes as long as they are easy to backport and not conflicting (e.g. once we had a rework there). I agree with the "Distinguish" patch and have seen that in the wild.

The drawback for this kind of patches usually was that migrations from new-to-old might fail specifying features unknown to the other peer, but that is ok as it can be safely considered required to update the target before migration as well (reverse migrations always work best-effort but never are guaranteed).

That would overall be this list for libvirt:
5d6059f8 cpu_map: Distinguish Cascadelake-Server from Skylake-Server
  59558518 cpu_map: Introduce ARM cpu models
12eb0c94 cpu_map: Add pschange-mc-no bit in IA32_ARCH_CAPABILITIES MSR
  3944f685 cpu_map: Add Cooperlake x86 CPU model
  1c425857 cpu_map: Distribute x86_Cooperlake.xml
df69263c cpu_map: Request test files update when adding x86 features
6ea3bb19 cpu_map: Add missing x86 features in 0x7 CPUID leaf
892b7c70 cpu_map: Add missing x86 features in 0x80000008 CPUID leaf
  96a39aad cpu_map: Add missing AMD SVM features

The PPA [1] still is the same and now contains:
- qemu 1:4.2-3ubuntu6.7~ppa1
- libvirt 6.0.0-0ubuntu8.5~ppa1

@Markus:
- If you could give those builds a try on your system, that would be helpful to see if we can/should go on this way?
- Further if you would not mind upstreaming your x86_EPYC-Rome.xml change (CC me please) that
  would be great? Adding types matching what qemu already added should be fine and it would be
  great to have that added upstream before continuing (we also need this in groovy before a Focal
  SRU). Or as an alternative if from your tests you think the libvirt build as provided in the
  PPA is enough (without the Rome name but with the features) as-is then we can go with that.

[1]: https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/4161
[2]: branch for libvirt "bug-1887490-new-cpu-handling"

Hmm, I spoke too soon ... :-/
Some of the self tests failed due to the changes - I'll fix them up and ping again once the builds complete

Thanks for the update. It looks like your libvirt build fails.
Most likely due to the following patch missing:

https://github.com/libvirt/libvirt/commit/58691208e2063285d981a620873d48ddf8df8be5

That patch adds the CooperLake test data (wrongly as CascadeLake-Server), which is later fixed in

https://github.com/libvirt/libvirt/commit/3944f6855b9d4df73754bb6e5c8023d77399879b

that you have partially applied.

I would however not add the CooperLake patches because focal qemu does not have the required patches (yet).

I agree Markus.
Not only does the Focal qemu miss the related changes, I also dived deeper through the "what exactly happened" to be sure.

The added patches I picked make it more obvious (test breaks) that we would need cpu model "stepping" ranges to work. This would add the need for much more logic change:
a8ec1d746 cpu_x86: Move and rename x86ModelCopySignatures
782be9f0a cpu_x86: Move and rename x86ModelHasSignature
7e0d351fa cpu_x86: Move and rename x86FormatSignatures
372b2cf1c cpu_x86: Introduce virCPUx86SignaturesFree
3b474c1f8 cpu_x86: Introduce virCPUx86SignatureFromCPUID
22bded201 cpu_x86: Replace 32b signatures in virCPUx86Model with a struct
c7a279499 cpu_x86: Add support for stepping part of CPU signature

... Those then in turn depend on various switches to glib g_free allocations and such.

Unfortunately that isn't only true for my picked "Add-Cooperlake-x86-CPU-model", but also for one of the suggested changes by Markus as "Distinguish Cascadelake-Server from Skylake-Server" adds stepping='5-7'.
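
To illustrate why the stepping logic matters, here is a toy model of signature matching. It is not libvirt's implementation: the function names are invented, and the family/model/stepping values for the two Intel types are assumptions chosen for the example (only stepping='5-7' is taken from the patch discussed here).

```python
def stepping_matches(spec, stepping):
    """Check a stepping against a 'low-high' (or single value) range.

    spec=None means the signature does not restrict the stepping.
    """
    if spec is None:
        return True
    low, _, high = spec.partition("-")
    return int(low) <= stepping <= int(high or low)


def matching_models(signatures, family, model, stepping, use_stepping=True):
    """Names whose (family, model[, stepping]) signature fits the host CPU."""
    return [
        name
        for name, (sig_family, sig_model, spec) in signatures.items()
        if sig_family == family
        and sig_model == model
        and (not use_stepping or stepping_matches(spec, stepping))
    ]


# Assumed example values; both types share family and model.
SIGNATURES = {
    "Skylake-Server": (6, 85, "0-4"),
    "Cascadelake-Server": (6, 85, "5-7"),
}

# Code that understands stepping can tell the two apart ...
print(matching_models(SIGNATURES, 6, 85, 7))
# ... code that ignores the attribute sees both as equal candidates.
print(matching_models(SIGNATURES, 6, 85, 7, use_stepping=False))
```

So adding stepping='5-7' to the XML alone changes nothing unless the matching code also evaluates it, which is what the cpu_x86 commits listed above provide.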

Let us keep them out for now (too much risk for an SRU, users strictly depending on this need to speak up to make a strong case or upgrade to newer Ubuntu versions)

---

Also the Arm vendors changes were a nice try, but break other places as not only cpu_map but much more code changes would be needed.
libvirt: CPU Driver error : internal error: Unexpected element 'vendor' in CPU map '/<<PKGBUILDDIR>>/debian/build/../../src/cpu_map/arm_vendors.xml'
I'm not going to pull the source changes to work fine with that.
Ok to postpone that to groovy where those changes are available already.

---

A new upload that worked in a local build was pushed to the PPA

@Markus - did the patch
  5d6059f8ec cpu_map: Distinguish Cascadelake-Server from Skylake-Server
really work for you without the changes to support stepping applied?
Or does your own build have much more applied to make it work?

Actually no. It builds and does not have any downsides, but does not resolve the issue. Sorry for that. When I looked at the history of cpu_map and this patch, I did not see that it would require further patches to actually make it work.

So that would leave it at these five patches:
Add-pschange-mc-no-bit-in-IA32_ARCH_CAPABILITIES-MSR.patch
Request-test-files-update-when-adding-x86-features.patch
Add-missing-x86-features-in-0x7-CPUID-leaf.patch
Add-missing-x86-features-in-0x80000008-CPUID-leaf.patch
Add-missing-AMD-SVM-features.patch

Plus the patch adding the EPYC-Rome model. I'll try to get it upstream, but it will take some time. From my side, it's totally fine to go ahead with the update without it.

My patch queue for qemu also has 2 more patches, reported as LP#1896751 and LP#1849644

Thanks for the extra reports - I'll take a look at these qemu cases and in that time you can try getting the libvirt Rome change upstream.
Once everything is in place we can SRU libvirt&qemu sort of together on this.

Changed in qemu (Ubuntu Focal):
status: Incomplete → In Progress

SRU Template for qemu added and MP linked to fix this in Ubuntu 20.04

description: updated

Hello Markus, or anyone else affected,

Accepted qemu into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/qemu/1:4.2-3ubuntu6.7 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in qemu (Ubuntu Focal):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-focal
removed: verification-done-focal

All autopkgtests for the newly accepted qemu (1:4.2-3ubuntu6.7) for focal have finished running.
The following regressions have been reported in tests triggered by the package:

casper/1.445.1 (amd64)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/focal/update_excuses.html#qemu

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Thanks. It does not look like the "regression" preventing proposed migration is caused by the qemu changes.


Before:
qemu-system-x86_64 -cpu ? | grep EPYC
x86 EPYC (alias configured by machine type)
x86 EPYC-IBPB (alias of EPYC-v2)
x86 EPYC-v1 AMD EPYC Processor
x86 EPYC-v2 AMD EPYC Processor (with IBPB)

Upgrade:
...
Setting up qemu-system-data (1:4.2-3ubuntu6.7) ...
Setting up libgfortran5:amd64 (10.2.0-5ubuntu1~20.04) ...
Setting up libubsan1:amd64 (10.2.0-5ubuntu1~20.04) ...
Setting up python3.8-minimal (3.8.5-1~20.04) ...
Setting up gdbserver (9.2-0ubuntu1~20.04) ...
Setting up language-selector-common (0.204.2) ...
Setting up build-essential (12.8ubuntu1.1) ...
Setting up libpython3.8-stdlib:amd64 (3.8.5-1~20.04) ...
Setting up python3.8 (3.8.5-1~20.04) ...
Setting up python3-lib2to3 (3.8.5-1~20.04.1) ...
find: ‘/usr/lib/python3.7/lib2to3’: No such file or directory
find: ‘/usr/lib/python3.7/lib2to3’: No such file or directory
find: ‘/usr/lib/python3.7’: No such file or directory
Setting up qemu-block-extra:amd64 (1:4.2-3ubuntu6.7) ...
Setting up libcc1-0:amd64 (10.2.0-5ubuntu1~20.04) ...
Setting up liblsan0:amd64 (10.2.0-5ubuntu1~20.04) ...
Setting up mdadm (4.1-5ubuntu1.2) ...
update-initramfs: deferring update (trigger activated)
Created symlink /etc/systemd/system/mdmonitor.service.wants/mdcheck_continue.timer → /lib/systemd/system/mdcheck_continue.timer.
Setting up libitm1:amd64 (10.2.0-5ubuntu1~20.04) ...
Setting up bolt (0.8-4ubuntu1) ...
Setting up gcc-9-base:amd64 (9.3.0-17ubuntu1~20.04) ...
Setting up libgcc-s1-arm64-cross (10.2.0-5ubuntu1~20.04cross1) ...
Setting up libtsan0:amd64 (10.2.0-5ubuntu1~20.04) ...
Setting up libatomic1-arm64-cross (10.2.0-5ubuntu1~20.04cross1) ...
Setting up python3-distutils (3.8.5-1~20.04.1) ...
find: ‘/usr/lib/python3.7/distutils’: No such file or directory
find: ‘/usr/lib/python3.7/distutils’: No such file or directory
find: ‘/usr/lib/python3.7’: No such file or directory
Setting up libgomp1-armhf-cross (10.2.0-5ubuntu1~20.04cross1) ...
Setting up liblsan0-arm64-cross (10.2.0-5ubuntu1~20.04cross1) ...
Setting up libgomp1-arm64-cross (10.2.0-5ubuntu1~20.04cross1) ...
Setting up qemu-system-common (1:4.2-3ubuntu6.7) ...
Setting up cpp-9-arm-linux-gnueabihf (9.3.0-17ubuntu1~20.04cross2) ...
Setting up libgcc-s1-armhf-cross (10.2.0-5ubuntu1~20.04cross1) ...
Setting up qemu-system-mips (1:4.2-3ubuntu6.7) ...
Setting up libpython3.8-dbg:amd64 (3.8.5-1~20.04) ...
Setting up libtsan0-arm64-cross (10.2.0-5ubuntu1~20.04cross1) ...
Setting up qemu-system-x86 (1:4.2-3ubuntu6.7) ...
...

After:
qemu-system-x86_64 -cpu ? | grep EPYC
x86 EPYC (alias configured by machine type)
x86 EPYC-IBPB (alias of EPYC-v2)
x86 EPYC-Rome (alias configured by machine type)
x86 EPYC-Rome-v1 AMD EPYC-Rome Processor
x86 EPYC-v1 AMD EPYC Processor
x86 EPYC-v2 AMD EPYC Processor (with IBPB)
x86 EPYC-v3 AMD EPYC Processor

Marking as verified.

@Markus - if you could give this a better test...


tags: added: verification-done verification-done-focal
removed: verification-needed verification-needed-focal

The new qemu version works as expected together with 5.4.0-49 kernel required for xsaves

# grep -e 'model name' -e 'flags' /proc/cpuinfo
model name : AMD EPYC-Rome Processor
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm rep_good nopl cpuid extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext perfctr_core ssbd ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves clzero xsaveerptr wbnoinvd arat npt nrip_save umip rdpid
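
A check like the one above can be repeated inside any guest with a few lines of Python. This is a sketch only; the helper name is hypothetical and the list of flags to require (e.g. xsaves, clzero, xsaveerptr, rdpid from the kernel commits discussed earlier) is up to the tester.

```python
def missing_cpu_flags(cpuinfo_text, required):
    """Return the required flags that are not present on every CPU.

    Parses the "flags : ..." lines of /proc/cpuinfo style text and
    intersects them across all listed CPUs.
    """
    common = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags = set(value.split())
            common = flags if common is None else common & flags
    if common is None:
        common = set()
    return sorted(set(required) - common)
```

Inside the guest this could be called as `missing_cpu_flags(open("/proc/cpuinfo").read(), ["xsaves", "clzero", "xsaveerptr", "rdpid"])`; an empty list means all requested flags were passed through.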

Patches for libvirt are submitted upstream
https://www.redhat.com/archives/libvir-list/2020-October/msg00008.html

A version fitting for the focal version is attached

Everything else seems to be soon to be released into Focal.

I've seen v1 and v2 of the libvirt upstream series.
Please just ping here once the libvirt changes were accepted so we can revisit the libvirt part of this.
Thanks in advance!

libvirt changes accepted:
https://www.redhat.com/archives/libvir-list/2020-October/msg00493.html
And became
https://gitlab.com/libvirt/libvirt/-/commit/e06590f1708a599286f3ee3690b3dc50ee525d40
https://gitlab.com/libvirt/libvirt/-/commit/f941639f86f4bc66c106eb1291f1b58cf9e24680
https://gitlab.com/libvirt/libvirt/-/commit/736b8637f691242fd688cf726d22f79d0eb300d3

These mostly apply as-is and one with minimal delta.
They are SRUable and therefore not much of an FF problem, but might be stalled a bit.

We need to be a bit extra careful for the rebuild with glibc 2.32 due to bug 1892826.
But I'll test that ahead of the final upload to groovy.

description: updated
summary: - Add/Backport EPYC-v3 and EPYC-Rome CPU model
+ [FFe/SRU] Add/Backport EPYC-v3 and EPYC-Rome CPU model
description: updated

Test build in PPA: https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/4303

But there are some issues when checking for known models:
Current:
    <mode name='host-passthrough' supported='yes'>
...
    <mode name='host-model' supported='yes'>
      <model fallback='forbid'>Skylake-Client-IBRS</model>
...
    <mode name='custom' supported='yes'>
      <model usable='no'>qemu64</model>
...

New build:
    <mode name='host-passthrough' supported='yes'>
...
    <mode name='host-model' supported='no'/>
    <mode name='custom' supported='no'/>

Something went wrong and broke any host-model or custom named CPU support.
Need to debug what exactly happened.
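
To spot this kind of breakage quickly, the `virsh domcapabilities` XML can be summarized per CPU mode. A minimal sketch with made-up helper names, assuming the usual layout of `<mode>` elements below `<cpu>` as in the excerpts above:

```python
import xml.etree.ElementTree as ET


def cpu_mode_support(domcaps_xml):
    """Map each CPU mode name to True/False from its supported attribute."""
    root = ET.fromstring(domcaps_xml)
    return {
        mode.get("name"): mode.get("supported") == "yes"
        for mode in root.findall("./cpu/mode")
    }


def custom_models(domcaps_xml):
    """Map each named model under mode 'custom' to its usable attribute."""
    root = ET.fromstring(domcaps_xml)
    return {
        model.text: model.get("usable")
        for model in root.findall("./cpu/mode[@name='custom']/model")
    }
```

Feeding it the full output of `virsh domcapabilities` from the broken build would show host-model and custom as False, while a good build lists them as True and reports the named models with their usable state.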

Furthermore as expected bug 1892826 provides some non-fun (updates about that there).

The issues of bug 1892826 seem resolved.
I also have found what the problem here is, it comes down to:

Oct 08 08:23:50 groovy-testEPYC libvirtd[2176]: XML error: failed to parse xml document '/usr/share/libvirt/cpu_map/x86_EPYC-Rome.xml'

Which is right: it isn't installed in my test build.

But we don't explicitly list these files in the packaging:
  debian/libvirt0.install:5:usr/share/libvirt/cpu_map/*

So I'd expect this to work, unless we failed to install it in the upstream build system.
Maybe that is because master has already switched to meson in 6.7 and the patches were written after:
https://gitlab.com/libvirt/libvirt/-/commit/dfa2f42a046cce85d9c07869bd765dfbcaab2ab9
But since we are on 6.6 we need a change to make this file available.
So we need a replacement for the following (this is no rocket science, in fact it is as it was in the suggested debdiff already):
--- a/src/cpu_map/meson.build
+++ b/src/cpu_map/meson.build
@@ -32,6 +32,7 @@ cpumap_data = [
   'x86_Dhyana.xml',
   'x86_EPYC-IBPB.xml',
   'x86_EPYC.xml',
+ 'x86_EPYC-Rome.xml',
   'x86_features.xml',
   'x86_Haswell-IBRS.xml',
   'x86_Haswell-noTSX-IBRS.xml',

But furthermore we will also need the following cleanup:
https://gitlab.com/libvirt/libvirt/-/commit/3bf6f9fe22dfbd3c1dcc614b31f2f4fe8b71a2f2

Now seeing your comment Markus, yeah I've seen the monitor fixup myself thanks.

/me looks forward to live refreshing launchpad to not appear so ignorant to updates ...

Pre-post upgrade domcapabilities diff now only adds EPYC-Rome as new type (as expected)

As expected on my machine that isn't a usable type, but ok that way:
      <model usable='no'>EPYC-Rome</model>

root@g:~# grep Rome /usr/share/libvirt/cpu_map/index.xml
    <include filename="x86_EPYC-Rome.xml"/>
root@g:~# ll /usr/share/libvirt/cpu_map/x86_EPYC-Rome.xml
-rw-r--r-- 1 root root 2289 Oct 8 05:36 /usr/share/libvirt/cpu_map/x86_EPYC-Rome.xml

Also I completed the SRU Template on the bug description here.
Self-tests started on https://bileto.ubuntu.com/excuses/4303/groovy.html
Merge Proposal on: https://code.launchpad.net/~paelzer/ubuntu/+source/libvirt/+git/libvirt/+merge/391963

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package qemu - 1:4.2-3ubuntu6.7

---------------
qemu (1:4.2-3ubuntu6.7) focal; urgency=medium

  * d/p/ubuntu/lp-1882774-*: add newer EPYC processor types (LP: #1887490)
  * d/p/u/lp-1896751-exec-rom_reset-Free-rom-data-during-inmigrate-skip.patch:
    fix reboot after migration (LP: #1896751)
  * d/p/u/lp-1849644-io-channel-websock-treat-binary-and-no-sub-protocol-.patch:
    fix websocket compatibility with newer versions of noVNC (LP: #1849644)

 -- Christian Ehrhardt <email address hidden> Mon, 27 Jul 2020 11:45:26 +0200

Changed in qemu (Ubuntu Focal):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for qemu has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

The attachment "Add-EPYC-ROME-x86-CPU-model.patch" seems to be a patch. If it isn't, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of the ~ubuntu-reviewers, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issues please contact him.]

tags: added: patch

Qemu landed in Focal \o/

An update for Libvirt:
Groovy:
 - MP approved
 - no FFe needed (as it is an SRU even)
 - and uploaded to groovy.
 - Being so close to the groovy release it might be a zero day SRU or a sooner
   fix before Groovy-release - we will see.
Focal:
 - After groovy is fixed we can consider Focal

Changed in libvirt (Ubuntu):
status: Incomplete → In Progress
Changed in libvirt (Ubuntu Focal):
status: Incomplete → Triaged
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package libvirt - 6.6.0-1ubuntu3

---------------
libvirt (6.6.0-1ubuntu3) groovy; urgency=medium

  * d/p/ubuntu/lp-1887490-*: add named types and definitions for EPYC-Rome
    chips (LP: #1887490)

 -- Christian Ehrhardt <email address hidden> Thu, 08 Oct 2020 07:36:06 +0200

Changed in libvirt (Ubuntu):
status: In Progress → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.4.0-51.56

---------------
linux (5.4.0-51.56) focal; urgency=medium

  * Packaging resync (LP: #1786013)
    - update dkms package versions

linux (5.4.0-50.55) focal; urgency=medium

  * CVE-2020-16119
    - SAUCE: dccp: avoid double free of ccid on child socket

  * CVE-2020-16120
    - Revert "UBUNTU: SAUCE: overlayfs: ensure mounter privileges when reading
      directories"
    - ovl: pass correct flags for opening real directory
    - ovl: switch to mounter creds in readdir
    - ovl: verify permissions in ovl_path_open()
    - ovl: call secutiry hook in ovl_real_ioctl()
    - ovl: check permission to open real file

linux (5.4.0-49.53) focal; urgency=medium

  * focal/linux: 5.4.0-49.53 -proposed tracker (LP: #1896007)

  * Comet Lake PCH-H RAID not support on Ubuntu20.04 (LP: #1892288)
    - ahci: Add Intel Comet Lake PCH-H PCI ID

  * Novalink (mkvterm command failure) (LP: #1892546)
    - tty: hvcs: Don't NULL tty->driver_data until hvcs_cleanup()

  * Oops and hang when starting LVM snapshots on 5.4.0-47 (LP: #1894780)
    - SAUCE: Revert "mm: memcg/slab: fix memory leak at non-root kmem_cache
      destroy"

  * Intel x710 LOMs do not work on Focal (LP: #1893956)
    - i40e: Fix LED blinking flow for X710T*L devices
    - i40e: enable X710 support

  * Add/Backport EPYC-v3 and EPYC-Rome CPU model (LP: #1887490)
    - kvm: svm: Update svm_xsaves_supported

  * Fix non-working NVMe after S3 (LP: #1895718)
    - SAUCE: PCI: Enable ACS quirk on CML root port

  * Focal update: v5.4.65 upstream stable release (LP: #1895881)
    - ipv4: Silence suspicious RCU usage warning
    - ipv6: Fix sysctl max for fib_multipath_hash_policy
    - netlabel: fix problems with mapping removal
    - net: usb: dm9601: Add USB ID of Keenetic Plus DSL
    - sctp: not disable bh in the whole sctp_get_port_local()
    - taprio: Fix using wrong queues in gate mask
    - tipc: fix shutdown() of connectionless socket
    - net: disable netpoll on fresh napis
    - Linux 5.4.65

  * Focal update: v5.4.64 upstream stable release (LP: #1895880)
    - HID: quirks: Always poll three more Lenovo PixArt mice
    - drm/msm/dpu: Fix scale params in plane validation
    - tty: serial: qcom_geni_serial: Drop __init from qcom_geni_console_setup
    - drm/msm: add shutdown support for display platform_driver
    - hwmon: (applesmc) check status earlier.
    - nvmet: Disable keep-alive timer when kato is cleared to 0h
    - drm/msm: enable vblank during atomic commits
    - habanalabs: validate FW file size
    - habanalabs: check correct vmalloc return code
    - drm/msm/a6xx: fix gmu start on newer firmware
    - ceph: don't allow setlease on cephfs
    - drm/omap: fix incorrect lock state
    - cpuidle: Fixup IRQ state
    - nbd: restore default timeout when setting it to zero
    - s390: don't trace preemption in percpu macros
    - drm/amd/display: Reject overlay plane configurations in multi-display
      scenarios
    - drivers: gpu: amd: Initialize amdgpu_dm_backlight_caps object to 0 in
      amdgpu_dm_update_backlight_caps
    - drm/amd/display: Retry AUX write when fail occurs
    - drm/amd/display: Fix memleak in amdg...

Changed in linux (Ubuntu Focal):
status: Fix Committed → Fix Released

An MP for a Focal backport of this is ready at:
https://code.launchpad.net/~paelzer/ubuntu/+source/libvirt/+git/libvirt/+merge/392204

Along with a PPA at:
https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/4313/

@Markus if you could pre-check this PPA on the specific hardware that would be awesome.

These changes will be great - also bringing in a few more cpu bit definitions for better HW support in general. But OTOH due to that they will need some extra care in regard to start/stop/migration in between updates/non-updated systems. I'll run some regression tests on the PPA before we upload this to Focal for an SRU.

Well, for me it's more or less a noop because we have been running a version with these patches for some time now. But nevertheless I haven't encountered any problems using your build/patch series. Host capabilities are correctly detected as EPYC-Rome and I can (still) start an instance with the EPYC-Rome model via libvirt. Also the new flags like npt/nrip-save are present when svm is enabled.

Thank you Markus, it also passed all my regression tests and a review by a teammate.
Now uploaded into Focal-unapproved entering the SRU Process.

Changed in linux (Ubuntu):
status: Invalid → Fix Released

Hello Markus, or anyone else affected,

Accepted libvirt into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/libvirt/6.0.0-0ubuntu8.5 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in libvirt (Ubuntu Focal):
status: Triaged → Fix Committed
tags: added: verification-needed verification-needed-focal
removed: verification-done verification-done-focal

Same tests as above. Still working as intended

tags: added: verification-done verification-done-focal
removed: patch verification-needed verification-needed-focal

Pre:
root@f:~# grep EPYC /usr/share/libvirt/cpu_map/index.xml
    <include filename="x86_EPYC.xml"/>
    <include filename="x86_EPYC-IBPB.xml"/>
root@f:~# virsh domcapabilities | grep EPYC
      <model usable='no'>EPYC-IBPB</model>
      <model usable='no'>EPYC</model>
root@f:~# cat test.xml
<domain type='kvm'>
  <name>test</name>
  <memory unit='KiB'>4194304</memory>
  <currentMemory unit='KiB'>4194304</currentMemory>
  <vcpu placement='static'>4</vcpu>
  <os>
    <type arch='x86_64' machine='pc-q35-focal'>hvm</type>
    <boot dev='hd'/>
  </os>
  <cpu mode='custom' match='exact' check='partial'>
    <model fallback='forbid'>EPYC-Rome</model>
  </cpu>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
  </devices>
</domain>
root@f:~# virsh define test.xml
Domain test defined from test.xml
root@f:~# virsh start test
error: Failed to start domain test
error: internal error: Unknown CPU model EPYC-Rome

Upgrade:
root@f:~# v=6.0.0-0ubuntu8.5; apt install libvirt-daemon-system=$v libvirt-clients=$v libvirt-daemon=$v libvirt0=$v libvirt-daemon-driver-qemu=$v libvirt-daemon-driver-storage-rbd=$v
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages were automatically installed and are no longer required:
  geoip-database libgd3 libgeoip1 libxencall1 libxendevicemodel1 libxenevtchn1 libxenforeignmemory1 libxengnttab1 libxenmisc4.11 libxenstore3.0 libxentoolcore1 libxentoollog1 libxpm4
Use 'apt autoremove' to remove them.
Suggested packages:
  libvirt-daemon-driver-lxc libvirt-daemon-driver-vbox libvirt-daemon-driver-xen libvirt-daemon-driver-storage-gluster libvirt-daemon-driver-storage-zfs numad auditd pm-utils radvd
  systemtap zfsutils
The following packages will be upgraded:
  libvirt-clients libvirt-daemon libvirt-daemon-driver-qemu libvirt-daemon-driver-storage-rbd libvirt-daemon-system libvirt0
6 upgraded, 0 newly installed, 0 to remove and 53 not upgraded.
Need to get 2892 kB of archives.
After this operation, 8192 B of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu focal-proposed/main amd64 libvirt-daemon-driver-qemu amd64 6.0.0-0ubuntu8.5 [605 kB]
Get:2 http://archive.ubuntu.com/ubuntu focal-proposed/main amd64 libvirt-daemon-driver-storage-rbd amd64 6.0.0-0ubuntu8.5 [28.3 kB]
Get:3 http://archive.ubuntu.com/ubuntu focal-proposed/main amd64 libvirt-daemon-system amd64 6.0.0-0ubuntu8.5 [67.5 kB]
Get:4 http://archive.ubuntu.com/ubuntu focal-proposed/main amd64 libvirt-clients amd64 6.0.0-0ubuntu8.5 [344 kB]
Get:5 http://archive.ubuntu.com/ubuntu focal-proposed/main amd64 libvirt0 amd64 6.0.0-0ubuntu8.5 [1443 kB]
Get:6 http://archive.ubuntu.com/ubuntu focal-proposed/main amd64 libvirt-daemon amd64 6.0.0-0ubuntu8.5 [405 kB]
Fetched 2892 kB in 1s (2314 kB/s)
Preconfiguring packages ...
(Reading database ... 77297 files and directories currently installed.)
Preparing to unpack .../0-libvirt-daemon-driver-qemu_6.0.0-0ubuntu8.5_amd64.deb ...
Unpacking libvirt-daemon-driver-qemu (6.0.0-0ubuntu8.5) over (6.0.0-0ubuntu8.4) ...
Preparing to unpack .../1-libvirt-daemon-driver-storage-rbd_6.0.0-0ubuntu8.5_amd64.deb ...
Unpacking libvirt-d...
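
The first check in the "Pre:" part above (is the model listed in index.xml, and does the included file exist) could also be scripted. The following is only a minimal sketch: the function name `cpu_model_registered` is made up for illustration, and it assumes the include files follow the `x86_<model>.xml` naming seen in the grep output and live next to index.xml.

```python
import os
import xml.etree.ElementTree as ET


def cpu_model_registered(index_path, model, arch_prefix="x86"):
    """Return (listed, file_exists) for a CPU model in a libvirt cpu_map index.

    Hypothetical helper: assumes include files are named
    "<arch_prefix>_<model>.xml" and sit in the same directory as index.xml.
    """
    expected = f"{arch_prefix}_{model}.xml"
    tree = ET.parse(index_path)
    # Collect every filename referenced by an <include> element.
    included = {inc.get("filename") for inc in tree.iter("include")}
    listed = expected in included
    # Check that the referenced definition file is actually present on disk.
    file_exists = os.path.isfile(
        os.path.join(os.path.dirname(index_path), expected)
    )
    return listed, file_exists
```

Calling `cpu_model_registered("/usr/share/libvirt/cpu_map/index.xml", "EPYC-Rome")` on a host should return a pair of booleans; going by the grep output above, the first would be expected to be False before the upgrade.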


Markus was faster :-) Verified; now waiting for the SRU aging period ...

Hi - commit 411a139d84bd34931e0e01d6ed5e48a718392e5d added svm_xsaves_support, but AFAICT support for virtualizing the IA32_XSS MSR on SVM was not merged back from upstream. Commit 864e2ab2b46db1ac266c46a7c9cefe6cc893029d added this support upstream and was a parent of 411a13.

This means QEMU-KVM will report xsaves in CPUID, but IA32_XSS will not be available; guests that expect it to be present when xsaves is enumerated may be surprised.

Thanks Venkatesh for this FYI.
The commits mentioned are of the kernel portion of all of this.
I spawned a new bug TODO for this to keep tracking separate and readable - I subscribed you there.

That update would have made even more sense if I had replaced "TODO" with the actual bug number, bug 1902176, in time :-)

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package libvirt - 6.0.0-0ubuntu8.5

---------------
libvirt (6.0.0-0ubuntu8.5) focal; urgency=medium

  * d/p/ubuntu/lp-1887490-*: add named types and definitions for EPYC-Rome
    chips (LP: #1887490)

 -- Christian Ehrhardt <email address hidden> Thu, 08 Oct 2020 07:36:06 +0200

Changed in libvirt (Ubuntu Focal):
status: Fix Committed → Fix Released

It seems one of the patches also introduced a regression:
* lp-1887490-cpu_map-Add-missing-AMD-SVM-features.patch
adds various SVM-related flags. Specifically npt and nrip-save are now expected to be present by default as shown in the updated testdata.
This, however, breaks migration of instances using the EPYC or EPYC-IBPB CPU models that were started with libvirt versions prior to this one, because the instance on the target host has these extra flags.
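
To illustrate the kind of mismatch described here, the following rough sketch compares the CPU features of two domain XML documents. It is not how libvirt itself checks migration compatibility; the helper names are hypothetical, and it assumes the extra flags show up as `<feature policy='require' name='...'/>` children of `<cpu>` in the XML being compared.

```python
import xml.etree.ElementTree as ET


def required_features(domain_xml):
    """Return the set of CPU feature names marked policy='require'.

    Hypothetical helper; assumes <feature> elements sit directly
    under the <cpu> element of a <domain> document.
    """
    root = ET.fromstring(domain_xml)
    return {
        feat.get("name")
        for feat in root.findall("./cpu/feature")
        if feat.get("policy") == "require"
    }


def extra_on_target(source_xml, target_xml):
    """List features required on the target but not on the source."""
    return sorted(required_features(target_xml) - required_features(source_xml))
```

Fed with the source and target definitions of an affected guest, a non-empty result such as npt and nrip-save would point at the flags mentioned above.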
