Add/Backport EPYC-v3 and EPYC-Rome CPU model

Bug #1887490 reported by Markus Schade on 2020-07-14
10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
libvirt (Ubuntu)
Undecided
Unassigned
Focal
Undecided
Unassigned
linux (Ubuntu)
Undecided
Unassigned
Focal
Medium
Unassigned
qemu (Ubuntu)
Undecided
Unassigned
Focal
High
Unassigned

Bug Description

## Qemu SRU ##

[Impact]

 * CPU definitions are added to qemu as these CPUs are known.
   And due to that over time are missing in former releases.

 * To really benefit from the new features of these chips
   they have to be known, therefore new type additions done by
   upstream should be backported if they generally apply and do
   not depend on SRU-critical changes.

 * This backports two upstream fixes that just add definitions (no
   control flow changes)

[Test Case]

 * Probe qemu for the known CPU types (works on all HW)
   $ qemu-system-x86_64 -cpu ? | grep EPYC
   Focal without fix:
   x86 EPYC (alias configured by machine type)
   x86 EPYC-IBPB (alias of EPYC-v2)
   x86 EPYC-v1 AMD EPYC Processor
   x86 EPYC-v2 AMD EPYC Processor (with IBPB)
   Focal with fix also adds:
   x86 EPYC-Rome (alias configured by machine type)
   x86 EPYC-Rome-v1 AMD EPYC-Rome Processor
   x86 EPYC-v3 AMD EPYC Processor

 * Given such HW is available start a KVM guest using those new types
   Since we don't have libvirt support (yet) do so directly in qemu
   commandline like (bootloader is enough)
   $ qemu-system-x86_64 -cpu EPYC-Rome -machine pc-q35-focal,accel=kvm -nographic
   $ qemu-system-x86_64 -cpu EPYC-v3 -machine pc-q35-focal,accel=kvm -nographic

[Regression Potential]

 * This adds new CPU types to the list of known CPUs defining their name
   and features. Generally the changes are contained to those new types
   and only active when selected - and usually only selectable on such new
   machines. Therefore not a lot should change for other users.
   One thing thou, if a user selected an unversioned type (which in this
   case only can be "EPYC") by default it will pick the latest subversion
   that applies. In this case the behavior will change and pick EPYC-v3
   after the fix. But this is the whole purpose of versioned (stay as is)
   and unversioned (move with updates) CPU types - so that should be ok.
   The EPYC-Rome type didn't exist in Focal before, so it can't "change"
   for users.

[Other Info]

 * Depends on the new kernel 5.4.0-49 or later (Currently in
   focal-proposed)

---

Qemu in focal has already support for most (except amd-stibp) flags of this model.

Please backport the following patches:

https://github.com/qemu/qemu/commit/a16e8dbc043720abcb37fc7dca313e720b4e0f0c

https://github.com/qemu/qemu/commit/143c30d4d346831a09e59e9af45afdca0331e819

Related branches

description: updated
Changed in qemu (Ubuntu Focal):
status: New → Triaged
importance: Undecided → Medium

Hello Markus,

Thanks for the report. I'm subscribing @ubuntu-virt here so @paelzer can take a look at this. He is the current maintainer for qemu

$ git describe --tags a16e8dbc043720abcb37fc7dca313e720b4e0f0c
v4.2.0-2476-ga16e8dbc04

$ git describe --tags 143c30d4d346831a09e59e9af45afdca0331e819
v4.2.0-2477-g143c30d4d3

$ git tag --contains a16e8dbc043720abcb37fc7dca313e720b4e0f0c | sort -n
v5.0.0
v5.0.0-rc0

Looks like those were added in v5.0.0-rc0 and that we could add EPYC extra CPU features and the EPYC 2nd gen CPU model to Focal indeed.

Marcus I'm afraid this will take around a week or more for feedback as @paelzer is out and there are some current on-going fixes landing qemu this week.

Thank you!

Changed in qemu (Ubuntu):
status: New → Triaged
status: Triaged → Fix Released
Changed in qemu (Ubuntu Focal):
importance: Medium → High
Markus Schade (lp-markusschade) wrote :

Hi Rafael,

thanks for the feedback. I would consider the patches to be non-invasive, but important enough to justify an SRU. There are a similar set of patches to add the missing bits to libvirt as well, except for the EPYC-Rome model. I'll try to submit a patch upstream to add this model.

# Qemu

We have done similar in the past for new CPU Models - and the changes apply cleanly to the qemu 4.2 in Focal.

Preliminary I have opened a PPA with the fix
  https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/4161
And a branch containing the (trivial) backport
  https://code.launchpad.net/~paelzer/ubuntu/+source/qemu/+git/qemu/+ref/bug-1887490-EPYC-v3

That PPA can already be used for preliminary testing - feedback is welcome.

But there are a few extras to consider here which I'll post in following comments:

# Libvirt

Depending on the case sometimes libvirt also needs changes - Focal already has /usr/share/libvirt/cpu_map/x86_EPYC.xml which contains the base version and v3 will plug in there. But for "Epic-Rome" there isn't anything upstream in libvirt.

I think this will therefore be non-selectable through libvirt - it will still help qemu if called directly, but if you want this we might consider poking libvirt upstream together later on (after testing if it really is missing to us int he backports).

Changed in libvirt (Ubuntu):
status: New → Incomplete
Changed in libvirt (Ubuntu Focal):
status: New → Incomplete
Changed in linux (Ubuntu Focal):
status: New → Confirmed
Changed in linux (Ubuntu):
status: New → Confirmed

# Kernel

It lists a bunch of depending kernel changes
    40bc47b08b6e ("kvm: x86: Enumerate support for CLZERO instruction")
    504ce1954fba ("KVM: x86: Expose XSAVEERPTR to the guest")
    6d61e3c32248 ("kvm: x86: Expose RDPID in KVM_GET_SUPPORTED_CPUID")
    52297436199d ("kvm: svm: Update svm_xsaves_supported")

504ce1954fba is odd as it is actually committed as 41cd02c6f7f6e66e7abf02a4379e355a7db89f78.
All but 52297436199d are in Focal already.

But that means to be fully working we need to ask the Kernel team to consider this into the next Kernel they build for Focal.

@Kernel team - what do you tihnk about adding the following patch to the focal kernel?
=> https://github.com/torvalds/linux/commit/52297436199d

On Qemu I'm waiting on:
a) Kernel Teams statement on backporting the commit mentioned
b) some testing (by the reporter if possible) of the referred PPA builds of Qemu.

Setting to incomplete until we have that.

Changed in qemu (Ubuntu Focal):
status: Triaged → Incomplete
Markus Schade (lp-markusschade) wrote :

Plain qemu with the EPYC-Rome model works without any problems. I already get xsave, xsaveopt, xsavec and xsaveerptr with the current focal kernel, which also means that gcc finally uses znver1 (for v3) and znver2 (for Rome) optimizations.

Libvirt would require cherrypicking a number of commits to enable support for all CPU flags currently supported by focal qemu and for the additional AMD flags from the mentioned patched.
I currently run a private build in production with the following patches added:

https://github.com/libvirt/libvirt/commits/master/src/cpu_map:

- cpu_map: Distinguish Cascadelake-Server from Skylake-Server (unrelated, but recommended)
- cpu_map: Add pschange-mc-no bit in IA32_ARCH_CAPABILITIES MSR
- cpu_map: Request test files update when adding x86 features
- cpu_map: Add missing x86 features in 0x7 CPUID leaf
- cpu_map: Add missing x86 features in 0x80000008 CPUID leaf

Plus the attached patch which defines an EPYC-Rome type (without any tests), so not quite ready for upstream

Markus Schade (lp-markusschade) wrote :

Is there anything I can do, regarding the backport of 52297436199d into focal kernel?

Thanks for Reminding Markus, it might have missed their triage completely somehow.
I'll ping a few people directly about it and hopefully they will insert it into the kernel queue and state here on the bug about it.

Stefan Bader (smb) wrote :

If I understand the comments correctly this should be irrellevant for groovy/devel using a v5.8 base.

Changed in linux (Ubuntu Focal):
importance: Undecided → Medium
Changed in linux (Ubuntu):
status: Confirmed → Invalid
Changed in linux (Ubuntu Focal):
status: Confirmed → Fix Committed

Correct Stefan, this is for Focal

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-focal
tags: added: verification-done-focal
removed: verification-needed-focal

With the 5.4.0-49 from focal-proposed the xsaves flag can now be passed into instances.

Thanks for the test of the new kernel Markus!

I have respun the qemu build (the old one was outdated by a massive stable update - too bad this wasn't part of 4.2.1 already) with the patches we had before applied on top.

Further I have reviewed the suggested libvirt changes and added a build of that.
There is another AMD SVM change which makes sense in the same context and a few more.
All those are in libvirt >=6.5 and therefore already in Groovy.
But in the past it turned out useful for users to get all those changes as long as they are easy to backport and not conflicting (e.g. once we had a rework there). I agree with the "Distinguish" patch and have seen that in the wild.

The drawback for this kind of patches usually was that migrations from new-to-old might fail specifying features unknown to the other peer, but that is ok as it can be safely considered required to update the target before migration as well (reverse migrations always work best-effort but never are guaranteed).

That would overall be this list for libvirt:
5d6059f8 cpu_map: Distinguish Cascadelake-Server from Skylake-Server
  59558518 cpu_map: Introduce ARM cpu models
12eb0c94 cpu_map: Add pschange-mc-no bit in IA32_ARCH_CAPABILITIES MSR
  3944f685 cpu_map: Add Cooperlake x86 CPU model
  1c425857 cpu_map: Distribute x86_Cooperlake.xml
df69263c cpu_map: Request test files update when adding x86 features
6ea3bb19 cpu_map: Add missing x86 features in 0x7 CPUID leaf
892b7c70 cpu_map: Add missing x86 features in 0x80000008 CPUID leaf
  96a39aad cpu_map: Add missing AMD SVM features

The PPA [1] still is the same and now contains:
- qemu 1:4.2-3ubuntu6.7~ppa1
- libvirt 6.0.0-0ubuntu8.5~ppa1

@Markus:
- If you could give those builds a try on your system, that would be helpful to see if we can/should go on this way?
- Further if you would not mind upstreaming your x86_EPYC-Rome.xml change (CC me please) that
  would be great? Adding types matching what qemu already added should be fine and it would be
  great to have that added upstream before continuing (we also need this in groovy before a Focal
  SRU). Or as an alternative if from your tests you think the libvirt build as provided in the
  PPA is enough (without the Rome name but with the features) as-is then we can go with that.

[1]: https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/4161
[2]: branch for libvirt "bug-1887490-new-cpu-handling"

Hmm, I spoke too soon ... :-/
Some of the self tests failed due to the changes - I'll fix them up and ping again once the builds complete

Thanks for the update. It looks like your libvirt build fails.
Most likely due to the following patch missing:

https://github.com/libvirt/libvirt/commit/58691208e2063285d981a620873d48ddf8df8be5

That patch adds the CooperLake test data (wrongly as CascadeLake-Server), which is later fixed in

https://github.com/libvirt/libvirt/commit/3944f6855b9d4df73754bb6e5c8023d77399879b

that you have partially applied.

I would howver not add the CooperLake patches because focal qemu does not have the required patches (yet).

I agree Markus.
Not only does the Focal qemu miss the related changes, I also dived deeper through the "what exactly happened" to be sure.

The added patches I picked make it more obvious (test breaks) that we would need cpu model "stepping" ranges to work. This would add the need for much more logic change:
a8ec1d746 cpu_x86: Move and rename x86ModelCopySignatures
782be9f0a cpu_x86: Move and rename x86ModelHasSignature
7e0d351fa cpu_x86: Move and rename x86FormatSignatures
372b2cf1c cpu_x86: Introduce virCPUx86SignaturesFree
3b474c1f8 cpu_x86: Introduce virCPUx86SignatureFromCPUID
22bded201 cpu_x86: Replace 32b signatures in virCPUx86Model with a struct
c7a279499 cpu_x86: Add support for stepping part of CPU signature

... Those then in turn depend on various switches to glib g_fee allocations and such.

Unfortunately that isn't only true for my picked "Add-Cooperlake-x86-CPU-model", but also for one of the suggested changes by Markus as "Distinguish Cascadelake-Server from Skylake-Server" adds stepping='5-7'.

Let us keep them out for now (too much risk for an SRU, users strictly depending on this need to speak up to make a strong case or upgrade to newer Ubuntu versions)

---

Also the Arm vendors changes were a nice try, but break other places as not only cpu_map but much more code changes would be needed.
libvirt: CPU Driver error : internal error: Unexpected element 'vendor' in CPU map '/<<PKGBUILDDIR>>/debian/build/../../src/cpu_map/arm_vendors.xml'
I'm not going to pull the source changes to work fine with that.
Ok to postpone that to groovy where those changes are available already.

---

A new upload that worked in a local build was pushed to the PPA

@Markus - did the patch
  5d6059f8ec cpu_map: Distinguish Cascadelake-Server from Skylake-Server
really work for you without the changes to support stepping applies?
Or does your own build have much more applied to make it work?

Actually no. It builds and does not have any downsides, but does not resolve the issue. Sorry for that. When I looked at the history of cpu_map and this patch, I did not see that it would require further patches to actually make it work.

So that would leave it at these four patches:
Add-pschange-mc-no-bit-in-IA32_ARCH_CAPABILITIES-MSR.patch
Request-test-files-update-when-adding-x86-features.patch
Add-missing-x86-features-in-0x7-CPUID-leaf.patch
Add-missing-x86-features-in-0x80000008-CPUID-leaf.patch
Add-missing-AMD-SVM-features.patch

Plus the patch adding the EPYC-Rome model. I'll try to get it upstream, but it will take some time. From my side, it's totally fine from my side to go ahead with the update without it.

My patch queue for qemu also has 2 more patches, reported as LP#1896751 and LP#1896751

Thanks for the extra reports - I'll take a look at these qemu cases and in that time you can try getting the libvirt Rome change upstream.
Once everything is in place we can SRU libvirt&qemu sort of together on this.

Changed in qemu (Ubuntu Focal):
status: Incomplete → In Progress

SRU Template for qemu added and MP linked to fix this in Ubuntu 20.04

description: updated
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers