Recent vCPU features are disabled

Bug #1621067 reported by Alexey Stupnikov
Affects: Mirantis OpenStack
Status: Won't Fix
Importance: High
Assigned to: Alexey Stupnikov

Bug Description

The fix for bug #1618473 introduced a number of performance issues for MOS end users, since recent x86 extensions are no longer available to MOS compute vCPUs. Here are the most important ones:
  - aes
  - pclmulqdq
  - x2apic
  - avx (NOT avx2)
  - ssse3, sse4_1, sse4_2

This problem can be critical for micro and nano instances and will affect customer experience (there are a number of complaints on the web about cloud providers failing to provide some CPU instruction set extensions).

It is also worth mentioning that there is Ubuntu bug #1524069 with the same complaints.
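For illustration, the gap can be made visible by diffing the `flags` line of /proc/cpuinfo on the host against the one inside a guest. A minimal sketch (the helper name `missing_flags` and the sample flag strings are made up for this example; on a real deployment you would feed it the actual `grep '^flags' /proc/cpuinfo` output from both sides):

```shell
#!/bin/sh
# missing_flags HOST_FLAGS GUEST_FLAGS
# Print every flag the host advertises that the guest does not see.
missing_flags() {
    host_flags=$1
    guest_flags=$2
    for f in $host_flags; do
        case " $guest_flags " in
            *" $f "*) ;;                 # flag is visible in the guest
            *) printf '%s\n' "$f" ;;     # flag is hidden from the guest
        esac
    done
}

# Sample data modelled on the feature list above:
missing_flags "fpu aes pclmulqdq x2apic avx ssse3 sse4_1 sse4_2" \
              "fpu ssse3"
# prints: aes pclmulqdq x2apic avx sse4_1 sse4_2 (one per line)
```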

Changed in mos:
importance: Undecided → High
description: updated
Revision history for this message
Alexei Sheplyakov (asheplyakov) wrote :

> - aes
> - pclmulqdq
> - x2apic
> - avx (NOT avx2)
> - ssse3, sse4_1, sse4_2

Attempts to use these instructions in the guest will fail if the hypervisor does not know how to enable/handle them properly. Therefore the hypervisor (the host kernel's kvm module) is the best source of knowledge about which CPU flags should be advertised to the guest.

For example, an (Intel) CPU executes AVX instructions only if the XCR0.AVX bit is set; otherwise it raises a #GP exception, which eventually translates either into a SIGSEGV delivered to the guest process that tried to execute such an instruction, or into a *guest* kernel panic (if the guest kernel itself tried to execute the instruction in question).

Setting XCR0 is a privileged operation, so the guest kernel can't do it by itself; the hypervisor (the host kernel's kvm module) has to set it. For this to work, the hypervisor must be aware of AVX instructions (which is not the case if the *host* kernel is old enough).

Thus the guest can't use AVX instructions without proper hypervisor support, regardless of whether these instructions are advertised to the guest.
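As a rough host-side illustration of that dependency (a sketch, assuming a Linux host; AVX state lives in XCR0, which can only be managed when the CPU and kernel support XSAVE):

```shell
#!/bin/sh
# Sketch: a host that does not advertise both xsave and avx in
# /proc/cpuinfo cannot pass working AVX through to its guests,
# because AVX state is enabled via XCR0, which requires XSAVE.
if grep -qw xsave /proc/cpuinfo 2>/dev/null \
   && grep -qw avx /proc/cpuinfo 2>/dev/null; then
    msg="host advertises xsave+avx: guests may be able to use AVX"
else
    msg="host lacks xsave/avx: guests cannot use AVX"
fi
echo "$msg"
```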

Revision history for this message
Alexei Sheplyakov (asheplyakov) wrote :

> The fix for bug #1618473 introduced a number of performance issues for MOS end users

This statement is not backed by any actual measurements; adjusting the bug importance accordingly.

Changed in mos:
status: New → Opinion
Revision history for this message
Ivan Suzdal (isuzdal) wrote :

I guess the best way is to disable the avx2 (or any other problematic) feature [0] instead of changing cpu_mode.

[0] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1524069/comments/19
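The suggestion maps to libvirt's per-feature policy in the domain XML; a hypothetical fragment along those lines (assuming avx2 is the flag to hide) might look like:

```xml
<!-- Hypothetical libvirt domain fragment: keep the host CPU model but
     explicitly hide avx2 from the guest. -->
<cpu mode='host-model'>
  <feature policy='disable' name='avx2'/>
</cpu>
```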

Revision history for this message
Alexey Stupnikov (astupnikov) wrote :

Alexei, what I tried to say is that QEMU will use its default profile, which is qemu64 (at least it did so in my lab), so a lot of features will not be available to the guest. I am not saying that computes should advertise avx2 instructions to the guest, since this will lead to segfaults with recent kernels.

As for testing: I don't think that the difference in AES encryption speed with and without AES-NI, especially for micro and nano instances, is even debatable. I can say the same about the other encryption and computation extensions.

Changed in mos:
status: Opinion → Confirmed
status: Confirmed → Opinion
Revision history for this message
Alexey Stupnikov (astupnikov) wrote :

Please re-evaluate this issue's importance based on the arguments in my previous comment.

Changed in mos:
assignee: MOS Linux (mos-linux) → MOS Nova (mos-nova)
Revision history for this message
Alexey Stupnikov (astupnikov) wrote :

Nova team, can you please check whether we can do something about the following comment: https://bugs.launchpad.net/mos/+bug/1621067/comments/3

Revision history for this message
Alexey Stupnikov (astupnikov) wrote :

@asheplyakov I can't agree with the point that QEMU will select the optimal vCPU features to advertise to the guest. In my experience, it always selects the qemu64 profile with slight modifications for the KVM hypervisor (tested in two different environments).

Revision history for this message
Alexey Stupnikov (astupnikov) wrote :
Changed in mos:
status: Opinion → Invalid
status: Invalid → Confirmed
Revision history for this message
Roman Podoliaka (rpodolyaka) wrote :

Alexey St.

>>> Nova team, can you please check if we can do something about comment https://bugs.launchpad.net/mos/+bug/1621067/comments/3

I'm not sure what you want to see here: a statement added to the libvirt nova driver code that would add this section to the domain XML for all guests? Or a conditional statement based on what, the qemu version / the image? Nova just *does not* know what is inside an image, and I don't think we can rely on the metadata being set properly for all uploaded images.

In my opinion, the clean way to fix this would be to set cpu_model in nova.conf based on your host configuration, after performing tests with the most popular cloud images. Which is kind of what we did; it's just that we set a *safe* default, and what you want to see is one that allows for performance optimizations.

I'd rather we keep the one that works for popular cloud images, but at the same time send a clear message to operators that they can use the cpu_model option in nova.conf to configure a CPU model that better corresponds to their hardware, provided they have tested it properly.
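For reference, a hypothetical nova.conf fragment along the lines described above (the model name is just an example; it must match the compute host and be tested against the cloud images in use):

```ini
# /etc/nova/nova.conf on a compute node (example values)
[libvirt]
cpu_mode = custom
cpu_model = SandyBridge
```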

Changed in mos:
assignee: MOS Nova (mos-nova) → Alexey Stupnikov (astupnikov)
status: Confirmed → Incomplete
Revision history for this message
Alexey Stupnikov (astupnikov) wrote :

Roman, thanks for your answer. Could you please check whether it is possible to add two configuration parameters, cpu_features_enable and cpu_features_disable, to require or disable multiple features in the libvirt config template [1]?
[1]: https://libvirt.org/formatdomain.html#elementsCPU

Revision history for this message
Roman Podoliaka (rpodolyaka) wrote :

Alexey,

The current way to do that is to specify the desired CPU model in nova.conf. I'm not sure we want to provide more flexibility than that. And even if we reconsider that in the future, config options will not be backported to stable releases.

Revision history for this message
Roman Podoliaka (rpodolyaka) wrote :

By the way, I checked whether AVX2 instructions actually work inside an instance, and they *do*:

http://paste.openstack.org/show/568987/

So far it's only the Ubuntu kernel that fails on start. I'm not sure we should disable AVX2 or any other features.

Revision history for this message
Alexey Stupnikov (astupnikov) wrote :

Roman, I have other ideas about how to fix this, but they would take much more of customers' time. The problem is that real CPUs' feature sets are not well aligned with the ones defined in libvirt; another issue is that some CPUs have customised feature sets (BIOS/UEFI settings changed or suboptimal). As a result, using pre-defined CPU profiles without excluding or adding some features is not a solid option.

I don't have any other valuable propositions for nova, but IMO the described issue is certainly a bug (though I am not sure about its importance). I am not 100% sure it belongs to nova, but it is certainly in the product, and it is not a configuration issue we can fix using Fuel. So your word is needed here.

Revision history for this message
Alexey Stupnikov (astupnikov) wrote :

Wow, it works for the Fedora cloud image:
[fedora@testvm1 ~]$ cat /proc/cpuinfo | grep avx2
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl eagerfpu pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer xsave avx f16c rdrand hypervisor lahf_lm abm vnmi ept fsgsbase bmi1 avx2 smep bmi2 erms invpcid xsaveopt

Revision history for this message
Roman Podoliaka (rpodolyaka) wrote :

Alexey,

>> I don't have any other valuable propositions for nova, but IMO the described issue is certainly a bug (though I am not sure about its importance). I am not 100% sure it belongs to nova, but it is certainly in the product, and it is not a configuration issue we can fix using Fuel. So your word is needed here.

I gave you my take on this: operators can choose any CPU model they want in the nova config (or enable feature pass-through) after deployment, once they have tested that configuration with the cloud images that are going to be used in the cloud. For now we can probably live with a "safe" default.

I'm all for investigating this further because, as we can see, other cloud images with newer kernels, or just applications that use AVX2, work properly; it's only the Ubuntu kernel that is affected. We could ask MOS Linux or Alexey Sheplyakov to continue the investigation and try to understand the root cause of the failure.

Revision history for this message
Alexey Stupnikov (astupnikov) wrote :

It is possible to define custom CPU profiles in /usr/share/libvirt/cpu_map.xml, so this bug has a clear workaround in nova. Roman's approach looks pretty reasonable, at least as long as customers are not complaining about this change.

Changed in mos:
status: Incomplete → Won't Fix
Revision history for this message
Dmitry Sutyagin (dsutyagin) wrote :

Folks, it's not possible. You can define anything there, but QEMU uses its own internal database; libvirt uses cpu_map.xml only to match the current host CPU against this file, so adding definitions there will not work and QEMU will not pick up the new custom definitions. I tried adding a custom definition there and set nova.conf to use cpu_mode=custom and cpu_model=mycustomcpu; the result is that QEMU fails with "qemu-system-x86_64: Unable to find CPU definition: mycustomcpu". We need a new feature to solve this issue: something like a "cpu_feature=..." option which takes something like "disable=avx2,disable=...", parses it, and sets "<feature policy='disable' name='avx2'/>" in the VM XML.

Revision history for this message
Dmitry Sutyagin (dsutyagin) wrote :

My bad: we can do "SandyBridge,+blahblah" and list all the features except the problematic one(s).
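In libvirt domain XML terms, that workaround corresponds to a custom model with explicit per-feature policies, e.g. (a sketch; the model and flag names are only examples):

```xml
<!-- Sketch: pick a named model and tune individual flags, leaving out
     the problematic one(s). -->
<cpu mode='custom' match='exact'>
  <model>SandyBridge</model>
  <feature policy='require' name='aes'/>
  <feature policy='disable' name='avx2'/>
</cpu>
```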
