Nested virtualization (aka CPU extra flags revisited)

Bug #1791678 reported by Tobias Rydberg
Affects                     Status         Importance   Assigned to         Milestone
OpenStack Compute (nova)    Fix Released   Undecided    Florian Haas
OpenStack Public Cloud WG   Fix Released   Undecided    Kashyap Chamarthy

Bug Description

We should contribute some authoritative documentation on how to configure nested virtualization in a way that (a) doesn't break live migration and (b) doesn't tank guest performance because of the Spectre/Meltdown mitigations.

Since https://review.openstack.org/#/c/534384/, we can set the following in nova.conf:

[libvirt]
cpu_mode = custom
cpu_model = IvyBridge
cpu_model_extra_flags = <flags>

It is my understanding that deployers should always set the pcid flag so that Spectre/Meltdown mitigation patches don't kill guest performance. Deployers who also want to enable nested virtualization should set pcid,vmx. That combination is only available from Rocky onward; in prior releases pcid is the only available option, for reasons discussed in that Gerrit change.
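As an illustration of the setting described above, here is a minimal Python sketch that parses a nova.conf fragment and reports which extra CPU flags it requests. The fragment itself, the `NOVA_CONF` name, and the helper functions `extra_flags` and `nested_virt_requested` are hypothetical and exist only for this example; Nova does not use this code to read its configuration.

```python
import configparser

# Hypothetical nova.conf fragment for a Rocky-or-later compute node
# (an assumption for illustration, not copied from a real deployment).
NOVA_CONF = """
[libvirt]
cpu_mode = custom
cpu_model = IvyBridge
cpu_model_extra_flags = pcid,vmx
"""


def extra_flags(conf_text):
    """Return the lower-cased flags listed in [libvirt]/cpu_model_extra_flags."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    raw = parser.get("libvirt", "cpu_model_extra_flags", fallback="")
    return {flag.strip().lower() for flag in raw.split(",") if flag.strip()}


def nested_virt_requested(conf_text):
    """True if the fragment asks for the vmx flag to be exposed to guests."""
    return "vmx" in extra_flags(conf_text)
```

With the fragment above, `extra_flags(NOVA_CONF)` yields both `pcid` and `vmx`, so `nested_virt_requested(NOVA_CONF)` is true.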

This is already documented, albeit only deeply buried in the Nova configuration reference. I think it would be good to have a paragraph in the admin guide as well that simply explains how to enable nested virtualization, and what to consider. In particular, it should point out that enabling nested virtualization breaks live migration for guests that are themselves running guests, a limitation that is not widely known among OpenStack users.
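Before exposing vmx to guests, it helps to know whether the compute host itself has nested virtualization switched on in the KVM kernel module. The following is a rough sketch, assuming an Intel host where the setting is visible at /sys/module/kvm_intel/parameters/nested (AMD hosts would use the kvm_amd module instead). The function names are made up for this example.

```python
from pathlib import Path


def parse_nested_param(text):
    """Interpret the module parameter; kernels report 'Y'/'N' or '1'/'0'."""
    return text.strip() in ("Y", "y", "1")


def nested_enabled(module="kvm_intel", sysfs_root="/sys/module"):
    """Return True if the given KVM module reports nested support enabled.

    Returns False when the parameter file cannot be read, for example
    because the module is not loaded on this host.
    """
    param = Path(sysfs_root) / module / "parameters" / "nested"
    try:
        return parse_nested_param(param.read_text())
    except OSError:
        return False
```

Calling `nested_enabled()` on a compute node gives a quick yes/no answer; pass `module="kvm_amd"` on AMD hardware.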

Related links:
https://review.openstack.org/#/c/534384/
https://docs.openstack.org/nova/rocky/configuration/config.html#libvirt.cpu_model_extra_flags
https://docs.openstack.org/nova/rocky/admin/index.html
https://www.linux-kvm.org/page/Nested_Guests

Revision history for this message
Dan Smith (danms) wrote :

I think Kashyap would be an ideal person to write some admin docs for this. He has context on live migration, nested virt, as well as the Spectre/Meltdown stuff. He also loves words.

Kashyap, could you take this on?

Changed in openstack-publiccloud-wg:
assignee: Tobias Rydberg (tobberydberg) → Kashyap Chamarthy (kashyapc)
Revision history for this message
Kashyap Chamarthy (kashyapc) wrote :

Dan: Yes, I can take this one. (I like words, but efficient and clutter-free words. :D)

Hey Tobias,

First, yes, your understanding is correct. Second, I have had a TODO item for ages to write a blog post / admin docs on that topic, but I never bumped up its priority, so the "draft content" (part of it is indeed buried in the `nova.conf` documentation) is still here: https://kashyapc.fedorapeople.org/Reducing-OpenStack-Guest-Perf-Impact-from-Meltdown.txt

Speaking of live migration in the context of nested virt, one misconception (due to poor documentation) is that using the 'host-passthrough' CPU mode (for the level-1 guest) will never let you live migrate your nested guest. That's not true: as long as you have *identical* CPUs on source and destination, you can live migrate with 'host-passthrough'. A crude and unpolished example is here: https://kashyapc.fedorapeople.org/virt/Migration-with-host-passthrough-libvirt.txt
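To make the "identical CPUs" condition concrete, here is a crude Python sketch that compares the model name and flag set taken from the /proc/cpuinfo text of two hosts. This is only an approximation chosen for this example: matching model name and flags does not guarantee identical CPUs (microcode and firmware settings can differ), and the helper names `cpu_identity` and `looks_identical` are hypothetical.

```python
def cpu_identity(cpuinfo_text):
    """Return (model name, frozenset of flags) for the first CPU listed."""
    model, flags = None, frozenset()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or a line without a key/value pair
        key = key.strip()
        if key == "model name" and model is None:
            model = value.strip()
        elif key == "flags" and not flags:
            flags = frozenset(value.split())
    return model, flags


def looks_identical(src_cpuinfo, dst_cpuinfo):
    """True if both hosts report the same model name and the same flag set."""
    return cpu_identity(src_cpuinfo) == cpu_identity(dst_cpuinfo)
```

In practice you would collect the contents of /proc/cpuinfo from the source and destination hosts and pass both strings to `looks_identical` before attempting a migration with 'host-passthrough'.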

Thanks for the reminder!

Which also reminds me: In light of the recent events, I should also probably tweak the DevStack doc I wrote on this topic many moons ago: https://docs.openstack.org/devstack/latest/guides/devstack-with-nested-kvm.html

Revision history for this message
Florian Haas (fghaas) wrote :

Kashyap, the "we should contribute" part in the original report comes from me (Tobias copied and pasted from an internal document here), meaning I was kinda volunteering for this. Can you use my help here? I'll be more than happy to help out.

Revision history for this message
Kashyap Chamarthy (kashyapc) wrote :

Hey Florian,

Sure — if you would like to assign this bug to yourself, go for it. If you post a patch, Cc me on it, I'll review it and add my own notes.

Revision history for this message
Florian Haas (fghaas) wrote :

This is a documentation issue in Nova — adding as affected project.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/609788

Changed in nova:
assignee: nobody → Florian Haas (fghaas)
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.openstack.org/609789

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/609790

Revision history for this message
Florian Haas (fghaas) wrote :

Kashyap, I finally got around to taking a stab at this. When you get a chance to review, I'd be grateful for any and all feedback.

The patch set explains nested guests in all branches, but limits the discussion of vmx in cpu_model_extra_flags to rocky and master, since only those releases would support that feature flag.

Matt Riedemann (mriedem)
tags: added: docs libvirt
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/609788
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=e304ad7f4d3bf0d0cf4e5b4d2de3f2c4be2f2b8e
Submitter: Zuul
Branch: master

commit e304ad7f4d3bf0d0cf4e5b4d2de3f2c4be2f2b8e
Author: Florian Haas <email address hidden>
Date: Thu Oct 11 18:01:21 2018 +0000

    Explain cpu_model_extra_flags and nested guest support

    In the Configuration Guide's section on KVM:

    * expand on the implications of selecting a CPU mode and model
      for live migration,
    * explain the cpu_model_extra_flags option,
    * discuss how to enable nested guests, and the implications and
      limitations of doing so,
    * bump the heading level of "Guest agent support".

    Closes-Bug: 1791678
    Change-Id: I671acd16c7e5eca01b0bd633caf8e58287d0a913

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/rocky)

Reviewed: https://review.openstack.org/609789
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=63bf3833ae40d09c9a5e1ea9056c20d041da4fa5
Submitter: Zuul
Branch: stable/rocky

commit 63bf3833ae40d09c9a5e1ea9056c20d041da4fa5
Author: Florian Haas <email address hidden>
Date: Thu Oct 11 18:01:21 2018 +0000

    Explain cpu_model_extra_flags and nested guest support

    In the Configuration Guide's section on KVM:

    * expand on the implications of selecting a CPU mode and model
      for live migration,
    * explain the cpu_model_extra_flags option,
    * discuss how to enable nested guests, and the implications and
      limitations of doing so,
    * bump the heading level of "Guest agent support".

    Closes-Bug: 1791678
    Change-Id: I671acd16c7e5eca01b0bd633caf8e58287d0a913
    (cherry picked from commit e304ad7f4d3bf0d0cf4e5b4d2de3f2c4be2f2b8e)

tags: added: in-stable-rocky
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 18.0.3

This issue was fixed in the openstack/nova 18.0.3 release.

Florian Haas (fghaas)
Changed in openstack-publiccloud-wg:
status: New → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 19.0.0.0rc1

This issue was fixed in the openstack/nova 19.0.0.0rc1 release candidate.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/queens)

Reviewed: https://review.opendev.org/609790
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=4055261eba3c04f0eab9d754cce822db92f492ca
Submitter: Zuul
Branch: stable/queens

commit 4055261eba3c04f0eab9d754cce822db92f492ca
Author: Florian Haas <email address hidden>
Date: Thu Oct 11 18:01:21 2018 +0000

    Explain nested guest support

    In the Configuration Guide's section on KVM:

    * expand on the implications of selecting a CPU mode and model
      for live migration,
    * discuss how to enable nested guests, and the implications and
      limitations of doing so,
    * bump the heading level of "Guest agent support".

    This patch differs from the version in Stein
    (e304ad7f4d3bf0d0cf4e5b4d2de3f2c4be2f2b8e), and the clean cherry-pick
    into Rocky (63bf3833ae40d09c9a5e1ea9056c20d041da4fa5) in that it omits
    mentioning the option of enabling the vmx flag via
    cpu_model_extra_flags. The rationale for that is that vmx is not a
    supported option in stable/queens at the time of writing (only pcid,
    ssbd, virt-ssbd, amd-ssbd, and amd-no-ssb are), so the only option to
    enable nested guest support in Queens (and earlier releases) is to use
    cpu_mode=host-model or cpu_mode=host-passthrough, and enable nested
    virtualization via the kernel module.

    Using cpu_mode=custom is not an option for enabling nested
    virtualization in Queens in the absence of vmx as an available option
    for cpu_model_extra_flags, regardless of the cpu_model being used,
    because (again at the time of writing) none of the CPU models defined
    in libvirt enable the vmx flag automatically.

    Closes-Bug: 1791678
    Change-Id: I671acd16c7e5eca01b0bd633caf8e58287d0a913
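The rules spelled out in the commit message above can be summarized as a small decision function. This is a sketch under stated assumptions: the `RELEASES` tuple covers only a few release names for illustration, the function name `nested_possible` is hypothetical, and a True result still requires nested virtualization to be enabled in the KVM kernel module on the host.

```python
# Release names in chronological order; a partial list assumed for this sketch.
RELEASES = ("ocata", "pike", "queens", "rocky", "stein")


def nested_possible(release, cpu_mode, extra_flags=()):
    """Can this Nova [libvirt] CPU configuration expose vmx to guests?

    Follows the commit message: host-model and host-passthrough work in
    every release, while cpu_mode=custom needs vmx in
    cpu_model_extra_flags, which is only accepted from Rocky onward.
    """
    if cpu_mode in ("host-model", "host-passthrough"):
        return True
    if cpu_mode == "custom":
        vmx_accepted = RELEASES.index(release) >= RELEASES.index("rocky")
        return vmx_accepted and "vmx" in {f.lower() for f in extra_flags}
    return False
```

For example, `nested_possible("queens", "custom", ["vmx"])` is false, whereas the same flags on `"rocky"` return true.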

tags: added: in-stable-queens
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 17.0.13

This issue was fixed in the openstack/nova 17.0.13 release.
