nova libvirt driver needs to check qemu support for NUMA & hugepages

Bug #1422775 reported by Chris Friesen
12
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Low
Chris Friesen

Bug Description

In the trunk (kilo) nova libvirt code there are a numberr of places where we assume that qemu can provide NUMA-related functionality that was only introduced in a very recent version (2.1). If we try to use this code with older qemu it triggers an exception and fails to boot the instance.

In some cases, we can get partial NUMA support by removing some of the more advanced functionality (per-numa-node pinning, per-numa-node hugepages). This would arguably be the best solution.

At the very least, we should introduce a QEMU version check and fail cleanly with a proper error code rather than with cryptic errors that are complicated to resolve.

Known affected areas:

virt.libvirt.config.LibvirtConfigGuestMemoryBacking.format_dom()
--the "page" subelement of the "hugepages" element is not supported by earlier qemu versions

virt.libvirt.config.LibvirtConfigGuestNUMATune.format_dom()
--the "memnode" subelement of the "numatune" element is not supported by earlier qemu versions

Commenting out the code that configures these elements allows NUMA to work (with somewhat reduced functionality) on an older qemu version, or a newer one without NUMA support (like Ubuntu 14.10 at the time of writing this).

I think we should add a qemu version test against 2.1 and use that to determine whether or not to support the more advanced NUMA capabilities. If Ubuntu doesn't fix their qemu in 14.10 then we might even want to add some kind of runtime capabilities test to check whether qemu was built with numa support.

Chris Friesen (cbf123)
description: updated
Revision history for this message
Daniel Berrange (berrange) wrote :

> Commenting out the code that configures these elements
> allows NUMA to work (with somewhat reduced functionality)
> on an older qemu version, or a newer one without NUMA support
> (like Ubuntu 14.10 at the time of writing this).

No, it really doesn't work correctly. Removing the 'pages' element means we cannot ensure the VM is using the page size that we requested. Removing the 'memnode' element results in the VM memory nodes not having correct host node affinity. Not only will this result in worse performance, but will lead to incorrect schedular decisions because its view of what resources are allocated / in use will no longer match reality in any way.

Revision history for this message
Chris Friesen (cbf123) wrote :

A few things:

1) Currently the "pages" subelement of the "hugepages" element is set even when no NUMA support is requested for the guest. This is a bug. We should be able to use hugepages even if we can't use NUMA.

2) As long as the CPU affinity is set before the memory allocation (should probably check whether this is the case) then the NUMA allocation would be governed by the system-wide allocation rule. So maybe this means that with older qemu we don't allow setting per-flavor hw_numa_mempolicy, you're limited to the system-wide rule.

Even if we decide to totally disallow NUMA support unless qemu supports it "properly", we should still add in a version (or better yet a runtime capabilities) check so that we can fail early with a nice error message.

Revision history for this message
Daniel Berrange (berrange) wrote :

The "pages" element specifies the huge page size, so even if you don't need NUMA we still need the pages element, so that's not a bug. For NUMA, the system default allocation rule implemented by the kernel is just a preference, not a guarantee, so the actual placement of the guest is not going to match what the schedular is expecting. The NUMA code is complex enough without having to add workarounds for outdated versions of QEMU or versions which distros crippled. At most we should extend the version check here.

Revision history for this message
Chris Friesen (cbf123) wrote :

Okay, if you don't think it's worthwhile to try and make it work then I'll defer to your judgement.

I do think that we should add a qemu version check though..as it stands it is tricky to try and figure out why things aren't working and what version of libvirt/qemu is needed.

Changed in nova:
status: New → Confirmed
importance: Undecided → Low
melanie witt (melwitt)
tags: added: libvirt
Chris Friesen (cbf123)
Changed in nova:
assignee: nobody → Chris Friesen (cbf123)
Revision history for this message
Nikola Đipanov (ndipanov) wrote :

FWIW - I think we should just be failing the guest and provide a non-criptic message. That's the only way to ensure that there is a clear unambiguos semantics of the extra specs we have introduced.

Allowing partial support will lead to even more confusion from the users.

We skip the missing capabilities already (and that is fine because we don't want to be crashing the compute service) but when we detect that a capability was requested that cannot be unambiguously executed.

We may even want to do this at the scheduler level.

Revision history for this message
Nikola Đipanov (ndipanov) wrote :

The second paragraph in comment #5 should read:

"""
We skip the missing capabilities already (and that is fine because we don't want to be crashing the compute service) but when we detect that a capability was requested that cannot be executed to provde the feature we claim it provides - the instance request should fail.
"""

I forgot to finish the sentence :)

Revision history for this message
Chris Friesen (cbf123) wrote :

Looking at the code, it seems that we have some existing checks on libvirt version, but they behave differently than what we've been discussing.

Currently if libvirt is older than MIN_LIBVIRT_MEMNODE_VERSION or MIN_LIBVIRT_MEMPAGES_VERSION it basically just ignores the relevant portion of the instance topology.

This seems like a poor design--the user has explicitly requested a particular topology, but the compute node is silently ignoring that request.

Do we want to noisily fail (I'm thinking raising an exception) in these cases instead? (As well as if qemu is too old?)

Revision history for this message
Daniel Berrange (berrange) wrote :

The original code failed noisily, but some later changes caused a regression in this. I've fixed this regression with this change

https://review.openstack.org/#/c/159106/

Revision history for this message
Chris Friesen (cbf123) wrote :

Never mind about the libvirt checks, I just noticed Daniel's change at https://review.openstack.org/#/c/159106/

Would we want to add a fix for this bug into that commit, or would it make sense for me to do a follow-on bugfix for the qemu check?

Revision history for this message
Nikola Đipanov (ndipanov) wrote :

I'd say keep it as a separate commit, as 159106 is already quite noisy.

tags: added: kilo-rc-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/170780

Changed in nova:
status: Confirmed → In Progress
Revision history for this message
Chris Friesen (cbf123) wrote :

For anyone interested in NUMA/hugepages/qemu, the change at https://review.openstack.org/170780 is ready to go but needs core review. It's a trivial patch, shouldn't take long.

summary: - nova libvirt driver assumes qemu support for NUMA pinning
+ nova libvirt driver needs to check qemu support for NUMA & hugepages
Thierry Carrez (ttx)
tags: removed: kilo-rc-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/170780
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=7f4d7e0b67521344e32790e4b59c42073182a2cf
Submitter: Jenkins
Branch: master

commit 7f4d7e0b67521344e32790e4b59c42073182a2cf
Author: Chris Friesen <email address hidden>
Date: Sun Apr 5 22:04:36 2015 -0600

    libvirt: check qemu version for NUMA & hugepage support

    In commit 945ab28 the libvirt driver was modified to validate the
    libvirt version before enabling NUMA and hugepage support.

    While good, it's not sufficient. We also need to check the qemu
    version since NUMA hugepage support wasn't added until version 2.1.
    (See http://wiki.qemu.org/ChangeLog/2.1#x86, specifically the
    memory-backend-* objects.)

    We know that qemu on the s390x doesn't support NUMA or hugepages.
    Support for ppc64 is possible but not yet determined...we can
    update the supported list once we find out for sure.

    Change-Id: Ib31c4f806e30a8b33669d5a15bd1578050c7e352
    Closes-Bug: #1422775

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
milestone: none → liberty-1
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: liberty-1 → 12.0.0
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers