Images with hw:vif_multiqueue_enabled can be limited to 8 queues even if more are supported

Bug #1847367 reported by Chris Stone on 2019-10-08
This bug affects 2 people
Affects                     Importance   Assigned to
OpenStack Compute (nova)    Low          sean mooney
  Queens                    Low          Unassigned
  Rocky                     Low          Unassigned
  Stein                     Low          Unassigned
  Train                     Low          sean mooney

Bug Description

Nova version: 18.2.3
Release: Rocky
Compute node OS: CentOS 7.3
Compute node kernel: 3.10.0-327.13.1.el7.x86_64

In https://bugs.launchpad.net/nova/+bug/1570631 and commit https://review.opendev.org/#/c/332660/, a bug was fixed by assuming that the kernel version also dictates the maximum number of queues on the tap interface when hw:vif_multiqueue_enabled=True is set. That fix assumes that 3.x kernels support at most 8 queues. Unfortunately, not all distributions follow this: CentOS/RHEL has supported up to 256 queues since at least 7.2, even with a 3.x kernel.

As a result, a 20-core VM created on Mitaka gets 20 queues (because the limit of 8 had not been added yet). After the very same host is upgraded to Rocky, that VM gets only 8 queues, even though the kernel supports 256.
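
The queue arithmetic behind the Mitaka-versus-Rocky difference is a simple clamp: with multiqueue enabled, the guest gets one queue pair per vCPU, and any cap trims that number. Below is a minimal sketch, assuming the starting point is the flavor's vCPU count; `effective_queue_count` is a hypothetical name used for illustration, not a nova function.

```python
from typing import Optional


def effective_queue_count(vcpus: int, cap: Optional[int]) -> int:
    """Return the number of virtio queue pairs a multiqueue VIF would get.

    Hypothetical sketch: one queue pair per vCPU, trimmed to ``cap`` when a
    cap is known. ``cap=None`` means no limit is applied.
    """
    if vcpus < 1:
        raise ValueError("a guest needs at least one vCPU")
    if cap is None:
        return vcpus
    return min(vcpus, cap)


vcpus = 20
print(effective_queue_count(vcpus, None))  # 20: Mitaka, no cap applied yet
print(effective_queue_count(vcpus, 8))     # 8: Rocky on a 3.x kernel
print(effective_queue_count(vcpus, 256))   # 20: the actual CentOS 7.2+ limit
```

With the real CentOS limit of 256, a 20-vCPU guest would keep all 20 queues; the 8-queue result comes entirely from the version-based cap.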

Could a workaround option be implemented to disable this check, or manually define the max queue count?

Snippet of drivers/net/tun.c from CentOS 7.2 kernel source code:
/* MAX_TAP_QUEUES 256 is chosen to allow rx/tx queues to be equal
 * to max number of VCPUs in guest. */
#define MAX_TAP_QUEUES 256
#define MAX_TAP_FLOWS 4096

Snippet from the 3.10.0 kernel code from https://elixir.bootlin.com/linux/v3.10/source/drivers/net/tun.c:
/* DEFAULT_MAX_NUM_RSS_QUEUES were choosed to let the rx/tx queues allocated for
 * the netdevice to be fit in one page. So we can make sure the success of
 * memory allocation. TODO: increase the limit. */
#define MAX_TAP_QUEUES DEFAULT_MAX_NUM_RSS_QUEUES
#define MAX_TAP_FLOWS 4096

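In the upstream 3.10 source above, DEFAULT_MAX_NUM_RSS_QUEUES is defined as 8, so upstream 3.x kernels cap a tap device at 8 queues, whereas the CentOS 7.2 kernel allows 256.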
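
The kernel-version check introduced by the fix for bug 1570631 can be sketched roughly as follows. This is a hypothetical illustration, not nova's actual code: the function name `max_tap_queues_for_release` is made up, and treating pre-3.x kernels as single-queue and leaving 4.x and later uncapped are simplifying assumptions. The only mapping taken from this report is "3.x means 8". The sketch shows why the CentOS 7 release string from this report ends up capped at 8.

```python
import re
from typing import Optional


def max_tap_queues_for_release(release: str) -> Optional[int]:
    """Guess a tap-queue cap from a kernel release string (hypothetical sketch).

    Only the major version is inspected, which is exactly the weakness this
    bug describes: distro kernels such as CentOS 7 backport larger limits
    without bumping the major version.
    """
    match = re.match(r"(\d+)\.", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major = int(match.group(1))
    if major < 3:
        return 1  # assumption: no multiqueue tap support before 3.x
    if major == 3:
        return 8  # the 3.x limit discussed in this report
    return None  # assumption: no kernel-derived cap for 4.x and later


# The CentOS 7 kernel from this report is capped at 8, even though its
# drivers/net/tun.c defines MAX_TAP_QUEUES as 256.
print(max_tap_queues_for_release("3.10.0-327.13.1.el7.x86_64"))  # 8
```

On a live host, `platform.release()` from the Python standard library returns the running kernel's release string in this format.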

Chris Stone (cstone-0) on 2019-10-08
description: updated
Matt Riedemann (mriedem) on 2019-10-10
tags: added: libvirt
Matt Riedemann (mriedem) wrote:

I've added this to the open discussion section of the Oct 10 nova meeting agenda. I'm not sure I'll be around to discuss it during the meeting, though, since I have to leave early. It seems the options are:

1. Add a workaround option to configure the limit per compute host.

2. Add some distro-specific check to the code (that seems pretty nasty but at least it doesn't require a new option).

3. Figure out whether there is a 3.x minor version the code could check instead that works across multiple major distributions. Without knowing what was patched in the CentOS kernel, and when, that's probably hard to do.

Changed in nova:
status: New → Confirmed
Changed in nova:
assignee: nobody → sean mooney (sean-k-mooney)
Matt Riedemann (mriedem) wrote:

We discussed this in the Oct 17 nova meeting:

http://eavesdrop.openstack.org/meetings/nova/2019/nova.2019-10-17-14.00.log.html#l-239

And agreed on two things:

1. sean-k-mooney says the limit shouldn't be applied to vhost-user ports, but IMO that's a separate bug, which Sean can pursue.

2. We can add a [libvirt] group config option which defaults to None to let nova decide, but allows operators to override the limit. Note that we wouldn't use a [workarounds] group option, because it's not clear when we could remove it: nova doesn't really enforce minimum kernel versions.

The change for #2 can be used to close *this* bug.
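
The agreed design amounts to a precedence rule: if the operator sets the option in the `[libvirt]` section, it wins; if it is left unset (the `None` default), nova falls back to its own kernel-based guess. The sketch below uses Python's standard `configparser` to read a hypothetical nova.conf fragment. The option name `max_queues` matches the commit that closed this bug, but the `resolve_queue_cap` helper and this way of parsing the file are illustrative assumptions; nova itself loads its options through oslo.config.

```python
import configparser
from typing import Optional


def resolve_queue_cap(conf_text: str, kernel_guess: Optional[int]) -> Optional[int]:
    """Prefer an operator-set [libvirt]/max_queues over nova's own guess.

    Hypothetical sketch: an unset option (the None default agreed in the
    meeting) means "let nova decide", so the kernel-based guess is used.
    """
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    if parser.has_option("libvirt", "max_queues"):
        value = parser.getint("libvirt", "max_queues")
        if value < 1:
            raise ValueError("max_queues must be a positive integer")
        return value
    return kernel_guess


# Operator override on a CentOS 7 host whose kernel supports 256 queues.
override = """
[libvirt]
max_queues = 256
"""
print(resolve_queue_cap(override, kernel_guess=8))  # 256

# No override: fall back to the kernel-version heuristic.
print(resolve_queue_cap("[libvirt]\n", kernel_guess=8))  # 8
```

Keeping the default at "unset" rather than a number preserves existing behaviour on hosts where the kernel-version guess is correct, while giving CentOS/RHEL operators a per-host escape hatch.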

Fix proposed to branch: master
Review: https://review.opendev.org/695118

Changed in nova:
status: Confirmed → In Progress

Reviewed: https://review.opendev.org/695118
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=0e6aac3c2d97c999451da50537df6a0cbddeb4a6
Submitter: Zuul
Branch: master

commit 0e6aac3c2d97c999451da50537df6a0cbddeb4a6
Author: Sean Mooney <email address hidden>
Date: Wed Nov 20 00:13:03 2019 +0000

    add [libvirt]/max_queues config option

    This change adds a max_queues config option to allow
    operators to set the maximum number of virtio queue
    pairs that can be allocated to a virtio network
    interface.

    Change-Id: I9abe783a9a9443c799e7c74a57cc30835f679a01
    Closes-Bug: #1847367

Changed in nova:
status: In Progress → Fix Released
Matt Riedemann (mriedem) on 2019-12-11
Changed in nova:
importance: Undecided → Low
sean mooney (sean-k-mooney) wrote:

This goes back to when the feature was introduced, but it was reported against a CentOS/RHEL host on Rocky, and releases before Queens will be out of support in a week or so, so there is likely no point in going back further than Queens. I don't think this issue will be present on other distros.

Reviewed: https://review.opendev.org/740064
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=286d7cfc5c45535561b6ed6e7906bb40de5abc93
Submitter: Zuul
Branch: stable/train

commit 286d7cfc5c45535561b6ed6e7906bb40de5abc93
Author: Sean Mooney <email address hidden>
Date: Wed Nov 20 00:13:03 2019 +0000

    add [libvirt]/max_queues config option

    This change adds a max_queues config option to allow
    operators to set the maximum number of virtio queue
    pairs that can be allocated to a virtio network
    interface.

    Change-Id: I9abe783a9a9443c799e7c74a57cc30835f679a01
    Closes-Bug: #1847367
    (cherry picked from commit 0e6aac3c2d97c999451da50537df6a0cbddeb4a6)
