Images with hw:vif_multiqueue_enabled can be limited to 8 queues even if more are supported

Bug #1847367 reported by Chris Stone
This bug affects 2 people
Affects                   Status        Importance  Assigned to  Milestone
OpenStack Compute (nova)  Fix Released  Low         sean mooney
Queens                    Triaged       Low         Unassigned
Rocky                     Triaged       Low         Unassigned
Stein                     Triaged       Low         Unassigned
Train                     Fix Released  Low         sean mooney

Bug Description

Nova version: 18.2.3
Release: Rocky
Compute node OS: CentOS 7.3
Compute node kernel: 3.10.0-327.13.1.el7.x86_64

In https://bugs.launchpad.net/nova/+bug/1570631 and commit https://review.opendev.org/#/c/332660/, a bug was fixed by making the assumption that the kernel version should also dictate the max number of queues on the tap interface when setting hw:vif_multiqueue_enabled=True. It was decided that 3.x kernels have a max queue count of 8. Unfortunately not all distributions follow this, and CentOS/RHEL has supported up to 256 queues since at least 7.2 even with a 3.x kernel.

The result of this is that a 20 core VM created in Mitaka will have 20 queues enabled (because the limit of 8 had not been added). The very same host after being upgraded to Rocky will instead only give 8 queues to the VM even though the kernel supports 256.
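
The behaviour described above can be illustrated with a minimal sketch. This is not nova's actual code: the function names are hypothetical, and the limits for kernels other than 3.x (1 queue before 3.0, 256 from 4.0 onwards) are assumptions added for illustration. Only the 3.x limit of 8 comes from this report.

```python
def max_tap_queues_for_kernel(release: str) -> int:
    """Return the tap queue limit implied by a kernel release string.

    Hypothetical helper: only the 3.x -> 8 mapping is taken from the
    bug description; the other branches are assumptions.
    """
    major = int(release.split(".")[0])
    if major <= 2:
        return 1      # assumed: no multiqueue tap support
    if major == 3:
        return 8      # the limit this bug is about
    return 256        # assumed limit for newer kernels


def queues_for_guest(vcpus: int, release: str) -> int:
    """Queue count a multiqueue guest would get under this heuristic."""
    return min(vcpus, max_tap_queues_for_kernel(release))


# The CentOS kernel from this report is capped at 8 queues for a 20 vCPU
# guest, even though its tun.c defines MAX_TAP_QUEUES as 256.
print(queues_for_guest(20, "3.10.0-327.13.1.el7.x86_64"))  # prints 8
```

Because the check looks only at the major version, a distribution kernel with a backported higher limit cannot be distinguished from an upstream 3.x kernel.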

Could a workaround option be implemented to disable this check, or manually define the max queue count?

Snippet of drivers/net/tun.c from CentOS 7.2 kernel source code:
/* MAX_TAP_QUEUES 256 is chosen to allow rx/tx queues to be equal
 * to max number of VCPUs in guest. */
#define MAX_TAP_QUEUES 256
#define MAX_TAP_FLOWS 4096

Snippet from the 3.10.0 kernel code from https://elixir.bootlin.com/linux/v3.10/source/drivers/net/tun.c:
/* DEFAULT_MAX_NUM_RSS_QUEUES were choosed to let the rx/tx queues allocated for
 * the netdevice to be fit in one page. So we can make sure the success of
 * memory allocation. TODO: increase the limit. */
#define MAX_TAP_QUEUES DEFAULT_MAX_NUM_RSS_QUEUES
#define MAX_TAP_FLOWS 4096

In the above example, DEFAULT_MAX_NUM_RSS_QUEUES is set to 8.

Tags: libvirt
Chris Stone (cstone-0)
description: updated
Matt Riedemann (mriedem)
tags: added: libvirt
Matt Riedemann (mriedem) wrote :

I've added this to the Oct 10 nova meeting agenda's open discussion section. I'm not sure I'll be around to discuss it during the meeting though since I have to leave early. It seems the options are:

1. Add a workaround option to configure the limit per compute host.

2. Add some distro-specific check to the code (that seems pretty nasty but at least it doesn't require a new option).

3. Figure out if there is a 3.x minor version that we could update the code to check which will work across multiple major distributions. Without knowing what/when the CentOS kernel was patched that's probably hard to do.

Changed in nova:
status: New → Confirmed
Changed in nova:
assignee: nobody → sean mooney (sean-k-mooney)
Matt Riedemann (mriedem) wrote :

We discussed this in the Oct 17 nova meeting:

http://eavesdrop.openstack.org/meetings/nova/2019/nova.2019-10-17-14.00.log.html#l-239

And agreed on two things:

1. sean-k-mooney says the limit shouldn't be applied to vhost-user ports but that's a separate bug IMO which Sean can pursue.

2. We can add a [libvirt] group config option which defaults to None to let nova decide but allows for overriding the limit. Note that we wouldn't use a [workarounds] group option: nova doesn't really enforce minimum kernel versions, so it's not clear when such a workaround option could ever be removed.

The change for #2 can be used to close *this* bug.
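
The agreed behaviour for option 2 could be sketched roughly as follows. This is an illustration only, not the merged change: the function and parameter names are hypothetical, and treating a missing limit as "no cap" is an assumption.

```python
from typing import Optional


def effective_queue_limit(configured: Optional[int],
                          kernel_limit: Optional[int]) -> Optional[int]:
    """Pick the queue limit to apply.

    An explicit operator setting wins; None means "let nova decide",
    so the kernel-derived limit is used instead.
    """
    if configured is not None:
        return configured
    return kernel_limit


def guest_queue_count(vcpus: int,
                      configured: Optional[int],
                      kernel_limit: Optional[int]) -> int:
    """Number of queues for a multiqueue guest with the given vCPU count."""
    limit = effective_queue_limit(configured, kernel_limit)
    # Assumption: no limit at all means one queue per vCPU.
    return vcpus if limit is None else min(vcpus, limit)


# Default (None): the 3.x heuristic still caps a 20 vCPU guest at 8.
print(guest_queue_count(20, None, 8))   # prints 8
# Operator override of 256: the guest gets one queue per vCPU again.
print(guest_queue_count(20, 256, 8))    # prints 20
```

Defaulting to None keeps existing deployments unchanged, while hosts with a patched kernel can opt in to a higher limit.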

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/695118

Changed in nova:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/695118
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=0e6aac3c2d97c999451da50537df6a0cbddeb4a6
Submitter: Zuul
Branch: master

commit 0e6aac3c2d97c999451da50537df6a0cbddeb4a6
Author: Sean Mooney <email address hidden>
Date: Wed Nov 20 00:13:03 2019 +0000

    add [libvirt]/max_queues config option

    This change adds a max_queues config option to allow
    operators to set the maximium number of virtio queue
    pairs that can be allocated to a virtio network
    interface.

    Change-Id: I9abe783a9a9443c799e7c74a57cc30835f679a01
    Closes-Bug: #1847367
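
For operators, the option named in the commit above would be set in the [libvirt] section of nova.conf on the compute host. The sketch below shows such a fragment and reads it back with Python's standard configparser purely for illustration. Nova itself uses oslo.config rather than configparser, and the value 256 and the positive-integer check are assumptions.

```python
import configparser
from typing import Optional

# Example nova.conf fragment; 256 matches MAX_TAP_QUEUES in the CentOS
# kernel quoted in this report and is only an example value.
NOVA_CONF = """
[libvirt]
max_queues = 256
"""


def read_max_queues(conf_text: str) -> Optional[int]:
    """Return [libvirt]/max_queues from the config text, or None if unset."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    value = parser.get("libvirt", "max_queues", fallback=None)
    if value is None:
        return None  # unset: leave the decision to nova
    queues = int(value)
    if queues < 1:
        # Assumed validation: a queue count below 1 makes no sense.
        raise ValueError("max_queues must be a positive integer")
    return queues


print(read_max_queues(NOVA_CONF))  # prints 256
```

Leaving the option out of nova.conf corresponds to the None default, in which case the kernel-version heuristic still applies.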

Changed in nova:
status: In Progress → Fix Released
Matt Riedemann (mriedem)
Changed in nova:
importance: Undecided → Low
sean mooney (sean-k-mooney) wrote :

This goes back to when the feature was introduced, but it was reported against a CentOS/RHEL host on Rocky, and releases before Queens will be out of support in a week or so. As such, there is likely no point in going back further than Queens. I don't think this issue will be present on other distros.

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/700894

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/740064

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/train)

Reviewed: https://review.opendev.org/740064
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=286d7cfc5c45535561b6ed6e7906bb40de5abc93
Submitter: Zuul
Branch: stable/train

commit 286d7cfc5c45535561b6ed6e7906bb40de5abc93
Author: Sean Mooney <email address hidden>
Date: Wed Nov 20 00:13:03 2019 +0000

    add [libvirt]/max_queues config option

    This change adds a max_queues config option to allow
    operators to set the maximium number of virtio queue
    pairs that can be allocated to a virtio network
    interface.

    Change-Id: I9abe783a9a9443c799e7c74a57cc30835f679a01
    Closes-Bug: #1847367
    (cherry picked from commit 0e6aac3c2d97c999451da50537df6a0cbddeb4a6)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 20.5.0

This issue was fixed in the openstack/nova 20.5.0 release.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by "sean mooney <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/nova/+/700894
Reason: realistically since i have not worked on this in 2-3 years at this point im not going to get back to this anytime soon so ill abandon this for now
