With hw:vif_multiqueue_enabled, libvirt driver fails with VM larger than 8 vCPU

Bug #1570631 reported by Kent Nickell on 2016-04-14
This bug affects 4 people
Affects                     Importance   Assigned to
OpenStack Compute (nova)    Low          Unassigned
nova (Ubuntu)               Low          Unassigned

Bug Description

Nova version: 2:12.0.0-ubuntu2~cloud0
Release: Liberty
Compute node kernel: 3.19.0-47-generic
Hypervisor: Libvirt+KVM
libvirtd version: 1.2.16
Neutron network (Linuxbridge Agent)

When attempting to instantiate a VM based on an image with the metadata hw:vif_multiqueue_enabled=true, creation fails if the flavor has more than 8 vCPUs assigned. If the flavor specifies 8 or fewer vCPUs, creation succeeds.

From /var/log/libvirt/libvirtd.log:

2016-04-14 21:19:08.161+0000: 3651: error : virNetDevTapCreate:290 : Unable to create tap device tap11db5bd0-3a: Argument list too long

This is the error thrown when attempting to create the VM.

I believe the reason is that in kernels prior to 4.0, the number of queues on a tap interface was limited to 8.

Based on http://lxr.free-electrons.com/source/drivers/net/tun.c?v=3.19#L129, MAX_TAP_QUEUES resolves to 8 prior to kernel 4.0.

In the libvirt vif driver (nova/virt/libvirt/vif.py), _get_virtio_mq_settings does not respect this limit when it sets vhost_queues = flavor.vcpus. When the domain XML is written for the guest, vhost_queues is used as the 'queues' attribute of the interface driver element. When this value is greater than 8, creating the tap interface fails.
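
To make the failure path concrete, here is a minimal sketch of how the vCPU count ends up in the guest's interface definition. It is not the nova code: build_interface_xml is a hypothetical helper, and the element is simplified (a real bridge interface also carries a source element and more). Only the driver element with name='vhost' and a queues attribute follows the libvirt domain XML format.

```python
import xml.etree.ElementTree as ET


def build_interface_xml(tap_name, queues):
    """Return a simplified libvirt <interface> element as a string.

    Hypothetical helper for illustration only; nova builds this XML
    through its own config classes.
    """
    iface = ET.Element("interface", type="bridge")
    ET.SubElement(iface, "target", dev=tap_name)
    ET.SubElement(iface, "model", type="virtio")
    # The queue count is passed through unchanged. With a 16 vCPU flavor
    # on a 3.x kernel this asks for more tap queues than the tun driver
    # allows, and tap creation fails.
    ET.SubElement(iface, "driver", name="vhost", queues=str(queues))
    return ET.tostring(iface, encoding="unicode")


xml_text = build_interface_xml("tap11db5bd0-3a", 16)
print(xml_text)
```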

Sylvain Bauza (sylvain-bauza) wrote :

The bug is also present in master.

tags: added: libvirt low-hanging-fruit
Changed in nova:
importance: Undecided → Low
status: New → Confirmed
Changed in nova:
assignee: nobody → Kengo Sakai (kengo-sakai)
Changed in nova:
status: Confirmed → In Progress
James Page (james-page) on 2016-06-09
Changed in nova (Ubuntu):
status: New → Triaged
importance: Undecided → Low
Kengo Sakai (kengo-sakai) wrote :

Is there anyone who knows how to retrieve MAX_TAP_QUEUES from the running system? I looked at drivers/net/tun.c but couldn't find how to do it.
MAX_TAP_QUEUES is 8 in kernel 3.x[1] and it is 256 in kernel 4.x[2]. I want to find MAX_TAP_QUEUES programmatically without hardcoding its value for each kernel version.

[1]https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/drivers/net/tun.c?id=refs/tags/v3.18.35#n118
[2]https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/drivers/net/tun.c?id=refs/tags/v4.1.26#n128
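
One way to approximate this without reading the constant from the kernel is to map the kernel major version to the limits cited above. The following is only a sketch: max_tap_queues is a hypothetical function name, the values are the ones quoted in this report (8 for 3.x, 256 for 4.x), and treating every kernel from 4.0 onward as 256 is an assumption.

```python
import platform


def max_tap_queues(kernel_release=None):
    """Guess the tap queue limit from a kernel release string.

    kernel_release looks like '3.19.0-47-generic'; when omitted, the
    release of the running system is used.
    """
    if kernel_release is None:
        kernel_release = platform.release()
    major = int(kernel_release.split(".")[0])
    if major < 3:
        # No multiqueue tap support: a single queue only.
        return 1
    if major == 3:
        return 8
    # Assumption: 4.0 and later all use the 4.x value.
    return 256


print(max_tap_queues("3.19.0-47-generic"))
```

This still hardcodes one value per kernel series, so it would need updating if a later kernel changed the constant.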

Reviewed: https://review.openstack.org/332660
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=b9303e67640ac2052c0a79189b29f60bde6b8fdc
Submitter: Jenkins
Branch: master

commit b9303e67640ac2052c0a79189b29f60bde6b8fdc
Author: Kengo Sakai <email address hidden>
Date: Wed Jun 22 16:04:06 2016 +0900

    Check if flavor.vcpus is more than MAX_TAP_QUEUES

    When attempting to instantiate an instance based on an image with
    the metadata hw:vif_multiqueue_enabled=true, the code uses
    flavor.vcpus as the number of queues on a tap interface.

    In kernels prior to 3.0, multiple queues on a tap interface
    is not supported[1]. In kernels 3.x, the number of queues
    on a tap interface is limited to 8 as MAX_TAP_QUEUES in tun
    driver[2]. From 4.0, the number is 256[3]. If flavor.vcpus
    is more than MAX_TAP_QUEUES, creating the tap interface
    fails.

    This commit adds logic to check if flavor.vcpus is more
    than MAX_TAP_QUEUES and use MAX_TAP_QUEUES as the number
    of queues if so.

    [1]https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/drivers/net/tun.c?id=refs/tags/v2.6.32.71#n101
    [2]https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/drivers/net/tun.c?id=refs/tags/v3.18.35#n118
    [3]https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/drivers/net/tun.c?id=refs/tags/v4.1.26#n128

    Change-Id: I2aa24e3cf550ff69909a2b4bc8be90641dbe3d69
    Closes-Bug: #1570631

Changed in nova:
status: In Progress → Fix Released

This issue was fixed in the openstack/nova 14.0.0.0b2 development milestone.
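
The check described in the commit message amounts to capping the queue count at the kernel limit. A minimal sketch of that idea follows, assuming the limit has already been determined. tap_queue_count and its parameters are hypothetical names, not the ones used in the merged patch.

```python
def tap_queue_count(vcpus, max_tap_queues=None):
    """Return the number of tap queues to request for a guest.

    vcpus is the flavor's vCPU count. max_tap_queues is the kernel
    limit, or None when the limit is unknown, in which case the vCPU
    count is used unchanged.
    """
    if max_tap_queues is None:
        return vcpus
    return min(vcpus, max_tap_queues)


# A 16 vCPU flavor on a kernel limited to 8 queues gets 8 queues.
print(tap_queue_count(16, 8))
```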

Chuck Short (zulcss) on 2016-10-05
Changed in nova (Ubuntu):
status: Triaged → Fix Released
Saverio Proto (zioproto) wrote :

The patch does not solve the problem for me.

Nova has to check the qemu version in addition to the kernel version and set its limit accordingly.

In the version I am using of qemu (Ubuntu Liberty UCA) I have:

VIRTIO_PCI_QUEUE_MAX == 64

This leads to 31 max queues: (VIRTIO_PCI_QUEUE_MAX - 1) / 2

It is not just the kernel version.

Please also read bug #1644839.
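
The arithmetic in this comment can be sketched as follows, using integer division so that (64 - 1) / 2 yields 31. The function names are hypothetical, and how to read VIRTIO_PCI_QUEUE_MAX from a given qemu build is not covered here. The value is simply passed in.

```python
def max_virtio_queue_pairs(virtio_pci_queue_max):
    """Apply the formula from the comment: (VIRTIO_PCI_QUEUE_MAX - 1) / 2."""
    return (virtio_pci_queue_max - 1) // 2


def effective_queue_count(vcpus, max_tap_queues, virtio_pci_queue_max):
    """Take the smallest of the vCPU count, the kernel tap limit and the qemu limit."""
    return min(vcpus, max_tap_queues,
               max_virtio_queue_pairs(virtio_pci_queue_max))


# A 48 vCPU guest on a 4.x kernel (256 tap queues) with
# VIRTIO_PCI_QUEUE_MAX == 64 is limited by qemu, not by the kernel.
print(effective_queue_count(48, 256, 64))
```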

Changed in nova:
assignee: Kengo Sakai (kengo-sakai) → nobody