With hw:vif_multiqueue_enabled, libvirt driver fails with VM larger than 8 vCPU

Bug #1570631 reported by Kent Nickell
This bug affects 4 people
Affects: OpenStack Compute (nova) | Status: Fix Released | Importance: Low | Assigned to: Unassigned
Affects: nova (Ubuntu) | Status: Fix Released | Importance: Low | Assigned to: Unassigned

Bug Description

Nova version: 2:12.0.0-ubuntu2~cloud0
Release: Liberty
Compute node kernel: 3.19.0-47-generic
Hypervisor: Libvirt+KVM
libvirtd version: 1.2.16
Neutron network (Linuxbridge Agent)

When attempting to instantiate a VM based on an image with the metadata hw:vif_multiqueue_enabled=true, creation will fail if the flavor has more than 8 vCPUs assigned. If the flavor specifies 8 or fewer vCPUs, creation succeeds.

From /var/log/libvirt/libvirtd.log:

2016-04-14 21:19:08.161+0000: 3651: error : virNetDevTapCreate:290 : Unable to create tap device tap11db5bd0-3a: Argument list too long

This is the error thrown when attempting to create the VM.

I believe the reason is that in kernels prior to 4.0, the number of queues on a tap interface was limited to 8.

Based on http://lxr.free-electrons.com/source/drivers/net/tun.c?v=3.19#L129, MAX_TAP_QUEUES resolves to 8 prior to kernel 4.0.

In the libvirt vif driver (nova/virt/libvirt/vif.py), in __get_virtio_mq_settings, this limit is not respected: vhost_queues is simply set to flavor.vcpus. When the domain XML is written for the guest, vhost_queues is used as the 'queues' attribute of the interface driver element, and when this value is >8, creating the tap interface fails.
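The failure mode and the eventual fix can be illustrated with a minimal sketch (function names are hypothetical, not the actual nova code):

```python
MAX_TAP_QUEUES = 8  # tun driver limit on kernels before 4.0

def get_vhost_queues(flavor_vcpus):
    # Pre-fix behavior: the flavor's vCPU count is used directly as the
    # tap queue count, so a 10-vCPU flavor requests 10 queues and the
    # kernel rejects the tap creation ("Argument list too long").
    return flavor_vcpus

def get_vhost_queues_capped(flavor_vcpus):
    # Post-fix behavior: cap the queue count at the kernel's limit.
    return min(flavor_vcpus, MAX_TAP_QUEUES)
```

With the cap in place, a 10-vCPU flavor gets 8 queues instead of failing outright.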

Revision history for this message
Sylvain Bauza (sylvain-bauza) wrote :

The bug is also present in master.

tags: added: libvirt low-hanging-fruit
Changed in nova:
importance: Undecided → Low
status: New → Confirmed
Changed in nova:
assignee: nobody → Kengo Sakai (kengo-sakai)
Changed in nova:
status: Confirmed → In Progress
James Page (james-page)
Changed in nova (Ubuntu):
status: New → Triaged
importance: Undecided → Low
Revision history for this message
Kengo Sakai (kengo-sakai) wrote :

Is there anyone who knows how to retrieve MAX_TAP_QUEUES from the running system? I looked at drivers/net/tun.c but couldn't find how to do it.
MAX_TAP_QUEUES is 8 in kernel 3.x[1] and it is 256 in kernel 4.x[2]. I want to find MAX_TAP_QUEUES programmatically without hardcoding its value for each kernel version.

[1]https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/drivers/net/tun.c?id=refs/tags/v3.18.35#n118
[2]https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/drivers/net/tun.c?id=refs/tags/v4.1.26#n128
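Since MAX_TAP_QUEUES is a compile-time constant in drivers/net/tun.c and is not exported to userspace, one pragmatic option (an assumption, not an interface the tun driver provides) is to infer the limit from the running kernel version:

```python
import platform

def max_tap_queues(release=None):
    # MAX_TAP_QUEUES is not exported to userspace, so fall back to
    # mapping the kernel version to the known values: 8 on 3.x
    # kernels, 256 from 4.0 onward.
    release = release or platform.release()  # e.g. "3.19.0-47-generic"
    major = int(release.split(".", 1)[0])
    return 256 if major >= 4 else 8
```

This is fragile against distribution kernels that backport the tun changes, which is presumably why the merged fix hardcodes the conservative limit instead.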

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/332660

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/332660
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=b9303e67640ac2052c0a79189b29f60bde6b8fdc
Submitter: Jenkins
Branch: master

commit b9303e67640ac2052c0a79189b29f60bde6b8fdc
Author: Kengo Sakai <email address hidden>
Date: Wed Jun 22 16:04:06 2016 +0900

    Check if flavor.vcpus is more than MAX_TAP_QUEUES

    When attempting to instantiate an instance based on an image with
    the metadata hw:vif_multiqueue_enabled=true, the code uses
    flavor.vcpus as the number of queues on a tap interface.

    In kernels prior to 3.0, multiple queues on a tap interface
    is not supported[1]. In kernels 3.x, the number of queues
    on a tap interface is limited to 8 as MAX_TAP_QUEUES in tun
    driver[2]. From 4.0, the number is 256[3]. If flavor.vcpus
    is more than MAX_TAP_QUEUES, creating the tap interface
    fails.

    This commit adds logic to check if flavor.vcpus is more
    than MAX_TAP_QUEUES and use MAX_TAP_QUEUES as the number
    of queues if so.

    [1]https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/drivers/net/tun.c?id=refs/tags/v2.6.32.71#n101
    [2]https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/drivers/net/tun.c?id=refs/tags/v3.18.35#n118
    [3]https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/drivers/net/tun.c?id=refs/tags/v4.1.26#n128

    Change-Id: I2aa24e3cf550ff69909a2b4bc8be90641dbe3d69
    Closes-Bug: #1570631

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/nova 14.0.0.0b2

This issue was fixed in the openstack/nova 14.0.0.0b2 development milestone.

Chuck Short (zulcss)
Changed in nova (Ubuntu):
status: Triaged → Fix Released
Revision history for this message
Saverio Proto (zioproto) wrote :

The patch does not solve the problem for me.

Nova has to check the QEMU version in addition to the kernel version and set its limit accordingly.

In the version I am using of qemu (Ubuntu Liberty UCA) I have:

VIRTIO_PCI_QUEUE_MAX == 64

This leads to 31 max queues: (VIRTIO_PCI_QUEUE_MAX - 1) / 2

It is not just the kernel version.

Please read also bug #1644839
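The effective cap would be the minimum of the kernel and QEMU limits; a sketch of the arithmetic, using the constants quoted in the comments above (the virtqueue accounting is my reading of the (VIRTIO_PCI_QUEUE_MAX - 1) / 2 formula, not confirmed by the source):

```python
VIRTIO_PCI_QUEUE_MAX = 64    # QEMU limit cited for Ubuntu Liberty UCA
KERNEL_MAX_TAP_QUEUES = 256  # tun driver limit on 4.x kernels

# Each network queue consumes a tx/rx virtqueue pair, with one
# virtqueue slot reserved, hence (max - 1) / 2 = 31 usable queues.
qemu_max_queues = (VIRTIO_PCI_QUEUE_MAX - 1) // 2

def effective_queues(flavor_vcpus):
    # The usable queue count is the smallest of the flavor's vCPUs,
    # the kernel tap limit, and the QEMU virtio limit.
    return min(flavor_vcpus, KERNEL_MAX_TAP_QUEUES, qemu_max_queues)
```

Under these constants a 64-vCPU flavor would be capped at 31 queues, which the merged kernel-only check does not account for.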

Changed in nova:
assignee: Kengo Sakai (kengo-sakai) → nobody
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/700894

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by "sean mooney <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/nova/+/700894
Reason: Realistically, since I have not worked on this in 2-3 years at this point, I'm not going to get back to it anytime soon, so I'll abandon this for now.
