libnl should be updated to support up to 63 VFs per single PF

Bug #1567578 reported by Lorenzo Cavassa
This bug affects 3 people

Affects          Status        Importance  Assigned to
libnl3 (Ubuntu)  Fix Released  Medium      Unassigned
  Trusty         Fix Released  High        Jorge Niedbalski

Bug Description

[Impact]

libnl can only enable up to 30 VFs even if the PF supports up to 63 VFs in an OpenStack SR-IOV configuration.

As already documented in https://bugs.launchpad.net/mos/+bug/1501738 there is a bug in the default libnl library release installed on Ubuntu 14.04.4.

When trying to enable a guest with more than 30 VFs attached, the following error is returned:

error: Failed to start domain guest1
error: internal error: missing IFLA_VF_INFO in netlink response

[Test Case]

 1) Edit /etc/default/grub.

GRUB_CMDLINE_LINUX="intel_iommu=on ixgbe.max_vfs=63"

 2) Update grub and reboot the machine.

$ sudo update-grub

 3) Check that the virtual functions are available.

$ sudo lspci|grep -i eth | grep -i virtual | wc -l
126

 4) Create a KVM guest.

$ sudo uvt-kvm create guest1 release=trusty

 5) List the VF devices.

$ sudo lspci|grep -i eth | grep -i virtual | awk '{print $1}' | sed 's/\:/\_/g' | sed 's/\./\_/g' > devices.txt

 6) Get the libvirt node device.

$ for device in $(cat ./devices.txt); do sudo virsh nodedev-list | grep "$device"; done > pci_devices.txt

 7) Generate the XML config for each device.

$ mkdir devices && for d in $(cat pci_devices.txt); do sudo virsh nodedev-dumpxml "$d" > "devices/$d.xml"; done

 8) Save and run the following script. (http://pastebin.ubuntu.com/23374186/)

$ sudo python generate-interfaces.py |grep address | wc -l

 9) Finally attach the devices to the guest.

$ for i in $(seq 0 63); do sudo virsh attach-device guest1 ./interfaces/$i.xml --config; done
Device attached successfully
[...]

Device attached successfully
Device attached successfully

 10) Then destroy and start the guest again. At this point the error is reproduced.

$ sudo virsh destroy guest1
Domain guest1 destroyed

$ sudo virsh start guest1

error: Failed to start domain guest1
error: internal error: missing IFLA_VF_INFO in netlink response

[Regression Potential]

 * None identified.

[Other Info]

 * Red Hat bug: https://bugzilla.redhat.com/show_bug.cgi?id=1040626

 * A workaround is to install a newer library release.

$ wget https://launchpad.net/ubuntu/+archive/primary/+files/libnl-3-200_3.2.24-2_amd64.deb
$ wget https://launchpad.net/ubuntu/+archive/primary/+files/libnl-genl-3-200_3.2.24-2_amd64.deb
$ wget https://launchpad.net/ubuntu/+archive/primary/+files/libnl-route-3-200_3.2.24-2_amd64.deb
$ sudo dpkg -i libnl-3-200_3.2.24-2_amd64.deb
$ sudo dpkg -i libnl-genl-3-200_3.2.24-2_amd64.deb
$ sudo dpkg -i libnl-route-3-200_3.2.24-2_amd64.deb

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in libnl3 (Ubuntu):
status: New → Confirmed
Billy Olsen (billy-olsen) wrote :

I was not able to reproduce the problem using trusty + 3.19 kernel and libvirt alone. I was able to scale appropriately to 63 domains using VFs.

I guess the next step is to test with the OpenStack instructions identified, but in that case the problem may lie elsewhere.

Mathew Hodson (mhodson)
tags: added: trusty
removed: 14.04.4 libnl ubuntu vf
Changed in libnl3 (Ubuntu):
importance: Undecided → Low
Bjoern (bjoern-t) wrote :

The biggest issue the libnl upgrade will fix is the following libvirt error with the 3.13.0 LTS kernel:

internal error: missing IFLA_VF_INFO in netlink response

Hence the package should be updated. I had the same issue with the xenial LTS kernel too, until libnl was updated.

Changed in libnl3 (Ubuntu):
assignee: nobody → Jorge Niedbalski (niedbalski)
importance: Low → Medium
Changed in libnl3 (Ubuntu):
status: Confirmed → Fix Released
assignee: Jorge Niedbalski (niedbalski) → nobody
Changed in libnl3 (Ubuntu Trusty):
assignee: nobody → Jorge Niedbalski (niedbalski)
Changed in libnl3 (Ubuntu Precise):
assignee: nobody → Jorge Niedbalski (niedbalski)
Changed in libnl3 (Ubuntu Trusty):
status: New → In Progress
importance: Undecided → High
Jorge Niedbalski (niedbalski) wrote :

Hello,

I am able to reproduce this bug consistently on a Trusty (3.19.0-73-generic) machine
equipped with 2 ixgbe 10GbE cards:

04:00.0 Ethernet controller: Intel Corporation Ethernet Controller 10-Gigabit X540-AT2 (rev 01)
04:00.1 Ethernet controller: Intel Corporation Ethernet Controller 10-Gigabit X540-AT2 (rev 01)

The sequence to reproduce this bug is:

1) Edit /etc/default/grub

GRUB_CMDLINE_LINUX="intel_iommu=on ixgbe.max_vfs=63"

2) $ sudo update-grub

### Reboot the machine.

3) Check that the virtual functions are available:

$ sudo lspci|grep -i eth | grep -i virtual | wc -l
126

4) Create a KVM guest

$ sudo uvt-kvm create guest1 release=trusty

5) List the VF devices :

$ sudo lspci|grep -i eth | grep -i virtual | awk '{print $1}' | sed 's/\:/\_/g' | sed 's/\./\_/g' > devices.txt

6) Get the libvirt node device:

$ for device in $(cat ./devices.txt); do sudo virsh nodedev-list | grep "$device"; done > pci_devices.txt

7) Generate the XML config for each device:

$ mkdir devices && for d in $(cat pci_devices.txt); do sudo virsh nodedev-dumpxml "$d" > "devices/$d.xml"; done

8) Save and run the following script (http://pastebin.ubuntu.com/23374186/)

$ sudo python generate-interfaces.py |grep address | wc -l

9) Finally attach the devices to the guest.

$ for i in $(seq 0 63); do sudo virsh attach-device guest1 ./interfaces/$i.xml --config; done
Device attached successfully
[...]

Device attached successfully
Device attached successfully

10) Then destroy and start the guest again. At this point the error is reproduced.

$ sudo virsh destroy guest1
Domain guest1 destroyed

$ sudo virsh start guest1

error: Failed to start domain guest1
error: internal error: missing IFLA_VF_INFO in netlink response

Please note that this error is not reproducible using libnl3_2_22. I am currently
working to bisect and identify the offending commit and propose a backport to Ubuntu Trusty.
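
Steps 5 and 6 of the sequence above rewrite each lspci address with sed and then grep for it in the virsh nodedev-list output. The same mapping can be sketched in Python as below, building the full libvirt node device name directly so that no substring matching is needed. The function name and the default domain of 0000 are assumptions for illustration.

```python
import re


def pci_to_nodedev(address, domain="0000"):
    """Convert an lspci-style PCI address to a libvirt node device name."""
    # Plain lspci prints bus:slot.function; `lspci -D` adds the domain.
    if address.count(":") == 1:
        address = f"{domain}:{address}"
    # libvirt names PCI devices pci_<domain>_<bus>_<slot>_<function>.
    return "pci_" + re.sub(r"[:.]", "_", address)


print(pci_to_nodedev("04:10.0"))       # pci_0000_04_10_0
print(pci_to_nodedev("0000:04:10.1"))  # pci_0000_04_10_1
```

The resulting names can be passed straight to virsh nodedev-dumpxml, as in step 7.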

description: updated
tags: added: sts
Jorge Niedbalski (niedbalski) wrote :

Hello,

After running a bisect between tags libnl3_2_21 and libnl3_2_22, I identified
the fixing commit as 807fddc ("nl: Increase receive buffer size to 4 pages"):

commit 807fddc4cd9ecb12ba64e1b7fa26d86b6c2f19b0
Author: Thomas Graf <email address hidden>
Date: Wed May 8 13:52:27 2013 +0200

    nl: Increase receive buffer size to 4 pages

    Assuming that the kernel does not send more than a page is no longer valid,
    and enabling MSG_PEEK'ing by default to figure out the exact message buffer
    requirements can have a negative influence on the performance of existing
    applications. Bumping the default receive buffer space to 4 pages seems
    a sane default.

    Signed-off-by: Thomas Graf <email address hidden>

---

After applying this patch on top of the current trusty-updates, this problem
is no longer exhibited and I can attach the full 128 VFs to the guest.

I am proposing this patch for SRU, and I already updated the description
with the reproduction steps.
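
To see why a larger default receive buffer raises the VF limit, a back-of-the-envelope calculation helps. The Python sketch below counts how many per-VF blocks fit into a buffer of a given number of pages. The 4096-byte page size, the 100 bytes per VF and the 1000 bytes of base link data are illustrative assumptions picked to land near the roughly 30 VF limit reported in this bug; they were not measured from a real netlink response and will differ between kernel versions.

```python
PAGE_SIZE = 4096  # assumed page size in bytes (typical on x86-64)


def max_vfs_in_buffer(pages, per_vf_bytes, base_bytes, page_size=PAGE_SIZE):
    """Return how many per-VF blocks fit in a buffer of `pages` pages."""
    available = pages * page_size - base_bytes
    return max(available // per_vf_bytes, 0)


# Illustrative sizes only, not measured values.
PER_VF_BYTES = 100   # assumed size of one VF's attributes
BASE_BYTES = 1000    # assumed size of the rest of the link message

for pages in (1, 4):
    limit = max_vfs_in_buffer(pages, PER_VF_BYTES, BASE_BYTES)
    print(f"{pages} page(s): up to {limit} VFs")
# 1 page(s): up to 30 VFs
# 4 page(s): up to 153 VFs
```

Under these assumptions a one-page buffer cannot hold the data for 63 VFs on a single PF, while a four-page buffer has ample room.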

Jorge Niedbalski (niedbalski) wrote :
Louis Bouchard (louis)
tags: added: sts-sru
Brian Murray (brian-murray) wrote : Please test proposed package

Hello Lorenzo, or anyone else affected,

Accepted libnl3 into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/libnl3/3.2.21-1ubuntu4 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in libnl3 (Ubuntu Trusty):
status: In Progress → Fix Committed
tags: added: verification-needed
Mathew Hodson (mhodson)
description: updated
Changed in libnl3 (Ubuntu Precise):
importance: Undecided → Medium
Jorge Niedbalski (niedbalski) wrote :

Hello Brian,

After installing the -proposed version, the error described in this bug is no longer reproducible and the guest starts correctly.

You can follow the reproduction steps that I added to the bug's description.

Without -proposed:

$ sudo virsh destroy guest1
Domain guest1 destroyed

$ sudo virsh start guest1

error: Failed to start domain guest1
error: internal error: missing IFLA_VF_INFO in netlink response

With -proposed:

root@bronzor:/home/ubuntu# start libvirt-bin
libvirt-bin start/running, process 31492

root@bronzor:/home/ubuntu# virsh start guest1
Domain guest1 started

tags: added: verification-done
removed: verification-needed
Martin Pitt (pitti) wrote : Update Released

The verification of the Stable Release Update for libnl3 has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package libnl3 - 3.2.21-1ubuntu4

---------------
libnl3 (3.2.21-1ubuntu4) trusty; urgency=high

  [ Jorge Niedbalski ]
  * d/p/lib-nl-Increase-receive-buffer-size-to-4-pages.patch: Increase
  receive buffer size to 4 pages by default. (LP: #1567578).

 -- Louis Bouchard <email address hidden> Mon, 24 Oct 2016 16:26:30 +0200

Changed in libnl3 (Ubuntu Trusty):
status: Fix Committed → Fix Released
no longer affects: libnl3 (Ubuntu Precise)
Louis Bouchard (louis)
tags: removed: sts-sru