InfiniBand (Mellanox) SR-IOV and libvirt + libnl problems

Bug #1496942 reported by Rafael David Tinoco on 2015-09-17
This bug affects 2 people
Affects: linux (Ubuntu)
Importance: Critical
Assigned to: Rafael David Tinoco

Bug Description

When trying to start an IB SR-IOV guest with the following interface XML:

    <interface type='hostdev' managed='yes'>
      <mac address='52:54:00:70:ba:16'/>
      <source>
        <address type='pci' domain='0x0000' bus='0x08' slot='0x00' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
    </interface>

Following the Mellanox SR-IOV guide, we are able to start guests with kernel 3.16 (Utopic).

We are NOT able to start guests with kernel 3.13 or 3.19. The following errors occur:

2015-09-17 02:25:07.208+0000: 52157: info : libvirt version: 1.2.12, package: 1.2.12-0ubuntu14.1~cloud0
2015-09-17 02:25:07.208+0000: 52157: error : virSecurityDriverLookup:80 : unsupported configuration: Security driver apparmor not enabled
2015-09-17 02:25:42.308+0000: 52281: info : libvirt version: 1.2.12, package: 1.2.12-0ubuntu14.1~cloud0
2015-09-17 02:25:42.308+0000: 52281: error : virSecurityDriverLookup:80 : unsupported configuration: Security driver apparmor not enabled
2015-09-17 02:25:48.996+0000: 52274: error : virNetDevParseVfConfig:1905 : internal error: missing IFLA_VF_INFO in netlink response
2015-09-17 02:25:49.006+0000: 52274: error : virFileReadAll:1347 : Failed to open file '/var/run/libvirt/hostdevmgr/ib0_vf0': No such file or directory
2015-09-17 02:25:49.006+0000: 52274: error : virFileReadAll:1347 : Failed to open file '/var/run/libvirt/qemu/ib0_vf0': No such file or directory

So there is probably a regression between 3.16 and 3.19 in the netlink IFLA_VF_INFO support, and the working support would also have to be backported to kernel 3.13 for IB SR-IOV to work on Trusty.
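The error message suggests libvirt found no IFLA_VFINFO_LIST / IFLA_VF_INFO data in the kernel's RTM_GETLINK reply for the physical function. One way to compare kernels directly is to send a similar request yourself: RTM_GETLINK for one interface with IFLA_EXT_MASK set to RTEXT_FILTER_VF, then check whether IFLA_VFINFO_LIST comes back. This is a minimal Python sketch, assuming Linux and the interface name `ib0` from the log above; the helper names (`parse_attrs`, `vf_info_summary`, `query_link`, `report`) are hypothetical, and the numeric constants are copied from the kernel UAPI headers `<linux/netlink.h>`, `<linux/rtnetlink.h>` and `<linux/if_link.h>`.

```python
import socket
import struct

# Constants from the kernel UAPI headers
NLMSG_ERROR = 2
NLM_F_REQUEST = 0x1
RTM_NEWLINK = 16
RTM_GETLINK = 18
IFLA_NUM_VF = 21
IFLA_VFINFO_LIST = 22
IFLA_EXT_MASK = 29
IFLA_VF_INFO = 1          # nested inside IFLA_VFINFO_LIST
RTEXT_FILTER_VF = 1
NLA_TYPE_MASK = 0x3FFF    # strips NLA_F_NESTED / NLA_F_NET_BYTEORDER bits

NLMSGHDR = struct.Struct("=IHHII")    # len, type, flags, seq, pid
IFINFOMSG = struct.Struct("=BxHiII")  # family, pad, type, index, flags, change
RTATTR = struct.Struct("=HH")         # len, type


def align4(n):
    return (n + 3) & ~3


def parse_attrs(buf):
    """Split a buffer of netlink attributes into (type, payload) pairs."""
    attrs = []
    off = 0
    while off + RTATTR.size <= len(buf):
        length, atype = RTATTR.unpack_from(buf, off)
        if length < RTATTR.size or off + length > len(buf):
            break  # malformed or truncated attribute
        attrs.append((atype & NLA_TYPE_MASK, buf[off + RTATTR.size:off + length]))
        off += align4(length)
    return attrs


def vf_info_summary(link_attrs):
    """Return (num_vf, vf_info_count); vf_info_count is None if IFLA_VFINFO_LIST is absent."""
    num_vf = None
    vf_info_count = None
    for atype, payload in parse_attrs(link_attrs):
        if atype == IFLA_NUM_VF and len(payload) >= 4:
            num_vf = struct.unpack_from("=I", payload)[0]
        elif atype == IFLA_VFINFO_LIST:
            vf_info_count = sum(1 for t, _ in parse_attrs(payload) if t == IFLA_VF_INFO)
    return num_vf, vf_info_count


def query_link(ifname):
    """Send RTM_GETLINK with IFLA_EXT_MASK=RTEXT_FILTER_VF; return the reply's attribute bytes."""
    index = socket.if_nametoindex(ifname)
    ext_mask = RTATTR.pack(RTATTR.size + 4, IFLA_EXT_MASK) + struct.pack("=I", RTEXT_FILTER_VF)
    body = IFINFOMSG.pack(socket.AF_UNSPEC, 0, index, 0, 0) + ext_mask
    msg = NLMSGHDR.pack(NLMSGHDR.size + len(body), RTM_GETLINK, NLM_F_REQUEST, 1, 0) + body
    with socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, socket.NETLINK_ROUTE) as sock:
        sock.sendto(msg, (0, 0))
        reply = sock.recv(65536)
    length, mtype, _, _, _ = NLMSGHDR.unpack_from(reply)
    if mtype == NLMSG_ERROR:
        err = struct.unpack_from("=i", reply, NLMSGHDR.size)[0]
        raise OSError(-err, "RTM_GETLINK failed for " + ifname)
    if mtype != RTM_NEWLINK:
        raise RuntimeError("unexpected netlink message type %d" % mtype)
    return reply[NLMSGHDR.size + IFINFOMSG.size:length]


def report(ifname="ib0"):
    """Print whether the kernel returned per-VF info for ifname."""
    num_vf, vf_infos = vf_info_summary(query_link(ifname))
    if vf_infos is None:
        print("%s: IFLA_VFINFO_LIST missing (IFLA_NUM_VF=%s)" % (ifname, num_vf))
    else:
        print("%s: IFLA_VFINFO_LIST with %d IFLA_VF_INFO entries" % (ifname, vf_infos))
```

Calling `report("ib0")` under 3.13, 3.16 and 3.19 would show whether the difference between kernels is in what netlink returns, independently of libvirt.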

Changed in linux (Ubuntu):
status: New → In Progress
assignee: nobody → Rafael David Tinoco (inaddy)
Changed in linux (Ubuntu):
status: In Progress → Confirmed
importance: Undecided → Critical
Changed in linux (Ubuntu):
assignee: Rafael David Tinoco (inaddy) → nobody
assignee: nobody → Rafael David Tinoco (inaddy)
Changed in linux (Ubuntu):
status: Confirmed → Triaged
Rafael David Tinoco (inaddy) wrote:

Installing Mellanox OFED on Xenial with kernel 4.4:

# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.1 LTS
Release: 16.04
Codename: xenial

# uname -a
Linux heatmor 4.4.0-36-generic #55-Ubuntu SMP Thu Aug 11 18:01:55 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Then, after configuring SR-IOV for the ConnectX-4 (previously set up with mlxconfig):

# lspci | grep -i mellanox
08:00.0 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]
08:00.1 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]
08:00.2 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
08:00.3 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
08:00.4 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
08:00.5 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
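The VF addresses that the unbind/bind commands use (`0000:08:00.2` to `0000:08:00.5`) can be read off this listing. A small Python sketch of that step, with a hypothetical helper name: it keeps only lines tagged `Virtual Function` and prepends the PCI domain, which `lspci` suppresses by default on machines with only domain 0 (run `lspci -D` to get it printed; the default `0000` is an assumption that matches this host).

```python
import re

# Matches "08:00.2 Infiniband controller: ..." and "0000:08:00.2 Infiniband controller: ..."
LSPCI_LINE = re.compile(
    r"^(?:(?P<domain>[0-9a-f]{4}):)?(?P<bdf>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s+(?P<desc>.*)$",
    re.IGNORECASE,
)


def virtual_functions(lspci_output, default_domain="0000"):
    """Return full PCI addresses (domain:bus:slot.func) of lines describing a Virtual Function."""
    vfs = []
    for line in lspci_output.splitlines():
        m = LSPCI_LINE.match(line.strip())
        if m and "Virtual Function" in m.group("desc"):
            domain = m.group("domain") or default_domain
            vfs.append(domain + ":" + m.group("bdf"))
    return vfs
```

Fed the six lines above, it returns the four VF addresses and skips the two physical functions (08:00.0 and 08:00.1).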

echo 4 > /sys/class/infiniband/mlx5_0/device/sriov_numvfs

echo Follow > /sys/class/infiniband/mlx5_0/device/sriov/0/policy

echo e4:1d:2d:03:00:af:4f:06 > /sys/class/infiniband/mlx5_0/device/sriov/0/node
echo e4:1d:2d:03:00:af:5f:06 > /sys/class/infiniband/mlx5_0/device/sriov/0/port

echo e4:1d:2d:03:00:af:4f:07 > /sys/class/infiniband/mlx5_0/device/sriov/1/node
echo e4:1d:2d:03:00:af:5f:07 > /sys/class/infiniband/mlx5_0/device/sriov/1/port

echo e4:1d:2d:03:00:af:4f:08 > /sys/class/infiniband/mlx5_0/device/sriov/2/node
echo e4:1d:2d:03:00:af:5f:08 > /sys/class/infiniband/mlx5_0/device/sriov/2/port

echo e4:1d:2d:03:00:af:4f:09 > /sys/class/infiniband/mlx5_0/device/sriov/3/node
echo e4:1d:2d:03:00:af:5f:09 > /sys/class/infiniband/mlx5_0/device/sriov/3/port

echo 0000:08:00.2 > /sys/bus/pci/drivers/mlx5_core/unbind
echo 0000:08:00.2 > /sys/bus/pci/drivers/mlx5_core/bind

echo 0000:08:00.3 > /sys/bus/pci/drivers/mlx5_core/unbind
echo 0000:08:00.3 > /sys/bus/pci/drivers/mlx5_core/bind

echo 0000:08:00.4 > /sys/bus/pci/drivers/mlx5_core/unbind
echo 0000:08:00.4 > /sys/bus/pci/drivers/mlx5_core/bind

echo 0000:08:00.5 > /sys/bus/pci/drivers/mlx5_core/unbind
echo 0000:08:00.5 > /sys/bus/pci/drivers/mlx5_core/bind
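The per-VF writes above follow a regular pattern, so they can be generated rather than typed. A hypothetical Python sketch, assuming the sysfs layout used in this report (`/sys/class/infiniband/mlx5_0/device/sriov/<vf>/{policy,node,port}`) and GUIDs obtained by adding the VF index to the base GUIDs `e4:1d:2d:03:00:af:4f:06` and `e4:1d:2d:03:00:af:5f:06`, as the commands do. Unlike the commands, which set `Follow` only on VF 0, it applies the policy to every VF; pass `policy=None` to skip that write.

```python
import os

IB_DEV = "/sys/class/infiniband/mlx5_0/device"   # assumption: the PF is mlx5_0
PCI_DRIVER = "/sys/bus/pci/drivers/mlx5_core"


def offset_guid(base, delta):
    """Add delta to a colon-separated 64-bit GUID (e.g. ...:4f:06 + 1 -> ...:4f:07)."""
    raw = "%016x" % (int(base.replace(":", ""), 16) + delta)
    return ":".join(raw[i:i + 2] for i in range(0, 16, 2))


def sriov_plan(num_vfs, node_base, port_base, vf_addrs, policy="Follow"):
    """Return the ordered (sysfs path, value) writes; nothing is written here."""
    writes = [(os.path.join(IB_DEV, "sriov_numvfs"), str(num_vfs))]
    for vf in range(num_vfs):
        vf_dir = os.path.join(IB_DEV, "sriov", str(vf))
        if policy:
            writes.append((os.path.join(vf_dir, "policy"), policy))
        writes.append((os.path.join(vf_dir, "node"), offset_guid(node_base, vf)))
        writes.append((os.path.join(vf_dir, "port"), offset_guid(port_base, vf)))
    # Unbind and rebind each VF, as the commands above do
    for addr in vf_addrs:
        writes.append((os.path.join(PCI_DRIVER, "unbind"), addr))
        writes.append((os.path.join(PCI_DRIVER, "bind"), addr))
    return writes


def apply(writes):
    """Perform the writes in order (needs root); echo appends a newline, so we do too."""
    for path, value in writes:
        with open(path, "w") as f:
            f.write(value + "\n")


if __name__ == "__main__":
    plan = sriov_plan(4, "e4:1d:2d:03:00:af:4f:06", "e4:1d:2d:03:00:af:5f:06",
                      ["0000:08:00.%d" % fn for fn in range(2, 6)])
    for path, value in plan:  # dry run: print the equivalent shell commands
        print("echo %s > %s" % (value, path))
```

Printing the plan first and only then calling `apply(plan)` keeps the dry run and the real writes identical.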

And after attaching this XML to a guest:

    <interface type='hostdev' managed='yes'>
      <source>
        <address type='pci' domain='0' bus='8' slot='0' function='2'/>
      </source>
    </interface>

root@heatmor:~# virsh attach-device ibdhcprelay ./new-device.xml --config
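With several VFs, the `<interface type='hostdev'>` element can be generated from each PCI address instead of being written by hand. A minimal sketch using Python's standard `xml.etree.ElementTree`; the function name is hypothetical. It writes the address in the hex form of the first XML in this report (`bus='0x08'`); libvirt parses these attributes with base auto-detection, so the decimal `bus='8'` form above is equivalent. The `<mac>` element is emitted only when a MAC is given.

```python
import xml.etree.ElementTree as ET


def hostdev_interface_xml(pci_addr, mac=None, managed=True):
    """Build a libvirt <interface type='hostdev'> element for a VF at 'dddd:bb:ss.f'."""
    domain, bus, slot_func = pci_addr.split(":")
    slot, function = slot_func.split(".")
    iface = ET.Element("interface", type="hostdev")
    if managed:
        iface.set("managed", "yes")
    if mac:
        ET.SubElement(iface, "mac", address=mac)
    source = ET.SubElement(iface, "source")
    ET.SubElement(source, "address", type="pci",
                  domain="0x" + domain, bus="0x" + bus,
                  slot="0x" + slot, function="0x" + function)
    return ET.tostring(iface, encoding="unicode")


if __name__ == "__main__":
    # Save each snippet to a file, then: virsh attach-device <guest> <file> --config
    for addr in ["0000:08:00.2", "0000:08:00.3"]:
        print(hostdev_interface_xml(addr))
```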

I can't start the guest in question:

# virsh start ibdhcprelay
error: Failed to start domain ibdhcprelay
error: internal error: missing IFLA_VF_INFO in netlink response

It looks like there is an incompatibility between the Mellanox OFED DKMS packages and the Ubuntu kernel, specifically in netlink support.
