InfiniBand (Mellanox) SR-IOV and libvirt + libnl problems

Bug #1496942 reported by Rafael David Tinoco
Affects          Status    Importance  Assigned to  Milestone
linux (Debian)   New       Undecided   Unassigned
linux (Ubuntu)   Invalid   Critical    Unassigned

Bug Description

We are trying to start an IB SR-IOV guest using the following XML:

    <interface type='hostdev' managed='yes'>
      <mac address='52:54:00:70:ba:16'/>
      <source>
        <address type='pci' domain='0x0000' bus='0x08' slot='0x00' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
    </interface>

Following the Mellanox SR-IOV guide, we are able to start guests with this XML using kernel 3.16 (Utopic).

We are NOT able to start guests using 3.13 OR 3.19. The following error occurs:

2015-09-17 02:25:07.208+0000: 52157: info : libvirt version: 1.2.12, package: 1.2.12-0ubuntu14.1~cloud0
2015-09-17 02:25:07.208+0000: 52157: error : virSecurityDriverLookup:80 : unsupported configuration: Security driver apparmor not enabled
2015-09-17 02:25:42.308+0000: 52281: info : libvirt version: 1.2.12, package: 1.2.12-0ubuntu14.1~cloud0
2015-09-17 02:25:42.308+0000: 52281: error : virSecurityDriverLookup:80 : unsupported configuration: Security driver apparmor not enabled
2015-09-17 02:25:48.996+0000: 52274: error : virNetDevParseVfConfig:1905 : internal error: missing IFLA_VF_INFO in netlink response
2015-09-17 02:25:49.006+0000: 52274: error : virFileReadAll:1347 : Failed to open file '/var/run/libvirt/hostdevmgr/ib0_vf0': No such file or directory
2015-09-17 02:25:49.006+0000: 52274: error : virFileReadAll:1347 : Failed to open file '/var/run/libvirt/qemu/ib0_vf0': No such file or directory

So there is probably a regression between 3.16 and 3.19 in the netlink IFLA_VF_INFO support, and the working behaviour also has to be backported to kernel 3.13 for Trusty to have IB SR-IOV working.
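
To pull only the error entries out of a libvirtd log like the one above, something along these lines may help. This is a minimal sketch: the function name `libvirt_errors` is made up for illustration, and it assumes the log keeps the `timestamp: pid: level : function:line : message` layout shown in this report.

```shell
#!/bin/sh
# Sketch only: print "function:line<TAB>message" for every error-level line
# of a libvirtd log read from stdin. Assumes fields are separated by " : "
# as in the log excerpt in this report.
libvirt_errors() {
    awk -F ' : ' '
        $1 ~ /: error$/ {
            # Rejoin the message in case it contains " : " itself.
            msg = $3
            for (i = 4; i <= NF; i++) msg = msg " : " $i
            print $2 "\t" msg
        }'
}

# Example (log path is an assumption, adjust to your setup):
#   libvirt_errors < /var/log/libvirt/libvirtd.log | sort | uniq -c
```

Piping the result through `sort | uniq -c` groups repeated errors, which makes the `virNetDevParseVfConfig` entry easier to spot among the others.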

Tags: cscc
Changed in linux (Ubuntu):
status: New → In Progress
assignee: nobody → Rafael David Tinoco (inaddy)
Changed in linux (Ubuntu):
status: In Progress → Confirmed
importance: Undecided → Critical
Changed in linux (Ubuntu):
assignee: Rafael David Tinoco (inaddy) → nobody
assignee: nobody → Rafael David Tinoco (inaddy)
penalvch (penalvch)
Changed in linux (Ubuntu):
status: Confirmed → Triaged
Rafael David Tinoco (rafaeldtinoco) wrote :

Installing Mellanox OFED in Xenial with Kernel 4.4:

# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.1 LTS
Release: 16.04
Codename: xenial

# uname -a
Linux heatmor 4.4.0-36-generic #55-Ubuntu SMP Thu Aug 11 18:01:55 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

And, after configuring SR-IOV for the ConnectX-4 (firmware already configured with mlxconfig):

# lspci | grep -i mellanox
08:00.0 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]
08:00.1 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]
08:00.2 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
08:00.3 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
08:00.4 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
08:00.5 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]

echo 4 > /sys/class/infiniband/mlx5_0/device/sriov_numvfs

echo Follow > /sys/class/infiniband/mlx5_0/device/sriov/0/policy

echo e4:1d:2d:03:00:af:4f:06 > /sys/class/infiniband/mlx5_0/device/sriov/0/node
echo e4:1d:2d:03:00:af:5f:06 > /sys/class/infiniband/mlx5_0/device/sriov/0/port

echo e4:1d:2d:03:00:af:4f:07 > /sys/class/infiniband/mlx5_0/device/sriov/1/node
echo e4:1d:2d:03:00:af:5f:07 > /sys/class/infiniband/mlx5_0/device/sriov/1/port

echo e4:1d:2d:03:00:af:4f:08 > /sys/class/infiniband/mlx5_0/device/sriov/2/node
echo e4:1d:2d:03:00:af:5f:08 > /sys/class/infiniband/mlx5_0/device/sriov/2/port

echo e4:1d:2d:03:00:af:4f:09 > /sys/class/infiniband/mlx5_0/device/sriov/3/node
echo e4:1d:2d:03:00:af:5f:09 > /sys/class/infiniband/mlx5_0/device/sriov/3/port

echo 0000:08:00.2 > /sys/bus/pci/drivers/mlx5_core/unbind
echo 0000:08:00.2 > /sys/bus/pci/drivers/mlx5_core/bind

echo 0000:08:00.3 > /sys/bus/pci/drivers/mlx5_core/unbind
echo 0000:08:00.3 > /sys/bus/pci/drivers/mlx5_core/bind

echo 0000:08:00.4 > /sys/bus/pci/drivers/mlx5_core/unbind
echo 0000:08:00.4 > /sys/bus/pci/drivers/mlx5_core/bind

echo 0000:08:00.5 > /sys/bus/pci/drivers/mlx5_core/unbind
echo 0000:08:00.5 > /sys/bus/pci/drivers/mlx5_core/bind
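
The unbind/bind pairs above can also be written as a loop over the VF addresses. A minimal sketch, assuming the same driver directory as above; `rebind_vfs` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: unbind and rebind each PCI address on a driver so that the
# VF picks up its newly assigned GUIDs.
# rebind_vfs DRIVER_DIR ADDR [ADDR...]
#   DRIVER_DIR  e.g. /sys/bus/pci/drivers/mlx5_core
rebind_vfs() {
    driver_dir=$1
    shift
    for addr in "$@"; do
        echo "$addr" > "$driver_dir/unbind" || return 1
        echo "$addr" > "$driver_dir/bind" || return 1
    done
}

# Example, equivalent to the commands above:
#   rebind_vfs /sys/bus/pci/drivers/mlx5_core \
#       0000:08:00.2 0000:08:00.3 0000:08:00.4 0000:08:00.5
```

The function stops at the first failed write, so a VF that cannot be unbound is not silently skipped.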

And attaching this XML to a guest:

<interface type='hostdev' managed='yes'>
  <source>
    <address type='pci' domain='0' bus='8' slot='0' function='2'/>
  </source>
</interface>

root@heatmor:~# virsh attach-device ibdhcprelay ./new-device.xml --config

I can't start the guest in question:

# virsh start ibdhcprelay
error: Failed to start domain ibdhcprelay
error: internal error: missing IFLA_VF_INFO in netlink response

Looks like there is an incompatibility between the Mellanox OFED DKMS packages and the Ubuntu kernel (specifically in netlink support).
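
One way to check from the host whether the kernel returns per-VF information over netlink at all is to look at the `vf N` lines that `ip link show` prints for the physical function. As far as I understand, iproute2 builds those lines from the same kind of VF info attributes that libvirt complains about, but treat that as an assumption. A minimal sketch; `count_vf_entries` is a hypothetical helper and `ib0` is the device name taken from the log in the bug description.

```shell
#!/bin/sh
# Sketch only: count the "vf N ..." lines in `ip link show` output read from
# stdin. Zero entries on a PF with VFs enabled suggests the per-VF info is
# missing from the netlink response.
count_vf_entries() {
    # grep -c prints 0 but exits non-zero when nothing matches.
    grep -cE '^[[:space:]]*vf [0-9]+' || true
}

# Example (device name is an assumption):
#   ip link show dev ib0 | count_vf_entries
```

If this prints 0 while `sriov_numvfs` is 4, the problem is likely below libvirt, in the driver or kernel netlink path.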

Changed in linux (Ubuntu):
assignee: Rafael David Tinoco (rafaeldtinoco) → nobody
Brad Figg (brad-figg)
tags: added: cscc
Rafael David Tinoco (rafaeldtinoco) wrote :

This bug was caused by incompatibilities between the DKMS modules provided by Mellanox OFED, which implement some new netlink protocol add-ons, and the Ubuntu kernel. Issues of this type have already been addressed between Canonical and Mellanox (but might occur again while Mellanox OFED is used for RDMA and IB related setups).

Changed in linux (Ubuntu):
status: Triaged → Incomplete
maziyar (maziyarb)
Changed in linux (Ubuntu):
status: Incomplete → Fix Committed
Changed in linux (Ubuntu):
status: Fix Committed → In Progress
Rafael David Tinoco (rafaeldtinoco) wrote :

Let's keep this bug as Invalid, please. There is no specific issue to be solved here.

Thank you

-rafaeldtinoco

Changed in linux (Ubuntu):
status: In Progress → Invalid