[linux-azure] Ubuntu 16.04 + INFINIBAND-OPEN-MPI-2VM

Bug #1856605 reported by Joseph Salisbury
This bug affects 2 people
Affects               Status   Importance  Assigned to  Milestone
linux-azure (Ubuntu)  Invalid  Undecided   Unassigned

Bug Description

We ran an RDMA test case against the Ubuntu 16.04 gallery image (with the proposed kernel) and found the issue below. The kernel prior to proposed does not exhibit this bug, so it is a regression.

When ibv_devinfo or ibv_devices is run, no IB devices are reported:

ibv_devinfo
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
No IB devices found

ibv_devices
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
    device node GUID
------ ----------------

ibstat works as expected:
CA 'mlx5_0'
        CA type: MT4120
        Number of ports: 1
        Firmware version: 16.23.1020
        Hardware version: 0
        Node GUID: 0x00155dfffe33ff49
        System image GUID: 0x506b4b0300f521ec
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 100
                Base lid: 713
                LMC: 0
                SM lid: 16
                Capability mask: 0x2651ec48
                Port GUID: 0x00155dfffd33ff49
                Link layer: InfiniBand

The bug occurs on this device family, which reports a subsystem of MT28800:
lspci -v|egrep 'Mel|mlx'
0002:00:02.0 Infiniband controller: Mellanox Technologies MT27800 Family [ConnectX-5 Virtual Function]
        Subsystem: Mellanox Technologies MT28800 Family [ConnectX-5 Virtual Function]
        Kernel driver in use: mlx5_core
        Kernel modules: mlx5_core

The issue does not occur with Ubuntu 18.04, which reports a different subsystem (MT27800):
lspci -v|egrep 'Mel|mlx'
0002:00:02.0 Infiniband controller: Mellanox Technologies MT27800 Family [ConnectX-5 Virtual Function]
        Subsystem: Mellanox Technologies MT27800 Family [ConnectX-5 Virtual Function]
        Kernel driver in use: mlx5_core
        Kernel modules: mlx5_core
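
For comparing the two listings above, a minimal Python sketch follows. It assumes input in the filtered `lspci -v|egrep 'Mel|mlx'` layout shown in this report; the helper names (`mellanox_families`, `mismatched`) are hypothetical, and the script only reports the family difference rather than diagnosing the failure.

```python
import re

# Hypothetical helpers; the patterns assume the lspci -v layout quoted in this report.
DEVICE_RE = re.compile(r"^(\S+) .*Mellanox Technologies (MT\d+) Family")
SUBSYSTEM_RE = re.compile(r"^\s+Subsystem: Mellanox Technologies (MT\d+) Family")

def mellanox_families(lspci_text):
    """Return {pci_address: (device_family, subsystem_family)}."""
    families = {}
    address = None
    for line in lspci_text.splitlines():
        device = DEVICE_RE.match(line)
        if device:
            address = device.group(1)
            families[address] = (device.group(2), None)
            continue
        subsystem = SUBSYSTEM_RE.match(line)
        if subsystem and address is not None:
            families[address] = (families[address][0], subsystem.group(1))
    return families

def mismatched(families):
    """List addresses whose subsystem family differs from the device family."""
    return [addr for addr, (dev, sub) in families.items() if sub and sub != dev]

# Sample inputs copied from the two listings in this report.
UBUNTU_1604 = "\n".join([
    "0002:00:02.0 Infiniband controller: Mellanox Technologies MT27800 Family [ConnectX-5 Virtual Function]",
    "        Subsystem: Mellanox Technologies MT28800 Family [ConnectX-5 Virtual Function]",
    "        Kernel driver in use: mlx5_core",
])
UBUNTU_1804 = "\n".join([
    "0002:00:02.0 Infiniband controller: Mellanox Technologies MT27800 Family [ConnectX-5 Virtual Function]",
    "        Subsystem: Mellanox Technologies MT27800 Family [ConnectX-5 Virtual Function]",
    "        Kernel driver in use: mlx5_core",
])

if __name__ == "__main__":
    for name, text in (("16.04", UBUNTU_1604), ("18.04", UBUNTU_1804)):
        print(name, mellanox_families(text), mismatched(mellanox_families(text)))
```

On a live system the text could instead be captured from `lspci -v` with `subprocess.run`; that step is omitted here so the sketch runs without the hardware.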

If possible, we would like to move rdma-core to version 22 or higher.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux-azure (Ubuntu):
status: New → Confirmed
Juerg Haefliger (juergh) wrote :

Joe, is there anything in the kernel log when this fails?

Juerg Haefliger (juergh) wrote :

Joe, just to be clear:

1) Are you saying that you ran the test on a single system but 18.04 reported a different MT2xxx family compared to 16.04, or did you run this on two different systems?

2) Can you please provide the exact kernel versions that you tried: 16.04 that passes, 16.04 that fails, 18.04 that passes.

Joseph Salisbury (jsalisbury) wrote :

Hi Juerg,

The tests were pointing at a PPA that had moved. Once the tests were updated to the new PPA, the issue was resolved. This bug can be marked as invalid.

Changed in linux-azure (Ubuntu):
status: Confirmed → Invalid