ip commands error with mellanox devices in switchdev mode

Bug #1849856 reported by James Page
This bug affects 1 person
Affects            Status      Importance  Assigned to  Milestone
iproute2 (Ubuntu)  New         Undecided   Unassigned
linux (Ubuntu)     Incomplete  Undecided   Unassigned

Bug Description

Kernel: using 5.0.0-23-generic from hwe-edge

Switching a Mellanox ConnectX device with configured VFs into switchdev (rather than legacy) mode results in the ip CLI tool erroring when trying to query the interface state:

$ sudo ip link
Error: Buffer too small for object.
Dump terminated
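
For anyone trying to reproduce this, the sequence described above can be sketched roughly as follows. This is a minimal sketch, not the reporter's exact procedure: the PCI address is a hypothetical placeholder, the `run` and `set_switchdev` helpers are illustrative names, and a real setup may need further steps (creating VFs, unbinding them from their driver) that are not shown. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch only: switch a PF's e-switch to switchdev, then run the failing query.
# The PCI address below is a hypothetical placeholder; substitute your own.
PF_PCI="${PF_PCI:-0000:03:00.0}"

# Print commands by default; set DRY_RUN=0 (as root) to actually execute them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

set_switchdev() {
    # Show the current e-switch mode (legacy or switchdev).
    run devlink dev eswitch show "pci/$1"
    # Switch the e-switch from legacy to switchdev mode.
    run devlink dev eswitch set "pci/$1" mode switchdev
    # The query that failed with "Buffer too small for object."
    run ip link show
}

set_switchdev "$PF_PCI"
```

Running it with `DRY_RUN=0` as root executes the commands instead of printing them.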

ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: iproute2 4.15.0-2ubuntu1
ProcVersionSignature: Ubuntu 5.0.0-23.24~18.04.1-generic 5.0.15
Uname: Linux 5.0.0-23-generic x86_64
ApportVersion: 2.20.9-0ubuntu7.7
Architecture: amd64
Date: Fri Oct 25 14:55:36 2019
ProcEnviron:
 TERM=screen-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=C.UTF-8
 SHELL=/bin/bash
SourcePackage: iproute2
UpgradeStatus: No upgrade log present (probably fresh install)

James Page (james-page) wrote :

Also tried with the iproute2 in bionic-backports - same result

James Page (james-page) wrote :

strace from failing command

James Page (james-page) wrote :

Managed to figure out why this was happening: I had NUM_VFS in the card firmware configured to 127 (the maximum value). Reducing this to a lower number allowed me to successfully switch the cards into switchdev mode, at which point the ip tools all worked again.

The clue that pointed to this appeared when using the proposed 5.3 hwe edge kernel:

  [ 694.027106] infiniband (null): mlx5_ib_alloc_counters:5452:(pid 47479): couldn't allocate queue counter for port 128, err -12

(128 being the pertinent value)
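
As a rough illustration of the workaround, the firmware setting can be inspected and lowered along these lines. This is a sketch under assumptions: it uses `mstconfig` from the mstflint package (Mellanox's `mlxconfig` takes the same `query` and `set` commands), the device address and the target value 64 are hypothetical placeholders, and the helper names are illustrative. The small arithmetic helper shows where 128 presumably comes from: 127 VFs plus 1 PF. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch only: inspect and lower NUM_VFS in the card firmware.
# Assumptions: mstconfig (mstflint package) is installed; the device address
# and the target value 64 are hypothetical placeholders.
DEV="${DEV:-0000:03:00.0}"
NEW_NUM_VFS="${NEW_NUM_VFS:-64}"

# Print commands by default; set DRY_RUN=0 (as root) to actually execute them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# One PF plus its VFs; with NUM_VFS=127 this gives 128.
functions_per_port() {
    echo $(( $1 + 1 ))
}

lower_num_vfs() {
    # Show the current firmware configuration, including NUM_VFS.
    run mstconfig -d "$1" query
    # Write the lower value; the tool asks for confirmation when run for real.
    run mstconfig -d "$1" set "NUM_VFS=$2"
}

echo "functions per port at NUM_VFS=127: $(functions_per_port 127)"
lower_num_vfs "$DEV" "$NEW_NUM_VFS"
```

A changed firmware value normally only takes effect after a reboot or firmware reset.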

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1849856

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: disco

Terry Rudd (terrykrudd) wrote :

Is there further work to do on this given the apparent resolution of the problem?

James Page (james-page) wrote :

@terrykrudd - I think there is a bug here somewhere, as the card should support 127 VFs (and 1 PF) per port, yet setting the firmware to this limit results in this error.

I'm not sure whether this problem is in the kernel driver code or in the underlying firmware.

James Page (james-page) wrote :

FWIW the same is seen with the 5.3.0 kernel that's just appeared in HWE edge.
