Devlink - add RoCE disable kernel support

Bug #1877270 reported by Mohammad Heib
This bug affects 1 person
Affects          Status         Importance   Assigned to   Milestone
linux (Ubuntu)   Invalid        Medium       Jeff Lane
Focal            Fix Released   Medium       Jeff Lane
Groovy           Invalid        Medium       Jeff Lane

Bug Description

[Impact]

The RoCE disable feature was added in kernel v5.5.
Mellanox customers running Ubuntu 20.04 have requested this feature,
so it is a high priority to deliver it to them
in an Ubuntu 20.04 SRU for the GA 5.4 kernel.

RoCE enablement state controls driver support for RoCE traffic.
When RoCE is disabled, there is no GID table, only raw Ethernet QPs are supported, and traffic on the well-known UDP RoCE port is handled as raw Ethernet traffic.

[Test Case]

To change the RoCE enablement state, a user must change the driverinit cmode value and run devlink reload.

User command examples:

- Disable RoCE::

    $ devlink dev param set pci/0000:06:00.0 name enable_roce value false cmode driverinit
    $ devlink dev reload pci/0000:06:00.0

- Read RoCE enablement state::

    $ devlink dev param show pci/0000:06:00.0 name enable_roce
      pci/0000:06:00.0:
      name enable_roce type generic
      values:
         cmode driverinit value true
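
For scripted verification, the read step can be reduced to a single value. The following is a minimal sketch, assuming the output layout shown in the example above; roce_driverinit_value is a hypothetical helper name, not part of devlink:

```shell
#!/bin/sh
# Hypothetical helper (not part of devlink): read the output of
#   devlink dev param show <dev> name enable_roce
# on stdin and print the value configured for cmode driverinit.
# Exits non-zero if no "cmode driverinit value ..." line is found.
roce_driverinit_value() {
    awk '$1 == "cmode" && $2 == "driverinit" && $3 == "value" { print $4; found = 1 }
         END { exit (found ? 0 : 1) }'
}

# Example use (requires devlink and a device exposing the parameter):
#   devlink dev param show pci/0000:06:00.0 name enable_roce | roce_driverinit_value
```

After the disable commands above, the helper is expected to print false for the same device.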

[Regression Potential]

This feature should carry little regression risk because it only adds support for enabling and disabling RoCE.
It was also tested internally by Mellanox QA teams.
Unfortunately, the logs and results of those tests are private and cannot be shared here.

Test 5.4 Ubuntu kernels have also been tested by Mellanox and have been verified working.

[Other Info]

Feature patchset:

4cca96a8d9da IB/mlx5: Do reverse sequence during device removal
4383cfcc65e7 net/mlx5: Add devlink reload
32680da71034 net/mlx5: Remove unneeded variable in mlx5_unload_one
94de879c28d8 IB/mlx5: Load profile according to RoCE enablement state
b5a498baf929 IB/mlx5: Rename profile and init methods
cc9defcbb8fa net/mlx5: Handle "enable_roce" devlink param
e90cde0d76f0 net/mlx5: Document flow_steering_mode devlink param
6c7295e13ffd devlink: Add new "enable_roce" generic device param

All are upstream in 5.5, so no pick into Groovy is necessary.

Patches picked into this branch:
https://code.launchpad.net/~bladernr/ubuntu/+source/linux/+git/focal/+ref/1877270-pull-roce-disable-from-5.5

userspace:

No userspace dependency. The feature uses the devlink
param functionality that already exists in Ubuntu 20.04.

Jeff Lane  (bladernr) wrote :

Quick runthrough:

All patches cherry-pick cleanly into Focal/master. You can grab them and verify the branch here:

git clone -b 1877270-pull-roce-disable-from-5.5 https://git.launchpad.net/~bladernr/ubuntu/+source/linux/+git/focal
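
One way to check a clone of that branch is to compare its commit subjects against the patchset listed in the bug description. This is only a sketch: missing_roce_patches is a hypothetical helper, and it assumes the cherry-picks keep the upstream subject lines unchanged.

```shell
#!/bin/sh
# Subjects of the eight patches listed under "Feature patchset".
ROCE_PATCH_SUBJECTS='IB/mlx5: Do reverse sequence during device removal
net/mlx5: Add devlink reload
net/mlx5: Remove unneeded variable in mlx5_unload_one
IB/mlx5: Load profile according to RoCE enablement state
IB/mlx5: Rename profile and init methods
net/mlx5: Handle "enable_roce" devlink param
net/mlx5: Document flow_steering_mode devlink param
devlink: Add new "enable_roce" generic device param'

# Hypothetical helper: read commit subjects on stdin (one per line) and
# print each expected subject that is absent. Returns non-zero if any
# subject is missing.
missing_roce_patches() {
    log=$(cat)
    status=0
    while IFS= read -r subject; do
        if ! printf '%s\n' "$log" | grep -qxF -- "$subject"; then
            printf 'missing: %s\n' "$subject"
            status=1
        fi
    done <<EOF
$ROCE_PATCH_SUBJECTS
EOF
    return "$status"
}

# Example use inside the clone (the depth of 200 is an arbitrary assumption):
#   git log -n 200 --format=%s | missing_roce_patches
```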

Jeff Lane  (bladernr) wrote :

Moved to kernel package

Changed in mstflint (Ubuntu):
assignee: nobody → Jeff Lane (bladernr)
status: New → In Progress
importance: Undecided → Medium
affects: mstflint (Ubuntu) → linux (Ubuntu)
Changed in linux (Ubuntu):
assignee: Jeff Lane (bladernr) → nobody
dann frazier (dannf)
Changed in linux (Ubuntu Focal):
status: New → In Progress
Mohammad Heib (mohamadh) wrote :

Hi Jeff,
After testing the feature, we see that some devlink functionality is missing in our driver, and we must add it before adding the RoCE feature.

The missing functionality is the implementation of devlink reload, which was introduced upstream by the following two patches:

4383cfcc65e7 ("net/mlx5: Add devlink reload")
32680da71034 ("net/mlx5: Remove unneeded variable in mlx5_unload_one")

The above patches apply cleanly on top of your tree, but we hit a compile issue because the following patch is missing:

 070c63f20f6c ("net: devlink: allow to change namespaces during reload")

The compile issue can be resolved by aligning mlx5_devlink_reload_down with Focal's devlink_ops->reload_down, that is, by removing the 'bool netns_change' parameter from the mlx5_devlink_reload_down function.

Thanks,

Jeff Lane  (bladernr)
description: updated
Jeff Lane  (bladernr) wrote :
description: updated
Jeff Lane  (bladernr)
Changed in linux (Ubuntu Focal):
assignee: nobody → Jeff Lane (bladernr)
Jeff Lane  (bladernr) wrote :

Waiting on Mellanox to report back on test kernels.

Changed in linux (Ubuntu Groovy):
assignee: nobody → Jeff Lane (bladernr)
Amir Tzin (amirtz) wrote :

Hi Jeff,
The following patch needs to be included in the patch set:

4cca96a8d9da IB/mlx5: Do reverse sequence during device removal

Without it, the devlink dev reload <pci device> operation causes a double free of cache memory and leaves the kernel in a state that requires a restart.

The patch applies cleanly on top of the ubuntu-focal tree.

Thanks a lot,
Amir

Jeff Lane  (bladernr)
description: updated
Jeff Lane  (bladernr) wrote :

Per Amir at Mellanox:
The kernel has been tested and it is good to go.

Jeff Lane  (bladernr)
description: updated
Jeff Lane  (bladernr)
description: updated
Changed in linux (Ubuntu Focal):
importance: Undecided → Medium
Changed in linux (Ubuntu Focal):
status: In Progress → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-focal
Jeff Lane  (bladernr) wrote :

Marked Groovy task as invalid. Patches are all in 5.5, and thus already present in Groovy.

Changed in linux (Ubuntu Groovy):
status: In Progress → Invalid
tags: added: verification-done-focal
removed: verification-needed-focal
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.4.0-45.49

---------------
linux (5.4.0-45.49) focal; urgency=medium

  * focal/linux: 5.4.0-45.49 -proposed tracker (LP: #1893050)

  * [Potential Regression] dscr_inherit_exec_test from powerpc in
    ubuntu_kernel_selftests failed on B/E/F (LP: #1888332)
    - powerpc/64s: Don't init FSCR_DSCR in __init_FSCR()

linux (5.4.0-44.48) focal; urgency=medium

  * focal/linux: 5.4.0-44.48 -proposed tracker (LP: #1891049)

  * Packaging resync (LP: #1786013)
    - [Packaging] update helper scripts

  * ipsec: policy priority management is broken (LP: #1890796)
    - xfrm: policy: match with both mark and mask on user interfaces

linux (5.4.0-43.47) focal; urgency=medium

  * focal/linux: 5.4.0-43.47 -proposed tracker (LP: #1890746)

  * Packaging resync (LP: #1786013)
    - update dkms package versions

  * Devlink - add RoCE disable kernel support (LP: #1877270)
    - devlink: Add new "enable_roce" generic device param
    - net/mlx5: Document flow_steering_mode devlink param
    - net/mlx5: Handle "enable_roce" devlink param
    - IB/mlx5: Rename profile and init methods
    - IB/mlx5: Load profile according to RoCE enablement state
    - net/mlx5: Remove unneeded variable in mlx5_unload_one
    - net/mlx5: Add devlink reload
    - IB/mlx5: Do reverse sequence during device removal

  * msg_zerocopy.sh in net from ubuntu_kernel_selftests failed (LP: #1812620)
    - selftests/net: relax cpu affinity requirement in msg_zerocopy test

  * Enlarge hisi_sec2 capability (LP: #1890222)
    - Revert "UBUNTU: [Config] Disable hisi_sec2 temporarily"
    - crypto: hisilicon - update SEC driver module parameter

  * Fix missing HDMI/DP Audio on an HP Desktop (LP: #1890441)
    - ALSA: hda/hdmi: Add quirk to force connectivity

  * Fix IOMMU error on AMD Radeon Pro W5700 (LP: #1890306)
    - PCI: Mark AMD Navi10 GPU rev 0x00 ATS as broken

  * ASoC:amd:renoir: the dmic can't record sound after suspend and resume
    (LP: #1890220)
    - SAUCE: ASoC: amd: renoir: restore two more registers during resume

  * No sound, Dummy output on Acer Swift 3 SF314-57G with Ice Lake core-i7 CPU
    (LP: #1877757)
    - ASoC: SOF: Intel: hda: fix generic hda codec support

  * Fix right speaker of HP laptop (LP: #1889375)
    - SAUCE: hda/realtek: Fix right speaker of HP laptop

  * blk_update_request error when mount nvme partition (LP: #1872383)
    - SAUCE: nvme-pci: prevent SK hynix PC400 from using Write Zeroes command

  * soc/amd/renoir: detect dmic from acpi table (LP: #1887734)
    - ASoC: amd: add logic to check dmic hardware runtime
    - ASoC: amd: add ACPI dependency check
    - ASoC: amd: fixed kernel warnings

  * soc/amd/renoir: change the module name to make it work with ucm3
    (LP: #1888166)
    - AsoC: amd: add missing snd- module prefix to the acp3x-rn driver kernel
      module
    - SAUCE: remove a kernel module since its name is changed

  * Focal update: v5.4.55 upstream stable release (LP: #1890343)
    - AX.25: Fix out-of-bounds read in ax25_connect()
    - AX.25: Prevent out-of-bounds read in ax25_sendmsg()
    - dev: Defer free of skbs in flush_backlog
    - drivers/net/wan/x25_asy: Fix to make i...

Changed in linux (Ubuntu Focal):
status: Fix Committed → Fix Released
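
The changelog above lists the RoCE patches under linux 5.4.0-43.47, so a Focal GA kernel needs ABI 43 or later to carry them. A rough sketch of that check follows; focal_kernel_has_roce_param is a hypothetical helper, and it assumes a `uname -r` style release string from the generic or lowlatency flavour of the 5.4.0 series (other flavours and series number their ABIs differently and are reported as unknown):

```shell
#!/bin/sh
# Hypothetical helper: succeed if a Focal 5.4 kernel release string
# (for example 5.4.0-45-generic) has ABI number >= 43.
# Returns 2 when the string is not a 5.4.0 generic/lowlatency release,
# because this simple comparison does not apply there.
focal_kernel_has_roce_param() {
    rel=$1
    rest=${rel#5.4.0-}
    [ "$rest" != "$rel" ] || return 2
    case $rest in
        *-generic|*-lowlatency) ;;
        *) return 2 ;;
    esac
    abi=${rest%%-*}
    case $abi in
        ''|*[!0-9]*) return 2 ;;
    esac
    [ "$abi" -ge 43 ]
}

# Example use on a running system:
#   focal_kernel_has_roce_param "$(uname -r)" && echo "enable_roce patches expected"
```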