Ubuntu 20.10 four needed fixes to 'Add driver for Mellanox Connect-IB adapters'

Bug #1905574 reported by Amir Tzin on 2020-11-25
This bug affects 1 person
Affects          Importance   Assigned to
linux (Ubuntu)   Medium       Jeff Lane
Focal            Medium       Jeff Lane
Groovy           Undecided    Unassigned

Bug Description

[Impact]

Commit
d43b7007dbd1 net/mlx5: Fix a race when moving command interface to events mode
from upstream v5.7-rc1 (already in Groovy) fixes
e126ba97dba9 mlx5: Add driver for Mellanox Connect-IB adapters
This fix should be accompanied by four more patches from v5.9:

410bd754cd73 net/mlx5: Add retry mechanism to the command entry index allocation
1d5558b1f0de net/mlx5: poll cmd EQ in case of command timeout
50b2412b7e78 net/mlx5: Avoid possible free of command entry while timeout comp handler
432161ea26d6 net/mlx5: Fix a race when moving command interface to polling mode

All four patches apply cleanly on the Groovy tree, and we ask that they be pulled into Groovy.

Please also see this discussion:
https://www.spinics.net/lists/stable/msg428620.html

Thanks

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1905574

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: groovy
Jeff Lane (bladernr) on 2020-12-03
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Jeff Lane (bladernr) on 2020-12-03
Changed in linux (Ubuntu):
assignee: nobody → Jeff Lane (bladernr)
status: Confirmed → In Progress
importance: Undecided → Medium
Jeff Lane (bladernr) wrote :

Tag/Log Check - looks like the first patch is already in 5.8

d43b7007dbd1 -- Ubuntu-5.8.0-10.11 -- Wed Mar 18 21:44:32 2020 +0200
410bd754cd73 -- v5.10-rc1 -- Mon Aug 31 15:04:35 2020 +0300
1d5558b1f0de -- v5.10-rc1 -- Tue Jul 21 10:25:52 2020 +0300
50b2412b7e78 -- v5.10-rc1 -- Tue Aug 4 10:40:21 2020 +0300
432161ea26d6 -- v5.10-rc1 -- Thu Aug 13 16:55:20 2020 +0300

Jeff Lane (bladernr) on 2020-12-03
Changed in linux (Ubuntu Focal):
status: New → In Progress
importance: Undecided → Medium
assignee: nobody → Jeff Lane (bladernr)
Jeff Lane (bladernr) wrote :

Double-checked, and ALL of these are in 5.8 already.

These are via this bug https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1902130:
410bd754cd73
1d5558b1f0de
50b2412b7e78
432161ea26d6

and the other one appears directly in the kernel, so I guess that means it was part of a sync from 5.8/5.9?

commit d43b7007dbd1195a5b6b83213e49b1516aaf6f5e
Author: Eran Ben Elisha <email address hidden>
Date: Wed Mar 18 21:44:32 2020 +0200

So nothing to do for Groovy.

Changed in linux (Ubuntu):
status: In Progress → Invalid
Jeff Lane (bladernr) wrote :

Marking Linux task invalid as this is not necessary for Hirsute or Groovy.

Will work the Focal task now.

Jeff Lane (bladernr) wrote :

None of these cherry-pick cleanly into Focal. Deferring to Kernel team.

Jeff Lane (bladernr) on 2020-12-04
Changed in linux (Ubuntu Focal):
status: In Progress → Won't Fix
Jeff Lane (bladernr) wrote :

After discussion with the kernel team, we will not backport these to 5.4 at this time. Each patch needs a different amount of backporting to get from 5.8 into 5.4, and unfortunately there's no way to schedule the work currently.

We can revisit this should there be customer demand, but for now the patches are in Groovy via the 5.8 kernel and will land in Focal via the HWE kernel next year.

Amir Tzin (amirtz) wrote :

Hi Jeff,

Upstream commit
50b2412b7e78 net/mlx5: Avoid possible free of command entry while timeout comp handler
was picked into the Ubuntu-5.4.0-56.62 kernel
(hash bcd6e98bef76cc8a49a1b736b0fefffbffb75c30)
(v5.4.71 upstream stable release, https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1902110 )

Now a new issue arises:
reloading the mlx5 modules causes an error message in the kernel log buffer:
"cmd_work_handler:887:(pid 292): failed to allocate command entry"

Reproduction:
# modprobe -r mlx5_ib mlx5_core
# modprobe mlx5_core mlx5_ib
# dmesg
[ 142.638490] mlx5_core 0000:08:00.1: E-Switch: cleanup
[ 143.734339] mlx5_core 0000:08:00.0: E-Switch: cleanup
[ 164.171511] mlx5_core: unknown parameter 'mlx5_ib' ignored
[ 164.173501] mlx5_core 0000:08:00.0: firmware version: 16.28.1002
[ 164.173576] mlx5_core 0000:08:00.0: 126.016 Gb/s available PCIe bandwidth (8 GT/s x16 link)
[ 164.457342] mlx5_core 0000:08:00.0: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
[ 164.457365] mlx5_core 0000:08:00.0: E-Switch: Total vports 2, per vport: max uc(1024) max mc(16384)
[ 164.484659] port_module: 5 callbacks suppressed
[ 164.484665] mlx5_core 0000:08:00.0: Port module event: module 0, Cable plugged
[ 164.485112] mlx5_core 0000:08:00.0: mlx5_pcie_event:294:(pid 8): PCIe slot advertised sufficient power (75W).
[ 164.494771] mlx5_core 0000:08:00.1: firmware version: 16.28.1002
[ 164.494844] mlx5_core 0000:08:00.1: 126.016 Gb/s available PCIe bandwidth (8 GT/s x16 link)
[ 164.779534] mlx5_core 0000:08:00.1: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
[ 164.779552] mlx5_core 0000:08:00.1: E-Switch: Total vports 2, per vport: max uc(1024) max mc(16384)
[ 164.808886] mlx5_core 0000:08:00.1: Port module event: module 1, Cable plugged
[ 164.809228] mlx5_core 0000:08:00.1: mlx5_pcie_event:294:(pid 292): PCIe slot advertised sufficient power (75W).
[ 164.840667] mlx5_core 0000:08:00.0: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)
[ 165.081342] mlx5_core 0000:08:00.1: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)
[ 165.282793] mlx5_ib: Mellanox Connect-IB Infiniband driver v5.0-0
[ 165.438226] mlx5_core 0000:08:00.0: cmd_work_handler:887:(pid 292): failed to allocate command entry
[ 165.442506] infiniband rocep8s0f0: reg_mr_callback:104:(pid 292): async reg mr failed. status -11
#
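
For anyone re-running the reproduction, a minimal sketch of how the failure could be checked for automatically rather than by reading the log. The function name is hypothetical; the matched string is the message text from the log above, and it is assumed the message wording is the same on the kernel under test.

```shell
# Sketch: read kernel log text on stdin and print only the lines that
# carry the mlx5 command-entry allocation failure quoted in this report.
# Exit status follows grep: 0 if the error was seen, 1 if it was not.
find_mlx5_cmd_alloc_errors() {
    # -F matches the message as a fixed string rather than a regex.
    grep -F 'failed to allocate command entry'
}
```

After reloading the modules, something like `dmesg | find_mlx5_cmd_alloc_errors` should print nothing on a kernel where the issue is fixed.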

The following commits fix this issue:
410bd754cd73 net/mlx5: Add retry mechanism to the command entry index allocation (upstream 5.9)
1d5558b1f0de net/mlx5: poll cmd EQ in case of command timeout (upstream 5.9)
d43b7007dbd1 net/mlx5: Fix a race when moving command interface to events mode (upstream 5.7-rc7)
3ed879965cc4 net/mlx5: Use async EQ setup cleanup helpers for multiple EQs (upstream 5.6-rc1)

Those are on the master-next branch of the focal tree, also synced from linux stable.
(v5.4.79 upstream stable release https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1907151 )

# git log --oneline Ubuntu-5.4.0-59.65..master-next
....
400ec5bb2816 net/mlx5: Add retry mechanism to the command entry index allocation
2bd608898edd net/mlx5: Fix a race when moving command interface to events mode...
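
Because stable backports carry new hashes (for example 400ec5bb2816 here versus upstream 410bd754cd73), a check for which fixes have landed has to match on the commit subject instead. A rough sketch under that assumption; the function name and its argument convention are hypothetical.

```shell
# Sketch: read `git log --oneline` output on stdin and report, for each
# commit subject passed as an argument, whether it appears in that log.
# Matching is by subject text because backported commits get new hashes.
subjects_present() {
    log=$(cat)
    for subject in "$@"; do
        case "$log" in
            # The quoted expansion is matched literally, not as a glob.
            *"$subject"*) printf 'present: %s\n' "$subject" ;;
            *)            printf 'missing: %s\n' "$subject" ;;
        esac
    done
}
```

It could be fed from the range used above, e.g. `git log --oneline Ubuntu-5.4.0-59.65..master-next | subjects_present 'net/mlx5: poll cmd EQ in case of command timeout'`.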


dann frazier (dannf) wrote :

Updating the statuses. This is Fix Released in groovy (patches were later applied, per Comment #2), and marking Focal as NEW as some patches are coming into focal/master-next via stable after the initial "Won't Fix" decision. It needs to be investigated whether that unblocks the remaining patches.

Changed in linux (Ubuntu Focal):
status: Won't Fix → New
Changed in linux (Ubuntu):
status: Invalid → Fix Released
Changed in linux (Ubuntu Groovy):
status: New → Fix Released
Changed in linux (Ubuntu Focal):
status: New → Fix Committed

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-focal
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.4.0-66.74

---------------
linux (5.4.0-66.74) focal; urgency=medium

  * focal/linux: 5.4.0-66.74 -proposed tracker (LP: #1913152)

  * Add support for selective build of special drivers (LP: #1912789)
    - [Packaging] Add support for ODM drivers
    - [Packaging] Turn on ODM support for amd64

  * Packaging resync (LP: #1786013)
    - update dkms package versions
    - update dkms package versions

  * Introduce the new NVIDIA 460-server series and update the 460 series
    (LP: #1913200)
    - [Config] dkms-versions -- drop NVIDIA 435 455 and 440-server
    - [Config] dkms-versions -- add the 460-server nvidia driver

  * Enable mute and micmute LED on HP EliteBook 850 G7 (LP: #1910102)
    - ALSA: hda/realtek: Enable mute and micmute LED on HP EliteBook 850 G7

  * SYNA30B4:00 06CB:CE09 Mouse on HP EliteBook 850 G7 not working at all
    (LP: #1908992)
    - HID: multitouch: Enable multi-input for Synaptics pointstick/touchpad device

  * HD Audio Device PCI ID for the Intel Cometlake-R platform (LP: #1912427)
    - SAUCE: ALSA: hda: Add Cometlake-R PCI ID

  * switch to an autogenerated nvidia series based core via dkms-versions
    (LP: #1912803)
    - [Packaging] nvidia -- use dkms-versions to define versions built
    - [Packaging] update-version-dkms -- maintain flags fields
    - [Config] dkms-versions -- add transitional/skip information for nvidia
      packages

  * udpgro.sh in net from ubuntu_kernel_selftests seems not reflecting sub-test
    result (LP: #1908499)
    - selftests: fix the return value for UDP GRO test

  * qede: Kubernetes Internal DNS Failure due to QL41xxx NIC not supporting IPIP
    tx csum offload (LP: #1909062)
    - qede: fix offload for IPIP tunnel packets

  * Use DCPD to control HP DreamColor panel (LP: #1911001)
    - SAUCE: drm/dp: Another HP DreamColor panel brigntness fix

  * kvm: Windows 2k19 with Hyper-v role gets stuck on pending hypervisor
    requests on cascadelake based kvm hosts (LP: #1911848)
    - KVM: x86: Set KVM_REQ_EVENT if run is canceled with req_immediate_exit set

  * Ubuntu 20.10 four needed fixes to 'Add driver for Mellanox Connect-IB
    adapters' (LP: #1905574)
    - net/mlx5: Fix a race when moving command interface to polling mode

  * Fix right sounds and mute/micmute LEDs for HP ZBook Fury 15/17 G7 Mobile
    Workstation (LP: #1910561)
    - ALSA: hda/realtek: fix right sounds and mute/micmute LEDs for HP machines

  * Ubuntu 20.04 - multicast counter is not increased in ip -s (LP: #1901842)
    - net/mlx5e: Fix multicast counter not up-to-date in "ip -s"

  * eeh-basic.sh in powerpc from ubuntu_kernel_selftests timeout with 5.4 P8 /
    P9 (LP: #1882503)
    - selftests/powerpc/eeh: disable kselftest timeout setting for eeh-basic

  * DMI entry syntax fix for Pegatron / ByteSpeed C15B (LP: #1910639)
    - Input: i8042 - unbreak Pegatron C15B

  * CVE-2020-29372
    - mm: check that mm is still valid in madvise()

  * update ENA driver, incl. new ethtool stats (LP: #1910291)
    - net: ena: Change WARN_ON expression in ena_del_napi_in_range()
    - net: ena: ethtool: convert stat_offset to 64 bit resolution
    - net: ena: eth...

Changed in linux (Ubuntu Focal):
status: Fix Committed → Fix Released