Mellanox patch for fixing failures in ConnectX3 Pro/DPDK

Bug #1896760 reported by Marcelo Cerri
This bug affects 1 person
Affects                 Status        Importance  Assigned to    Milestone
linux (Ubuntu)          Incomplete    Undecided   Marcelo Cerri
  Focal                 Incomplete    Undecided   Marcelo Cerri
linux-azure (Ubuntu)    In Progress   High        Marcelo Cerri
  Focal                 Fix Released  Undecided   Marcelo Cerri

Bug Description

[Impact]

Mellanox has made a patch to fix DPDK failures in Azure with ConnectX-3 Pro:

https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/drivers/infiniband/hw/mlx4?h=next-20200908&id=ec78b3bd66bc9a015505df0ef0eb153d9e64b03b

Older kernels don't need this patch. It is only needed by newer kernels that contain the following commit:

commit 9012a6de6b43146c934752a60d358c31dbad4368
Author: Kamal Heib <email address hidden>
Date: Tue May 19 13:12:17 2020 -0400
[infiniband] IB/core: Fix potential NULL pointer dereference in pkey cache

Any kernel that ships with this commit also needs the Mellanox patch above.

[Test Case]

Boot the current 5.4 linux-azure kernel from focal-updates (5.4.0-48.52) on an Azure instance with an mlx4 device that does not support RoCE. The devices are wrongly ignored.

The expected result is that those devices are created and available.
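One way to check the expected result from userspace is to look for mlx4 entries in sysfs. The sketch below assumes the standard layout in which each registered RDMA device appears as a directory under /sys/class/infiniband/ and mlx4 devices are named mlx4_0, mlx4_1, and so on; the function name find_mlx4_devices is hypothetical. On an affected kernel the list should come back empty, while a fixed kernel should list the instance's mlx4 devices.

```python
import os
from typing import List

# Assumption: RDMA devices are registered under /sys/class/infiniband/ and
# mlx4 devices use the "mlx4_" name prefix (e.g. "mlx4_0").
SYSFS_IB_DIR = "/sys/class/infiniband"


def find_mlx4_devices(sysfs_dir: str = SYSFS_IB_DIR) -> List[str]:
    """Return the sorted names of mlx4 RDMA devices found in sysfs_dir."""
    try:
        entries = os.listdir(sysfs_dir)
    except FileNotFoundError:
        # The directory is absent when no RDMA device has been registered.
        return []
    return sorted(name for name in entries if name.startswith("mlx4_"))


if __name__ == "__main__":
    devices = find_mlx4_devices()
    if devices:
        print("mlx4 devices present:", ", ".join(devices))
    else:
        print("no mlx4 devices found; the kernel may be affected")
```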

[Regression Potential]

A potential regression would only affect instances running the linux-azure 5.4 kernels with mlx4 devices.


Marcelo Cerri (mhcerri) wrote :

4.15 doesn't contain 1901b91f9982 ("IB/core: Fix potential NULL pointer dereference in pkey cache") and thus doesn't need the fix.

5.4 and 5.8 require the fix because they contain 1901b91f9982 and the fix will only be included in 5.9.
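The rule from the description and this comment can be written as a small predicate: a tree needs the backport only if it already carries 1901b91f9982 and does not yet carry the mlx4 fix. This is an illustrative model, not tooling used in this bug; KernelTree and needs_mlx4_backport are hypothetical names, and 5.9 is modelled as containing both commits because the comment says the fix will be included there.

```python
from typing import NamedTuple


class KernelTree(NamedTuple):
    """Hypothetical model of the two commits that matter for this bug."""
    name: str
    has_pkey_cache_commit: bool  # contains 1901b91f9982 (IB/core pkey cache)
    has_mlx4_pkey_fix: bool      # contains "RDMA/mlx4: Read pkey table length ..."


def needs_mlx4_backport(tree: KernelTree) -> bool:
    """The mlx4 fix is needed only where the IB/core commit is present
    and the mlx4 fix is not."""
    return tree.has_pkey_cache_commit and not tree.has_mlx4_pkey_fix


# Status of each tree as described in the comment above.
TREES = [
    KernelTree("4.15", has_pkey_cache_commit=False, has_mlx4_pkey_fix=False),
    KernelTree("5.4", has_pkey_cache_commit=True, has_mlx4_pkey_fix=False),
    KernelTree("5.8", has_pkey_cache_commit=True, has_mlx4_pkey_fix=False),
    KernelTree("5.9", has_pkey_cache_commit=True, has_mlx4_pkey_fix=True),
]

if __name__ == "__main__":
    for tree in TREES:
        print(tree.name, "needs backport" if needs_mlx4_backport(tree) else "ok")
```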

Changed in linux (Ubuntu Focal):
status: New → Incomplete
Changed in linux-azure (Ubuntu):
importance: Undecided → High
Changed in linux (Ubuntu):
assignee: nobody → Marcelo Cerri (mhcerri)
Changed in linux-azure (Ubuntu Focal):
status: New → In Progress
assignee: nobody → Marcelo Cerri (mhcerri)
Changed in linux (Ubuntu Focal):
assignee: nobody → Marcelo Cerri (mhcerri)
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1896760

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Marcelo Cerri (mhcerri) wrote :

I made a test kernel with the fix available at https://kernel.ubuntu.com/~mhcerri/azure/lp1896760.1/focal-linux-azure-5.4.0-1027.27+lp1896760.1.tgz

Microsoft has confirmed that the fix solves the issue.

description: updated
Marcelo Cerri (mhcerri) wrote :
Marcelo Cerri (mhcerri)
Changed in linux-azure (Ubuntu Focal):
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-azure - 5.4.0-1031.32

---------------
linux-azure (5.4.0-1031.32) focal; urgency=medium

  [ Ubuntu: 5.4.0-51.56 ]

  * Packaging resync (LP: #1786013)
    - update dkms package versions

linux-azure (5.4.0-1030.31) focal; urgency=medium

  [ Ubuntu: 5.4.0-50.55 ]

  * CVE-2020-16119
    - SAUCE: dccp: avoid double free of ccid on child socket
  * CVE-2020-16120
    - Revert "UBUNTU: SAUCE: overlayfs: ensure mounter privileges when reading
      directories"
    - ovl: pass correct flags for opening real directory
    - ovl: switch to mounter creds in readdir
    - ovl: verify permissions in ovl_path_open()
    - ovl: call secutiry hook in ovl_real_ioctl()
    - ovl: check permission to open real file

linux-azure (5.4.0-1029.29) focal; urgency=medium

  * focal/linux-azure: 5.4.0-1029.29 -proposed tracker (LP: #1897102)

  * btrfs: trimming a btrfs device which has been shrunk previously fails and
    fills root disk with garbage data (LP: #1896154)
    - btrfs: trim: fix underflow in trim length to prevent access beyond device
      boundary

  * Mellanox patch for fixing failures in ConnectX3 Pro/DPDK (LP: #1896760)
    - RDMA/mlx4: Read pkey table length instead of hardcoded value

linux-azure (5.4.0-1027.27) focal; urgency=medium

  * focal/linux-azure: 5.4.0-1027.27 -proposed tracker (LP: #1895994)

  * Focal update: v5.4.61 upstream stable release (LP: #1893115)
    - azure: [Config] update config for SPI_DYNAMIC

  * Unexpected warning when hibernating (LP: #1895980)
    - x86/hyperv: Properly suspend/resume reenlightenment notifications

  * [linux-azure] [SRU] UBUNTU: SAUCE: Drivers: hv: vmbus: Add timeout to
    vmbus_wait_for_unload (LP: #1895527)
    - SAUCE: Drivers: hv: vmbus: Add timeout to vmbus_wait_for_unload

  [ Ubuntu: 5.4.0-49.53 ]

  * focal/linux: 5.4.0-49.53 -proposed tracker (LP: #1896007)
  * Comet Lake PCH-H RAID not support on Ubuntu20.04 (LP: #1892288)
    - ahci: Add Intel Comet Lake PCH-H PCI ID
  * Novalink (mkvterm command failure) (LP: #1892546)
    - tty: hvcs: Don't NULL tty->driver_data until hvcs_cleanup()
  * Oops and hang when starting LVM snapshots on 5.4.0-47 (LP: #1894780)
    - SAUCE: Revert "mm: memcg/slab: fix memory leak at non-root kmem_cache
      destroy"
  * Intel x710 LOMs do not work on Focal (LP: #1893956)
    - i40e: Fix LED blinking flow for X710T*L devices
    - i40e: enable X710 support
  * Add/Backport EPYC-v3 and EPYC-Rome CPU model (LP: #1887490)
    - kvm: svm: Update svm_xsaves_supported
  * Fix non-working NVMe after S3 (LP: #1895718)
    - SAUCE: PCI: Enable ACS quirk on CML root port
  * Focal update: v5.4.65 upstream stable release (LP: #1895881)
    - ipv4: Silence suspicious RCU usage warning
    - ipv6: Fix sysctl max for fib_multipath_hash_policy
    - netlabel: fix problems with mapping removal
    - net: usb: dm9601: Add USB ID of Keenetic Plus DSL
    - sctp: not disable bh in the whole sctp_get_port_local()
    - taprio: Fix using wrong queues in gate mask
    - tipc: fix shutdown() of connectionless socket
    - net: disable netpoll on fresh napis
    - Linux 5.4.65
  * Focal update: v5.4.64 upstream stable release (LP: #1895...

Changed in linux-azure (Ubuntu Focal):
status: Fix Committed → Fix Released
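The changelog above first lists LP: #1896760 under linux-azure 5.4.0-1029.29, and the Janitor reports the fix as released in 5.4.0-1031.32. A rough way to tell whether a linux-azure 5.4 version string is at or past the first fixed upload is a numeric tuple comparison. This is a simplified sketch, not Debian's full version-ordering algorithm (`dpkg --compare-versions` implements that); parse_version, has_fix and FIRST_FIXED are hypothetical names. Any suffix after the upload number is ignored, so the test build 5.4.0-1027.27+lp1896760.1, which does carry the fix, is reported as unfixed.

```python
import re
from typing import Tuple

# Assumption: versions look like "5.4.0-1029.29" (upstream-ABI.upload), as in
# the changelog above. Anything after the upload number is ignored.
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)-(\d+)\.(\d+)")

# First linux-azure upload whose changelog entry lists LP: #1896760.
FIRST_FIXED = "5.4.0-1029.29"


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn "5.4.0-1031.32" into (5, 4, 0, 1031, 32) for numeric comparison."""
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def has_fix(version: str) -> bool:
    """True if the version is at or after the first fixed upload."""
    return parse_version(version) >= parse_version(FIRST_FIXED)
```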