Azure: mlx5e: Add support for PCI relaxed ordering (RO) for better performance

Bug #2039208 reported by Tim Gardner
This bug affects 1 person
Affects                  Status         Importance  Assigned to   Milestone
linux-azure (Ubuntu)     Fix Released   Undecided   Unassigned
  Jammy                  Fix Released   Medium      Tim Gardner
  Lunar                  Fix Committed  Medium      Tim Gardner

Bug Description

SRU Justification

[Impact]
On Azure, the VM SKU Standard_NC64as_T4_v3 has a rated network bandwidth of 30 Gbps, but we only reach 15-20 Gbps with the 5.15.0-1049-azure kernel in Ubuntu 20.04 or the 6.2.0-1014-azure kernel in Ubuntu 22.04.

After picking up the upstream patches that enable PCI relaxed ordering (RO) for the Mellanox VF NIC, throughput rises to 30.4 Gbps.
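To confirm whether RO is actually enabled at the PCIe level on the VF, one can inspect the "Enable Relaxed Ordering" bit (0x0010) of the PCI Express Device Control register, which is the same bit the kernel's pcie_relaxed_ordering_enabled() tests. The following is a minimal sketch, not part of the fix: it assumes the device's config space is readable at /sys/bus/pci/devices/<BDF>/config and that it runs as root (unprivileged reads return only the first 64 bytes). It reports only the PCIe enable bit; whether mlx5 then uses RO also depends on the driver and firmware. `lspci -vv` shows the same bit as `RlxdOrd+`/`RlxdOrd-` under DevCtl.

```python
"""Sketch: report whether PCIe Relaxed Ordering is enabled for a PCI device.

Assumptions (not from the bug report): config space is readable via
/sys/bus/pci/devices/<BDF>/config, and the script runs as root.
"""
import sys

PCI_STATUS = 0x06                 # Status register offset
PCI_STATUS_CAP_LIST = 0x10        # Status bit: capability list present
PCI_CAPABILITY_LIST = 0x34        # Offset of the first capability pointer
PCI_CAP_ID_EXP = 0x10             # PCI Express capability ID
PCI_EXP_DEVCTL = 0x08             # Device Control offset within the PCIe cap
PCI_EXP_DEVCTL_RELAX_EN = 0x0010  # Enable Relaxed Ordering bit


def find_pcie_cap(cfg: bytes):
    """Walk the standard capability list; return the PCIe cap offset or None."""
    if len(cfg) < 64:
        return None
    status = int.from_bytes(cfg[PCI_STATUS:PCI_STATUS + 2], "little")
    if not status & PCI_STATUS_CAP_LIST:
        return None
    pos = cfg[PCI_CAPABILITY_LIST] & 0xFC
    seen = set()
    while pos and pos + 1 < len(cfg) and pos not in seen:
        seen.add(pos)  # guard against a malformed, looping list
        if cfg[pos] == PCI_CAP_ID_EXP:
            return pos
        pos = cfg[pos + 1] & 0xFC  # next-capability pointer
    return None


def relaxed_ordering_enabled(cfg: bytes) -> bool:
    """Same register test as the kernel's pcie_relaxed_ordering_enabled()."""
    cap = find_pcie_cap(cfg)
    if cap is None or cap + PCI_EXP_DEVCTL + 2 > len(cfg):
        return False
    off = cap + PCI_EXP_DEVCTL
    devctl = int.from_bytes(cfg[off:off + 2], "little")
    return bool(devctl & PCI_EXP_DEVCTL_RELAX_EN)


if __name__ == "__main__":
    bdf = sys.argv[1]  # the VF's address, e.g. as printed by `lspci -D`
    with open(f"/sys/bus/pci/devices/{bdf}/config", "rb") as f:
        print(relaxed_ordering_enabled(f.read(256)))
```

Parsing a raw byte buffer keeps the register logic separate from sysfs access, so the same functions work on a saved config-space dump.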

[Fix]

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=17347d5430c4e4e1a3c58ffa2732746bd26a9c02

[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e2351e517068718724f1d3b4010e2a41ec91fa76

[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=77528e2aed9246cf8017b8a6f1b658a264d6f2b2

[4] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ed4b0661cce119870edb1994fd06c9cbc1dc05c3

[Test Plan]

Tested by Microsoft.

[Regression Potential]

Traffic over Mellanox (mlx5) connections could be corrupted or run slower.

[Other Info]

SF: #00370735

Tim Gardner (timg-tpi)
affects: linux (Ubuntu) → linux-azure (Ubuntu)
Changed in linux-azure (Ubuntu):
status: New → Fix Released
description: updated
Tim Gardner (timg-tpi) wrote :
Changed in linux-azure (Ubuntu Jammy):
assignee: nobody → Tim Gardner (timg-tpi)
importance: Undecided → Medium
status: New → In Progress
Changed in linux-azure (Ubuntu Lunar):
assignee: nobody → Tim Gardner (timg-tpi)
importance: Undecided → Medium
status: New → In Progress
Tim Gardner (timg-tpi)
Changed in linux-azure (Ubuntu Jammy):
status: In Progress → Fix Committed
Changed in linux-azure (Ubuntu Lunar):
status: In Progress → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-azure/5.15.0-1055.63 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy-linux-azure' to 'verification-done-jammy-linux-azure'. If the problem still exists, change the tag 'verification-needed-jammy-linux-azure' to 'verification-failed-jammy-linux-azure'.

If verification is not done within 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!
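When testing, it helps to confirm that the booted kernel is at least the ABI of the -proposed upload named above (5.15.0-1055), since that is the first linux-azure upload whose changelog lists LP: #2039208. The following is a minimal sketch under stated assumptions: the helper name `has_ro_fix` and its threshold are hypothetical, not Ubuntu tooling, and it assumes `uname -r` has the form "5.15.0-<ABI>-azure". Other flavours such as linux-azure-fips ("...-azure-fips") are rejected rather than guessed at, and the ABI number alone does not identify which series the kernel was built for.

```python
"""Sketch: does the running 5.15 linux-azure kernel include LP: #2039208?

Hypothetical helper; assumes release strings like "5.15.0-1055-azure".
"""
import platform
import re

FIX_BASE = (5, 15, 0)  # upstream base version of the Jammy linux-azure kernel
FIX_ABI = 1055         # first ABI whose changelog lists LP: #2039208


def has_ro_fix(release: str) -> bool:
    m = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)-(\d+)-azure", release)
    if not m:
        raise ValueError(f"not a linux-azure release string: {release!r}")
    base = tuple(int(x) for x in m.group(1, 2, 3))
    if base != FIX_BASE:
        raise ValueError(f"helper only handles the 5.15 series: {release!r}")
    return int(m.group(4)) >= FIX_ABI  # compare ABI numbers numerically


if __name__ == "__main__":
    print(has_ro_fix(platform.release()))
```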

tags: added: kernel-spammed-jammy-linux-azure-v2 verification-needed-jammy-linux-azure
Tim Gardner (timg-tpi)
tags: added: verification-done-jammy-linux-azure
removed: verification-needed-jammy-linux-azure
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-azure - 5.15.0-1056.64

---------------
linux-azure (5.15.0-1056.64) jammy; urgency=medium

  * jammy/linux-azure: 5.15.0-1056.64 -proposed tracker (LP: #2052545)

  * Azure: Fix regression introduced in LP: #2045069 (LP: #2052453)
    - hv_netvsc: Fix race condition between netvsc_probe and netvsc_remove
    - hv_netvsc: Register VF in netvsc_probe if NET_DEVICE_REGISTER missed

linux-azure (5.15.0-1055.63) jammy; urgency=medium

  * jammy/linux-azure: 5.15.0-1055.63 -proposed tracker (LP: #2048291)

  * Azure - Kernel crashes when removing gpu from pci (LP: #2042568)
    - Revert "PCI: hv: Use async probing to reduce boot time"

  * Azure: mlx5e: Add support for PCI relaxed ordering (RO) for better
    performance (LP: #2039208)
    - RDMA/mlx5: Reorder calls to pcie_relaxed_ordering_enabled()
    - RDMA/mlx5: Remove pcie_relaxed_ordering_enabled() check for RO write

  * Azure: Deprecate Netvsc and implement MANA direct (LP: #2045069)
    - hv_netvsc: fix race of netvsc and VF register_netdevice
    - hv_netvsc: Fix race of register_netdevice_notifier and VF register
    - hv_netvsc: Mark VF as slave before exposing it to user-mode

  [ Ubuntu: 5.15.0-94.104 ]

  * jammy/linux: 5.15.0-94.104 -proposed tracker (LP: #2048777)
  * [SRU] Duplicate Device_dax ids Created and hence Probing is Failing.
    (LP: #2028158)
    - device-dax: Fix duplicate 'hmem' device registration
  * Add ODM driver f81604 usb-can (LP: #2045387)
    - can: usb: f81604: add Fintek F81604 support
    - [Config] updateconfigs for ODM drivers CONFIG_CAN_F81604
  * Add ODM driver gpio-m058ssan (LP: #2045386)
    - SAUCE: ODM: gpio: add M058SSAN gpio driver
    - [Config] updateconfigs for ODM drivers CONFIG_GPIO_M058SSAN
  * Add ODM driver rtc-pcf85263 (LP: #2045385)
    - SAUCE: ODM: rtc: add PCF85263 RTC driver
    - [Config] updateconfigs for ODM drivers CONFIG_RTC_DRV_PCF85263
  * AppArmor patch for mq-posix interface is missing in jammy (LP: #2045384)
    - SAUCE: (no-up) apparmor: reserve mediation classes
    - SAUCE: (no-up) apparmor: Add fine grained mediation of posix mqueues
  * Packaging resync (LP: #1786013)
    - [Packaging] update annotations scripts

  [ Ubuntu: 5.15.0-93.103 ]

  * jammy/linux: 5.15.0-93.103 -proposed tracker (LP: #2048330)
  * Packaging resync (LP: #1786013)
    - [Packaging] resync git-ubuntu-log
    - [Packaging] resync update-dkms-versions helper
    - [Packaging] remove helper scripts
    - [Packaging] update annotations scripts
    - debian/dkms-versions -- update from kernel-versions (main/2024.01.08)
  * Hotplugging SCSI disk in QEMU VM fails (LP: #2047382)
    - Revert "PCI: acpiphp: Reassign resources on bridge if necessary"
  * CVE-2023-6622
    - netfilter: nf_tables: bail out on mismatching dynset and set expressions
  * CVE-2024-0193
    - netfilter: nf_tables: skip set commit for deleted/destroyed sets
  * CVE-2023-6040
    - netfilter: nf_tables: Reject tables of unsupported family
  * Patches needed for AmpereOne (arm64) (LP: #2044192)
    - clocksource/arm_arch_timer: Add build-time guards for unhandled register
      accesses
    - clocksource/drivers/arm_arch_timer: Drop CNT*_T...

Changed in linux-azure (Ubuntu Jammy):
status: Fix Committed → Fix Released
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-azure-fips/5.15.0-1058.66+fips1 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy-linux-azure-fips' to 'verification-done-jammy-linux-azure-fips'. If the problem still exists, change the tag 'verification-needed-jammy-linux-azure-fips' to 'verification-failed-jammy-linux-azure-fips'.

If verification is not done within 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-jammy-linux-azure-fips-v2 verification-needed-jammy-linux-azure-fips