Azure: MANA: Fix doorbell access for receives

Bug #2027615 reported by Tim Gardner
This bug affects 1 person
Affects               Status        Importance  Assigned to  Milestone
linux-azure (Ubuntu)  Fix Released  Undecided   Unassigned
  Jammy               Fix Released  Medium      Tim Gardner
  Lunar               Fix Released  Medium      Tim Gardner

Bug Description

SRU Justification

[Impact]

It's inefficient to ring the doorbell page every time a WQE is posted to
the receive queue. Excessive MMIO writes cause the CPU to spend more
time waiting on LOCK instructions (atomic operations), which results in
poor scaling performance.
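
The difference between ringing the doorbell per WQE and ringing it once per batch can be illustrated with a small user-space model. This is only a sketch, not the driver code: the class RxQueue, its methods, and the helper doorbell_writes() are hypothetical names, and each simulated doorbell write stands in for one MMIO write. It assumes that every received packet causes one replacement WQE to be posted.

```python
class RxQueue:
    """Toy receive queue that counts simulated doorbell (MMIO) writes.

    All names here are hypothetical; this is not the MANA driver API.
    """

    def __init__(self, batch_doorbell):
        self.batch_doorbell = batch_doorbell
        self.posted = 0           # total WQEs posted to the queue
        self.pending = 0          # WQEs posted since the last doorbell
        self.doorbell_writes = 0  # simulated MMIO writes

    def ring_doorbell(self):
        # Only ring when there is something new for the device to see.
        if self.pending:
            self.doorbell_writes += 1
            self.pending = 0

    def post_wqe(self):
        self.posted += 1
        self.pending += 1
        if not self.batch_doorbell:
            # Unbatched behaviour: one doorbell write per posted WQE.
            self.ring_doorbell()

    def poll(self, completions):
        # Assumption: one replacement WQE is posted per received packet.
        for _ in range(completions):
            self.post_wqe()
        if self.batch_doorbell:
            # Batched behaviour: one doorbell write per poll cycle.
            self.ring_doorbell()


def doorbell_writes(polls, completions_per_poll, batch_doorbell):
    """Return the number of simulated doorbell writes for a workload."""
    q = RxQueue(batch_doorbell)
    for _ in range(polls):
        q.poll(completions_per_poll)
    return q.doorbell_writes
```

In this model, calling doorbell_writes(10, 64, False) and doorbell_writes(10, 64, True) compares the two strategies for ten poll cycles of 64 packets each. The model also hints at the regression risk noted under [Regression Potential]: if the batched path ever skipped its final ring, newly posted WQEs would stay invisible to the device.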

[Test Plan]

MSFT tested.

[Regression Potential]

The MANA receive queue could stop.

[Other Info]

SF: #00363437

These 2 patches have been submitted for upstream inclusion.

Tim Gardner (timg-tpi)
affects: linux (Ubuntu) → linux-azure (Ubuntu)
Changed in linux-azure (Ubuntu Jammy):
assignee: nobody → Tim Gardner (timg-tpi)
importance: Undecided → Medium
status: New → In Progress
Changed in linux-azure (Ubuntu Lunar):
assignee: nobody → Tim Gardner (timg-tpi)
status: New → In Progress
importance: Undecided → Medium
Tim Gardner (timg-tpi)
Changed in linux-azure (Ubuntu Lunar):
status: In Progress → Fix Committed
Tim Gardner (timg-tpi)
Changed in linux-azure (Ubuntu Jammy):
status: In Progress → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-azure/5.15.0-1043.50 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy' to 'verification-done-jammy'. If the problem still exists, change the tag 'verification-needed-jammy' to 'verification-failed-jammy'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-jammy-linux-azure verification-needed-jammy
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-azure/6.2.0-1009.9 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-lunar' to 'verification-done-lunar'. If the problem still exists, change the tag 'verification-needed-lunar' to 'verification-failed-lunar'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-lunar-linux-azure verification-needed-lunar
Tim Gardner (timg-tpi)
tags: added: verification-done-jammy verification-done-lunar
removed: verification-needed-jammy verification-needed-lunar
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-azure - 6.2.0-1008.8

---------------
linux-azure (6.2.0-1008.8) lunar; urgency=medium

  * lunar/linux-azure: 6.2.0-1008.8 -proposed tracker (LP: #2026741)

  * Azure: MANA: Fix doorbell access for receives (LP: #2027615)
    - SAUCE: net: mana: Batch ringing RX queue doorbell on receiving packets
    - SAUCE: net: mana: Use the correct WQE count for ringing RQ doorbell

  [ Ubuntu: 6.2.0-26.26 ]

  * lunar/linux: 6.2.0-26.26 -proposed tracker (LP: #2026753)
  * CVE-2023-2640 // CVE-2023-32629
    - Revert "UBUNTU: SAUCE: overlayfs: handle idmapped mounts in
      ovl_do_(set|remove)xattr"
    - Revert "UBUNTU: SAUCE: overlayfs: Skip permission checking for
      trusted.overlayfs.* xattrs"
    - SAUCE: overlayfs: default to userxattr when mounted from non initial user
      namespace
  * CVE-2023-35001
    - netfilter: nf_tables: prevent OOB access in nft_byteorder_eval
  * CVE-2023-31248
    - netfilter: nf_tables: do not ignore genmask when looking up chain by id
  * CVE-2023-3389
    - io_uring/poll: serialize poll linked timer start with poll removal
  * CVE-2023-3390
    - netfilter: nf_tables: incorrect error path handling with NFT_MSG_NEWRULE
  * CVE-2023-3090
    - ipvlan:Fix out-of-bounds caused by unclear skb->cb
  * CVE-2023-3269
    - mm: introduce new 'lock_mm_and_find_vma()' page fault helper
    - mm: make the page fault mmap locking killable
    - arm64/mm: Convert to using lock_mm_and_find_vma()
    - powerpc/mm: Convert to using lock_mm_and_find_vma()
    - mips/mm: Convert to using lock_mm_and_find_vma()
    - riscv/mm: Convert to using lock_mm_and_find_vma()
    - arm/mm: Convert to using lock_mm_and_find_vma()
    - mm/fault: convert remaining simple cases to lock_mm_and_find_vma()
    - powerpc/mm: convert coprocessor fault to lock_mm_and_find_vma()
    - mm: make find_extend_vma() fail if write lock not held
    - execve: expand new process stack manually ahead of time
    - mm: always expand the stack with the mmap write lock held
    - [CONFIG]: Set CONFIG_LOCK_MM_AND_FIND_VMA

linux-azure (6.2.0-1007.7) lunar; urgency=medium

  * lunar/linux-azure: 6.2.0-1007.7 -proposed tracker (LP: #2024534)

  * Packaging resync (LP: #1786013)
    - [Packaging] resync update-dkms-versions helper

  [ Ubuntu: 6.2.0-25.25 ]

  * lunar/linux: 6.2.0-25.25 -proposed tracker (LP: #2024167)
  * ftrace in ubuntu_kernel_selftests failed with "check if duplicate events are
    caught" on J-5.15 P9 / J-kvm / L-kvm (LP: #1977827)
    - SAUCE: selftests/ftrace: Add test dependency
  * Add microphone support of the front headphone port on P3 Tower
    (LP: #2023650)
    - ALSA: hda/realtek: Add Lenovo P3 Tower platform
  * Add audio support for ThinkPad P1 Gen 6 and Z16 Gen 2 (LP: #2023539)
    - ALSA: hda/realtek: Add quirk for ThinkPad P1 Gen 6
  * Fix Disable thunderbolt clx make edp-monitor garbage while moving the
    touchpad (LP: #2023004)
    - drm/i915: Use 18 fast wake AUX sync len
  * Fix Monitor lost after replug WD19TBS to SUT port with VGA/DVI to type-C
    dongle (LP: #2021949)
    - thunderbolt: Increase timeout of DP OUT adapter handshake
    - thunderbolt: Do not touch CL state configu...

Changed in linux-azure (Ubuntu Lunar):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-azure - 5.15.0-1044.51

---------------
linux-azure (5.15.0-1044.51) jammy; urgency=medium

  * jammy/linux-azure: 5.15.0-1044.51 -proposed tracker (LP: #2029291)

  * Packaging resync (LP: #1786013)
    - [Packaging] resync update-dkms-versions helper
    - [Packaging] update variants

linux-azure (5.15.0-1043.50) jammy; urgency=medium

  * jammy/linux-azure: 5.15.0-1043.50 -proposed tracker (LP: #2026495)

  * Packaging resync (LP: #1786013)
    - [Packaging] resync update-dkms-versions helper
    - [Packaging] resync getabis

  * kdump fails on big arm64 systems when offset is not specified (LP: #2024479)
    - arm64: mm: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef
    - arm64: kdump: Reimplement crashkernel=X
    - docs: kdump: Update the crashkernel description for arm64
    - arm64: kdump: Do not allocate crash low memory if not needed
    - arm64/mm: Define defer_reserve_crashkernel()
    - arm64: kdump: Provide default size when crashkernel=Y, low is not specified
    - arm64: kdump: Support crashkernel=X fall back to reserve region above DMA
      zones

  * Azure: MANA: Fix doorbell access for receives (LP: #2027615)
    - SAUCE: net: mana: Batch ringing RX queue doorbell on receiving packets
    - SAUCE: net: mana: Use the correct WQE count for ringing RQ doorbell

  * [Azure][MANA][InfinitiBand] Features Support and InfiniBand for MANA
    (LP: #2024917)
    - bpf: Let bpf_warn_invalid_xdp_action() report more info
    - PCI: Move PCI_VENDOR_ID_MICROSOFT/PCI_DEVICE_ID_HYPERV_VIDEO definitions to
      pci_ids.h
    - net: mana: Assign interrupts to CPUs based on NUMA nodes
    - net: mana: Add support for auxiliary device
    - net: mana: Record the physical address for doorbell page region
    - net: mana: Handle vport sharing between devices
    - net: mana: Set the DMA device max segment size
    - net: mana: Export Work Queue functions for use by RDMA driver
    - net: mana: Record port number in netdev
    - net: mana: Move header files to a common location
    - net: mana: Define max values for SGL entries
    - net: mana: Define and process GDMA response code GDMA_STATUS_MORE_ENTRIES
    - net: mana: Define data structures for allocating doorbell page from GDMA
    - net: mana: Define data structures for protection domain and memory
      registration
    - net: mana: Fix return type of mana_start_xmit()
    - RDMA/mana_ib: Add a driver for Microsoft Azure Network Adapter
    - RDMA/mana: Remove redefinition of basic u64 type
    - RDMA/mana_ib: Prevent array underflow in mana_ib_create_qp_raw()
    - net: mana: Fix accessing freed irq affinity_hint
    - [Config] azure: Enable MANA_INFINIBAND

  * [Azure] Fix VM crash/hang issues due to fast VF add/remove events
    (LP: #2023071) // Case [Azure] Fix VM crash/hang issues due to fast VF
    add/remove events (LP: #2023594)
    - PCI: hv: Fix a race condition bug in hv_pci_query_relations()
    - PCI: hv: Fix a race condition in hv_irq_unmask() that can cause panic
    - PCI: hv: Remove the useless hv_pcichild_state from struct hv_pci_dev
    - Revert "PCI: hv: Fix a timing issue which causes kdump to fail occasionally"
    - ...

Changed in linux-azure (Ubuntu Jammy):
status: Fix Committed → Fix Released
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-azure/6.5.0-1013.13 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-mantic-linux-azure' to 'verification-done-mantic-linux-azure'. If the problem still exists, change the tag 'verification-needed-mantic-linux-azure' to 'verification-failed-mantic-linux-azure'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-mantic-linux-azure-v2 verification-needed-mantic-linux-azure
Tim Gardner (timg-tpi)
tags: added: verification-done-mantic-linux-azure
removed: verification-needed-mantic-linux-azure
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-azure - 6.5.0-1015.15

---------------
linux-azure (6.5.0-1015.15) mantic; urgency=medium

  * mantic/linux-azure: 6.5.0-1015.15 -proposed tracker (LP: #2052984)

  * Azure: cifs modules missing from the linux-modules package (LP: #2052980)
    - [Config] Move cifs.ko to linux-modules package

linux-azure (6.5.0-1014.14) mantic; urgency=medium

  * mantic/linux-azure: 6.5.0-1014.14 -proposed tracker (LP: #2052273)

  [ Ubuntu: 6.5.0-21.21 ]

  * mantic/linux: 6.5.0-21.21 -proposed tracker (LP: #2052603)
  * The display becomes frozen after some time when a HDMI device is connected.
    (LP: #2049027)
    - drm/i915/dmc: Don't enable any pipe DMC events
  * partproke is broken on empty loopback device (LP: #2049689)
    - block: Move checking GENHD_FL_NO_PART to bdev_add_partition()
  * CVE-2023-51781
    - appletalk: Fix Use-After-Free in atalk_ioctl
  * CVE-2023-51780
    - atm: Fix Use-After-Free in do_vcc_ioctl
  * CVE-2023-6915
    - ida: Fix crash in ida_free when the bitmap is empty
  * CVE-2024-0565
    - smb: client: fix OOB in receive_encrypted_standard()
  * CVE-2024-0582
    - io_uring: enable io_mem_alloc/free to be used in other parts
    - io_uring/kbuf: defer release of mapped buffer rings
  * CVE-2024-0646
    - net: tls, update curr on splice as well

linux-azure (6.5.0-1013.13) mantic; urgency=medium

  * mantic/linux-azure: 6.5.0-1013.13 -proposed tracker (LP: #2052541)

  * Azure: Fix TDX regressions in Azure 6.5 (LP: #2052519)
    - x86/hyperv: Add sev-snp enlightened guest static key
    - x86/hyperv: Set Virtual Trust Level in VMBus init message
    - x86/hyperv: Mark Hyper-V vp assist page unencrypted in SEV-SNP enlightened
      guest
    - drivers: hv: Mark percpu hvcall input arg page unencrypted in SEV-SNP
      enlightened guest
    - x86/hyperv: Use vmmcall to implement Hyper-V hypercall in sev-snp
      enlightened guest
    - clocksource: hyper-v: Mark hyperv tsc page unencrypted in sev-snp
      enlightened guest
    - x86/hyperv: Add smp support for SEV-SNP guest
    - x86/hyperv: Add hyperv-specific handling for VMMCALL under SEV-ES
    - x86/hyperv: Add missing 'inline' to hv_snp_boot_ap() stub
    - x86/hyperv: Fix undefined reference to isolation_type_en_snp without
      CONFIG_HYPERV
    - x86/hyperv: Add hv_isolation_type_tdx() to detect TDX guests
    - x86/hyperv: Support hypercalls for fully enlightened TDX guests
    - Drivers: hv: vmbus: Support fully enlightened TDX guests
    - x86/hyperv: Fix serial console interrupts for fully enlightened TDX guests
    - Drivers: hv: vmbus: Support >64 VPs for a fully enlightened TDX/SNP VM
    - x86/hyperv: Introduce a global variable hyperv_paravisor_present
    - Drivers: hv: vmbus: Bring the post_msg_page back for TDX VMs with the
      paravisor
    - x86/hyperv: Use TDX GHCI to access some MSRs in a TDX VM with the paravisor
    - x86/hyperv: Remove hv_isolation_type_en_snp
    - x86/hyperv: Move the code in ivm.c around to avoid unnecessary ifdef's
    - x86/hyperv: Remove duplicate include
    - x86/tdx: Retry partially-completed page conversion hypercalls
    - x86/mm: Fix memory encryption features advertiseme...

Changed in linux-azure (Ubuntu):
status: New → Fix Released