[aws] Backport CMA pool per numa functionality for 22.04 and 20.04

Bug #2067516 reported by Philip Cox
This bug affects 1 person
Affects              Status         Importance   Assigned to   Milestone
linux-aws (Ubuntu)   Fix Released   Undecided    Philip Cox
Noble                Fix Released   Undecided    Unassigned

Bug Description

SRU Justification:

[Impact]

We discovered a network packets-per-second (PPS) performance issue on one of our upcoming EC2 instance platforms on Graviton. The issue is addressed by setting a value for the numa_cma parameter. While this parameter is supported in 24.04, it is not available in 22.04 or 20.04 today, which could impact customers running 22.04 or 20.04 on the new EC2 platform.

[Fix]

We are requesting that the following changes be backported to 22.04 and 20.04 for ARM:
https://git.launchpad.net/~canonical-kernel/ubuntu/+source/linux-aws/+git/noble/commit/kernel/dma/contiguous.c?id=22e4a348f87c59df2c02f1efb7ba9a56b622c7b8
https://git.launchpad.net/~canonical-kernel/ubuntu/+source/linux-aws/+git/noble/commit/kernel/dma/contiguous.c?id=bf29bfaa54901a4bdee2a18cd10eb951a884a5f9

The EC2 Nitro team is only asking for the change on ARM; however, we leave it to Canonical's discretion whether to make the same changes on x86.

We also request this fix for redundant NUMA node overrides on AWS EC2 ENA adapters: https://github.com/torvalds/linux/commit/2dc8b1e7177d4f49f492ce648440caf2de0c366

We also request that Canonical set the kernel parameter "numa_cma=1:32M" for all releases that support it: 24.04, 22.04 and 20.04.
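For anyone checking the requested setting on a running instance, here is a minimal Python sketch that extracts the numa_cma value from a kernel command line string. It assumes the upstream option format numa_cma=<node>:nn[MG][,<node>:nn[MG]], under which "1:32M" would reserve 32 MiB of CMA on NUMA node 1. The helper name parse_numa_cma is hypothetical, and accepting a K suffix is an assumption of this sketch rather than something stated in this report.

```python
import re

# Size suffixes accepted by this sketch (binary multiples).
_UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_numa_cma(cmdline: str) -> dict[int, int]:
    """Return {numa_node: size_in_bytes} from a numa_cma= option.

    Returns an empty dict when the option is absent. This is a
    hypothetical helper for illustration, not part of any kernel tooling.
    """
    sizes: dict[int, int] = {}
    for token in cmdline.split():
        if not token.startswith("numa_cma="):
            continue
        # Each comma-separated entry has the form <node>:<size>[suffix].
        for entry in token[len("numa_cma="):].split(","):
            match = re.fullmatch(r"(\d+):(\d+)([KMG]?)", entry, re.IGNORECASE)
            if match is None:
                raise ValueError(f"malformed numa_cma entry: {entry!r}")
            node, amount, unit = match.groups()
            sizes[int(node)] = int(amount) * _UNITS[unit.upper()]
    return sizes


# The value requested in this report: node 1, 32 MiB.
print(parse_numa_cma("ro quiet numa_cma=1:32M"))  # prints {1: 33554432}
```

On a live system the input string would typically come from reading /proc/cmdline.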

Focal only needs this via the 5.15 based backport kernel, and not the 5.4 based focal kernel.

[Test Plan]
Tested by AWS.

[Where problems could occur]

[Other info]
sf# 00385923

Philip Cox (philcox)
Changed in linux-aws (Ubuntu):
status: New → Fix Released
Philip Cox (philcox)
description: updated
no longer affects: linux-aws (Ubuntu Focal)
description: updated
Philip Cox (philcox) wrote :
no longer affects: linux-aws (Ubuntu Jammy)
no longer affects: linux-aws (Ubuntu Mantic)
Changed in linux-aws (Ubuntu Noble):
status: New → In Progress
Philip Cox (philcox)
Changed in linux-aws (Ubuntu Noble):
status: In Progress → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-aws/6.8.0-1013.14 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-noble-linux-aws' to 'verification-done-noble-linux-aws'. If the problem still exists, change the tag 'verification-needed-noble-linux-aws' to 'verification-failed-noble-linux-aws'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-noble-linux-aws-v2 verification-needed-noble-linux-aws
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-aws - 6.8.0-1013.14

---------------
linux-aws (6.8.0-1013.14) noble; urgency=medium

  * noble/linux-aws: 6.8.0-1013.14 -proposed tracker (LP: #2072173)

  * [aws] Backport CMA pool per numa functionality for 22.04 and 20.04
    (LP: #2067516)
    - net: ena: Fix redundant device NUMA node override

  [ Ubuntu: 6.8.0-40.40 ]

  * noble/linux: 6.8.0-40.40 -proposed tracker (LP: #2072201)
  * FPS of glxgear with fullscreen is too low on MTL platform (LP: #2069380)
    - drm/i915: Bypass LMEMBAR/GTTMMADR for MTL stolen memory access
  * a critical typo in the code managing the ASPM settings for PCI Express
    devices (LP: #2071889)
    - PCI/ASPM: Restore parent state to parent, child state to child
  * [UBUNTU 24.04] IOMMU DMA mode changed in kernel config causes massive
    throughput degradation for PCI-related network workloads (LP: #2071471)
    - [Config] Set IOMMU_DEFAULT_DMA_STRICT=n and IOMMU_DEFAULT_DMA_LAZY=yes for
      s390x
  * UBSAN: array-index-out-of-bounds in
    /build/linux-D15vQj/linux-6.5.0/drivers/md/bcache/bset.c:1098:3
    (LP: #2039368)
    - bcache: fix variable length array abuse in btree_iter
  * Mute/mic LEDs and speaker no function on EliteBook 645/665 G11
    (LP: #2071296)
    - ALSA: hda/realtek: fix mute/micmute LEDs don't work for EliteBook 645/665
      G11.
  * failed to enable IPU6 camera sensor on kernel >= 6.8: ivsc_ace
    intel_vsc-5db76cf6-0a68-4ed6-9b78-0361635e2447: switch camera to host
    failed: -110 (LP: #2067364)
    - mei: vsc: Don't stop/restart mei device during system suspend/resume
    - SAUCE: media: ivsc: csi: don't count privacy on as error
    - SAUCE: media: ivsc: csi: add separate lock for v4l2 control handler
    - SAUCE: media: ivsc: csi: remove privacy status in struct mei_csi
    - SAUCE: mei: vsc: Enhance IVSC chipset stability during warm reboot
    - SAUCE: mei: vsc: Enhance SPI transfer of IVSC rom
    - SAUCE: mei: vsc: Utilize the appropriate byte order swap function
    - SAUCE: mei: vsc: Prevent timeout error with added delay post-firmware
      download
  * failed to probe camera sensor on Dell XPS 9315: ov01a10 i2c-OVTI01A0:00:
    failed to check hwcfg: -22 (LP: #2070251)
    - ACPI: utils: Make acpi_handle_path() not static
    - ACPI: property: Ignore bad graph port nodes on Dell XPS 9315
    - ACPI: property: Polish ignoring bad data nodes
    - ACPI: scan: Ignore camera graph port nodes on all Dell Tiger, Alder and
      Raptor Lake models
  * Update amd_sfh for AMD strix series (LP: #2058331)
    - HID: amd_sfh: Increase sensor command timeout
    - HID: amd_sfh: Improve boot time when SFH is available
    - HID: amd_sfh: Extend MP2 register access to SFH
    - HID: amd_sfh: Set the AMD SFH driver to depend on x86
  * RFIM and SAGV Linux Support for G10 models (LP: #2070158)
    - drm/i915/display: Add meaningful traces for QGV point info error handling
    - drm/i915/display: Extract code required to calculate max qgv/psf gv point
    - drm/i915/display: extract code to prepare qgv points mask
    - drm/i915/display: Disable SAGV on bw init, to force QGV point recalculation
    - drm/i915/display: handle systems with dup...

Changed in linux-aws (Ubuntu Noble):
status: Fix Committed → Fix Released