cma alloc failure in large 5.15 arm instances

Bug #1990167 reported by Kyler Hornor
Affects             Status        Importance  Assigned to  Milestone
linux-aws (Ubuntu)  Fix Released  Medium      Tim Gardner
  Jammy             Fix Released  Medium      Tim Gardner
  Kinetic           Fix Released  Medium      Tim Gardner

Bug Description

When launching large arm64 instances on the focal or jammy AMI, CMA allocation errors appear in the dmesg output:

[ 0.063255] cma: cma_alloc: reserved: alloc failed, req-size: 4096 pages, ret: -12

As far as I can tell, this does not impact instance launch in a meaningful way, but I am unsure of the other implications of this. I was able to confirm that these messages are only present in 5.15, as they do not show up in the bionic image, and rolling back focal to linux-aws 5.4 avoids them as well.

This was present on at least two instance types and only appears on large sizes: a 2x4 instance does not produce the messages, while a 64x124 instance (c6gn.16xlarge) does.

This could be as simple as disabling CMA in the linux-aws package, as appears to already be the case in linux-azure (LP: #1949770).
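
To see whether a given kernel was built with CMA, its build configuration can be inspected. The following is a minimal sketch, not part of the original report: the helper name `kconfig_option_state` is hypothetical, and the `/boot/config-<release>` path is an assumption about where the running kernel's config file is installed.

```python
import os
import platform
from typing import Optional


def kconfig_option_state(config_text: str, option: str = "CONFIG_CMA") -> Optional[str]:
    """Return the option's value ("y", "m", ...), "n" if explicitly unset,
    or None if the option does not appear in the config text at all."""
    unset_marker = f"# {option} is not set"
    prefix = f"{option}="
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if line == unset_marker:
            return "n"
        # Matching on "OPTION=" avoids hits on CONFIG_CMA_SIZE_MBYTES and friends.
        if line.startswith(prefix):
            return line[len(prefix):]
    return None


if __name__ == "__main__":
    # Assumed location of the running kernel's config file.
    path = f"/boot/config-{platform.release()}"
    if os.path.exists(path):
        with open(path, encoding="utf-8") as handle:
            print(path, "CONFIG_CMA ->", kconfig_option_state(handle.read()))
    else:
        print(f"{path} not found; pass the config text in manually")
```

A result of `"y"` means CMA is built in, while `"n"` means the config carries the `# CONFIG_CMA is not set` line.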

Attaching dmesg output to the report.

# Replication
+ Launch a large arm64 instance (c6gn.16xlarge)
+ Observe the messages in kern.log / dmesg
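
For anyone scripting the second step, here is a minimal sketch that scans log text for the failure message. It is not part of the original report: the function name `find_cma_failures` is hypothetical, and the regular expression is an assumption modelled on the single log line quoted above, so other kernel versions may word the message differently.

```python
import errno
import re
from typing import Dict, List

# Pattern modelled on the one log line quoted in this report (an assumption).
CMA_FAIL_RE = re.compile(
    r"cma: cma_alloc: (?P<area>\S+): alloc failed, "
    r"req-size: (?P<pages>\d+) pages, ret: (?P<ret>-?\d+)"
)


def find_cma_failures(log_text: str) -> List[Dict[str, object]]:
    """Return one record per CMA allocation failure found in log_text."""
    failures: List[Dict[str, object]] = []
    for line in log_text.splitlines():
        match = CMA_FAIL_RE.search(line)
        if match is None:
            continue
        ret = int(match.group("ret"))
        failures.append({
            "area": match.group("area"),
            "pages": int(match.group("pages")),
            "ret": ret,
            # Kernel return codes are negated errno values.
            "errno_name": errno.errorcode.get(-ret, "unknown"),
        })
    return failures


if __name__ == "__main__":
    sample = ("[ 0.063255] cma: cma_alloc: reserved: alloc failed, "
              "req-size: 4096 pages, ret: -12")
    for failure in find_cma_failures(sample):
        print(failure)
```

Feeding it the contents of kern.log or captured `dmesg` output gives one record per failure. For the line quoted in the description, the record names the `reserved` area, a request of 4096 pages, and errno `ENOMEM` (the meaning of `ret: -12`).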

Kyler Hornor (kylerhornor) wrote :
Tim Gardner (timg-tpi)
Changed in linux-aws (Ubuntu Jammy):
assignee: nobody → Tim Gardner (timg-tpi)
importance: Undecided → Medium
status: New → Fix Committed
Changed in linux-aws (Ubuntu Kinetic):
assignee: nobody → Tim Gardner (timg-tpi)
importance: Undecided → Medium
status: New → In Progress
Tim Gardner (timg-tpi)
Changed in linux-aws (Ubuntu Kinetic):
status: In Progress → Invalid
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-azure/5.15.0-1021.26 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy' to 'verification-done-jammy'. If the problem still exists, change the tag 'verification-needed-jammy' to 'verification-failed-jammy'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

Fabio Augusto Miranda Martins (fabio.martins) wrote :

I believe this patch might have been dropped for newer linux-aws kernels. I just reproduced this problem while running 5.15.0-1026-aws.

Tim Gardner (timg-tpi) wrote :

The patch didn't get dropped; it was actually applied to jammy/linux-azure instead of jammy/linux-aws.

Changed in linux-aws (Ubuntu Jammy):
status: Fix Committed → In Progress
Tim Gardner (timg-tpi)
Changed in linux-aws (Ubuntu Kinetic):
status: Invalid → In Progress
Tim Gardner (timg-tpi) wrote :
Tim Gardner (timg-tpi)
Changed in linux-aws (Ubuntu Jammy):
status: In Progress → Fix Committed
Changed in linux-aws (Ubuntu Kinetic):
status: In Progress → Fix Committed
Tim Gardner (timg-tpi) wrote :

Patches committed. Due for release in the 2023.01.30 SRU cycle.

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-aws/5.19.0-1020.21 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-kinetic' to 'verification-done-kinetic'. If the problem still exists, change the tag 'verification-needed-kinetic' to 'verification-failed-kinetic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-kinetic-linux-aws verification-needed-kinetic
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-aws/5.15.0-1031.35 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy' to 'verification-done-jammy'. If the problem still exists, change the tag 'verification-needed-jammy' to 'verification-failed-jammy'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-jammy-linux-aws verification-needed-jammy
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-aws - 5.15.0-1031.35

---------------
linux-aws (5.15.0-1031.35) jammy; urgency=medium

  * jammy/linux-aws: 5.15.0-1031.35 -proposed tracker (LP: #2004305)

  * Jammy update: v5.15.81 upstream stable release (LP: #2003130)
    - [Config] aws: Updates after rebase

  * Packaging resync (LP: #1786013)
    - debian/dkms-versions -- update from kernel-versions (main/2023.01.30)

  * Regression in ext4 during online resize (LP: #2003816)
    - ext4: fix bad checksum after online resize
    - ext4: fix corruption when online resizing a 1K bigalloc fs
    - SAUCE: Export ext4_superblock_csum function
    - ext4: fix corrupt backup group descriptors after online resize

  * cma alloc failure in large 5.15 arm instances (LP: #1990167)
    - [Config] aws: Disable CONFIG_CMA for arm64

  * RDMA Back port DMA buffer fix (LP: #2004807)
    - RDMA/core: Fix ib block iterator counter overflow

  [ Ubuntu: 5.15.0-66.73 ]

  * jammy/linux: 5.15.0-66.73 -proposed tracker (LP: #2004636)
  * CVE-2023-0461
    - SAUCE: Fix inet_csk_listen_start after CVE-2023-0461

  [ Ubuntu: 5.15.0-65.72 ]

  * jammy/linux: 5.15.0-65.72 -proposed tracker (LP: #2004344)
  * Packaging resync (LP: #1786013)
    - [Packaging] update variants
    - debian/dkms-versions -- update from kernel-versions (main/2023.01.30)
  * NFS: client permission error after adding user to permissible group
    (LP: #2003053)
    - NFS: Clear the file access cache upon login
    - NFS: Judge the file access cache's timestamp in rcu path
    - NFS: Fix up a sparse warning
  * Fix W6400 hang after resume of S3 stress (LP: #2000299)
    - drm/amd/display: Manually adjust strobe for DCN303
  * Rear Audio port sometimes has no audio output after reboot(Cirrus Logic)
    (LP: #1998905)
    - ALSA: hda/cirrus: Add extra 10 ms delay to allow PLL settle and lock.
  * CVE-2022-20369
    - NFSD: fix use-after-free in __nfs42_ssc_open()
  * CVE-2023-0461
    - net/ulp: prevent ULP without clone op from entering the LISTEN status
    - net/ulp: use consistent error code when blocking ULP
  * CVE-2023-0179
    - netfilter: nft_payload: incorrect arithmetics when fetching VLAN header bits
  * Jammy update: v5.15.85 upstream stable release (LP: #2003139)
    - udf: Discard preallocation before extending file with a hole
    - udf: Fix preallocation discarding at indirect extent boundary
    - udf: Do not bother looking for prealloc extents if i_lenExtents matches
      i_size
    - udf: Fix extending file within last block
    - usb: gadget: uvc: Prevent buffer overflow in setup handler
    - USB: serial: option: add Quectel EM05-G modem
    - USB: serial: cp210x: add Kamstrup RF sniffer PIDs
    - USB: serial: f81232: fix division by zero on line-speed change
    - USB: serial: f81534: fix division by zero on line-speed change
    - xhci: Apply XHCI_RESET_TO_DEFAULT quirk to ADL-N
    - igb: Initialize mailbox message for VF reset
    - usb: dwc3: pci: Update PCIe device ID for USB3 controller on CPU sub-system
      for Raptor Lake
    - HID: uclogic: Add HID_QUIRK_HIDINPUT_FORCE quirk
    - selftests: net: Use "grep -E" instead of "egrep"
    - net: loopback: use NET_NAME_PR...

Changed in linux-aws (Ubuntu Jammy):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-aws - 5.19.0-1020.21

---------------
linux-aws (5.19.0-1020.21) kinetic; urgency=medium

  * kinetic/linux-aws: 5.19.0-1020.21 -proposed tracker (LP: #2004284)

  * Kinetic update: upstream stable patchset 2023-01-27 (LP: #2004051)
    - [Config] Update configs after rebase

  * cma alloc failure in large 5.15 arm instances (LP: #1990167)
    - [Config] aws: Disable CONFIG_CMA for arm64

  * RDMA Back port DMA buffer fix (LP: #2004807)
    - RDMA/core: Fix ib block iterator counter overflow

  [ Ubuntu: 5.19.0-35.36 ]

  * kinetic/linux: 5.19.0-35.36 -proposed tracker (LP: #2004652)
  * CVE-2023-0461
    - SAUCE: Fix inet_csk_listen_start after CVE-2023-0461

  [ Ubuntu: 5.19.0-34.35 ]

  * kinetic/linux: 5.19.0-34.35 -proposed tracker (LP: #2004299)
  * LXD containers using shiftfs on ZFS or TMPFS broken on 5.15.0-48.54
    (LP: #1990849)
    - [SAUCE] shiftfs: fix -EOVERFLOW inside the container
  * Kinetic update: upstream stable patchset 2023-01-27 (LP: #2004051)
    - ASoC: fsl_sai: use local device pointer
    - serial: Add rs485_supported to uart_port
    - serial: fsl_lpuart: Fill in rs485_supported
    - x86/sgx: Create utility to validate user provided offset and length
    - x86/sgx: Add overflow check in sgx_validate_offset_length()
    - binder: validate alloc->mm in ->mmap() handler
    - ceph: Use kcalloc for allocating multiple elements
    - ceph: fix NULL pointer dereference for req->r_session
    - wifi: mac80211: fix memory free error when registering wiphy fail
    - wifi: mac80211_hwsim: fix debugfs attribute ps with rc table support
    - riscv: dts: sifive unleashed: Add PWM controlled LEDs
    - audit: fix undefined behavior in bit shift for AUDIT_BIT
    - wifi: airo: do not assign -1 to unsigned char
    - wifi: mac80211: Fix ack frame idr leak when mesh has no route
    - wifi: ath11k: Fix QCN9074 firmware boot on x86
    - spi: stm32: fix stm32_spi_prepare_mbr() that halves spi clk for every run
    - selftests/bpf: Add verifier test for release_reference()
    - Revert "net: macsec: report real_dev features when HW offloading is enabled"
    - platform/x86: ideapad-laptop: Disable touchpad_switch
    - platform/x86: touchscreen_dmi: Add info for the RCA Cambio W101 v2 2-in-1
    - platform/x86/intel/pmt: Sapphire Rapids PMT errata fix
    - scsi: ibmvfc: Avoid path failures during live migration
    - scsi: scsi_debug: Make the READ CAPACITY response compliant with ZBC
    - drm: panel-orientation-quirks: Add quirk for Acer Switch V 10 (SW5-017)
    - block, bfq: fix null pointer dereference in bfq_bio_bfqg()
    - arm64/syscall: Include asm/ptrace.h in syscall_wrapper header.
    - nvmet: fix memory leak in nvmet_subsys_attr_model_store_locked
    - Revert "drm/amdgpu: Revert "drm/amdgpu: getting fan speed pwm for vega10
      properly""
    - ALSA: usb-audio: add quirk to fix Hamedal C20 disconnect issue
    - RISC-V: vdso: Do not add missing symbols to version section in linker script
    - MIPS: pic32: treat port as signed integer
    - xfrm: fix "disable_policy" on ipv4 early demux
    - xfrm: replay: Fix ESN wrap around for GSO
    - af_key: Fix send_acquire race wit...

Changed in linux-aws (Ubuntu Kinetic):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-aws - 6.2.0-1002.2

---------------
linux-aws (6.2.0-1002.2) lunar; urgency=medium

  * lunar/linux-aws: 6.2.0-1002.2 -proposed tracker (LP: #2011518)

 -- Paolo Pisati <email address hidden> Tue, 14 Mar 2023 11:04:29 +0100

Changed in linux-aws (Ubuntu):
status: Invalid → Fix Released
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-aws-5.15/5.15.0-1046.51~20.04.1 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal-linux-aws-5.15' to 'verification-done-focal-linux-aws-5.15'. If the problem still exists, change the tag 'verification-needed-focal-linux-aws-5.15' to 'verification-failed-focal-linux-aws-5.15'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-focal-linux-aws-5.15-v2 verification-needed-focal-linux-aws-5.15