mm/gup: Issues running cudaMallocHost

Bug #2110319 reported by Carol Soto
This bug affects 1 person
Affects                     Status         Importance  Assigned to  Milestone
linux-nvidia (Ubuntu)       Invalid        Undecided   Unassigned
  Noble                     Fix Committed  Undecided   Unassigned
linux-nvidia-6.11 (Ubuntu)  Invalid        Undecided   Unassigned
  Noble                     Fix Released   Undecided   Unassigned

Bug Description

With kernel 6.11 we are seeing issues with applications that call pin_user_pages*() with FOLL_LONGTERM.
We see a memory leak and page allocation failures when running a CUDA test that calls cudaMallocHost.

These 2 patches resolved the issue:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/patch/?id=aa6f8b2593b56a02043684182a89853f919dff3e
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/patch/?id=94efde1d15399f5c88e576923db9bcd422d217f2
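
The leak described above can be watched for from user space by comparing /proc/meminfo before and after a run of the failing test. The following is a minimal sketch, not the reporter's procedure: it assumes the leaked pages show up as a drop in MemAvailable, and the helper names (parse_meminfo, available_drop_kib, read_meminfo) are made up for illustration. The test command is whatever is passed on the command line.

```python
import re
import subprocess
import sys


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field: value} dict (values in kB where given)."""
    fields = {}
    for line in text.splitlines():
        # Lines look like "MemAvailable:    8192 kB" or "HugePages_Total:       0".
        match = re.match(r"^(\w+(?:\([^)]*\))?):\s+(\d+)(?:\s+kB)?\s*$", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def available_drop_kib(before, after):
    """Return how much MemAvailable shrank between two snapshots."""
    return before["MemAvailable"] - after["MemAvailable"]


def read_meminfo(path="/proc/meminfo"):
    """Take a snapshot of the running system's memory counters."""
    with open(path) as handle:
        return parse_meminfo(handle.read())


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 meminfo_delta.py <test command and arguments>
    before = read_meminfo()
    subprocess.run(sys.argv[1:], check=False)
    after = read_meminfo()
    print("MemAvailable dropped by", available_drop_kib(before, after), "kB")
```

MemAvailable moves for unrelated reasons too, so a single reading proves little; a drop that keeps growing across repeated runs of the same test is the more telling signal.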

Ian May (ian-may) wrote :

Adding an additional patch that carries the tag Fixes: 94efde1d1539 ("mm/gup: avoid an unnecessary allocation call for FOLL_LONGTERM cases")

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/patch/?id=a1268be280d8e484ab3606d7476edd0f14bb9961

Changed in linux-nvidia (Ubuntu):
status: New → Invalid
Changed in linux-nvidia-6.11 (Ubuntu):
status: New → Invalid
Changed in linux-nvidia (Ubuntu Noble):
status: New → Fix Committed
Changed in linux-nvidia-6.11 (Ubuntu Noble):
status: New → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-nvidia-6.11/6.11.0-1009.9 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-noble-linux-nvidia-6.11' to 'verification-done-noble-linux-nvidia-6.11'. If the problem still exists, change the tag 'verification-needed-noble-linux-nvidia-6.11' to 'verification-failed-noble-linux-nvidia-6.11'.

If verification is not done within 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-noble-linux-nvidia-6.11-v2 verification-needed-noble-linux-nvidia-6.11
Carol Soto (csotog) wrote :

Verified with kernel 6.11.0-1009-nvidia-64k
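
To check whether a given machine is running a build at or past the one verified above, the kernel release string can be compared against it. This is only a sketch under stated assumptions: the function names are hypothetical, the release string is assumed to have the form `<version>-<abi>-<flavour>` as in the comment above, and only nvidia flavours of the same 6.11.0 series are treated as comparable.

```python
import platform
import re

# Release string taken from the verification comment in this bug.
VERIFIED_RELEASE = "6.11.0-1009-nvidia-64k"


def parse_release(release):
    """Split e.g. '6.11.0-1009-nvidia-64k' into ((6, 11, 0, 1009), 'nvidia-64k')."""
    match = re.match(r"^(\d+)\.(\d+)\.(\d+)-(\d+)-(.+)$", release)
    if not match:
        raise ValueError("unrecognised kernel release: " + release)
    version = tuple(int(part) for part in match.groups()[:4])
    return version, match.group(5)


def at_or_past_verified(release, verified=VERIFIED_RELEASE):
    """Return True if release is an nvidia-flavour kernel of the same series with ABI >= verified."""
    version, flavour = parse_release(release)
    verified_version, _ = parse_release(verified)
    # Assumption: other flavours and other kernel series are not comparable by ABI number.
    same_series = version[:3] == verified_version[:3]
    return flavour.startswith("nvidia") and same_series and version >= verified_version


if __name__ == "__main__":
    print(platform.release(), at_or_past_verified(platform.release()))
```

A True result only says the ABI number is not older than the verified build; it does not confirm that the patches are present in a custom or rebuilt kernel.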

tags: added: verification-done-noble-linux-nvidia-6.11
removed: verification-needed-noble-linux-nvidia-6.11
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-nvidia-6.11 - 6.11.0-1010.10

---------------
linux-nvidia-6.11 (6.11.0-1010.10) noble; urgency=medium

  * noble/linux-nvidia-6.11: 6.11.0-1010.10 -proposed tracker (LP: #2112217)

  * Packaging resync (LP: #1786013)
    - [Packaging] update variants

  * mt7925: Support country-specific regulatory requirements (LP: #2112155)
    - wifi: mt76: mt7925: load the appropriate CLC data based on hardware type
    - wifi: mt76: mt7925: replace chan config with extend txpower config for clc
    - wifi: mt76: mt7925: add EHT control support based on the CLC data
    - wifi: mt76: mt7925: update the channel usage when the regd domain changed
    - wifi: mt76: mt7925: remove unused acpi function for clc
    - wifi: mt76: mt792x: extend MTCL of APCI to version3 for EHT control
    - wifi: mt76: mt7925: add MTCL support to enhance the regulatory compliance
    - NVIDIA: SAUCE: MEDIATEK: wifi: mt76: mt7925: add reg hint support
    - NVIDIA: SAUCE: MEDIATEK: wifi: mt76: mt7925: update the regd by country code
      of 11d changed
    - wifi: mt76: mt7925: Fix logical vs bitwise typo

  * Backport: TPM Service Command Response Buffer Interface Over FF-A
    (LP: #2111511)
    - tpm_crb: ffa_tpm: Implement driver compliant to CRB over FF-A
    - tpm_crb: Clean-up and refactor check for idle support
    - ACPICA: Add start method for ARM FF-A
    - tpm_crb: Add support for the ARM FF-A start method
    - Documentation: tpm: Add documentation for the CRB FF-A interface
    - [Config] nvidia-6.11: Update annotations to enable TPM over FFA

  * Backport: ALSA: hda - Add new driver for HDA controllers listed via ACPI
    (LP: #2111447)
    - ALSA: hda - Add new driver for HDA controllers listed via ACPI
    - ALSA: hda: acpi: Use SYSTEM_SLEEP_PM_OPS()
    - ALSA: hda: acpi: Make driver's match data const static
    - NVIDIA: SAUCE: [Config] nvidia-6.11 CONFIG_SND_HDA_ACPI=m on arm64

  * UBSAN: shift-out-of-bounds arm-smmu-v3.c (LP: #2110750)
    - iommu/arm-smmu-v3: Fix pgsize_bit for sva domains

  [ Ubuntu: 6.11.0-26.26 ]

  * oracular/linux: 6.11.0-26.26 -proposed tracker (LP: #2107166)
  * Packaging resync (LP: #1786013)
    - [Packaging] debian.master/dkms-versions -- update from kernel-versions
      (main/2025.04.14)
  * drm/xe: prevent potential UAF in pf_provision_vf_ggtt() (LP: #2106652)
    - drm/xe: prevent potential UAF in pf_provision_vf_ggtt()
  * Oracular update: upstream stable patchset 2025-04-09 (LP: #2106703)
    - IB/mlx5: Set and get correct qp_num for a DCT QP
    - RDMA/mana_ib: Allocate PAGE aligned doorbell index
    - scsi: ufs: core: Fix ufshcd_is_ufs_dev_busy() and ufshcd_eh_timed_out()
    - ovl: fix UAF in ovl_dentry_update_reval by moving dput() in ovl_link_up
    - SUNRPC: convert RPC_TASK_* constants to enum
    - SUNRPC: Prevent looping due to rpc_signal_task() races
    - SUNRPC: Handle -ETIMEDOUT return from tlshd
    - RDMA/mlx5: Fix AH static rate parsing
    - scsi: core: Clear driver private data when retrying request
    - RDMA/mlx5: Fix bind QP error cleanup flow
    - sunrpc: suppress warnings for unused procfs functions
    - ALSA: usb-audio: Avoid dropping MIDI events at clos...

Changed in linux-nvidia-6.11 (Ubuntu Noble):
status: Fix Committed → Fix Released