drm/amdkfd: relax checks for over allocation of save area

Bug #2133740 reported by Benjamin Wheeler
This bug affects 1 person
Affects             Status  Importance  Assigned to       Milestone
linux (Ubuntu)      New     Undecided   Benjamin Wheeler
  Questing          New     Undecided   Benjamin Wheeler

Bug Description

This is a tracking bug for inclusion of the upstream Linux kernel commit "drm/amdkfd: relax checks for over allocation of save area" into [Q/R] linux packages.

SRU Justification:

[Impact]

This fixes hangs on certain AMD Strix Halo APUs when they are paired with particular versions of AMD ROCm.

[Fix]

Include commit d15deafab5d722afb9e2f83c5edcdef9d9d98bd1 from the upstream mainline linux kernel into our generic kernels.

[Test Plan]

The reproducer script below should be sufficient to confirm the bug is no longer present:

# Target the gfx1151 (Strix Halo) GPU architecture.
export PYTORCH_ROCM_ARCH=gfx1151
export TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1
# Create a uv project and install the ROCm 7.10.0a20251015 nightly wheels built for gfx1151.
uv init --python 3.13
uv add --index therock_nightly=https://rocm.nightlies.amd.com/v2/gfx1151/ --index-strategy unsafe-best-match --prerelease allow "rocm-sdk-core==7.10.0a20251015" "rocm[libraries]==7.10.0a20251015" "torch==2.10.0a0+rocm7.10.0a20251015" "torchvision==0.25.0a0+rocm7.10.0a20251015" "torchaudio==2.8.0a0+rocm7.10.0a20251015" "pytorch-triton-rocm==3.5.0+gitb0cf18f2.rocm7.10.0a20251015"
uv add git+https://github.com/huggingface/diffusers.git
uv add git+https://github.com/ivanfioravanti/qwen-image-mps.git
# mushroom1.png and mouse1.png are input images that must exist in the current directory.
uv run qwen-image-mps edit -i mushroom1.png mouse1.png -p "The mouse is under the mushroom."

(source: https://github.com/ROCm/ROCm/issues/5590#issuecomment-3481580910)
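Because the failure mode is a hang, a tester may prefer the final reproducer step to fail after a time limit rather than block forever. The following is a minimal sketch, assuming GNU coreutils timeout is available; the function name run_with_hang_check and the time limit are hypothetical choices, not part of the upstream reproducer. A hard GPU hang can leave the process unkillable, in which case timeout cannot end it either.

```shell
#!/bin/sh
# Hypothetical helper: run a command under coreutils `timeout` so that a hang
# is reported as a failure instead of blocking the test run indefinitely.
# Usage: run_with_hang_check LIMIT CMD [ARGS...]   (LIMIT e.g. 90s, 30m)
run_with_hang_check() {
    limit="$1"
    shift
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        # timeout exits with 124 when the time limit was reached.
        echo "FAIL: did not finish within $limit (possible hang)"
        return 1
    elif [ "$status" -ne 0 ]; then
        # The command itself failed; pass its exit status through.
        echo "FAIL: exited with status $status"
        return "$status"
    fi
    echo "PASS: completed"
}
```

For example: run_with_hang_check 30m uv run qwen-image-mps edit -i mushroom1.png mouse1.png -p "The mouse is under the mushroom." The 30-minute limit is an arbitrary assumption and should be tuned to how long the edit normally takes on the test machine.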

We don't have any Strix Halo hardware to test this fix on, so we cannot currently run this test plan ourselves. I am investigating some Kraken Point APUs that may be suitable, if I can provision them with a Questing kernel. Because these machines are provisioned with the OEM kernel, they currently always install Noble. This is still a work in progress.

[Where problems could occur]

This could introduce regressions for users running ROCm on Strix Halo APUs. In particular, this patch is intended to be paired with a ROCm user-space update, so systems whose ROCm version is out of sync with the kernel may see problems. (See https://github.com/ROCm/rocm-systems/commit/770f30bc4c72d763742e39932e2c0583813d531f for details on the exact version.)

Changed in linux (Ubuntu Questing):
assignee: nobody → Benjamin Wheeler (benjaminwheeler)
Benjamin Wheeler (benjaminwheeler) wrote:

Resolute will already have this by virtue of using a new enough kernel version. This patch will land in 6.18.
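Since mainline 6.18 is expected to include the fix, the minimal sketch below checks whether a kernel release string is at least 6.18. The function name is hypothetical, and a "no" does not mean the fix is missing: Ubuntu SRU kernels such as Questing's can carry the backported commit under an older version number.

```shell
#!/bin/sh
# Hypothetical helper: print "yes" if a kernel release string is >= 6.18,
# the mainline release the fix is expected to land in, otherwise "no".
# NOTE: Ubuntu SRU kernels may carry the backport with a lower version,
# so "no" does not prove the fix is absent.
kernel_at_least_6_18() {
    release="$1"
    base="${release%%-*}"   # "6.17.0-8-generic" -> "6.17.0"
    major="${base%%.*}"     # "6.17.0" -> "6"
    rest="${base#*.}"       # "6.17.0" -> "17.0"
    minor="${rest%%.*}"     # "17.0" -> "17"
    if [ "$major" -gt 6 ] || { [ "$major" -eq 6 ] && [ "$minor" -ge 18 ]; }; then
        echo yes
    else
        echo no
    fi
}

# Check the currently running kernel.
kernel_at_least_6_18 "$(uname -r)"
```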

no longer affects: linux (Ubuntu Resolute)
tags: added: kernel-daily-bug