Fix system hang while entering suspend with AMD Navi3x graphics

Bug #2063417 reported by Chris Chiu
This bug affects 2 people
Affects                    Status       Importance  Assigned to  Milestone
HWE Next                   New          Undecided   Unassigned
linux (Ubuntu)             Confirmed    Undecided   Unassigned
  Jammy                    Confirmed    Undecided   Unassigned
  Noble                    Confirmed    Undecided   Unassigned
linux-firmware (Ubuntu)    Confirmed    Undecided   Unassigned
  Jammy                    Confirmed    Undecided   Unassigned
  Noble                    Confirmed    Undecided   Unassigned
linux-oem-6.5 (Ubuntu)     Confirmed    Undecided   Unassigned
  Jammy                    In Progress  Undecided   Unassigned
  Noble                    Confirmed    Undecided   Unassigned

Bug Description

SRU Justification for Kernel

[Impact]
Systems with AMD W7500/W7600/W7700 graphics randomly hang when entering suspend. The page fault keeps recurring and the system cannot handle any other tasks.
BUG: unable to handle page fault for address: 000000000a980148

[Fix]
Backport the fixes from upstream:
drm/amdgpu: skip to program GFXDEC registers for suspend abort (torvalds/linux@0326de4)
drm/amdgpu: Reset dGPU if suspend got aborted (torvalds/linux@8b2be55)
https://patchwork.freedesktop.org/patch/590570/

[Test Case]
1. Install AMD W7500/W7600/W7700 graphics.
2. Install the latest firmware, with dcn_3_2_0_dmcub.bin for Navi31 and Navi32 and dcn_3_2_1_dmcub.bin for Navi33.
3. Run the fwts s3 stress test and check whether the system hangs.
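
When reviewing logs collected after the stress test, it can help to search for the page-fault signature quoted under [Impact]. The following is a minimal Python sketch, not part of the fix or of fwts; the function name find_page_faults is hypothetical, and it assumes the kernel log has already been captured as text (for example from the journal or dmesg).

```python
import re

# Signature taken from the [Impact] section of this report.
PAGE_FAULT_RE = re.compile(
    r"BUG: unable to handle page fault for address: ([0-9a-fA-F]+)"
)


def find_page_faults(log_text):
    """Return the faulting addresses found in a kernel log, in order."""
    return PAGE_FAULT_RE.findall(log_text)


# Example with the line quoted in this report.
sample = "BUG: unable to handle page fault for address: 000000000a980148\n"
print(find_page_faults(sample))
```

An empty result only means the signature was not logged; a hard hang may prevent the message from reaching persistent storage.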

[Where problems could occur]
The patches improve error handling on suspend and add a fallback mechanism in the MES engine. The issue has only been observed on particular AMD models, so more hardware combinations need to be tested.

=========================================================================================

SRU Justification for linux-firmware

[Impact]
The system will randomly hang due to a page fault while suspending.

[Fix]
Add the released firmware binaries from AMD to linux-firmware:
dcn_3_2_0_dmcub.bin for Navi31 and Navi32: https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/commit/amdgpu/dcn_3_2_0_dmcub.bin?id=eb06e8bbe56cea19b8c2a23c154e2dcefd79fa47
dcn_3_2_1_dmcub.bin for Navi33: https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/commit/amdgpu/dcn_3_2_1_dmcub.bin?id=8b8ac15f9bce35d555b8253156053a7e2b661f6a

[Test Case]
1. Install AMD W7500/W7600/W7700 graphics.
2. Test with the latest linux kernel and linux-firmware.
3. Run the fwts s3 stress test and check whether the system hangs.
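
Before running the stress test, a quick presence check on the two DMCUB files named in this report can rule out a missing-firmware setup. The sketch below is illustrative only: missing_firmware is a hypothetical helper, the directory /lib/firmware/amdgpu is an assumption about the install location, and the .zst/.xz suffixes are an assumption about how a given release compresses firmware. It checks that the files exist, not that they are the fixed versions.

```python
from pathlib import Path

# File names taken from this report.
DMCUB_FILES = ("dcn_3_2_0_dmcub.bin", "dcn_3_2_1_dmcub.bin")

# Assumed compression suffixes; "" covers the uncompressed file.
SUFFIXES = ("", ".zst", ".xz")


def missing_firmware(fw_dir, names=DMCUB_FILES):
    """Return the names that have no plain or compressed file in fw_dir."""
    fw_dir = Path(fw_dir)
    missing = []
    for name in names:
        if not any((fw_dir / (name + s)).is_file() for s in SUFFIXES):
            missing.append(name)
    return missing


# Assumed install location; adjust for the system under test.
print(missing_firmware("/lib/firmware/amdgpu"))
```

An empty list means both files were found in some form; the package version still has to be checked separately to confirm the updated binaries are installed.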

[Where problems could occur]
dcn_3_2_0_dmcub is used only by Navi31 and dcn_3_2_1_dmcub only by Navi33, so the impact is restricted to those particular series.

Chris Chiu (mschiu77)
Changed in linux-oem-6.5 (Ubuntu Jammy):
status: New → In Progress
tags: added: oem-priority originate-from-2048051 somerville
Juerg Haefliger (juergh)
tags: added: kern-10794
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux (Ubuntu Jammy):
status: New → Confirmed
Changed in linux (Ubuntu):
status: New → Confirmed
Changed in linux-firmware (Ubuntu Jammy):
status: New → Confirmed
Changed in linux-firmware (Ubuntu):
status: New → Confirmed
Changed in linux-oem-6.5 (Ubuntu):
status: New → Confirmed