Fix system hang while entering suspend with AMD Navi3x graphics

Bug #2063417 reported by Chris Chiu
This bug affects 2 people
Affects                    Status         Importance  Assigned to  Milestone
HWE Next                   New            Undecided   Unassigned
linux (Ubuntu)             Confirmed      Undecided   Unassigned
  Jammy                    Confirmed      Undecided   Unassigned
  Noble                    Confirmed      Undecided   Unassigned
linux-firmware (Ubuntu)    Invalid        Undecided   Unassigned
  Jammy                    Fix Committed  Undecided   Unassigned
  Noble                    Fix Committed  Undecided   Unassigned
linux-oem-6.5 (Ubuntu)     Invalid        Undecided   Unassigned
  Jammy                    Fix Released   Undecided   Unassigned
  Noble                    Invalid        Undecided   Unassigned

Bug Description

SRU Justification for Kernel

[Impact]
Systems with AMD W7500/W7600/W7700 graphics randomly hang when entering suspend. The page fault below keeps recurring and the system cannot handle any other tasks.
BUG: unable to handle page fault for address: 000000000a980148
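
When triaging logs from an affected machine, it can help to search for this signature automatically. The following is a minimal sketch, not part of the fix: the function name and the timestamp in the sample line are made up for illustration, and only the message text itself comes from this report.

```python
import re

# Pattern for the page-fault line quoted in this report. The address is
# printed as a hexadecimal number (e.g. 000000000a980148).
PAGE_FAULT_RE = re.compile(
    r"BUG: unable to handle page fault for address: ([0-9a-fA-F]+)"
)


def find_page_faults(log_text):
    """Return the faulting addresses (as ints) found in kernel log text."""
    return [int(m.group(1), 16) for m in PAGE_FAULT_RE.finditer(log_text)]


# Hypothetical log line: the timestamp prefix is invented for the example.
sample = (
    "[  123.456789] BUG: unable to handle page fault "
    "for address: 000000000a980148\n"
)
print([hex(addr) for addr in find_page_faults(sample)])
```

Since the machine hangs, the log usually has to be read after a reboot, for example from the output of `journalctl -k -b -1` (this assumes the journal is persistent and that the message reached disk before the hang).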

[Fix]
Backport the fixes from upstream:
drm/amdgpu: skip to program GFXDEC registers for suspend abort (torvalds/linux@0326de4)
drm/amdgpu: Reset dGPU if suspend got aborted (torvalds/linux@8b2be55)
https://patchwork.freedesktop.org/patch/590570/

[Test Case]
1. Install AMD W7500/W7600/W7700 graphics
2. Install the latest firmware, which includes dcn_3_2_0_dmcub.bin for Navi31 and Navi32 and dcn_3_2_1_dmcub.bin for Navi33.
3. Run the fwts s3 stress test and check whether the system hangs.
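
Before running step 3, it may be worth confirming that the firmware from step 2 is actually installed. The sketch below is only an illustration: the file-to-GPU mapping is taken from this report, while the /lib/firmware/amdgpu location, the acceptance of a compressed suffix such as .zst, and the function name are assumptions.

```python
from pathlib import Path

# Firmware-to-GPU mapping as stated in this report.
DMCUB_FIRMWARE = {
    "navi31": "dcn_3_2_0_dmcub.bin",
    "navi32": "dcn_3_2_0_dmcub.bin",
    "navi33": "dcn_3_2_1_dmcub.bin",
}

# Assumption: the usual firmware directory on an Ubuntu install.
DEFAULT_FIRMWARE_DIR = Path("/lib/firmware/amdgpu")


def find_dmcub_firmware(family, firmware_dir=DEFAULT_FIRMWARE_DIR):
    """Return the installed DMCUB firmware path for a GPU family, or None.

    The trailing wildcard also matches a compressed variant such as
    "<name>.zst"; whether a release ships one is an assumption.
    """
    name = DMCUB_FIRMWARE[family.lower()]
    candidates = sorted(Path(firmware_dir).glob(name + "*"))
    return candidates[0] if candidates else None
```

For example, `find_dmcub_firmware("navi33")` returns a path if the file is present and `None` otherwise. Presence alone does not prove the file is the updated release, so the package version still needs to be checked separately.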

[Where problems could occur]
The patches improve error handling during suspend and add a fallback mechanism in the MES engine. The hang has only been observed on particular AMD models, so the fix needs testing with more hardware combinations.

=========================================================================================

SRU Justification for linux-firmware

[Impact]
The system will randomly hang due to page fault while suspending.

[Fix]
Add the release firmware binaries from AMD to linux-firmware:
dcn_3_2_0_dmcub.bin for Navi31 and Navi32: https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/commit/amdgpu/dcn_3_2_0_dmcub.bin?id=eb06e8bbe56cea19b8c2a23c154e2dcefd79fa47
dcn_3_2_1_dmcub.bin for Navi33: https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/commit/amdgpu/dcn_3_2_1_dmcub.bin?id=8b8ac15f9bce35d555b8253156053a7e2b661f6a

[Test Case]
1. Install AMD W7500/W7600/W7700 graphics
2. Test with the latest linux kernel and linux-firmware.
3. Run the fwts s3 stress test and check whether the system hangs.
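
One possible way to script step 3 is sketched below. The report does not say how many suspend cycles were used, so the default of 30 is an arbitrary assumption, as are the function names; `--s3-multiple` is the fwts option for repeating the S3 test.

```python
import subprocess


def build_fwts_s3_command(cycles=30):
    """Build an fwts command line that runs repeated S3 suspend/resume cycles.

    The cycle count is an assumption; the report does not specify one.
    """
    if cycles < 1:
        raise ValueError("cycles must be at least 1")
    return ["fwts", "s3", f"--s3-multiple={cycles}"]


def run_s3_stress(cycles=30):
    """Run the stress test (requires root) and return the fwts exit status."""
    return subprocess.run(build_fwts_s3_command(cycles), check=False).returncode
```

Note that a hang of the kind described here will stop the script itself, so a run that never returns is the failure signal rather than a non-zero exit status.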

[Where problems could occur]
dcn_3_2_0_dmcub is only used by Navi31 and Navi32, and dcn_3_2_1_dmcub only by Navi33, so the impact is restricted to these particular GPU series.

Chris Chiu (mschiu77)
Changed in linux-oem-6.5 (Ubuntu Jammy):
status: New → In Progress
tags: added: oem-priority originate-from-2048051 somerville
Juerg Haefliger (juergh)
tags: added: kern-10794
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux (Ubuntu Jammy):
status: New → Confirmed
Changed in linux (Ubuntu):
status: New → Confirmed
Changed in linux-firmware (Ubuntu Jammy):
status: New → Confirmed
Changed in linux-firmware (Ubuntu):
status: New → Confirmed
Changed in linux-oem-6.5 (Ubuntu):
status: New → Confirmed
Timo Aaltonen (tjaalton) wrote : Please test proposed package

Hello Chris, or anyone else affected,

Accepted linux-firmware into noble-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/linux-firmware/20240318.git3b128b60-0ubuntu2.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-noble to verification-done-noble. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-noble. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in linux-firmware (Ubuntu Noble):
status: Confirmed → Fix Committed
Timo Aaltonen (tjaalton) wrote :

Hello Chris, or anyone else affected,

Accepted linux-firmware into jammy-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/linux-firmware/20220329.git681281e4-0ubuntu3.31 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-jammy to verification-done-jammy. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-jammy. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in linux-firmware (Ubuntu Jammy):
status: Confirmed → Fix Committed
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (linux-firmware/20240318.git3b128b60-0ubuntu2.1)

All autopkgtests for the newly accepted linux-firmware (20240318.git3b128b60-0ubuntu2.1) for noble have finished running.
The following regressions have been reported in tests triggered by the package:

linux-firmware-raspi/unknown (arm64, armhf)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/noble/update_excuses.html#linux-firmware

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Timo Aaltonen (tjaalton)
Changed in linux-oem-6.5 (Ubuntu):
status: Confirmed → Invalid
Changed in linux-oem-6.5 (Ubuntu Noble):
status: Confirmed → Invalid
Changed in linux-oem-6.5 (Ubuntu Jammy):
status: In Progress → Fix Committed
Changed in linux-firmware (Ubuntu):
status: Confirmed → Invalid
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-oem-6.5/6.5.0-1023.24 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy-linux-oem-6.5' to 'verification-done-jammy-linux-oem-6.5'. If the problem still exists, change the tag 'verification-needed-jammy-linux-oem-6.5' to 'verification-failed-jammy-linux-oem-6.5'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-jammy-linux-oem-6.5-v2 verification-needed-jammy-linux-oem-6.5
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-oem-6.5 - 6.5.0-1023.24

---------------
linux-oem-6.5 (6.5.0-1023.24) jammy; urgency=medium

  * jammy/linux-oem-6.5: 6.5.0-1023.24 -proposed tracker (LP: #2063580)

  * Add support for Quectel RM520N-GL modem [1eac:1007] (LP: #2063529)
    - bus: mhi: host: pci_generic: Add support for Quectel RM520N-GL modem
    - bus: mhi: host: pci_generic: Add support for Quectel RM520N-GL Lenovo
      variant

  * S2idle regression (LP: #2064595)
    - drm/amd: Evict resources during PM ops prepare() callback
    - drm/amd: Add concept of running prepare_suspend() sequence for IP blocks
    - drm/amd: Flush GFXOFF requests in prepare stage

  * Add support of TAS2781 amp of audio (LP: #2064064)
    - ALSA: hda/tas2781: Add tas2781 HDA driver
    - ALSA: hda/tas2781: Add tas2781 HDA driver
    - ALSA: hda/tas2781: handle missing EFI calibration data
    - ALSA: hda/tas2781: Add new vendor_id and subsystem_id to support ThinkPad
      ICE-1
    - ALSA: hda/realtek: tas2781: enable subwoofer volume control
    - ALSA: hda/tas2781: leave hda_component in usable state
    - ALSA: hda/tas2781: call cleanup functions only once
    - ALSA: hda/tas2781: do not use regcache
    - [Config] enable TAS2781 amplifier

  * Fix system hang while entering suspend with AMD Navi3x graphics
    (LP: #2063417)
    - drm/amdgpu: skip to program GFXDEC registers for suspend abort
    - drm/amdgpu: Reset dGPU if suspend got aborted
    - SAUCE: drm/amdgpu/mes: fix use-after-free issue

  * Add support for Quectel EM160R-GL modem [1eac:100d] (LP: #2063399)
    - bus: mhi: host: pci_generic: Add support for Quectel EM160R-GL modem

  * RTL8852BE fw security fail then lost WIFI function during suspend/resume
    cycle (LP: #2063096)
    - wifi: rtw89: download firmware with five times retry

  * Fix bluetooth connections with 3.0 device (LP: #2063067)
    - Bluetooth: hci_event: Fix handling of HCI_EV_IO_CAPA_REQUEST

  * Fix the RTL8852CE BT FW Crash based on SER false alarm (LP: #2060904)
    - wifi: rtw89: disable txptctrl IMR to avoid flase alarm
    - SAUCE: wifi: rtw89: pci: correct TX resource checking for PCI DMA channel of
      firmware command

  * Add Cirrus Logic CS35L56 amplifier support (LP: #2062135)
    - ASoC: cs35l56: Patch soft registers to defaults
    - ASoC: cs35l56: Move shared data into a common data structure
    - ASoC: cs35l56: Make cs35l56_system_reset() code more generic
    - ASoC: cs35l56: Convert utility functions to use common data structure
    - ASoC: cs35l56: Move utility functions to shared file
    - ASoC: cs35l56: Move runtime suspend/resume to shared library
    - ASoC: cs35l56: Move cs_dsp init into shared library
    - ASoC: cs35l56: Move part of cs35l56_init() to shared library
    - ASoC: cs35l56: Make common function for control port wait
    - ASoC: cs35l56: Make a common function to shutdown the DSP
    - ALSA: hda: Fix missing header dependencies
    - ALSA: hda/cs35l56: Add driver for Cirrus Logic CS35L56 amplifier
    - ALSA: hda: realtek: Re-work CS35L41 fixups to re-use for other amps
    - ALSA: hda/realtek: Add quirks for HP G11 Laptops using CS35L56
    - ALSA: hda: cs35l56: Add ACPI ...


Changed in linux-oem-6.5 (Ubuntu Jammy):
status: Fix Committed → Fix Released