AMD: Suspend not working when some cores are disabled through cpufreq

Bug #1954930 reported by You-Sheng Yang
This bug affects 1 person
Affects                    Status        Importance  Assigned to     Milestone
HWE Next                   New           Undecided   Unassigned
linux (Ubuntu)             Fix Released  High        You-Sheng Yang
  Focal                    Invalid       Undecided   Unassigned
  Impish                   Fix Released  High        You-Sheng Yang
  Jammy                    Fix Released  High        You-Sheng Yang
linux-oem-5.14 (Ubuntu)    Invalid       Undecided   Unassigned
  Focal                    Fix Released  High        You-Sheng Yang
  Impish                   Invalid       Undecided   Unassigned
  Jammy                    Invalid       Undecided   Unassigned

Bug Description

[SRU Justification]

[Impact]

As detailed in https://gitlab.freedesktop.org/drm/amd/-/issues/1708, taking
some CPU cores offline with a cpufreq tool or via sysfs may hang the
system when it is suspended.

[Fix]

Commit d6b88ce2eb9d ("ACPI: processor idle: Allow playing dead in C3
state"), merged in v5.16-rc1, fixes this issue.

[Test Case]

As described in the bug URL above, set up the cpufreq extension to take
a few CPU cores offline, then trigger system suspend. There is a ~50%
chance that networking, input, etc. will hang, after which the user can
only reboot using the SysRq keys.

[Where problems could occur]

According to the patch discussion thread at
https://<email address hidden>/,
the restriction that allowed enter_dead only in states no deeper than
ACPI_STATE_C2 may have had no practical purpose; C2 was simply the
deepest state supported at the time.

[Other Info]

The fix is currently only available in v5.16-rc1, and the issue affects
AMD Cezanne/Barcelo, so oem-5.14, impish and jammy are nominated.

========== original bug report ==========

https://gitlab.freedesktop.org/drm/amd/-/issues/1708

Steps to reproduce:
1. Install the cpufreq GNOME extension (https://extensions.gnome.org/extension/1082/cpufreq/)
2. Click on the cpufreq extension in the top bar
3. Slide "cores online" from 16 to 3
4. Close the laptop lid

Expected result: the laptop goes into suspend
Actual result: the laptop stays on, but the screen remains black and keyboard input is ignored

Fix committed to v5.16-rc1: https://github.com/torvalds/linux/commit/d6b88ce2eb9d2698eb24451eb92c0a1649b17bb1
---
ProblemType: Bug
ApportVersion: 2.20.11-0ubuntu27.20
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC1: ubuntu 1188 F.... pulseaudio
 /dev/snd/controlC2: ubuntu 1188 F.... pulseaudio
 /dev/snd/controlC0: ubuntu 1188 F.... pulseaudio
CasperMD5CheckResult: skip
Dependencies:

DistributionChannelDescriptor:
 # This is the distribution channel descriptor for the OEM CDs
 # For more information see http://wiki.ubuntu.com/DistributionChannelDescriptor
 canonical-oem-somerville-focal-amd64-20200502-85+fossa-edge-staging+X152
DistroRelease: Ubuntu 20.04
InstallationDate: Installed on 2021-09-09 (97 days ago)
InstallationMedia: Ubuntu 20.04 "Focal" - Build amd64 LIVE Binary 20200502-05:58
IwConfig:
 lo no wireless extensions.

 enp1s0f0 no wireless extensions.
Lsusb:
 Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
 Bus 003 Device 002: ID 062a:4c01 MosArt Semiconductor Corp. 2.4G INPUT DEVICE
 Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
MachineType: AMD Celadon-CZN
Package: linux-firmware 1.187.23+staging.38 [origin: LP-PPA-canonical-hwe-team-linux-firmware-staging]
PackageArchitecture: all
ProcFB: 0 amdgpu
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.14.0-9011-oem root=UUID=668f30b7-78ec-472e-9916-c9b1cbdbbbc6 ro automatic-oem-config no_console_suspend
ProcVersionSignature: Ubuntu 5.14.0-9011.11+staging.37-oem 5.14.20
PulseList: Error: command ['pacmd', 'list'] failed with exit code 1: No PulseAudio daemon running, or not running as session daemon.
RelatedPackageVersions:
 linux-restricted-modules-5.14.0-9011-oem N/A
 linux-backports-modules-5.14.0-9011-oem N/A
 linux-firmware 1.187.23+staging.38
RfKill:

Tags: third-party-packages focal
Uname: Linux 5.14.0-9011-oem x86_64
UnreportableReason: This is not an official Ubuntu package. Please remove any third party package and try again.
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: N/A
_MarkForUpload: True
dmi.bios.date: 06/30/2021
dmi.bios.release: 19.1
dmi.bios.vendor: INSYDE Corp.
dmi.bios.version: RLD1005B_AB
dmi.board.asset.tag: Base Board Asset Tag
dmi.board.name: Celadon-CZN
dmi.board.vendor: AMD
dmi.board.version: Base Board Version
dmi.chassis.asset.tag: Chassis Asset Tag
dmi.chassis.type: 10
dmi.chassis.vendor: Chassis Manufacturer
dmi.chassis.version: Chassis Version
dmi.modalias: dmi:bvnINSYDECorp.:bvrRLD1005B_AB:bd06/30/2021:br19.1:svnAMD:pnCeladon-CZN:pvr1:rvnAMD:rnCeladon-CZN:rvrBaseBoardVersion:cvnChassisManufacturer:ct10:cvrChassisVersion:sku123456789:
dmi.product.family: Renoir
dmi.product.name: Celadon-CZN
dmi.product.sku: 123456789
dmi.product.version: 1
dmi.sys.vendor: AMD

You-Sheng Yang (vicamo)
tags: added: amd oem-priority originate-from-1954322
description: updated
Changed in linux (Ubuntu Focal):
status: New → Invalid
Changed in linux-oem-5.14 (Ubuntu Impish):
status: New → Invalid
Changed in linux-oem-5.14 (Ubuntu Jammy):
status: New → Invalid
You-Sheng Yang (vicamo)
Changed in linux (Ubuntu Impish):
status: New → In Progress
importance: Undecided → High
assignee: nobody → You-Sheng Yang (vicamo)
Changed in linux (Ubuntu Jammy):
status: New → In Progress
importance: Undecided → High
assignee: nobody → You-Sheng Yang (vicamo)
Changed in linux-oem-5.14 (Ubuntu Focal):
status: New → In Progress
importance: Undecided → High
assignee: nobody → You-Sheng Yang (vicamo)
You-Sheng Yang (vicamo) wrote :

[ 123.964355] r8169 0000:01:00.0 enp1s0f0: Link is Down
[ 124.353255] PM: suspend entry (s2idle)
[ 124.366348] Filesystems sync: 0.013 seconds
[ 124.983434] rfkill: input handler enabled
[ 125.062413] Freezing user space processes ... (elapsed 0.001 seconds) done.
[ 125.064287] OOM killer disabled.
[ 125.064288] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[ 125.390929] ACPI: EC: interrupt blocked
[ 125.469200] ACPI: EC: interrupt unblocked
[ 125.510332] pci 0000:00:00.2: can't derive routing for PCI INT A
[ 125.510335] pci 0000:00:00.2: PCI INT A: no GSI
[ 125.511064] [drm] PCIE GART of 1024M enabled.
[ 125.511068] [drm] PTB located at 0x000000F400900000
[ 125.511083] amdgpu 0000:03:00.0: amdgpu: SMU is resuming...
[ 125.512588] amdgpu 0000:03:00.0: amdgpu: dpm has been disabled
[ 125.513573] amdgpu 0000:03:00.0: amdgpu: SMU is resumed successfully!
[ 125.522694] nvme nvme0: Shutdown timeout set to 10 seconds
[ 125.526041] nvme nvme0: 16/0/0 default/read/poll queues
[ 125.691534] amdgpu 0000:03:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring sdma0 test failed (-110)
[ 125.691698] [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <sdma_v4_0> failed -110
[ 125.691806] amdgpu 0000:03:00.0: amdgpu: amdgpu_device_ip_resume failed (-110).
[ 125.691808] PM: dpm_run_callback(): pci_pm_resume+0x0/0x90 returns -110
[ 125.691820] amdgpu 0000:03:00.0: PM: failed to resume async: error -110
[ 125.693966] OOM killer enabled.
[ 125.693967] Restarting tasks ... done.
[ 125.702960] PM: suspend exit
[ 201.335694] sysrq: This sysrq operation is disabled.
[ 201.543676] sysrq: This sysrq operation is disabled.
[ 201.719682] sysrq: This sysrq operation is disabled.
[ 202.231681] sysrq: Emergency Sync
[ 202.240025] Emergency Sync complete
[ 203.031695] sysrq: Emergency Remount R/O
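
The failure signature in the log above (amdgpu failing to resume with error -110, which is -ETIMEDOUT on Linux) can be picked out of a saved dmesg capture mechanically. The following is a rough sketch for triage only; the function name find_resume_failures and the regular expression are illustrative assumptions based on the single "failed to resume async" line shown here, not an established tool.

```python
import re
from typing import List, Tuple

# Matches lines such as:
# [ 125.691820] amdgpu 0000:03:00.0: PM: failed to resume async: error -110
RESUME_FAILURE = re.compile(
    r"\]\s+(?P<driver>\S+) (?P<device>\S+): PM: failed to resume async: "
    r"error (?P<code>-?\d+)"
)


def find_resume_failures(log_text: str) -> List[Tuple[str, str, int]]:
    """Return (driver, device, error code) for each failed async resume line."""
    failures = []
    for line in log_text.splitlines():
        match = RESUME_FAILURE.search(line)
        if match:
            failures.append(
                (match["driver"], match["device"], int(match["code"]))
            )
    return failures
```

Running it over the output of dmesg before and after installing a -proposed kernel gives a quick check of whether the resume failure still occurs.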

tags: added: apport-collected focal third-party-packages
description: updated
You-Sheng Yang (vicamo) attached the following apport information:

 AlsaInfo.txt
 CRDA.txt
 CurrentDmesg.txt
 Lspci.txt
 Lspci-vt.txt
 Lsusb-t.txt
 Lsusb-v.txt
 ProcCpuinfo.txt
 ProcCpuinfoMinimal.txt
 ProcEnviron.txt
 ProcInterrupts.txt
 ProcModules.txt
 UdevDb.txt
 WifiSyslog.txt
 acpidump.txt

You-Sheng Yang (vicamo) wrote :
description: updated
Timo Aaltonen (tjaalton)
Changed in linux-oem-5.14 (Ubuntu Focal):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Impish):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Jammy):
status: In Progress → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-oem-5.14/5.14.0-1012.12 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-focal
You-Sheng Yang (vicamo) wrote :

verified linux-oem-5.14/focal-proposed version 5.14.0-1012.12.

tags: added: verification-done-focal
removed: verification-needed-focal
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-oem-5.14 - 5.14.0-1013.13

---------------
linux-oem-5.14 (5.14.0-1013.13) focal; urgency=medium

  * focal/linux-oem-5.14: 5.14.0-1013.13 -proposed tracker (LP: #1955464)

  * devices on thunderbolt dock are not recognized on adl-p platform
    (LP: #1955016)
    - SAUCE: thunderbolt: Runtime PM activate both ends of the device link
    - SAUCE: thunderbolt: Tear down existing tunnels when resuming from hibernate
    - SAUCE: thunderbolt: Runtime resume USB4 port when retimers are scanned
    - SAUCE: thunderbolt: Do not allow subtracting more NFC credits than
      configured
    - SAUCE: thunderbolt: Do not program path HopIDs for USB4 routers
    - SAUCE: thunderbolt: Add debug logging of DisplayPort resource allocation

 -- Chia-Lin Kao (AceLan) <email address hidden> Tue, 21 Dec 2021 16:59:25 +0800

Changed in linux-oem-5.14 (Ubuntu Focal):
status: Fix Committed → Fix Released
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux/5.13.0-24.24 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-impish' to 'verification-done-impish'. If the problem still exists, change the tag 'verification-needed-impish' to 'verification-failed-impish'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-impish
You-Sheng Yang (vicamo) wrote :

verified linux/impish version 5.13.0-26.27.

tags: added: verification-done-impish
removed: verification-needed-impish
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.15.0-17.17

---------------
linux (5.15.0-17.17) jammy; urgency=medium

  * jammy/linux: 5.15.0-17.17 -proposed tracker (LP: #1957809)

 -- Andrea Righi <email address hidden> Thu, 13 Jan 2022 17:11:21 +0100

Changed in linux (Ubuntu Jammy):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.13.0-28.31

---------------
linux (5.13.0-28.31) impish; urgency=medium

  * amd_sfh: Null pointer dereference on early device init causes early panic
    and fails to boot (LP: #1956519)
    - HID: amd_sfh: Fix potential NULL pointer dereference

  * impish: ddebs build take too long and times out (LP: #1957810)
    - [Packaging] enforce xz compression for ddebs

  * audio mute/ mic mute are not working on a HP machine (LP: #1955691)
    - ALSA: hda/realtek: fix mute/micmute LEDs for a HP ProBook

  * rtw88_8821ce causes freeze (LP: #1927808)
    - rtw88: Disable PCIe ASPM while doing NAPI poll on 8821CE

  * alsa/sdw: fix the audio sdw codec parsing logic in the acpi table
    (LP: #1955686)
    - ALSA: hda: intel-sdw-acpi: harden detection of controller
    - ALSA: hda: intel-sdw-acpi: go through HDAS ACPI at max depth of 2

  * icmp_redirect from selftests fails on F/kvm (unary operator expected)
    (LP: #1938964)
    - selftests: icmp_redirect: pass xfail=0 to log_test()

  * Impish update: upstream stable patchset 2021-12-17 (LP: #1955180)
    - arm64: zynqmp: Do not duplicate flash partition label property
    - arm64: zynqmp: Fix serial compatible string
    - ARM: dts: sunxi: Fix OPPs node name
    - arm64: dts: allwinner: h5: Fix GPU thermal zone node name
    - arm64: dts: allwinner: a100: Fix thermal zone node name
    - staging: wfx: ensure IRQ is ready before enabling it
    - ARM: dts: NSP: Fix mpcore, mmc node names
    - scsi: lpfc: Fix list_add() corruption in lpfc_drain_txq()
    - arm64: dts: rockchip: Disable CDN DP on Pinebook Pro
    - arm64: dts: hisilicon: fix arm,sp805 compatible string
    - RDMA/bnxt_re: Check if the vlan is valid before reporting
    - bus: ti-sysc: Add quirk handling for reinit on context lost
    - bus: ti-sysc: Use context lost quirk for otg
    - usb: musb: tusb6010: check return value after calling
      platform_get_resource()
    - usb: typec: tipd: Remove WARN_ON in tps6598x_block_read
    - ARM: dts: ux500: Skomer regulator fixes
    - staging: rtl8723bs: remove possible deadlock when disconnect (v2)
    - ARM: BCM53016: Specify switch ports for Meraki MR32
    - arm64: dts: qcom: msm8998: Fix CPU/L2 idle state latency and residency
    - arm64: dts: qcom: ipq6018: Fix qcom,controlled-remotely property
    - arm64: dts: freescale: fix arm,sp805 compatible string
    - ASoC: SOF: Intel: hda-dai: fix potential locking issue
    - clk: imx: imx6ul: Move csi_sel mux to correct base register
    - ASoC: nau8824: Add DMI quirk mechanism for active-high jack-detect
    - scsi: advansys: Fix kernel pointer leak
    - ALSA: intel-dsp-config: add quirk for APL/GLK/TGL devices based on ES8336
      codec
    - firmware_loader: fix pre-allocated buf built-in firmware use
    - ARM: dts: omap: fix gpmc,mux-add-data type
    - usb: host: ohci-tmio: check return value after calling
      platform_get_resource()
    - ARM: dts: ls1021a: move thermal-zones node out of soc/
    - ARM: dts: ls1021a-tsn: use generic "jedec,spi-nor" compatible for flash
    - ALSA: ISA: not for M68K
    - tty: tty_buffer: Fix the softlockup issue in flush_to_ldisc
    - MIPS: sni:...

Changed in linux (Ubuntu Impish):
status: Fix Committed → Fix Released