Suspend (to ram) fails after upgrade from 5.13.0-35-generic to 5.13.0-37-generic

Bug #1966125 reported by Günter Neiß
This bug affects 3 people
Affects                         Status     Importance  Assigned to  Milestone
linux-signed-hwe-5.13 (Ubuntu)  Confirmed  Undecided   Unassigned

Bug Description

As stated in the summary, my PC can't suspend anymore after the kernel upgrade.
When I click suspend from the desktop:
- The screen goes blank
- (sometimes) The screen then comes back for a short time, then goes blank again
- Now the PC is "dead": no reaction to keyboard, mouse and so on.
- Even Ctrl-Alt-Del doesn't produce any reaction.
- There is still some activity (HD LED flashing)
- The PC is "reachable" via network, but trying to ssh into it does not work (no response after entering the password)
- I have to reboot, either by powering the PC off with a long press of the power button or by using the "Magic SysRq" keyboard sequence

When I roll back (I use systemback) to kernel 5.13.0-35-generic, suspend works!

#################################################################
Information needed as directed by https://wiki.ubuntu.com/DebuggingKernelSuspend
#################################################################
=============================
$ cat /proc/acpi/wakeup
Device  S-state  Status      Sysfs node
GPP0    S4       *enabled    pci:0000:00:01.1
GP12    S4       *enabled    pci:0000:00:07.1
GP13    S4       *enabled    pci:0000:00:08.1
XHC0    S4       *enabled    pci:0000:0e:00.3
GP30    S4       *disabled
GP31    S4       *disabled
PS2K    S3       *disabled
GPP2    S4       *disabled
GPP3    S4       *disabled
GPP8    S4       *enabled    pci:0000:00:03.1
SWUS    S4       *enabled    pci:0000:0a:00.0
SWDS    S4       *enabled    pci:0000:0b:00.0
GPP1    S4       *enabled    pci:0000:00:01.2
=============================
sudo sh -c "sync && echo 1 > /sys/power/pm_trace && pm-suspend"

... after power up again ...

dmesg | grep -e "Magic number" -e "hash matches"
[ 3.141067] PM: Magic number: 0:247:389
[ 3.145394] PM: hash matches drivers/base/power/main.c:982
[ 3.149741] thermal cooling_device4: hash matches
=============================
sudo su
echo freezer > /sys/power/pm_test
exit
sudo sh -c "sync && echo 1 > /sys/power/pm_trace && pm-suspend"

... prompt is back again after a few seconds ...

dmesg | grep -e "Magic number" -e "hash matches"
[ 3.142991] PM: Magic number: 14:995:344
[ 3.147383] tty tty61: hash matches
[ 3.151721] acpi ACPI0007:06: hash matches
=============================
sudo su
echo devices > /sys/power/pm_test
exit
sudo sh -c "sync && echo 1 > /sys/power/pm_trace && pm-suspend"

... after power up again ...

dmesg | grep -e "Magic number" -e "hash matches"
[ 3.141229] PM: Magic number: 0:247:394
[ 3.145558] PM: hash matches drivers/base/power/main.c:982
[ 3.149905] thermal cooling_device9: hash matches
=============================
sudo su
echo platform > /sys/power/pm_test
exit
sudo sh -c "sync && echo 1 > /sys/power/pm_trace && pm-suspend"

... after power up again ...

dmesg | grep -e "Magic number" -e "hash matches"
[ 3.142816] PM: Magic number: 0:247:394
[ 3.147180] PM: hash matches drivers/base/power/main.c:982
[ 3.151573] thermal cooling_device9: hash matches
=============================
sudo su
echo processors > /sys/power/pm_test
exit
sudo sh -c "sync && echo 1 > /sys/power/pm_trace && pm-suspend"

... after power up again ...

dmesg | grep -e "Magic number" -e "hash matches"
[ 3.141830] PM: Magic number: 0:247:394
[ 3.146148] PM: hash matches drivers/base/power/main.c:982
[ 3.150488] thermal cooling_device9: hash matches
=============================
sudo su
echo core > /sys/power/pm_test
exit
sudo sh -c "sync && echo 1 > /sys/power/pm_trace && pm-suspend"

... after power up again ...

dmesg | grep -e "Magic number" -e "hash matches"
[ 3.140971] PM: Magic number: 0:247:394
[ 3.145294] PM: hash matches drivers/base/power/main.c:982
[ 3.149639] thermal cooling_device9: hash matches
=============================
sudo su
echo none > /sys/power/pm_test
exit
sudo sh -c "sync && echo 1 > /sys/power/pm_trace && pm-suspend"

... after power up again ...

dmesg | grep -e "Magic number" -e "hash matches"
[ 3.143489] PM: Magic number: 0:247:394
[ 3.147856] PM: hash matches drivers/base/power/main.c:982
[ 3.152244] thermal cooling_device9: hash matches
=============================
$ sudo cat /sys/kernel/debug/suspend_stats
success: 0
fail: 0
failed_freeze: 0
failed_prepare: 0
failed_suspend: 0
failed_suspend_late: 0
failed_suspend_noirq: 0
failed_resume: 0
failed_resume_early: 0
failed_resume_noirq: 0
failures:
  last_failed_dev:

  last_failed_errno: 0
   0
  last_failed_step:
=============================

#################################################################
System information:
#################################################################

=============================
lsb_release -rd
Description: Ubuntu 20.04.4 LTS
Release: 20.04
=============================

ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: linux-image-5.13.0-37-generic 5.13.0-37.42~20.04.1
ProcVersionSignature: Ubuntu 5.13.0-37.42~20.04.1-generic 5.13.19
Uname: Linux 5.13.0-37-generic x86_64
ApportVersion: 2.20.11-0ubuntu27.21
Architecture: amd64
CasperMD5CheckResult: skip
CurrentDesktop: ubuntu:GNOME
Date: Wed Mar 23 21:15:22 2022
InstallationDate: Installed on 2020-12-30 (448 days ago)
InstallationMedia:

SourcePackage: linux-signed-hwe-5.13
UpgradeStatus: No upgrade log present (probably fresh install)

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux-signed-hwe-5.13 (Ubuntu):
status: New → Confirmed
Darren Spiteri (dspiteri) wrote :

Same issue for me on an AMD system after booting into 5.13.0-37-generic.

Günter Neiß (gneiss) wrote :

I have already found the reason for this behavior!
The only change in the "drivers/base/power/" subdirectory is in main.c, line 1912.

There is a "while" loop that iterates over a list (dpm_list).

The (principal) task of this loop is to call "device_prepare" on every list element.
Every element that is processed (without error) is then removed from the list and placed into a "prepared" list (I leave out the special case of EAGAIN here, because it's not necessary for understanding the bug).

In the previous version, this loop terminated when the list was empty OR when there was an error.
In the newer revision, the second condition was removed.

Removing the second condition results in an endless loop whenever an error occurs, because the failed element is never removed from the list, so the list never becomes empty!

Right now I am not sure what the correct way to handle an error (from device_prepare) would be...
I guess the correct way would be to "skip/mark" this element, continue processing the rest of the elements, and do special handling of the marked elements later on.
But because the previous revision simply ignored the failed element AND all remaining ones (I believe the second part is definitely wrong), restoring that behavior should work here too.

I am currently trying to build a kernel with this line reverted to the previous version to check whether this works.

I am able to "make" the kernel (I have already compiled it), but I am unsure which ".config" I should use (to stay as close as possible to the original).
So if someone here can point me in the right direction, I will do the necessary tests.
