systemctl suspend stopped working: laptop immediately wakes up again

Bug #1953235 reported by Ufos92
This bug affects 2 people
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| linux-signed-hwe-5.11 (Ubuntu) | Confirmed | Undecided | Unassigned | |

Bug Description

I always suspend my laptop using `systemctl suspend`. It worked for a year as expected until today.

Buggy behavior: after `systemctl suspend`, what looks like a normal suspend happens, but the laptop wakes up immediately. It gets weirder from here: I need to enter my password twice to get back into GNOME.

Troubleshooting I did:
1. `apt update && apt dist-upgrade`
2. Disabled nvidia-suspend.service, nvidia-persistenced.service, nvidia-resume.service
3. Booted with earlier kernel versions where it used to work: `5.11.0-40`, `5.11.0-38`
4. Detached all devices and the power cable
5. Rebooted and logged in as a separate user that has no tweaks, gnome-extensions etc.
6. The issue persists. I attached the relevant parts of `journalctl`, from the time I ran `systemctl suspend` until a minute or so after I logged back into GNOME.

Additionally:

```
user@myLaptop:~$ lspci
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Root Complex
00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
00:01.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 PCIe GPP Bridge [6:0]
00:01.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 PCIe GPP Bridge [6:0]
00:01.6 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 PCIe GPP Bridge [6:0]
00:01.7 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 PCIe GPP Bridge [6:0]
00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Internal PCIe GPP Bridge 0 to Bus A
00:08.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Internal PCIe GPP Bridge 0 to Bus B
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 61)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51)
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 0
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 1
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 2
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 3
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 4
00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 5
00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 6
00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 7
01:00.0 3D controller: NVIDIA Corporation TU117M [GeForce GTX 1650 Mobile / Max-Q] (rev a1)
02:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 25)
04:00.0 Network controller: Qualcomm Atheros QCA6174 802.11ac Wireless Network Adapter (rev 32)
05:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Picasso (rev c2)
05:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Raven/Raven2/Fenghuang HDMI/DP Audio Controller
05:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) Platform Security Processor
05:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Raven USB 3.1
05:00.4 USB controller: Advanced Micro Devices, Inc. [AMD] Raven USB 3.1
05:00.6 Audio device: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) HD Audio Controller
06:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 61)
```

```
user@myLaptop:~$ cat /proc/acpi/wakeup
Device S-state Status Sysfs node
GPP0 S4 *enabled pci:0000:00:01.1
GPP1 S4 *enabled pci:0000:00:01.2
GPP2 S4 *disabled
GPP3 S4 *disabled
GPP4 S4 *disabled
GPP5 S4 *enabled pci:0000:00:01.6
GP17 S4 *enabled pci:0000:00:08.1
XHC0 S4 *enabled pci:0000:05:00.3
XHC1 S4 *enabled pci:0000:05:00.4
GP18 S4 *enabled pci:0000:00:08.2
```
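
For anyone reading along: a quick way to see which entries in that table are actually armed as wake sources is to filter on the status column. This is only a sketch and not something the report itself ran; the function name `list_enabled_wakeups` is made up for illustration, and it assumes the whitespace-separated layout shown above.

```shell
# Read a /proc/acpi/wakeup-style table on stdin and print "NAME SYSFS_NODE"
# for every row whose status column is "*enabled".
# (Hypothetical helper; the name is not from any package.)
list_enabled_wakeups() {
    # NR > 1 skips the header row; $3 is the Status column.
    awk 'NR > 1 && $3 == "*enabled" { print $1, $4 }'
}
```

On a live system it would be fed with `list_enabled_wakeups < /proc/acpi/wakeup`, which helps rule these devices in or out as the cause of an immediate wake-up.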

----

Not sure if related, but the bug appeared some time after I installed some packages for CUDA support following these instructions: https://www.tensorflow.org/install/gpu#ubuntu_1804_cuda_110

ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: linux-image-5.11.0-41-generic 5.11.0-41.45~20.04.1
ProcVersionSignature: Ubuntu 5.11.0-41.45~20.04.1-generic 5.11.22
Uname: Linux 5.11.0-41-generic x86_64
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair nvidia_modeset nvidia
ApportVersion: 2.20.11-0ubuntu27.21
Architecture: amd64
CasperMD5CheckResult: skip
CurrentDesktop: ubuntu:GNOME
Date: Sat Dec 4 20:03:08 2021
InstallationDate: Installed on 2020-10-16 (414 days ago)
InstallationMedia: Ubuntu 20.04.1 LTS "Focal Fossa" - Release amd64 (20200731)
SourcePackage: linux-signed-hwe-5.11
UpgradeStatus: No upgrade log present (probably fresh install)

Ufos92 (ufos92) wrote :

Reading the debugging instructions: https://wiki.ubuntu.com/DebuggingKernelHibernate

1. `sudo pm-suspend` works from the no-DE tty3 (ctrl+alt+f3), but only if I never log in to GNOME.
2. `sudo pm-suspend` from GNOME: the screen goes dark, then I'm back in GNOME with all my windows open, but the system is soft-locked. I need to ctrl+alt+f3 and `sudo reboot`.
3. `(sudo) systemctl suspend` from the no-DE tty3 doesn't have any effect. Calling it again tells me a suspend operation is already in progress...

Ufos92 (ufos92) wrote :

Is it already apparent what causes my issue, or should I go through the full cycle as described here: https://wiki.ubuntu.com/DebuggingKernelHibernate#Per_sub-system_hibernate_testing

affects: ubuntu → linux-signed-hwe-5.11 (Ubuntu)
Ufos92 (ufos92) wrote :

Decided to get rid of nvidia just in case. Didn't help.

```shell
sudo apt remove --purge '.*nvidia.*'
sudo apt reinstall --install-recommends linux-generic-hwe-20.04
```

---

> `linux-generic-hwe-20.04` this thing being rolling is really not helpful. Can't even downgrade.

Ufos92 (ufos92) wrote :

Downgraded using the bazooka method.

```shell
sudo apt install linux-generic
sudo apt purge '.*-hwe-.*'
dpkg -l | grep -i 5.11.0- | awk '{print $2}' | xargs -n1 sudo apt purge -y
sudo apt autoremove
```

Didn't help.
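
A note on the purge pipeline above: `grep -i 5.11.0-` treats each `.` as "any character" and matches anywhere on the line, so it can catch more than intended. A more cautious variation is to preview the package names first. The sketch below is an assumption-laden alternative, not what was run in this report; `packages_matching_version` is a made-up name, and it deliberately looks only at fully installed (`ii`) rows and at the package-name column.

```shell
# Read `dpkg -l` output on stdin and print the names of installed packages
# ("ii" rows) whose name contains the given version string literally.
# (Hypothetical helper for previewing a purge; narrower than the grep above.)
packages_matching_version() {
    # index() does a plain substring search, so "." is not a wildcard here.
    awk -v ver="$1" '$1 == "ii" && index($2, ver) > 0 { print $2 }'
}
```

Usage would look like `dpkg -l | packages_matching_version '5.11.0-'`; once the list looks right, it can be piped into the same `xargs` purge step.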

Ufos92 (ufos92) wrote :

Sorta solved it.

So, after all the above shenanigans `sudo pm-suspend` finally started working from GNOME.

So I went after `systemctl suspend`.

`journalctl -b | grep suspend`

was throwing a bunch of
```
 Could not acquire inhibitor lock:
GDBus.Error:org.freedesktop.DBus.Error.NoReply: Message recipient disconnected
```

which is very helpful to debug what the issue is /s

Luckily, someone here already managed to debug it: https://<email address hidden>/msg5950676.html

`/etc/systemd/system/systemd-suspend.service.requires/` had nvidia left-overs. After deleting them with
`sudo rm -rf /etc/systemd/system/systemd-suspend.service.requires/`, my `systemctl suspend` works again.
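
Before removing that whole directory, it may be worth looking at what is in it and where each entry points. The helper below is a minimal sketch under that assumption; `list_required_units` is a made-up name, and the default path is simply the one from this report.

```shell
# Print each entry of a systemd ".requires" directory together with the
# unit file it links to, so leftovers can be reviewed before deletion.
# (Hypothetical helper; defaults to the directory named in this report.)
list_required_units() {
    dir=${1:-/etc/systemd/system/systemd-suspend.service.requires}
    [ -d "$dir" ] || return 0
    for link in "$dir"/*; do
        # Skip the unexpanded glob when the directory is empty,
        # but still report dangling symlinks.
        [ -e "$link" ] || [ -L "$link" ] || continue
        printf '%s -> %s\n' "$(basename "$link")" "$(readlink "$link")"
    done
}
```

Running `list_required_units` with no argument lists the suspend hooks. If only nvidia entries show up and the driver is gone, deleting just those links is a gentler option than `rm -rf` on the directory.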

Always remember, kids: https://www.youtube.com/watch?v=_36yNWw_07g

Ufos92 (ufos92) wrote :

Reverted to the mainline kernel and hwe:

`sudo apt install linux-generic-hwe-20.04 --install-recommends`

suspend still works

Ufos92 (ufos92) wrote :

Installed nvidia drivers

`sudo apt install nvidia-driver-470 --install-recommends`

suspend broke. Did `sudo prime-select on-demand`, but no luck.

```
user@myLaptop:~$ journalctl -b | grep suspend
Dec 06 23:01:18 myLaptop ModemManager[1620]: <info> [sleep-monitor] system is about to suspend
Dec 06 23:01:24 myLaptop kernel: PM: suspend entry (deep)
Dec 06 23:01:25 myLaptop kernel: printk: Suspending console(s) (use no_console_suspend to debug)
Dec 06 23:01:25 myLaptop kernel: NVRM: GPU 0000:01:00.0: PreserveVideoMemoryAllocations module parameter is set. System Power Management attempted without driver procfs suspend interface. Please refer to the 'Configuring Power Management Support' section in the driver README.
Dec 06 23:01:25 myLaptop kernel: PM: pci_pm_suspend(): nv_pmops_suspend+0x0/0x30 [nvidia] returns -5
Dec 06 23:01:25 myLaptop kernel: PM: dpm_run_callback(): pci_pm_suspend+0x0/0x170 returns -5
Dec 06 23:01:25 myLaptop kernel: PM: Device 0000:01:00.0 failed to suspend async: error -5
Dec 06 23:01:25 myLaptop kernel: PM: Some devices failed to suspend, or early wake event detected
Dec 06 23:01:26 myLaptop kernel: PM: suspend exit
Dec 06 23:01:26 myLaptop kernel: PM: suspend entry (s2idle)
Dec 06 23:01:28 myLaptop kernel: printk: Suspending console(s) (use no_console_suspend to debug)
Dec 06 23:01:28 myLaptop kernel: NVRM: GPU 0000:01:00.0: PreserveVideoMemoryAllocations module parameter is set. System Power Management attempted without driver procfs suspend interface. Please refer to the 'Configuring Power Management Support' section in the driver README.
Dec 06 23:01:28 myLaptop kernel: PM: pci_pm_suspend(): nv_pmops_suspend+0x0/0x30 [nvidia] returns -5
Dec 06 23:01:28 myLaptop kernel: PM: dpm_run_callback(): pci_pm_suspend+0x0/0x170 returns -5
Dec 06 23:01:28 myLaptop kernel: PM: Device 0000:01:00.0 failed to suspend async: error -5
Dec 06 23:01:28 myLaptop kernel: PM: Some devices failed to suspend, or early wake event detected
Dec 06 23:01:28 myLaptop systemd-sleep[3748]: Failed to suspend system. System resumed again: Input/output error
Dec 06 23:01:28 myLaptop kernel: PM: suspend exit
Dec 06 23:01:29 myLaptop systemd[1]: systemd-suspend.service: Main process exited, code=exited, status=1/FAILURE
Dec 06 23:01:29 myLaptop systemd[1]: systemd-suspend.service: Failed with result 'exit-code'.
Dec 06 23:01:29 myLaptop systemd[1]: suspend.target: Job suspend.target/start failed with result 'dependency'.
```
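
The interesting lines in that log are the `failed to suspend` ones, since they name the device that aborted the suspend. As a rough sketch (the function name `failed_suspend_devices` is invented here, and the pattern assumes the kernel message format shown above), they can be reduced to a list of PCI addresses:

```shell
# Read kernel log lines on stdin and print the PCI address of every device
# whose suspend callback failed, one per line, without duplicates.
# (Hypothetical helper; the pattern is based on the log lines in this report.)
failed_suspend_devices() {
    sed -n 's/.*PM: Device \([0-9a-fA-F:.]*\) failed to suspend.*/\1/p' | sort -u
}
```

Something like `journalctl -b -k | failed_suspend_devices` should then print `0000:01:00.0` for the log above, which `lspci` lists as the NVIDIA GTX 1650.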

Ufos92 (ufos92) wrote :

For whatever reason (an old config, or an unreasonable NVIDIA default), the new driver was installed with `PreserveVideoMemoryAllocations=1`. That setting doesn't work without the additional NVIDIA suspend/resume services, so suspend breaks.

To fix:
```shell
echo 'options nvidia NVreg_PreserveVideoMemoryAllocations=0' | sudo tee /etc/modprobe.d/nvidia.conf
sudo update-initramfs -u # this took a long while to figure out
```

Make sure no other `.conf` file under `/etc/modprobe.d/` sets conflicting options.
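
One way to check for such conflicts is to search the modprobe configuration directories for the option name. This is a sketch, not part of the original fix; `find_module_option` is a made-up name, and which directories to scan is an assumption (driver packages may also ship files outside `/etc/modprobe.d`, for example in `/lib/modprobe.d`).

```shell
# Print every line (with file name and line number) that mentions the given
# module option in any *.conf file under the given directories.
# (Hypothetical helper; relies on grep's -H and -n flags.)
find_module_option() {
    option=$1
    shift
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        # Errors for directories without *.conf files are silenced.
        grep -Hn -- "$option" "$dir"/*.conf 2>/dev/null
    done
    return 0
}
```

For example, `find_module_option NVreg_PreserveVideoMemoryAllocations /etc/modprobe.d /lib/modprobe.d` shows every file that sets the option. More than one hit with different values means there is a conflict to clean up before running `update-initramfs -u` again.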

Suspend now works, case closed.

https://www.youtube.com/watch?v=_36yNWw_07g

joe (ml-ubuntu) wrote :

You saved my day.

I also installed the CUDA packages from the nvidia driver repository (I'm on Ubuntu 20.04 and needed CUDA 11).
`/etc/systemd/system/systemd-suspend.service.requires` contained symbolic links pointing to `/lib/systemd/system/nvidia-resume.service`.

Removing these symbolic links made suspend work again.

Another symptom: `systemctl suspend` was not working, but `pm-suspend` was.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux-signed-hwe-5.11 (Ubuntu):
status: New → Confirmed