[amdgpu] suspend to ram (standby) crashes amdgpu (Ryzen-7 Vega - HP Envy x360)

Bug #1881494 reported by kolAflash
This bug affects 6 people
Affects: linux (Ubuntu)
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned

Bug Description

suspend to ram crashes amdgpu.
System is still accessible via SSH after resume from standby.
Happens when X is running (gdm/gnome with 3d acceleration) and also when X is stopped.

Also, the computer never really goes into standby. The CPU fan keeps running and the power button stays fully lit. Nevertheless, I need to press CTRL to "wake" it before I can access it via SSH.

I'll attach the dmesg output of several attempts.
This is the hardware:
https://store.hp.com/GermanyStore/Merch/Product.aspx?id=9YN58EA&opt=ABD&sel=NTB
Tested EFI and BIOS bootmode.

Please let me know if I can help tracking this down.
I'm a software developer (sadly not a kernel developer), so you can ask dirty technical questions ;-)

Workaround:
Enable hibernation (suspend to disk) and use it instead of suspend to ram (standby).

What didn't help:
Kernel parameters: no_console_suspend nomodeset amdgpu.gpu_recovery=1 init_on_free=0 idle=nowait amd_iommu=flush
echo mem > /sys/power/state
systemctl start suspend.target
echo 0 > /sys/power/pm_async
Upgrading kernel from 5.4 to: 5.6.0-1010-oem

Using the kernel parameters
  no_console_suspend nomodeset amdgpu.gpu_recovery=1 init_on_free=0
  idle=nowait amd_iommu=flush
and doing
  echo mem > /sys/power/state
before suspend made the system actually not crash.
Nevertheless, the CPU fan stays on and the power button stays fully lit, so the system doesn't really go to standby either.

I'll try to find the minimal combination of kernel parameters.
nomodeset seems to be essential, but not sufficient.
Without nomodeset the system wakes again for a few seconds, but freezes a moment later (ssh still accessible).
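
When testing combinations of kernel parameters like the ones above, it can help to confirm which of them actually reached the running kernel. The following is only a minimal sketch: `has_param` is a hypothetical helper name, and it assumes the parameters appear verbatim as whitespace-separated words in /proc/cmdline.

```shell
#!/bin/sh
# has_param (hypothetical helper): succeed if a kernel command line
# contains the given parameter as a whole word.
# $1 = parameter, $2 = file to read (defaults to /proc/cmdline).
has_param() {
    param=$1
    file=${2:-/proc/cmdline}
    [ -r "$file" ] || return 1
    # One word per line, then match the whole line literally.
    tr ' ' '\n' < "$file" | grep -qxF -e "$param"
}

# Parameters taken from the bug description.
for p in no_console_suspend nomodeset amdgpu.gpu_recovery=1 \
         init_on_free=0 idle=nowait amd_iommu=flush; do
    if has_param "$p"; then
        echo "active:  $p"
    else
        echo "missing: $p"
    fi
done
```

Running this after each reboot shows at a glance which subset of parameters a given suspend attempt was made with.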

ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: xorg 1:7.7+19ubuntu14
ProcVersionSignature: Ubuntu 5.6.0-1010.10-oem 5.6.8
Uname: Linux 5.6.0-1010-oem x86_64
ApportVersion: 2.20.11-0ubuntu27.2
Architecture: amd64
BootLog: Error: [Errno 13] Permission denied: '/var/log/boot.log'
CasperMD5CheckResult: skip
CompositorRunning: None
CurrentDesktop: ubuntu:GNOME
Date: Sun May 31 18:43:54 2020
DistUpgraded: Fresh install
DistroCodename: focal
DistroVariant: ubuntu
ExtraDebuggingInterest: Yes, including running git bisection searches
GpuHangFrequency: Continuously
GpuHangReproducibility: Yes, I can easily reproduce it
GpuHangStarted: Immediately after installing this version of Ubuntu
GraphicsCard:
 Advanced Micro Devices, Inc. [AMD/ATI] Picasso [1002:15d8] (rev c1) (prog-if 00 [VGA controller])
   Subsystem: Hewlett-Packard Company Picasso [103c:85de]
InstallationDate: Installed on 2020-05-29 (1 days ago)
InstallationMedia: Ubuntu 20.04 LTS "Focal Fossa" - Release amd64 (20200423)
MachineType: HP HP ENVY x360 Convertible 13-ar0xxx
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=de_DE.UTF-8
 SHELL=/bin/bash
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-5.6.0-1010-oem root=/dev/mapper/vgubuntu-root ro quiet splash vt.handoff=7
SourcePackage: xorg
Symptom: display
Title: Xorg freeze
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 12/26/2019
dmi.bios.vendor: AMI
dmi.bios.version: F.19
dmi.board.asset.tag: Base Board Asset Tag
dmi.board.name: 85DE
dmi.board.vendor: HP
dmi.board.version: 41.36
dmi.chassis.type: 31
dmi.chassis.vendor: HP
dmi.chassis.version: Chassis Version
dmi.modalias: dmi:bvnAMI:bvrF.19:bd12/26/2019:svnHP:pnHPENVYx360Convertible13-ar0xxx:pvr:rvnHP:rn85DE:rvr41.36:cvnHP:ct31:cvrChassisVersion:
dmi.product.family: 103C_5335KV HP Envy
dmi.product.name: HP ENVY x360 Convertible 13-ar0xxx
dmi.product.sku: 9YN58EA#ABD
dmi.sys.vendor: HP
version.compiz: compiz N/A
version.libdrm2: libdrm2 2.4.101-2
version.libgl1-mesa-dri: libgl1-mesa-dri 20.0.4-2ubuntu1
version.libgl1-mesa-glx: libgl1-mesa-glx N/A
version.xserver-xorg-core: xserver-xorg-core 2:1.20.8-2ubuntu2
version.xserver-xorg-input-evdev: xserver-xorg-input-evdev N/A
version.xserver-xorg-video-ati: xserver-xorg-video-ati 1:19.1.0-1
version.xserver-xorg-video-intel: xserver-xorg-video-intel 2:2.99.917+git20200226-1
version.xserver-xorg-video-nouveau: xserver-xorg-video-nouveau 1:1.0.16-1

kolAflash (colaflash) wrote :
summary: - suspend to ram (standby) crashes amdgpu
+ suspend to ram (standby) crashes amdgpu (Ryzen-7 Vega - HP Envy x360)
kolAflash (colaflash)
description: updated
Daniel van Vugt (vanvugt) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. It sounds like some part of the system has crashed. To help us find the cause of the crash please follow these steps:

1. Look in /var/crash for crash files and if found run:
    ubuntu-bug YOURFILE.crash
Then tell us the ID of the newly-created bug.

2. If step 1 failed then look at https://errors.ubuntu.com/user/ID where ID is the content of file /var/lib/whoopsie/whoopsie-id on the machine. Do you find any links to recent problems on that page? If so then please send the links to us.

3. If step 2 also failed then apply the workaround from bug 994921, reboot, reproduce the crash, and retry step 1.

Please take care to avoid attaching .crash files to bugs as we are unable to process them as file attachments. It would also be a security risk for yourself.
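
Steps 1 and 2 above can be sketched as a small script. This is an illustration only: `errors_url` is a hypothetical helper name, and reading /var/lib/whoopsie/whoopsie-id may require root on some installations.

```shell
#!/bin/sh
# errors_url (hypothetical helper): build the errors.ubuntu.com URL from
# step 2 out of the whoopsie ID file.
# $1 = ID file (defaults to /var/lib/whoopsie/whoopsie-id).
errors_url() {
    id_file=${1:-/var/lib/whoopsie/whoopsie-id}
    [ -r "$id_file" ] || return 1
    # The file holds a single ID; strip the trailing newline and any spaces.
    printf 'https://errors.ubuntu.com/user/%s\n' "$(tr -d '[:space:]' < "$id_file")"
}

# Step 1: list crash files, if any exist.
ls /var/crash/*.crash 2>/dev/null || echo "no crash files found in /var/crash"

# Step 2: print the per-machine error page.
errors_url || echo "whoopsie ID not readable (try again with sudo)"
```

Each crash file listed by step 1 can then be passed to ubuntu-bug as described above.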

summary: - suspend to ram (standby) crashes amdgpu (Ryzen-7 Vega - HP Envy x360)
+ [amdgpu] suspend to ram (standby) crashes amdgpu (Ryzen-7 Vega - HP Envy
+ x360)
affects: xorg (Ubuntu) → xorg-server (Ubuntu)
Changed in xorg-server (Ubuntu):
status: New → Incomplete
kolAflash (colaflash) wrote :

I attached the dmesg output. (I somehow forgot to upload that file with my previous post.)

And I found this link via my machine id on errors.ubuntu.com
https://errors.ubuntu.com/oops/8b1c6280-a200-11ea-a550-fa163e6cac46

Also ran: ubuntu-bug /var/crash/_usr_lib_xorg_Xorg.1000.crash
But it returned no bug id!?
Instead another link appeared on errors.ubuntu.com
https://errors.ubuntu.com/oops/226a75b6-a574-11ea-9c09-fa163e102db1

Daniel van Vugt (vanvugt) wrote :

Your dmesg seems to show some worrying recurring issue with the amdgpu kernel driver.

And yes those links are what we want but they're not showing anything useful yet. Maybe wait for the bots to update those...

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
kolAflash (colaflash) wrote :

Please tell me if you have any suggestions for what I can test or which data I can provide.

I myself have no idea how to track this down any further.

Daniel van Vugt (vanvugt) wrote :

At the moment, just wait...

With some time those oops pages might get analysed automatically and problem report links added to them.

Also with time a kernel engineer should be able to help with the amdgpu issue in your dmesg log.

kolAflash (colaflash) wrote :

I made another attempt with a kernel from https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.7/

The dmesg looks slightly different. But I think the bug behaves a little differently every time. Sometimes I just get a black screen, sometimes the picture freezes, and sometimes I can still move the mouse cursor over the frozen picture.

Matias N. Goldberg (dark-sylinc) wrote :

Hi,

I'm just passing by.

Regarding amdgpu issues related with Ryzen / Vega iGPU, there are a couple steps you can try:

1. Try a newer firmware blob:
 1a. Download the latest firmware and unpack it: https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/snapshot/linux-firmware-20200519.tar.gz
 1b. Back up your /lib/firmware folder
 1c. sudo make install

A newer firmware could fix the issue (it did for me on Ubuntu 18.04). If this fixes your problem, remember to freeze the linux-firmware package otherwise the next update will overwrite your changes and instability will come back.

2. Try a newer libdrm.
Kernel updates for amdgpu go hand in hand with libdrm, but sometimes backports get these out of sync, or you try a mainline kernel but without a mainline libdrm.
Download the latest libdrm from https://gitlab.freedesktop.org/mesa/drm, compile it, and replace your Ubuntu installation's libdrm with your custom-built one.
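
For the backup in step 1b, a timestamped copy makes it easy to roll back if the new firmware makes things worse. A minimal sketch, assuming `cp -a` is available; `backup_dir` is a hypothetical helper name, not part of any package:

```shell
#!/bin/sh
# backup_dir (hypothetical helper): copy a directory tree to a timestamped
# sibling directory and print the path of the copy.
backup_dir() {
    src=$1
    dest="$src.bak-$(date +%Y%m%d-%H%M%S)"
    # -a preserves permissions, ownership, timestamps and symlinks.
    cp -a "$src" "$dest" && echo "$dest"
}

# Example (run as root):
#   backup_dir /lib/firmware
#   apt-mark hold linux-firmware   # one way to "freeze" the package
```

`apt-mark hold` keeps apt from upgrading the package until `apt-mark unhold` is run, which is one way to do the freeze mentioned in step 1.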

Daniel van Vugt (vanvugt) wrote :

Alright, that's a different amdgpu kernel issue in comment #8. So let's not confuse this bug by including Xorg. Just focus on the kernel issues...

no longer affects: xorg-server (Ubuntu)
kolAflash (colaflash) wrote :

I downloaded linux-firmware-20200519.tar.gz and replaced /lib/firmware with its contents. (didn't use "make install")

And I can clearly reproduce the crash when no X is running. So there might also be an X issue. But I initially opened this bug for linux, because there's definitely a bug in the kernel (probably in the amdgpu driver).

I attached another dmesg crash log where no X is running.

kolAflash (colaflash) wrote :

I'll have to give the computer to its user in a few days.
After that I won't have the possibility to do extensive debugging.

So if you have any ideas for testing, please tell me!

I just tested Linux-5.8.0-rc1 but it didn't solve the problem.

ALU (c-launchpadmail) wrote :

Have the same issue with HP envy x360 running KDE and Ubuntu 20.04.1 LTS

uname -a

Linux rix360 5.4.0-42-generic #46-Ubuntu SMP Fri Jul 10 00:24:02 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

First entry in /proc/cpuinfo shows

processor : 0
vendor_id : AuthenticAMD
cpu family : 23
model : 96
model name : AMD Ryzen 7 4700U with Radeon Graphics
stepping : 1
microcode : 0x8600103
cpu MHz : 1395.962
cache size : 512 KB
physical id : 0
siblings : 8
core id : 0
cpu cores : 8
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 16
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif umip rdpid overflow_recov succor smca
bugs : sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass
bogomips : 3992.69
TLB size : 3072 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm hwpstate cpb eff_freq_ro [13] [14]

-----------------------------

ALU (c-launchpadmail) wrote :

Update: I installed pm-utils ( sudo apt install pm-utils ) and the machine no longer shows a black screen when the laptop lid is raised; it resumes normally.

ALU (c-launchpadmail) wrote :

Update 2: This isn't actually fully solved. What's working is short-term "Sleep" but if you leave the laptop lid closed for a while it goes to suspend (s2idle) and that crashes the system.

Aug 23 14:22:56 REDACTED systemd[1]: NetworkManager-dispatcher.service: Succeeded.
Aug 23 14:23:08 REDACTED kernel: [ 124.757545] Lockdown: systemd-logind: hibernation is restricted; see man kernel_lockdown.7
Aug 23 14:23:08 REDACTED NetworkManager[734]: <info> [1598210588.2565] manager: sleep: sleep requested (sleeping: no enabled: yes)
Aug 23 14:23:08 REDACTED NetworkManager[734]: <info> [1598210588.2567] manager: NetworkManager state is now ASLEEP
Aug 23 14:23:08 REDACTED whoopsie[1113]: [14:23:08] offline
Aug 23 14:23:08 REDACTED kernel: [ 124.760864] Lockdown: systemd-logind: hibernation is restricted; see man kernel_lockdown.7
Aug 23 14:23:08 REDACTED kernel: [ 124.761905] Lockdown: systemd-logind: hibernation is restricted; see man kernel_lockdown.7
Aug 23 14:23:08 REDACTED systemd[1]: Reached target Sleep.
Aug 23 14:23:08 REDACTED systemd[1]: Starting Suspend...
Aug 23 14:23:08 REDACTED kernel: [ 124.775535] PM: suspend entry (s2idle)
Aug 23 14:23:08 REDACTED systemd-sleep[1370]: Suspending system...
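
To see which suspend variant the kernel entered on each attempt, the "PM: suspend entry (...)" lines can be pulled out of the log. A rough sketch, assuming the message format shown in the excerpt above; `suspend_entries` is a hypothetical helper name:

```shell
#!/bin/sh
# suspend_entries (hypothetical helper): print the mode named in every
# "PM: suspend entry (<mode>)" line of a log file.
# $1 = log file (defaults to /var/log/syslog).
suspend_entries() {
    log=${1:-/var/log/syslog}
    [ -r "$log" ] || return 1
    # Capture the text between the parentheses, e.g. "s2idle" or "deep".
    sed -n 's/.*PM: suspend entry (\([^)]*\)).*/\1/p' "$log"
}

# Count how often each variant was entered.
suspend_entries | sort | uniq -c
```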

ALU (c-launchpadmail) wrote :

Reading the initial bug report I see kolAflash said

> nomodeset seems to be essential, but not sufficient.

When I add nomodeset to /etc/default/grub ( GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nomodeset" ), I can boot without issues, but suspend does not work. To reiterate:

 * the TTY1 screen is not blank on boot
 * sleep works
 * suspend does not work - you get this bug

Given this error in syslog:

> kernel: [ 0.047848] You have booted with nomodeset. This means your GPU drivers are DISABLED
> kernel: [ 0.047849] Unless you actually understand what nomodeset does, you should reboot without enabling it
> [ 0.047849] Any video related functionality will be severely degraded, and you may not even be able to suspend the system properly
> [ 0.047849] Unless you actually understand what nomodeset does, you should reboot without enabling it

I tried a few things without nomodeset in grub options. In this case:

  * the default screen, TTY1, (CTRL-ALT-F1) is blank on boot (or has the HP logo only)
  * sleep works
  * suspend doesn't seem to work. If you leave the laptop in suspend mode for a long time, it does not resume. The screen is black with just a white underscore on it. The last log lines in /var/log/syslog are

 systemd[1]: Reached target Sleep.
 systemd[1]: Starting Suspend...
 systemd-sleep[8263]: Suspending system...
 kernel: [ 9425.088936] PM: suspend entry (s2idle)

As a workaround for the default screen being blank on boot, you can go to TTY2 (CTRL-ALT-F2), log in and run startx from the command prompt.

Some tests:

* This does not seem to be the same issue as https://bugzilla.kernel.org/show_bug.cgi?id=204241

because the following command works just fine.

for i in $(seq 30); do sudo rtcwake -m mem -s 5; sleep 15; done

* I've tried adding GRUB_GFXMODE=1920x1080 to /etc/default/grub but that doesn't seem to solve the issue with the screen on TTY1 being blank on boot.
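
The rtcwake loop from the first test above can be wrapped so that every cycle is logged and failures are counted. This is just a sketch: `run_cycles` is a hypothetical helper name, and it can only report cycles in which the command returns, so a hard hang on resume will still stop the script.

```shell
#!/bin/sh
# run_cycles (hypothetical helper): run a command N times, log each
# result, and return non-zero if any cycle failed.
# $1 = number of cycles, remaining arguments = command to run.
run_cycles() {
    n=$1
    shift
    failed=0
    i=1
    while [ "$i" -le "$n" ]; do
        if "$@"; then
            echo "cycle $i: ok"
        else
            echo "cycle $i: FAILED"
            failed=$((failed + 1))
        fi
        i=$((i + 1))
    done
    echo "$failed of $n cycles failed"
    [ "$failed" -eq 0 ]
}

# Example (run as root), mirroring the loop above:
#   run_cycles 30 sh -c 'rtcwake -m mem -s 5 && sleep 15'
```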

The commands pm-hibernate and pm-suspend produce this error in /var/log/syslog
> [ 2566.154192] Lockdown: grep: hibernation is restricted; see man kernel_lockdown.7
and nothing happens

The command pm-powersave gives the same "hibernation is restricted" error in /var/log/syslog, but the screen goes blank and the system is unresponsive.

Kai-Heng Feng (kaihengfeng) wrote :

New HP AMD laptops only support Modern Standby. AMD is working on a patch to solve the issue.
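
Whether a given machine offers only Modern Standby (s2idle) or also S3 can be read from /sys/power/mem_sleep, which lists the available variants with the active one in square brackets (for example "s2idle [deep]" or just "[s2idle]"). A minimal sketch; the helper names `active_sleep_mode` and `supports_deep` are hypothetical:

```shell
#!/bin/sh
# active_sleep_mode (hypothetical helper): print the bracketed, i.e.
# currently selected, suspend variant.
# $1 = file to read (defaults to /sys/power/mem_sleep).
active_sleep_mode() {
    file=${1:-/sys/power/mem_sleep}
    [ -r "$file" ] || return 1
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "$file"
}

# supports_deep (hypothetical helper): succeed if "deep" (S3) is listed.
supports_deep() {
    file=${1:-/sys/power/mem_sleep}
    [ -r "$file" ] || return 1
    # Drop the brackets, put one variant per line, look for "deep".
    sed 's/[][]//g' "$file" | tr ' ' '\n' | grep -qx deep
}

mode=$(active_sleep_mode) && echo "active suspend variant: $mode"
if supports_deep; then
    echo "S3 (deep) is available"
else
    echo "S3 (deep) is not listed; only s2idle is available, or the file is missing"
fi
```

On a machine that lists only "[s2idle]", writing mem to /sys/power/state enters s2idle, which matches the "PM: suspend entry (s2idle)" lines quoted earlier in this report.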

Mario Limonciello (superm1) wrote :

This issue has been fixed in later kernels, and s2idle is supported now. I suggest using the LTS 5.15 or LTS 6.1 kernels.
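
A quick way to check whether the running kernel is at least 5.15 is sketched below. `version_at_least` is a hypothetical helper name, and it relies on `sort -V`, which is assumed to be available (GNU coreutils). A version comparison alone does not prove that a particular distribution kernel carries the fix.

```shell
#!/bin/sh
# version_at_least (hypothetical helper): succeed if version $1 is greater
# than or equal to version $2 in version-sort order.
version_at_least() {
    have=${1%%-*}   # drop the distro suffix: 5.4.0-42-generic -> 5.4.0
    want=$2
    # The smaller of the two sorts first; it must be the wanted minimum.
    [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}

if version_at_least "$(uname -r)" 5.15; then
    echo "kernel $(uname -r) is 5.15 or newer"
else
    echo "kernel $(uname -r) is older than 5.15"
fi
```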

Changed in linux (Ubuntu):
status: Confirmed → Fix Released