amdgpu intermittently fails to resume correctly

Bug #1909856 reported by Jamie Scott
This bug affects 1 person
Affects: linux (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

Over the last couple of weeks my system has started failing to resume from suspend, roughly 1 time in 5. I press the power button to resume it and my screens stay blank. I can still connect to the system remotely over ssh and check syslog to find out what went wrong. When I then try to reboot with `sudo reboot`, the ssh session disconnects but the system never actually reboots; after waiting a while I have to press the reset button to make it start up again.

The attached dmesg log contains more info and the call trace. I think the interesting part is this:

kernel: amdgpu 0000:0d:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring gfx test failed (-110)
kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <gfx_v8_0> failed -110
kernel: [drm:amdgpu_device_resume [amdgpu]] *ERROR* amdgpu_device_ip_resume failed (-110).
kernel: PM: dpm_run_callback(): pci_pm_resume+0x0/0xf0 returns -110
kernel: PM: Device 0000:0d:00.0 failed to resume async: error -110
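
For anyone triaging similar logs: on Linux, errno 110 is ETIMEDOUT, so the -110 in every line above suggests the gfx ring test timed out rather than failing with a driver-specific code. Below is a minimal Python sketch that pulls the trailing error code out of such lines and names it. The function name `decode_trailing_errnos`, the regex, and the one-entry errno table are illustrative assumptions, not part of any kernel or Ubuntu tooling.

```python
import re

# Assumption: Linux errno numbering; only the code seen in this report is listed.
LINUX_ERRNO = {110: "ETIMEDOUT"}

# Matches a negative code at the end of a line, e.g. "(-110)", "-110" or "(-110)."
TRAILING_ERRNO = re.compile(r"\(?-(\d+)\)?\.?\s*$")

SAMPLE_LOG = """\
kernel: amdgpu 0000:0d:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring gfx test failed (-110)
kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <gfx_v8_0> failed -110
kernel: [drm:amdgpu_device_resume [amdgpu]] *ERROR* amdgpu_device_ip_resume failed (-110).
kernel: PM: dpm_run_callback(): pci_pm_resume+0x0/0xf0 returns -110
kernel: PM: Device 0000:0d:00.0 failed to resume async: error -110
"""


def decode_trailing_errnos(log_text):
    """Return (errno_name, line) pairs for lines that end in a negative error code."""
    results = []
    for line in log_text.splitlines():
        match = TRAILING_ERRNO.search(line)
        if match:
            code = int(match.group(1))
            # Fall back to the raw number for codes missing from the table.
            name = LINUX_ERRNO.get(code, f"errno {code}")
            results.append((name, line.strip()))
    return results


for name, line in decode_trailing_errnos(SAMPLE_LOG):
    print(f"{name}: {line}")
```

The same function could be pointed at a saved syslog excerpt from a failed resume to check whether the failures are always timeouts.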

It first happened on 20.04 with kernel 5.4 (I can't remember exactly which 5.4 version) a couple of weeks ago. I saw 20.10 was available with 5.8, so I thought I'd try upgrading, as a variety of amdgpu fixes went in between 5.4 and 5.8, but sadly no dice.

From some research, I think what I'm seeing is similar to this bug: https://bugs.freedesktop.org/show_bug.cgi?id=112221. That one looks like it was solved in 5.4, whereas I guess some minor 5.4 update broke it for me.

I'm running a Ryzen 3700X on an X570 board with an AMD Radeon Pro WX 4100 GPU.
Currently using Kubuntu 20.10.

ProblemType: Bug
DistroRelease: Ubuntu 20.10
Package: linux-image-5.8.0-33-generic 5.8.0-33.36
ProcVersionSignature: Ubuntu 5.8.0-33.36-generic 5.8.17
Uname: Linux 5.8.0-33-generic x86_64
ApportVersion: 2.20.11-0ubuntu50.3
Architecture: amd64
CasperMD5CheckResult: skip
CurrentDesktop: KDE
Date: Sat Jan 2 12:40:36 2021
InstallationDate: Installed on 2020-06-18 (197 days ago)
InstallationMedia: Kubuntu 20.04 LTS "Focal Fossa" - Release amd64 (20200423)
MachineType: To Be Filled By O.E.M. To Be Filled By O.E.M.
ProcFB: 0 amdgpudrmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-5.8.0-33-generic root=/dev/mapper/vgkubuntu-root ro quiet splash usbcore.autosuspend=-1 vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-5.8.0-33-generic N/A
 linux-backports-modules-5.8.0-33-generic N/A
 linux-firmware 1.190.2
SourcePackage: linux
UpgradeStatus: Upgraded to groovy on 2020-12-30 (3 days ago)
dmi.bios.date: 09/09/2019
dmi.bios.release: 5.14
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: P2.10
dmi.board.name: X570 Taichi
dmi.board.vendor: ASRock
dmi.chassis.asset.tag: To Be Filled By O.E.M.
dmi.chassis.type: 3
dmi.chassis.vendor: To Be Filled By O.E.M.
dmi.chassis.version: To Be Filled By O.E.M.
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvrP2.10:bd09/09/2019:br5.14:svnToBeFilledByO.E.M.:pnToBeFilledByO.E.M.:pvrToBeFilledByO.E.M.:rvnASRock:rnX570Taichi:rvr:cvnToBeFilledByO.E.M.:ct3:cvrToBeFilledByO.E.M.:
dmi.product.family: To Be Filled By O.E.M.
dmi.product.name: To Be Filled By O.E.M.
dmi.product.sku: To Be Filled By O.E.M.
dmi.product.version: To Be Filled By O.E.M.
dmi.sys.vendor: To Be Filled By O.E.M.

Jamie Scott (jamiescott) wrote :
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Jamie Scott (jamiescott) wrote :

Ah, the included dmesg output doesn't actually have the useful snippet in it. Please see attached section of log output which runs from when I attempted to resume the system to when it was rebooted.

Kleber Sacilotto de Souza (kleber-souza) wrote :

Hello Jamie or anyone else affected,

We have released a new Focal linux kernel (version 5.4.0-59.65) which contains several bug fixes. Could you please update to this kernel and report whether it fixes the issue?

Thank you.

Jamie Scott (jamiescott) wrote :

Sorry, I may not have made it obvious. The problem started on Focal, but I have since upgraded to Groovy, as I thought a newer kernel might help given the various amdgpu fixes that landed between 5.4 and 5.8.

I do see a pending update for 5.8.0-34 though which I'll install.
