amdgpu hangs for 90 seconds at a time in 5.13.0-23, but 5.13.0-22 works

Bug #1956401 reported by Henry Wertz
This bug affects 44 people
Affects           Status         Importance   Assigned to   Milestone
linux (Ubuntu)    Fix Released   Undecided    Unassigned
  Impish          Fix Released   High         Unassigned
  Jammy           Fix Released   Undecided    Unassigned

Bug Description

SRU Justification

Impact:

This does not occur with linux-image-5.13.0-22-generic, but does with linux-image-5.13.0-23-generic.
On startup, I get about a 60 second hang, with the following in the kernel dmesg:
Jan 4 15:26:36 inspiron-3505 kernel: [ 34.160572] amdgpu 0000:04:00.0: amdgpu: failed to write reg 28b4 wait reg 28c6
Jan 4 15:26:56 inspiron-3505 kernel: [ 54.189055] amdgpu 0000:04:00.0: amdgpu: failed to write reg 1a6f4 wait reg 1a706
Jan 4 15:27:16 inspiron-3505 kernel: [ 74.329264] amdgpu 0000:04:00.0: amdgpu: failed to write reg 28b4 wait reg 28c6
Jan 4 15:27:36 inspiron-3505 kernel: [ 94.337904] amdgpu 0000:04:00.0: amdgpu: failed to write reg 1a6f4 wait reg 1a706
I have the following GPU:
04:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Picasso (rev c2) (prog-if 00 [VGA controller])
04:00.0 0300: 1002:15d8 (rev c2)
(This is a Ryzen 5 3450U CPU with Radeon Vega Mobile.)

I get a similar hang if I start firefox (when it's probing OpenGL contexts), and even with glxgears and glxinfo. It seems like anything that would kick off an OpenGL context does it. I had a freeze as well when I tried running firefox and glxgears together, along with odd "BUG:" messages being logged (I have some in the attached log).

I was running with "iommu=pt", but did try with this removed and still got the errors (I think the amdgpu driver uses the IOMMU even when it's set to iommu=pt, though). See the attached log for some very odd "[Hardware Error]" messages that were logged on one test run. I think this was when I tried to run firestorm (a Second Life viewer), which had a long pause and then opened to a black window.

Per Google, I see there was a bug like this that turned up in kernel 5.14.15 but was fixed in 5.14.17. See https://gitlab.freedesktop.org/drm/amd/-/issues/1770

Thanks!
--Henry

Fix:
upstream commit afd18180c070 ("drm/amdkfd: fix boot failure when iommu is disabled in Picasso.")

The patch was included in the Impish kernel in -proposed (5.13.0.24.24) as part of an upstream patch set. Multiple users have confirmed that the problem is resolved with the kernel in -proposed.

Henry Wertz (hwertz10) wrote :
Henry Wertz (hwertz10) wrote (last edit ):

Additional note: I did notice one "un-regression". I have a build of ROCm where I've tried enabling "GFX902" support for my card. This is an unsupported configuration, so I don't know if I have it 100% functional. rocminfo, as the name suggests, dumps info about the ROCm install and any video or compute cards it detects that it can use. With the 5.4.0-91-generic kernel I can run rocminfo and it dumps some info about the card. On 5.13.0-22 it prints:
hsa api call failure at: /home/hwertz/ROCm/rocminfo/rocminfo.cc:1143
Call returned HSA_STATUS_ERROR_OUT_OF_RESOURCES: The runtime failed to allocate the necessary resources. This error may also occur when the core runtime library needs to spawn threads or create internal OS-specific events.

On 5.13.0-23, although OpenGL is hosed, rocminfo didn't pause and printed the ROCm-related information.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux-hwe-5.13 (Ubuntu):
status: New → Confirmed
Bleys (rds2) wrote (last edit ):

Same Errors and behavior here with Ubuntu Budgie 21.10 after upgrade to 5.13.0-23
Additional Info: AMD Ryzen 5 3400G with AMDGPU

Henry Wertz (hwertz10) wrote :

I'm assuming the 5.14.15 or 5.14.16 amdgpu and amdkfd changes have been backported to the 5.13 Ubuntu kernel. Here's the patch in 5.14.17 that specifically addresses this.

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=v5.14.17&id=7883e13c249461877ea3be7b24a5935fc8946e46

The other AMD-related patches for 5.14.17 largely appear to be fixes to DCN 3.1 support (the newest GPU models from the last 6 months or so). This is a fairly serious regression for those affected. (I may have only gotten to a desktop because I'm using Gnome Flashback, so there is no compositor trying to exercise the 3D hardware for desktop use. I've simply gone back to 5.13.0-22 for now.) If the plan is to ship a quick update, I could see just patching in that one patch; if it'll be fixed in 5.13.0-23 at a usual schedule, I could see incorporating all of them to benefit DCN 3.1 users.

Not to digress, but kudos to the open source GPU driver developers. The Intel support is amazing (it's amusing that on my friend's Sandy Bridge he can run DX11 games in Steam through Proton that it would not be able to run in Windows, since Intel never shipped DX11 drivers for it...), and amdgpu has run every game I've thrown at it so far, generally at very good frame rates.

Slipie (slipiefreak) wrote :

Same issue on AMD Ryzen 3500U with kernel 5.13.0-23.
I've tried to boot the kernel with amdgpu.noretry=0, but that did not help.

Chris Newcomer (cnewcomer) wrote :

Same here with AMD Ryzen 5 3400G on kernel 5.13.0-23-generic

It never gets to the graphical login screen. I tested with xorg on 21.10, I did not test wayland. I expect that to be the same. I can do Ctrl+Alt+F4 to get to a text login screen a few mins after booting. This screen is where I saw the "failed to write reg" error.

Rolling back to 5.13.0-22 solved this issue.

Andrew Enderson (ande-e) wrote (last edit ):

Same issues here on Kubuntu 21.10 after upgrade to 5.13.0-23-generic:
  amdgpu: failed to write reg **** wait reg ****

AMD Ryzen 7 3700U w/ Radeon Vega 10 Mobile graphics

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux (Ubuntu Impish):
status: New → Confirmed
affects: linux-hwe-5.13 (Ubuntu Impish) → linux (Ubuntu Impish)
Changed in linux (Ubuntu Impish):
status: New → Confirmed
no longer affects: linux (Ubuntu)
Kelsey Steele (kelsey-steele) wrote (last edit ):

Thank you for reporting, Henry, and everyone else for the confirmations. Henry, the patch you referenced is included in the Impish kernel 5.13.0.24.24, which is currently in -proposed (note this is for 21.10/Impish). Could you (or anyone else running into this issue) please verify that the kernel in -proposed resolves this problem? Thank you!

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed.

Bleys (rds2) wrote :

~$ inxi -F
System:
  Host: Nexus Kernel: 5.13.0-24-generic x86_64 bits: 64
  Desktop: Budgie 10.5.3 Distro: Ubuntu 21.10 (Impish Indri)

Proposed enabled, 24.24 active. No Problems so far.

Henry Wertz (hwertz10) wrote :

Confirmed. I'm running 20.04 with the linux-generic-hwe-20.04-edge, so I did a manual install; I ran "aptitude search -F %p '~i 5.13.0-22'" to see what packages I should get, downloaded the 5.13.0-24 .deb packages and installed them manually with (in the directory with just the kernel .deb files) "dpkg -i *.deb". (This gives a dependency error on the kernel tools, since they're for 21.10 they have a dependency on a newer libc; but this doesn't affect the kernel and modules booting and running fine.)
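
For anyone repeating the manual install described above, here is a rough Python sketch of the "which packages do I need" step. It only rewrites package names from one kernel ABI to another; it does not download or install anything. The function name and the sample package names are illustrative assumptions, not output copied from a real system.

```python
def retarget(packages, old="5.13.0-22", new="5.13.0-24"):
    """Return the package names that mention the old kernel ABI,
    rewritten to point at the new one."""
    return [name.replace(old, new) for name in packages if old in name]


# Hypothetical list, standing in for what a search for installed
# 5.13.0-22 packages might print (one package name per line).
installed = [
    "linux-image-5.13.0-22-generic",
    "linux-modules-5.13.0-22-generic",
    "linux-modules-extra-5.13.0-22-generic",
    "linux-headers-5.13.0-22-generic",
]

for name in retarget(installed):
    print(name)
```

The resulting names are what you would look for among the 5.13.0-24 .deb files before running dpkg.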

Startup delay fixed, glxinfo & glxgears work (and firefox also started without pause), and rocminfo is still "un-regressed" and working as it did in 5.13.0-23. Looks good!

Thanks!
--Henry

description: updated
Kelsey Steele (kelsey-steele) wrote :

Thank you for verifying, Henry and Bleys! We're working to get this fix out.

Daniel van Vugt (vanvugt) wrote :

This bug seems to be assigned to zero packages now. Please choose the correct kernel package to assign it to.

summary: - amdgpu hangs for 90 seconds at a time
+ amdgpu hangs for 90 seconds at a time in 5.13.0-23, but 5.13.0-22 works
tags: added: regression-update
tags: added: impish
description: updated
no longer affects: linux (Ubuntu Impish)
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux (Ubuntu Impish):
status: New → Confirmed
Changed in linux (Ubuntu):
status: New → Confirmed
affects: linux-hwe-5.13 (Ubuntu) → linux (Ubuntu)
Changed in linux (Ubuntu):
status: New → Invalid
Changed in linux (Ubuntu Impish):
importance: Undecided → High
Mixim (mixim33) wrote :

I have the same issue. You can view my dmesg output

Mixim (mixim33) wrote :

@kelsey-skunberg, my ASUS TUF Gaming FX505DT-HN538 booted up quickly after enabling proposed, and dmesg does not contain "amdgpu: failed to write reg..." anymore, but it still contains other errors related to my nvidia card:
[drm:nv_drm_master_set [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to grab modeset ownership

Amael (amael) wrote :

Hello, same here on an AMD graphics card: kernel 5.13.0-22-generic works fine on Ubuntu and kernel 5.13.0-23-generic hangs for some time on boot.

Jan 6 21:21:27 amael-laptop kernel: [ 36.762752] amdgpu 0000:03:00.0: amdgpu: failed to write reg 28b4 wait reg 28c6
Jan 6 21:21:47 amael-laptop kernel: [ 56.823245] amdgpu 0000:03:00.0: amdgpu: failed to write reg 1a6f4 wait reg 1a706
Jan 6 21:22:07 amael-laptop kernel: [ 76.851259] amdgpu 0000:03:00.0: amdgpu: failed to write reg 28b4 wait reg 28c6
Jan 6 21:22:27 amael-laptop kernel: [ 96.924149] amdgpu 0000:03:00.0: amdgpu: failed to write reg 1a6f4 wait reg 1a706
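
The excerpt above shows the same two register pairs failing alternately, roughly 20 seconds apart. As a minimal sketch (not something from the original reports; the function names are made up for illustration), the timestamps and register offsets can be pulled out of such lines with Python to measure the gap between failures:

```python
import re

# Kernel timestamp in brackets, then the two hex register offsets.
PATTERN = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\].*failed to write reg "
    r"(?P<reg>[0-9a-f]+) wait reg (?P<wait>[0-9a-f]+)"
)


def parse_timeouts(log_text):
    """Return (timestamp, reg, wait_reg) for each matching log line."""
    events = []
    for line in log_text.splitlines():
        match = PATTERN.search(line)
        if match:
            events.append(
                (float(match.group("ts")), match.group("reg"), match.group("wait"))
            )
    return events


def gaps(events):
    """Seconds between consecutive failures, rounded to milliseconds."""
    return [round(b[0] - a[0], 3) for a, b in zip(events, events[1:])]


SAMPLE = """\
Jan 6 21:21:27 amael-laptop kernel: [ 36.762752] amdgpu 0000:03:00.0: amdgpu: failed to write reg 28b4 wait reg 28c6
Jan 6 21:21:47 amael-laptop kernel: [ 56.823245] amdgpu 0000:03:00.0: amdgpu: failed to write reg 1a6f4 wait reg 1a706
Jan 6 21:22:07 amael-laptop kernel: [ 76.851259] amdgpu 0000:03:00.0: amdgpu: failed to write reg 28b4 wait reg 28c6
Jan 6 21:22:27 amael-laptop kernel: [ 96.924149] amdgpu 0000:03:00.0: amdgpu: failed to write reg 1a6f4 wait reg 1a706
"""

events = parse_timeouts(SAMPLE)
print(len(events), "failures, gaps in seconds:", gaps(events))
```

Four failures about 20 seconds apart add up to the roughly 60 seconds of boot delay described in the bug description.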

Interesting fact: it still does not work fine after finally booting. It takes several more minutes to be able to open Google Chrome or Firefox correctly, whereas VSCode can open immediately. Chrome seems to open but the window is transparent. I can open the menu with Alt+space, I see an overlay in front of a terminal window, but nothing more.

I originally stumbled on the related bug https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1956396.

Alexey Balmashnov (a.balmashnov) wrote :

Enabled proposed, kernel updated to 5.13.0-24-generic. Works fine! Thanks a lot for the quick fix.

System: Host: alrock Kernel: 5.13.0-24-generic x86_64 bits: 64 Desktop: GNOME 40.5 Distro: Ubuntu 21.10 (Impish Indri)
CPU: Info: Quad Core model: AMD Ryzen 5 3400G with Radeon Vega Graphics bits: 64 type: MT MCP cache: L2: 2 MiB
           Speed: 1400 MHz min/max: 1400/3700 MHz Core speeds (MHz): 1: 1400 2: 1350 3: 1404 4: 1388 5: 1384 6: 1399 7: 1397
           8: 1352
Graphics: Device-1: Advanced Micro Devices [AMD/ATI] Picasso driver: amdgpu v: kernel
           Display: wayland server: X.Org 1.21.1.2 driver: loaded: ati,radeon,vesa unloaded: fbdev,modesetting
           resolution: 1920x1200~60Hz
           OpenGL: renderer: AMD Radeon Vega 11 Graphics (RAVEN DRM 3.41.0 5.13.0-24-generic LLVM 12.0.1) v: 4.6 Mesa 21.2.2

Dietmar (smiddy67) wrote :

Checked with kernel update 5.13.0-24-generic.
Unfortunately my problem is not solved with this kernel.

Alex.Nedel (alexnedel) wrote :

5.13.0-24 works for me, graphical login appears immediately --- seems to be as good as 5.13.0-22, the problem that I had with 5.13.0-23.23 seems fixed.

Dietmar (smiddy67) wrote :

With kernel 5.13.0-24 the computer hangs completely when in standby. I had to disconnect the battery and restart. Standby mode doesn't work for me with kernel 5.13.0-24. 5.13.0-22 still works perfectly. It's not a hardware problem.

Chris Newcomer (cnewcomer) wrote :

@Kelsey, thank you for the quick proposed kernel. I installed it in my system here and it solved my amdgpu errors completely. It boots quickly again into the graphical login screen.

Alex.Nedel (alexnedel) wrote :

@Dietmar (smiddy67): I also get some form of breakage:

No video after leaving the PC unused overnight:
with monitor turned on, got blank screen with backlighting visible in the morning.
I rebooted to get it working, the reboot was fast and OK.

But I'm NOT sure that standby worked for me in 5.13.0-22.
I remember a similar problem with 5.13.0-22 after suspend.

So the specific problem that I had with 5.13.0-23.23 seems fixed in 5.13.0-24.

Thank you all!

KonishchevDmitry (konishchevdmitry) wrote :

5.13.0-24.24 fixes it for me. With 5.13.0-23.23 my server doesn't boot at all: it starts booting and then the monitor goes into an inactive state, so I don't even have a way to see an error message.

My configuration:
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h (Models 60h-6fh) Processor Root Complex
00:01.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Wani [Radeon R5/R6/R7 Graphics] (rev 85)
00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h (Models 60h-6fh) Host Bridge
00:02.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 15h (Models 60h-6fh) Processor Root Port
00:02.5 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 15h (Models 60h-6fh) Processor Root Port
00:03.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h (Models 60h-6fh) Host Bridge
00:08.0 Encryption controller: Advanced Micro Devices, Inc. [AMD] Carrizo Platform Security Processor
00:09.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Carrizo Audio Dummy Host Bridge
00:10.0 USB controller: Advanced Micro Devices, Inc. [AMD] FCH USB XHCI Controller (rev 20)
00:11.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 49)
00:12.0 USB controller: Advanced Micro Devices, Inc. [AMD] FCH USB EHCI Controller (rev 49)
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 4a)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 11)
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h (Models 60h-6fh) Processor Function 0
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h (Models 60h-6fh) Processor Function 1
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h (Models 60h-6fh) Processor Function 2
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h (Models 60h-6fh) Processor Function 3
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h (Models 60h-6fh) Processor Function 4
00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h (Models 60h-6fh) Processor Function 5
01:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9230 PCIe 2.0 x2 4-port SATA 6 Gb/s RAID Controller (rev 11)
02:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
02:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe

Dietmar (smiddy67) wrote :

@all
Yes, we have two different problems. One is the boot problem, which is obviously solved with kernel 24.
The other is the standby problem, that still exists with kernel 5.13.0-24.
Maybe there should be a new topic for the standby problem.
Thank you for your support
Dietmar

Tom Cook (tom-k-cook) wrote :

Confirmed on Ryzen 7 3700U, and that the -24 pre-release update fixes it. Standby has never worked on this system so I can't comment on that.

tags: added: amdgpu
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.13.0-25.26

---------------
linux (5.13.0-25.26) impish; urgency=medium

  * amdgpu hangs for 90 seconds at a time in 5.13.0-23, but 5.13.0-22 works
    (LP: #1956401)
    - drm/amdkfd: fix boot failure when iommu is disabled in Picasso.

  * OOB write on BPF_RINGBUF (LP: #1956585)
    - SAUCE: bpf: prevent helper argument PTR_TO_ALLOC_MEM to have offset other
      than 0

 -- Kleber Sacilotto de Souza <email address hidden> Fri, 07 Jan 2022 16:16:40 +0100

Changed in linux (Ubuntu Impish):
status: Confirmed → Fix Released
Kleber Sacilotto de Souza (kleber-souza) wrote :

Hello everyone,

We released an impish:linux kernel yesterday containing a fix for a critical security issue and a fix for this bug. The version is 5.13.0-25.26, which has the fixes applied on top of 5.13.0-23.

Could you please update to 5.13.0-25 and check whether this kernel really fixes the amdgpu issue? Thank you!
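
To check quickly which kernel a machine is actually running, something like the following Python sketch may help. It is only an illustration based on what this thread reports for the Impish 5.13 kernels (-22 works, -23 hangs, -24 from -proposed and -25 carry the fix); the helper names are hypothetical, and it says nothing about the separate suspend or boot problems mentioned in other comments.

```python
import platform
import re

# ABI number reported as broken in this thread for the 5.13.0 series.
BROKEN_ABI = 23


def parse_release(release):
    """Split a string like '5.13.0-23-generic' into
    ((5, 13, 0), 23, 'generic'). Returns None if it does not match."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)(?:-(\S+))?$", release)
    if match is None:
        return None
    major, minor, patch, abi = (int(g) for g in match.groups()[:4])
    return (major, minor, patch), abi, match.group(5)


def is_affected(release):
    """True only for 5.13.0 kernels with the ABI number reported as broken."""
    parsed = parse_release(release)
    if parsed is None:
        return False
    base, abi, _flavour = parsed
    return base == (5, 13, 0) and abi == BROKEN_ABI


running = platform.release()  # same string that `uname -r` prints on Linux
print(running, "-> affected" if is_affected(running) else "-> not the -23 kernel")
```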

Rudi Servo (rudiservo) wrote :

Just Updated to 5.13.0-25-generic, it seems to be fixed, thanks!

Kai Liu (kliu0x52) wrote :

I've updated to 5.13.0-25 and can confirm that this fixed the problem on my Ryzen 3700U laptop, which had been broken by the -23 kernel.

Since I see some comments here talking about sleep, I tried that as well: I suspended the system and then woke it up, and I did not encounter any unusual behavior.

mldytech (mldytech) wrote (last edit ):

Hello, thanks for the fast update.

Unfortunately my personal problem isn't fixed with it. I'm using a Ryzen 7 5700U (on a hp envy x360-15eu000), with ubuntu 21.10.
I use full disk encryption with luks, and when I try to boot with the new kernel (5.13.0-23, or now the newer 5.13.0-25) I type in the correct password and get stuck after this (last appearing message: [...] successfully decypted).
Usually (e.g with the 5.13.0-22 kernel), after decrypting the partition the system boots up and a row of instructions appear. With this kernel(s) it's completely stuck and nothing happens.

Is anyone else experiencing this issue?

Mathias Schindler (mathias-schindler) wrote :

Hello, I updated to 5.13.0-25 and I confirm it works properly on a Ryzen 5-3600G.

Alexey Balmashnov (a.balmashnov) wrote :

5.13.0-25 fixed the amdgpu graphics init issue for me on an AMD Ryzen 5 3400G.

Slipie (slipiefreak) wrote :

Issue has been resolved on AMD Ryzen 3500U as well with kernel 5.13.0-25.

Alex.Nedel (alexnedel) wrote :

Fixed for me.

5.13.0-25-generic #26-Ubuntu SMP Fri Jan 7... works no better and no worse than 5.13.0-24 (proposed).

AMD Ryzen 5 2400G with AMD Radeon(TM) Vega 11 Graphics (RAVEN, DRM 3.41.0, 5.13.0-25-generic, LLVM 12.0.1).

Kleber Sacilotto de Souza (kleber-souza) wrote :

Thank you everyone for the feedback!

@mldytech, it's likely that your system has a different issue that's preventing it from booting. Could you please open a bug report so we can better investigate it? Thanks.

Kleber Sacilotto de Souza (kleber-souza) wrote :

This fix (afd18180c070 drm/amdkfd: fix boot failure when iommu is disabled in Picasso.) is already applied to devel kernel series (jammy:linux).

Changed in linux (Ubuntu Jammy):
status: Invalid → Fix Released
Chris Newcomer (cnewcomer) wrote :

Works for me as well. Thank you for the quick fix. I thought I was going to have to run a -proposed kernel package for a few weeks.

Daniel (chalkwalk) wrote :

I have a Ryzen 7 4700U (laptop class) mini PC running 21.10 impish (with Ubuntu Studio sources and packages added). With the 5.13.0-23 kernel update (I use the lowlatency kernel) I started seeing an approximately 2-minute hang during boot. If I hit F12 during the hang, all I see is '/dev/nvme0n1p2: clean...', after which it seems to proceed as normal. Once booted, everything seems to work fine. Additionally, the system hangs when I shut down (pressing F12 has no effect and the splash screen freezes); I have to power the computer down manually.

I installed the 5.13.0-25 kernel (again, low latency) update but the problem remains unchanged: same long hang on boot, same indefinite hang on shutdown.

Daniel (chalkwalk) wrote :

FYI looking in dmesg and boot.log I could see the long wait was for systemd. Using journalctl I could see the long pause was zfs-import-cache & zfs-load-module waiting for systemd-udev-settle (which is presumably hanging due to the amd related kernel problem). I am not using zfs so I systemctl disabled systemd-udev-settle zfs-import-cache, zfs-load-module & zfs-mount (which is what depended on those two) which now results in my system booting quickly. This is not a good general solution (as people might need zfs, but I don't) and the system still hangs when shutting down. I just wanted to document this here in case it proves useful to anyone else.

Daniel (chalkwalk) wrote :

The same problem is still present in 5.13.0-27-lowlatency. As with the previous versions, disabling ZFS as above avoids the 2-minute pause during boot, but doesn't resolve the hang on shutdown.

Tadeuš Kozlovski (tadeuskozlovski) wrote :

I have a Lenovo Ideapad with an NVIDIA GeForce GTX 1650 Ti. After upgrading from 20.04 to 21.10, Opera, Chrome and Firefox became very slow. All other software works fine. If you have NVIDIA, try changing the driver to 470 (the only one that works with an external monitor) and try changing NVIDIA X Server Settings -> PRIME Profiles -> NVIDIA (Performance Mode). This resolved the problem in my case.

Lukas Wiest (lukas-wiest) wrote (last edit ):

I'm using a Lenovo Ideapad with a Ryzen 7 4700U on Ubuntu 20.04 focal, and with the focal kernel 5.13.0-27 I still can't suspend the device. The last working kernel is 5.13.0-22.

Is the fix that was released for impish also included in the focal version of 5.13.0-27, or do we have to wait longer for a focal fix?

If it's already included: my device works perfectly fine other than that it won't go into suspend but hangs when trying to. Should that be discussed further here, or in bug 1956422, which is marked as a duplicate of this one?
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1956422

The lts-mainline kernel 5.15 and 5.16 are working fine (JFYI)

ooshlablu (ooshlablu) wrote :

I've been having the suspend issue for the past week or so with the 5.13 kernels on an Atari VCS, which has an AMD Ryzen Embedded R1606G with Radeon Vega Gfx. The kernel currently installed is 5.13.0-27-generic. I've also noticed that anything that tries to use Vulkan as a display driver segfaults.
Upon bootup, dmesg has some bad looking errors in it:
[ 2.772683] kfd kfd: amdgpu: error getting iommu info. is the iommu enabled?
[ 2.772687] kfd kfd: amdgpu: Error initializing iommuv2
[ 2.772899] kfd kfd: amdgpu: device 1002:15d8 NOT added due to errors

FWIW, I have this setup cloned to an Intel laptop, and everything appears fine there. The 5.13 series even enabled my wireless card on that laptop, since that wireless card wasn't supported until 5.12 or later.

ooshlablu (ooshlablu) wrote :

I forgot to mention that I'm running Ubuntu 20.04. Using 5.11.0-46-generic (which was installed previously but removed by the hwe upgrade to 5.13) suspend works fine, and I do not see those errors in dmesg.

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-gcp-5.13/5.13.0-1013.16~20.04.1 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-focal
tags: added: verification-failed-focal
removed: verification-needed-focal
Lukas Wiest (lukas-wiest) wrote :

Tested by installing linux-gcp-edge from the proposed repo, which installed the mentioned version 5.13.0-1013.16~20.04.1.

This didn't fix the suspend bug on focal with a Ryzen 7 4700U, but this gcp kernel seems to lack other drivers needed for my device: my touch screen, for example, didn't work, and closing the lid didn't even automatically trigger the suspend.
But for the bug we're trying to fix, the behavior didn't change with this kernel: triggering the suspend starts going into suspend, but then I'm left with a black screen, the power LED on, and no reaction anymore.
I noticed the power LED goes off briefly, as it would when entering suspend normally before it starts pulsing, but then instantly goes back to permanently on, as if the device is immediately woken up again instead of staying asleep. But it never comes back to life from there.

Henry Wertz (hwertz10) wrote :

I don't think this patch is intended to fix the suspend/resume bug; that's not the problem I was seeing, at any rate. That said...

Does this bug apply to the GCP kernel? I'm assuming GCP is "Google Cloud Platform", so I'm not sure they are even using affected hardware. AMD CPUs are fairly power-efficient, so I could see Google using them, but probably not the ones with a built-in GPU (in favor of getting ones with more CPU cores instead). I assume that if they want to support CUDA-style workloads they'd throw some monster GPUs into their "GPU compute" cloud systems. (That said, all of that means that for the GCP kernel's use case it should be fine either way, whether this patch is applied or not.)

Lukas Wiest (lukas-wiest) wrote :

OK, so I've subscribed to this bug because I was led here from bug 1956422, which is marked as a duplicate of this one. That's why I asked in #47 whether the suspend issue should be kept here, or taken to bug 1956422 with the duplicate mark removed there.

If I go up in the comment history, I can see in #24 that @smiddy67 has had this suspend issue as well since 5.13.0-23, whereas 5.13.0-22 is just fine.
In #34 we see that suspend seems to work fine with the 3700U. It's a wild guess, but maybe something affecting the Ryzen 4000 series changed from 5.13.0-22 to -23 and is still not fixed.
For me the symptom is suspend not working and freezing, but as I said, the bug I found describing my issue is currently marked as a duplicate of this one. So, where do we go now with this?

Henry Wertz (hwertz10) wrote :

Sorry about that! I initially filed this bug, but I'm no long-time user of the bug system. If 1956422 is marked dup and pointed here, it's fine with me!

Still, as for GCP kernel specifically, a bot auto-generated the request to test against GCP kernel. I wonder how many drivers GCP kernel is missing (that'll be why the touchpad and lid switch didn't work.) I think for GCP kernel specifically, it won't matter if this patch is applied or not, no integrated GPUs so they won't be running affected hardware.

I suppose first start is to attach any relevant logs and info -- does kern.log or dmesg show anything interesting when you suspend & resume? If you can ssh into the machine, does the machine lock solid on resume, or does it have a black screen but still running? You probably provided this info in 1956422 already but if it's being dup'ed to here, feel free to put it here.

ooshlablu (ooshlablu) wrote :

5.13.0-28-generic was out this morning for 20.04 and fixed the suspend/resume issue on the Atari.

(Don't know where else to stick this now :-) Maybe that info will be helpful for GCP also)

Daniel (chalkwalk) wrote :

All issues seem to be resolved for my Ryzen 7 4700U with 5.13.0-28-lowlatency too.

Lukas Wiest (lukas-wiest) wrote :

Yup, I can confirm this on my end as well: with 5.13.0-28-generic (5.13.0.28.31~20.04.15), suspend is working fine again.
