System hangs when loading Google Maps in Chrome with a Navi AMD GPU

Bug #1875459 reported by Nicholas Skehin
This bug affects 9 people
Affects: linux (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned
Milestone:

Bug Description

Since upgrading to 20.04 from 19.10 I can reliably hang my system by loading Google Maps in Chrome and zooming in and out.

After turning on DRM debugging:

echo 0x1ff | sudo tee /sys/module/drm/parameters/debug

I can see the following message in the kernel log:

[16333.092468] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=2542114, emitted seq=2542116
[16333.092537] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process pid 0 thread pid 0
[16333.092540] [drm] GPU recovery disabled.
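
The two sequence numbers in the timeout line can be compared directly. As a minimal sketch, assuming that "signaled seq" is the last fence the GPU completed and "emitted seq" is the last one submitted to the ring, the difference is the number of jobs still outstanding when the watchdog fired. The helper name `pending_jobs` is made up for this illustration.

```shell
# Hypothetical helper: given one amdgpu "ring ... timeout" log line,
# print emitted seq minus signaled seq (jobs submitted but not completed).
pending_jobs() {
  line=$1
  signaled=$(printf '%s\n' "$line" | sed -n 's/.*signaled seq=\([0-9][0-9]*\).*/\1/p')
  emitted=$(printf '%s\n' "$line" | sed -n 's/.*emitted seq=\([0-9][0-9]*\).*/\1/p')
  # Fail if the line does not contain both fields.
  [ -n "$signaled" ] && [ -n "$emitted" ] || return 1
  echo $((emitted - signaled))
}

# The line from the log above: 2542116 - 2542114
pending_jobs '[16333.092468] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=2542114, emitted seq=2542116'
```

For the line quoted above this prints 2, so only a couple of jobs were in flight on the gfx ring when it stopped making progress.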

It seems to be this upstream bug: https://gitlab.freedesktop.org/drm/amd/issues/892

I have two workarounds:

* Running Chrome with AMD_DEBUG=nongg fixes the issue on 5.4.0-26
* With the latest upstream kernel, 5.6.7-050607, Chrome can load Google Maps with no issues.
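
A minimal sketch of the first workaround: `AMD_DEBUG=nongg` is a Mesa debug option that turns off the NGG geometry path, and it only needs to be set in the environment of the browser process. The wrapper name `run_without_ngg` and the binary name `google-chrome` are assumptions for illustration; the actual launcher name depends on how Chrome was installed.

```shell
# Hypothetical wrapper: run any command with AMD_DEBUG=nongg in its
# environment, without changing the environment of the calling shell.
run_without_ngg() {
  env AMD_DEBUG=nongg "$@"
}

# Example invocation (binary name is an assumption):
#   run_without_ngg google-chrome
```

Using `env` keeps the variable scoped to the launched process, so other OpenGL applications started from the same shell are not affected.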

ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: linux-image-5.4.0-26-generic 5.4.0-26.30
ProcVersionSignature: Ubuntu 5.4.0-26.30-generic 5.4.30
Uname: Linux 5.4.0-26-generic x86_64
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
ApportVersion: 2.20.11-0ubuntu27
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC1: nicholas 5696 F.... pulseaudio
 /dev/snd/controlC0: nicholas 5696 F.... pulseaudio
CasperMD5CheckResult: skip
CurrentDesktop: ubuntu:GNOME
Date: Mon Apr 27 18:36:12 2020
InstallationDate: Installed on 2019-11-04 (174 days ago)
InstallationMedia: Ubuntu 19.10 "Eoan Ermine" - Release amd64 (20191017)
MachineType: To Be Filled By O.E.M. To Be Filled By O.E.M.
ProcFB: 0 amdgpudrmfb
ProcKernelCmdLine: BOOT_IMAGE=/BOOT/ubuntu_e02uh9@/vmlinuz-5.4.0-26-generic root=ZFS=rpool/ROOT/ubuntu_e02uh9 ro quiet splash vt.handoff=1
RelatedPackageVersions:
 linux-restricted-modules-5.4.0-26-generic N/A
 linux-backports-modules-5.4.0-26-generic N/A
 linux-firmware 1.187
RfKill:
 0: phy0: Wireless LAN
  Soft blocked: no
  Hard blocked: no
SourcePackage: linux
UpgradeStatus: Upgraded to focal on 2020-04-26 (0 days ago)
dmi.bios.date: 07/12/2019
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: P1.30
dmi.board.name: X570 Phantom Gaming 4
dmi.board.vendor: ASRock
dmi.chassis.asset.tag: To Be Filled By O.E.M.
dmi.chassis.type: 3
dmi.chassis.vendor: To Be Filled By O.E.M.
dmi.chassis.version: To Be Filled By O.E.M.
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvrP1.30:bd07/12/2019:svnToBeFilledByO.E.M.:pnToBeFilledByO.E.M.:pvrToBeFilledByO.E.M.:rvnASRock:rnX570PhantomGaming4:rvr:cvnToBeFilledByO.E.M.:ct3:cvrToBeFilledByO.E.M.:
dmi.product.family: To Be Filled By O.E.M.
dmi.product.name: To Be Filled By O.E.M.
dmi.product.sku: To Be Filled By O.E.M.
dmi.product.version: To Be Filled By O.E.M.
dmi.sys.vendor: To Be Filled By O.E.M.

Nicholas Skehin (njs12345) wrote :
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Thomas Stieler (launchp0d) wrote :

Same issue here: zooming in and out in Google Maps in Chrome reliably freezes the desktop.

After installing mainline kernel 5.6.5, the problem disappeared.

You-Sheng Yang (vicamo) wrote :

Would it be possible for you to do a kernel bisection?

First, find the last good kernel (v5.6.5 in this case) and the first bad kernel version from http://kernel.ubuntu.com/~kernel-ppa/mainline/ . Then,

  $ sudo apt build-dep linux
  $ git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
  $ cd linux
  $ git bisect start
  $ git bisect good <the good version you found>
  $ git bisect bad <the bad version you found>
  $ make localmodconfig
  $ make -j$(nproc) deb-pkg LOCALVERSION=-bisect-$(git log -1 --pretty=format:%h)
  $ sudo dpkg -i ../linux-image-*.deb ../linux-headers-*.deb

After installing the newly built kernel, reboot into it.

If the issue still happens,

  $ git bisect bad

Otherwise,

  $ git bisect good

Repeat from the "make -j$(nproc) deb-pkg ..." step until you find the commit that causes the regression. Use `git bisect log` to dump the bisect history.

Thomas Stieler (launchp0d) wrote :

I narrowed down which versions work and which do not:

v5.5.9 Chrome freezes while zooming in and out in Google Maps
v5.6.0 Everything is fine

git bisect seems to focus on regressions; I can't perform a bisect with v5.6.0 as good and v5.5.9 as bad.

How can I help locate the required fix, so that it can be backported to the 20.04 kernels?
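
For reference, git bisect can also search for the commit that fixed a problem: `git bisect start` accepts `--term-old` and `--term-new` to rename the two states, so "old" can mean broken and "new" can mean fixed. The sketch below demonstrates this on a throwaway repository rather than the kernel tree; the repository contents, the terms `broken`/`fixed`, and the `grep` test command are all stand-ins. For the kernel, each step would instead be the build, install, reboot and Google Maps test described earlier, marked by hand with `git bisect broken` or `git bisect fixed`. Note that v5.5.9 is a stable-branch tag, so with Linus's tree one would probably have to start from v5.5, on the untested assumption that it freezes as well.

```shell
# Toy demonstration: a throwaway repository stands in for the kernel tree.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Bisect Demo"
git config user.email "bisect-demo@example.invalid"

# Six commits; the "fix" lands in commit 4.
for i in 1 2 3 4 5 6; do
  if [ "$i" -ge 4 ]; then echo fixed > state; else echo broken > state; fi
  echo "$i" > counter
  git add state counter
  git commit -q -m "commit $i"
done

# Rename the bisect terms so that old = broken and new = fixed.
git bisect start --term-old broken --term-new fixed
git bisect fixed HEAD       # kernel case: the version that works
git bisect broken HEAD~5    # kernel case: the version that freezes

# The test command must exit 0 while the tree is still in the old state
# (broken) and non-zero once it is in the new state (fixed).
git bisect run sh -c 'grep -q broken state'

# refs/bisect/<new term> points at the first commit in the new state.
first_fixed_subject=$(git log -1 --format=%s refs/bisect/fixed)
echo "first fixed commit: $first_fixed_subject"
git bisect reset
```

In this toy history the search ends on "commit 4", the first commit in which the test no longer sees the broken state. The same procedure applied to the kernel would point at the commit to consider for a backport.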

Thomas Stieler (launchp0d) wrote :

Here are the last kernel log messages:

[Mo Jun 29 12:40:15 2020] [drm:amdgpu_dm_commit_planes.constprop.0 [amdgpu]] *ERROR* Waiting for fences timed out!
[Mo Jun 29 12:40:20 2020] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=5760, emitted seq=5762
[Mo Jun 29 12:40:20 2020] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process chrome pid 2338 thread chrome:cs0 pid 2353
[Mo Jun 29 12:40:20 2020] [drm] GPU recovery disabled.
[Mo Jun 29 12:40:40 2020] GpuWatchdog[2370]: segfault at 0 ip 0000556e2ae897ad sp 00007f2c85962490 error 6 in chrome[556e267dc000+785b000]
[Mo Jun 29 12:40:40 2020] Code: 00 79 09 48 8b 7d b0 e8 f1 95 6c fe c7 45 b0 aa aa aa aa 0f ae f0 41 8b 84 24 e0 00 00 00 89 45 b0 48 8d 7d b0 e8 f3 5a ba fb <c7> 04 25 00 00 00 00 37 13 00 00 48 83 c4 38 5b 41 5c 41 5d 41 5e

Thomas Stieler (launchp0d) wrote :

Hi!

Any news regarding this issue?
Can I help with more information?

Thomas Stieler (launchp0d) wrote :

Hey,

Another ping from me: is there any chance of getting a backported fix for this issue? I would guess that this issue with current AMD hardware affects enough users to justify a backport.

Currently I'm using the Ubuntu mainline kernel, but I would prefer to use the version from the standard repositories...

Serhii Ponomarets (nefelim) wrote :

Hi!
The only way I managed to fix this bug was by installing a newer kernel. After I installed the latest one, the freezes stopped. You can find the procedure at https://linuxhint.com/update_ubuntu_kernel_20_04/

Thomas Stieler (launchp0d) wrote :

Well, I just updated to Groovy Gorilla, and its kernel version 5.8 fixed my issue.

But people staying on the 20.04 LTS release with the same AMD GPU issue may still be interested in a fix!

So, is there any chance for them to get a backport?

Letatcest (ksoeteman) wrote :

I haven't had problems with this bug for a long time, as I installed the AMD ROCm drivers to use OpenCL. But now with a fresh system without ROCm, the bug is still here!

AMD Radeon RX 5600 XT (NAVI10, DRM 3.35.0, 5.4.0-58-generic, LLVM 10.0.0)

It's really tricky, as clicking Google Maps links is something you do quite often, even without knowing it. And I have to reboot completely.

Mike Gleason jr Couturier (mikegleasonjr) wrote :

Same thing here. Radeon 5700 XT, Ubuntu 20.04, Chromium 91.0.4449.6, Google Maps freezes.

Kai-Heng Feng (kaihengfeng) wrote :

I think the desktop should be using the HWE kernel (5.8) by now. Can you please give it a try?
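
A quick way to check whether a machine is already running a kernel in the range reported as working in this thread (v5.6.0 and later) is to compare `uname -r` against a threshold. This is a minimal sketch: the helper name `kernel_at_least` is made up for illustration, it relies on `sort -V` from GNU coreutils, and the 5.6 threshold is taken from the comments above rather than from an identified fix commit. `linux-generic-hwe-20.04` is the HWE kernel metapackage for Ubuntu 20.04.

```shell
# Hypothetical helper: succeed if kernel release $1 is at least version $2.
# The "-26-generic" style suffix is stripped before comparing.
kernel_at_least() {
  have=${1%%-*}
  want=$2
  # sort -V orders version strings; if the threshold sorts first (or the
  # two are equal), the running kernel is new enough.
  lowest=$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)
  [ "$lowest" = "$want" ]
}

if kernel_at_least "$(uname -r)" 5.6; then
  echo "running kernel $(uname -r) is 5.6 or newer"
else
  echo "running kernel $(uname -r) is older than 5.6"
  echo "the HWE kernel can be installed with: sudo apt install --install-recommends linux-generic-hwe-20.04"
fi
```

The check only reports the version; whether a given kernel actually avoids the hang still has to be confirmed by reproducing the Google Maps test.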
