System hangs when loading Google Maps in Chrome with Navi AMD GPU

Bug #1875459 reported by Nicholas Skehin on 2020-04-27
This bug affects 8 people
Affects: linux (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

Since upgrading to 20.04 from 19.10 I can reliably hang my system by loading Google Maps in Chrome and zooming in and out.

After turning on DRM debugging:

echo 0x1ff | sudo tee /sys/module/drm/parameters/debug

I can see the following message in the kernel log:

[16333.092468] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=2542114, emitted seq=2542116
[16333.092537] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process pid 0 thread pid 0
[16333.092540] [drm] GPU recovery disabled.
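
To check whether a hang matches this report, the kernel log can be filtered for the two messages shown above. This is only a minimal sketch: the function name `find_gpu_timeouts` is made up for illustration, and the patterns are taken from the log lines quoted in this report.

```shell
# Hypothetical helper: reads kernel log text on stdin and prints only the
# amdgpu job timeout lines and the "GPU recovery disabled" line.
find_gpu_timeouts() {
    grep -E 'amdgpu_job_timedout|GPU recovery disabled'
}

# Example usage (reading the kernel log may need root):
#   sudo dmesg | find_gpu_timeouts
```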

It seems to be this upstream bug: https://gitlab.freedesktop.org/drm/amd/issues/892

I have two workarounds:

* Running Chrome with AMD_DEBUG=nongg fixes the issue on 5.4.0-26.
* With the latest upstream kernel, 5.6.7-050607, Chrome can load Google Maps with no issues.
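
The first workaround can be applied to a single program without changing the environment of the whole session. A minimal sketch, assuming a small wrapper function (the name `run_with_nongg` is made up here) and assuming the Chrome binary is called `google-chrome` on your system:

```shell
# Hypothetical wrapper: runs the given command with AMD_DEBUG=nongg set
# for that process only, leaving the calling shell's environment untouched.
run_with_nongg() {
    env AMD_DEBUG=nongg "$@"
}

# Example usage (binary name is an assumption; it may differ per install):
#   run_with_nongg google-chrome
```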

ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: linux-image-5.4.0-26-generic 5.4.0-26.30
ProcVersionSignature: Ubuntu 5.4.0-26.30-generic 5.4.30
Uname: Linux 5.4.0-26-generic x86_64
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
ApportVersion: 2.20.11-0ubuntu27
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC1: nicholas 5696 F.... pulseaudio
 /dev/snd/controlC0: nicholas 5696 F.... pulseaudio
CasperMD5CheckResult: skip
CurrentDesktop: ubuntu:GNOME
Date: Mon Apr 27 18:36:12 2020
InstallationDate: Installed on 2019-11-04 (174 days ago)
InstallationMedia: Ubuntu 19.10 "Eoan Ermine" - Release amd64 (20191017)
MachineType: To Be Filled By O.E.M. To Be Filled By O.E.M.
ProcFB: 0 amdgpudrmfb
ProcKernelCmdLine: BOOT_IMAGE=/BOOT/ubuntu_e02uh9@/vmlinuz-5.4.0-26-generic root=ZFS=rpool/ROOT/ubuntu_e02uh9 ro quiet splash vt.handoff=1
RelatedPackageVersions:
 linux-restricted-modules-5.4.0-26-generic N/A
 linux-backports-modules-5.4.0-26-generic N/A
 linux-firmware 1.187
RfKill:
 0: phy0: Wireless LAN
  Soft blocked: no
  Hard blocked: no
SourcePackage: linux
UpgradeStatus: Upgraded to focal on 2020-04-26 (0 days ago)
dmi.bios.date: 07/12/2019
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: P1.30
dmi.board.name: X570 Phantom Gaming 4
dmi.board.vendor: ASRock
dmi.chassis.asset.tag: To Be Filled By O.E.M.
dmi.chassis.type: 3
dmi.chassis.vendor: To Be Filled By O.E.M.
dmi.chassis.version: To Be Filled By O.E.M.
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvrP1.30:bd07/12/2019:svnToBeFilledByO.E.M.:pnToBeFilledByO.E.M.:pvrToBeFilledByO.E.M.:rvnASRock:rnX570PhantomGaming4:rvr:cvnToBeFilledByO.E.M.:ct3:cvrToBeFilledByO.E.M.:
dmi.product.family: To Be Filled By O.E.M.
dmi.product.name: To Be Filled By O.E.M.
dmi.product.sku: To Be Filled By O.E.M.
dmi.product.version: To Be Filled By O.E.M.
dmi.sys.vendor: To Be Filled By O.E.M.

Nicholas Skehin (njs12345) wrote :

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed

Thomas Stieler (launchp0d) wrote :

Same issue here: zooming in and out of Google Maps in Chrome reliably freezes the desktop.

After installing mainline kernel 5.6.5, the problem disappeared.

You-Sheng Yang (vicamo) wrote :

Would it be possible for you to do a kernel bisection?

First, find the last good kernel (v5.6.5 in this case) and the first bad kernel version from http://kernel.ubuntu.com/~kernel-ppa/mainline/ . Then,

  $ sudo apt build-dep linux
  $ git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
  $ cd linux
  $ git bisect start
  $ git bisect good <the good version you found>
  $ git bisect bad <the bad version you found>
  $ make localmodconfig
  $ make -j$(nproc) deb-pkg LOCALVERSION=-bisect-$(git log -1 --pretty=format:%h)
  $ sudo dpkg -i ../linux-image-*.deb ../linux-headers-*.deb

After installing the newly built kernel, reboot into it.

If the issue still happens,

  $ git bisect bad

Otherwise,

  $ git bisect good

Repeat from the "make -j$(nproc) deb-pkg ...." step until you find the commit that causes the regression. Use `git bisect log` to dump the bisect history.

Thomas Stieler (launchp0d) wrote :

I narrowed down which versions work and which do not:

v5.5.9 Chrome freezes while zooming in and out maps
v5.6.0 Everything is fine

git bisect seems to be aimed at finding regressions; I can't perform a bisect with v5.6.0 as good and v5.5.9 as bad.

How can I help to locate the required fix, so that it can be backported to the 20.04 kernels?
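
For reference, git bisect can also search for the commit that fixed something by renaming its terms, so the older broken state and the newer fixed state no longer have to be called "good" and "bad". A minimal sketch, assuming a kernel git clone as the current directory; the function name `bisect_for_fix` and the term names `broken` and `fixed` are chosen here for illustration. Note that v5.5.9 is a stable-branch tag, so this assumes either the linux-stable tree or that mainline v5.5 also shows the hang.

```shell
# Hypothetical helper: start a bisect that hunts for the FIXING commit.
#   $1 = older revision that still hangs, $2 = newer revision that works.
bisect_for_fix() {
    old_rev=$1
    new_rev=$2
    git bisect start --term-old broken --term-new fixed &&
    git bisect broken "$old_rev" &&
    git bisect fixed "$new_rev"
}

# Example usage (revisions are assumptions based on the versions above):
#   bisect_for_fix v5.5 v5.6
# Then build, install and reboot as described earlier, and mark each step:
#   git bisect fixed     # Google Maps no longer hangs on this build
#   git bisect broken    # the hang still occurs on this build
```

When the bisect finishes, git reports the first "fixed" commit, which would be the candidate for a backport.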

Thomas Stieler (launchp0d) wrote :

Here are the last logs from the kernel:

[Mo Jun 29 12:40:15 2020] [drm:amdgpu_dm_commit_planes.constprop.0 [amdgpu]] *ERROR* Waiting for fences timed out!
[Mo Jun 29 12:40:20 2020] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=5760, emitted seq=5762
[Mo Jun 29 12:40:20 2020] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process chrome pid 2338 thread chrome:cs0 pid 2353
[Mo Jun 29 12:40:20 2020] [drm] GPU recovery disabled.
[Mo Jun 29 12:40:40 2020] GpuWatchdog[2370]: segfault at 0 ip 0000556e2ae897ad sp 00007f2c85962490 error 6 in chrome[556e267dc000+785b000]
[Mo Jun 29 12:40:40 2020] Code: 00 79 09 48 8b 7d b0 e8 f1 95 6c fe c7 45 b0 aa aa aa aa 0f ae f0 41 8b 84 24 e0 00 00 00 89 45 b0 48 8d 7d b0 e8 f3 5a ba fb <c7> 04 25 00 00 00 00 37 13 00 00 48 83 c4 38 5b 41 5c 41 5d 41 5e

Thomas Stieler (launchp0d) wrote :

Hi!

Any news regarding this issue?
Can I help with more information?

Thomas Stieler (launchp0d) wrote :

Hey,

another ping from me: is there any chance of getting a backported fix for this issue? I would guess that this issue with current AMD hardware affects enough users to justify a backport.

Currently I'm using the Ubuntu mainline kernel, but I would prefer to use the version from the standard repositories...

Serhii Ponomarets (nefelim) wrote :

Hi!
I managed to fix this bug only by installing a new kernel. I installed the latest one and the freezes have stopped. You can check the solution at https://linuxhint.com/update_ubuntu_kernel_20_04/

Thomas Stieler (launchp0d) wrote :

Well, I just updated to Groovy Gorilla and its kernel version 5.8 fixed my issue.

But people staying on LTS version 20.04 with the same AMD GPU issue may still be interested in a fix!

So, any chance for them to get a backport?

Letatcest (ksoeteman) wrote :

I haven't had problems with this bug for a long time, as I installed the AMD ROCm drivers to use OpenCL. But now with a fresh system without ROCm, the bug is still here!

AMD Radeon RX 5600 XT (NAVI10, DRM 3.35.0, 5.4.0-58-generic, LLVM 10.0.0)

It's really tricky, as clicking Google Maps links is something you do quite often, even without knowing it. And I have to reboot completely.

Mike Gleason jr Couturier (mikegleasonjr) wrote :

Same thing here. Radeon 5700 XT, Ubuntu 20.04, Chromium 91.0.4449.6, Google Maps freezes.

Kai-Heng Feng (kaihengfeng) wrote :

I think the desktop should be using the HWE kernel (5.8) by now. Can you please give it a try?
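
To see whether a machine is already on a kernel that the reports above describe as working (5.6.0 and later), the running version can be compared against 5.6. A minimal sketch: the function name `kernel_at_least` is made up for illustration, the comparison relies on GNU `sort -V`, and the 5.6 threshold is only inferred from the comments in this report.

```shell
# Hypothetical helper: succeeds if version $1 is at least version $2.
# The smaller of the two versions sorts first under `sort -V`.
kernel_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Example usage on Ubuntu 20.04 (linux-generic-hwe-20.04 is the HWE
# kernel metapackage for that release):
#   kernel_at_least "$(uname -r)" 5.6 ||
#       sudo apt install --install-recommends linux-generic-hwe-20.04
```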
