System is generally unstable with amdgpu errors in syslog

Bug #1863385 reported by Michael Crumpton on 2020-02-14
This bug affects 1 person
Affects: linux (Ubuntu)
Importance: Undecided
Assigned to: Unassigned

Bug Description

I just set up a new computer to back up some files to a ZFS mirror on external USB3 drives.

I'm booting from NVMe, with four older SATA hard drives installed internally (each with a single NTFS partition) and three external USB3 drives (together forming a three-way ZFS mirror).

Motherboard: ASRock X570 Pro4 (AM4)
Processor: AMD Ryzen 5 3400G

The system has been crashing (no response to mouse/keyboard, but the screen seems to still be updating).

I see a lot of junk in dmesg about amdgpu.

ProblemType: Bug
DistroRelease: Ubuntu 19.10
Package: linux-image-5.3.0-18-generic 5.3.0-18.19+1
ProcVersionSignature: Ubuntu 5.3.0-18.19-generic 5.3.1
Uname: Linux 5.3.0-18-generic x86_64
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
ApportVersion: 2.20.11-0ubuntu8
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC1: cwal 17230 F.... pulseaudio
 /dev/snd/controlC0: cwal 17230 F.... pulseaudio
CurrentDesktop: LXQt
Date: Fri Feb 14 17:10:33 2020
InstallationDate: Installed on 2020-02-14 (0 days ago)
InstallationMedia: Lubuntu 19.10 "Eoan Ermine" - Release amd64 (20191017.1)
IwConfig:
 enp3s0 no wireless extensions.

 lo no wireless extensions.
MachineType: To Be Filled By O.E.M. To Be Filled By O.E.M.
ProcFB: 0 amdgpudrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.3.0-18-generic root=UUID=8a1e698e-a3ba-4ae4-8bf8-811803447743 ro quiet splash vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-5.3.0-18-generic N/A
 linux-backports-modules-5.3.0-18-generic N/A
 linux-firmware 1.183
RfKill:

SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 09/10/2019
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: P1.70
dmi.board.name: X570 Pro4
dmi.board.vendor: ASRock
dmi.chassis.asset.tag: To Be Filled By O.E.M.
dmi.chassis.type: 3
dmi.chassis.vendor: To Be Filled By O.E.M.
dmi.chassis.version: To Be Filled By O.E.M.
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvrP1.70:bd09/10/2019:svnToBeFilledByO.E.M.:pnToBeFilledByO.E.M.:pvrToBeFilledByO.E.M.:rvnASRock:rnX570Pro4:rvr:cvnToBeFilledByO.E.M.:ct3:cvrToBeFilledByO.E.M.:
dmi.product.family: To Be Filled By O.E.M.
dmi.product.name: To Be Filled By O.E.M.
dmi.product.sku: To Be Filled By O.E.M.
dmi.product.version: To Be Filled By O.E.M.
dmi.sys.vendor: To Be Filled By O.E.M.

Michael Crumpton (cwalv2) wrote:

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed

Michael Crumpton (cwalv2) wrote:

After rebooting, I'm still getting errors, even some time after boot.

Michael Crumpton (cwalv2) wrote:

Still an issue after a full apt upgrade, now on kernel 5.3.0-29-generic.

Lin Manfu (linmanfu) wrote:

I also have a 3400G with a freezing bug. Two questions to help establish whether it's the same bug:

> The system has been crashing (no response to mouse/keyboard, but the screen seems to still be updating).

When you say the screen is updating, what exactly do you mean, please? With my bug, the screen blanks to black, then returns to where it was (or to an earlier frame), and the mouse cursor still moves. But it's actually a graphics crash: clicking on buttons etc. has no effect, and if I use the keyboard to open a terminal (Ctrl+Alt+T) and reboot, it responds about half the time. This is consistent with what I see in syslog, which reports a series of amdgpu crashes and failed GPU resets.

> I see a lot of junk in dmesg about amdgpu.

Are you seeing messages like these, or something different?

[drm:amdgpu_dm_commit_planes.constprop.0 [amdgpu]] *ERROR* Waiting for fences timed out!
Apr 5 01:12:11 paul kernel: [ 285.638026] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=92956, emitted seq=92958
Apr 5 01:12:11 paul kernel: [ 285.638075] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process RLTM3.exe pid 3670 thread RLTM3.exe:cs0 pid 3700
Apr 5 01:12:11 paul kernel: [ 285.638078] amdgpu 0000:27:00.0: amdgpu: GPU reset begin!
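To compare logs between machines, it can help to count these messages rather than eyeball them. The following is a minimal Python sketch, assuming the message wording matches the lines quoted above (other kernel versions may phrase them differently). The function name summarize_amdgpu_hangs is hypothetical. For each "ring ... timeout" line it records the ring name plus the signaled and emitted sequence numbers. The gap between those two numbers is the count of submitted jobs that had not completed when the timeout fired.

```python
import re
from collections import Counter

# Patterns modelled on the amdgpu messages quoted in this report (assumption:
# other kernel versions may word them differently).
RING_TIMEOUT = re.compile(
    r"ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)
FENCE_TIMEOUT = "Waiting for fences timed out!"
RESET_BEGIN = "GPU reset begin!"


def summarize_amdgpu_hangs(lines):
    """Count amdgpu hang indicators in an iterable of syslog/dmesg lines.

    Returns (counts, timeouts): counts is a Counter keyed by message type,
    timeouts is a list of (ring, signaled_seq, emitted_seq) tuples.
    """
    counts = Counter()
    timeouts = []
    for line in lines:
        match = RING_TIMEOUT.search(line)
        if match:
            counts["ring_timeout"] += 1
            timeouts.append(
                (match["ring"], int(match["signaled"]), int(match["emitted"]))
            )
        if FENCE_TIMEOUT in line:
            counts["fence_timeout"] += 1
        if RESET_BEGIN in line:
            counts["reset_begin"] += 1
    return counts, timeouts
```

It can be run over a whole log, e.g. `summarize_amdgpu_hangs(open("/var/log/syslog", errors="replace"))`. If both reports show repeated ring gfx timeouts followed by reset attempts, that suggests, though it does not prove, the same underlying bug.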

It may help to know that many people with amdgpu bugs on Ryzen systems report that setting the kernel parameter iommu=pt fixes them, though it doesn't help in my case.
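On Ubuntu, a kernel parameter such as iommu=pt is normally added to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, followed by `sudo update-grub` and a reboot. The ProcKernelCmdLine in the apport data above does not contain it. After rebooting, you can confirm the parameter took effect by reading /proc/cmdline. The helpers below are a minimal, hypothetical Python sketch of that check. They split on whitespace only and ignore kernel-style quoting, which is enough for a command line like the one in this report.

```python
def kernel_params(cmdline):
    """Split a kernel command line into a {name: value} dict.

    Bare flags such as 'quiet' map to None. Only the first '=' separates
    name from value, so 'root=UUID=...' keeps 'UUID=...' as its value.
    Quoting is ignored and a repeated parameter keeps its last value.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def has_iommu_pt(cmdline):
    """Return True if the command line sets iommu=pt."""
    return kernel_params(cmdline).get("iommu") == "pt"
```

For example, `has_iommu_pt(open("/proc/cmdline").read())` should return True only after the edited GRUB configuration has been applied and the machine rebooted.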
