amdgpu kernel errors

Bug #1955049 reported by pigs-on-the-swim
This bug report is a duplicate of: Bug #1956845: amdgpu: [gfxhub0] retry page fault.
This bug affects 1 person
Affects: linux (Ubuntu)
Status: Invalid
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

System crashes about 3 times a day, with no apparent relation to any application, process, or triggering situation. It happens constantly; the time between incidents is random, normally several hours. This has been a problem for months now.
Description: Ubuntu 21.04
Release: 21.04

GUI crashed: operation ceases and the monitor goes dark. In this case the monitor shows a static picture after some seconds of darkness.
Remote access via SSH is still possible to some extent.

Expected: GUI does not crash.

dmesg says:

[ 2284.101478] [drm:amdgpu_dm_commit_planes [amdgpu]] *ERROR* Waiting for fences timed out!
[ 2289.993462] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=132409, emitted seq=132410
[ 2289.993826] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process pid 0 thread pid 0
[ 2289.994175] amdgpu 0000:0a:00.0: amdgpu: GPU reset begin!
[ 2290.169982] [drm] free PSP TMR buffer
[ 2290.198611] amdgpu 0000:0a:00.0: amdgpu: MODE2 reset
[ 2290.198656] amdgpu 0000:0a:00.0: amdgpu: GPU reset succeeded, trying to resume
[ 2290.198943] [drm] PCIE GART of 1024M enabled.
[ 2290.198945] [drm] PTB located at 0x000000F400900000
[ 2290.199115] [drm] PSP is resuming...
[ 2290.218996] [drm] reserve 0x400000 from 0xf457000000 for PSP TMR
[ 2290.262318] amdgpu 0000:0a:00.0: amdgpu: RAS: optional ras ta ucode is not available
[ 2290.266897] amdgpu 0000:0a:00.0: amdgpu: RAP: optional rap ta ucode is not available
[ 2290.266899] amdgpu 0000:0a:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[ 2290.468265] [drm] kiq ring mec 2 pipe 1 q 0
[ 2290.602777] [drm] VCN decode and encode initialized successfully(under SPG Mode).
[ 2290.602784] amdgpu 0000:0a:00.0: amdgpu: ring gfx uses VM inv eng 0 on hub 0
[ 2290.602787] amdgpu 0000:0a:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[ 2290.602788] amdgpu 0000:0a:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[ 2290.602790] amdgpu 0000:0a:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[ 2290.602791] amdgpu 0000:0a:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[ 2290.602792] amdgpu 0000:0a:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[ 2290.602793] amdgpu 0000:0a:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[ 2290.602793] amdgpu 0000:0a:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[ 2290.602795] amdgpu 0000:0a:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[ 2290.602796] amdgpu 0000:0a:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
[ 2290.602797] amdgpu 0000:0a:00.0: amdgpu: ring sdma0 uses VM inv eng 0 on hub 1
[ 2290.602798] amdgpu 0000:0a:00.0: amdgpu: ring vcn_dec uses VM inv eng 1 on hub 1
[ 2290.602798] amdgpu 0000:0a:00.0: amdgpu: ring vcn_enc0 uses VM inv eng 4 on hub 1
[ 2290.602799] amdgpu 0000:0a:00.0: amdgpu: ring vcn_enc1 uses VM inv eng 5 on hub 1
[ 2290.602800] amdgpu 0000:0a:00.0: amdgpu: ring jpeg_dec uses VM inv eng 6 on hub 1
[ 2290.609397] amdgpu 0000:0a:00.0: amdgpu: recover vram bo from shadow start
[ 2290.609400] amdgpu 0000:0a:00.0: amdgpu: recover vram bo from shadow done
[ 2290.609464] amdgpu 0000:0a:00.0: amdgpu: GPU reset(1) succeeded!
[ 2290.609926] kfd kfd: amdgpu: Allocated 3969056 bytes on gart
[ 2290.609979] kfd kfd: amdgpu: error getting iommu info. is the iommu enabled?
[ 2290.609981] kfd kfd: amdgpu: Error initializing iommuv2
[ 2290.610175] kfd kfd: amdgpu: device 1002:15d8 NOT added due to errors
[ 2317.591058] rfkill: input handler enabled
[ 2412.382743] show_signal_msg: 29 callbacks suppressed
[ 2412.382746] apport-gtk[16593]: segfault at 18 ip 00007f7f7e90e194 sp 00007fff2ca56790 error 4 in libgtk-3.so.0.2404.21[7f7f7e805000+385000]
[ 2412.382753] Code: c4 08 5b 5d c3 90 f3 0f 1e fa 48 8b 7f 10 48 85 ff 74 0b e9 ce c6 ff ff 66 0f 1f 44 00 00 48 83 ec 08 48 89 d7 e8 8c 3c 17 00 <48> 8b 40 18 48 8b 78 10 e8 df 03 09 00 48 83 c4 08 48 89 c7 e9 a3
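
For anyone triaging a similar log, here is a rough sketch of how the amdgpu `*ERROR*` lines and GPU reset events could be pulled out of saved dmesg text. The function name `summarize_amdgpu` and the `SAMPLE` excerpt are illustrative only (the sample lines are copied from the log above); this is not part of any Ubuntu or kernel tooling.

```python
import re

# Matches a dmesg line of the form "[ 2289.993462] message".
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def summarize_amdgpu(dmesg_text):
    """Return (errors, resets) extracted from dmesg text.

    errors: list of (timestamp, message) for amdgpu lines containing *ERROR*
    resets: list of timestamps at which "GPU reset begin!" was logged
    """
    errors, resets = [], []
    for raw in dmesg_text.splitlines():
        m = LINE_RE.match(raw.strip())
        if not m:
            continue  # skip lines without a dmesg timestamp
        ts, msg = float(m.group(1)), m.group(2)
        if "amdgpu" not in msg:
            continue
        if "*ERROR*" in msg:
            errors.append((ts, msg))
        if "GPU reset begin!" in msg:
            resets.append(ts)
    return errors, resets


# Excerpt copied from the dmesg output in this report.
SAMPLE = """\
[ 2284.101478] [drm:amdgpu_dm_commit_planes [amdgpu]] *ERROR* Waiting for fences timed out!
[ 2289.993462] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=132409, emitted seq=132410
[ 2289.994175] amdgpu 0000:0a:00.0: amdgpu: GPU reset begin!
[ 2290.609464] amdgpu 0000:0a:00.0: amdgpu: GPU reset(1) succeeded!
"""
```

Calling `summarize_amdgpu(SAMPLE)` separates the two timeout errors from the single reset start; feeding it the full output of `dmesg` over several crashes would show how often resets occur and which ring timed out each time.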

ProblemType: Bug
DistroRelease: Ubuntu 21.04
Package: xorg 1:7.7+22ubuntu1
Uname: Linux 5.14.0-051400drmtip20210904-generic x86_64
.tmp.unity_support_test.1:

ApportVersion: 2.20.11-0ubuntu65.4
Architecture: amd64
BootLog:

CasperMD5CheckResult: unknown
CompizPlugins: No value set for `/apps/compiz-1/general/screen0/options/active_plugins'
CompositorRunning: None
Date: Thu Dec 16 14:34:43 2021
DistUpgraded: Fresh install
DistroCodename: hirsute
DistroVariant: ubuntu
DkmsStatus:
 vhba, 20211023, 5.11.0-42-generic, x86_64: installed
 vhba, 20211023, 5.11.0-44-generic, x86_64: installed
ExtraDebuggingInterest: Yes
GraphicsCard:
 Advanced Micro Devices, Inc. [AMD/ATI] Picasso [1002:15d8] (rev c8) (prog-if 00 [VGA controller])
   Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Picasso [1002:15d8]
InstallationDate: Installed on 2020-12-16 (365 days ago)
InstallationMedia: Ubuntu 16.04.7 LTS "Xenial Xerus" - Release amd64 (20200806)
MachineType: To Be Filled By O.E.M. To Be Filled By O.E.M.
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-5.14.0-051400drmtip20210904-generic root=UUID=54c9f67c-180d-4545-8531-aef138dc5e0a ro
SourcePackage: xorg
Symptom: display
Title: Xorg crash
UpgradeStatus: No upgrade log present (probably fresh install)
XorgConf:
 Section "InputClass"
     Identifier "middle button emulation class"
     MatchIsPointer "on"
     Option "Emulate3Buttons" "on"
 EndSection
dmi.bios.date: 06/18/2020
dmi.bios.release: 5.14
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: P4.20
dmi.board.name: B450 Pro4
dmi.board.vendor: ASRock
dmi.chassis.asset.tag: To Be Filled By O.E.M.
dmi.chassis.type: 3
dmi.chassis.vendor: To Be Filled By O.E.M.
dmi.chassis.version: To Be Filled By O.E.M.
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvrP4.20:bd06/18/2020:br5.14:svnToBeFilledByO.E.M.:pnToBeFilledByO.E.M.:pvrToBeFilledByO.E.M.:skuToBeFilledByO.E.M.:rvnASRock:rnB450Pro4:rvr:cvnToBeFilledByO.E.M.:ct3:cvrToBeFilledByO.E.M.:
dmi.product.family: To Be Filled By O.E.M.
dmi.product.name: To Be Filled By O.E.M.
dmi.product.sku: To Be Filled By O.E.M.
dmi.product.version: To Be Filled By O.E.M.
dmi.sys.vendor: To Be Filled By O.E.M.
mtime.conffile..etc.apport.crashdb.conf: 2021-08-13T12:52:00.669929
version.compiz: compiz 1:0.9.14.1+20.10.20200813-0ubuntu4
version.libdrm2: libdrm2 2.4.105-3~21.04.1
version.libgl1-mesa-dri: libgl1-mesa-dri 21.0.3-0ubuntu0.3
version.libgl1-mesa-glx: libgl1-mesa-glx 21.0.3-0ubuntu0.3
version.xserver-xorg-core: xserver-xorg-core 2:1.20.11-1ubuntu1.2
version.xserver-xorg-input-evdev: xserver-xorg-input-evdev 1:2.10.6-2build1
version.xserver-xorg-video-ati: xserver-xorg-video-ati 1:19.1.0-2
version.xserver-xorg-video-intel: xserver-xorg-video-intel 2:2.99.917+git20200714-1ubuntu1
version.xserver-xorg-video-nouveau: xserver-xorg-video-nouveau 1:1.0.17-1
xserver.bootTime: Thu Dec 16 14:32:53 2021
xserver.configfile: /etc/X11/xorg.conf
xserver.errors: AMDGPU(0): Failed to make import prime FD as pixmap: 22
xserver.logfile: /var/log/Xorg.0.log
xserver.version: 2:1.20.11-1ubuntu1.2
xserver.video_driver: amdgpu

pigs-on-the-swim (dw1-4) wrote :
Daniel van Vugt (vanvugt) wrote :

You are using an unsupported kernel. Please start by trying an official Ubuntu kernel or a newer mainline kernel.
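
As a rough illustration of the point above, the kernel release string can be checked with a simple heuristic. This sketch assumes that Ubuntu mainline and drm-tip test builds embed a zero-padded six-digit version (such as `051400`) after the first hyphen, while official Ubuntu kernels use a short ABI number (such as `42`). The helper name `looks_like_mainline_build` is hypothetical, and the pattern is a guess based on the two release strings seen in this report, not an official rule.

```python
import re

# Assumption: test builds look like "5.14.0-051400drmtip20210904-generic",
# i.e. six digits plus an optional tag after the first hyphen.
# Official Ubuntu kernels look like "5.11.0-42-generic".
MAINLINE_RE = re.compile(r"^\d+\.\d+\.\d+-\d{6}[a-z0-9]*-")


def looks_like_mainline_build(release):
    """Heuristically decide whether a uname release string is a mainline/test build."""
    return bool(MAINLINE_RE.match(release))
```

On a live system the string to test can be obtained with `platform.release()` from the Python standard library, which returns the same value as `uname -r`.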

summary: - Xorg crash
+ amdgpu kernel errors
affects: xorg (Ubuntu) → linux (Ubuntu)
Changed in linux (Ubuntu):
status: New → Invalid