amdgpu driver failing repeatedly

Bug #2018286 reported by Heinrich Schuchardt
This bug affects 2 people
Affects: linux (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

From time to time my KDE desktop becomes unresponsive. In the kernel log I find errors like the following repeating incessantly:

03:34.977226 kernel: [ 8355.577322] [drm] PTB located at 0x000000F400A00000
03:34.977232 kernel: [ 8355.577356] [drm] PSP is resuming...
03:34.996756 kernel: [ 8355.597239] [drm] reserve 0x400000 from 0xf47fc00000 for PSP TMR
03:35.204496 kernel: [ 8355.805718] amdgpu 0000:05:00.0: amdgpu: RAS: optional ras ta ucode is not available
03:35.236504 kernel: [ 8355.836957] amdgpu 0000:05:00.0: amdgpu: RAP: optional rap ta ucode is not available
03:35.236523 kernel: [ 8355.836962] amdgpu 0000:05:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
03:35.876947 kernel: [ 8356.474483] [drm] kiq ring mec 2 pipe 1 q 0
03:36.168521 kernel: [ 8356.767847] amdgpu 0000:05:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
03:36.168545 kernel: [ 8356.768191] [drm:amdgpu_gfx_enable_kcq [amdgpu]] *ERROR* KCQ enable failed
03:36.168552 kernel: [ 8356.768542] [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <gfx_v9_0> failed -110
03:36.168554 kernel: [ 8356.768895] amdgpu 0000:05:00.0: amdgpu: GPU reset(179) failed
03:36.168559 kernel: [ 8356.769009] amdgpu 0000:05:00.0: amdgpu: GPU reset end with ret = -110
03:36.168565 kernel: [ 8356.769017] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* GPU Recovery Failed: -110
03:46.328574 kernel: [ 8366.926277] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=19364, emitted seq=19366
03:46.328593 kernel: [ 8366.927059] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process pid 0 thread pid 0
03:46.328599 kernel: [ 8366.927731] amdgpu 0000:05:00.0: amdgpu: GPU reset begin!
03:46.501074 kernel: [ 8367.098277] amdgpu 0000:05:00.0: amdgpu: GPU reset succeeded, trying to resume
03:46.501094 kernel: [ 8367.098806] [drm] PCIE GART of 1024M enabled.
03:46.501099 kernel: [ 8367.098811] [drm] PTB located at 0x000000F400A00000
03:46.501109 kernel: [ 8367.098848] [drm] PSP is resuming...
03:46.521070 kernel: [ 8367.118736] [drm] reserve 0x400000 from 0xf47fc00000 for PSP TMR
03:46.728491 kernel: [ 8367.326740] amdgpu 0000:05:00.0: amdgpu: RAS: optional ras ta ucode is not available
03:46.760475 kernel: [ 8367.358158] amdgpu 0000:05:00.0: amdgpu: RAP: optional rap ta ucode is not available
03:46.760482 kernel: [ 8367.358161] amdgpu 0000:05:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
03:47.384498 kernel: [ 8367.985147] [drm] kiq ring mec 2 pipe 1 q 0
03:47.680540 kernel: [ 8368.279937] amdgpu 0000:05:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
03:47.680561 kernel: [ 8368.280273] [drm:amdgpu_gfx_enable_kcq [amdgpu]] *ERROR* KCQ enable failed
03:47.680565 kernel: [ 8368.280618] [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <gfx_v9_0> failed -110
03:47.680570 kernel: [ 8368.280976] amdgpu 0000:05:00.0: amdgpu: GPU reset(180) failed
03:47.680575 kernel: [ 8368.281055] amdgpu 0000:05:00.0: amdgpu: GPU reset end with ret = -110
03:47.680580 kernel: [ 8368.281058] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* GPU Recovery Failed: -110
03:57.848521 kernel: [ 8378.446230] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=19366, emitted seq=19367
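
The excerpt above shows one cycle repeating: an sdma0 ring timeout, a GPU reset that "succeeds", a failed KIQ ring test on resume, and a failed recovery. The recurring -110 is the Linux errno value for ETIMEDOUT. To gauge how often the cycle repeats in a longer log, the lines can be summarised with a short script. This is only a minimal sketch and was not part of the original report: the function name `summarize_resets` and the regular expressions are assumptions derived from the message formats quoted above, and other kernel versions may word these messages differently.

```python
import re
from collections import Counter

# Patterns are assumptions based on the log lines quoted in this report.
RESET_FAILED = re.compile(r"amdgpu: GPU reset\((\d+)\) failed")
RING_TIMEOUT = re.compile(
    r"ring (\S+) timeout, signaled seq=(\d+), emitted seq=(\d+)"
)
# Matches a trailing negative error code on *ERROR* lines, e.g. "(-110)" or "-110".
ERROR_CODE = re.compile(r"\*ERROR\* .*?(-\d+)\)?$")


def summarize_resets(lines):
    """Collect failed reset counters, ring timeouts, and error-code counts."""
    failed_resets = []   # value N from "GPU reset(N) failed"
    timeouts = []        # (ring name, signaled seq, emitted seq)
    codes = Counter()    # error code -> number of *ERROR* lines ending in it
    for line in lines:
        line = line.rstrip()
        m = RESET_FAILED.search(line)
        if m:
            failed_resets.append(int(m.group(1)))
        m = RING_TIMEOUT.search(line)
        if m:
            timeouts.append((m.group(1), int(m.group(2)), int(m.group(3))))
        m = ERROR_CODE.search(line)
        if m:
            codes[int(m.group(1))] += 1
    return failed_resets, timeouts, codes


# A few lines copied from the excerpt above (journal timestamps omitted).
sample = [
    "[ 8356.767847] amdgpu 0000:05:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)",
    "[ 8356.768895] amdgpu 0000:05:00.0: amdgpu: GPU reset(179) failed",
    "[ 8366.926277] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=19364, emitted seq=19366",
    "[ 8368.280976] amdgpu 0000:05:00.0: amdgpu: GPU reset(180) failed",
]

resets, timeouts, codes = summarize_resets(sample)
print(resets, timeouts, dict(codes))
```

In practice the input could be the output of `journalctl -k` or `dmesg` saved to a file and passed in line by line. A steadily increasing reset counter, as in 179 followed by 180 here, indicates that the driver keeps retrying without ever recovering.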

ProblemType: Bug
DistroRelease: Ubuntu 23.10
Package: linux-image-6.2.0-21-generic 6.2.0-21.21
ProcVersionSignature: Ubuntu 6.2.0-21.21-generic 6.2.6
Uname: Linux 6.2.0-21-generic x86_64
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
ApportVersion: 2.26.1-0ubuntu2
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC1: zfsdt 3122 F.... wireplumber
 /dev/snd/controlC0: zfsdt 3122 F.... wireplumber
 /dev/snd/seq: zfsdt 3115 F.... pipewire
CasperMD5CheckResult: pass
CurrentDesktop: KDE
Date: Tue May 2 11:13:11 2023
InstallationDate: Installed on 2021-07-01 (669 days ago)
InstallationMedia: Kubuntu 21.04 "Hirsute Hippo" - Release amd64 (20210420)
Lsusb:
 Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
 Bus 002 Device 003: ID 04f2:b604 Chicony Electronics Co., Ltd Integrated Camera (1280x720@30)
 Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 Bus 003 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
MachineType: LENOVO 20KV0008GE
ProcFB: 0 amdgpudrmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-6.2.0-21-generic root=/dev/mapper/vgubuntu-root ro
PulseList: Error: command ['pacmd', 'list'] failed with exit code 1: No PulseAudio daemon running, or not running as session daemon.
RelatedPackageVersions:
 linux-restricted-modules-6.2.0-21-generic N/A
 linux-backports-modules-6.2.0-21-generic N/A
 linux-firmware 20230323.gitbcdcfbcf-0ubuntu1
RfKill:
 1: phy0: Wireless LAN
  Soft blocked: no
  Hard blocked: no
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 02/08/2023
dmi.bios.release: 1.63
dmi.bios.vendor: LENOVO
dmi.bios.version: R0UET83W (1.63 )
dmi.board.asset.tag: Not Available
dmi.board.name: 20KV0008GE
dmi.board.vendor: LENOVO
dmi.board.version: SDK0J40697 WIN
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: None
dmi.ec.firmware.release: 1.63
dmi.modalias: dmi:bvnLENOVO:bvrR0UET83W(1.63):bd02/08/2023:br1.63:efr1.63:svnLENOVO:pn20KV0008GE:pvrThinkPadE585:rvnLENOVO:rn20KV0008GE:rvrSDK0J40697WIN:cvnLENOVO:ct10:cvrNone:skuLENOVO_MT_20KV_BU_Think_FM_ThinkPadE585:
dmi.product.family: ThinkPad E585
dmi.product.name: 20KV0008GE
dmi.product.sku: LENOVO_MT_20KV_BU_Think_FM_ThinkPad E585
dmi.product.version: ThinkPad E585
dmi.sys.vendor: LENOVO

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux (Ubuntu):
status: New → Confirmed