[amdgpu] Screen freezes ([drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout)

Bug #1949497 reported by Semih
This bug affects 4 people
Affects          Status     Importance  Assigned to  Milestone
linux (Ubuntu)   Confirmed  Undecided   Unassigned

Bug Description

When I try to run anything on my dGPU (Radeon R5 M430), the application freezes.

But when I set "power_dpm_state" to "battery", or add amdgpu.dpm=0 to the boot options, it runs fine.
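Both workarounds can be applied from a shell. This is a minimal sketch, assuming the dGPU is exposed as `card1` (on hybrid Intel + AMD laptops the i915 iGPU is often `card0`, but the index can differ); the helper name `set_dpm_state` is hypothetical. `power_dpm_state` is the legacy DPM sysfs attribute and accepts `battery`, `balanced` or `performance`; writing it requires root.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a legacy DPM state for one DRM device.
# $1 = device directory, e.g. /sys/class/drm/card1/device (index assumed)
# $2 = battery | balanced | performance
set_dpm_state() {
  local dev="$1" state="$2"
  case "$state" in
    battery|balanced|performance) ;;
    *) echo "invalid DPM state: $state" >&2; return 1 ;;
  esac
  # On a real sysfs node the kernel only accepts this write from root.
  printf '%s\n' "$state" > "$dev/power_dpm_state"
}

# Usage (as root): set_dpm_state /sys/class/drm/card1/device battery
#
# To disable DPM entirely instead, add amdgpu.dpm=0 to
# GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, run
# `sudo update-grub`, and reboot.
```

The sysfs setting is lost on reboot, while the boot parameter persists; that is why the two are worth keeping apart when testing.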

I get errors like this:

[ 56.647827] [drm] PCIE gen 3 link speeds already enabled
[ 56.649064] amdgpu 0000:01:00.0: amdgpu: PCIE GART of 256M enabled (table at 0x000000F400000000).
[ 83.707076] [drm] PCIE gen 3 link speeds already enabled
[ 83.708350] amdgpu 0000:01:00.0: amdgpu: PCIE GART of 256M enabled (table at 0x000000F400000000).
[ 94.813212] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=47, emitted seq=49
[ 94.814219] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process glmark2 pid 4471 thread glmark2:cs0 pid 4473
[ 94.815125] amdgpu 0000:01:00.0: amdgpu: GPU recovery disabled.
[ 95.069353] Asynchronous wait on fence 0000:00:02.0:gnome-shell[1648]:3c6 timed out (hint:intel_atomic_commit_ready [i915])
[ 95.069353] Asynchronous wait on fence dma_fence_chain:unbound:1 timed out (hint:submit_notify [i915])
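In the `ring gfx timeout` line, `signaled seq` is the last fence sequence number the GPU completed on the gfx ring and `emitted seq` is the last one submitted, so the difference is the number of fences (roughly, submitted jobs) still outstanding when the timeout fired: 2 in every amdgpu log in this report. A minimal sketch that extracts that gap from dmesg text; the function name `pending_fences` is hypothetical and it assumes the exact message format shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read dmesg-style text on stdin and print, for every
# amdgpu "ring gfx timeout" line, how many emitted fences never signaled.
pending_fences() {
  sed -n 's/.*ring gfx timeout, signaled seq=\([0-9]*\), emitted seq=\([0-9]*\).*/\1 \2/p' |
    while read -r signaled emitted; do
      echo $((emitted - signaled))
    done
}

# Usage: sudo dmesg | pending_fences
```

Non-matching lines (such as the PCIE GART messages) produce no output, so the helper can be fed a whole kernel log.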

ProblemType: Bug
DistroRelease: Ubuntu 21.10
Package: xorg 1:7.7+22ubuntu2
Uname: Linux 5.15.0-051500-generic x86_64
ApportVersion: 2.20.11-0ubuntu71
Architecture: amd64
BootLog: Error: [Errno 13] Permission denied: '/var/log/boot.log'
CasperMD5CheckResult: pass
CompositorRunning: None
CurrentDesktop: ubuntu:GNOME
Date: Tue Nov 2 19:26:01 2021
DistUpgraded: Fresh install
DistroCodename: impish
DistroVariant: ubuntu
DkmsStatus:
 v4l2loopback, 0.12.5, 5.13.0-20-generic, x86_64: installed
 v4l2loopback, 0.12.5, 5.15.0-051500-generic, x86_64: installed
ExtraDebuggingInterest: Yes, if not too technical
GpuHangFrequency: Continuously
GpuHangReproducibility: Yes, I can easily reproduce it
GpuHangStarted: Since before I upgraded
GraphicsCard:
 Intel Corporation HD Graphics 620 [8086:5916] (rev 02) (prog-if 00 [VGA controller])
   Subsystem: Hewlett-Packard Company HD Graphics 620 [103c:81ee]
   Subsystem: Hewlett-Packard Company Sun XT [Radeon HD 8670A/8670M/8690M / R5 M330 / M430 / Radeon 520 Mobile] [103c:81ee]
InstallationDate: Installed on 2021-10-29 (4 days ago)
InstallationMedia: Ubuntu 21.10 "Impish Indri" - Release amd64 (20211012)
MachineType: HP HP Notebook
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=tr_TR.UTF-8
 SHELL=/bin/bash
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.15.0-051500-generic root=UUID=51509311-ce8a-48fb-9ece-8e018d63fe03 ro quiet splash radeon.si_support=0 amdgpu.si_support=1 vt.handoff=7
SourcePackage: xorg
Symptom: display
Title: Xorg freeze
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 06/26/2020
dmi.bios.release: 15.49
dmi.bios.vendor: Insyde
dmi.bios.version: F.49
dmi.board.asset.tag: Type2 - Board Asset Tag
dmi.board.name: 81EE
dmi.board.vendor: HP
dmi.board.version: 62.29
dmi.chassis.type: 10
dmi.chassis.vendor: HP
dmi.chassis.version: Chassis Version
dmi.ec.firmware.release: 62.29
dmi.modalias: dmi:bvnInsyde:bvrF.49:bd06/26/2020:br15.49:efr62.29:svnHP:pnHPNotebook:pvrType1ProductConfigId:rvnHP:rn81EE:rvr62.29:cvnHP:ct10:cvrChassisVersion:skuX9Z24EA#AB8:
dmi.product.family: 103C_5335KV HP Notebook
dmi.product.name: HP Notebook
dmi.product.sku: X9Z24EA#AB8
dmi.product.version: Type1ProductConfigId
dmi.sys.vendor: HP
version.compiz: compiz N/A
version.libdrm2: libdrm2 2.4.107-8ubuntu1
version.libgl1-mesa-dri: libgl1-mesa-dri 21.2.2-1ubuntu1
version.libgl1-mesa-glx: libgl1-mesa-glx N/A
version.xserver-xorg-core: xserver-xorg-core 2:1.20.13-1ubuntu1
version.xserver-xorg-input-evdev: xserver-xorg-input-evdev N/A
version.xserver-xorg-video-ati: xserver-xorg-video-ati 1:19.1.0-2build1
version.xserver-xorg-video-intel: xserver-xorg-video-intel 2:2.99.917+git20200714-1ubuntu2
version.xserver-xorg-video-nouveau: xserver-xorg-video-nouveau 1:1.0.17-1build1

Revision history for this message
Semih (kalegger) wrote :
summary: - AMD R5 M430 dGPU freezes
+ [amdgpu] AMD R5 M430 dGPU freezes
affects: xorg (Ubuntu) → linux (Ubuntu)
tags: added: amdgpu
tags: added: hybrid i915
Revision history for this message
Daniel van Vugt (vanvugt) wrote : Re: [amdgpu] AMD R5 M430 dGPU freezes

Thanks for the bug report. Please try each of these separately:

------------------------------------------------------
Edit /etc/environment and add:

  MUTTER_DEBUG_ENABLE_ATOMIC_KMS=0

then reboot.
------------------------------------------------------
Select 'Ubuntu on Xorg' on the login screen.
------------------------------------------------------
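The first suggestion can be scripted so that repeated runs don't duplicate the line. A minimal sketch; the helper name `add_env_line` is hypothetical, and the file path is a parameter so it can be tried on a scratch copy before touching /etc/environment (which needs root).

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a KEY=VALUE line to an environment file once.
# $1 = file (normally /etc/environment), $2 = line to add
add_env_line() {
  local file="$1" line="$2"
  # -q: quiet, -x: whole-line match, -F: fixed string (no regex)
  if ! grep -qxF "$line" "$file" 2>/dev/null; then
    printf '%s\n' "$line" >> "$file"
  fi
}

# Usage (as root), then reboot:
#   add_env_line /etc/environment MUTTER_DEBUG_ENABLE_ATOMIC_KMS=0
```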

Changed in linux (Ubuntu):
status: New → Incomplete
summary: - [amdgpu] AMD R5 M430 dGPU freezes
+ [amdgpu] AMD R5 M430 dGPU freezes ([drm:amdgpu_job_timedout [amdgpu]]
+ *ERROR* ring gfx timeout)
Revision history for this message
Semih (kalegger) wrote : Re: [amdgpu] AMD R5 M430 dGPU freezes ([drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout)

"MUTTER_DEBUG_ENABLE_ATOMIC_KMS=0" and "Ubuntu on Xorg" did not change anything. It gave the same error.

with "MUTTER_DEBUG_ENABLE_ATOMIC_KMS=0":

[ 58.240910] [drm] PCIE gen 3 link speeds already enabled
[ 58.242162] amdgpu 0000:01:00.0: amdgpu: PCIE GART of 256M enabled (table at 0x000000F400000000).
[ 78.052145] [drm] PCIE gen 3 link speeds already enabled
[ 78.053444] amdgpu 0000:01:00.0: amdgpu: PCIE GART of 256M enabled (table at 0x000000F400000000).
[ 89.004997] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=45, emitted seq=47
[ 89.005811] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process glmark2 pid 3161 thread glmark2:cs0 pid 3162
[ 89.006531] amdgpu 0000:01:00.0: amdgpu: GPU recovery disabled.
[ 89.261947] Asynchronous wait on fence dma_fence_chain:unbound:1 timed out (hint:submit_notify [i915])
[ 89.262349] Asynchronous wait on fence dma_fence_chain:unbound:1 timed out (hint:submit_notify [i915])

with "Ubuntu on Xorg":

[ 38.254745] [drm] PCIE gen 3 link speeds already enabled
[ 38.255880] amdgpu 0000:01:00.0: amdgpu: PCIE GART of 256M enabled (table at 0x000000F400000000).
[ 68.697606] [drm] PCIE gen 3 link speeds already enabled
[ 68.698896] amdgpu 0000:01:00.0: amdgpu: PCIE GART of 256M enabled (table at 0x000000F400000000).
[ 80.279529] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=49, emitted seq=51
[ 80.280382] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process glmark2 pid 3198 thread glmark2:cs0 pid 3199
[ 80.281144] amdgpu 0000:01:00.0: amdgpu: GPU recovery disabled.
[ 81.047412] Asynchronous wait on fence 0000:00:02.0:gnome-shell[1686]:666 timed out (hint:intel_atomic_commit_ready [i915])
[ 81.047412] Asynchronous wait on fence dma_fence_chain:unbound:1 timed out (hint:submit_notify [i915])
[ 81.047915] Asynchronous wait on fence dma_fence_chain:unbound:1 timed out (hint:submit_notify [i915])

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Sorry, I just noticed you are not using an Ubuntu kernel, so we can't track this bug here unless you find that the same issue occurs with the Ubuntu 5.13 kernel.

Changed in linux (Ubuntu):
status: Incomplete → New
status: New → Invalid
Revision history for this message
Semih (kalegger) wrote :

The problem occurs with the Ubuntu 5.13 kernel too. I was testing the mainline kernel to see whether the problem had been fixed.

Error messages with the 5.13 kernel:

[ 634.985891] [drm] PCIE gen 3 link speeds already enabled
[ 634.987228] amdgpu 0000:01:00.0: amdgpu: PCIE GART of 256M enabled (table at 0x000000F400000000).
[ 645.955488] Asynchronous wait on fence 0000:00:02.0:gnome-shell[1627]:5ad2 timed out (hint:intel_atomic_commit_ready [i915])
[ 645.955508] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=50, emitted seq=52
[ 645.955643] Asynchronous wait on fence drm_sched:gfx:27 timed out (hint:submit_notify [i915])
[ 645.955770] Asynchronous wait on fence drm_sched:gfx:25 timed out (hint:submit_notify [i915])
[ 645.955865] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process glmark2 pid 5277 thread glmark2:cs0 pid 5278
[ 645.956123] amdgpu 0000:01:00.0: amdgpu: GPU recovery disabled.

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Thanks. It looks like several amdgpu "ring gfx timeout" issues have been reported upstream to the developers:

https://gitlab.freedesktop.org/drm/amd/-/issues?scope=all&state=opened&search=%22ring+gfx+timeout%22

Changed in linux (Ubuntu):
status: Invalid → New
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
summary: - [amdgpu] AMD R5 M430 dGPU freezes ([drm:amdgpu_job_timedout [amdgpu]]
- *ERROR* ring gfx timeout)
+ [amdgpu] Screen freezes ([drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring
+ gfx timeout)
Revision history for this message
TMiguelT (michael-r-milton) wrote :

Do we know if this is a kernel issue? Will downgrading the kernel fix it in the short term?

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Yes, it appears to be a kernel issue. I don't know if downgrading (or upgrading) will fix it, but that's definitely worth a try.

See also https://gitlab.freedesktop.org/drm/amd/-/issues?scope=all&state=opened&search=%22ring+gfx+timeout%22

Revision history for this message
TMiguelT (michael-r-milton) wrote :

Thanks.

Something I should note (here, because I don't know where else to put it) is that both my graphics card and the original poster's use the GCN 1.0 architecture (according to https://en.wikipedia.org/wiki/Radeon_Rx_200_series#Mobile_models).

This is the architecture that has only experimental support in the amdgpu driver: https://www.phoronix.com/scan.php?page=news_item&px=AMDGPU-Might-Drop-GCN-1.0. In my case I had to set a custom kernel parameter just to enable it, which I did to get Vulkan support. I suspect the OP has done the same here. So it's still a new problem, which is bad, but I suspect that if I disable amdgpu and go back to radeon, the problem will go away (in any kernel version).
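On GCN 1.0 (Southern Islands) parts, amdgpu is only used when SI support is moved from radeon to amdgpu on the kernel command line; the reporter's ProcKernelCmdLine above does contain `radeon.si_support=0 amdgpu.si_support=1`. A minimal sketch that reports which driver such a command line selects; the function name `si_driver` is hypothetical, and it only inspects these two parameters (it ignores module blacklists, modprobe.d options and kernel build defaults).

```shell
#!/usr/bin/env bash
# Hypothetical helper: given a kernel command line, report which driver is
# asked to handle Southern Islands (GCN 1.0) GPUs.
si_driver() {
  local cmdline="$1" radeon_si="" amdgpu_si="" arg
  local -a args
  read -ra args <<< "$cmdline"   # split on whitespace without globbing
  for arg in "${args[@]}"; do
    case "$arg" in
      radeon.si_support=*) radeon_si="${arg#*=}" ;;
      amdgpu.si_support=*) amdgpu_si="${arg#*=}" ;;
    esac
  done
  if [ "$amdgpu_si" = "1" ] && [ "$radeon_si" = "0" ]; then
    echo amdgpu
  else
    # Simplification: anything else is treated as the radeon default
    # (assumed for the kernels discussed in this report).
    echo radeon
  fi
}

# Usage: si_driver "$(cat /proc/cmdline)"
```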

Revision history for this message
Semih (kalegger) wrote :

I'm experiencing the problem with the radeon driver too. The only way to run without any errors is setting power_dpm_state to "battery" (or setting power_dpm_force_performance_level to "low").

It gives a GPU fault error:

[ 173.274091] radeon 0000:01:00.0: WB enabled
[ 173.274093] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000080000c00
[ 173.274095] radeon 0000:01:00.0: fence driver on ring 1 use gpu addr 0x0000000080000c04
[ 173.274096] radeon 0000:01:00.0: fence driver on ring 2 use gpu addr 0x0000000080000c08
[ 173.274097] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000080000c0c
[ 173.274098] radeon 0000:01:00.0: fence driver on ring 4 use gpu addr 0x0000000080000c10
[ 173.274642] debugfs: File 'radeon_ring_gfx' in directory '0' already present!
[ 173.274647] debugfs: File 'radeon_ring_cp1' in directory '0' already present!
[ 173.274649] debugfs: File 'radeon_ring_cp2' in directory '0' already present!
[ 173.274650] debugfs: File 'radeon_ring_dma1' in directory '0' already present!
[ 173.274653] debugfs: File 'radeon_ring_dma2' in directory '0' already present!
[ 173.469429] [drm] ring test on 0 succeeded in 1 usecs
[ 173.469435] [drm] ring test on 1 succeeded in 1 usecs
[ 173.469439] [drm] ring test on 2 succeeded in 1 usecs
[ 173.469448] [drm] ring test on 3 succeeded in 4 usecs
[ 173.469454] [drm] ring test on 4 succeeded in 3 usecs
[ 173.469488] [drm] ib test on ring 0 succeeded in 0 usecs
[ 173.469516] [drm] ib test on ring 1 succeeded in 0 usecs
[ 173.469542] [drm] ib test on ring 2 succeeded in 0 usecs
[ 173.469556] [drm] ib test on ring 3 succeeded in 0 usecs
[ 173.469568] [drm] ib test on ring 4 succeeded in 0 usecs
[ 173.690183] radeon 0000:01:00.0: GPU fault detected: 146 0x0aa31014
[ 173.690189] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00100DD5
[ 173.690191] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x03010014
[ 173.690192] VM fault (0x04, vmid 1) at page 1052117, write from CB (16)
[ 184.196457] Asynchronous wait on fence radeon:radeon.gfx:28 timed out (hint:submit_notify [i915])
[ 184.196439] Asynchronous wait on fence 0000:00:02.0:gnome-shell[1653]:3b6 timed out (hint:intel_atomic_commit_ready [i915])
[ 184.196921] Asynchronous wait on fence radeon:radeon.gfx:26 timed out (hint:submit_notify [i915])
[ 184.232357] radeon 0000:01:00.0: ring 0 stalled for more than 10244msec
[ 184.232376] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000000025 last fence id 0x0000000000000029 on ring 0)
[ 184.740610] radeon 0000:01:00.0: ring 0 stalled for more than 10752msec
[ 184.740635] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000000025 last fence id 0x0000000000000029 on ring 0)
[ 185.252668] radeon 0000:01:00.0: ring 0 stalled for more than 11264msec
[ 185.252690] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000000025 last fence id 0x0000000000000029 on ring 0)
[ 185.764745] radeon 0000:01:00.0: ring 0 stalled for more than 11776msec
[ 185.764771] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000000025 last fence id 0x0000000000000029 on ring 0)
...

I tried the 4.19 kernel 1-2 weeks ...
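The radeon fault lines above are consistent with each other: `VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00100DD5` and `VM fault ... at page 1052117` name the same page, since 0x100DD5 is 1052117 in decimal. In the `GPU lockup` lines, "current fence id" appears to be the last completed fence and "last fence id" the last emitted one, so 0x29 - 0x25 = 4 fences were pending on ring 0. A small sketch of those conversions; the function names are hypothetical and the byte-address helper assumes 4 KiB GPU pages.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for reading the radeon fault/lockup lines.

# VM_CONTEXT1_PROTECTION_FAULT_ADDR (hex) -> decimal page number,
# as printed in "VM fault ... at page N".
fault_page() { echo $(( $1 )); }

# Same page as a byte address, assuming 4 KiB GPU pages.
fault_byte_addr() { printf '0x%X\n' $(( $1 * 4096 )); }

# "GPU lockup (current fence id A last fence id B)":
# fences emitted but not yet completed.
pending_lockup_fences() { echo $(( $2 - $1 )); }

# Usage:
#   fault_page 0x00100DD5             # 1052117
#   fault_byte_addr 0x00100DD5        # 0x100DD5000
#   pending_lockup_fences 0x25 0x29   # 4
```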


Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Semih, that's not this bug; it's a completely different bug in a different driver. You should open a new bug by running:

  ubuntu-bug linux

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

See also bug 1971460.
