amdgpu reset during usage of firefox
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Linux | Unknown | Unknown | |
linux (Ubuntu) | Confirmed | Undecided | Unassigned |
mesa (Ubuntu) | Confirmed | Undecided | Unassigned |
Bug Description
Running nightly on 23.10 (since Monday), I have experienced a few amdgpu resets over the past few hours.
ProblemType: Bug
DistroRelease: Ubuntu 23.10
Package: linux-image-
ProcVersionSign
Uname: Linux 6.5.0-9-generic x86_64
ApportVersion: 2.27.0-0ubuntu5
Architecture: amd64
CasperMD5CheckR
CurrentDesktop: ubuntu:GNOME
Date: Thu Oct 19 18:26:43 2023
HibernationDevice: RESUME=
InstallationDate: Installed on 2022-07-04 (472 days ago)
InstallationMedia: Ubuntu 22.04 LTS "Jammy Jellyfish" - Release amd64 (20220419)
MachineType: {report[
ProcEnviron:
LANG=fr_FR.UTF-8
PATH=(custom, no user)
SHELL=/bin/bash
TERM=xterm-
ProcFB: 0 amdgpudrmfb
ProcKernelCmdLine: BOOT_IMAGE=
RelatedPackageV
linux-
linux-
linux-firmware 20230919.
SourcePackage: linux
UpgradeStatus: Upgraded to mantic on 2023-10-16 (3 days ago)
dmi.bios.date: 05/15/2023
dmi.bios.release: 1.24
dmi.bios.vendor: LENOVO
dmi.bios.version: R1MET54W (1.24 )
dmi.board.
dmi.board.name: 21A0CTO1WW
dmi.board.vendor: LENOVO
dmi.board.version: Not Defined
dmi.chassis.
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.
dmi.ec.
dmi.modalias: dmi:bvnLENOVO:
dmi.product.family: ThinkPad P14s Gen 2a
dmi.product.name: 21A0CTO1WW
dmi.product.sku: LENOVO_
dmi.product.
dmi.sys.vendor: LENOVO
#5 |
#6 |
This is more likely a mesa issue than a kernel issue.
#7 |
I will try to test with amdgpu-pro sometime this week, using the kernel that I mentioned above. If the application works as expected, this could be a Mesa OpenGL bug.
#8 |
(In reply to Alex Deucher from comment #1)
> This is more likely a mesa issue than a kernel issue.
No, the 4.14 kernel with the latest Mesa libraries works very well, without any stalls.
But from 4.20.4 onward, in all recent kernels (including 5.0), the OS freezes for about 30 s every 30 s to 1 min when browsing YouTube with hardware acceleration enabled (UVD) or when playing a game. Setup: RX550, Arch, vanilla kernel.
[ 365.021164] amdgpu: [powerplay]
[ 365.045198] [drm:amdgpu_
[ 365.570667] amdgpu: [powerplay]
[ 366.115228] [drm:amdgpu_
[ 366.115377] [drm:amdgpu_
[ 366.115388] [drm] Timeout, but no hardware hang detected.
[ 366.689407] amdgpu: [powerplay]
[ 367.232287] amdgpu: [powerplay]
[ 367.787043] amdgpu: [powerplay]
[ 368.320138] amdgpu: [powerplay]
[ 369.367739] amdgpu: [powerplay]
[ 369.907559] amdgpu: [powerplay]
[ 370.994478] amdgpu: [powerplay]
[ 371.538753] amdgpu: [powerplay]
[ 372.075079] amdgpu: [powerplay]
[ 372.598565] amdgpu: [powerplay]
[ 373.657188] amdgpu: [powerplay]
[ 374.198637] amdgpu: [powerplay]
[ 375.075076] [drm:amdgpu_
[ 375.284948] amdgpu: [powerplay]
[ 375.830347] amdgpu: [powerplay]
[ 376.138428] [drm:amdgpu_
[ 376.138783] [drm:amdgpu_
[ 376.138797] [drm] IP block:sdma_v3_0 is hung!
[ 376.138809] [drm] GPU recovery disabled.
[ 376.394657] amdgpu: [powerplay]
[ 376.934375] amdgpu: [powerplay]
[ 377.463230] amdgpu: [powerplay]
[ 377.977725] amdgpu: [powerplay]
[ 378.518406] amdgpu: [powerplay]
[ 379.060098] amdgpu: [powerplay]
[ 379.556880] amdgpu: [powerplay]
[ 380.075217] amdgpu: [powerp...
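For anyone triaging a similar log, here is a minimal sketch (not part of the original report) that scans dmesg-style text for the "IP block ... is hung!" and "GPU recovery disabled" messages seen above. The sample lines are copied from this comment; the function name `summarize` and the regular expressions are assumptions made for illustration, not anything the driver provides.

```python
import re
from collections import Counter

# Sample lines copied from the log in this comment.
SAMPLE = """\
[ 366.115388] [drm] Timeout, but no hardware hang detected.
[ 376.138797] [drm] IP block:sdma_v3_0 is hung!
[ 376.138809] [drm] GPU recovery disabled.
"""

# dmesg prefix: "[", optional padding, seconds since boot, "]".
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")
HUNG_RE = re.compile(r"IP block:(\S+) is hung!")


def summarize(text):
    """Return (count of hung reports per IP block, recovery-disabled flag)."""
    hung = Counter()
    recovery_disabled = False
    for raw in text.splitlines():
        match = LINE_RE.match(raw)
        if not match:
            continue  # skip anything without a dmesg timestamp
        message = match.group(2)
        found = HUNG_RE.search(message)
        if found:
            hung[found.group(1)] += 1
        if "GPU recovery disabled" in message:
            recovery_disabled = True
    return hung, recovery_disabled


hung, disabled = summarize(SAMPLE)
print(dict(hung), disabled)
```

Feeding it the full `dmesg` output instead of `SAMPLE` should show which IP blocks are reported hung most often; it only recognizes the exact message wording quoted in this comment.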
#9 |
Can you bisect?
#10 |
I'm having a very similar issue, running Linux Mint 19.1. The issue has persisted since at least 4.15; I'm currently running 5.0.1 and it remains.
Here is the latest syslog of the error:
[37258.615599] gmc_v9_
[37258.615608] amdgpu 0000:06:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:3 pasid:32768, for process Xorg pid 1287 thread Xorg:cs0 pid 1317)
[37258.615615] amdgpu 0000:06:00.0: in page starting at address 0x0000800107805000 from 27
[37258.615619] amdgpu 0000:06:00.0: VM_L2_PROTECTIO
[37258.615629] amdgpu 0000:06:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:3 pasid:32768, for process Xorg pid 1287 thread Xorg:cs0 pid 1317)
[37258.615633] amdgpu 0000:06:00.0: in page starting at address 0x0000800107807000 from 27
[37258.615636] amdgpu 0000:06:00.0: VM_L2_PROTECTIO
[37258.615645] amdgpu 0000:06:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:3 pasid:32768, for process Xorg pid 1287 thread Xorg:cs0 pid 1317)
[37258.615648] amdgpu 0000:06:00.0: in page starting at address 0x0000800107801000 from 27
[37258.615651] amdgpu 0000:06:00.0: VM_L2_PROTECTIO
[37258.615660] amdgpu 0000:06:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:3 pasid:32768, for process Xorg pid 1287 thread Xorg:cs0 pid 1317)
[37258.615663] amdgpu 0000:06:00.0: in page starting at address 0x0000800107803000 from 27
[37258.615666] amdgpu 0000:06:00.0: VM_L2_PROTECTIO
[37258.615675] amdgpu 0000:06:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:3 pasid:32768, for process Xorg pid 1287 thread Xorg:cs0 pid 1317)
[37258.615678] amdgpu 0000:06:00.0: in page starting at address 0x0000800107809000 from 27
[37258.615681] amdgpu 0000:06:00.0: VM_L2_PROTECTIO
[37258.615689] amdgpu 0000:06:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:3 pasid:32768, for process Xorg pid 1287 thread Xorg:cs0 pid 1317)
[37258.615692] amdgpu 0000:06:00.0: in page starting at address 0x000080010780b000 from 27
[37258.615695] amdgpu 0000:06:00.0: VM_L2_PROTECTIO
[37258.615704] amdgpu 0000:06:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:3 pasid:32768, for process Xorg pid 1287 thread Xorg:cs0 pid 1317)
[37258.615707] amdgpu 0000:06:00.0: in page starting at address 0x0000800107805000 from 27
[37258.615710] amdgpu 0000:06:00.0: VM_L2_PROTECTIO
[37258.615740] amdgpu 0000:06:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:3 pasid:32768, for process Xorg pid 1287 thread Xorg:cs0 pid 1317)
[37258.615743] amdgpu 0000:06:00.0: in page starting at address 0x0000800107807000 from 27
[37258.615746] amdgpu 0000:06:00.0: VM_L2_PROTECTIO
[37258.615756] amdgpu 0000:06:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:3 pasid:32768, for process Xorg pid 1287 thread Xorg:cs0 pid 1317)
[37258.615759] amdgpu 0000:06:00.0: in page starting at address 0x0000800107801000 from 27
[37258.615762] amdgpu 0000:06:00.0: VM_L2_PROTECTIO
[37258.615771] amdgpu 0000:06:00.0: [gfxhub] VMC page fau...
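As a rough aid for reading logs like the one above, the following sketch pairs each "VMC page fault" line with the "in page starting at address" line that follows it, so the faulting process and page addresses can be listed. The sample lines are copied from this comment; `parse_faults` and both regular expressions are hypothetical helpers, and they only cover the line format shown here.

```python
import re

# Sample lines copied from the syslog excerpt in this comment.
SAMPLE = """\
[37258.615608] amdgpu 0000:06:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:3 pasid:32768, for process Xorg pid 1287 thread Xorg:cs0 pid 1317)
[37258.615615] amdgpu 0000:06:00.0: in page starting at address 0x0000800107805000 from 27
[37258.615629] amdgpu 0000:06:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:3 pasid:32768, for process Xorg pid 1287 thread Xorg:cs0 pid 1317)
[37258.615633] amdgpu 0000:06:00.0: in page starting at address 0x0000800107807000 from 27
"""

# Assumed pattern for the fault header line as it appears above.
FAULT_RE = re.compile(
    r"\[(?P<hub>\w+)\] VMC page fault \(src_id:(?P<src_id>\d+) ring:(?P<ring>\d+) "
    r"vmid:(?P<vmid>\d+) pasid:(?P<pasid>\d+), for process (?P<process>\S+) "
    r"pid (?P<pid>\d+) thread (?P<thread>\S+) pid (?P<tid>\d+)\)"
)
ADDR_RE = re.compile(r"in page starting at address (0x[0-9a-fA-F]+)")


def parse_faults(text):
    """Return one dict per fault header, with the following address attached."""
    faults = []
    for line in text.splitlines():
        header = FAULT_RE.search(line)
        if header:
            faults.append({**header.groupdict(), "address": None})
            continue
        addr = ADDR_RE.search(line)
        # Attach the address to the most recent header that lacks one.
        if addr and faults and faults[-1]["address"] is None:
            faults[-1]["address"] = int(addr.group(1), 16)
    return faults


faults = parse_faults(SAMPLE)
for fault in faults:
    print(fault["process"], fault["pid"], fault["hub"], hex(fault["address"]))
```

In the sample, both faults come from Xorg (pid 1287) on the gfxhub, and the two page addresses are 0x2000 apart.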
#11 |
tried linux-amd-
Apr 01 21:01:03 kernel: amdgpu 0000:03:00.0: [drm:amdgpu_
Apr 01 21:01:03 kernel: [drm:amdgpu_
Apr 01 21:01:03 kernel: [drm:amdgpu_
Apr 01 20:26:59 kernel: [drm] amdgpu kernel modesetting enabled.
Apr 01 20:26:59 kernel: vga_switcheroo: detected switching method \_SB_.PCI0.
Apr 01 20:26:59 kernel: [drm] initializing kernel modesetting (CARRIZO 0x1002:0x9874 0x1025:0x1201 0xCA).
Apr 01 20:26:59 kernel: [drm] register mmio base: 0xD1500000
Apr 01 20:26:59 kernel: [drm] register mmio size: 262144
Apr 01 20:26:59 kernel: [drm] add ip block number 0 <vi_common>
Apr 01 20:26:59 kernel: [drm] add ip block number 1 <gmc_v8_0>
Apr 01 20:26:59 kernel: [drm] add ip block number 2 <cz_ih>
Apr 01 20:26:59 kernel: [drm] add ip block number 3 <gfx_v8_0>
Apr 01 20:26:59 kernel: [drm] add ip block number 4 <sdma_v3_0>
Apr 01 20:26:59 kernel: [drm] add ip block number 5 <powerplay>
Apr 01 20:26:59 kernel: [drm] add ip block number 6 <dm>
Apr 01 20:26:59 kernel: [drm] add ip block number 7 <uvd_v6_0>
Apr 01 20:26:59 kernel: [drm] add ip block number 8 <vce_v3_0>
Apr 01 20:26:59 kernel: [drm] add ip block number 9 <acp_ip>
Apr 01 20:26:59 kernel: [drm] UVD is enabled in physical mode
Apr 01 20:26:59 kernel: [drm] VCE enabled in physical mode
Apr 01 20:26:59 kernel: ATOM BIOS: 113-C91400-007
Apr 01 20:26:59 kernel: [drm] RAS INFO: ras initialized successfully, hardware ability[0] ras_mask[0]
Apr 01 20:26:59 kernel: [drm] vm size is 64 GB, 2 levels, block size is 10-bit, fragment size is 9-bit
Apr 01 20:26:59 kernel: amdgpu 0000:00:01.0: VRAM: 512M 0x000000F400000000 - 0x000000F41FFFFFFF (512M used)
Apr 01 20:26:59 kernel: amdgpu 0000:00:01.0: GART: 1024M 0x000000FF00000000 - 0x000000FF3FFFFFFF
Apr 01 20:26:59 kernel: [drm] Detected VRAM RAM=512M, BAR=512M
Apr 01 20:26:59 kernel: [drm] RAM width 64bits UNKNOWN
Apr 01 20:26:59 kernel: [TTM] Zone kernel: Available graphics memory: 3804974 KiB
Apr 01 20:26:59 kernel: [TTM] Zone dma32: Available graphics memory: 2097152 KiB
Apr 01 20:26:59 kernel: [TTM] Initializing pool allocator
Apr 01 20:26:59 kernel: [TTM] Initializing DMA pool allocator
Apr 01 20:26:59 kernel: [drm] amdgpu: 512M of VRAM memory ready
Apr 01 20:26:59 kernel: [drm] amdgpu: 3072M of GTT memory ready.
Apr 01 20:26:59 kernel: [drm] GART: num cpu pages 262144, num gpu pages 262144
Apr 01 20:26:59 kernel: [drm] PCIE GART of 1024M enabled (table at 0x000000F4007E9
Apr 01 20:26:59 kernel: [drm] Found UVD firmware Version: 1.91 Family ID: 11
Apr 01 20:26:59 kernel: [drm] UVD ENC is disabled
Apr 01 20:26:59 kernel: [drm] Found VCE firmware Version: 52.4 Binary ID: 3
Apr 01 20:26:59 kernel: smu version 27.17.00
Apr 01 20:26:59 kernel: [drm] DM_PPLIB: values for Engine clock
Apr 01 20:26:59 kernel: [drm] DM_PPLIB: 30000...
#12 |
(In reply to Alex Deucher from comment #4)
> Can you bisect?
Unfortunately this is not possible, as all recent kernels now ship with Display Core enabled by default. As I said, the vanilla 4.14 kernel works like a charm on the same hardware and with the same Mesa libraries: no lags, no stalls or freezes, and no warnings like those listed above. So a "git bisect" makes no sense, because it is not a single commit that misbehaves with the GPU. DC is completely new functionality that replaces the old amdgpu code.
#13 |
Hi, I have a very similar problem. My system works with 4.15 and with 5.1.16, but not with other 5.x kernels:
With the other 5.x kernels the system does not boot. With 5.1.16 the GUI sometimes freezes, but sshd and the mouse still work.
CPU: Ryzen 5 2400g, BOARD: AORUS B450 I PRO WIFI, X Server 1.19.6
Kernel 5.0.x not working (blank screen after boot)
Kernel 5.2.x ( x <= 9 ) is not working (blank screen after boot)
but Kernel 5.1.16 is working (mostly)!
Error LOG with 5.1.16:
[Mi Aug 14 14:22:21 2019] amdgpu 0000:09:00.0: VM_L2_PROTECTIO
[Mi Aug 14 14:22:21 2019] amdgpu 0000:09:00.0: [gfxhub] no-retry page fault (src_id:0 ring:24 vmid:3 pasid:32768, for process Xorg pid 1848 thread Xorg:cs0 pid 1849)
[Mi Aug 14 14:22:21 2019] amdgpu 0000:09:00.0: in page starting at address 0x000080010c205000 from 27
[Mi Aug 14 14:22:21 2019] amdgpu 0000:09:00.0: VM_L2_PROTECTIO
[Mi Aug 14 14:22:31 2019] [drm:amdgpu_
[Mi Aug 14 14:22:31 2019] [drm:amdgpu_
[Mi Aug 14 14:22:31 2019] [drm] GPU recovery disabled.
#14 |
Just got something similar while playing Left 4 Dead. The system simply froze with altered colors on the screen and the sound just looping over the last second or so. Cannot confirm SSH access.
journalctl -b -1 ends with
[drm:gfx_
[drm:amdgpu_
[drm:amdgpu_
OS: Ubuntu 19.04 on
Kernel: 5.0.0-27-generic
GPU: Radeon RX580
CPU: Ryzen 5 1600x
Thanks!
#15 |
(In reply to Ungureanu Alexandru from comment #9)
> Just got something similar while playing Left 4 Dead. The system simply
> froze with altered colors on the screen and the sound just looping over the
> last second or so. Cannot confirm SSH access.
> Kernel: 5.0.0-27-generic
> GPU: Radeon RX580
> CPU: Ryzen 5 1600x
5.0 is a very outdated kernel; use the latest from kernel.org.
For me everything works perfectly in 5.3 (Polaris chip, RX540).
Finally, I no longer get any errors like these:
- ERROR* resume of IP block <uvd_v6_0> failed -110
- [drm] Fence fallback timer expired on ring sdma0
- last message was failed ret is **
- [drm:amdgpu_
- IP block:sdma_v3_0 is hung!
- Timeout, but no hardware hang detected.
Tested on YouTube with hardware-accelerated video and in several games.
Many thanks to the AMD developers; I had to wait over a year to get these bugs fixed.
#16 |
Same problem here. It happens when I run looking-glass [1], but not every time. I tried downgrading my kernel from 5.3.1 to 5.2.11 (I'm pretty sure it worked then), downgrading mesa from 19.2.0 to 19.1.7 (I'm sure it worked with 19.2.0-rc) and downgrading my firmware to 2019-09-23 (oldest in repo).
When it happens, looking-glass starts blinking, and sometimes my other monitor gets stuck so that I can only move the cursor on it.
Spec:
Gentoo ~amd64
Ryzen 1600 (others have Ryzen too, coincidence?)
Linux GPU: R7 240 (with radeon driver)
Windows GPU: RX580
ASRock X370 Gaming X
#17 |
Hi,
I think I have the same bug and opened https:/
At first it looked a bit different, because in newer kernels the error message has changed. But as you can see I did some testing and this seems to go way back. Sadly I couldn't test a 4.18 kernel.
Can somebody mark my report as duplicate? Because I think it is.
And would some more debug info help?
#18 |
*** Bug 204683 has been marked as a duplicate of this bug. ***
#19 |
Also experiencing this with Radeon RX 5700 XT and amdgpu 19.1.0+
Didn't have any heavy load for the GPU to do.
First, some artifacts appeared on the Plasma Hard Disk Monitor widget and the CPU Load widget (here is a screenshot: https:/
I checked the logs for the period when this could've happened, but the only logs from that period are from KScreen that start like this:
Oct 24 16:34:58 perk11-home org.kde.
Oct 24 16:34:58 perk11-home org.kde.
Oct 24 16:34:58 perk11-home org.kde.
Oct 24 16:34:58 perk11-home org.kde.
Oct 24 16:34:58 perk11-home org.kde.
Oct 24 16:34:58 perk11-home org.kde.
Oct 24 16:34:58 perk11-home org.kde.
Oct 24 16:34:58 perk11-home org.kde.
Oct 24 16:34:58 perk11-home org.kde.
Oct 24 16:34:58 perk11-home org.kde.
Oct 24 16:34:58 perk11-home org.kde.
Oct 24 16:34:58 perk11-home org.kde.
Oct 24 16:34:58 perk11-home org.kde.
Oct 24 16:34:58 perk11-home org.kde.
Oct 24 16:34:58 perk11-home org.kde.
Oct 24 16:34:58 perk11-home org.kde.
Oct 24 16:34:58 perk11-home org.kde.
Oct 24 16:34:58 perk11-home org.kde.
Oct 24 16:34:58 perk11-home org.kde.
Oct 24 16:34:58 perk11-home org.kde.
Oct 24 16:34:58 perk11-home org.kde.
Oct 24 16:34:58 perk11-home org.kde.
Oct 24 16:34:58 perk11-home org.kde.
Oct 24 16:34:58 perk11-home org.kde.
Oct 24 16:34:58 perk11-home org.kde.
Oct 24 16:34:58 perk11-home org.kde.
Oct 24 16:34:58 perk11-home org.kde.
Oct 24 16:34:58 perk11-home org.kde.KScreen...
#20 |
My kernel version is 5.3.7-050307-
#21 |
Created attachment 285665
5 second video clip that triggers a crash
Hi,
I think I'm having the same problem as you guys. I run a mythbackend where I record cable television and those recordings often crash my system when hardware decoding is enabled. Usually it's just the screen that freezes and I can still ssh to it.
Kernel 5.1.6 was an exception for me too, with that kernel I'm able to restart the display manager and recover without having to reboot.
Attached is a short video that crashes my system. I can trigger the alert by running:
mpv --vo=vaapi out.ts
I'm wondering if it crashes your systems too and if it's related.
#22 |
(In reply to shallowaloe from comment #16)
> Created attachment 285665 [details]
> 5 second video clip that triggers a crash
>
> Hi,
>
> I think I'm having the same problem as you guys. I run a mythbackend where
> I record cable television and those recordings often crash my system when
> hardware decoding is enabled. Usually it's just the screen that freezes and
> I can still ssh to it.
>
> Kernel 5.1.6 was an exception for me too, with that kernel I'm able to
> restart the display manager and recover without having to reboot.
>
> Attached is a short video that crashes my system. I can trigger the alert
> by running:
>
> mpv --vo=vaapi out.ts
>
> I'm wondering if it crashes your systems too and if it's related.
Just to add a data point, I tried running `mpv --vo=vaapi out.ts` against your file, and while it crashed the application, it did not freeze the system.
My hardware is a Ryzen 3700X with a Radeon RX 5700, running Ubuntu 19.10 with default kernel (5.3.0-19-generic).
The command did result in the following lines in /var/log/syslog repeated every 5 seconds:
Nov 10 07:04:23 redacted kernel: [ 2266.802162] gmc_v10_
Nov 10 07:04:23 redacted kernel: [ 2266.802166] amdgpu 0000:0b:00.0: [mmhub] VMC page fault (src_id:0 ring:158 vmid:0 pasid:0)
Nov 10 07:04:23 redacted kernel: [ 2266.802170] amdgpu 0000:0b:00.0: at page 0x0000000000000000 from 18
Nov 10 07:04:23 redacted kernel: [ 2266.802171] amdgpu 0000:0b:00.0: VM_L2_PROTECTIO
Nov 10 07:04:23 redacted kernel: [ 2266.802176] amdgpu 0000:0b:00.0: [mmhub] VMC page fault (src_id:0 ring:158 vmid:0 pasid:0)
Nov 10 07:04:23 redacted kernel: [ 2266.802178] amdgpu 0000:0b:00.0: at page 0x0000000000000000 from 18
Nov 10 07:04:23 redacted kernel: [ 2266.802179] amdgpu 0000:0b:00.0: VM_L2_PROTECTIO
Nov 10 07:04:23 redacted kernel: [ 2266.802566] amdgpu 0000:0b:00.0: [mmhub] VMC page fault (src_id:0 ring:158 vmid:0 pasid:0)
Nov 10 07:04:23 redacted kernel: [ 2266.802568] amdgpu 0000:0b:00.0: at page 0x0000000000000000 from 18
Nov 10 07:04:23 redacted kernel: [ 2266.802569] amdgpu 0000:0b:00.0: VM_L2_PROTECTIO
Nov 10 07:04:23 redacted kernel: [ 2266.802573] amdgpu 0000:0b:00.0: [mmhub] VMC page fault (src_id:0 ring:158 vmid:0 pasid:0)
Nov 10 07:04:23 redacted kernel: [ 2266.802575] amdgpu 0000:0b:00.0: at page 0x0000000000000000 from 18
Nov 10 07:04:23 redacted kernel: [ 2266.802576] amdgpu 0000:0b:00.0: VM_L2_PROTECTIO
Nov 10 07:04:23 redacted kernel: [ 2266.802984] amdgpu 0000:0b:00.0: [mmhub] VMC page fault (src_id:0 ring:158 vmid:0 pasid:0)
Nov 10 07:04:23 redacted kernel: [ 2266.802985] amdgpu 0000:0b:00.0: at page 0x0000000000000000 from 18
Nov 10 07:04:23 redacted kernel: [ 2266.802987] amdgpu 0000:0b:00.0: VM_L2_PROTECTIO
Nov 10 07:04:23 redacted kernel: [ 2266.802993] amdgpu 0000:0b:00.0: [mmhub] VMC page fault (src_id:0 ring:158 vmid:0 pasid:0)
Nov 10 07:04:23 redacted kernel: [ 2266.802994] amdgpu 0000:0b:00.0: at page 0x0000000000000000 from 18
Nov 10 07:04:23 redacted kernel: [ 2266.802995] amdg...
#23 |
Hi,
I recently built a 5.4.0-rc7 from drm-next (my HEAD was 17eee668b3cad42
Since then I didn't get any crashes. I have tested this for a few hours now, but it's entirely possible that I just didn't run into the bug for some reason, although it usually appeared after half an hour.
If possible please try this setup and see if it is fixed.
#24 |
Hi,
This issue is still present in the latest kernels:
5.4.1, 5.4, 5.3.14
Last usable kernel for me is 4.20.17
System Specs
- Gigabyte b450-ds3h
- Ryzen 5 3400G (with RX Vega 11)
- Mesa 19.1.2 - padoka PPA (Stable)
- Ubuntu 18.04.3 LTS
#25 |
Dear j.cordoba,
is it possible that you try to build 5.4.0-rc7 from drm-next and give it a test as I mentioned in Comment 18?
I'm running on this for some time now and the bug should have appeared by now, so I'm getting more confident that it is fixed.
Best regards
Matthias
#26 |
Same is happening to me on 5.4.1. No issue with 4.9.
[ 44.172714] [drm:amdgpu_
[ 49.292694] [drm:amdgpu_
[ 58.469316] [drm:amdgpu_
[ 63.586055] [drm:amdgpu_
[ 156.606591] [drm:amdgpu_
#27 |
(In reply to shallowaloe from comment #16)
> Created attachment 285665 [details]
> 5 second video clip that triggers a crash
>
> Hi,
>
> I think I'm having the same problem as you guys. I run a mythbackend where
> I record cable television and those recordings often crash my system when
> hardware decoding is enabled. Usually it's just the screen that freezes and
> I can still ssh to it.
>
> Kernel 5.1.6 was an exception for me too, with that kernel I'm able to
> restart the display manager and recover without having to reboot.
>
> Attached is a short video that crashes my system. I can trigger the alert
> by running:
>
> mpv --vo=vaapi out.ts
>
> I'm wondering if it crashes your systems too and if it's related.
This one is probably a Mesa issue, see https:/
What Mesa version are you using?
#28 |
Created attachment 286227
attachment-
Thanks for the link to the bug. I'm running an ubuntu based system and am
using the oibaf ppa. The current version is 20.0.
On Wed, Dec 4, 2019 at 1:54 AM <email address hidden> wrote:
> https:/
>
> Pierre-Eric Pelloux-Prayer (<email address hidden>) changed:
>
> What |Removed |Added
>
> -------
> CC|
> |pierre-
> | |amd.com
>
> --- Comment #22 from Pierre-Eric Pelloux-Prayer (
> <email address hidden>) ---
> (In reply to shallowaloe from comment #16)
> > Created attachment 285665 [details]
> > 5 second video clip that triggers a crash
> >
> > Hi,
> >
> > I think I'm having the same problem as you guys. I run a mythbackend
> where
> > I record cable television and those recordings often crash my system when
> > hardware decoding is enabled. Usually it's just the screen that freezes
> and
> > I can still ssh to it.
> >
> > Kernel 5.1.6 was an exception for me too, with that kernel I'm able to
> > restart the display manager and recover without having to reboot.
> >
> > Attached is a short video that crashes my system. I can trigger the
> alert
> > by running:
> >
> > mpv --vo=vaapi out.ts
> >
> > I'm wondering if it crashes your systems too and if it's related.
>
>
> This one is probably a Mesa issue, see
> https:/
>
> What Mesa version are you using?
#29 |
Hi everyone,
I have the same issue with a Fiji Nano GPU: UVD6 and VCE3 timeout in ring buffer test @ boot with the AMDGPU driver. Other rings seem to work correctly.
To make sure the hardware functions like it should, and it's not a HW error, where (in the amdgpu driver) can I increase the timeout value?
#30 |
Created attachment 286575
kernel config 5.4.7 Fiji
Some additional info for my case:
- Running kernel 5.4.7 (vanilla), firmware 20191108 on gentoo
- dmesg | grep -E "(drm)|(amdgpu)":
[ 3.930023] [drm] amdgpu kernel modesetting enabled.
[ 3.930217] amdgpu 0000:0a:00.0: remove_
[ 3.930219] amdgpu 0000:0a:00.0: remove_
[ 3.930221] amdgpu 0000:0a:00.0: remove_
[ 3.930224] fb0: switching to amdgpudrmfb from EFI VGA
[ 3.930475] [drm] initializing kernel modesetting (FIJI 0x1002:0x7300 0x1002:0x0B36 0xCA).
[ 3.930486] [drm] register mmio base: 0xFCE00000
[ 3.930486] [drm] register mmio size: 262144
[ 3.930495] [drm] add ip block number 0 <vi_common>
[ 3.930495] [drm] add ip block number 1 <gmc_v8_0>
[ 3.930496] [drm] add ip block number 2 <tonga_ih>
[ 3.930497] [drm] add ip block number 3 <gfx_v8_0>
[ 3.930498] [drm] add ip block number 4 <sdma_v3_0>
[ 3.930498] [drm] add ip block number 5 <powerplay>
[ 3.930499] [drm] add ip block number 6 <dm>
[ 3.930500] [drm] add ip block number 7 <uvd_v6_0>
[ 3.930500] [drm] add ip block number 8 <vce_v3_0>
[ 3.930715] [drm] UVD is enabled in physical mode
[ 3.930715] [drm] VCE enabled in physical mode
[ 3.930743] [drm] vm size is 64 GB, 2 levels, block size is 10-bit, fragment size is 9-bit
[ 3.930751] amdgpu 0000:0a:00.0: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
[ 3.930753] amdgpu 0000:0a:00.0: GART: 1024M 0x000000FF00000000 - 0x000000FF3FFFFFFF
[ 3.930758] [drm] Detected VRAM RAM=4096M, BAR=256M
[ 3.930759] [drm] RAM width 512bits HBM
[ 3.930838] [drm] amdgpu: 4096M of VRAM memory ready
[ 3.930841] [drm] amdgpu: 4096M of GTT memory ready.
[ 3.930860] [drm] GART: num cpu pages 262144, num gpu pages 262144
[ 3.930928] [drm] PCIE GART of 1024M enabled (table at 0x000000F4001D5
[ 3.934174] [drm] Chained IB support enabled!
[ 3.940198] amdgpu: [powerplay] hwmgr_sw_init smu backed is fiji_smu
[ 3.941748] [drm] Found UVD firmware Version: 1.91 Family ID: 12
[ 3.941752] [drm] UVD ENC is disabled
[ 3.943542] [drm] Found VCE firmware Version: 55.2 Binary ID: 3
[ 4.009146] [drm] dce110_
[ 4.040084] [drm] Display Core initialized with v3.2.48!
[ 4.040542] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[ 4.040543] [drm] Driver supports precise vblank timestamp query.
[ 4.067774] [drm] UVD initialized successfully.
[ 4.168780] [drm] VCE initialized successfully.
[ 4.170163] [drm] Cannot find any crtc or sizes
[ 4.171948] [drm] Initialized amdgpu 3.35.0 20150101 for 0000:0a:00.0 on minor 0
[ 7.280062] amdgpu 0000:0a:00.0: [drm:amdgpu_
[ 8.400365] amdgpu 0000:0a:00.0: [drm:amdgpu_
[ 8.400370] [drm:process_
#31 |
Hello, I have the same problem on a Huawei Matebook D laptop; the processor is an AMD Ryzen 5 with an integrated Radeon Vega Mobile GPU.
I use Fedora 31. The problem appeared when upgrading from the 5.3.16 kernel to the 5.4.6 kernel. Reverting to 5.3.16 solved the issue.
At some moments the UI (XFCE) freezes for about 5 seconds; I can move the mouse cursor but I can't get any keyboard input (not in X, not by switching console). Each time the freeze occurs dmesg shows the messages
[ 45.530374] [drm:amdgpu_
[ 50.139408] [drm:amdgpu_
I include /proc/cpuinfo and lspci outputs.
#32 |
Created attachment 286899
/proc/cpuinfo
#33 |
Created attachment 286901
lspci output
#34 |
Hi. This bug is already reported here by me https:/
If possible try a 5.5-rc kernel and see if it's fixed there. It's fixed - at least for me - in the drm-tree.
Best regards
Matthias
#35 |
I'm seeing the same issue on Ubuntu 18.04 with
Upstream PPA "sudo add-apt-repository ppa:oibaf/
[ 321.412530] [drm:amdgpu_
[ 326.286306] [drm:amdgpu_
[ 326.286395] [drm:amdgpu_
AMDGPUPRO driver 19.50-967956
[20913.330563] [drm:amdgpu_
[20918.450513] [drm:amdgpu_
[20923.570306] [drm:amdgpu_
[20928.690699] [drm:amdgpu_
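The AMDGPU-PRO messages above recur at a steady pace. A small sketch like the following (not from the original comment) computes the gaps between consecutive dmesg timestamps; the sample lines are copied as they appear here, truncation included, and the helper names `timestamps` and `gaps` are made up for illustration.

```python
import re

# Sample lines copied from the AMDGPU-PRO log above (truncated as in the source).
SAMPLE = """\
[20913.330563] [drm:amdgpu_
[20918.450513] [drm:amdgpu_
[20923.570306] [drm:amdgpu_
[20928.690699] [drm:amdgpu_
"""

# dmesg prefix: "[", optional padding, seconds since boot, "]".
TS_RE = re.compile(r"^\[\s*(\d+\.\d+)\]")


def timestamps(text):
    """Extract the leading dmesg timestamp (seconds since boot) of each line."""
    out = []
    for line in text.splitlines():
        match = TS_RE.match(line)
        if match:
            out.append(float(match.group(1)))
    return out


def gaps(values):
    """Differences between consecutive timestamps, in seconds."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


stamps = timestamps(SAMPLE)
intervals = gaps(stamps)
print([round(g, 2) for g in intervals])
```

For the four sample lines every gap is close to 5.12 seconds. A regular spacing like this suggests a fixed timeout firing repeatedly, though the sketch itself cannot tell which one.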
#36 |
Hi,
for me this bug is fixed with a 5.5 kernel. And I'm wondering if this is fixed for all of you, too.
Best
Matthias
#37 |
I agree. Fixed for me too
#38 |
I still see them on 5.6.13:
[191571.372560] sd 11:0:0:0: [sde] Synchronize Cache(10) failed: Result: hostbyte=0x01 driverbyte=0x00
[205796.424607] [drm:amdgpu_
[205796.424637] [drm:amdgpu_
[205796.424640] amdgpu 0000:0a:00.0: GPU reset begin!
[205800.840504] [drm:amdgpu_
[205800.937565] amdgpu 0000:0a:00.0: GPU reset succeeded, trying to resume
[205800.938060] [drm] PCIE GART of 1024M enabled (table at 0x000000F400900
[205800.938849] [drm] PSP is resuming...
[205800.958729] [drm] reserve 0x400000 from 0xf47f800000 for PSP TMR
[205800.972414] [drm] psp command (0x5) failed and response status is (0xFFFF0007)
[205801.176411] amdgpu 0000:0a:00.0: RAS: ras ta ucode is not available
[205801.460775] [drm] kiq ring mec 2 pipe 1 q 0
[205801.460986] amdgpu 0000:0a:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0002 address=0x800002300 flags=0x0000]
[205801.516698] [drm] VCN decode and encode initialized successfully(under DPG Mode).
[205801.516709] amdgpu 0000:0a:00.0: ring gfx uses VM inv eng 0 on hub 0
[205801.516713] amdgpu 0000:0a:00.0: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[205801.516717] amdgpu 0000:0a:00.0: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[205801.516720] amdgpu 0000:0a:00.0: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[205801.516724] amdgpu 0000:0a:00.0: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[205801.516727] amdgpu 0000:0a:00.0: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[205801.516730] amdgpu 0000:0a:00.0: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[205801.516733] amdgpu 0000:0a:00.0: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[205801.516736] amdgpu 0000:0a:00.0: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[205801.516740] amdgpu 0000:0a:00.0: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
[205801.516743] amdgpu 0000:0a:00.0: ring sdma0 uses VM inv eng 0 on hub 1
[205801.516746] amdgpu 0000:0a:00.0: ring vcn_dec uses VM inv eng 1 on hub 1
[205801.516749] amdgpu 0000:0a:00.0: ring vcn_enc0 uses VM inv eng 4 on hub 1
[205801.516752] amdgpu 0000:0a:00.0: ring vcn_enc1 uses VM inv eng 5 on hub 1
[205801.516755] amdgpu 0000:0a:00.0: ring jpeg_dec uses VM inv eng 6 on hub 1
[205801.525996] [drm] recover vram bo from shadow start
[205801.525998] [drm] recover vram bo from shadow done
[205801.526008] [drm] Skip scheduling IBs!
[205801.526051] amdgpu 0000:0a:00.0: GPU reset(1) succeeded!
[205802.536444] [drm:amdgpu_
[205802.536523] [drm:amdgpu_
[205802.536531] amdgpu 0000:0a:00.0: GPU reset begin!
[205806.728558] [drm:amdgpu_
[205806.821326] amdgpu 0000:0a:00.0: GPU reset succeeded, trying to resume
[205806.821578] [drm] PCIE GART of 1024M enabled (table at 0x000000F400900
[205806.821899] [drm] PSP is...
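To put numbers on the reset loop in the log above, here is a hedged sketch that measures the time from each "GPU reset begin!" line to the matching "GPU reset(N) succeeded!" line. The sample lines are copied from this comment; `reset_durations` and its regular expressions are illustrative assumptions and only handle the message wording shown here.

```python
import re

# Sample lines copied from the log in this comment.
SAMPLE = """\
[205796.424640] amdgpu 0000:0a:00.0: GPU reset begin!
[205800.937565] amdgpu 0000:0a:00.0: GPU reset succeeded, trying to resume
[205801.526051] amdgpu 0000:0a:00.0: GPU reset(1) succeeded!
[205802.536531] amdgpu 0000:0a:00.0: GPU reset begin!
"""

TS_RE = re.compile(r"^\[\s*(\d+\.\d+)\]")
# Final completion message, e.g. "GPU reset(1) succeeded!".
DONE_RE = re.compile(r"GPU reset\(\d+\) succeeded!")


def reset_durations(text):
    """Return (durations of completed resets, timestamp of an unfinished reset or None)."""
    durations = []
    begin = None
    for line in text.splitlines():
        match = TS_RE.match(line)
        if not match:
            continue
        stamp = float(match.group(1))
        if "GPU reset begin!" in line:
            begin = stamp
        elif DONE_RE.search(line) and begin is not None:
            durations.append(stamp - begin)
            begin = None
    return durations, begin


durations, pending = reset_durations(SAMPLE)
print([round(d, 2) for d in durations], pending)
```

On the sample, the first reset takes about 5.1 seconds from begin to completion, and the next reset begins roughly one second after that.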
#39 |
The problem still exists with Linux Kernel 5.8-rc1 from git. (My graphics card is Radeon 5600XT)
[20581.087159] [drm:amdgpu_
[20581.087212] [drm:amdgpu_
[20581.087217] amdgpu 0000:29:00.0: amdgpu: GPU reset begin!
[20583.381257] [drm:amdgpu_
[20585.087232] amdgpu 0000:29:00.0: amdgpu: failed to suspend display audio
[20585.156036] snd_hda_codec_hdmi hdaudioC0D0: HDMI: ELD buf size is 0, force 128
[20585.156052] snd_hda_codec_hdmi hdaudioC0D0: HDMI: invalid ELD data byte 0
[20585.463157] amdgpu 0000:29:00.0: [drm:amdgpu_
[20585.463205] [drm:gfx_
[20585.694999] amdgpu 0000:29:00.0: [drm:amdgpu_
[20585.695047] [drm:gfx_
[20585.926951] [drm:gfx_
[20588.045497] amdgpu 0000:29:00.0: amdgpu: GPU reset succeeded, trying to resume
[20588.045605] [drm] PCIE GART of 512M enabled (table at 0x0000008000E10
[20588.045682] [drm] VRAM is lost due to GPU reset!
[20588.048023] [drm] PSP is resuming...
[20588.218089] [drm] reserve 0x900000 from 0x817e400000 for PSP TMR
[20588.287093] amdgpu 0000:29:00.0: amdgpu: RAS: optional ras ta ucode is not available
[20588.293101] amdgpu: SMU is resuming...
[20588.295088] amdgpu: SMU is resumed successfully!
[20588.413155] [drm] kiq ring mec 2 pipe 1 q 0
[20588.417493] [drm] VCN decode and encode initialized successfully(under DPG Mode).
[20588.417632] [drm] JPEG decode initialized successfully.
[20588.417690] amdgpu 0000:29:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[20588.417693] amdgpu 0000:29:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[20588.417697] amdgpu 0000:29:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[20588.417700] amdgpu 0000:29:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[20588.417703] amdgpu 0000:29:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[20588.417707] amdgpu 0000:29:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[20588.417709] amdgpu 0000:29:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[20588.417713] amdgpu 0000:29:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[20588.417716] amdgpu 0000:29:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[20588.417719] amdgpu 0000:29:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
[20588.417721] amdgpu 0000:29:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[20588.417724] amdgpu 0000:29:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
[20588.417726] amdgpu 0000:29:00.0: amdgpu: ring vcn_dec uses VM inv eng 0 on hub 1
[20588.417728] amdgpu 0000:29:00.0: amdgpu: ring vcn_enc0 uses VM inv eng 1 on hub 1
[20588.417730] amdgpu 0000:29:00.0: amdgpu: ring vcn_enc1 uses VM inv eng 4 on h...
|
#40 |
I've been getting "ring gfx timeouts" for some time, mostly when the computer has not had any input for a while (while I'm away from it). When it freezes I can SSH into it, but when I run "shutdown -h now" it boots me out of SSH as it should, yet the computer never seems to actually shut down. The screen stays frozen with whatever was on the display when it froze. Any help would be greatly appreciated. Here is my info:
Mobo: AsRock AB350 Pro4 UEFI: 5.80
Video card: Sapphire Nitro+ RX580 (8GB)
Distro: Manjaro
Kernel: 5.7.9-1-MANJARO
Aug 09 21:33:06.054857 kernel: pcieport 0000:00:03.1: AER: Multiple Uncorrected (Non-Fatal) error received: 0000:00:00.0
Aug 09 21:33:06.068305 kernel: pcieport 0000:00:03.1: AER: PCIe Bus Error: severity=
Aug 09 21:33:06.068636 kernel: pcieport 0000:00:03.1: AER: device [1022:1453] error status/
Aug 09 21:33:06.068863 kernel: pcieport 0000:00:03.1: AER: [21] ACSViol (First)
Aug 09 21:33:06.069137 kernel: amdgpu 0000:0a:00.0: AER: can't recover (no error_detected callback)
Aug 09 21:33:06.069421 kernel: snd_hda_intel 0000:0a:00.1: AER: can't recover (no error_detected callback)
Aug 09 21:33:06.069633 kernel: pcieport 0000:00:03.1: AER: device recovery failed
Aug 09 21:33:16.258283 kernel: [drm:amdgpu_
Aug 09 21:33:16.258412 kernel: [drm:amdgpu_
Aug 09 21:33:16.258446 kernel: amdgpu 0000:0a:00.0: GPU reset begin!
Aug 09 21:33:16.258741 kernel: [drm:amdgpu_
Aug 09 21:33:16.258773 kernel: amdgpu: [powerplay]
Aug 09 21:33:16.258803 kernel: amdgpu: [powerplay]
Aug 09 21:33:16.258835 kernel: amdgpu: [powerplay]
Aug 09 21:33:16.258869 kernel: amdgpu: [powerplay]
Aug 09 21:33:16.258896 kernel: amdgpu: [powerplay]
Aug 09 21:33:16.258925 kernel: amdgpu: [powerplay]
Aug 09 21:33:16.258951 kernel: amdgpu: [powerplay]
Aug 09 21:33:16.258977 kernel: amdgpu: [powerplay]
Aug 09 21:33:16.259009 kernel: amdgpu: [powerplay]
Aug 09 21:33:16.259035 kernel: amdgpu: [powerplay]
Aug 09 21:33:16.259060 kernel: amdgpu: [powerplay]
Aug 09 21:33:16.259084 kernel: amdgpu: [powerplay]
|
#41 |
Linux kernel 5.4.61/amd64 /
Radeon RX 560 got the same problem today:
[86631.543134] [drm] Fence fallback timer expired on ring gfx
[86642.133543] [drm:amdgpu_
[86642.133628] [drm:amdgpu_
[86642.133634] amdgpu 0000:41:00.0: GPU reset begin!
[86642.134073] amdgpu: [powerplay]
[86642.134075] amdgpu: [powerplay]
I have never seen a similar problem before.
|
#42 |
I have this problem with 2 different brand-new RX580s, in a brand-new ASUS prime-p x570 and an old ASUS p9x79, with various Ubuntu 20.04 kernels 5.4.x - 5.8.x - ...
I wanted to play these games on Linux so badly, the heartbreaking solution is to purchase a Windows license... ;_;
|
#43 |
I have a similar problem, a cascade of errors that typically starts with one of these:
[drm:amdgpu_
This used to occur only when playing Dauntless, and only after my MSI Radeon RX580 ran hot for a while. Warframe never crashed. Totally different methods of running the games (Dauntless=Lutris and Epic Games Store, Warframe = Steam and Proton). Something then changed after one of the updates within the last month, and now it crashes on both Warframe and Dauntless well before the card is at a high temp. Basically can't run more than about 5 minutes.
I was running Ubuntu 18.04, so I figured maybe a newer kernel would fix this, but updating to 20.10 did nothing but waste a couple of days of reloading everything.
System: Ryzen 5 3600 on Gigabyte x570 UD with a MSI Radeon RX580 8GB
I'm willing to work with whoever can help, and to send whatever info/logs are necessary to get this fixed.
|
#44 |
There doesn't appear to be any progress on this bug, does anyone have any suggestions with regards on how to fix this issue?
|
#45 |
(In reply to Randune from comment #39)
> There doesn't appear to be any progress on this bug, does anyone have any
> suggestions with regards on how to fix this issue?
Try to add iommu=pt as parameter
|
#46 |
(In reply to j.cordoba from comment #40)
> (In reply to Randune from comment #39)
> > There doesn't appear to be any progress on this bug, does anyone have any
> > suggestions with regards on how to fix this issue?
>
> Try to add iommu=pt as parameter
I'm running Linux Kernel 5.10.9 with those kernel parameters "amdgpu.
Meanwhile i looked at https:/
|
#47 |
I made a change a while back. I added:
amdgpu.
as a grub parameter. I have no other (of the many suggested) parameters set:
GRUB_CMDLINE_
The feature mask was used to enable reducing the top speed of my video card to reduce heating, and I was using corectrl for that. However, it was something I had to set manually after each boot. Of course, I forgot to do so, and yet it still stopped occurring. So in reality, I don't think I need that anymore, either.
Just checked my linux logs grepping for "ring gfx". Before the change, I had multiples each day up to Dec 10th. Since then, I've had 3.
Also of note - for the last two, it was when I WASN'T playing. Well, I was playing a game, but I was AFK. It seemed when I returned and did something, it went black then.
Lastly, just to confirm, I checked my change log (my own log), and I did, indeed, make that change on 10 Dec.
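For anyone who wants to repeat this kind of before/after check, here is a minimal sketch in Python. It assumes journal-style lines that start with a "Mon DD" date (like the `Aug 09 21:33:16 ...` lines earlier in this thread). `ring_gfx_hits_per_day` is a hypothetical helper name, not part of any existing tool, and lines without such a date prefix (plain dmesg output, for instance) are lumped under "unknown".

```python
import re
from collections import Counter

# Leading "Mon DD" of a journal-style line, e.g. "Aug 09 21:33:16 host kernel: ..."
# (assumed format; adjust if your log uses a different timestamp layout).
_DAY = re.compile(r"^([A-Z][a-z]{2} +\d{1,2}) ")


def ring_gfx_hits_per_day(lines, needle="ring gfx"):
    """Count lines containing `needle`, grouped by their "Mon DD" prefix."""
    hits = Counter()
    for line in lines:
        if needle not in line:
            continue
        match = _DAY.match(line)
        # Lines without a recognisable date prefix are counted as "unknown".
        hits[match.group(1) if match else "unknown"] += 1
    return hits
```

Feeding it the lines of a saved kernel log gives a rough per-day count, which makes it easier to see whether a change coincided with fewer timeouts. It says nothing about why the count changed.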
|
#48 |
(In reply to Panagiotis Polychronis from comment #41)
> (In reply to j.cordoba from comment #40)
> > (In reply to Randune from comment #39)
> > > There doesn't appear to be any progress on this bug, does anyone have any
> > > suggestions with regards on how to fix this issue?
> >
> > Try to add iommu=pt as parameter
>
> I'm running Linux Kernel 5.10.9 with those kernel parameters
> "amdgpu.
> amdgpu.
> iommu=pt" My graphics card is Radeon 5600XT and i can confirm that this
> issue still exist :)
> Meanwhile i looked at
> https:/
> there are some patches about ring timeout which i think they aren't yet
> merged for the next Linux Kernel release. Probably Alex Deucher will merge
> them later.
Thanks for the suggestion, Panagiotis Polychronis. I've tried that in the past and it didn't seem to help. I'm currently running Manjaro on the Linux 5.11-rc3 kernel, as supposedly there are many changes regarding AMDGPU (I'm not sure if there are many changes for my RX580), but it's worth a shot. I'm basically shooting in the dark at this point :).
|
#49 |
Here's another thing I tried which also may have made a difference. Gonna sound weird, but worth a try. I had a 675VA UPS that my system was plugged into. One time, it started shrieking (weird beepish sounds) as I was doing heavy gaming with lots of visual effects going on. I looked it up, and it seems that if your UPS, or your power strip, can't deliver enough power, it can cause the issues with these GPU cards. I mentioned Dec 10th as the date I made the change for my boot parameters, but it's also the date I plugged my system directly into the wall. Responding yesterday reminded me I have a new, more powerful UPS and I plugged my system into that today. I'll see if it changes anything.
P.S. I know the argument...power is power...but it's not. If the surge protector or UPS has cheap, thin wiring, then that restricts the amount of amps that can flow through them.
|
#50 |
I still have this issue when I play "Interstellar Marines"
kernel: [drm:amdgpu_
kernel: [drm:amdgpu_
kernel: 5.10.14-200.fc33.
videocard: Radeon HD7770
When this happens, the image freezes, the system stops responding to keypresses but the background music plays for a few minutes and I have to hit <reset>.
|
#51 |
(In reply to MajorGonzo from comment #44)
> Here's another thing I tried which also may have made a difference. Gonna
> sound weird, but worth a try. I had a 675VA UPS that my system was plugged
> into. One time, it started shrieking (weird beepish sounds) as I was doing
> heavy gaming with lots of visual effects going on. I looked it up, and it
> seems that if your UPS, or your power strip, can't deliver enough power, it
> can cause the issues with these GPU cards. I mentioned Dec 10th as the date
> I made the change for my boot parameters, but it's also the date I plugged
> my system directly into the wall. Responding yesterday reminded me I have a
> new, more powerful UPS and I plugged my system into that today. I'll see if
> it changes anything.
>
> P.S. I know the argument...power is power...but it's not. If the surge
> protector, or UPS has cheap, thin wiring, then that restricts the amount of
> amps that can flow through them.
I had an old PSU, which was repaired once, so I replaced it. That did not resolve the issue. The PSU is connected directly to the wall socket.
Kernel 5.10.18-200.fc33
AMD Ryzen 3 2200G with Radeon Vega Graphics
The bug is most often triggered when using Firefox.
[42174.187004] amdgpu 0000:06:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:1 pasid:32772, for process firefox pid 21156 thread firefox:cs0 pid 21244)
[42174.187007] amdgpu 0000:06:00.0: amdgpu: in page starting at address 0x0000000000200000 from client 27
[42174.187008] amdgpu 0000:06:00.0: amdgpu: VM_L2_PROTECTIO
[42174.187009] amdgpu 0000:06:00.0: amdgpu: Faulty UTCL2 client ID: IA (0x2)
[42174.187010] amdgpu 0000:06:00.0: amdgpu: MORE_FAULTS: 0x1
[42174.187010] amdgpu 0000:06:00.0: amdgpu: WALKER_ERROR: 0x0
[42174.187011] amdgpu 0000:06:00.0: amdgpu: PERMISSION_FAULTS: 0x3
[42174.187012] amdgpu 0000:06:00.0: amdgpu: MAPPING_ERROR: 0x0
[42174.187012] amdgpu 0000:06:00.0: amdgpu: RW: 0x0
... (the above messages are repeated many times)
[42184.187655] amdgpu 0000:06:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:1 pasid:32772, for process firefox pid 21156 thread firefox:cs0 pid 21244)
[42184.187656] amdgpu 0000:06:00.0: amdgpu: in page starting at address 0x0000000000200000 from client 27
[42184.187656] amdgpu 0000:06:00.0: amdgpu: VM_L2_PROTECTIO
[42184.187657] amdgpu 0000:06:00.0: amdgpu: Faulty UTCL2 client ID: IA (0x2)
[42184.187657] amdgpu 0000:06:00.0: amdgpu: MORE_FAULTS: 0x1
[42184.187658] amdgpu 0000:06:00.0: amdgpu: WALKER_ERROR: 0x0
[42184.187658] amdgpu 0000:06:00.0: amdgpu: PERMISSION_FAULTS: 0x3
[42184.187659] amdgpu 0000:06:00.0: amdgpu: MAPPING_ERROR: 0x0
[42184.187660] amdgpu 0000:06:00.0: amdgpu: RW: 0x0
[42184.328388] [drm:amdgpu_
[42184.328538] [drm:amdgpu_
[42184.328542] amdgpu 0000:06:00.0: amdgpu: GPU reset begin!
[42184.330868] amdgpu 0000:06:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x10cd079a0 ...
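To see which process the driver names in these "retry page fault" lines without reading through every repeat, something like the following rough Python sketch could be used. It assumes the message layout shown in the log above (`for process <name> pid <n> thread <name> pid <n>`); `faulting_processes` is a hypothetical helper, and a process name containing spaces would not be matched.

```python
import re
from collections import Counter

# Pattern modelled on the log lines quoted above; the layout may differ
# between kernel versions, so treat it as an assumption.
_FAULT = re.compile(
    r"retry page fault \(.*?for process (?P<proc>\S+) pid (?P<pid>\d+) "
    r"thread (?P<thread>\S+) pid (?P<tid>\d+)\)"
)


def faulting_processes(lines):
    """Count retry-page-fault messages per (process name, pid)."""
    hits = Counter()
    for line in lines:
        match = _FAULT.search(line)
        if match:
            hits[(match.group("proc"), int(match.group("pid")))] += 1
    return hits
```

This only tallies what the kernel already reported. It does not show whether the named process caused the fault or was simply the one using the GPU at the time.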
|
#52 |
I have something very similar with my Vega56. I can reproduce it with Win10 too.
I think it's an AMD hardware issue.
march 28 15:07:35 PC-home kernel: [drm:amdgpu_
march 28 15:07:35 PC-home kernel: qcm fence wait loop timeout expired
march 28 15:07:35 PC-home kernel: The cp might be in an unrecoverable state due to an unsuccessful queues preemption
march 28 15:07:35 PC-home kernel: amdgpu: Failed to evict process queues
march 28 15:07:35 PC-home kernel: amdgpu 0000:0a:00.0: amdgpu: GPU reset begin!
march 28 15:07:35 PC-home kernel: amdgpu: Failed to quiesce KFD
march 28 15:07:35 PC-home kernel: [drm:amdgpu_
march 28 15:07:35 PC-home kernel: [drm:amdgpu_
march 28 15:07:35 PC-home kernel: amdgpu 0000:0a:00.0: amdgpu: GPU reset begin!
march 28 15:07:35 PC-home kernel: amdgpu 0000:0a:00.0: amdgpu: Bailing on TDR for s_job:869c2, as another already in progress
march 28 15:07:36 PC-home kernel: [drm:amdgpu_
march 28 15:07:36 PC-home kernel: [drm:amdgpu_
march 28 15:07:36 PC-home kernel: amdgpu 0000:0a:00.0: amdgpu: GPU reset begin!
march 28 15:07:36 PC-home kernel: amdgpu 0000:0a:00.0: amdgpu: Bailing on TDR for s_job:4f80, as another already in progress
march 28 15:07:39 PC-home kernel: amdgpu 0000:0a:00.0: amdgpu: failed to suspend display audio
march 28 15:07:39 PC-home kernel: BUG: unable to handle page fault for address: ffffa9c54bb4f910
march 28 15:07:39 PC-home kernel: #PF: supervisor write access in kernel mode
march 28 15:07:39 PC-home kernel: #PF: error_code(0x0002) - not-present page
march 28 15:07:39 PC-home kernel: PGD 100000067 P4D 100000067 PUD 1001b9067 PMD 1cdabb067 PTE 0
march 28 15:07:39 PC-home kernel: Oops: 0002 [#1] PREEMPT SMP NOPTI
march 28 15:07:39 PC-home kernel: CPU: 9 PID: 8586 Comm: kworker/9:0 Tainted: G OE 5.11.6-1-MANJARO #1
march 28 15:07:39 PC-home kernel: Hardware name: System manufacturer System Product Name/PRIME A320M-K, BIOS 5603 10/14/2020
march 28 15:07:39 PC-home kernel: Workqueue: events kfd_process_
march 28 15:07:39 PC-home kernel: RIP: 0010:amdgpu_
march 28 15:07:39 PC-home kernel: Code: 1f 44 00 00 31 c0 ba 01 00 00 00 f0 0f b1 97 f4 77 01 00 45 31 c0 85 c0 75 64 53 48 89 fb 48 8d bf 00 78 01 00 e8 e7 16 27 c9 <f0> ff 83 40 >
march 28 15:07:39 PC-home kernel: RSP: 0018:ffffa9c54c
march 28 15:07:39 PC-home kernel: RAX: ffff951f0c155dc0 RBX: ffffa9c54bb495d0 RCX: 0000000000000001
march 28 15:07:39 PC-home kernel: RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa9c54bb60dd0
march 28 15:07:39 PC-home kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
march 28 15:07:39 PC-home kernel: R10: 0000000000000003 R11: 0000000000000000 R12: ffffa9c54bb495d0
march 28 15:07:39 PC-home...
|
#53 |
This seems to be a firmware(-related) problem. After downgrading to linux firmware 2020-09-18, I'm running 6 days without a crash on the same work loads. (I was getting multiple crashes per day before).
My GPU is Vega8 Mobile (ThinkPad A485). Currently running 5.13.11.
An extensive discussion of different firmware versions in the context of a similar issue on Arch Forums: https:/
|
#54 |
Ryzen 4700U same error. openSUSE Tumbleweed
X11
Kernel version is 5.14.14
Mesa version is 21.2.5-293.2
Firmware version is 20211027-1.1
|
#55 |
(In reply to i-am-not-a-robot from comment #48)
> This seems to be a firmware(-related) problem. After downgrading to linux
> firmware 2020-09-18, I'm running 6 days without a crash on the same work
> loads. (I was getting multiple crashes per day before).
Did you test any other versions? Was 09-18 the last working release?
|
#56 |
A possible solution is to pass
amdgpu.dpm=0
as a kernel launch option.
However: this kills fps in many games and probably anything that depends on the gpu for rendering.
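Before judging whether a boot parameter like this helps, it is worth confirming that the running kernel actually received it. A minimal Python sketch, assuming the usual `/proc/cmdline` file is available; `parse_cmdline` and `read_cmdline` are hypothetical helper names, and quoted values containing spaces are not handled.

```python
def parse_cmdline(text):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in text.split():
        # Split on the first "=" only, so "root=UUID=abc" keeps "UUID=abc" as the value.
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def read_cmdline(path="/proc/cmdline"):
    """Parse the command line of the currently running kernel."""
    with open(path) as handle:
        return parse_cmdline(handle.read())
```

For example, `read_cmdline().get("amdgpu.dpm")` would be `"0"` if the option made it onto the command line. If the key is missing, the bootloader config was probably not regenerated or the wrong entry was booted.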
|
#57 |
I can confirm that
amdgpu.dpm=0
removes the issue
on an AMD Radeon PRO FIJI (Dual Fury) kernel: 5.15.10|FW: 20211027.
Works perfectly fine in Gnome as long as there is no application accessing the 2nd GPU.
When opening Radeon-profile as long as card0 is selected, there is no issue but as soon as I select card1 I get instantly
Dec 22 21:15:46 Workstation kernel: amdgpu:
Dec 22 21:15:49 Workstation kernel: amdgpu:
The application Radeon-profile freezes but desktop is still responsive.
When opening CS:GO with mangohud and configuring either
pci_dev = 0000:3d:00.0 # primary card works fine
or
pci_dev = 0000:3e:00.0 # secondary card, errors from above occur and CS:GO loads super slow and after menu is visible it is stuck
When CSM is disabled in BIOS I have 2 GPUs
Dec 22 20:45:50 Workstation kernel: [drm] amdgpu kernel modesetting enabled.
Dec 22 20:45:50 Workstation kernel: amdgpu: CRAT table not found
Dec 22 20:45:50 Workstation kernel: amdgpu: Virtual CRAT table created for CPU
Dec 22 20:45:50 Workstation kernel: amdgpu: Topology: Add CPU node
Dec 22 20:45:50 Workstation kernel: amdgpu 0000:3d:00.0: vgaarb: deactivate vga console
Dec 22 20:45:50 Workstation kernel: amdgpu 0000:3d:00.0: enabling device (0106 -> 0107)
Dec 22 20:45:50 Workstation kernel: amdgpu 0000:3d:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
Dec 22 20:45:50 Workstation kernel: amdgpu 0000:3d:00.0: amdgpu: Fetched VBIOS from ROM BAR
Dec 22 20:45:50 Workstation kernel: amdgpu: ATOM BIOS: 113-C88801MS-102
Dec 22 20:45:50 Workstation kernel: amdgpu 0000:3d:00.0: amdgpu: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
Dec 22 20:45:50 Workstation kernel: amdgpu 0000:3d:00.0: amdgpu: GART: 1024M 0x000000FF00000000 - 0x000000FF3FFFFFFF
Dec 22 20:45:50 Workstation kernel: [drm] amdgpu: 4096M of VRAM memory ready
Dec 22 20:45:50 Workstation kernel: [drm] amdgpu: 4096M of GTT memory ready.
Dec 22 20:45:50 Workstation kernel: amdgpu: hwmgr_sw_init smu backed is fiji_smu
Dec 22 20:45:50 Workstation kernel: snd_hda_intel 0000:3d:00.1: bound 0000:3d:00.0 (ops amdgpu_
Dec 22 20:45:50 Workstation kernel: [drm:retrieve_
Dec 22 20:45:50 Workstation kernel: kfd kfd: amdgpu: Allocated 3969056 bytes on gart
Dec 22 20:45:50 Workstation kernel: amdgpu: Virtual CRAT table created for GPU
Dec 22 20:45:50 Workstation kernel: amdgpu: Topology: Add dGPU node [0x7300:0x1002]
Dec 22 20:45:50 Workstation kernel: kfd kfd: amdgpu: added device 1002:7300
Dec 22 20:45:50 Workstation kernel: amdgpu 0000:3d:00.0: amdgpu: SE 4, SH per SE 1, CU per SH 16, active_cu_number 64
Dec 22 20:45:50 Workstation kernel: fbcon: amdgpu (fb0) is primary device
Dec 22 20:45:51 Workstation kernel: amdgpu 0000:3d:00.0: [drm] fb0: amdgpu frame buffer device
Dec 22 20:45:51 Workstation kernel: amdgpu 0000:3d:00.0: amdgpu: Using BACO for runtime pm
Dec 22 20:45:51 Workstation kernel: [drm] Initialize...
|
#58 |
(In reply to roman from comment #52)
> I can confirm that
> amdgpu.dpm=0
> removes the issue
> on an AMD Radeon PRO FIJI (Dual Fury) kernel: 5.15.10|FW:
> 20211027.
>
> Works perfectly fine in Gnome as long as there is no application accessing
> the 2nd GPU.
In Source games it works fine for me but in many non-Source games it'll just fucking die.
Anyways, now I can't boot without dpm, it freezes, meaning that Source games will crash, along with Risk of Rain 2 and others.
> Hopefully @Alex can do/forward this since this is a P1 blocking issue and
> open for 3 years.
I can only hope it gets fixed one day soon.
|
#59 |
I can confirm as well that disabling dynamic power management with the amdgpu.dpm=0 kernel parameter removes the issue with Dishonored 2 on Ubuntu 21.10, kernel 5.13.0, Radeon RX 580 with Mesa 21.2.2.
Same boat as Spencer: hope it gets fixed one day.
|
#60 |
I don't know if it's related, but my display freaks out before shutting off. It's still on, and it doesn't reboot when I do it by SSH. I have to do it on the desktop itself.
Jan 22 06:17:30 Y4M1-II kernel: [drm:amdgpu_
Jan 22 06:17:30 Y4M1-II kernel: [drm:psp_resume [amdgpu]] *ERROR* PSP resume failed
Jan 22 06:17:30 Y4M1-II kernel: [drm:psp_hw_start [amdgpu]] *ERROR* PSP create ring failed!
Jan 22 06:17:30 Y4M1-II kernel: [drm] PSP is resuming...
Jan 22 06:17:30 Y4M1-II kernel: [drm] VRAM is lost due to GPU reset!
Jan 22 06:17:30 Y4M1-II kernel: [drm] PCIE GART of 512M enabled (table at 0x0000008000753
Jan 22 06:17:30 Y4M1-II kernel: amdgpu 0000:0c:00.0: amdgpu: GPU reset succeeded, trying to resume
Jan 22 06:17:26 Y4M1-II kernel: [drm:amdgpu_
Jan 22 06:17:19 Y4M1-II kernel: amdgpu 0000:0c:00.0: amdgpu: GPU smu mode1 reset
Jan 22 06:17:19 Y4M1-II kernel: amdgpu 0000:0c:00.0: amdgpu: GPU mode1 reset
Jan 22 06:17:19 Y4M1-II kernel: amdgpu 0000:0c:00.0: amdgpu: MODE1 reset
Jan 22 06:17:19 Y4M1-II kernel: [drm:amdgpu_
Jan 22 06:17:19 Y4M1-II kernel: [drm:psp_suspend [amdgpu]] *ERROR* Failed to terminate ras ta
Jan 22 06:17:19 Y4M1-II kernel: [drm] psp gfx command UNLOAD_TA(0x2) failed and response status is (0x0)
Jan 22 06:17:16 Y4M1-II kernel: [drm:amdgpu_
Jan 22 06:17:16 Y4M1-II kernel: [drm:amdgpu_
Jan 22 06:17:15 Y4M1-II kernel: [drm] REG_WAIT timeout 1us * 200 tries - hubp2_set_blank line:950
Jan 22 06:17:15 Y4M1-II kernel: [drm] REG_WAIT timeout 1us * 200 tries - hubp2_set_blank line:950
Jan 22 06:17:15 Y4M1-II kernel: amdgpu 0000:0c:00.0: amdgpu: Failed to disable gfxoff!
Jan 22 06:17:15 Y4M1-II kernel: [drm:drm_
Jan 22 06:17:15 Y4M1-II kernel: [drm:drm_
Jan 22 06:17:10 Y4M1-II kernel: amdgpu 0000:0c:00.0: amdgpu: Bailing on TDR for s_job:18e3f, as another already in progress
Jan 22 06:17:10 Y4M1-II kernel: amdgpu 0000:0c:00.0: amdgpu: GPU reset begin!
Jan 22 06:17:10 Y4M1-II kernel: [drm:amdgpu_
Jan 22 06:17:10 Y4M1-II kernel: [drm:amdgpu_
Jan 22 06:17:10 Y4M1-II kernel: amdgpu 0000:0c:00.0: amdgpu: GPU reset begin!
Jan 22 06:17:10 Y4M1-II kernel: [drm:amdgpu_
Jan 22 06:17:10 Y4M1-II kernel: [drm:amdgpu_
Jan 22 06:17:10 Y4M1-II kernel: [drm:amdgpu_
Jan 22 06:17:05 Y4M1-II kernel: [drm:amdgpu_
|
#61 |
Another instance, when my desktop has been idle for a while and the display has been shut off for a while, the display won't come back on. Here's the journal entry I think is relevant to this:
Jan 22 08:07:58 Y4M1-II kernel: "echo 0 > /proc/sys/
Jan 22 08:07:58 Y4M1-II kernel: Tainted: G OE 5.15.11-
Jan 22 08:07:58 Y4M1-II kernel: INFO: task Xorg:1692 blocked for more than 120 seconds.
Jan 22 08:07:58 Y4M1-II kernel: </TASK>
Jan 22 08:07:58 Y4M1-II kernel: ret_from_
Jan 22 08:07:58 Y4M1-II kernel: ? set_kthread_
Jan 22 08:07:58 Y4M1-II kernel: ? process_
Jan 22 08:07:58 Y4M1-II kernel: kthread+0x11e/0x140
Jan 22 08:07:58 Y4M1-II kernel: worker_
Jan 22 08:07:58 Y4M1-II kernel: process_
Jan 22 08:07:58 Y4M1-II kernel: drm_sched_
Jan 22 08:07:58 Y4M1-II kernel: amdgpu_
Jan 22 08:07:58 Y4M1-II kernel: amdgpu_
Jan 22 08:07:58 Y4M1-II kernel: ? drm_fb_
Jan 22 08:07:58 Y4M1-II kernel: amdgpu_
Jan 22 08:07:58 Y4M1-II kernel: amdgpu_
Jan 22 08:07:58 Y4M1-II kernel: amdgpu_
Jan 22 08:07:58 Y4M1-II kernel: ? amdgpu_
Jan 22 08:07:58 Y4M1-II kernel: ? nv_common_
Jan 22 08:07:58 Y4M1-II kernel: dm_suspend+
Jan 22 08:07:58 Y4M1-II kernel: mutex_lock+
Jan 22 08:07:58 Y4M1-II kernel: __mutex_
Jan 22 08:07:58 Y4M1-II kernel: __mutex_
Jan 22 08:07:58 Y4M1-II kernel: schedule_
Jan 22 08:07:58 Y4M1-II kernel: schedule+0x4e/0xb0
Jan 22 08:07:58 Y4M1-II kernel: __schedule+
Jan 22 08:07:58 Y4M1-II kernel: <TASK>
Jan 22 08:07:58 Y4M1-II kernel: Call Trace:
Jan 22 08:07:58 Y4M1-II kernel: Workqueue: events drm_sched_
Jan 22 08:07:58 Y4M1-II kernel: task:kworker/12:1 state:D stack: 0 pid: 246 ppid: 2 flags:0x00004000
Jan 22 08:07:58 Y4M1-II kernel: "echo 0 > /proc/sys/
Jan 22 08:07:58 Y4M1-II kernel: Tainted: G OE 5.15.11-
Jan 22 08:07:58 Y4M1-II kernel: INFO: task kworker/12:1:246 blocked for more than 120 seconds.
Jan 22 08:05:24 Y4M1-II kernel: amdgpu 0000:0c:00.0: amdgpu: Bailing on TDR for s_job:1123, as another already in progress
Jan 22 08:05:24 Y4M1-II kernel: amdgpu 0000:0c:00.0: amdgpu: Bailing on TDR for s_job:43c, as another already in progress
Jan 22 08:05:24 Y4M1-II kernel: amdgpu 0000:0c:00.0: amdgpu: GPU reset begin!
Jan 22 08:05:24 Y4M1-II kernel: amdgpu 0000:0c:00.0: amdgpu: GPU reset begin!
Jan 22 08:05:24 Y4M1-II kernel: amdgpu 0000:0c:00.0: amdgpu: GPU reset begin!
Jan 22 08:05:2...
|
#62 |
Created attachment 300315
Kernel config
OS: Gentoo
Kernel: 5.15.16, config attached, built with make -j12
Launch options: root=/dev/sda2 ro quiet
I'd like to be able to boot with amdgpu.dpm=0, as this seems to fix the bug with minor tradeoffs, however:
When I boot with dpm disabled, my screen will freeze and leave this nice little stinker to ruin my day
Jan 24 16:33:05 [kernel] [ 2.572474] Loading firmware: amdgpu/
Jan 24 16:33:05 [kernel] [ 2.572475] Loading firmware: amdgpu/
Jan 24 16:33:05 [kernel] [ 2.572476] Loading firmware: amdgpu/
Jan 24 16:33:05 [kernel] [ 2.572477] Loading firmware: amdgpu/
Jan 24 16:33:05 [kernel] [ 2.572477] Loading firmware: amdgpu/
Jan 24 16:33:05 [kernel] [ 2.572478] Loading firmware: amdgpu/
Jan 24 16:33:05 [kernel] [ 2.572968] EXT4-fs (sdb1): mounted filesystem with ordered data mode. Opts: discard. Quota mode: none.
Jan 24 16:33:05 [kernel] [ 2.573030] Loading firmware: amdgpu/
Jan 24 16:33:05 [kernel] [ 2.573032] Loading firmware: amdgpu/
Jan 24 16:33:05 [kernel] [ 2.573071] Loading firmware: amdgpu/
Jan 24 16:33:05 [kernel] [ 2.573072] [drm] Found VCN firmware Version ENC: 1.14 DEC: 5 VEP: 0 Revision: 20
Jan 24 16:33:05 [kernel] [ 2.573075] amdgpu 0000:28:00.0: amdgpu: Will use PSP to load VCN firmware
Jan 24 16:33:05 [kernel] [ 2.747244] [drm] reserve 0x900000 from 0x817e400000 for PSP TMR
Jan 24 16:33:05 [kernel] [ 2.785931] amdgpu 0000:28:00.0: amdgpu: RAS: optional ras ta ucode is not available
Jan 24 16:33:05 [kernel] [ 2.790137] amdgpu 0000:28:00.0: amdgpu: RAP: optional rap ta ucode is not available
Jan 24 16:33:05 [kernel] [ 2.790138] amdgpu 0000:28:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
Jan 24 16:33:05 [kernel] [ 2.790140] amdgpu: smu firmware loading failed
Jan 24 16:33:05 [kernel] [ 2.790141] amdgpu 0000:28:00.0: amdgpu: amdgpu_
Jan 24 16:33:05 [kernel] [ 2.790143] amdgpu 0000:28:00.0: amdgpu: Fatal error during GPU init
Jan 24 16:33:05 [kernel] [ 2.790144] amdgpu 0000:28:00.0: amdgpu: amdgpu: finishing device.
Jan 24 16:33:05 [kernel] [ 2.793726] [drm] free PSP TMR buffer
Jan 24 16:33:05 [kernel] [ 2.825874] amdgpu: probe of 0000:28:00.0 failed with error -95
Jan 24 16:33:05 [kernel] [ 2.825951] BUG: unable to handle page fault for address: ffffa4af5100d000
Jan 24 16:33:05 [kernel] [ 2.825954] #PF: supervisor write access in kernel mode
Jan 24 16:33:05 [kernel] [ 2.825955] #PF: error_code(0x0002) - not-present page
Jan 24 16:33:05 [kernel] [ 2.825957] PGD 100000067 P4D 100000067 PUD 100104067 PMD 0
Jan 24 16:33:05 [kernel] [ 2.825960] Oops: 0002 [#1] SMP NOPTI
Jan 24 16:33:05 [kernel] [ 2.825962] CPU: 6 PID: 759 Comm: systemd-udevd Not tainted 5.15.16-gentoo #8
Jan 24 16:33:05 [kernel] [ 2.825965] Hardware name: Micro-Star International Co., Ltd MS-7B86/B450 GAMING PLUS MAX (MS-7B86), BIOS H.60 04/18/2020
Jan 24 16:33:05 [kernel] [ 2.825967] RIP: 0010:vcn_
Jan 24 16:33:05 [kernel] [ 2.826139] C...
|
#63 |
> Jan 24 16:33:05 [kernel] [ 2.785931] amdgpu 0000:28:00.0: amdgpu: RAS:
> optional ras ta ucode is not available
> Jan 24 16:33:05 [kernel] [ 2.790137] amdgpu 0000:28:00.0: amdgpu: RAP:
> optional rap ta ucode is not available
> Jan 24 16:33:05 [kernel] [ 2.790138] amdgpu 0000:28:00.0: amdgpu:
> SECUREDISPLAY: securedisplay ta ucode is not available
> Jan 24 16:33:05 [kernel] [ 2.790140] amdgpu: smu firmware loading failed
> Jan 24 16:33:05 [kernel] [ 2.790141] amdgpu 0000:28:00.0: amdgpu:
> amdgpu_
> Jan 24 16:33:05 [kernel] [ 2.790143] amdgpu 0000:28:00.0: amdgpu: Fatal
> error during GPU init
Is this a custom-built kernel? Is amdgpu built into the kernel or enabled as a module? In the former case, is all the required firmware also built into the kernel? In the latter case, is all the required firmware available in the initramfs (if amdgpu is incorporated in the initramfs)? The required firmware files are listed here: https:/
|
#64 |
>Is this a custom built kernel? Is amdgpu built into the kernel or enabled as a
>module? In the former case, is all required firmware also built into the
>kernel? In the latter case, is all required firmware available on the initramfs
>(if amdgpu is incorporated in the initramfs)? The required firmware files are
>listed here:
It's a custom kernel, but I have them all built in.
>grep navi10 .config && echo
>amdgpu/
amdgpu/
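A grep like the one above can be made a bit more systematic. The Python sketch below assumes the blobs are built in through the kernel's `CONFIG_EXTRA_FIRMWARE` option, which is the usual way to do it but is an assumption about this particular config. `extra_firmware` and `missing_firmware` are hypothetical helper names, and the list of required files has to come from elsewhere (for example the firmware list linked earlier).

```python
import re


def extra_firmware(config_text):
    """Return the blobs named in CONFIG_EXTRA_FIRMWARE, or [] if it is unset."""
    match = re.search(r'^CONFIG_EXTRA_FIRMWARE="([^"]*)"', config_text, re.MULTILINE)
    return match.group(1).split() if match else []


def missing_firmware(config_text, required):
    """List the entries of `required` that are not built into the kernel."""
    built_in = set(extra_firmware(config_text))
    return [name for name in required if name not in built_in]
```

Called with the contents of `.config` and the list of files the card needs, `missing_firmware` returns whatever is not built in. An empty result only means the names are present in the config, not that the blobs were found under the firmware directory at build time.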
|
#65 |
As an addendum to both comments, a working boot spits out this:
Loading firmware: amdgpu/
Loading firmware: amdgpu/
Loading firmware: amdgpu/
amdgpu 0000:28:00.0: amdgpu: PSP runtime database doesn't exist
Loading firmware: amdgpu/
Loading firmware: amdgpu/
Loading firmware: amdgpu/
Loading firmware: amdgpu/
Loading firmware: amdgpu/
Loading firmware: amdgpu/
Loading firmware: amdgpu/
Loading firmware: amdgpu/
Loading firmware: amdgpu/
Loading firmware: amdgpu/
amdgpu 0000:28:00.0: amdgpu: Will use PSP to load VCN firmware
amdgpu 0000:28:00.0: amdgpu: RAS: optional ras ta ucode is not available
amdgpu 0000:28:00.0: amdgpu: RAP: optional rap ta ucode is not available
amdgpu 0000:28:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
amdgpu 0000:28:00.0: amdgpu: use vbios provided pptable
amdgpu 0000:28:00.0: amdgpu: smc_dpm_info table revision(
amdgpu 0000:28:00.0: amdgpu: SMU is initialized successfully!
kfd kfd: amdgpu: Allocated 3969056 bytes on gart
amdgpu: HMM registered 6128MB device memory
amdgpu: SRAT table not found
amdgpu: Virtual CRAT table created for GPU
amdgpu: Topology: Add dGPU node [0x731f:0x1002]
kfd kfd: amdgpu: added device 1002:731f
amdgpu 0000:28:00.0: amdgpu: SE 2, SH per SE 2, CU per SH 10, active_cu_number 36
fbcon: amdgpudrmfb (fb0) is primary device
|
#66 |
Chiming in as another victim of:
[drm:amdgpu_
Radeon RX 6700 XT (NAVY_FLOUNDER, DRM 3.42.0, 5.15.15-
AMD Ryzen 9 5900X
Ubuntu Mate
Mesa 21.2.2
Haven't attempted the amdgpu.dpm=0 workaround because the side effects of it appear to be bad.
|
#67 |
I've been getting "ring gfx timeouts" for some time (see comment 35), mostly when the computer has not had any input for a while (while I'm away from it). When it freezes I can SSH into it, but when I run "shutdown -h now" it boots me out of SSH as it should, yet the computer never seems to actually shut down.
I've tried many different kernel parameters but no luck so far. I'm now trying the amdgpu.runpm=0 as suggested here: https:/
For my system specs see my previous comment 35.
|
#68 |
(In reply to Jon from comment #61)
> Chiming in as another victim of:
> [drm:amdgpu_
>
> Radeon RX 6700 XT (NAVY_FLOUNDER, DRM 3.42.0, 5.15.15-
> 12.0.1)
> AMD Ryzen 9 5900X
> Ubuntu Mate
> Mesa 21.2.2
>
> Haven't attempted the amdgpu.dpm=0 workaround because the side effects of it
> appear to be bad.
I've tried amdgpu.dpm=0 and it seriously kills the frame rate, in SuperTuxKart at least.
|
#69 |
(In reply to Jon from comment #61)
> Chiming in as another victim of:
> [drm:amdgpu_
>
This is just a symptom of an application trying to use the GPU after a GPU reset without re-initializing its context. A GPU reset can have many different causes. If you have different hardware from other people on this ticket, it's not likely the same issue.
|
#70 |
I have the same bug with Firefox (it has happened about once a day, starting about a week ago):
[ 4409.071226] BUG: unable to handle page fault for address: fffffffffffffff8
[ 4409.071234] #PF: supervisor read access in kernel mode
[ 4409.071235] #PF: error_code(0x0000) - not-present page
[ 4409.071237] PGD 427e12067 P4D 427e12067 PUD 427e14067 PMD 0
[ 4409.071240] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 4409.071242] CPU: 18 PID: 191 Comm: uvd Tainted: G OE 5.16.8uksm #1
[ 4409.071245] Hardware name: Hewlett-Packard HP Z420 Workstation/1589, BIOS J61 v03.96 10/29/2019
[ 4409.071246] RIP: 0010:swake_
[ 4409.071251] Code: ff ff ff eb ad 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 0f 1f 44 00 00 48 8b 57 08 48 8d 47 08 48 39 c2 74 25 53 48 8b 5f 08 <48> 8b 7b f8 e8 80 7f fe ff 48 8b 13 48 8b 43 08 48 89 42 08 48 89
[ 4409.071253] RSP: 0018:ffffbbdf01
[ 4409.071254] RAX: ffff9719549270b0 RBX: 0000000000000000 RCX: 0000000000000000
[ 4409.071256] RDX: 0000000000000000 RSI: ffff97185d547250 RDI: ffff9719549270a8
[ 4409.071257] RBP: ffff9719549270a8 R08: ffff9716473efec0 R09: ffff9716473efed8
[ 4409.071258] R10: ffff971646cc3000 R11: ffff971646cc3000 R12: 0000000000000286
[ 4409.071259] R13: ffff9716473eebe0 R14: ffff9716ee901bc0 R15: ffff9719549270a0
[ 4409.071260] FS: 000000000000000
[ 4409.071262] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4409.071263] CR2: fffffffffffffff8 CR3: 0000000427e10006 CR4: 00000000001706e0
[ 4409.071264] Call Trace:
[ 4409.071267] <TASK>
[ 4409.071269] complete+0x2f/0x40
[ 4409.071271] drm_sched_
[ 4409.071274] ? wait_woken+
[ 4409.071289] ? drm_sched_
[ 4409.071290] kthread+0x169/0x190
[ 4409.071294] ? set_kthread_
[ 4409.071297] ret_from_
[ 4409.071301] </TASK>
[ 4409.071302] Modules linked in: xt_conntrack nfnetlink xfrm_user xfrm_algo xt_addrtype br_netfilter cmac rfcomm vboxnetadp(OE) vboxnetflt(OE) iptable_mangle xt_CHECKSUM xt_tcpudp iptable_nat xt_comment xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bridge stp llc overlay iptable_filter vboxdrv(OE) bnep cpufreq_powersave zram binfmt_misc squashfs snd_emu10k1_synth snd_hda_
[ 4409.071342] wmi mac_hid xpad ff_memless coretemp mei_me mei hwmon_vid i5500_temp msr ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq zstd_compress libcrc32c hid_logitech_hidpp hid_logitech_dj hid_generic usbhid hid crc32_pclmul ghash_clmulni_intel aesni_intel e1000e psmou...
|
#71 |
So I've been running for about 2.5 weeks now using the amdgpu.runpm=0 kernel parameter and I've had no crashes or freezes so far. I'm cautiously optimistic that for me at least this may have solved the problem. So far I haven't noticed any side effects (performance degradation etc.).
I understand that amdgpu.runpm=0 is related to power management but I don't know the specifics. Possibly Alex Deucher can chime in and specify exactly what this parameter does?
See my previous comments for some context:
comment 35
comment 62
comment 63
|
#72 |
(In reply to Randune from comment #66)
>
> I understand that amdgpu.runpm=0 is related to power management but I don't
> know the specifics. Possibly Alex Deucher can chime in and specify exactly
> what this parameter does?
The runpm parameter allows you to disable runtime power management which powers down dGPUs at runtime if they are not being used (e.g., hybrid graphics laptops or desktop systems with multiple GPUs) to save power. It does not affect dynamic power management while the chip is powered up. Disabling it will increase idle power usage.
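To check whether a parameter such as runpm was actually passed to the running kernel, one option is to look it up on the kernel command line. The following is only a minimal sketch: the function name `amdgpu_cmdline_param` is made up for illustration, and the optional file argument exists purely so the function can be tried against a saved copy of a command line.

```shell
# Print the value of amdgpu.<name> from a kernel command line file
# (defaults to /proc/cmdline). Prints nothing if the parameter is absent.
# If the parameter appears more than once, the last occurrence is printed.
amdgpu_cmdline_param() {
    name=$1
    file=${2:-/proc/cmdline}
    # Split the command line into one token per line, then strip the
    # "amdgpu.<name>=" prefix from the matching token(s).
    tr ' ' '\n' < "$file" | sed -n "s/^amdgpu\.$name=//p" | tail -n 1
}
```

For example, `amdgpu_cmdline_param runpm` should print `0` on a system booted with amdgpu.runpm=0 and nothing when the parameter was not given (in which case the driver default applies).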
|
#73 |
Had this problem with a Ryzen 3 3200 CPU (Vega 8 integrated) on an A320M-DVS R4.0 motherboard.
microcode: CPU: patch_level=
microcode: Microcode Update Driver: v2.2.
I had a 100% reproducible scenario to trigger the freeze:
1. Play a video (in a web browser or video player; it should stay visible, so don't hide the tab or minimize the window)
2. Open the Shadertoy website (any shader; keep it rendering and keep the window visible)
3. Open any OpenGL or Vulkan application (that uses the integrated GPU)
4. start pressing fullscreen/
... and it freezes.
I have used this PC for 2 years, and every Linux kernel has had this freeze when using the integrated GPU. The current kernel is openSUSE 5.17.4-1-default.
(My solution all this time was the obvious one: disable the integrated GPU in the BIOS and use only the discrete one, and everything works.)
Today I checked the motherboard website - https:/
So I updated the BIOS to 7.00 and then 7.10 (current)... and everything works, with no freezes anymore.
So it was a firmware problem (at least for me) that was fixed by the BIOS update.
|
#74 |
Edit: I got a freeze after using the PC for 4 hours. Before, 20 minutes was the longest I could use the integrated GPU, so it looks like it is not completely fixed, just somewhat improved (or I just got lucky)... I'm back to using the discrete GPU.
|
#75 |
My Ubuntu 20.04 desktop is crashing several times per day due to this bug since I've upgraded my computer from an old Intel Xeon to an AMD Ryzen 9 5900X on a B550 mainboard. I've had the same AMD RX Vega 56 graphics card in both computers, so I assume this is probably more related to the mainboard/CPU than to the graphics card.
The crashes from today:
```
martin@martin ~ % grep amdgpu /var/log/syslog | grep ERROR | grep -v 'Failed to initialize parser'
Jun 11 03:15:33 martin kernel: [21494.642889] [drm:amdgpu_
Jun 11 03:15:33 martin kernel: [21494.643055] [drm:amdgpu_
Jun 11 03:15:50 martin kernel: [21511.795007] [drm:amdgpu_
Jun 11 03:15:50 martin kernel: [21511.795174] [drm:amdgpu_
Jun 11 15:56:07 martin kernel: [ 1477.069969] [drm:amdgpu_
Jun 11 15:56:07 martin kernel: [ 1477.070140] [drm:amdgpu_
Jun 11 15:56:22 martin kernel: [ 1492.174077] [drm:amdgpu_
Jun 11 15:56:22 martin kernel: [ 1492.174248] [drm:amdgpu_
Jun 11 16:03:28 martin kernel: [ 1918.161101] [drm:amdgpu_
Jun 11 16:03:28 martin kernel: [ 1918.161271] [drm:amdgpu_
Jun 11 16:03:49 martin kernel: [ 1938.385307] [drm:amdgpu_
Jun 11 16:03:49 martin kernel: [ 1938.385479] [drm:amdgpu_
Jun 11 23:28:12 martin kernel: [25491.854294] [drm:amdgpu_
Jun 11 23:28:12 martin kernel: [25491.854460] [drm:amdgpu_
Jun 11 23:28:28 martin kernel: [25507.982446] [drm:amdgpu_
Jun 11 23:28:28 martin kernel: [25507.982613] [drm:amdgpu_
Jun 11 23:29:51 martin kernel: [25591.333483] amdgpu 0000:2d:00.0: amdgpu: WALKER_ERROR: 0x0
Jun 11 23:29:51 martin kernel: [25591.333485] amdgpu 0000:2d:00.0: amdgpu: MAPPING_ERROR: 0x0
Jun 11 23:30:01 martin kernel: [25601.412838] [drm:amdgpu_
```
|
#76 |
(In reply to Martin von Wittich from comment #70)
> My Ubuntu 20.04 desktop is crashing several times per day due to this bug
> since I've upgraded my computer from an old Intel Xeon to an AMD Ryzen 9
> 5900X on a B550 mainboard. I've had the same AMD RX Vega 56 graphics card in
> both computers, so I assume this is probably more related to the
> mainboard/CPU than to the graphics card.
>
> The crashes from today:
>
> ```
> martin@martin ~ % grep amdgpu /var/log/syslog | grep ERROR | grep -v 'Failed
> to initialize parser'
> Jun 11 03:15:33 martin kernel: [21494.642889] [drm:amdgpu_
> [amdgpu]] *ERROR* ring gfx timeout, signaled seq=1750601, emitted seq=1750603
> Jun 11 03:15:33 martin kernel: [21494.643055] [drm:amdgpu_
> [amdgpu]] *ERROR* Process information: process firefox pid 5037 thread
> firefox:cs0 pid 5123
> Jun 11 03:15:50 martin kernel: [21511.795007] [drm:amdgpu_
> [amdgpu]] *ERROR* ring gfx timeout, signaled seq=1750605, emitted seq=1750608
> Jun 11 03:15:50 martin kernel: [21511.795174] [drm:amdgpu_
> [amdgpu]] *ERROR* Process information: process firefox pid 5037 thread
> firefox:cs0 pid 5123
> Jun 11 15:56:07 martin kernel: [ 1477.069969] [drm:amdgpu_
> [amdgpu]] *ERROR* ring gfx timeout, signaled seq=216293, emitted seq=216295
> Jun 11 15:56:07 martin kernel: [ 1477.070140] [drm:amdgpu_
> [amdgpu]] *ERROR* Process information: process firefox pid 5237 thread
> firefox:cs0 pid 5302
> Jun 11 15:56:22 martin kernel: [ 1492.174077] [drm:amdgpu_
> [amdgpu]] *ERROR* ring gfx timeout, signaled seq=216297, emitted seq=216300
> Jun 11 15:56:22 martin kernel: [ 1492.174248] [drm:amdgpu_
> [amdgpu]] *ERROR* Process information: process pid 0 thread pid 0
> Jun 11 16:03:28 martin kernel: [ 1918.161101] [drm:amdgpu_
> [amdgpu]] *ERROR* ring gfx timeout, signaled seq=264406, emitted seq=264408
> Jun 11 16:03:28 martin kernel: [ 1918.161271] [drm:amdgpu_
> [amdgpu]] *ERROR* Process information: process firefox pid 10569 thread
> firefox:cs0 pid 10633
> Jun 11 16:03:49 martin kernel: [ 1938.385307] [drm:amdgpu_
> [amdgpu]] *ERROR* ring gfx timeout, signaled seq=264410, emitted seq=264413
> Jun 11 16:03:49 martin kernel: [ 1938.385479] [drm:amdgpu_
> [amdgpu]] *ERROR* Process information: process firefox pid 10569 thread
> firefox:cs0 pid 10633
> Jun 11 23:28:12 martin kernel: [25491.854294] [drm:amdgpu_
> [amdgpu]] *ERROR* ring gfx timeout, signaled seq=2390985, emitted seq=2390987
> Jun 11 23:28:12 martin kernel: [25491.854460] [drm:amdgpu_
> [amdgpu]] *ERROR* Process information: process firefox pid 4922 thread
> firefox:cs0 pid 4989
> Jun 11 23:28:28 martin kernel: [25507.982446] [drm:amdgpu_
> [amdgpu]] *ERROR* ring gfx timeout, signaled seq=2390989, emitted seq=2390992
> Jun 11 23:28:28 martin kernel: [25507.982613] [drm:amdgpu_
> [amdgpu]] *ERROR* Process information: process pid 0 thread pid 0
> Jun 11 23:29:51 martin kernel: [25591.333483] amdgpu 0000:2d:00.0: amdgpu:
> WALKER_ERROR: 0x0
> Jun 11 23:29:51 martin kernel: [25591.333485] am...
|
#77 |
I can confirm that adding "amdgpu.dpm=0" to the kernel command line seems to resolve this issue - I enabled that option on 2022-06-12 13:24, and my system didn't crash at all on 2022-06-12 - 2022-06-14 (I was on vacation from 2022-06-15 on and didn't use my computer from then on).
I don't use Linux for gaming and therefore can't comment on how badly this affects gaming performance, but I did notice mpv could no longer play 1080p x264 video without stuttering when it defaults to --vo=gpu. Using another --vo like sdl seems to be a viable workaround.
> Did you try with the latest Linux Kernel? I had a lot of gpu lockups like this. Also try these kernel parameters : "amdgpu.
I'll try these next.
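For anyone wanting to try the same workaround: the parameter has to end up on the kernel command line. On GRUB-based installs such as Ubuntu this is commonly done by editing GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub. The sketch below only prints a modified copy of that file and changes nothing on disk; the helper name `add_grub_param` is hypothetical, and it assumes the value is double-quoted and the parameter contains no `/` or `&` characters.

```shell
# Print a copy of a grub defaults file with <param> appended to
# GRUB_CMDLINE_LINUX_DEFAULT, unless the parameter is already there.
# Nothing is written back; the caller decides what to do with the output.
add_grub_param() {
    param=$1
    file=${2:-/etc/default/grub}
    if grep -q "^GRUB_CMDLINE_LINUX_DEFAULT=.*[\" ]$param[\" ]" "$file"; then
        # Already present: emit the file unchanged.
        cat "$file"
    else
        # Insert the parameter just before the closing double quote.
        sed "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"/\1 $param\"/" "$file"
    fi
}
```

After reviewing the output of `add_grub_param amdgpu.dpm=0` and applying it to the real file, Ubuntu needs `sudo update-grub` and a reboot before the new command line takes effect.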
|
#78 |
Sorry, forgot to mention in my last post and now can't edit: interestingly enough, the attached video "5 second video clip that triggers a crash" still successfully triggers the crash.
Seems to me like the root issue isn't actually in the dynamic power management code, but somewhere else, and the DPM is just one of several things that can trigger it?
|
#79 |
> Did you try with the latest Linux Kernel? I had a lot of gpu lockups like this. Also try these kernel parameters : "amdgpu.
I can confirm that at least on the current Ubuntu linux-image-
```
martin@martin ~ % uname -a
Linux martin 5.14.0-1042-oem #47-Ubuntu SMP Fri Jun 3 18:17:11 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
martin@martin ~ % cat /proc/cmdline
BOOT_IMAGE=
martin@martin ~ % dmesg -T | grep 'ring gfx timeout'
[Mi Jun 22 14:48:07 2022] [drm:amdgpu_
[Mi Jun 22 14:48:18 2022] [drm:amdgpu_
```
I had enabled these options on 2022-06-20 14:14 UTC+2, this is the first crash I've encountered since then.
I have no idea how to build the latest kernel and therefore haven't tested that yet.
I'll now revert back to amdgpu.dpm=0.
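When comparing how often the timeout occurs under different kernel parameters, it can help to count the events rather than read them off by eye. A rough sketch follows, based only on the "ring gfx timeout" and "Process information: process ... pid ..." message formats quoted earlier in this report; the function names are made up for illustration and both functions read a log from stdin.

```shell
# Count "ring gfx timeout" events in a kernel log read from stdin.
count_gfx_timeouts() {
    grep -c 'ring gfx timeout'
}

# Tally the process names from the "Process information" lines that
# accompany each timeout. Lines with an empty process name (pid 0)
# do not match the pattern and are skipped.
timeouts_by_process() {
    sed -n 's/.*Process information: process \([^ ][^ ]*\) pid [0-9][0-9]* .*/\1/p' |
        sort | uniq -c | sort -rn
}
```

Typical use would be `dmesg | count_gfx_timeouts` or `grep amdgpu /var/log/syslog | timeouts_by_process`; reading the kernel log may require root on some systems.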
|
#80 |
> Did you try with the latest Linux Kernel? I had a lot of gpu lockups like
> this. Also try these kernel parameters : "amdgpu.
> amdgpu.noretry=0 amdgpu.
> amdgpu.audio=0 amdgpu.deep_color=1 amd_iommu=on iommu=pt"" ( you might also
> try with amdgpu.
I tried.
my kernel:
"Linux 5.17.4-1-default #1 SMP PREEMPT Wed Apr 20 07:43:03 UTC 2022 (75e9961) x86_64 x86_64 x86_64 GNU/Linux"
(The video linked above was not able to freeze the integrated AMD GPU for me; I mean before, when I tested with no kernel parameters.)
The result is surprising: no crash or freeze for 4+ hours already, and I did launch lots of the apps that caused freezes for me before.
As I described above - https:/
Full kernel boot option now: "splash=silent quiet amdgpu.
Now, after booting with these options, I see:
Just after boot everything works (OpenGL/Vulkan acceleration by the integrated GPU) with the expected performance.
After trying to trigger the bug (opening multiple OpenGL apps along with Vulkan and WebGL, and playing many videos), OpenGL and Vulkan drop to 20 FPS (constant, even for a single fullscreen triangle) and WebGL2 no longer works in the web browser (even after a browser restart), but video still plays at 60 FPS with no lag, and the system UI does not lag either.
So it looks like GPU graphics acceleration just drops to a very low performance mode, but everything else works fine. (Launching native graphics apps on the Nvidia GPU also works at 60 FPS as expected.)
Interesting: since the FPS dropped to 20 I can no longer launch anything in Wine (any version, including Proton), although it was working after boot. I launched a few apps after boot and checked them again once the GPU FPS had dropped; Wine always crashes with:
"wine: Unhandled page fault on execute access to 00007F894E200460 at address 00007F894E200460 (thread 0070), starting debugger..."
(not being able to use Wine is a big disadvantage)
|
#81 |
Wine problem - this happened because (how/why/when) '/usr/share/
so fix for wine gonna be - "VK_ICD_
Super weird, but I think the Wine problem is fixed.
|
#82 |
but even creating nvidia_icd.json
{
"file_
"ICD": {
}
}
does not help Wine; Wine still crashes with the same error when trying to use/initialize Nvidia.
But I can use Nvidia outside of Wine from native apps (and Vulkan works), so it must be related to the AMD GPU driver somehow. (This was not happening before; it is the first time I have seen Wine crash this way, compared with the previous times I tested the integrated AMD GPU.)
P.S. I have a second PC with the same AMD Vega 8 integrated GPU, and there it works fine (it has never crashed or frozen even once). The other PC has a different motherboard, which is why I originally thought this was a motherboard problem, but the current boot options help make the integrated GPU stable on this PC.
|
#83 |
(I made a small mistake in my file organization. Creating nvidia_icd.json with the content listed above is enough to fix Wine for me; everything works now.)
|
#84 |
Updated to kernel 5.18.4-1-default #1 SMP PREEMPT_DYNAMIC Wed Jun 15 06:00:33 UTC 2022 (ed6345d) x86_64 x86_64 x86_64 GNU/Linux (the latest openSUSE kernel for now).
It seems my integrated AMD GPU freeze is completely fixed, even without the previous boot options (on 5.17 it was freezing without them). The integrated GPU also no longer goes into "low performance mode forever" (as it did with the boot options before); it keeps working for hours at full performance, without the slowdown I saw before.
... but now the Nvidia GPU no longer works alongside AMD (when the integrated GPU is the main GPU), with the Nvidia 515.48.07 driver (the latest now), in both X11 and Wayland. The Nvidia driver is correctly installed and the device is visible (nvidia-smi works and vulkaninfo --summary lists the Nvidia GPU correctly), but on creating a Vulkan surface on the Nvidia device the application always crashes (any application)... (I just tested disabling the AMD integrated GPU and booting with Nvidia: everything works there, Vulkan etc.)
So fixing the integrated AMD GPU results in Nvidia not working anymore... okay (I'm back to using the discrete Nvidia GPU only).
|
#85 |
Same issue here (with the LTS kernel as well):
Linux archlinux 5.18.7-262-tkg-pds #1 TKG SMP PREEMPT_DYNAMIC Mon, 27 Jun 2022 15:50:06 +0000 x86_64 GNU/Linux
[11090.086287] amdgpu 0000:02:00.0: amdgpu:
last message was failed ret is 65535
[11090.086296] amdgpu 0000:02:00.0: amdgpu:
last message was failed ret is 65535
[11090.086302] amdgpu 0000:02:00.0: amdgpu:
last message was failed ret is 65535
[11090.195133] amdgpu 0000:02:00.0: amdgpu:
last message was failed ret is 65535
[11090.195139] amdgpu 0000:02:00.0: amdgpu:
last message was failed ret is 65535
[11090.195143] amdgpu 0000:02:00.0: amdgpu:
last message was failed ret is 65535
[11090.195150] [drm] Cannot get clockgating state when UVD is powergated.
[11090.195152] [drm] Cannot get clockgating state when VCE is powergated.
[11090.695288] amdgpu 0000:02:00.0: amdgpu:
last message was failed ret is 65535
[11090.699331] amdgpu 0000:02:00.0: amdgpu:
last message was failed ret is 65535
[11091.194893] amdgpu 0000:02:00.0: amdgpu:
last message was failed ret is 65535
[11091.194898] amdgpu 0000:02:00.0: amdgpu:
last message was failed ret is 65535
[11091.194901] amdgpu 0000:02:00.0: amdgpu:
last message was failed ret is 65535
[11091.194908] [drm] Cannot get clockgating state when UVD is powergated.
[11091.194909] [drm] Cannot get clockgating state when VCE is powergated.
[11091.695473] amdgpu 0000:02:00.0: amdgpu:
last message was failed ret is 65535
[11092.194965] amdgpu 0000:02:00.0: amdgpu:
last message was failed ret is 65535
[11092.194969] amdgpu 0000:02:00.0: amdgpu:
last message was failed ret is 65535
[11092.194973] amdgpu 0000:02:00.0: amdgpu:
last message was failed ret is 65535
[11092.194979] [drm] Cannot get clockgating state when UVD is powergated.
[11092.194980] [drm] Cannot get clockgating state when VCE is powergated.
[11092.695749] amdgpu 0000:02:00.0: amdgpu:
last message was failed ret is 65535
[11093.195046] amdgpu 0000:02:00.0: amdgpu:
last message was failed ret is 65535
[11093.195050] amdgpu 0000:02:00.0: amdgpu:
last message was failed ret is 65535
[11093.195053] amdgpu 0000:02:00.0: amdgpu:
last message was failed ret is 65535
[11093.195060] [drm] Cannot get clockgating state when UVD is powergated.
[11093.195061] [drm] Cannot get clockgating state when VCE is powergated.
[11093.695004] amdgpu 0000:02:00.0: amdgpu:
last message was failed ret is 65535
[11094.195065] amdgpu 0000:02:00.0: amdgpu:
last message was failed ret is 65535
[11094.195070] amdgpu 0000:02:00.0: amdgpu:
last message was failed ret is 65535
[11094.195074] amdgpu 0000:02:00.0: amdgpu:
last message was failed ret is 65535
[11094.195082] [drm] Cannot get clockgating state when UVD is powergated.
[11094.195083] [drm] Cannot get clockgating state when VCE is powergated.
[11094.695286] amdgpu 0000:02:00.0: amdgpu:
last mess...
|
#86 |
Nvidia released the 515.57 drivers, which fix "Nvidia being broken when used as second GPU in Linux", my bug above.
The Nvidia GPU works again when the AMD GPU is the main one.
|
#87 |
After using this PC for a few days with the AMD Vega 8 (integrated) as the main GPU I see no freezes at all. (Before, in 2021, it froze every 10-20 minutes, so I had to use Nvidia as the main GPU.)
(It works with and without the kernel boot options listed above.)
I use the openSUSE kernel 5.18.4-1-default (not going to update for some time, because it works).
Maybe it is just fixed for my motherboard+CPU combination. My hardware:
Ryzen 3 3200 CPU (Vega 8 integrated) on an A320M-DVS R4.0 motherboard.
microcode: CPU: patch_level=
microcode: Microcode Update Driver: v2.2.
Wayland and X11 work, with Nvidia as the second GPU.
Wayland slowed down once (to about 1-2 FPS for the whole UI) after a few hours of use, but it was fixed just by switching to system-
Integrated GPU performance still goes down (randomly, after 2-6 hours of PC use) and never comes back, but that's fine (since I have the Nvidia second GPU for complex graphics). Vega 8 performance only goes down in complex shaders: FPS drops from 60 fullscreen (1080p) to 10-20 on complex raymarching shaders, but for the system UI (Wayland/X11, GNOME 42) this is not noticeable, and video plays at 60 FPS as expected. (Sleep mode also works, not every time (because of Nvidia) but most of the time, the same as when Nvidia was the main GPU.)
|
#88 |
Log from what I described above - "fixed just by switching to system-
Logs:
Jul 17 22:54:04 home-danil kernel: amdgpu 0000:07:00.0: amdgpu: Failed to send Message 7.
Jul 17 22:54:09 home-danil kernel: amdgpu 0000:07:00.0: amdgpu: Failed to send Message 7.
Jul 17 22:54:12 home-danil kernel: ------------[ cut here ]------------
Jul 17 22:54:12 home-danil kernel: WARNING: CPU: 1 PID: 1100 at drivers/
Jul 17 22:54:12 home-danil kernel: Modules linked in: dm_crypt essiv authenc trusted asn1_encoder tee nvidia_uvm(POE) nvidia_modeset(POE) nvidia(POE) snd_seq_dummy snd_hrtimer snd_seq snd_seq_device af_packet nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_tables ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set iscsi_ibft iscsi_boot_sysfs nfnetlink rfkill ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bpfilter qrtr vboxnetadp(O) vboxnetflt(O) vboxdrv(O) dmi_sysfs joydev intel_rapl_msr intel_rapl_common snd_hda_codec_hdmi snd_hda_
Jul 17 22:54:12 home-danil kernel: libphy irqbypass snd soundcore efi_pstore i2c_piix4 gpio_amdpt gpio_generic acpi_cpufreq k10temp tiny_power_button nls_iso8859_1 squashfs nls_cp437 loop ext4 mbcache vfat jbd2 fat fuse configfs ip_tables x_tables hid_generic usbhid uas usb_storage amdgpu crct10dif_pclmul crc32_pclmul ghash_clmulni_intel drm_ttm_helper ttm iommu_v2 gpu_sched i2c_algo_bit drm_dp_helper drm_kms_helper aesni_intel crypto_simd syscopyarea sysfillrect sysimgblt fb_sys_fops cryptd drm cec xhci_pci xhci_pci_renesas sp5100_tco ccp rc_core xhci_hcd usbcore wmi video button btrfs blake2b_generic libcrc32c crc32c_intel xor raid6_pq sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua msr efivarfs
Jul 17 22:54:12 home-danil kernel: CPU: 1 PID: 1100 Comm: systemd-logind Tainted: P OE 5.18.4-1-default #1 openSUSE Tumbleweed 59778fa2462c9ee
Jul 17 22:54:12 home-danil kernel: Hardware name: To Be Filled By O.E.M. A320M-DVS R4.0/A320M-DVS R4.0, BIOS P7.10 12/23/2021
Jul 17 22:54:12 home-danil kernel: RIP: 0010:rv1_
Jul 17 22:54:12 home-danil kernel: Code: 62 01 00 e8 8f 4e f5 ff 85 c0 74 d8 83 f8 01 75 19 48 8b 7d 00 5b be 93 62 01 00 48 c7 c2 00 99 cd c0 5d 41 5c e9 6d 4e f5 ff <0f> 0b eb e3 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 81 c6 e7 03
Jul 17 22:54:12 home-danil kernel: RSP: 0018:ffff9f0a00
Jul 17 22:54:12 home-danil kernel: RAX: 00007570227d95d8 RBX: 00000000000000...
|
#89 |
This is an AMD driver problem. You can contact me and I'll give you the final solution; email <email address hidden>. Communication may be more efficient in China.
|
#90 |
[67760.805903] [drm:amdgpu_
[67760.806285] [drm:amdgpu_
[67760.806667] amdgpu 0000:0d:00.0: amdgpu: GPU reset begin!
[67761.257012] amdgpu 0000:0d:00.0: [drm:amdgpu_
[67761.257232] [drm:gfx_
[67761.307862] [drm:amdgpu_
[67761.516374] [drm:gfx_
[67761.542980] [drm] free PSP TMR buffer
[67761.587266] amdgpu 0000:0d:00.0: amdgpu: MODE1 reset
[67761.587269] amdgpu 0000:0d:00.0: amdgpu: GPU mode1 reset
[67761.587329] amdgpu 0000:0d:00.0: amdgpu: GPU smu mode1 reset
[67762.091974] amdgpu 0000:0d:00.0: amdgpu: GPU reset succeeded, trying to resume
[67762.092156] [drm] PCIE GART of 512M enabled (table at 0x0000008000300
[67762.092219] [drm] VRAM is lost due to GPU reset!
[67762.092220] [drm] PSP is resuming...
[67762.168492] [drm] reserve 0xa00000 from 0x8001000000 for PSP TMR
[67762.269801] amdgpu 0000:0d:00.0: amdgpu: RAS: optional ras ta ucode is not available
[67762.283510] amdgpu 0000:0d:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[67762.283513] amdgpu 0000:0d:00.0: amdgpu: SMU is resuming...
[67762.283516] amdgpu 0000:0d:00.0: amdgpu: smu driver if version = 0x0000000e, smu fw if version = 0x00000012, smu fw program = 0, version = 0x00413900 (65.57.0)
[67762.283519] amdgpu 0000:0d:00.0: amdgpu: SMU driver if version not matched
[67762.283549] amdgpu 0000:0d:00.0: amdgpu: use vbios provided pptable
[67762.343739] amdgpu 0000:0d:00.0: amdgpu: SMU is resumed successfully!
[67762.345104] [drm] DMUB hardware initialized: version=0x02020017
[67762.615558] [drm] kiq ring mec 2 pipe 1 q 0
[67762.618728] [drm] VCN decode and encode initialized successfully(under DPG Mode).
[67762.618910] [drm] JPEG decode initialized successfully.
[67762.618918] amdgpu 0000:0d:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[67762.618921] amdgpu 0000:0d:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[67762.618922] amdgpu 0000:0d:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[67762.618924] amdgpu 0000:0d:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[67762.618925] amdgpu 0000:0d:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[67762.618926] amdgpu 0000:0d:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[67762.618927] amdgpu 0000:0d:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[67762.618929] amdgpu 0000:0d:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[67762.618930] amdgpu 0000:0d:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[67762.618931] amdgpu 0000:0d:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
[67762.618933] amdgpu 0000:0d:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[67762.618934] amdgpu 0000:0d:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
[67762.618936] amd...
|
#91 |
Created attachment 304307
Started testing kernel 6.4-rc3 and got the same problem.
|
#92 |
Is it worth the effort of bisecting this, given that it seems to affect a lot of kernel versions?
Thanks
|
#93 |
Status = NEW after nearly 5 years?
I have the same problem
Aug 15 14:18:19 nb-tz kernel: [drm:amdgpu_
Aug 15 14:18:19 nb-tz kernel: [drm:amdgpu_
|
#94 |
AMD Vega 64 (vega10 chip)
kernel: 6.4.9
linux-firmware: 20230724
# graphical session died and had to log in again, computer didn't boot though...
aug 20 02:11:06 Zen kernel: [drm:amdgpu_
aug 20 02:11:06 Zen kernel: [drm:amdgpu_
linux-firmware: 20230810 (upgraded it... although there were no "vega10" changes in between)
# just freeze for like 30s and then it got unstuck again.
aug 23 23:09:24 Zen kernel: [drm:amdgpu_
aug 23 23:09:34 Zen kernel: [drm:amdgpu_
aug 23 23:09:44 Zen kernel: [drm:amdgpu_
|
#95 |
AMD Ryzen 3700U APU (Vega 10)
This issue has recently started happening, mostly when firing up games or graphically intensive tasks. One case of lockup during normal desktop use.
Worked fine on 6.4.X series (currently running on 6.4.12). However, all kernels in the 6.5 series cause the following:
[ 112.727138] [drm:amdgpu_
[ 112.728214] [drm:amdgpu_
[ 112.729270] amdgpu 0000:04:00.0: amdgpu: GPU reset begin!
[ 112.885652] amdgpu 0000:04:00.0: amdgpu: MODE2 reset
[ 112.885709] amdgpu 0000:04:00.0: amdgpu: GPU reset succeeded, trying to resume
[ 112.886024] [drm] PCIE GART of 1024M enabled.
[ 112.886027] [drm] PTB located at 0x000000F400A00000
[ 112.886143] [drm] PSP is resuming...
[ 112.906168] [drm] reserve 0x400000 from 0xf47fc00000 for PSP TMR
[ 112.985033] amdgpu 0000:04:00.0: amdgpu: RAS: optional ras ta ucode is not available
[ 112.992320] amdgpu 0000:04:00.0: amdgpu: RAP: optional rap ta ucode is not available
[ 113.733685] [drm] kiq ring mec 2 pipe 1 q 0
[ 113.998619] amdgpu 0000:04:00.0: [drm:amdgpu_
[ 113.999249] [drm:amdgpu_
[ 113.999957] amdgpu 0000:04:00.0: amdgpu: GPU reset(2) failed
[ 114.000006] amdgpu 0000:04:00.0: amdgpu: GPU reset end with ret = -110
[ 114.000010] [drm:amdgpu_
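To get a quick overview of how the reset attempts in a log like this ended, the messages can be tallied. This is only a sketch keyed to the message strings visible above ("GPU reset begin!", "GPU reset succeeded, trying to resume", "GPU reset(2) failed", "GPU reset end with ret = ..."); the function name `summarize_gpu_resets` is invented for illustration and the log is read from stdin.

```shell
# Summarise amdgpu reset attempts found in a kernel log on stdin.
# asic_reset_ok counts "reset succeeded, trying to resume" lines; as the
# log above shows, a later resume step can still fail after that message.
summarize_gpu_resets() {
    awk '
        /amdgpu: GPU reset begin!/                      { begun++ }
        /amdgpu: GPU reset succeeded, trying to resume/ { ok++ }
        /amdgpu: GPU reset\([0-9]+\) failed/            { failed++ }
        /amdgpu: GPU reset end with ret = /             { last_ret = $NF }
        END {
            if (last_ret == "") last_ret = "n/a"
            printf "begun=%d asic_reset_ok=%d failed=%d last_ret=%s\n",
                   begun, ok, failed, last_ret
        }
    '
}
```

It could be run as `dmesg | summarize_gpu_resets`. On Linux, a return value of -110 is -ETIMEDOUT, which fits a resume step that timed out.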
|
#96 |
I can confirm this bug
Experiencing it on an AMD Ryzen 5 3500U (Vega 8), Fedora 39 beta, kernel 6.5.2.
Also on Arch (kernel 6.5.2).
No problems on Fedora 38 (kernel 6.2.x).
In my case it happens frequently with normal desktop use on Fedora and Arch.
Sep 23 03:39:34 jackdaw kernel: [drm:amdgpu_
Sep 23 03:39:34 jackdaw kernel: [drm:amdgpu_
Sep 23 03:39:34 jackdaw kernel: amdgpu 0000:05:00.0: amdgpu: GPU reset begin!
Sep 23 03:39:34 jackdaw kernel: amdgpu 0000:05:00.0: amdgpu: MODE2 reset
Sep 23 03:39:34 jackdaw kernel: amdgpu 0000:05:00.0: amdgpu: GPU reset succeeded, trying to resume
Sep 23 03:39:34 jackdaw kernel: [drm] PCIE GART of 1024M enabled.
Sep 23 03:39:34 jackdaw kernel: [drm] PTB located at 0x000000F400A00000
Sep 23 03:39:34 jackdaw kernel: [drm] PSP is resuming...
Sep 23 03:39:34 jackdaw kernel: [drm] reserve 0x400000 from 0xf47fc00000 for PSP TMR
Sep 23 03:39:34 jackdaw kernel: amdgpu 0000:05:00.0: amdgpu: RAS: optional ras ta ucode is not available
Sep 23 03:39:34 jackdaw kernel: amdgpu 0000:05:00.0: amdgpu: RAP: optional rap ta ucode is not available
Sep 23 03:39:34 jackdaw kernel: [drm] kiq ring mec 2 pipe 1 q 0
Sep 23 03:39:35 jackdaw kernel: amdgpu 0000:05:00.0: [drm:amdgpu_
Sep 23 03:39:35 jackdaw kernel: [drm:amdgpu_
Sep 23 03:39:35 jackdaw kernel: amdgpu 0000:05:00.0: amdgpu: GPU reset(2) failed
Sep 23 03:39:35 jackdaw kernel: amdgpu 0000:05:00.0: amdgpu: GPU reset end with ret = -110
Sep 23 03:39:35 jackdaw kernel: [drm:amdgpu_
Sep 23 03:39:35 jackdaw kernel: [drm] Skip scheduling IBs!
Sep 23 03:39:45 jackdaw kernel: [drm:amdgpu_
Sep 23 03:39:45 jackdaw kernel: [drm:amdgpu_
Sep 23 03:39:45 jackdaw kernel: amdgpu 0000:05:00.0: amdgpu: GPU reset begin!
|
#97 |
AMDGPU development is on its own bug tracker:
https:/
If you're still affected, check for existing bug reports and if there are none, please repost over there.
|
#98 |
I have also been having this issue. It started occurring recently (last 2-3 months). No other changes.
Mostly lockups while gaming (yuzu), one lockup because of chrome.
I was able to fix this issue by switching from HDMI to DP or DVI.
|
#99 |
Created attachment 305165
attachment-
In my case the fix was adding amdgpu.mcbp=0 to the kernel parameters.
On Sat, Sep 30, 2023 at 8:57 PM <email address hidden> wrote:
> https:/
>
> <email address hidden> changed:
>
> What |Removed |Added
>
> -------
> CC| |<email address hidden>
>
> --- Comment #93 from <email address hidden> ---
> I have also been having this issue. It started occurring recently (last 2-3
> months). No other changes.
>
> Mostly lockups while gaming (yuzu), one lockup because of chrome.
>
> I was able to fix this issue by switching from HDMI to DP or DVI.
>
|
#100 |
(In reply to KC from comment #94)
Did you have it set to 1 previously? If not, I'm not sure if that was the silver bullet, because it looks like it defaults to 0. https:/
mcbp (int)
It is used to enable mid command buffer preemption. (0 = disabled (default), 1 = enabled)
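For anyone unsure whether the parameter is actually set at boot, here is a minimal sketch that looks for `amdgpu.mcbp` on a kernel command line string. The helper names and the sample command line are made up for illustration, and treating the last occurrence as the effective one is an assumption. Only the 0 and 1 meanings come from the documentation quoted above.

```python
from typing import Optional


def mcbp_from_cmdline(cmdline: str) -> Optional[int]:
    """Return the last amdgpu.mcbp value found on a kernel command line, or None."""
    value = None
    for token in cmdline.split():
        key, sep, raw = token.partition("=")
        if key == "amdgpu.mcbp" and sep:
            try:
                value = int(raw)  # assumption: a later occurrence overrides an earlier one
            except ValueError:
                continue  # ignore malformed values such as "amdgpu.mcbp=abc"
    return value


def describe(value: Optional[int]) -> str:
    """Map a parsed value to the meaning quoted in the thread (0 and 1 only)."""
    if value is None:
        return "not set on the command line (driver default applies)"
    return {0: "disabled", 1: "enabled"}.get(value, f"other value ({value})")


# Hypothetical command line; on a running system the string could be read from /proc/cmdline.
sample = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 quiet amdgpu.mcbp=0"
print("amdgpu.mcbp:", describe(mcbp_from_cmdline(sample)))
```

This only shows what was requested at boot; it does not prove the driver honoured the value.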
|
#101 |
Created attachment 305166
attachment-
The default is now -1.
https:/
https:/
I set it to zero and I haven't had a single crash since (Fedora 39 beta,
Linux 6.5.5).
That one change of default had made my system entirely unusable (it would crash
very quickly after booting).
On Sat, Sep 30, 2023 at 9:35 PM <email address hidden> wrote:
> https:/
>
> --- Comment #95 from <email address hidden> ---
> (In reply to KC from comment #94)
>
> Did you have it set to 1 previously? If not, I'm not sure if that was the
> silver bullet, because it looks like it defaults to 0.
> https:/
>
> mcbp (int)
>
> It is used to enable mid command buffer preemption. (0 = disabled
> (default), 1
> = enabled)
Pirouette Cacahuète (lissyx) wrote : | #1 |
- AlsaInfo.txt (91.4 KiB, text/plain; charset="utf-8")
- AudioDevicesInUse.txt (669 bytes, text/plain; charset="utf-8")
- CRDA.txt (5.8 KiB, text/plain; charset="utf-8")
- CurrentDmesg.txt (156.1 KiB, text/plain; charset="utf-8")
- Dependencies.txt (3.3 KiB, text/plain; charset="utf-8")
- IwConfig.txt (733 bytes, text/plain; charset="utf-8")
- Lspci.txt (84.9 KiB, text/plain; charset="utf-8")
- Lspci-vt.txt (2.6 KiB, text/plain; charset="utf-8")
- Lsusb.txt (1.5 KiB, text/plain; charset="utf-8")
- Lsusb-t.txt (3.0 KiB, text/plain; charset="utf-8")
- Lsusb-v.txt (143.6 KiB, text/plain; charset="utf-8")
- ProcCpuinfo.txt (24.6 KiB, text/plain; charset="utf-8")
- ProcCpuinfoMinimal.txt (1.5 KiB, text/plain; charset="utf-8")
- ProcInterrupts.txt (23.2 KiB, text/plain; charset="utf-8")
- ProcModules.txt (11.0 KiB, text/plain; charset="utf-8")
- RfKill.txt (250 bytes, text/plain; charset="utf-8")
- UdevDb.txt (454.8 KiB, text/plain; charset="utf-8")
- WifiSyslog.txt (230.4 KiB, text/plain; charset="utf-8")
- acpidump.txt (1.0 MiB, text/plain; charset="utf-8")
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed | #2 |
This change was made by a bot.
Changed in linux (Ubuntu): | |
status: | New → Confirmed |
Pirouette Cacahuète (lissyx) wrote : | #3 |
Erich Eickmeyer (eeickmeyer) wrote : | #4 |
Working with Pirouette on IRC, we determined this may be related to https:/
They also found mentions of https:/
Mario Limonciello (superm1) wrote : | #102 |
6.5.6 has the fix for the preemption issue; it should be fixed once the stable updates land in Mantic.
Pirouette Cacahuète (lissyx) wrote : | #103 |
Thanks, I'll try and keep you updated, however I am also facing bug 2039958 (probably a dupe of bug 2034619), so I might still need GNOME 45.1 to be released.
|
#104 |
Hello, I'm having this same issue with my ThinkPad Z16 laptop (Ryzen 6850H and Radeon RX 6500M graphics card).
I do not use the laptop for gaming but for audio and video editing. I have not had trouble with any video editing software, but I can easily reproduce the issue by loading Ardour or Mixbus32C and either leaving it idle or working in it. After 15 minutes the screen freezes, although audio continues for a time. At this point Ardour or Mixbus closes and I can continue using the machine. If I load either program again it fails again, usually within a couple of minutes, and the whole laptop freezes until I press Ctrl-Alt-F2 to get to a terminal prompt.
The issue always happens when I'm recording audio with an HDMI device attached, and 90% of the time without HDMI.
I will attempt to set the kernel parameter amdgpu.mcbp=0 and report back.
|
#105 |
(In reply to jeremy boyd from comment #97)
> Hello, I'm having this same issue with my thinkpad z16 laptop, Ryzen 6850H
> and Radeon RX 6500M graphics card.
>
> I do not use the laptop for gaming but for audio and video editing. I have
> not had trouble with any video editing software but I can easily reproduce
> the issue by loading up Ardour or Mixbus32C and either leaving it alone or
> working. After 15 minutes the screen freezes although audio will continue
> for a time. At this point Ardour or Mixbus will close and I can continue
> using the machine. If I load up either program again it will fail again,
> usually within a couple minutes and the whole laptop will freeze up until I
> ctrl-alt-F2 to get to a terminal prompt.
>
> The issue always happens when Im recording audio with an HDMI device
> attached and 90% of the time without HDMI
>
> I will attempt to set this kernel parameter amdgpu.mcbp=0 and report back.
I can confirm that this did not solve my problem. I tested my system for several hours with no issue and thought that perhaps it had been solved, but while giving a LibreOffice presentation with my audio software running it happened again. Here is the error from journalctl:
Oct 22 09:40:01 fedora kernel: [drm:amdgpu_
Oct 22 09:40:01 fedora kernel: [drm:amdgpu_
Oct 22 09:40:01 fedora kernel: amdgpu 0000:67:00.0: amdgpu: GPU reset begin!
Oct 22 09:40:02 fedora kernel: amdgpu 0000:67:00.0: amdgpu: MODE2 reset
Oct 22 09:40:02 fedora kernel: amdgpu 0000:67:00.0: amdgpu: GPU reset succeeded, trying to resume
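To compare how often resets begin, succeed, or fail across a long journal, a small tally can help. The sketch below matches only the three message shapes pasted in this thread (`GPU reset begin!`, `GPU reset succeeded`, `GPU reset(N) failed`). The function name and pattern table are illustrative, and other kernel versions may word these messages differently.

```python
import re
from collections import Counter
from typing import Iterable

# Message shapes copied from the log excerpts in this thread; not an exhaustive list.
RESET_PATTERNS = {
    "begin": re.compile(r"amdgpu: GPU reset begin!"),
    "succeeded": re.compile(r"amdgpu: GPU reset succeeded"),
    "failed": re.compile(r"amdgpu: GPU reset\(\d+\) failed"),
}


def count_reset_events(lines: Iterable[str]) -> Counter:
    """Count reset begin/succeeded/failed messages in an iterable of log lines."""
    counts: Counter = Counter()
    for line in lines:
        for name, pattern in RESET_PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


sample = [
    "Oct 22 09:40:01 fedora kernel: amdgpu 0000:67:00.0: amdgpu: GPU reset begin!",
    "Oct 22 09:40:02 fedora kernel: amdgpu 0000:67:00.0: amdgpu: MODE2 reset",
    "Oct 22 09:40:02 fedora kernel: amdgpu 0000:67:00.0: amdgpu: GPU reset succeeded, trying to resume",
    "Sep 23 03:39:35 jackdaw kernel: amdgpu 0000:05:00.0: amdgpu: GPU reset(2) failed",
]
print(dict(count_reset_events(sample)))
```

The same function could be fed the lines of `journalctl -k` output read from standard input instead of the sample list.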
|
#106 |
#98
amdgpu.mcbp=0 will only help GFX9 products. For GFX10 this is a different problem; please open an issue at the AMD GitLab.
Launchpad Janitor (janitor) wrote : | #107 |
Status changed to 'Confirmed' because the bug affects multiple users.
Changed in mesa (Ubuntu): | |
status: | New → Confirmed |
Pirouette Cacahuète (lissyx) wrote : | #108 |
There's a 6.5.0-15 package incoming in mantic-updates; does it contain the fix?
Timo Aaltonen (tjaalton) wrote : | #109 |
No, -17 does.
|
#110 |
I am pretty sure I have amdgpu.mcbp=0 set, and after moving to Ubuntu 24.04 LTS, doing just about anything crashes the GPU.
Opening a web browser causes a crash, and then I have to SSH in and restart the desktop session.
GL_VENDOR: AMD
GL_RENDERER: AMD Radeon RX 6800 XT (radeonsi, navi21, LLVM 15.0.7, DRM 3.57, 6.8.0-31-generic)
GL_VERSION: 4.6 (Compatibility Profile) Mesa 24.2~git2406010
6.8.0-31-generic
[ 26.417827] [drm] amdgpu kernel modesetting enabled.
[ 26.431708] amdgpu: Virtual CRAT table created for CPU
[ 26.431727] amdgpu: Topology: Add CPU node
[ 26.431934] [drm] initializing kernel modesetting (SIENNA_CICHLID 0x1002:0x73BF 0x1043:0x04F0 0xC1).
[ 26.431949] [drm] register mmio base: 0xFC900000
[ 26.431951] [drm] register mmio size: 1048576
[ 26.435975] [drm] add ip block number 0 <nv_common>
[ 26.435978] [drm] add ip block number 1 <gmc_v10_0>
[ 26.435980] [drm] add ip block number 2 <navi10_ih>
[ 26.435982] [drm] add ip block number 3 <psp>
[ 26.435983] [drm] add ip block number 4 <smu>
[ 26.435985] [drm] add ip block number 5 <dm>
[ 26.435986] [drm] add ip block number 6 <gfx_v10_0>
[ 26.435988] [drm] add ip block number 7 <sdma_v5_2>
[ 26.435990] [drm] add ip block number 8 <vcn_v3_0>
[ 26.435996] [drm] add ip block number 9 <jpeg_v3_0>
[ 26.436013] amdgpu 0000:0e:00.0: No more image in the PCI ROM
[ 26.436028] amdgpu 0000:0e:00.0: amdgpu: Fetched VBIOS from ROM BAR
[ 26.436031] amdgpu: ATOM BIOS: 115-D412BS0-101
[ 26.473962] [drm] VCN(0) decode is enabled in VM mode
[ 26.473965] [drm] VCN(1) decode is enabled in VM mode
[ 26.473967] [drm] VCN(0) encode is enabled in VM mode
[ 26.473968] [drm] VCN(1) encode is enabled in VM mode
[ 26.477565] [drm] JPEG decode is enabled in VM mode
[ 26.477596] amdgpu 0000:0e:00.0: vgaarb: deactivate vga console
[ 26.478479] Console: switching to colour dummy device 80x25
[ 26.478490] amdgpu 0000:0e:00.0: amdgpu: Trusted Memory Zone (TMZ) feature disabled as experimental (default)
[ 26.478548] amdgpu 0000:0e:00.0: amdgpu: MEM ECC is not presented.
[ 26.478550] amdgpu 0000:0e:00.0: amdgpu: SRAM ECC is not presented.
[ 26.478570] [drm] vm size is 262144 GB, 4 levels, block size is 9-bit, fragment size is 9-bit
[ 26.478577] amdgpu 0000:0e:00.0: amdgpu: VRAM: 16368M 0x0000008000000000 - 0x00000083FEFFFFFF (16368M used)
[ 26.478580] amdgpu 0000:0e:00.0: amdgpu: GART: 512M 0x0000000000000000 - 0x000000001FFFFFFF
[ 26.478588] [drm] Detected VRAM RAM=16368M, BAR=256M
[ 26.478589] [drm] RAM width 256bits GDDR6
[ 26.478734] [drm] amdgpu: 16368M of VRAM memory ready
[ 26.478739] [drm] amdgpu: 64363M of GTT memory ready.
[ 26.478768] [drm] GART: num cpu pages 131072, num gpu pages 131072
[ 26.478919] [drm] PCIE GART of 512M enabled (table at 0x0000008000900
[ 27.968739] amdgpu 0000:0e:00.0: amdgpu: STB initialized to 2048 entries
[ 27.969354] [drm] Loading DMUB firmware via PSP: version=0x02020020
[ 27.969777] [drm] use_doorbell being set to: [true]
[ 27.969791] [drm] use_doorbell being set to: [true]
[ 27.969803] [drm] use_doorbell being set to: [true]
[ ...
|
#111 |
#100:
You have a GFX10 product, this is not affected by amdgpu.mcbp=0/1. That's only for GFX9. Please open your own issue for it. Also in the kernel bug tracker please only report issues with mainline kernels. 6.8 is already EoL.
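Since the advice in this thread depends on the GFX generation, here is a rough sketch for reading it from the `add ip block number N <gfx_vX_Y>` lines that amdgpu prints at load time, as in the dmesg paste above. The helper names are made up, and the "GFX9 only" rule is simply the statement from this thread encoded as-is, not something taken from the driver source.

```python
import re
from typing import Iterable, Optional

# Matches lines such as "[drm] add ip block number 6 <gfx_v10_0>".
GFX_BLOCK = re.compile(r"add ip block number \d+ <gfx_v(\d+)_[\d_]+>")


def gfx_major(lines: Iterable[str]) -> Optional[int]:
    """Return the GFX major version from the first matching dmesg line, or None."""
    for line in lines:
        match = GFX_BLOCK.search(line)
        if match:
            return int(match.group(1))
    return None


def mcbp_workaround_applies(major: Optional[int]) -> Optional[bool]:
    """Encode the thread's claim that amdgpu.mcbp=0 only helps GFX9 (None = unknown)."""
    if major is None:
        return None
    return major == 9


sample = [
    "[ 26.435985] [drm] add ip block number 5 <dm>",
    "[ 26.435986] [drm] add ip block number 6 <gfx_v10_0>",
]
major = gfx_major(sample)
print("GFX major:", major, "| mcbp workaround relevant:", mcbp_workaround_applies(major))
```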
|
#112 |
The issue seems to occur only with Xorg; I used Wayland today and could not trigger it.
|
#113 |
and 6.9.3 also crashed
Error message: PROTECTION_FAULT_ADDR 0x00000000 PROTECTION_FAULT_STATUS 0x0604800C job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=37241, emitted seq=37244
[Dec 5 22:08] amdgpu 0000:23:00.0: GPU fault detected: 146 0x0000480c for process yuzu pid 2920 thread yuzu:cs0 pid 2935
[ +0.000005] amdgpu 0000:23:00.0: VM_CONTEXT1_
[ +0.000002] amdgpu 0000:23:00.0: VM_CONTEXT1_
[ +0.000003] amdgpu 0000:23:00.0: VM fault (0x0c, vmid 3, pasid 32770) at page 0, read from 'TC4' (0x54433400) (72)
[ +10.053011] [drm:amdgpu_
[ +0.000007] [drm] GPU recovery disabled.
How to reproduce the issue:
1. Playing with yuzu-emulator
2. Load Super Mario Odyssey
3. Start new game
4. When Mario is about to jump for the first time after being woken up by Cappy, the bug reliably occurs.
During the issue, the following occurred:
1. Graphics locked up.
2. The system can still be accessed through SSH.
System specification:
Debian Sid
Radeon RX 580
I have tried the following combinations:
1. Kernel 4.17, 4.18, 4.19, 4.20, drm-next-4.21.wip
2. Mesa 18.2, 18.3, 19.0-development branch
But none of the above combinations fixes the issue. Let me know if you need more information or more testing from me.
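When comparing several reports, it can be handy to pull the numbers out of the `ring gfx timeout, signaled seq=..., emitted seq=...` message quoted in the error line of this comment. The sketch below is illustrative only: the names are made up, and reading "emitted minus signaled" as the number of submissions still outstanding is my assumption, not something confirmed by the driver documentation.

```python
import re
from typing import NamedTuple, Optional

RING_TIMEOUT = re.compile(
    r"ring (?P<ring>\S+) timeout, signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


class RingTimeout(NamedTuple):
    ring: str
    signaled: int
    emitted: int

    @property
    def pending(self) -> int:
        # Assumption: the difference is the count of submissions not yet signaled.
        return self.emitted - self.signaled


def parse_ring_timeout(line: str) -> Optional[RingTimeout]:
    """Extract ring name and sequence numbers from a ring timeout message, if present."""
    match = RING_TIMEOUT.search(line)
    if match is None:
        return None
    return RingTimeout(match["ring"], int(match["signaled"]), int(match["emitted"]))


sample = "*ERROR* ring gfx timeout, signaled seq=37241, emitted seq=37244"
info = parse_ring_timeout(sample)
if info is not None:
    print(info.ring, info.signaled, info.emitted, info.pending)
```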