amdgpu: [gfxhub0] no-retry page fault

Bug #2037641 reported by Martin Vysny
This bug affects 1 person
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| linux (Ubuntu) | Invalid | Undecided | Unassigned | |
| mesa (Ubuntu) | Triaged | Undecided | Unassigned | |

Bug Description

Whenever I use IntelliJ IDEA, the screen locks up after a couple of minutes and the machine becomes practically unusable. I can still SSH into it, but the UI is gone and doesn't respond to the Ctrl+Alt+F* keys.

Setup: the machine is connected to an external monitor via USB-C and runs kernel 6.5 on Ubuntu 23.10.

kern.log shows the following:

```
2023-09-28T13:58:23.077679+03:00 mavi-ThinkPad-T14s kernel: [ 422.206433] amdgpu 0000:06:00.0: amdgpu: [gfxhub0] no-retry page fault (src_id:0 ring:24 vmid:6 pasid:32773, for process Xwayland pid 7640 thread Xwayland:cs0 pid 7663)
2023-09-28T13:58:23.077694+03:00 mavi-ThinkPad-T14s kernel: [ 422.206450] amdgpu 0000:06:00.0: amdgpu: in page starting at address 0x0000b15852923000 from IH client 0x1b (UTCL2)
2023-09-28T13:58:23.077695+03:00 mavi-ThinkPad-T14s kernel: [ 422.206460] amdgpu 0000:06:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00600431
2023-09-28T13:58:23.077696+03:00 mavi-ThinkPad-T14s kernel: [ 422.206466] amdgpu 0000:06:00.0: amdgpu: Faulty UTCL2 client ID: IA (0x2)
2023-09-28T13:58:23.077698+03:00 mavi-ThinkPad-T14s kernel: [ 422.206471] amdgpu 0000:06:00.0: amdgpu: MORE_FAULTS: 0x1
2023-09-28T13:58:23.077699+03:00 mavi-ThinkPad-T14s kernel: [ 422.206476] amdgpu 0000:06:00.0: amdgpu: WALKER_ERROR: 0x0
2023-09-28T13:58:23.077699+03:00 mavi-ThinkPad-T14s kernel: [ 422.206481] amdgpu 0000:06:00.0: amdgpu: PERMISSION_FAULTS: 0x3
2023-09-28T13:58:23.077700+03:00 mavi-ThinkPad-T14s kernel: [ 422.206485] amdgpu 0000:06:00.0: amdgpu: MAPPING_ERROR: 0x0
2023-09-28T13:58:23.077701+03:00 mavi-ThinkPad-T14s kernel: [ 422.206490] amdgpu 0000:06:00.0: amdgpu: RW: 0x0
2023-09-28T13:58:23.077701+03:00 mavi-ThinkPad-T14s kernel: [ 422.206497] amdgpu 0000:06:00.0: amdgpu: [gfxhub0] no-retry page fault (src_id:0 ring:24 vmid:6 pasid:32773, for process Xwayland pid 7640 thread Xwayland:cs0 pid 7663)
2023-09-28T13:58:23.077702+03:00 mavi-ThinkPad-T14s kernel: [ 422.206506] amdgpu 0000:06:00.0: amdgpu: in page starting at address 0x0000b07248096000 from IH client 0x1b (UTCL2)
2023-09-28T13:58:23.077703+03:00 mavi-ThinkPad-T14s kernel: [ 422.206514] amdgpu 0000:06:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00600431
2023-09-28T13:58:23.077704+03:00 mavi-ThinkPad-T14s kernel: [ 422.206519] amdgpu 0000:06:00.0: amdgpu: Faulty UTCL2 client ID: IA (0x2)
2023-09-28T13:58:23.077704+03:00 mavi-ThinkPad-T14s kernel: [ 422.206524] amdgpu 0000:06:00.0: amdgpu: MORE_FAULTS: 0x1
2023-09-28T13:58:23.077705+03:00 mavi-ThinkPad-T14s kernel: [ 422.206528] amdgpu 0000:06:00.0: amdgpu: WALKER_ERROR: 0x0
2023-09-28T13:58:23.077705+03:00 mavi-ThinkPad-T14s kernel: [ 422.206533] amdgpu 0000:06:00.0: amdgpu: PERMISSION_FAULTS: 0x3
2023-09-28T13:58:23.077706+03:00 mavi-ThinkPad-T14s kernel: [ 422.206538] amdgpu 0000:06:00.0: amdgpu: MAPPING_ERROR: 0x0
2023-09-28T13:58:23.077707+03:00 mavi-ThinkPad-T14s kernel: [ 422.206542] amdgpu 0000:06:00.0: amdgpu: RW: 0x0
2023-09-28T13:58:23.077708+03:00 mavi-ThinkPad-T14s kernel: [ 422.206549] amdgpu 0000:06:00.0: amdgpu: [gfxhub0] no-retry page fault (src_id:0 ring:24 vmid:6 pasid:32773, for process Xwayland pid 7640 thread Xwayland:cs0 pid 7663)
2023-09-28T13:58:23.077709+03:00 mavi-ThinkPad-T14s kernel: [ 422.206556] amdgpu 0000:06:00.0: amdgpu: in page starting at address 0x0000af8c3d808000 from IH client 0x1b (UTCL2)
2023-09-28T13:58:23.077710+03:00 mavi-ThinkPad-T14s kernel: [ 422.206564] amdgpu 0000:06:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00600431
2023-09-28T13:58:23.077710+03:00 mavi-ThinkPad-T14s kernel: [ 422.206569] amdgpu 0000:06:00.0: amdgpu: Faulty UTCL2 client ID: IA (0x2)
2023-09-28T13:58:23.077711+03:00 mavi-ThinkPad-T14s kernel: [ 422.206574] amdgpu 0000:06:00.0: amdgpu: MORE_FAULTS: 0x1
2023-09-28T13:58:23.077712+03:00 mavi-ThinkPad-T14s kernel: [ 422.206578] amdgpu 0000:06:00.0: amdgpu: WALKER_ERROR: 0x0
2023-09-28T13:58:23.077712+03:00 mavi-ThinkPad-T14s kernel: [ 422.206583] amdgpu 0000:06:00.0: amdgpu: PERMISSION_FAULTS: 0x3
2023-09-28T13:58:23.077713+03:00 mavi-ThinkPad-T14s kernel: [ 422.206588] amdgpu 0000:06:00.0: amdgpu: MAPPING_ERROR: 0x0
2023-09-28T13:58:23.077713+03:00 mavi-ThinkPad-T14s kernel: [ 422.206592] amdgpu 0000:06:00.0: amdgpu: RW: 0x0
2023-09-28T13:58:23.077726+03:00 mavi-ThinkPad-T14s kernel: [ 422.206597] amdgpu 0000:06:00.0: amdgpu: [gfxhub0] no-retry page fault (src_id:0 ring:24 vmid:6 pasid:32773, for process Xwayland pid 7640 thread Xwayland:cs0 pid 7663)
2023-09-28T13:58:23.077727+03:00 mavi-ThinkPad-T14s kernel: [ 422.206605] amdgpu 0000:06:00.0: amdgpu: in page starting at address 0x0000aea632f7a000 from IH client 0x1b (UTCL2)
2023-09-28T13:58:23.077727+03:00 mavi-ThinkPad-T14s kernel: [ 422.206613] amdgpu 0000:06:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00600431
2023-09-28T13:58:23.077728+03:00 mavi-ThinkPad-T14s kernel: [ 422.206618] amdgpu 0000:06:00.0: amdgpu: Faulty UTCL2 client ID: IA (0x2)
2023-09-28T13:58:23.077729+03:00 mavi-ThinkPad-T14s kernel: [ 422.206622] amdgpu 0000:06:00.0: amdgpu: MORE_FAULTS: 0x1
2023-09-28T13:58:23.077729+03:00 mavi-ThinkPad-T14s kernel: [ 422.206627] amdgpu 0000:06:00.0: amdgpu: WALKER_ERROR: 0x0
2023-09-28T13:58:23.077730+03:00 mavi-ThinkPad-T14s kernel: [ 422.206632] amdgpu 0000:06:00.0: amdgpu: PERMISSION_FAULTS: 0x3
2023-09-28T13:58:23.077730+03:00 mavi-ThinkPad-T14s kernel: [ 422.206636] amdgpu 0000:06:00.0: amdgpu: MAPPING_ERROR: 0x0
2023-09-28T13:58:23.077731+03:00 mavi-ThinkPad-T14s kernel: [ 422.206641] amdgpu 0000:06:00.0: amdgpu: RW: 0x0
2023-09-28T13:58:23.077732+03:00 mavi-ThinkPad-T14s kernel: [ 422.206646] amdgpu 0000:06:00.0: amdgpu: [gfxhub0] no-retry page fault (src_id:0 ring:24 vmid:6 pasid:32773, for process Xwayland pid 7640 thread Xwayland:cs0 pid 7663)
2023-09-28T13:58:23.077733+03:00 mavi-ThinkPad-T14s kernel: [ 422.206654] amdgpu 0000:06:00.0: amdgpu: in page starting at address 0x0000adc0286ed000 from IH client 0x1b (UTCL2)
2023-09-28T13:58:23.077733+03:00 mavi-ThinkPad-T14s kernel: [ 422.206661] amdgpu 0000:06:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00600431
2023-09-28T13:58:23.077734+03:00 mavi-ThinkPad-T14s kernel: [ 422.206666] amdgpu 0000:06:00.0: amdgpu: Faulty UTCL2 client ID: IA (0x2)
2023-09-28T13:58:23.077735+03:00 mavi-ThinkPad-T14s kernel: [ 422.206671] amdgpu 0000:06:00.0: amdgpu: MORE_FAULTS: 0x1
2023-09-28T13:58:23.077736+03:00 mavi-ThinkPad-T14s kernel: [ 422.206675] amdgpu 0000:06:00.0: amdgpu: WALKER_ERROR: 0x0
2023-09-28T13:58:23.077736+03:00 mavi-ThinkPad-T14s kernel: [ 422.206680] amdgpu 0000:06:00.0: amdgpu: PERMISSION_FAULTS: 0x3
2023-09-28T13:58:23.077737+03:00 mavi-ThinkPad-T14s kernel: [ 422.206685] amdgpu 0000:06:00.0: amdgpu: MAPPING_ERROR: 0x0
2023-09-28T13:58:23.077738+03:00 mavi-ThinkPad-T14s kernel: [ 422.206689] amdgpu 0000:06:00.0: amdgpu: RW: 0x0
2023-09-28T13:58:23.077738+03:00 mavi-ThinkPad-T14s kernel: [ 422.206694] amdgpu 0000:06:00.0: amdgpu: [gfxhub0] no-retry page fault (src_id:0 ring:24 vmid:6 pasid:32773, for process Xwayland pid 7640 thread Xwayland:cs0
2023-09-28T13:58:33.168119+03:00 mavi-ThinkPad-T14s kernel: [ 432.295502] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_low timeout, signaled seq=25993, emitted seq=25995
2023-09-28T13:58:33.168136+03:00 mavi-ThinkPad-T14s kernel: [ 432.296378] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xwayland pid 7640 thread Xwayland:cs0 pid 7663
2023-09-28T13:58:33.168137+03:00 mavi-ThinkPad-T14s kernel: [ 432.297223] amdgpu 0000:06:00.0: amdgpu: GPU reset begin!
2023-09-28T13:58:33.396161+03:00 mavi-ThinkPad-T14s kernel: [ 432.524999] [drm] psp gfx command UNLOAD_TA(0x2) failed and response status is (0x117)
2023-09-28T13:58:33.423859+03:00 mavi-ThinkPad-T14s kernel: [ 432.551234] amdgpu 0000:06:00.0: amdgpu: MODE2 reset
2023-09-28T13:58:33.423877+03:00 mavi-ThinkPad-T14s kernel: [ 432.551951] amdgpu 0000:06:00.0: amdgpu: GPU reset succeeded, trying to resume
2023-09-28T13:58:33.423880+03:00 mavi-ThinkPad-T14s kernel: [ 432.552149] [drm] PCIE GART of 1024M enabled.
2023-09-28T13:58:33.423881+03:00 mavi-ThinkPad-T14s kernel: [ 432.552150] [drm] PTB located at 0x000000F43FC00000
2023-09-28T13:58:33.423882+03:00 mavi-ThinkPad-T14s kernel: [ 432.552213] [drm] VRAM is lost due to GPU reset!
2023-09-28T13:58:33.423883+03:00 mavi-ThinkPad-T14s kernel: [ 432.552214] [drm] PSP is resuming...
2023-09-28T13:58:34.127837+03:00 mavi-ThinkPad-T14s kernel: [ 433.253828] [drm] reserve 0x400000 from 0xf43f800000 for PSP TMR
2023-09-28T13:58:34.412186+03:00 mavi-ThinkPad-T14s kernel: [ 433.538307] amdgpu 0000:06:00.0: amdgpu: RAS: optional ras ta ucode is not available
2023-09-28T13:58:34.423846+03:00 mavi-ThinkPad-T14s kernel: [ 433.549684] amdgpu 0000:06:00.0: amdgpu: RAP: optional rap ta ucode is not available
2023-09-28T13:58:34.423858+03:00 mavi-ThinkPad-T14s kernel: [ 433.549691] amdgpu 0000:06:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
2023-09-28T13:58:34.423860+03:00 mavi-ThinkPad-T14s kernel: [ 433.549699] amdgpu 0000:06:00.0: amdgpu: SMU is resuming...
2023-09-28T13:58:34.423861+03:00 mavi-ThinkPad-T14s kernel: [ 433.549976] amdgpu 0000:06:00.0: amdgpu: SMU is resumed successfully!
2023-09-28T13:58:34.423863+03:00 mavi-ThinkPad-T14s kernel: [ 433.550542] [drm] DMUB hardware initialized: version=0x01010027
2023-09-28T13:58:34.819829+03:00 mavi-ThinkPad-T14s kernel: [ 433.948908] [drm] kiq ring mec 2 pipe 1 q 0
2023-09-28T13:58:34.823848+03:00 mavi-ThinkPad-T14s kernel: [ 433.951815] [drm] VCN decode and encode initialized successfully(under DPG Mode).
2023-09-28T13:58:34.823857+03:00 mavi-ThinkPad-T14s kernel: [ 433.951987] [drm] JPEG decode initialized successfully.
2023-09-28T13:58:34.823858+03:00 mavi-ThinkPad-T14s kernel: [ 433.951992] amdgpu 0000:06:00.0: amdgpu: ring gfx uses VM inv eng 0 on hub 0
2023-09-28T13:58:34.823860+03:00 mavi-ThinkPad-T14s kernel: [ 433.951995] amdgpu 0000:06:00.0: amdgpu: ring gfx_low uses VM inv eng 1 on hub 0
2023-09-28T13:58:34.823862+03:00 mavi-ThinkPad-T14s kernel: [ 433.951997] amdgpu 0000:06:00.0: amdgpu: ring gfx_high uses VM inv eng 4 on hub 0
2023-09-28T13:58:34.823863+03:00 mavi-ThinkPad-T14s kernel: [ 433.951999] amdgpu 0000:06:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 5 on hub 0
2023-09-28T13:58:34.823864+03:00 mavi-ThinkPad-T14s kernel: [ 433.952001] amdgpu 0000:06:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 6 on hub 0
2023-09-28T13:58:34.823865+03:00 mavi-ThinkPad-T14s kernel: [ 433.952003] amdgpu 0000:06:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 7 on hub 0
2023-09-28T13:58:34.823866+03:00 mavi-ThinkPad-T14s kernel: [ 433.952005] amdgpu 0000:06:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 8 on hub 0
2023-09-28T13:58:34.823867+03:00 mavi-ThinkPad-T14s kernel: [ 433.952007] amdgpu 0000:06:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 9 on hub 0
2023-09-28T13:58:34.823868+03:00 mavi-ThinkPad-T14s kernel: [ 433.952009] amdgpu 0000:06:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 10 on hub 0
2023-09-28T13:58:34.823868+03:00 mavi-ThinkPad-T14s kernel: [ 433.952011] amdgpu 0000:06:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 11 on hub 0
2023-09-28T13:58:34.823869+03:00 mavi-ThinkPad-T14s kernel: [ 433.952013] amdgpu 0000:06:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 12 on hub 0
2023-09-28T13:58:34.823870+03:00 mavi-ThinkPad-T14s kernel: [ 433.952015] amdgpu 0000:06:00.0: amdgpu: ring kiq_0.2.1.0 uses VM inv eng 13 on hub 0
2023-09-28T13:58:34.823871+03:00 mavi-ThinkPad-T14s kernel: [ 433.952017] amdgpu 0000:06:00.0: amdgpu: ring sdma0 uses VM inv eng 0 on hub 8
2023-09-28T13:58:34.823872+03:00 mavi-ThinkPad-T14s kernel: [ 433.952019] amdgpu 0000:06:00.0: amdgpu: ring vcn_dec uses VM inv eng 1 on hub 8
2023-09-28T13:58:34.823872+03:00 mavi-ThinkPad-T14s kernel: [ 433.952021] amdgpu 0000:06:00.0: amdgpu: ring vcn_enc0 uses VM inv eng 4 on hub 8
2023-09-28T13:58:34.823874+03:00 mavi-ThinkPad-T14s kernel: [ 433.952022] amdgpu 0000:06:00.0: amdgpu: ring vcn_enc1 uses VM inv eng 5 on hub 8
2023-09-28T13:58:34.823875+03:00 mavi-ThinkPad-T14s kernel: [ 433.952024] amdgpu 0000:06:00.0: amdgpu: ring jpeg_dec uses VM inv eng 6 on hub 8
2023-09-28T13:58:34.825508+03:00 mavi-ThinkPad-T14s kernel: [ 433.954097] amdgpu 0000:06:00.0: amdgpu: recover vram bo from shadow start
2023-09-28T13:58:34.825531+03:00 mavi-ThinkPad-T14s kernel: [ 433.954099] amdgpu 0000:06:00.0: amdgpu: recover vram bo from shadow done
2023-09-28T13:58:34.825533+03:00 mavi-ThinkPad-T14s kernel: [ 433.954113] amdgpu 0000:06:00.0: amdgpu: GPU reset(2) succeeded!
2023-09-28T13:58:34.825534+03:00 mavi-ThinkPad-T14s kernel: [ 433.954304] [drm] Skip scheduling IBs!
2023-09-28T13:58:34.825536+03:00 mavi-ThinkPad-T14s kernel: [ 433.954324] [drm] Skip scheduling IBs!
2023-09-28T13:58:34.825538+03:00 mavi-ThinkPad-T14s kernel: [ 433.954331] [drm] Skip scheduling IBs!
2023-09-28T13:58:34.825540+03:00 mavi-ThinkPad-T14s kernel: [ 433.954338] [drm] Skip scheduling IBs!
2023-09-28T13:58:34.825541+03:00 mavi-ThinkPad-T14s kernel: [ 433.954344] [drm] Skip scheduling IBs!
2023-09-28T13:58:34.825543+03:00 mavi-ThinkPad-T14s kernel: [ 433.954350] [drm] Skip scheduling IBs!
2023-09-28T13:58:34.825544+03:00 mavi-ThinkPad-T14s kernel: [ 433.954357] [drm] Skip scheduling IBs!
2023-09-28T13:58:34.825546+03:00 mavi-ThinkPad-T14s kernel: [ 433.954363] [drm] Skip scheduling IBs!
2023-09-28T13:58:34.825547+03:00 mavi-ThinkPad-T14s kernel: [ 433.954369] [drm] Skip scheduling IBs!
```
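The driver prints the raw `VM_L2_PROTECTION_FAULT_STATUS` value and then the fields it decoded from it. As a rough illustration of how the two relate, here is a minimal Python sketch that splits the raw word into those fields. The bit positions in `FIELDS` are an assumption: they were chosen because they reproduce every decoded field printed in this report (including `vmid:6` for `0x00600431` and `vmid:4` for the later `0x00400430`), not copied from an AMD register specification. `decode_fault_status` and `KNOWN_CLIENTS` are hypothetical helper names.

```python
# Assumed bit layout of VM_L2_PROTECTION_FAULT_STATUS: (shift, width).
# It matches the fields amdgpu printed in this report, but it is NOT taken
# from an authoritative register specification.
FIELDS = {
    "MORE_FAULTS": (0, 1),
    "WALKER_ERROR": (1, 3),
    "PERMISSION_FAULTS": (4, 4),
    "MAPPING_ERROR": (8, 1),
    "CID": (9, 8),        # UTCL2 client ID
    "RW": (18, 1),
    "VMID": (20, 4),
}

# Only the client ID that actually appears in this log is named here.
KNOWN_CLIENTS = {0x2: "IA"}


def decode_fault_status(status: int) -> dict:
    """Split a raw fault-status word into named bitfields."""
    return {
        name: (status >> shift) & ((1 << width) - 1)
        for name, (shift, width) in FIELDS.items()
    }


if __name__ == "__main__":
    # The two raw values seen in the kern.log excerpts of this report.
    for raw in (0x00600431, 0x00400430):
        fields = decode_fault_status(raw)
        client = KNOWN_CLIENTS.get(fields["CID"], "unknown")
        print(f"0x{raw:08X}: {fields} client={client}")
```

Under this assumed layout, the two status values differ only in `MORE_FAULTS` and `VMID`; both report `PERMISSION_FAULTS: 0x3` from the IA client, consistent with the driver's own output.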

ProblemType: Bug
DistroRelease: Ubuntu 23.10
Package: linux-image-6.5.0-5-generic 6.5.0-5.5
ProcVersionSignature: Ubuntu 6.5.0-5.5-generic 6.5.0
Uname: Linux 6.5.0-5-generic x86_64
ApportVersion: 2.27.0-0ubuntu2
Architecture: amd64
CRDA: N/A
CasperMD5CheckResult: pass
CurrentDesktop: ubuntu:GNOME
Date: Thu Sep 28 14:03:23 2023
InstallationDate: Installed on 2022-11-14 (318 days ago)
InstallationMedia: Ubuntu 22.10 "Kinetic Kudu" - Release amd64 (20221020)
MachineType: {report['dmi.sys.vendor']} {report['dmi.product.name']}
ProcFB: 0 amdgpudrmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-6.5.0-5-generic root=UUID=7eaa33c9-8937-4292-8f12-b798b8564757 ro rootflags=subvol=@ quiet splash vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-6.5.0-5-generic N/A
 linux-backports-modules-6.5.0-5-generic N/A
 linux-firmware 20230919.git3672ccab-0ubuntu2
SourcePackage: linux
UpgradeStatus: Upgraded to mantic on 2023-09-28 (0 days ago)
dmi.bios.date: 07/31/2023
dmi.bios.release: 1.45
dmi.bios.vendor: LENOVO
dmi.bios.version: R1CET76W(1.45 )
dmi.board.asset.tag: Not Available
dmi.board.name: 20UH001QMX
dmi.board.vendor: LENOVO
dmi.board.version: SDK0J40697 WIN
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: None
dmi.ec.firmware.release: 1.45
dmi.modalias: dmi:bvnLENOVO:bvrR1CET76W(1.45):bd07/31/2023:br1.45:efr1.45:svnLENOVO:pn20UH001QMX:pvrThinkPadT14sGen1:rvnLENOVO:rn20UH001QMX:rvrSDK0J40697WIN:cvnLENOVO:ct10:cvrNone:skuLENOVO_MT_20UH_BU_Think_FM_ThinkPadT14sGen1:
dmi.product.family: ThinkPad T14s Gen 1
dmi.product.name: 20UH001QMX
dmi.product.sku: LENOVO_MT_20UH_BU_Think_FM_ThinkPad T14s Gen 1
dmi.product.version: ThinkPad T14s Gen 1
dmi.sys.vendor: LENOVO

Martin Vysny (vyzivus) wrote :
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Martin Vysny (vyzivus) wrote :

Yup, as I suspected, there is a kernel panic too:

```
[ 702.019208] [drm] ring 0 timeout to preempt ib
[ 711.547840] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_low timeout, signaled seq=70630, emitted seq=70632
[ 711.548723] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xwayland pid 4958 thread Xwayland:cs0 pid 4979
[ 711.549595] amdgpu 0000:06:00.0: amdgpu: GPU reset begin!
[ 711.776824] [drm] psp gfx command UNLOAD_TA(0x2) failed and response status is (0x117)
[ 711.802932] amdgpu 0000:06:00.0: amdgpu: MODE2 reset
[ 711.803000] amdgpu 0000:06:00.0: amdgpu: GPU reset succeeded, trying to resume
[ 711.803210] [drm] PCIE GART of 1024M enabled.
[ 711.803215] [drm] PTB located at 0x000000F43FC00000
[ 711.803348] [drm] PSP is resuming...
[ 712.510703] [drm] reserve 0x400000 from 0xf43f800000 for PSP TMR
[ 712.789403] amdgpu 0000:06:00.0: amdgpu: RAS: optional ras ta ucode is not available
[ 712.800536] amdgpu 0000:06:00.0: amdgpu: RAP: optional rap ta ucode is not available
[ 712.800545] amdgpu 0000:06:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[ 712.800553] amdgpu 0000:06:00.0: amdgpu: SMU is resuming...
[ 712.800840] amdgpu 0000:06:00.0: amdgpu: SMU is resumed successfully!
[ 712.801302] [drm] DMUB hardware initialized: version=0x01010027
[ 713.207082] [drm] kiq ring mec 2 pipe 1 q 0
[ 713.222540] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* got no status for stream 00000000e7f7b0df on acrtc0000000015c04636
[ 713.223561] ------------[ cut here ]------------
[ 713.223564] WARNING: CPU: 3 PID: 6794 at drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc.c:3113 update_planes_and_stream_state+0x176/0x460 [amdgpu]
[ 713.224355] Modules linked in: nf_conntrack_netlink xfrm_user xfrm_algo xt_addrtype br_netfilter ccm rfcomm snd_seq_dummy snd_hrtimer xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink bridge stp llc overlay cmac algif_hash algif_skcipher af_alg bnep binfmt_misc snd_soc_dmic snd_acp3x_pdm_dma snd_acp3x_rn snd_sof_amd_rembrandt snd_sof_amd_renoir snd_sof_amd_acp snd_sof_pci snd_sof_xtensa_dsp snd_sof intel_rapl_msr intel_rapl_common snd_sof_utils iwlmvm snd_ctl_led snd_soc_core snd_usb_audio snd_hda_codec_realtek snd_compress btusb nls_iso8859_1 edac_mce_amd snd_hda_codec_generic snd_hda_codec_hdmi ac97_bus snd_usbmidi_lib btrtl uvcvideo mac80211 snd_ump snd_pcm_dmaengine videobuf2_vmalloc uvc snd_pci_ps snd_hda_intel btbcm snd_rpl_pci_acp6x videobuf2_memops snd_intel_dspcfg kvm_amd btintel snd_acp_pci videobuf2_v4l2 snd_seq_midi snd_pci_acp6x btmtk snd_intel_sdw_acpi tps6598x libarc4 snd_pci_acp5x
[ 713.224562] snd_seq_midi_event kvm videodev snd_hda_codec bluetooth snd_rn_pci_acp3x snd_rawmidi snd_hda_core iwlwifi irqbypass think_lmi videobuf2_common snd_hwdep snd_acp_config thinkpad_acpi ecdh_generic snd_soc_acpi mc rapl ipmi_devintf ecc firmware_attributes_class wmi_bmof nvram snd_seq k10temp snd_pcm snd_pci_acp3x snd_seq_device i2c_piix4 cfg80211 snd_timer ccp ipmi_msghandler snd soundcore ledtrig_audio platform_profile serial_mu...
```

Martin Vysny (vyzivus) wrote :
description: updated
Martin Vysny (vyzivus) wrote :

I've been testing with the older kernel 6.2, and the issue does not seem to be reproducible there. For the time being, the workaround appears to be to use kernel 6.2.

Mario Limonciello (superm1) wrote :

Are you up to date with the current version of mesa in 23.10 (23.2.1-1ubuntu2)?

Martin Vysny (vyzivus) wrote :

Thanks Mario! I checked, and I have the newest mesa:

```
$ apt search mesa-vdpau
Sorting... Done
Full Text Search... Done
mesa-vdpau-drivers/mantic,now 23.2.1-1ubuntu2 amd64 [installed,automatic]
  Mesa VDPAU video acceleration drivers
```

Funnily enough, the bug is no longer reproducible; everything is working properly, yay! Could it be that a mesa upgrade a couple of days ago resolved the issue?

Anyway, the issue looks to be fixed now. Thanks again!

Mario Limonciello (superm1) wrote :

I'd say that's very likely. That mesa upgrade landed in the archive just a few days ago, and the trace you reported looks more like how a mesa bug manifests.

Changed in linux (Ubuntu):
status: Confirmed → Invalid
Changed in mesa (Ubuntu):
status: New → Fix Released
Martin Vysny (vyzivus) wrote :

Unfortunately, the problem is still reproducible with the newest mesa, although it seems to be much less frequent. Yesterday evening I got another crash with mesa 23.2.1-1ubuntu2:

```
2023-10-04T22:33:15.415854+03:00 mavi-ThinkPad-T14s kernel: [ 1076.119146] amdgpu 0000:06:00.0: amdgpu: [gfxhub0] no-retry page fault (src_id:0 ring:24 vmid:4 pasid:32770, for process Xwayland pid 4940 thread Xwayland:cs0 pid 5015)
2023-10-04T22:33:15.415871+03:00 mavi-ThinkPad-T14s kernel: [ 1076.119168] amdgpu 0000:06:00.0: amdgpu: in page starting at address 0x0000e5ea2326e000 from IH client 0x1b (UTCL2)
2023-10-04T22:33:15.415873+03:00 mavi-ThinkPad-T14s kernel: [ 1076.119180] amdgpu 0000:06:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00400430
2023-10-04T22:33:15.415874+03:00 mavi-ThinkPad-T14s kernel: [ 1076.119187] amdgpu 0000:06:00.0: amdgpu: Faulty UTCL2 client ID: IA (0x2)
2023-10-04T22:33:15.415875+03:00 mavi-ThinkPad-T14s kernel: [ 1076.119194] amdgpu 0000:06:00.0: amdgpu: MORE_FAULTS: 0x0
2023-10-04T22:33:15.415876+03:00 mavi-ThinkPad-T14s kernel: [ 1076.119200] amdgpu 0000:06:00.0: amdgpu: WALKER_ERROR: 0x0
2023-10-04T22:33:15.415878+03:00 mavi-ThinkPad-T14s kernel: [ 1076.119206] amdgpu 0000:06:00.0: amdgpu: PERMISSION_FAULTS: 0x3
2023-10-04T22:33:15.415878+03:00 mavi-ThinkPad-T14s kernel: [ 1076.119212] amdgpu 0000:06:00.0: amdgpu: MAPPING_ERROR: 0x0
2023-10-04T22:33:15.415879+03:00 mavi-ThinkPad-T14s kernel: [ 1076.119218] amdgpu 0000:06:00.0: amdgpu: RW: 0x0
2023-10-04T22:33:15.415881+03:00 mavi-ThinkPad-T14s kernel: [ 1076.119750] amdgpu 0000:06:00.0: amdgpu: [gfxhub0] no-retry page fault (src_id:0 ring:24 vmid:4 pasid:32770, for process Xwayland pid 4940 thread Xwayland:cs0 pid 5015)
2023-10-04T22:33:15.415881+03:00 mavi-ThinkPad-T14s kernel: [ 1076.119768] amdgpu 0000:06:00.0: amdgpu: in page starting at address 0x0000e5ea2326f000 from IH client 0x1b (UTCL2)
2023-10-04T22:33:15.415883+03:00 mavi-ThinkPad-T14s kernel: [ 1076.119780] amdgpu 0000:06:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00400430
2023-10-04T22:33:15.415884+03:00 mavi-ThinkPad-T14s kernel: [ 1076.119787] amdgpu 0000:06:00.0: amdgpu: Faulty UTCL2 client ID: IA (0x2)
2023-10-04T22:33:15.415885+03:00 mavi-ThinkPad-T14s kernel: [ 1076.119793] amdgpu 0000:06:00.0: amdgpu: MORE_FAULTS: 0x0
2023-10-04T22:33:15.415885+03:00 mavi-ThinkPad-T14s kernel: [ 1076.119800] amdgpu 0000:06:00.0: amdgpu: WALKER_ERROR: 0x0
2023-10-04T22:33:15.415886+03:00 mavi-ThinkPad-T14s kernel: [ 1076.119806] amdgpu 0000:06:00.0: amdgpu: PERMISSION_FAULTS: 0x3
2023-10-04T22:33:15.415887+03:00 mavi-ThinkPad-T14s kernel: [ 1076.119812] amdgpu 0000:06:00.0: amdgpu: MAPPING_ERROR: 0x0
2023-10-04T22:33:15.415888+03:00 mavi-ThinkPad-T14s kernel: [ 1076.119818] amdgpu 0000:06:00.0: amdgpu: RW: 0x0
2023-10-04T22:33:25.115577+03:00 mavi-ThinkPad-T14s kernel: [ 1085.820288] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_low timeout, signaled seq=75561, emitted seq=75563
2023-10-04T22:33:25.115600+03:00 mavi-ThinkPad-T14s kernel: [ 1085.821169] [drm:amdgpu_job_timedout [amdgpu]] *ERROR*...
```


Martin Vysny (vyzivus) wrote :

The process is IntelliJ IDEA, running on Java 17 under Xwayland.
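Both logs in this report attribute the page faults and the subsequent `gfx_low` ring timeout to the same Xwayland process (thread `Xwayland:cs0`). When gathering further occurrences, for example for an upstream mesa report, a small script can summarize them. The following is a minimal sketch, assuming the message wording quoted in this report (other kernel versions may phrase them differently); `summarize` and the script name `summarize_amdgpu.py` are hypothetical.

```python
import re
import sys
from collections import Counter

# Patterns modelled on the amdgpu messages quoted in this report.
FAULT_RE = re.compile(
    r"no-retry page fault .*?for process (?P<proc>\S+) pid (?P<pid>\d+)"
)
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout, signaled seq=(?P<signaled>\d+), "
    r"emitted seq=(?P<emitted>\d+)"
)


def summarize(lines):
    """Count page faults per (process, pid) and collect ring timeouts."""
    faults = Counter()
    timeouts = []
    for line in lines:
        if m := FAULT_RE.search(line):
            faults[(m["proc"], int(m["pid"]))] += 1
        elif m := TIMEOUT_RE.search(line):
            signaled, emitted = int(m["signaled"]), int(m["emitted"])
            # Fences emitted on the ring but not yet signaled when it timed out.
            timeouts.append((m["ring"], emitted - signaled))
    return faults, timeouts


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: summarize_amdgpu.py /var/log/kern.log")
    else:
        with open(sys.argv[1], errors="replace") as fh:
            faults, timeouts = summarize(fh)
        for (proc, pid), count in faults.most_common():
            print(f"{proc} (pid {pid}): {count} page fault(s)")
        for ring, pending in timeouts:
            print(f"ring {ring} timed out with {pending} fence(s) pending")
```

On the first excerpt in this report, the timeout line (`signaled seq=25993, emitted seq=25995`) would be summarized as two fences pending on `gfx_low`.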

Mario Limonciello (superm1) wrote :

OK, in this case, can you please open an upstream mesa bug?

Changed in mesa (Ubuntu):
status: Fix Released → Triaged