[amdgpu] kernel fault on monitor wakeup, only reboot recovers.

Bug #1960242 reported by Harry Coin
This bug affects 2 people
Affects: linux (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

After a normal day's operation, the monitors switch off after an hour or so. Every morning since upgrading to Impish, when I wake and unlock the system, 2 of the monitors wake and the rest remain in sleep mode. The only recourse is a graceful poweroff followed by a cold boot. The kernel logs a fault, shown below and in the attached file. It is 100% repeatable.

[121665.686476] ceph: mds0 caps renewed
[126829.470180] ------------[ cut here ]------------
[126829.470183] WARNING: CPU: 6 PID: 4368 at drivers/gpu/drm/ttm/ttm_bo.c:437 ttm_bo_release+0x2de/0x330 [ttm]
[126829.470191] Modules linked in: ceph libceph fscache netfs xt_CHECKSUM ipt_REJECT nf_reject_ipv4 xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink nft_chain_nat nf_nat xfrm_user nf_conntrack xfrm_algo nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype nft_compat nft_counter br_netfilter nf_tables nfnetlink vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) bridge stp llc overlay lz4 lz4_compress zram snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg intel_rapl_msr snd_intel_sdw_acpi intel_rapl_common snd_hda_codec sch_fq_codel snd_usb_audio snd_hda_core snd_usbmidi_lib snd_hwdep snd_pcm snd_seq_midi edac_mce_amd snd_seq_midi_event snd_rawmidi uvcvideo msr videobuf2_vmalloc parport_pc videobuf2_memops kvm_amd videobuf2_v4l2 videobuf2_common ppdev snd_seq videodev kvm snd_seq_device lp snd_timer rapl parport eeepc_wmi wmi_bmof joydev input_leds mc k10temp ccp snd soundcore mac_hid sunrpc ip_tables x_tables autofs4 btrfs blake2b_generic xor
[126829.470220] zstd_compress raid6_pq libcrc32c hid_generic uas usbhid usb_storage hid amdgpu iommu_v2 gpu_sched nouveau radeon mxm_wmi i2c_algo_bit drm_ttm_helper ttm drm_kms_helper syscopyarea sysfillrect crct10dif_pclmul sysimgblt fb_sys_fops crc32_pclmul cec ghash_clmulni_intel rc_core mfd_aaeon aesni_intel asus_wmi crypto_simd sparse_keymap nvme video xhci_pci r8169 gpio_amdpt ahci drm cryptd realtek i2c_piix4 libahci xhci_pci_renesas nvme_core wmi gpio_generic
[126829.470239] CPU: 6 PID: 4368 Comm: kwin_x11:cs0 Tainted: G OE 5.13.0-28-generic #31-Ubuntu
[126829.470241] Hardware name: System manufacturer System Product Name/PRIME B450-PLUS, BIOS 3211 08/10/2021
[126829.470242] RIP: 0010:ttm_bo_release+0x2de/0x330 [ttm]
[126829.470246] Code: e8 17 27 3f f0 e9 ac fd ff ff 49 8b 7e 98 b9 4c 1d 00 00 31 d2 be 01 00 00 00 e8 4d 4a 3f f0 49 8b 46 e8 eb 9d 4c 89 e0 eb 98 <0f> 0b 41 c7 86 84 00 00 00 00 00 00 00 49 8d 76 08 31 d2 4c 89 ef
[126829.470247] RSP: 0018:ffffb0df43547d28 EFLAGS: 00010202
[126829.470248] RAX: 0000000000000002 RBX: 0000000000000002 RCX: 0000000000000000
[126829.470249] RDX: 0000000000000001 RSI: 0000000000000246 RDI: ffff89e51b945ac0
[126829.470250] RBP: ffffb0df43547d50 R08: ffff89e4529488e8 R09: 0000000000000002
[126829.470251] R10: ffff89e894fe4d38 R11: ffff89e51c2ffce8 R12: ffff89e51b9452b0
[126829.470251] R13: ffff89e8aa050c58 R14: ffff89e8aa050db8 R15: ffff89eb3bb8c480
[126829.470252] FS: 00007f21ee182640(0000) GS:ffff89ec1e300000(0000) knlGS:0000000000000000
[126829.470253] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[126829.470254] CR2: 00007fbf8805e7e8 CR3: 000000045811c000 CR4: 0000000000350ee0
[126829.470255] Call Trace:
[126829.470256] <TASK>
[126829.470258] ttm_bo_put+0x30/0x50 [ttm]
[126829.470262] amdgpu_bo_unref+0x1e/0x30 [amdgpu]
[126829.470365] amdgpu_gem_object_free+0x34/0x50 [amdgpu]
[126829.470454] drm_gem_object_free+0x1d/0x30 [drm]
[126829.470470] drm_gem_dmabuf_release+0x40/0x60 [drm]
[126829.470486] dma_buf_release+0x46/0xa0
[126829.470488] __dentry_kill+0x10e/0x190
[126829.470490] dentry_kill+0x52/0x1c0
[126829.470491] dput+0x12f/0x180
[126829.470492] __fput+0xf0/0x250
[126829.470494] ____fput+0xe/0x10
[126829.470495] task_work_run+0x6d/0xa0
[126829.470497] exit_to_user_mode_loop+0x150/0x160
[126829.470499] exit_to_user_mode_prepare+0x9f/0xb0
[126829.470500] syscall_exit_to_user_mode+0x27/0x50
[126829.470503] do_syscall_64+0x6e/0xb0
[126829.470504] ? do_syscall_64+0x6e/0xb0
[126829.470506] ? asm_sysvec_apic_timer_interrupt+0xa/0x20
[126829.470507] entry_SYSCALL_64_after_hwframe+0x44/0xae
[126829.470509] RIP: 0033:0x7f21fb43e9cb
[126829.470510] Code: ff ff ff 85 c0 79 8b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 35 a4 0f 00 f7 d8 64 89 01 48
[126829.470511] RSP: 002b:00007f21ee181738 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[126829.470513] RAX: 0000000000000000 RBX: 00007f21ee181770 RCX: 00007f21fb43e9cb
[126829.470513] RDX: 00007f21ee181770 RSI: 0000000040086409 RDI: 000000000000000d
[126829.470514] RBP: 0000000040086409 R08: 000055849cb1c370 R09: 0000000000000000
[126829.470515] R10: 0000000000000000 R11: 0000000000000246 R12: 000055849c056508
[126829.470515] R13: 000000000000000d R14: 000055849c057434 R15: 00007f21ee1817a0
[126829.470516] </TASK>
[126829.470517] ---[ end trace 85f61b72534c54f5 ]---
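For anyone checking whether their own logs show the same fault, here is a minimal sketch (not part of the original report) that pulls each WARNING block out of saved dmesg text and reports the source location and symbol from its WARNING line. The function names `extract_warning_blocks` and `warning_site` are made up for illustration, and the sketch assumes the log uses the same "cut here" / "end trace" markers and timestamp prefix as the trace above.

```python
import re

# Matches the dmesg timestamp prefix, e.g. "[126829.470180] ".
TS = re.compile(r"^\[\s*\d+\.\d+\]\s*")

# Matches the WARNING line and captures "file:line" and "symbol+offset/size".
WARN = re.compile(r"WARNING: CPU: \d+ PID: \d+ at (\S+) (\S+)")


def extract_warning_blocks(dmesg_text):
    """Return a list of blocks; each block is the list of lines
    from a 'cut here' marker through the matching 'end trace' marker."""
    blocks, current = [], None
    for raw in dmesg_text.splitlines():
        line = TS.sub("", raw)  # drop the timestamp so markers are easy to match
        if "[ cut here ]" in line:
            current = []  # start collecting a new block
        if current is not None:
            current.append(line)
            if line.startswith("---[ end trace"):
                blocks.append(current)
                current = None  # stop collecting until the next marker
    return blocks


def warning_site(block):
    """Return (source_location, symbol) from the block's WARNING line, or None."""
    for line in block:
        m = WARN.match(line)
        if m:
            return m.group(1), m.group(2)
    return None
```

Feeding it the text of the attached log (or saved `dmesg` output) and comparing the reported site against `drivers/gpu/drm/ttm/ttm_bo.c:437` / `ttm_bo_release` gives a quick check for a matching WARNING. It only matches the marker lines, so it says nothing about whether the underlying cause is the same.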

ProblemType: Bug
DistroRelease: Ubuntu 21.10
Package: linux-image-5.13.0-28-generic 5.13.0-28.31
ProcVersionSignature: Ubuntu 5.13.0-28.31-generic 5.13.19
Uname: Linux 5.13.0-28-generic x86_64
ApportVersion: 2.20.11-0ubuntu71
Architecture: amd64
CasperMD5CheckResult: unknown
Date: Mon Feb 7 08:46:22 2022
InstallationDate: Installed on 2020-02-12 (725 days ago)
InstallationMedia: Kubuntu 19.10 "Eoan Ermine" - Release amd64 (20191017)
MachineType: System manufacturer System Product Name
ProcEnviron:
 LANGUAGE=
 TERM=xterm-256color
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 amdgpudrmfb
ProcKernelCmdLine: BOOT_IMAGE=/@/boot/vmlinuz-5.13.0-28-generic root=UUID=ea50360b-b304-4b4c-aa64-fa994b95da37 ro rootflags=subvol=@ acpi_enforce_resources=lax radeon.si_support=0 amdgpu.si_support=1
PulseList: Error: command ['pacmd', 'list'] failed with exit code 1: No PulseAudio daemon running, or not running as session daemon.
RelatedPackageVersions:
 linux-restricted-modules-5.13.0-28-generic N/A
 linux-backports-modules-5.13.0-28-generic N/A
 linux-firmware 1.201.3
RfKill:

SourcePackage: linux
UpgradeStatus: Upgraded to impish on 2022-01-04 (34 days ago)
dmi.bios.date: 08/10/2021
dmi.bios.release: 5.17
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 3211
dmi.board.asset.tag: Default string
dmi.board.name: PRIME B450-PLUS
dmi.board.vendor: ASUSTeK COMPUTER INC.
dmi.board.version: Rev X.0x
dmi.chassis.asset.tag: Default string
dmi.chassis.type: 3
dmi.chassis.vendor: Default string
dmi.chassis.version: Default string
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr3211:bd08/10/2021:br5.17:svnSystemmanufacturer:pnSystemProductName:pvrSystemVersion:rvnASUSTeKCOMPUTERINC.:rnPRIMEB450-PLUS:rvrRevX.0x:cvnDefaultstring:ct3:cvrDefaultstring:skuSKU:
dmi.product.family: To be filled by O.E.M.
dmi.product.name: System Product Name
dmi.product.sku: SKU
dmi.product.version: System Version
dmi.sys.vendor: System manufacturer

Harry Coin (hcoin) wrote :
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
summary: - kernel fault on monitor wakeup, only reboot recovers.
+ [amdgpu] kernel fault on monitor wakeup, only reboot recovers.
tags: added: amdgpu
James Renken (jrenken) wrote :

I've been encountering this problem with an RX 6600 using linux-image-5.15.0-22-lowlatency and linux-image-5.15.0-23-generic on Jammy. This had not been happening with linux-image-5.13.0-37-lowlatency on Impish.

[78421.618570] INFO: task gnome-shell:2364 blocked for more than 241 seconds.
[78421.618575] Not tainted 5.15.0-23-generic #23-Ubuntu
[78421.618577] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[78421.618578] task:gnome-shell state:D stack: 0 pid: 2364 ppid: 2194 flags:0x00004002
[78421.618581] Call Trace:
[78421.618583] <TASK>
[78421.618585] __schedule+0x23d/0x590
[78421.618590] schedule+0x4e/0xb0
[78421.618591] schedule_preempt_disabled+0xe/0x10
[78421.618593] __mutex_lock.constprop.0+0x263/0x490
[78421.618596] __mutex_lock_slowpath+0x13/0x20
[78421.618597] mutex_lock+0x34/0x40
[78421.618598] amdgpu_dm_atomic_commit_tail+0x5a5/0x1430 [amdgpu]
[78421.618737] ? dcn30_calculate_wm_and_dlg_fp+0x7b0/0xac0 [amdgpu]
[78421.618861] ? hubbub3_get_dcc_compression_cap+0x92/0x2d0 [amdgpu]
[78421.618995] ? dcn20_get_dcc_compression_cap+0x23/0x30 [amdgpu]
[78421.619126] ? fill_gfx9_plane_attributes_from_modifiers+0x217/0x2e0 [amdgpu]
[78421.619240] ? dcn30_validate_bandwidth+0x1a0/0x340 [amdgpu]
[78421.619355] ? ttm_bo_mem_compat+0x30/0x90 [ttm]
[78421.619359] ? fill_plane_buffer_attributes+0x137/0x290 [amdgpu]
[78421.619470] ? __cond_resched+0x1a/0x50
[78421.619472] ? __wait_for_common+0x3e/0x150
[78421.619474] ? dm_plane_helper_prepare_fb+0x236/0x290 [amdgpu]
[78421.619579] ? usleep_range_state+0x90/0x90
[78421.619581] ? wait_for_completion_timeout+0x1d/0x20
[78421.619583] commit_tail+0xc5/0x170 [drm_kms_helper]
[78421.619593] ? drm_atomic_helper_swap_state+0x246/0x370 [drm_kms_helper]
[78421.619601] drm_atomic_helper_commit+0x123/0x150 [drm_kms_helper]
[78421.619608] drm_atomic_commit+0x4a/0x50 [drm]
[78421.619623] drm_mode_atomic_ioctl+0x530/0x740 [drm]
[78421.619636] ? drm_plane_create_color_properties.cold+0x48/0x48 [drm]
[78421.619650] ? drm_atomic_set_property+0x150/0x150 [drm]
[78421.619661] drm_ioctl_kernel+0xae/0xf0 [drm]
[78421.619673] drm_ioctl+0x264/0x4b0 [drm]
[78421.619683] ? drm_atomic_set_property+0x150/0x150 [drm]
[78421.619695] ? recalibrate_cpu_khz+0x10/0x10
[78421.619697] ? ktime_get_mono_fast_ns+0x52/0xa0
[78421.619699] amdgpu_drm_ioctl+0x4e/0x80 [amdgpu]
[78421.619770] __x64_sys_ioctl+0x91/0xc0
[78421.619773] do_syscall_64+0x5c/0xc0
[78421.619776] ? syscall_exit_to_user_mode+0x27/0x50
[78421.619777] ? do_syscall_64+0x69/0xc0
[78421.619778] ? syscall_exit_to_user_mode+0x27/0x50
[78421.619779] ? do_syscall_64+0x69/0xc0
[78421.619781] ? do_syscall_64+0x69/0xc0
[78421.619782] ? do_syscall_64+0x69/0xc0
[78421.619783] ? irqentry_exit_to_user_mode+0x9/0x20
[78421.619784] ? irqentry_exit+0x19/0x30
[78421.619785] ? sysvec_apic_timer_interrupt+0x4e/0x90
[78421.619786] ? asm_sysvec_apic_timer_interrupt+0xa/0x20
[78421.619788] entry_SYSCALL_64_after_hwframe+0x44/0xae
[78421.619789] RIP: 0033:0x7fd826692aff
[78421.619791] RSP: 002b:00007ffed2c1a150 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[78421.619793] RAX: ffffffffffffffda RBX: 00007ffed2c1a1...
