AMDGPU lockup on every computer sleep if monitor is already asleep

Bug #1919508 reported by Sergiu Bivol
This bug affects 2 people
Affects: linux-signed-oem-5.10 (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

The system locks up every time it goes to sleep while the monitor is already asleep, and a reboot is required to recover.

Steps to reproduce:
1. Configure power saving to turn off the monitor after a period of inactivity.
2. Configure power saving to suspend the PC automatically after a delay longer than the monitor timeout.
3. Wait until both timeouts have elapsed.
4. The system locks up and cannot be returned to an operational state without a reboot.
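
The steps above depend on waiting out two desktop timeouts. A possible way to trigger the same sequence on demand is sketched below. It is only a sketch under assumptions that the report does not state: an X11 session (for `xset`) and a systemd-based suspend (`systemctl suspend`). The function name `reproduce` and the 30-second delay are illustrative, and it has not been confirmed that forcing DPMS off reproduces exactly the same monitor state as the power-saving timeout.

```python
import subprocess
import time

# Assumption: X11 session, so xset can force the monitor into DPMS "off".
BLANK_MONITOR = ["xset", "dpms", "force", "off"]
# Assumption: systemd handles suspend on this system.
SUSPEND = ["systemctl", "suspend"]


def reproduce(delay_s=30.0, dry_run=True):
    """Blank the monitor, wait, then suspend.

    With dry_run=True (the default) nothing is executed; the planned
    commands are only returned, so the sketch is safe to import and run.
    """
    steps = [BLANK_MONITOR, SUSPEND]
    if dry_run:
        return steps
    subprocess.run(BLANK_MONITOR, check=True)
    # Leave keyboard and mouse untouched here: input wakes the monitor again.
    time.sleep(delay_s)
    subprocess.run(SUSPEND, check=True)
    return steps


if __name__ == "__main__":
    for cmd in reproduce(dry_run=True):
        print(" ".join(cmd))
```

Calling `reproduce(dry_run=False)` would actually suspend the machine, so save any open work first.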

=== Hardware tested ===
GPU: Radeon RX 5600XT
Monitor: Samsung LS24H850QFU (FreeSync enabled)
Connection: DisplayPort

=== Previous kernels ===
Same issue on 5.4 and 5.8 kernels (from Ubuntu packages).

=== Relevant crash info ===
Mar 17 19:48:56 laptop kernel: [ 8692.935426] [drm] free PSP TMR buffer
Mar 17 19:49:04 laptop kernel: [ 8700.925536] [drm] PCIE GART of 512M enabled (table at 0x0000008000000000).
Mar 17 19:49:04 laptop kernel: [ 8700.925549] [drm] PSP is resuming...
Mar 17 19:49:04 laptop kernel: [ 8701.107392] [drm] reserve 0xa00000 from 0x803f400000 for PSP TMR
Mar 17 19:49:04 laptop kernel: [ 8701.295353] amdgpu 0000:0a:00.0: amdgpu: RAS: optional ras ta ucode is not available
Mar 17 19:49:04 laptop kernel: [ 8701.319351] amdgpu 0000:0a:00.0: amdgpu: RAP: optional rap ta ucode is not available
Mar 17 19:49:04 laptop kernel: [ 8701.319353] amdgpu 0000:0a:00.0: amdgpu: SMU is resuming...
Mar 17 19:49:04 laptop kernel: [ 8701.319358] amdgpu 0000:0a:00.0: amdgpu: smu driver if version = 0x00000036, smu fw if version = 0x00000035, smu fw version = 0x002a3200 (42.50.0)
Mar 17 19:49:04 laptop kernel: [ 8701.319358] amdgpu 0000:0a:00.0: amdgpu: SMU driver if version not matched
Mar 17 19:49:07 laptop kernel: [ 8703.784372] amdgpu 0000:0a:00.0: amdgpu: failed send message: EnableAllSmuFeatures (6) param: 0x00000000 response 0xffffffc2
Mar 17 19:49:07 laptop kernel: [ 8703.784375] amdgpu 0000:0a:00.0: amdgpu: Failed to enable requested dpm features!
Mar 17 19:49:07 laptop kernel: [ 8703.784377] amdgpu 0000:0a:00.0: amdgpu: Failed to setup smc hw!
Mar 17 19:49:07 laptop kernel: [ 8703.784458] [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <smu> failed -62
Mar 17 19:49:07 laptop kernel: [ 8703.784460] amdgpu 0000:0a:00.0: amdgpu: amdgpu_device_ip_resume failed (-62).
Mar 17 19:49:07 laptop kernel: [ 8703.803693] snd_hda_intel 0000:0a:00.1: refused to change power state from D3hot to D0
Mar 17 19:49:07 laptop kernel: [ 8703.907879] snd_hda_intel 0000:0a:00.1: CORB reset timeout#2, CORBRP = 65535
Mar 17 19:49:07 laptop kernel: [ 8703.929045] amdgpu: Move buffer fallback to memcpy unavailable
Mar 17 19:49:07 laptop kernel: [ 8703.929137] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -19!
Mar 17 19:49:09 laptop kernel: [ 8705.904322] amdgpu: Move buffer fallback to memcpy unavailable
Mar 17 19:49:09 laptop kernel: [ 8705.904385] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -19!
Mar 17 19:49:19 laptop kernel: [ 8715.651540] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=11267, emitted seq=11269
Mar 17 19:49:19 laptop kernel: [ 8715.651633] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process pid 0 thread pid 0
Mar 17 19:49:19 laptop kernel: [ 8715.651638] amdgpu 0000:0a:00.0: amdgpu: GPU reset begin!
Mar 17 19:49:19 laptop kernel: [ 8715.651667] ------------[ cut here ]------------
Mar 17 19:49:19 laptop kernel: [ 8715.651668] kernel BUG at mm/slub.c:304!
Mar 17 19:49:19 laptop kernel: [ 8715.651674] invalid opcode: 0000 [#1] SMP NOPTI
Mar 17 19:49:19 laptop kernel: [ 8715.651677] CPU: 11 PID: 8470 Comm: kworker/11:1 Tainted: P W O 5.10.0-1016-oem #17-Ubuntu
Mar 17 19:49:19 laptop kernel: [ 8715.651678] Hardware name: Gigabyte Technology Co., Ltd. B550I AORUS PRO AX/B550I AORUS PRO AX, BIOS F12 01/18/2021
Mar 17 19:49:19 laptop kernel: [ 8715.651684] Workqueue: events drm_sched_job_timedout [gpu_sched]
Mar 17 19:49:19 laptop kernel: [ 8715.651689] RIP: 0010:__slab_free+0x1c9/0x380
Mar 17 19:49:19 laptop kernel: [ 8715.651691] Code: 41 5e 41 5f 5d c3 41 f7 46 08 00 0d 21 00 0f 85 f0 fe ff ff 4d 85 ed 0f 85 e7 fe ff ff 80 4c 24 5b 80 45 31 c0 e9 2a ff ff ff <0f> 0b 49 3b 5c 24 28 75 97 4c 89 c0 41 89 f0 44 89 fe 49 89 4c 24
Mar 17 19:49:19 laptop kernel: [ 8715.651693] RSP: 0018:ffffbb0cc06ffbc0 EFLAGS: 00010246
Mar 17 19:49:19 laptop kernel: [ 8715.651695] RAX: ffff96a256699d00 RBX: 000000008020001f RCX: ffff96a256699c00
Mar 17 19:49:19 laptop kernel: [ 8715.651697] RDX: ffff96a256699c00 RSI: ffffe7330459a600 RDI: ffff96a240043600
Mar 17 19:49:19 laptop kernel: [ 8715.651698] RBP: ffffbb0cc06ffc60 R08: 0000000000000001 R09: ffffffffc098a1ae
Mar 17 19:49:19 laptop kernel: [ 8715.651699] R10: ffff96a256699c00 R11: 0000000000000001 R12: ffffe7330459a600
Mar 17 19:49:19 laptop kernel: [ 8715.651700] R13: ffff96a256699c00 R14: ffff96a240043600 R15: ffff96a256699c00
Mar 17 19:49:19 laptop kernel: [ 8715.651702] FS: 0000000000000000(0000) GS:ffff96b13ecc0000(0000) knlGS:0000000000000000
Mar 17 19:49:19 laptop kernel: [ 8715.651703] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 17 19:49:19 laptop kernel: [ 8715.651705] CR2: 00002ca11a4bf000 CR3: 0000000223f54000 CR4: 0000000000350ee0
Mar 17 19:49:19 laptop kernel: [ 8715.651706] Call Trace:
Mar 17 19:49:19 laptop kernel: [ 8715.651711] kfree+0x3af/0x400
Mar 17 19:49:19 laptop kernel: [ 8715.651715] ? _cond_resched+0x19/0x30
Mar 17 19:49:19 laptop kernel: [ 8715.651802] ? uninit_queue+0xe/0x10 [amdgpu]
Mar 17 19:49:19 laptop kernel: [ 8715.651885] uninit_queue+0xe/0x10 [amdgpu]
Mar 17 19:49:19 laptop kernel: [ 8715.651967] kernel_queue_uninit+0x94/0x100 [amdgpu]
Mar 17 19:49:19 laptop kernel: [ 8715.652047] pm_uninit+0x16/0x20 [amdgpu]
Mar 17 19:49:19 laptop kernel: [ 8715.652128] stop_cpsch+0xa7/0xd0 [amdgpu]
Mar 17 19:49:19 laptop kernel: [ 8715.652209] kgd2kfd_suspend.part.0+0x35/0x50 [amdgpu]
Mar 17 19:49:19 laptop kernel: [ 8715.652288] kgd2kfd_pre_reset+0x47/0x60 [amdgpu]
Mar 17 19:49:19 laptop kernel: [ 8715.652367] amdgpu_amdkfd_pre_reset+0x1a/0x20 [amdgpu]
Mar 17 19:49:19 laptop kernel: [ 8715.652471] amdgpu_device_gpu_recover.cold+0x352/0x98e [amdgpu]
Mar 17 19:49:19 laptop kernel: [ 8715.652560] amdgpu_job_timedout+0x123/0x150 [amdgpu]
Mar 17 19:49:19 laptop kernel: [ 8715.652564] drm_sched_job_timedout+0x72/0xc0 [gpu_sched]
Mar 17 19:49:19 laptop kernel: [ 8715.652568] process_one_work+0x1ef/0x390
Mar 17 19:49:19 laptop kernel: [ 8715.652570] worker_thread+0x4d/0x3f0
Mar 17 19:49:19 laptop kernel: [ 8715.652573] kthread+0x114/0x150
Mar 17 19:49:19 laptop kernel: [ 8715.652575] ? process_one_work+0x390/0x390
Mar 17 19:49:19 laptop kernel: [ 8715.652577] ? kthread_park+0x90/0x90
Mar 17 19:49:19 laptop kernel: [ 8715.652580] ret_from_fork+0x22/0x30
Mar 17 19:49:19 laptop kernel: [ 8715.652582] Modules linked in: ccm md4 nls_utf8 cifs fscache libdes veth xt_comment zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) nf_conntrack_netlink xfrm_user xfrm_algo xt_addrtype aufs rfcomm xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink bpfilter cmac algif_hash algif_skcipher af_alg bnep binfmt_misc snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg soundwire_intel soundwire_generic_allocation soundwire_cadence snd_hda_codec snd_hda_core soundwire_bus nls_iso8859_1 snd_soc_core snd_usb_audio snd_compress snd_usbmidi_lib ac97_bus snd_hwdep snd_pcm_dmaengine snd_pcm iwlmvm edac_mce_amd snd_seq_midi uvcvideo snd_seq_midi_event kvm_amd mac80211 videobuf2_vmalloc snd_rawmidi kvm libarc4 videobuf2_memops videobuf2_v4l2 videobuf2_common snd_seq
Mar 17 19:49:19 laptop kernel: [ 8715.652633] crct10dif_pclmul cdc_ether crc32_pclmul btusb videodev btrtl snd_seq_device btbcm usbnet snd_timer iwlwifi snd r8152 btintel ghash_clmulni_intel k10temp sch_fq_codel efi_pstore mc wmi_bmof i2c_piix4 rapl mii bluetooth soundcore ccp cfg80211 ecdh_generic ecc overlay ahci libahci iptable_filter ip6table_filter ip6_tables br_netfilter bridge stp llc arp_tables parport_pc ppdev lp parport sunrpc ip_tables x_tables autofs4 btrfs blake2b_generic libcrc32c xor raid6_pq input_leds dm_crypt joydev hid_corsair hid_generic usbhid hid amdgpu iommu_v2 gpu_sched i2c_algo_bit ttm drm_kms_helper syscopyarea aesni_intel sysfillrect sysimgblt fb_sys_fops cec glue_helper rc_core crypto_simd cryptd drm nvme r8169 realtek xhci_pci nvme_core xhci_pci_renesas wmi gpio_amdpt gpio_generic mac_hid
Mar 17 19:49:19 laptop kernel: [ 8715.652686] ---[ end trace 7840b52a6b850688 ]---
Mar 17 19:49:19 laptop kernel: [ 8715.788337] RIP: 0010:__slab_free+0x1c9/0x380
Mar 17 19:49:19 laptop kernel: [ 8715.788340] Code: 41 5e 41 5f 5d c3 41 f7 46 08 00 0d 21 00 0f 85 f0 fe ff ff 4d 85 ed 0f 85 e7 fe ff ff 80 4c 24 5b 80 45 31 c0 e9 2a ff ff ff <0f> 0b 49 3b 5c 24 28 75 97 4c 89 c0 41 89 f0 44 89 fe 49 89 4c 24
Mar 17 19:49:19 laptop kernel: [ 8715.788342] RSP: 0018:ffffbb0cc06ffbc0 EFLAGS: 00010246
Mar 17 19:49:19 laptop kernel: [ 8715.788344] RAX: ffff96a256699d00 RBX: 000000008020001f RCX: ffff96a256699c00
Mar 17 19:49:19 laptop kernel: [ 8715.788345] RDX: ffff96a256699c00 RSI: ffffe7330459a600 RDI: ffff96a240043600
Mar 17 19:49:19 laptop kernel: [ 8715.788347] RBP: ffffbb0cc06ffc60 R08: 0000000000000001 R09: ffffffffc098a1ae
Mar 17 19:49:19 laptop kernel: [ 8715.788348] R10: ffff96a256699c00 R11: 0000000000000001 R12: ffffe7330459a600
Mar 17 19:49:19 laptop kernel: [ 8715.788349] R13: ffff96a256699c00 R14: ffff96a240043600 R15: ffff96a256699c00
Mar 17 19:49:19 laptop kernel: [ 8715.788351] FS: 0000000000000000(0000) GS:ffff96b13ecc0000(0000) knlGS:0000000000000000
Mar 17 19:49:19 laptop kernel: [ 8715.788352] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 17 19:49:19 laptop kernel: [ 8715.788354] CR2: 00002ca11a4bf000 CR3: 0000000223f54000 CR4: 0000000000350ee0
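
As a reading aid for the numeric codes in the log above, here is a minimal sketch that decodes them. It relies on two facts: on Linux, errno 62 is ETIME ("Timer expired") and errno 19 is ENODEV ("No such device"); and 0xffffffc2, read as a signed 32-bit integer, equals -62. Treating the SMU `response` word as a negative errno is an interpretation, not something the report states. The helper names (`to_signed32`, `decode`, `codes_in`) are made up for this example, and the sample lines are abbreviated from the log.

```python
import re

# Linux errno numbers for the two codes seen in the log (hardcoded so the
# result does not depend on the platform running this script).
LINUX_ERRNO = {19: "ENODEV", 62: "ETIME"}


def to_signed32(value):
    """Interpret an unsigned 32-bit value as two's-complement signed."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def decode(code):
    """Map a negative kernel return code to its errno name, if known here."""
    return LINUX_ERRNO.get(-code, "unknown")


def codes_in(line):
    """Extract 'response 0x...' words (as signed) and bare negative integers."""
    found = [to_signed32(int(h, 16))
             for h in re.findall(r"response (0x[0-9a-fA-F]+)", line)]
    found += [int(n) for n in re.findall(r"(?<![\w.])-\d+\b", line)]
    return found


# Abbreviated from the log above (timestamps and device prefixes dropped).
LOG_LINES = [
    "failed send message: EnableAllSmuFeatures (6) param: 0x00000000 response 0xffffffc2",
    "*ERROR* resume of IP block <smu> failed -62",
    "amdgpu_device_ip_resume failed (-62).",
    "*ERROR* Failed to process the buffer list -19!",
]

if __name__ == "__main__":
    for line in LOG_LINES:
        for code in codes_in(line):
            print(code, decode(code), "<-", line)
```

Read this way, the SMU message failure and the failed resume of the smu IP block carry the same timeout code, while the later buffer-list errors report that the device is no longer available.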

ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: linux-image-5.10.0-1016-oem 5.10.0-1016.17
ProcVersionSignature: Ubuntu 5.10.0-1016.17-oem 5.10.11
Uname: Linux 5.10.0-1016-oem x86_64
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
ApportVersion: 2.20.11-0ubuntu27.16
Architecture: amd64
CasperMD5CheckResult: skip
CurrentDesktop: KDE
Date: Wed Mar 17 20:22:38 2021
SourcePackage: linux-signed-oem-5.10
UpgradeStatus: No upgrade log present (probably fresh install)

Michael Gratton (mjog) wrote :

I am getting this with kernel 5.13 on Ubuntu 21.10, with a Radeon RX 6800 XT and an AMD Ryzen 5 3600.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux-signed-oem-5.10 (Ubuntu):
status: New → Confirmed