Bug #2033327 “dGPU suspend fails under memory pressure” : Bugs : Linux

Pierre Equoy (pieq) wrote on 2023-08-29:

#1

Dependencies.txt (3.0 KiB, text/plain; charset="utf-8")
ProcCpuinfoMinimal.txt (1.3 KiB, text/plain; charset="utf-8")
ProcEnviron.txt (305 bytes, text/plain; charset="utf-8")

Pierre Equoy (pieq) wrote on 2023-08-29:

#2

journalctl-b0-20230829.log (4.3 MiB, text/html)

Attached are the logs for the last boot. As you can see, the device had been running for a week without any problem, but suspending/resuming killed it.

We can see the following stack trace in the logs:

Aug 29 09:12:32 coltrane kernel: kworker/u8:24: page allocation failure: order:5, mode:0x40d00(GFP_NOIO|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0
Aug 29 09:12:32 coltrane kernel: CPU: 3 PID: 903164 Comm: kworker/u8:24 Not tainted 6.2.0-26-generic #26~22.04.1-Ubuntu
Aug 29 09:12:32 coltrane kernel: Hardware name: ASUS All Series/B85M-K, BIOS 3602 03/26/2018
Aug 29 09:12:32 coltrane kernel: Workqueue: events_unbound async_run_entry_fn
Aug 29 09:12:32 coltrane kernel: Call Trace:
Aug 29 09:12:32 coltrane kernel:  <TASK>
Aug 29 09:12:32 coltrane kernel:  dump_stack_lvl+0x48/0x70
Aug 29 09:12:32 coltrane kernel:  dump_stack+0x10/0x20
Aug 29 09:12:32 coltrane kernel:  warn_alloc+0x14b/0x1c0
Aug 29 09:12:32 coltrane kernel:  ? __alloc_pages_direct_compact+0xa7/0x240
Aug 29 09:12:32 coltrane kernel:  __alloc_pages_slowpath.constprop.0+0x910/0x990
Aug 29 09:12:32 coltrane kernel:  __alloc_pages+0x32c/0x360
Aug 29 09:12:32 coltrane kernel:  __kmalloc_large_node+0x89/0x170
Aug 29 09:12:32 coltrane kernel:  kmalloc_large+0x21/0xc0
Aug 29 09:12:32 coltrane kernel:  dc_set_power_state+0x49/0x190 [amdgpu]
Aug 29 09:12:32 coltrane kernel:  dm_suspend+0x9f/0x290 [amdgpu]
Aug 29 09:12:32 coltrane kernel:  ? vi_common_set_clockgating_state+0xd7/0x190 [amdgpu]
Aug 29 09:12:32 coltrane kernel:  amdgpu_device_ip_suspend_phase1+0xb9/0x1d0 [amdgpu]
Aug 29 09:12:32 coltrane kernel:  amdgpu_device_suspend+0xca/0x1d0 [amdgpu]
Aug 29 09:12:32 coltrane kernel:  amdgpu_pmops_suspend+0x33/0x50 [amdgpu]
Aug 29 09:12:32 coltrane kernel:  pci_pm_suspend+0x8a/0x1c0
Aug 29 09:12:32 coltrane kernel:  ? __pfx_pci_pm_suspend+0x10/0x10
Aug 29 09:12:32 coltrane kernel:  dpm_run_callback+0x54/0x1a0
Aug 29 09:12:32 coltrane kernel:  __device_suspend+0x14b/0x400
Aug 29 09:12:32 coltrane kernel:  async_suspend+0x1f/0x80
Aug 29 09:12:32 coltrane kernel:  async_run_entry_fn+0x33/0x130
Aug 29 09:12:32 coltrane kernel:  process_one_work+0x21f/0x440
Aug 29 09:12:32 coltrane kernel:  worker_thread+0x50/0x3f0
Aug 29 09:12:32 coltrane kernel:  ? __pfx_worker_thread+0x10/0x10
Aug 29 09:12:32 coltrane kernel:  kthread+0xee/0x120
Aug 29 09:12:32 coltrane kernel:  ? __pfx_kthread+0x10/0x10
Aug 29 09:12:32 coltrane kernel:  ret_from_fork+0x2c/0x50
Aug 29 09:12:32 coltrane kernel:  </TASK>
Aug 29 09:12:32 coltrane kernel: Mem-Info:
Aug 29 09:12:32 coltrane kernel: active_anon:6 inactive_anon:2748750 isolated_anon:0
                                  active_file:309 inactive_file:232 isolated_file:0
                                  unevictable:41 dirty:1 writeback:8
                                  slab_reclaimable:60650 slab_unreclaimable:80021
                                  mapped:1 shmem:70425 pagetables:29671
                                  sec_pagetables:0 bounce:0
                                  kernel_misc_reclaimable:0
                                  free:106406 free_pcp:0 free_cma:0
Aug 29 09:12:32 coltrane kernel: Node 0 active_anon:24kB inactive_anon:10995000kB active_file:1236kB inactive_file:928kB unevictable:164kB isolated(anon):0kB isolated(file):0kB mapped:4kB dirty:4kB writeback:32kB shmem:281700kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 2048kB writeback_tmp:0kB kernel_stack:42288kB pagetables:118684kB sec_pagetables:0kB all_unreclaimable? no
Aug 29 09:12:32 coltrane kernel: Node 0 DMA free:13308kB boost:0kB min:64kB low:80kB high:96kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15988kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Aug 29 09:12:32 coltrane kernel: lowmem_reserve[]: 0 3380 15822 15822 15822
Aug 29 09:12:32 coltrane kernel: Node 0 DMA32 free:197068kB boost:32448kB min:46872kB low:50476kB high:54080kB reserved_highatomic:0KB active_anon:0kB inactive_anon:2122496kB active_file:0kB inactive_file:0kB unevictable:80kB writepending:0kB present:3606752kB managed:3541040kB mlocked:80kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Aug 29 09:12:32 coltrane kernel: lowmem_reserve[]: 0 0 12442 12442 12442
Aug 29 09:12:32 coltrane kernel: Node 0 Normal free:215248kB boost:119452kB min:172544kB low:185816kB high:199088kB reserved_highatomic:2048KB active_anon:24kB inactive_anon:8872504kB active_file:1040kB inactive_file:1136kB unevictable:84kB writepending:36kB present:13090816kB managed:12748956kB mlocked:84kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Aug 29 09:12:32 coltrane kernel: lowmem_reserve[]: 0 0 0 0 0
Aug 29 09:12:32 coltrane kernel: Node 0 DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 1*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 1*512kB (U) 0*1024kB 2*2048kB (UM) 2*4096kB (M) = 13308kB
Aug 29 09:12:32 coltrane kernel: Node 0 DMA32: 49272*4kB (UME) 1*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 197096kB
Aug 29 09:12:32 coltrane kernel: Node 0 Normal: 53516*4kB (UMEH) 44*8kB (UMH) 19*16kB (H) 16*32kB (UH) 5*64kB (H) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 215552kB
Aug 29 09:12:32 coltrane kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Aug 29 09:12:32 coltrane kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Aug 29 09:12:32 coltrane kernel: 71214 total pagecache pages
Aug 29 09:12:32 coltrane kernel: 245 pages in swap cache
Aug 29 09:12:32 coltrane kernel: Free swap  = 176108kB
Aug 29 09:12:32 coltrane kernel: Total swap = 2097148kB
Aug 29 09:12:32 coltrane kernel: 4178389 pages RAM
Aug 29 09:12:32 coltrane kernel: 0 pages HighMem/MovableOnly
Aug 29 09:12:32 coltrane kernel: 102050 pages reserved
Aug 29 09:12:32 coltrane kernel: 0 pages hwpoisoned
Aug 29 09:12:32 coltrane kernel: ------------[ cut here ]------------
Aug 29 09:12:32 coltrane kernel: WARNING: CPU: 3 PID: 903164 at drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc.c:4277 dc_set_power_state+0x186/0x190 [amdgpu]
Aug 29 09:12:32 coltrane kernel: Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache netfs snd_seq_dummy uas usb_storage vhost_vsock vmw_vsock_virtio_transport_common vhost vhost_iotlb vsock intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel sunrpc kvm irqbypass crct10dif_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel mei_hdcp mei_pxp sha512_ssse3 aesni_intel binfmt_misc crypto_simd amdgpu cryptd nls_iso8859_1 rapl snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi intel_cstate uvcvideo videobuf2_vmalloc snd_hda_intel videobuf2_memops videobuf2_v4l2 snd_intel_dspcfg snd_intel_sdw_acpi videodev snd_hda_codec eeepc_wmi snd_usb_audio videobuf2_common wmi_bmof input_leds snd_hda_core snd_usbmidi_lib mc snd_hwdep joydev snd_seq_midi snd_pcm snd_seq_midi_event at24 iommu_v2 drm_buddy gpu_sched drm_ttm_helper snd_rawmidi ttm drm_display_helper snd_seq cec cmdlinepart spi_nor rc_core snd_seq_device drm_kms_helper mtd
Aug 29 09:12:32 coltrane kernel:  i2c_algo_bit snd_timer syscopyarea snd sysfillrect sysimgblt soundcore mei_me mei mac_hid sch_fq_codel msr parport_pc ppdev ramoops pstore_blk lp reed_solomon pstore_zone drm parport efi_pstore ip_tables x_tables autofs4 hid_generic usbhid hid mfd_aaeon asus_wmi ledtrig_audio spi_intel_platform sparse_keymap spi_intel platform_profile r8169 ahci crc32_pclmul realtek i2c_i801 i2c_smbus libahci lpc_ich xhci_pci xhci_pci_renesas video wmi
Aug 29 09:12:32 coltrane kernel: CPU: 3 PID: 903164 Comm: kworker/u8:24 Not tainted 6.2.0-26-generic #26~22.04.1-Ubuntu
Aug 29 09:12:32 coltrane kernel: Hardware name: ASUS All Series/B85M-K, BIOS 3602 03/26/2018
Aug 29 09:12:32 coltrane kernel: Workqueue: events_unbound async_run_entry_fn
Aug 29 09:12:32 coltrane kernel: RIP: 0010:dc_set_power_state+0x186/0x190 [amdgpu]
Aug 29 09:12:32 coltrane kernel: Code: bc 24 98 f4 01 00 49 8b 84 24 80 f3 01 00 49 8d 94 24 98 03 00 00 4c 89 e6 e8 06 0f 37 cb e9 47 ff ff ff 0f 0b e9 b4 fe ff ff <0f> 0b e9 39 ff ff ff 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90
Aug 29 09:12:32 coltrane kernel: RSP: 0018:ffffb9b98bde3c40 EFLAGS: 00010246
Aug 29 09:12:32 coltrane kernel: RAX: 0000000000000000 RBX: 0000000000000006 RCX: 0000000000000000
Aug 29 09:12:32 coltrane kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Aug 29 09:12:32 coltrane kernel: RBP: ffffb9b98bde3c60 R08: 0000000000000000 R09: 0000000000000000
Aug 29 09:12:32 coltrane kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff922043d60000
Aug 29 09:12:32 coltrane kernel: R13: 0000000000000000 R14: ffffffffc126a410 R15: ffff9220537e0000
Aug 29 09:12:32 coltrane kernel: FS:  0000000000000000(0000) GS:ffff92234ed80000(0000) knlGS:0000000000000000
Aug 29 09:12:32 coltrane kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 29 09:12:32 coltrane kernel: CR2: 00005602366bd746 CR3: 0000000180210006 CR4: 00000000001706e0
Aug 29 09:12:32 coltrane kernel: Call Trace:
Aug 29 09:12:32 coltrane kernel:  <TASK>
Aug 29 09:12:32 coltrane kernel:  dm_suspend+0x9f/0x290 [amdgpu]
Aug 29 09:12:32 coltrane kernel:  ? vi_common_set_clockgating_state+0xd7/0x190 [amdgpu]
Aug 29 09:12:32 coltrane kernel:  amdgpu_device_ip_suspend_phase1+0xb9/0x1d0 [amdgpu]
Aug 29 09:12:32 coltrane kernel:  amdgpu_device_suspend+0xca/0x1d0 [amdgpu]
Aug 29 09:12:32 coltrane kernel:  amdgpu_pmops_suspend+0x33/0x50 [amdgpu]
Aug 29 09:12:32 coltrane kernel:  pci_pm_suspend+0x8a/0x1c0
Aug 29 09:12:32 coltrane kernel:  ? __pfx_pci_pm_suspend+0x10/0x10
Aug 29 09:12:32 coltrane kernel:  dpm_run_callback+0x54/0x1a0
Aug 29 09:12:32 coltrane kernel:  __device_suspend+0x14b/0x400
Aug 29 09:12:32 coltrane kernel:  async_suspend+0x1f/0x80
Aug 29 09:12:32 coltrane kernel:  async_run_entry_fn+0x33/0x130
Aug 29 09:12:32 coltrane kernel:  process_one_work+0x21f/0x440
Aug 29 09:12:32 coltrane kernel:  worker_thread+0x50/0x3f0
Aug 29 09:12:32 coltrane kernel:  ? __pfx_worker_thread+0x10/0x10
Aug 29 09:12:32 coltrane kernel:  kthread+0xee/0x120
Aug 29 09:12:32 coltrane kernel:  ? __pfx_kthread+0x10/0x10
Aug 29 09:12:32 coltrane kernel:  ret_from_fork+0x2c/0x50
Aug 29 09:12:32 coltrane kernel:  </TASK>
Aug 29 09:12:32 coltrane kernel: ---[ end trace 0000000000000000 ]---
Aug 29 09:12:32 coltrane kernel: amdgpu 0000:01:00.0: amdgpu: PCI CONFIG reset
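
One way to read the failure above: `order:5` asks for 2^5 = 32 contiguous pages, which is 128 kB with 4 kB pages, and the per-zone free lists show that DMA32 and Normal hold no free block that large. The Python sketch below is not part of the original report. It assumes 4 kB pages, the helper names `parse_free_lists` and `can_satisfy` are made up for illustration, and the three input lines are copied from the log above.

```python
import re

PAGE_KB = 4  # assumption: 4 kB base pages (the x86-64 default)

# Per-zone free-list lines copied from the Mem-Info dump in the log.
FREE_LIST_LINES = [
    "Node 0 DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 1*32kB (U) 1*64kB (U) "
    "1*128kB (U) 1*256kB (U) 1*512kB (U) 0*1024kB 2*2048kB (UM) 2*4096kB (M) = 13308kB",
    "Node 0 DMA32: 49272*4kB (UME) 1*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB "
    "0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 197096kB",
    "Node 0 Normal: 53516*4kB (UMEH) 44*8kB (UMH) 19*16kB (H) 16*32kB (UH) "
    "5*64kB (H) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 215552kB",
]


def parse_free_lists(line):
    """Return (zone_name, {block_size_kb: free_block_count}) for one line."""
    zone = re.search(r"Node \d+ (\w+):", line).group(1)
    # "N*SkB" pairs only; the trailing "= TOTALkB" has no '*' and is skipped.
    blocks = {int(size_kb): int(count)
              for count, size_kb in re.findall(r"(\d+)\*(\d+)kB", line)}
    return zone, blocks


def can_satisfy(blocks, order):
    """True if at least one free block is 2**order pages or larger."""
    need_kb = PAGE_KB << order
    return any(count > 0 and size_kb >= need_kb
               for size_kb, count in blocks.items())


zones = dict(parse_free_lists(line) for line in FREE_LIST_LINES)
order5 = {zone: can_satisfy(blocks, 5) for zone, blocks in zones.items()}

for zone, ok in order5.items():
    largest = max((s for s, c in zones[zone].items() if c > 0), default=0)
    print(f"{zone}: largest free block {largest} kB, "
          f"order-5 (128 kB) block available: {ok}")
```

Only the small DMA zone still holds blocks of 128 kB or more, and the allocator normally keeps that zone in reserve (see the `lowmem_reserve[]` lines). That would be consistent with fragmentation, rather than a complete lack of free pages, being the immediate reason the allocation failed, but this is an interpretation of the dump and not something the log states directly.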

Pierre Equoy (pieq) wrote on 2023-08-29:

#3

gpu-logs.tgz (44.1 KiB, application/x-tar)

Attaching GPU-related logs after rebooting the device.

Outputs from:

modinfo amdgpu > modinfo.amdgpu.log
sudo lshw > lshw.log
sudo dmidecode > dmidecode.log
lspci -nn > lspci.nn.log
lspci -vnn > lspci.vnn.log
cp /var/log/Xorg.0.log .
lsmod | grep amdgpu > lsmod.amdgpu.log

Timo Aaltonen (tjaalton) on 2023-08-29

affects:	linux-signed-hwe-6.2 (Ubuntu) → linux (Ubuntu)

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote on 2023-08-29: Missing required logs.

#4

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 2033327

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status:	New → Incomplete

Juerg Haefliger (juergh) wrote on 2023-08-30: Re: Desktop cannot resume after suspend: screen not detected

#5

Aug 29 09:12:32 coltrane kernel: ------------[ cut here ]------------
Aug 29 09:12:32 coltrane kernel: WARNING: CPU: 3 PID: 903164 at drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc.c:4277 dc_set_power_state+0x186/0x190 [amdgpu]
Aug 29 09:12:32 coltrane kernel: Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache netfs snd_seq_dummy uas usb_storage vhost_vsock vmw_vsock_virtio_transport_common vhost vhost_iotlb vsock intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel sunrpc kvm irqbypass crct10dif_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel mei_hdcp mei_pxp sha512_ssse3 aesni_intel binfmt_misc crypto_simd amdgpu cryptd nls_iso8859_1 rapl snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi intel_cstate uvcvideo videobuf2_vmalloc snd_hda_intel videobuf2_memops videobuf2_v4l2 snd_intel_dspcfg snd_intel_sdw_acpi videodev snd_hda_codec eeepc_wmi snd_usb_audio videobuf2_common wmi_bmof input_leds snd_hda_core snd_usbmidi_lib mc snd_hwdep joydev snd_seq_midi snd_pcm snd_seq_midi_event at24 iommu_v2 drm_buddy gpu_sched drm_ttm_helper snd_rawmidi ttm drm_display_helper snd_seq cec cmdlinepart spi_nor rc_core snd_seq_device drm_kms_helper mtd
Aug 29 09:12:32 coltrane kernel:  i2c_algo_bit snd_timer syscopyarea snd sysfillrect sysimgblt soundcore mei_me mei mac_hid sch_fq_codel msr parport_pc ppdev ramoops pstore_blk lp reed_solomon pstore_zone drm parport efi_pstore ip_tables x_tables autofs4 hid_generic usbhid hid mfd_aaeon asus_wmi ledtrig_audio spi_intel_platform sparse_keymap spi_intel platform_profile r8169 ahci crc32_pclmul realtek i2c_i801 i2c_smbus libahci lpc_ich xhci_pci xhci_pci_renesas video wmi
Aug 29 09:12:32 coltrane kernel: CPU: 3 PID: 903164 Comm: kworker/u8:24 Not tainted 6.2.0-26-generic #26~22.04.1-Ubuntu
Aug 29 09:12:32 coltrane kernel: Hardware name: ASUS All Series/B85M-K, BIOS 3602 03/26/2018
Aug 29 09:12:32 coltrane kernel: Workqueue: events_unbound async_run_entry_fn
Aug 29 09:12:32 coltrane kernel: RIP: 0010:dc_set_power_state+0x186/0x190 [amdgpu]
Aug 29 09:12:32 coltrane kernel: Code: bc 24 98 f4 01 00 49 8b 84 24 80 f3 01 00 49 8d 94 24 98 03 00 00 4c 89 e6 e8 06 0f 37 cb e9 47 ff ff ff 0f 0b e9 b4 fe ff ff <0f> 0b e9 39 ff ff ff 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90
Aug 29 09:12:32 coltrane kernel: RSP: 0018:ffffb9b98bde3c40 EFLAGS: 00010246
Aug 29 09:12:32 coltrane kernel: RAX: 0000000000000000 RBX: 0000000000000006 RCX: 0000000000000000
Aug 29 09:12:32 coltrane kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Aug 29 09:12:32 coltrane kernel: RBP: ffffb9b98bde3c60 R08: 0000000000000000 R09: 0000000000000000
Aug 29 09:12:32 coltrane kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff922043d60000
Aug 29 09:12:32 coltrane kernel: R13: 0000000000000000 R14: ffffffffc126a410 R15: ffff9220537e0000
Aug 29 09:12:32 coltrane kernel: FS:  0000000000000000(0000) GS:ffff92234ed80000(0000) knlGS:0000000000000000
Aug 29 09:12:32 coltrane kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 29 09:12:32 coltrane kernel: CR2: 00005602366bd746 CR3: 0000000180210006 CR4: 00000000001706e0
Aug 29 09:12:32 coltrane kernel: Call Trace:
Aug 29 09:12:32 coltrane kernel:  <TASK>
Aug 29 09:12:32 coltrane kernel:  dm_suspend+0x9f/0x290 [amdgpu]
Aug 29 09:12:32 coltrane kernel:  ? vi_common_set_clockgating_state+0xd7/0x190 [amdgpu]
Aug 29 09:12:32 coltrane kernel:  amdgpu_device_ip_suspend_phase1+0xb9/0x1d0 [amdgpu]
Aug 29 09:12:32 coltrane kernel:  amdgpu_device_suspend+0xca/0x1d0 [amdgpu]
Aug 29 09:12:32 coltrane kernel:  amdgpu_pmops_suspend+0x33/0x50 [amdgpu]
Aug 29 09:12:32 coltrane kernel:  pci_pm_suspend+0x8a/0x1c0
Aug 29 09:12:32 coltrane kernel:  ? __pfx_pci_pm_suspend+0x10/0x10
Aug 29 09:12:32 coltrane kernel:  dpm_run_callback+0x54/0x1a0
Aug 29 09:12:32 coltrane kernel:  __device_suspend+0x14b/0x400
Aug 29 09:12:32 coltrane kernel:  async_suspend+0x1f/0x80
Aug 29 09:12:32 coltrane kernel:  async_run_entry_fn+0x33/0x130
Aug 29 09:12:32 coltrane kernel:  process_one_work+0x21f/0x440
Aug 29 09:12:32 coltrane kernel:  worker_thread+0x50/0x3f0
Aug 29 09:12:32 coltrane kernel:  ? __pfx_worker_thread+0x10/0x10
Aug 29 09:12:32 coltrane kernel:  kthread+0xee/0x120
Aug 29 09:12:32 coltrane kernel:  ? __pfx_kthread+0x10/0x10
Aug 29 09:12:32 coltrane kernel:  ret_from_fork+0x2c/0x50
Aug 29 09:12:32 coltrane kernel:  </TASK>
Aug 29 09:12:32 coltrane kernel: ---[ end trace 0000000000000000 ]---
Aug 29 09:12:32 coltrane kernel: amdgpu 0000:01:00.0: amdgpu: PCI CONFIG reset

Mario Limonciello (superm1) wrote on 2023-08-30:

#6

I don't see the full logs for your failure in https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2033327/+attachment/5695799/+files/journalctl-b0-20230829.log. I'm interested in the stuff outside of the trace itself.

But I /suspect/ from that trace alone that you were out of memory at suspend time. This is supposed to be prevented by '8d4de331f1b2 ("drm/amd: Fail the suspend if resources can't be evicted")', but if that's indeed the cause, I need to see the rest of the context to explain it.
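
To put rough numbers on the memory-pressure suspicion, the page counts in the `Mem-Info` dump from comment #2 can be converted to MiB. This is only a sketch in Python, assuming 4 kB pages; the figures are copied from that log and the variable names are illustrative.

```python
PAGE_KB = 4  # assumption: 4 kB base pages

# Values copied from the Mem-Info dump in comment #2 (unit: pages).
total_ram_pages = 4178389
free_pages = 106406
inactive_anon_pages = 2748750

# Swap figures from the same dump (unit: kB).
swap_total_kb = 2097148
swap_free_kb = 176108


def pages_to_mib(pages):
    """Convert a page count to MiB under the PAGE_KB assumption."""
    return pages * PAGE_KB / 1024


free_mib = pages_to_mib(free_pages)
anon_mib = pages_to_mib(inactive_anon_pages)
total_mib = pages_to_mib(total_ram_pages)
swap_used_pct = 100 * (swap_total_kb - swap_free_kb) / swap_total_kb

print(f"RAM total:       {total_mib:8.0f} MiB")
print(f"free:            {free_mib:8.0f} MiB")
print(f"inactive anon:   {anon_mib:8.0f} MiB")
print(f"swap in use:     {swap_used_pct:8.1f} %")
```

With these inputs, free memory comes to a few hundred MiB out of roughly 16 GiB, most RAM is anonymous memory, and swap is over 90 % used, which fits the out-of-memory reading without proving it.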

Pierre Equoy (pieq) wrote on 2023-08-31: AlsaInfo.txt

#7

AlsaInfo.txt (129.2 KiB, text/plain)

apport information

tags:	added: apport-collected staging
description:	updated

Pierre Equoy (pieq) wrote on 2023-08-31: CurrentDmesg.txt

#8

CurrentDmesg.txt (104.5 KiB, text/plain)

apport information

Pierre Equoy (pieq) wrote on 2023-08-31: Lspci.txt

#9

Lspci.txt (10.6 KiB, text/plain)

apport information

Pierre Equoy (pieq) wrote on 2023-08-31: Lspci-vt.txt

#10

Lspci-vt.txt (1.1 KiB, text/plain)

apport information

Pierre Equoy (pieq) wrote on 2023-08-31: Lsusb.txt

#11

Lsusb.txt (687 bytes, text/plain)

apport information

Pierre Equoy (pieq) wrote on 2023-08-31: Lsusb-t.txt

#12

Lsusb-t.txt (1.1 KiB, text/plain)

apport information

Pierre Equoy (pieq) wrote on 2023-08-31: Lsusb-v.txt

#13

Lsusb-v.txt (66.9 KiB, text/plain)

apport information

Pierre Equoy (pieq) wrote on 2023-08-31: PaInfo.txt

#14

PaInfo.txt (198.9 KiB, text/plain)

apport information

Pierre Equoy (pieq) wrote on 2023-08-31: ProcCpuinfo.txt

#15

ProcCpuinfo.txt (5.3 KiB, text/plain)

apport information

Pierre Equoy (pieq) wrote on 2023-08-31: ProcCpuinfoMinimal.txt

#16

ProcCpuinfoMinimal.txt (1.3 KiB, text/plain)

apport information

Pierre Equoy (pieq) wrote on 2023-08-31: ProcEnviron.txt

#17

ProcEnviron.txt (314 bytes, text/plain)

apport information

Pierre Equoy (pieq) wrote on 2023-08-31: ProcInterrupts.txt

#18

ProcInterrupts.txt (2.5 KiB, text/plain)

apport information

Pierre Equoy (pieq) wrote on 2023-08-31: ProcModules.txt

#19

ProcModules.txt (7.0 KiB, text/plain)

apport information

Pierre Equoy (pieq) wrote on 2023-08-31: PulseList.txt

#20

PulseList.txt (46.3 KiB, text/plain)

apport information

Pierre Equoy (pieq) wrote on 2023-08-31: UdevDb.txt

#21

UdevDb.txt (240.8 KiB, text/plain)

apport information

Pierre Equoy (pieq) wrote on 2023-08-31: WifiSyslog.txt

#22

WifiSyslog.txt (159.1 KiB, text/plain)

apport information

Pierre Equoy (pieq) wrote on 2023-08-31: acpidump.txt

#23

acpidump.txt (336.0 KiB, text/plain)

apport information

Pierre Equoy (pieq) wrote on 2023-08-31 (last edit on 2023-08-31): Re: Desktop cannot resume after suspend: screen not detected

#24

journalctl-b0-20230829.log (4.3 MiB, text/html)

@superm1

Thanks for your feedback! Here is the full journal from that boot.

Edit: ah crap, it's the same file, I thought I had originally sent an edited version with just the stack trace... not sure what else to provide.

Mario Limonciello (superm1) wrote on 2023-08-31:

#25

Can you just manually fetch the journal from the failed boot and attach that?

Pierre Equoy (pieq) wrote on 2023-08-31:

#26

That's what I attached. When I could not get back to a proper graphical environment, I switched to a TTY, ran `sudo journalctl -b0 > journalctl-b0-20230829.log`, and attached the result to this bug. As you can see in the logs, the device had been running since August 21, and on the evening of Aug 28 I suspended it:

Aug 28 19:34:36 coltrane systemd[1]: Reached target Sleep.
Aug 28 19:34:36 coltrane systemd[1]: Starting Record successful boot for GRUB...
Aug 28 19:34:36 coltrane systemd[1]: Starting Restart Syncthing after resume...
Aug 28 19:34:36 coltrane systemd[1]: Starting System Suspend...
Aug 28 19:34:36 coltrane systemd-sleep[903135]: Entering sleep state 'suspend'...
Aug 28 19:34:36 coltrane kernel: PM: suspend entry (deep)
Aug 28 19:34:36 coltrane systemd[1]: grub-common.service: Deactivated successfully.
Aug 28 19:34:36 coltrane systemd[1]: Finished Record successful boot for GRUB.
Aug 28 19:34:36 coltrane systemd[1]: Starting GRUB failed boot detection...

and then resumed it on Aug 29 at 09:12, and this is when the stack trace appears:

Aug 29 09:12:26 coltrane kernel: Filesystems sync: 0.018 seconds
Aug 29 09:12:31 coltrane kernel: Freezing user space processes
Aug 29 09:12:31 coltrane kernel: Freezing user space processes completed (elapsed 0.003 seconds)
Aug 29 09:12:31 coltrane kernel: OOM killer disabled.
Aug 29 09:12:32 coltrane kernel: Freezing remaining freezable tasks
Aug 29 09:12:32 coltrane kernel: Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
Aug 29 09:12:32 coltrane kernel: printk: Suspending console(s) (use no_console_suspend to debug)
Aug 29 09:12:32 coltrane kernel: sd 1:0:0:0: [sda] Synchronizing SCSI cache
Aug 29 09:12:32 coltrane kernel: sd 1:0:0:0: [sda] Stopping disk
Aug 29 09:12:32 coltrane kernel: sd 2:0:0:0: [sdb] Synchronizing SCSI cache
Aug 29 09:12:32 coltrane kernel: sd 3:0:0:0: [sdc] Synchronizing SCSI cache
Aug 29 09:12:32 coltrane kernel: sd 4:0:0:0: [sdd] Synchronizing SCSI cache
Aug 29 09:12:32 coltrane kernel: sd 2:0:0:0: [sdb] Stopping disk
Aug 29 09:12:32 coltrane kernel: sd 4:0:0:0: [sdd] Stopping disk
Aug 29 09:12:32 coltrane kernel: sd 3:0:0:0: [sdc] Stopping disk
(...)
Aug 29 09:12:32 coltrane kernel: kworker/u8:24: page allocation failure: order:5, mode:0x40d00(GFP_NOIO|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0
Aug 29 09:12:32 coltrane kernel: CPU: 3 PID: 903164 Comm: kworker/u8:24 Not tainted 6.2.0-26-generic #26~22.04.1-Ubuntu
Aug 29 09:12:32 coltrane kernel: Hardware name: ASUS All Series/B85M-K, BIOS 3602 03/26/2018
Aug 29 09:12:32 coltrane kernel: Workqueue: events_unbound async_run_entry_fn
Aug 29 09:12:32 coltrane kernel: Call Trace:
Aug 29 09:12:32 coltrane kernel:  <TASK>
Aug 29 09:12:32 coltrane kernel:  dump_stack_lvl+0x48/0x70
Aug 29 09:12:32 coltrane kernel:  dump_stack+0x10/0x20
Aug 29 09:12:32 coltrane kernel:  warn_alloc+0x14b/0x1c0
Aug 29 09:12:32 coltrane kernel:  ? __alloc_pages_direct_compact+0xa7/0x240
Aug 29 09:12:32 coltrane kernel:  __alloc_pages_slowpath.constprop.0+0x910/0x990
Aug 29 09:12:32 coltrane kernel:  __alloc_pages+0x32c/0x360
Aug 29 09:12:32 coltrane kernel:  __kmalloc_large_node+0x89/0x170
Aug 29 09:12:32 coltrane kernel:  kmalloc_large+0x21/0xc0
Aug 29 09:12:32 coltrane kernel:  dc_set_power_state+0x49/0x190 [amdgpu]
Aug 29 09:12:32 coltrane kernel:  dm_suspend+0x9f/0x290 [amdgpu]
Aug 29 09:12:32 coltrane kernel:  ? vi_common_set_clockgating_state+0xd7/0x190 [amdgpu]
Aug 29 09:12:32 coltrane kernel:  amdgpu_device_ip_suspend_phase1+0xb9/0x1d0 [amdgpu]
Aug 29 09:12:32 coltrane kernel:  amdgpu_device_suspend+0xca/0x1d0 [amdgpu]
Aug 29 09:12:32 coltrane kernel:  amdgpu_pmops_suspend+0x33/0x50 [amdgpu]
Aug 29 09:12:32 coltrane kernel:  pci_pm_suspend+0x8a/0x1c0
Aug 29 09:12:32 coltrane kernel:  ? __pfx_pci_pm_suspend+0x10/0x10
Aug 29 09:12:32 coltrane kernel:  dpm_run_callback+0x54/0x1a0
Aug 29 09:12:32 coltrane kernel:  __device_suspend+0x14b/0x400
Aug 29 09:12:32 coltrane kernel:  async_suspend+0x1f/0x80
Aug 29 09:12:32 coltrane kernel:  async_run_entry_fn+0x33/0x130
Aug 29 09:12:32 coltrane kernel:  process_one_work+0x21f/0x440
Aug 29 09:12:32 coltrane kernel:  worker_thread+0x50/0x3f0
Aug 29 09:12:32 coltrane kernel:  ? __pfx_worker_thread+0x10/0x10
Aug 29 09:12:32 coltrane kernel:  kthread+0xee/0x120
Aug 29 09:12:32 coltrane kernel:  ? __pfx_kthread+0x10/0x10
Aug 29 09:12:32 coltrane kernel:  ret_from_fork+0x2c/0x50
Aug 29 09:12:32 coltrane kernel:  </TASK>
Aug 29 09:12:32 coltrane kernel: Mem-Info:
Aug 29 09:12:32 coltrane kernel: active_anon:6 inactive_anon:2748750 isolated_anon:0
                                  active_file:309 inactive_file:232 isolated_file:0
                                  unevictable:41 dirty:1 writeback:8
                                  slab_reclaimable:60650 slab_unreclaimable:80021
                                  mapped:1 shmem:70425 pagetables:29671
                                  sec_pagetables:0 bounce:0
                                  kernel_misc_reclaimable:0
                                  free:106406 free_pcp:0 free_cma:0
Aug 29 09:12:32 coltrane kernel: Node 0 active_anon:24kB inactive_anon:10995000kB active_file:1236kB inactive_file:928kB unevictable:164kB isolated(anon):0kB isolated(file):0kB mapped:4kB dirty:4kB writeback:32kB shmem:281700kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 2048kB writeback_tmp:0kB kernel_stack:42288kB pagetables:118684kB sec_pagetables:0kB all_unreclaimable? no
Aug 29 09:12:32 coltrane kernel: Node 0 DMA free:13308kB boost:0kB min:64kB low:80kB high:96kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15988kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Aug 29 09:12:32 coltrane kernel: lowmem_reserve[]: 0 3380 15822 15822 15822
Aug 29 09:12:32 coltrane kernel: Node 0 DMA32 free:197068kB boost:32448kB min:46872kB low:50476kB high:54080kB reserved_highatomic:0KB active_anon:0kB inactive_anon:2122496kB active_file:0kB inactive_file:0kB unevictable:80kB writepending:0kB present:3606752kB managed:3541040kB mlocked:80kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Aug 29 09:12:32 coltrane kernel: lowmem_reserve[]: 0 0 12442 12442 12442
Aug 29 09:12:32 coltrane kernel: Node 0 Normal free:215248kB boost:119452kB min:172544kB low:185816kB high:199088kB reserved_highatomic:2048KB active_anon:24kB inactive_anon:8872504kB active_file:1040kB inactive_file:1136kB unevictable:84kB writepending:36kB present:13090816kB managed:12748956kB mlocked:84kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Aug 29 09:12:32 coltrane kernel: lowmem_reserve[]: 0 0 0 0 0
Aug 29 09:12:32 coltrane kernel: Node 0 DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 1*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 1*512kB (U) 0*1024kB 2*2048kB (UM) 2*4096kB (M) = 13308kB
Aug 29 09:12:32 coltrane kernel: Node 0 DMA32: 49272*4kB (UME) 1*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 197096kB
Aug 29 09:12:32 coltrane kernel: Node 0 Normal: 53516*4kB (UMEH) 44*8kB (UMH) 19*16kB (H) 16*32kB (UH) 5*64kB (H) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 215552kB
Aug 29 09:12:32 coltrane kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Aug 29 09:12:32 coltrane kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Aug 29 09:12:32 coltrane kernel: 71214 total pagecache pages
Aug 29 09:12:32 coltrane kernel: 245 pages in swap cache
Aug 29 09:12:32 coltrane kernel: Free swap  = 176108kB
Aug 29 09:12:32 coltrane kernel: Total swap = 2097148kB
Aug 29 09:12:32 coltrane kernel: 4178389 pages RAM
Aug 29 09:12:32 coltrane kernel: 0 pages HighMem/MovableOnly
Aug 29 09:12:32 coltrane kernel: 102050 pages reserved
Aug 29 09:12:32 coltrane kernel: 0 pages hwpoisoned
(...)
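
For anyone reading the Mem-Info dump above: an order:5 request needs 2^5 = 32 contiguous pages, i.e. 128kB with 4kB pages. A minimal sketch (the helper names `parse_free_list` and `can_satisfy` are made up for illustration, and it assumes 4kB pages) that reads the per-zone free-list lines and checks whether any block that large is free:

```python
import re

PAGE_KB = 4  # assumption: 4 kB pages, matching the smallest "*4kB" entries in the dump


def parse_free_list(line):
    """Return {block_size_kB: free_count} from a 'Node 0 <zone>: N*SkB ...' line."""
    return {int(size): int(count)
            for count, size in re.findall(r"(\d+)\*(\d+)kB", line)}


def can_satisfy(free_list, order):
    """True if at least one free block spans 2**order contiguous pages or more."""
    needed_kb = PAGE_KB * 2 ** order
    return any(count > 0 and size >= needed_kb
               for size, count in free_list.items())


# Lines copied from the log above.
NORMAL = ("Node 0 Normal: 53516*4kB (UMEH) 44*8kB (UMH) 19*16kB (H) 16*32kB (UH) "
          "5*64kB (H) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 215552kB")
DMA32 = ("Node 0 DMA32: 49272*4kB (UME) 1*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB "
         "0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 197096kB")

for name, line in (("Normal", NORMAL), ("DMA32", DMA32)):
    free = parse_free_list(line)
    total_kb = sum(size * count for size, count in free.items())
    print(name, "free kB:", total_kb, "order-5 block available:", can_satisfy(free, 5))
```

Both zones still have roughly 200MB free, but all of it sits in blocks of 64kB or smaller, so the failure looks like fragmentation rather than a complete lack of free memory. The small DMA zone does list larger blocks, but it is presumably kept back by the lowmem_reserve values shown in the dump.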

Mario Limonciello (superm1) wrote on 2023-08-31:

#27

I don't know if launchpad messed it up, but I'm not seeing anything past the 28th.

Pierre Equoy (pieq) wrote on 2023-08-31:

#28

journalctl-b0-20230829.log.xz (183.7 KiB, application/octet-stream)

That's weird... I downloaded the last attachment I sent and checked it with vim and I can see past the 28th...

Anyway, here is another attempt. Same log, xzipped. I checked and the last line of the log is on Aug 29 09:26:34.
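
A quick way to double-check what a saved journal actually contains, sketched under the assumption that the file uses the default short journalctl format seen in this bug (the function name `scan_journal` is made up for illustration):

```python
import re

# "Aug 29 09:12:32 host ..." style prefix used by the default journalctl output.
STAMP_RE = re.compile(r"^(\w{3} +\d+ [\d:]{8}) ")
FAILURE_RE = re.compile(
    r"^(?P<stamp>\w{3} +\d+ [\d:]{8}) \S+ kernel: (?P<comm>\S+): "
    r"page allocation failure: order:(?P<order>\d+), mode:(?P<mode>0x[0-9a-f]+)"
)


def scan_journal(lines):
    """Return (last timestamp seen, list of page allocation failures)."""
    last_stamp = None
    failures = []
    for line in lines:
        stamp = STAMP_RE.match(line)
        if stamp:
            last_stamp = stamp.group(1)
        hit = FAILURE_RE.match(line)
        if hit:
            failures.append({
                "stamp": hit.group("stamp"),
                "comm": hit.group("comm"),
                "order": int(hit.group("order")),
                "mode": hit.group("mode"),
            })
    return last_stamp, failures


if __name__ == "__main__":
    import sys
    # Usage: python3 scan_journal.py journalctl-b0-20230829.log
    with open(sys.argv[1], errors="replace") as fh:
        last, found = scan_journal(fh)
    print("last timestamp:", last)
    for f in found:
        print(f["stamp"], f["comm"], "order", f["order"], "mode", f["mode"])
```

If the last timestamp printed is from Aug 28, the copy being read is cut short.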

Pierre Equoy (pieq) on 2023-09-05

Changed in linux (Ubuntu):
status:	Incomplete → Confirmed
assignee:	nobody → Mario Limonciello (superm1)

Mario Limonciello (superm1) wrote on 2023-09-05:

#29

Got it; so what happened is there is a separate malloc call elsewhere in the amdgpu display code that failed due to the low memory situation. Is this a "one time thing" or is this a regular occurrence that you can reproduce in how you use your machine? I would have to look at context to see if it made sense to reserve that memory all the time, but if this is the only place that explodes maybe we can store a local variable in the amdgpu device structure for the thing that is malloc'ed.

James Phillips (78luphr0rnk2nuqimstywepozxn9kl19tqh0tx66b5dki1xxsh5mkz9gl21a5rlwfnr8jn6ln0m3jxne2k9x1ohg85w3jabxlrqbgszpjpwcmvkbcvq9spp6z3w5j1m33k06t-launchpad-a811i2i3ytqlsztthjth0svbccw8inm65tmkqp9sarr553jq53in4xm1m8wn3o4rlwaer06ogwvqwv9mrqoku2x334n7di44o65qze67n1wneepmidnuwnde1rqcbpgdf70gt) wrote on 2023-10-21 (last edit on 2023-10-21):

#30

Message #26:
"Aug 29 09:12:32 coltrane kernel: sd 2:0:0:0: [sdb] Stopping disk
Aug 29 09:12:32 coltrane kernel: sd 4:0:0:0: [sdd] Stopping disk
Aug 29 09:12:32 coltrane kernel: sd 3:0:0:0: [sdc] Stopping disk
(...)
Aug 29 09:12:32 coltrane kernel: kworker/u8:24: page allocation failure: order:5, mode:0x40d00(GFP_NOIO|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0"

Confirms that it is probably upstream bug number 2362
https://gitlab.freedesktop.org/drm/amd/-/issues/2362

The issue is that the GPU driver tries to dump the contents of VRAM to system RAM upon suspend. However, this cannot happen because the disks (with swap) are suspended first. It appears that the likely solution is to dump the VRAM contents during the "pre-suspend" hook. Incidentally, the bug appears to date back to at least 2010, but VRAM was a lot smaller compared to system RAM back then.

I got the same errors but was able to resume from suspend for some reason. I *suspect* it is because I am using the ROCm GPU drivers. Edit: Another non-standard thing I did was raise vm.min_free_kbytes to 393216 (384MiB - 64MiB per core) from the default of 64MiB. I also enabled zswap with lz4/z3fold (merging two different guides) and disabled deduplication (with it enabled, the system was unusable under extreme swap testing with 'mprime'; with dedup disabled, the system was kinda-sorta usable (still 80% waiting on disk) with over 75% of RAM in use).
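
To compare these settings between machines, here is a small sketch that reads them back. The helper names are hypothetical. The paths are the usual procfs/sysfs locations for vm.min_free_kbytes and the zswap module parameters; a missing file is reported as None rather than treated as an error, since zswap may not be built in or loaded.

```python
from pathlib import Path


def kib_to_mib(kib):
    """Convert a kB value as used by vm.min_free_kbytes into MiB."""
    return kib / 1024


def read_vm_settings(root="/"):
    """Read the tunables mentioned above; a missing file yields None."""
    root = Path(root)
    paths = {
        "min_free_kbytes": root / "proc/sys/vm/min_free_kbytes",
        "zswap_enabled": root / "sys/module/zswap/parameters/enabled",
        "zswap_compressor": root / "sys/module/zswap/parameters/compressor",
        "zswap_zpool": root / "sys/module/zswap/parameters/zpool",
    }
    settings = {}
    for name, path in paths.items():
        try:
            settings[name] = path.read_text().strip()
        except OSError:
            settings[name] = None
    return settings


if __name__ == "__main__":
    print("393216 kB =", kib_to_mib(393216), "MiB")  # the value quoted above
    for name, value in read_vm_settings().items():
        print(name, "=", value)
```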

James Phillips (78luphr0rnk2nuqimstywepozxn9kl19tqh0tx66b5dki1xxsh5mkz9gl21a5rlwfnr8jn6ln0m3jxne2k9x1ohg85w3jabxlrqbgszpjpwcmvkbcvq9spp6z3w5j1m33k06t-launchpad-a811i2i3ytqlsztthjth0svbccw8inm65tmkqp9sarr553jq53in4xm1m8wn3o4rlwaer06ogwvqwv9mrqoku2x334n7di44o65qze67n1wneepmidnuwnde1rqcbpgdf70gt) wrote on 2023-10-21 (last edit on 2023-10-21):

#31

Oct_20_crash.kern.log.gz (9.3 KiB, application/octet-stream)

Kern.log with similar suspend messages. I was looking at the logs because my game of Euro Truck Simulator 2 appeared to crash, but in hindsight it was probably just the video driver that crashed. Steam was still running right up until reboot. The game log was truncated by 4 minutes, despite logging into a terminal and typing 'sync' before "systemctl reboot -i".

When the video driver crashed I was unable to use the desktop environment, but the display manager seemed to work.

$ uname -a
Linux cathy 5.15.0-87-generic #97-Ubuntu SMP Mon Oct 2 21:09:21 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

I am using the AMD ROCm 5.6 video drivers. (The last ones supporting my Vega 56 card, apparently).

The actual video crash happens in this part of the log:
Oct 20 10:47:59 cathy kernel: [67779.641047] amdgpu 0000:06:00.0: amdgpu: IH ring buffer overflow (0x00080C60, 0x000002C0, 0x00000C80)
Oct 20 10:47:59 cathy kernel: [67779.753369] [drm] ring 0 timeout to preempt ib
Oct 20 10:48:09 cathy kernel: [67789.894431] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_high timeout, signaled seq=1282275, emitted seq=1282276
Oct 20 10:48:09 cathy kernel: [67789.895531] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process gnome-shell pid 1994 thread gnome-shel:cs0 pid 2002

Edit: to add: I had spent 100+ hours "testing" the game over the past two weeks. The suspend-related error messages did not occur until I "fixed" the game crashing problem with a work-around where you pass the kernel module parameter amdgpu.noretry=0 on the kernel command line. I suspect the suspend problem manifested because:
1. Played a 7+ hour marathon session to make sure the work-around worked (the GPU page faults still happen with the work-around: they are just of the form "retry page fault" instead of "no-retry page fault").
2. Let steam run overnight with suspend inhibited. 2 game updates were installed during that time.

This served to deplete the available memory when I actually got around to suspending the machine.

Mario Limonciello (superm1) wrote on 2023-10-21:

#32

You're totally right that there is a problem when memory pressure is high that the suspend sequence can't evict the RAM.

This is upstream https://gitlab.freedesktop.org/drm/amd/-/issues/2362

A bunch of changes are going into 6.7 that will move the parts that allocate memory into the prepare() pmops sequence. This will "fix" the immediate problem in that the suspend itself will "return" an error code.

There is a secondary problem though that the PM core disables swap "too soon", so if memory is low it can't be moved into swap.

This secondary problem can be fixed by the PM core either moving the timing of the swap disable or evicting some other usage (userspace?) into swap before letting the prepare()/suspend() sequence start.

Changed in linux (Ubuntu):
importance:	Undecided → Wishlist
summary:	- Desktop cannot resume after suspend: screen not detected
	+ dGPU suspend fails under memory pressure
Changed in linux (Ubuntu):
status:	Confirmed → Triaged
assignee:	Mario Limonciello (superm1) → nobody

Bug Watch Updater (bug-watch-updater) on 2023-10-22

Changed in linux:
status:	Unknown → New

Affects		Status	Importance	Assigned to	Milestone
	Linux	New	Unknown	auto-gitlab.freedesktop.org-drm-amd-- #2362
	linux (Ubuntu)	Triaged	Wishlist	Unassigned
