Comment 31 for bug 2033327

James Phillips (78luphr0rnk2nuqimstywepozxn9kl19tqh0tx66b5dki1xxsh5mkz9gl21a5rlwfnr8jn6ln0m3jxne2k9x1ohg85w3jabxlrqbgszpjpwcmvkbcvq9spp6z3w5j1m33k06t-launchpad-a811i2i3ytqlsztthjth0svbccw8inm65tmkqp9sarr553jq53in4xm1m8wn3o4rlwaer06ogwvqwv9mrqoku2x334n7di44o65qze67n1wneepmidnuwnde1rqcbpgdf70gt) wrote (last edit ): Re: Desktop cannot resume after suspend: screen not detected

Kern.log with similar suspend messages. I was looking at the logs because my game of Euro Truck Simulator 2 appeared to crash, but in hindsight it was probably just the video driver that crashed. Steam was still running right up until reboot. The game log was missing its last 4 minutes, despite my logging into a terminal and typing 'sync' before "systemctl reboot -i".

When the video driver crashed I was unable to use the desktop environment, but the display manager seemed to work.

$ uname -a
Linux cathy 5.15.0-87-generic #97-Ubuntu SMP Mon Oct 2 21:09:21 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

I am using the AMD ROCm 5.6 video drivers (the last ones supporting my Vega 56 card, apparently).

The actual video crash happens in this part of the log:
Oct 20 10:47:59 cathy kernel: [67779.641047] amdgpu 0000:06:00.0: amdgpu: IH ring buffer overflow (0x00080C60, 0x000002C0, 0x00000C80)
Oct 20 10:47:59 cathy kernel: [67779.753369] [drm] ring 0 timeout to preempt ib
Oct 20 10:48:09 cathy kernel: [67789.894431] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_high timeout, signaled seq=1282275, emitted seq=1282276
Oct 20 10:48:09 cathy kernel: [67789.895531] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process gnome-shell pid 1994 thread gnome-shel:cs0 pid 2002
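
For anyone comparing their own kern.log against the excerpt above, here is a minimal Python sketch that picks out the same kinds of lines. The function name find_gpu_faults and the pattern list are illustrative assumptions built only from the three message types quoted here; other amdgpu failures may be logged differently.

```python
import re

# Patterns derived from the log excerpt in this comment (an assumption:
# this is not an exhaustive list of amdgpu failure messages).
GPU_FAULT_PATTERNS = [
    re.compile(r"amdgpu: IH ring buffer overflow"),
    re.compile(r"\[drm\] ring \d+ timeout to preempt ib"),
    re.compile(r"amdgpu_job_timedout.*\*ERROR\*"),
]


def find_gpu_faults(lines):
    """Return the log lines that match any of the patterns above."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(pattern.search(line) for pattern in GPU_FAULT_PATTERNS)
    ]
```

It accepts any iterable of lines, so an open file object works, for example open("/var/log/kern.log", errors="replace") if that is where your distribution keeps the kernel log.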

Edit to add: I had spent 100+ hours "testing" the game over the past two weeks. The suspend-related error messages did not occur until I "fixed" the game crashing problem with a work-around where you pass the kernel module parameter amdgpu.noretry=0 on the kernel command line. I suspect the suspend problem manifested because:
1. I played a 7+ hour marathon session to make sure the work-around worked (the GPU page faults still happen with the work-around; they are just of the form "retry page fault" instead of "no-retry page fault").
2. I let Steam run overnight with suspend inhibited. 2 game updates were installed during that time.

This served to deplete the available memory by the time I actually got around to suspending the machine.
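
To confirm whether the amdgpu.noretry=0 work-around mentioned above is actually present on a running system, a rough Python sketch like the following can check the kernel command line. The helper name module_param_from_cmdline is hypothetical, and treating the last occurrence as the one that counts is an assumption rather than something verified against the kernel's parsing.

```python
def module_param_from_cmdline(cmdline, name):
    """Return the value of a name=value token on a kernel command line.

    Returns None if the parameter is absent.
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if sep and key == name:
            # Assumption: if the parameter is repeated, keep the last one.
            value = val
    return value


def read_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with."""
    with open(path) as f:
        return f.read()
```

Calling module_param_from_cmdline(read_cmdline(), "amdgpu.noretry") should return "0" when the work-around is in place and None when the parameter was not passed.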