AMD GPU hang/crash/black screen after suspend(ing)

Bug #1842954 reported by Richard Baka
This bug affects 4 people
Affects: Linux. Status: Unknown. Importance: Unknown.
Affects: xserver-xorg-video-amdgpu (Ubuntu). Status: Fix Released. Importance: Undecided. Assigned to: Unassigned.

Bug Description

This is a freeze that is obviously caused by a bug in the xserver-xorg-video-amdgpu package, version 19.0.1*, which ships with Ubuntu 19.04 and 19.10. An upstream fix is needed.

Workaround:
Run sudo nano /etc/apt/sources.list, duplicate one line that contains "main multiverse universe restricted", and in the copy change the distribution name "eoan" or "disco" to "bionic" (in this line only).
You will then have a line like
deb [url] bionic multiverse main restricted universe
Press Ctrl+O to save it, then Ctrl+X to exit.
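
For anyone who prefers not to edit the line by hand, the substitution can be sketched as a small shell function. This is only an illustration: the helper name to_bionic_line and the mirror URL in the example are assumptions, not part of the original workaround.

```shell
# Hypothetical helper: rewrite the suite field of a one-line "deb" entry
# from eoan or disco to bionic. Entries with bracketed options
# (e.g. "deb [arch=amd64] ...") or other suites such as eoan-updates
# are printed unchanged.
to_bionic_line() {
    printf '%s\n' "$1" |
        sed -E 's/^(deb[[:space:]]+[^[:space:]]+[[:space:]]+)(eoan|disco)([[:space:]])/\1bionic\3/'
}

# Example with an assumed mirror URL; substitute the URL from your own file.
to_bionic_line "deb http://archive.ubuntu.com/ubuntu eoan main restricted universe multiverse"
```

The printed line would still have to be appended to /etc/apt/sources.list as root; the function itself changes nothing on the system.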

sudo apt update
sudo apt install xserver-xorg-core=2:1.19.6-1ubuntu4
sudo apt install xserver-xorg-video-amdgpu=18.0.1-1

sudo reboot

If everything is OK then you should keep these packages by using:
sudo apt-mark hold xserver-xorg-core
sudo apt-mark hold xserver-xorg-video-amdgpu

You can later unhold them by using the same commands with "unhold"

If a newer bionic package version comes out (e.g. a security update), unhold the packages, run apt update, and use apt policy [package name without = and version] to check which new bionic versions are available. You can then install them using my original install commands with the proper version parameter.
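
As a rough sketch of that version check, the candidate version can be pulled out of the policy output with awk. The helper name candidate_version is made up for illustration, and the match assumes English (LC_ALL=C) output, where the line starts with "Candidate:".

```shell
# Hypothetical helper: read "apt-cache policy <package>" output on stdin
# and print the version listed on the first "Candidate:" line.
candidate_version() {
    awk '$1 == "Candidate:" { print $2; exit }'
}

# Usage on a real system (needs apt, so it is left as a comment here):
#   LC_ALL=C apt-cache policy xserver-xorg-video-amdgpu | candidate_version
```

The version table printed by apt policy also shows which release each version comes from, so it is still worth reading the full output before picking a version to install.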

If something is not OK, press E in the GRUB menu, add the nomodeset parameter at the end of the kernel line, then press F10. After the kernel loads and you log in on the command line, just run apt upgrade, provided you had not held the packages before.
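
To confirm that the parameter actually reached the kernel after such a boot, a check along these lines may help. The function name has_nomodeset is an assumption; /proc/cmdline is the standard place where Linux exposes the boot command line.

```shell
# Hypothetical helper: succeed if the given kernel command line contains
# "nomodeset" as a separate word.
has_nomodeset() {
    case " $1 " in
        *" nomodeset "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage on a running system:
#   has_nomodeset "$(cat /proc/cmdline)" && echo "booted with nomodeset"
```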

Revision history for this message
In , Malkovjohnny (malkovjohnny) wrote :

Updating the kernel from 5.1.20-300.fc30.x86_64 to newer versions causes a black screen. After I enter the password on the welcome screen, the screen turns completely black.

Problem exists on new kernels:
- 5.2.8-200.fc30.x86_64
- 5.2.9-200.fc30.x86_64

Environment:
CPU - AMD Ryzen 3 2200G with Radeon Vega Graphics
video - VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Raven Ridge [Radeon Vega Series / Radeon Vega Mobile Series] (rev c8)
XFCE: xfce4-panel 4.13.7 (Xfce 4.14pre2)
Xorg: Build ID: xorg-x11-server 1.20.5-4.fc30

Aug 21 21:06:57 kernel: ------------[ cut here ]------------
Aug 21 21:06:57 kernel: WARNING: CPU: 0 PID: 182 at drivers/gpu/drm/amd/amdgpu/../display/dc/dcn10/dcn10_hw_sequencer.c:854 dcn10_verify_allow_pstate_change_high.cold+0xc/0x229 [amdgpu]
Aug 21 21:06:57 kernel: Modules linked in: ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables sunrpc snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi ledtrig_audio snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq edac_mce_amd snd_seq_device joydev snd_pcm snd_timer ccp kvm snd soundcore irqbypass crct10dif_pclmul pcc_cpufreq crc32_pclmul ghash_clmulni_intel acpi_cpufreq wmi_bmof k10temp sp5100_tco i2c_piix4 gpio_amdpt gpio_generic amdgpu amd_iommu_v2 gpu_sched i2c_algo_bit ttm drm_kms_helper crc32c_intel drm r8169 wmi pinctrl_amd video
Aug 21 21:06:57 kernel: CPU: 0 PID: 182 Comm: kworker/u32:7 Not tainted 5.2.9-200.fc30.x86_64 #1
Aug 21 21:06:57 kernel: Hardware name: Gigabyte Technology Co., Ltd. A320M-S2H V2/A320M-S2H V2-CF, BIOS F2 12/25/2018
Aug 21 21:06:57 kernel: Workqueue: events_unbound commit_work [drm_kms_helper]
Aug 21 21:06:57 kernel: RIP: 0010:dcn10_verify_allow_pstate_change_high.cold+0xc/0x229 [amdgpu]
Aug 21 21:06:57 kernel: Code: 83 c8 ff e9 31 b0 f9 ff 48 c7 c7 f8 80 68 c0 e8 f4 d0 b5 c1 0f 0b 83 c8 ff e9 1b b0 f9 ff 48 c7 c7 f8 80 68 c0 e8 de d0 b5 c1 <0f> 0b 80 bb 93 01 00 00 00 75 05 e9 75 d4 f9 ff 48 8b 83 80 02 00
Aug 21 21:06:57 kernel: RSP: 0018:ffffb5b741037b58 EFLAGS: 00010246
Aug 21 21:06:57 kernel: RAX: 0000000000000024 RBX: ffff9cfa0d83d000 RCX: 0000000000000006
Aug 21 21:06:57 kernel: RDX: 0000000000000000 RSI: 0000000000000092 RDI: ffff9cfa18a17900
Aug 21 21:06:57 kernel: RBP: ffff9cfa0d83d000 R08: 0000000000000001 R09: 00000000000003fd
Aug 21 21:06:57 kernel: R10: ffffffff83bef4e4 R11: 0000000000000003 R12: ffff9cf9f68b81b8
Aug 21 21:06:57 kernel: R13: 0000000000000000 R14: ffff9cf9f68b81b8 R15: 0000000000000004
Aug 21 21:06:57 kernel: FS: 0000000000000000(0000) GS:ffff9cfa18a00000(0000) knlGS:0000000000000000
Aug 21 21:06:57 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 21 21:06:57 kernel: CR2: 000055afe80cca08 CR3: 00000001ea35a000 CR4: 00000000003406f0
Aug 21 21:06:57 kernel: Call Trace:
Aug 21 21:06:57 kernel: dcn10_pipe_control_lock.part.0+0x69/0x70 [amdgpu]
Aug 21 21:06:57 kernel: dc_commit_updates_f...

In , Malkovjohnny (malkovjohnny) wrote :

Created attachment 145120
Xorg log

In , Tom Seewald (tseewald) wrote :

Could you try applying the following patch set from AMD's Nicholas Kazlauskas:
https://patchwork.freedesktop.org/series/64505/

There have been similar reports filed on the kernel bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=204181

In , Malkovjohnny (malkovjohnny) wrote :

Unfortunately I don't know how to apply this patch/patches.

Updated to new kernel 5.2.11-200.fc30.x86_64, problem still exists.

Sep 04 19:45:23 kernel: WARNING: CPU: 2 PID: 1014 at drivers/gpu/drm/amd/amdgpu/../display/dc/dcn10/dcn10_hw_sequencer.c:854 dcn10_verify_allow_pstate_change_high.cold+0xc/0x229 [amdgpu]
Sep 04 19:45:23 kernel: Modules linked in: ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables sunrpc snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hda_core edac_mce_amd snd_hwdep snd_seq ccp snd_seq_device snd_pcm snd_timer kvm snd irqbypass joydev soundcore sp5100_tco i2c_piix4 crct10dif_pclmul wmi_bmof crc32_pclmul k10temp ghash_clmulni_intel pcc_cpufreq gpio_amdpt gpio_generic acpi_cpufreq amdgpu amd_iommu_v2 gpu_sched i2c_algo_bit ttm drm_kms_helper crc32c_intel drm r8169 wmi video pinctrl_amd
Sep 04 19:45:23 kernel: CPU: 2 PID: 1014 Comm: InputThread Not tainted 5.2.11-200.fc30.x86_64 #1
Sep 04 19:45:23 kernel: Hardware name: Gigabyte Technology Co., Ltd. A320M-S2H V2/A320M-S2H V2-CF, BIOS F2 12/25/2018
Sep 04 19:45:23 kernel: RIP: 0010:dcn10_verify_allow_pstate_change_high.cold+0xc/0x229 [amdgpu]
Sep 04 19:45:23 kernel: Code: 83 c8 ff e9 85 af f9 ff 48 c7 c7 f8 20 78 c0 e8 e8 27 a6 f0 0f 0b 83 c8 ff e9 6f af f9 ff 48 c7 c7 f8 20 78 c0 e8 d2 27 a6 f0 <0f> 0b 80 bb 93 01 00 00 00 75 05 e9 c9 d3 f9 ff 48 8b 83 80 02 00
Sep 04 19:45:23 kernel: RSP: 0018:ffff9ba201da3a00 EFLAGS: 00010246
Sep 04 19:45:23 kernel: RAX: 0000000000000024 RBX: ffff8a12ce72c000 RCX: 0000000000000006
Sep 04 19:45:23 kernel: RDX: 0000000000000000 RSI: 0000000000000082 RDI: ffff8a12d8a97900
Sep 04 19:45:23 kernel: RBP: ffff8a12ce72c000 R08: 0000000000000001 R09: 00000000000003f9
Sep 04 19:45:23 kernel: R10: ffffffffb2bf03e0 R11: 0000000000000003 R12: ffff8a12caef81b8
Sep 04 19:45:23 kernel: R13: ffff8a12caef9bc8 R14: ffff8a12caef81b8 R15: ffff8a12b8420200
Sep 04 19:45:23 kernel: FS: 00007fc4cec4b700(0000) GS:ffff8a12d8a80000(0000) knlGS:0000000000000000
Sep 04 19:45:23 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 04 19:45:23 kernel: CR2: 00007f19f6f298a0 CR3: 00000001fb7e4000 CR4: 00000000003406e0
Sep 04 19:45:23 kernel: Call Trace:
Sep 04 19:45:23 kernel: dcn10_pipe_control_lock.part.0+0x69/0x70 [amdgpu]
Sep 04 19:45:23 kernel: dc_stream_set_cursor_attributes+0x121/0x170 [amdgpu]
Sep 04 19:45:23 kernel: handle_cursor_update.isra.0+0x1af/0x310 [amdgpu]
Sep 04 19:45:23 kernel: drm_atomic_helper_async_commit+0x63/0xd0 [drm_kms_helper]
Sep 04 19:45:23 kernel: drm_atomic_helper_commit+0xdb/0x110 [drm_kms_helper]
Sep 04 19:45:23 kernel: drm_atomic_helper_update_plane+0xec/0x100 [drm_kms_helper]
Sep 04 19:45:23 kernel: drm_mode_cursor_universal+0x12c/0x240 [drm]
Sep 04 19:45:23 kernel: drm_mode_cursor_common+0xc9/0x220 [drm]
Sep 04 19:45:23 kernel: ? drm_mode_setplane+0x...

In , Malkovjohnny (malkovjohnny) wrote :
description: updated
Changed in linux:
status: Unknown → Confirmed

In , Tajgaividra (tajgaividra) wrote :

Hi,

Have you tried reverting the xorg amdgpu package to an older version? Of course that is just a workaround.

description: updated

In , Malkovjohnny (malkovjohnny) wrote :

(In reply to tajgaividra from comment #5)
> Hi,
>
> Have you tried reverting the xorg amdgpu package to an older version? Of
> course that is just a workaround.

Only the kernel driver is used.

dnf list available | grep xorg-x11-drv-amdgpu
xorg-x11-drv-amdgpu.x86_64 19.0.1-1.fc30

Richard Baka (bakarichard91) wrote :

This has been fixed in: xserver-xorg-video-amdgpu 19.0.1-1ubuntu1

Changed in xserver-xorg-video-amdgpu (Ubuntu):
status: New → Fix Released

Russell Smith (qqrs) wrote :

I'm also experiencing this problem after upgrading to Ubuntu 19.10. I'm using xserver-xorg-video-amdgpu 19.0.1-1ubuntu1 so either it's not completely fixed, or I'm experiencing a different but very similar issue.

One of the linked bug reports suggested the crash occurs in the XFCE compositor. I tried disabling the compositor:

xfconf-query -c xfwm4 -p /general/use_compositing -s false

This seems to be a usable workaround for now, though it does introduce some visual glitches.
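
A small wrapper can make it easier to switch the compositor back on once a fixed driver is installed. This is only a sketch: the function name set_xfwm_compositing is invented here, while the xfconf-query channel and property are the ones from the command above.

```shell
# Hypothetical wrapper around the xfconf-query call shown above.
# Accepts "true" or "false"; anything else is rejected with status 2.
set_xfwm_compositing() {
    case "$1" in
        true|false) ;;
        *) echo "usage: set_xfwm_compositing true|false" >&2; return 2 ;;
    esac
    if ! command -v xfconf-query >/dev/null 2>&1; then
        echo "xfconf-query not found; is Xfce installed?" >&2
        return 1
    fi
    xfconf-query -c xfwm4 -p /general/use_compositing -s "$1"
}

# Show the current value:
#   xfconf-query -c xfwm4 -p /general/use_compositing
# Re-enable compositing later:
#   set_xfwm_compositing true
```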

--

More detail:

After booting to login screen, if I select an "XFCE" or "Xubuntu" session and log in, the system crashes to a black screen and is unresponsive to Ctrl-Alt-F2 or Ctrl-Alt-Del or anything else I try. (Alt+SysRq+b does reboot though.)

Selecting an "Ubuntu" session is fine.

After rebooting, I see 232 lines in syslog like this:

WARNING: CPU: 2 PID: 274 at drivers/gpu/drm/amd/amdgpu/../display/dc/dcn10/dcn10_hw_sequencer.c:932 dcn10_verify_allow_pstate_change_high.cold+0xc/0x23d [amdgpu]
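
A count like the one above can be reproduced with grep. The sketch below assumes the default Ubuntu log location /var/log/syslog; the function name count_pstate_warnings is made up for illustration.

```shell
# Hypothetical helper: count lines in a log file that contain the
# dcn10_verify_allow_pstate_change_high WARNING. Note that grep -c
# prints 0 and exits with status 1 when nothing matches.
count_pstate_warnings() {
    grep -c 'WARNING: .*dcn10_verify_allow_pstate_change_high' "$1"
}

# Usage (reading the log may need sudo or membership in the adm group):
#   count_pstate_warnings /var/log/syslog
```

Matching on "WARNING:" keeps the RIP lines, which mention the same function, out of the count.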

--

Hardware and version info:

$ cat /proc/cpuinfo
model name : AMD Ryzen 5 2400G with Radeon Vega Graphics

$ lsb_release -a
Description: Ubuntu 19.10
Release: 19.10
Codename: eoan

$ uname -r
5.3.0-23-generic

$ apt show xserver-xorg
Version: 1:7.7+19ubuntu12

$ apt show xserver-xorg-core
Version: 2:1.20.5+git20191008-0ubuntu1

$ apt show xserver-xorg-video-amdgpu
Version: 19.0.1-1ubuntu1

$ xfce4-session --version
xfce4-session 4.14.0 (Xfce 4.14)

In , Martin-peres-n (martin-peres-n) wrote :

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/891.

Changed in linux:
status: Confirmed → Unknown

Christian Sarrasin (sxc731) wrote :

The issue definitely also occurs with the stock Ubuntu GNOME DE. I just did a fresh install on a ThinkPad T495s with "AMD Ryzen 7 PRO 3700U w/ Radeon Vega Mobile Gfx" and the up-to-date kernel 5.3.0-26-generic, and the issue occurred twice within 12 hours of installing: once upon resuming, and a second time just after forcing a reboot because of the first occurrence.

I fully understand that complaining here isn't likely to have much effect, but it's a real shame that just when AMD finally appears to be taking the edge over Intel (we *desperately* need the competition IMHO; see the vulnerabilities debacle), the premier Linux desktop distribution falls flat on its face. I also understand that the issue is likely not Debian/Ubuntu specific.

That said, I really hope that Canonical sees this bug and sponsors a fix (perhaps in partnership with AMD) at least before 20.04 LTS hits the shelves.

Stack trace:

Jan 21 08:56:57 t495s kernel: [ 7403.538132] ------------[ cut here ]------------
Jan 21 08:56:57 t495s kernel: [ 7403.538257] WARNING: CPU: 5 PID: 1566 at drivers/gpu/drm/amd/amdgpu/../display/dc/dcn10/dcn10_hw_sequencer.c:932 dcn10_verify_allow_pstate_change_high.cold+0xc/0x23d [amdgpu]
Jan 21 08:56:57 t495s kernel: [ 7403.538258] Modules linked in: rfcomm ccm cmac bnep nls_iso8859_1 iwlmvm snd_hda_codec_realtek mac80211 libarc4 snd_hda_codec_generic uvcvideo btusb snd_hda_codec_hdmi btrtl videobuf2_vmalloc btbcm videobuf2_memops snd_seq_midi btintel videobuf2_v4l2 edac_mce_amd videobuf2_common snd_seq_midi_event snd_hda_intel kvm_amd snd_hda_codec videodev bluetooth ccp kvm irqbypass snd_rawmidi snd_hda_core joydev mc serio_raw input_leds iwlwifi snd_hwdep wmi_bmof thinkpad_acpi snd_pcm nvram ecdh_generic k10temp ecc ledtrig_audio snd_seq snd_pci_acp3x snd_seq_device rtsx_pci_ms snd_timer ipmi_devintf memstick cfg80211 ipmi_msghandler snd ucsi_acpi typec_ucsi typec soundcore mac_hid sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 hid_logitech_hidpp hid_logitech_dj hid_generic usbhid hid dm_crypt amdgpu crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel amd_iommu_v2 rtsx_pci_sdmmc gpu_sched i2c_algo_bit ttm drm_kms_helper aes_x86_64 crypto_simd syscopyarea sysfillrect
Jan 21 08:56:57 t495s kernel: [ 7403.538286] sysimgblt cryptd glue_helper psmouse fb_sys_fops drm i2c_piix4 nvme rtsx_pci r8169 nvme_core realtek wmi video i2c_scmi
Jan 21 08:56:57 t495s kernel: [ 7403.538295] CPU: 5 PID: 1566 Comm: Xorg Tainted: G W 5.3.0-26-generic #28-Ubuntu
Jan 21 08:56:57 t495s kernel: [ 7403.538296] Hardware name: LENOVO 20QJCTO1WW/20QJCTO1WW, BIOS R13ET40W(1.14 ) 10/29/2019
Jan 21 08:56:57 t495s kernel: [ 7403.538381] RIP: 0010:dcn10_verify_allow_pstate_change_high.cold+0xc/0x23d [amdgpu]
Jan 21 08:56:57 t495s kernel: [ 7403.538385] Code: 83 c8 ff e9 59 f7 f7 ff 48 c7 c7 08 f1 a2 c0 e8 9d 97 79 f8 0f 0b 83 c8 ff e9 43 f7 f7 ff 48 c7 c7 08 f1 a2 c0 e8 87 97 79 f8 <0f> 0b 80 bb 9f 01 00 00 00 75 05 e9 6a 1e f8 ff 48 8b 83 f8 02 00
Jan 21 08:56:57 t495s kernel: [ 7403.538386] RSP: 0018:ffff9d04820f7710 EFLAGS: 00010246
Jan 21 08:56:57 t495s kernel: [ 7403.538387] RAX: 0000000000000024 RBX: ffff88e30cc80000 RCX: 0000000000000006
Jan 21 08...

jose (o1485726) wrote :

This bug also affects xserver-xorg-video-amdgpu 18.0.1-1 on Ubuntu 18.04 + HWE packages.
Fix is https://github.com/freedesktop/xorg-xf86-video-amdgpu/commit/a2b32e72fdaff3007a79b84929997d8176c2d512 as shown here: https://bbs.archlinux.org/viewtopic.php?id=247761&p=3
