Freeze on system-resume caused by kwin and amdgpu driver

Bug #1849084 reported by Richard Baka
This bug affects 9 people
Affects                             Importance  Assigned to
kwin (Ubuntu)                       Undecided   Unassigned
linux (Ubuntu)                      Undecided   Unassigned
plasma-desktop (Ubuntu)             Undecided   Unassigned
xserver-xorg-video-amdgpu (Ubuntu)  Undecided   Unassigned

Bug Description

I have investigated this bug thoroughly.

Problem: the system freezes with a black screen on resume from suspend
Ubuntu version: Kubuntu 19.10 with the default Plasma, or the backported 5.17
Kernel version: default 5.3.10 or the mainline 5.4.0rc3
xserver-xorg-video-amdgpu: 19.* or the newest from git; 18.0.1-1 works well

The freeze can be reproduced only if the OpenGL 2.0 or 3.1 compositor is enabled in the Plasma settings. XRender is OK, but I don't like the screen tearing.

On the Ubuntu 19.10 GNOME desktop (X based) this doesn't occur even with GNOME's compositor enabled. (I don't know which one is the default there, but I haven't disabled it.)

I think this is a KWin / amdgpu driver related bug because of this difference. It could be fixed in KWin's OpenGL compositor as well, not just in the amdgpu driver or in the kernel.

One possible solution: disable the OpenGL compositor on suspend and re-enable it after login. For example, use XRender before login, i.e. until the system has been restored from suspend.

For Ubuntu's maintainers: couldn't this problem be solved downstream? Maybe a proper script run on suspend and on resume would be enough. There are a lot of bugs like this reported in the freedesktop bug tracker, and the developers don't have enough time to fix them quickly.
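
The script idea above could be sketched roughly as follows. This is only a sketch under assumptions, not a tested fix: it assumes KWin exposes `suspend` and `resume` methods on the `/Compositor` object of the `org.kde.KWin` D-Bus service, that `qdbus` is installed, and that the commands run inside the logged-in user's session. The helper names `compositor_action` and `toggle_compositor` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: map a sleep phase to a KWin compositor action.
# "pre" = about to suspend, "post" = just resumed (the phase names that
# systemd passes to its system-sleep hooks).
compositor_action() {
    case "$1" in
        pre)  echo suspend ;;
        post) echo resume ;;
        *)    return 1 ;;
    esac
}

# Hypothetical helper: apply the action through KWin's D-Bus interface.
# Assumption: qdbus is available and the user's session bus is reachable.
toggle_compositor() {
    action=$(compositor_action "$1") || return 1
    qdbus org.kde.KWin /Compositor "$action"
}
```

Calling `toggle_compositor pre` before suspending and `toggle_compositor post` after logging in again would correspond to the manual procedure. Whether this avoids the freeze still has to be verified on affected hardware.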

Syslog:

Oct 21 10:57:26 pc kernel: [ 9475.308852] Code: 85 78 ff ff ff e9 9f f8 ff ff 8b b0 98 04 00 00 48 c7 c7 ef 5f a5 c0 e8 49 2d 9d ff 44 0f b6 45 a3 49 8b 4d 08 e9 bf fa ff ff <0f> 0b e9 ca fb ff ff 0f 0b e8 7d 36 84 c1 66 66 2e 0f 1f 84 00 00
Oct 21 10:57:26 pc kernel: [ 9475.308853] RSP: 0018:ffffb7b54274b7b0 EFLAGS: 00010002
Oct 21 10:57:26 pc kernel: [ 9475.308855] RAX: 0000000000000202 RBX: 0000000000000202 RCX: 000000000000046a
Oct 21 10:57:26 pc kernel: [ 9475.308856] RDX: 0000000000000001 RSI: 0000000000000202 RDI: 0000000000000002
Oct 21 10:57:26 pc kernel: [ 9475.308857] RBP: ffffb7b54274b870 R08: 0000000000000000 R09: ffff94da76b2d170
Oct 21 10:57:26 pc kernel: [ 9475.308858] R10: ffffb7b54274b708 R11: ffffb7b54274b70c R12: ffff94da76b2d000
Oct 21 10:57:26 pc kernel: [ 9475.308859] R13: ffff94d970f2c300 R14: ffff94da758d25d0 R15: ffff94da270d1400
Oct 21 10:57:26 pc kernel: [ 9475.308861] FS: 00007f2a1b9bea80(0000) GS:ffff94da87c40000(0000) knlGS:0000000000000000
Oct 21 10:57:26 pc kernel: [ 9475.308863] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 21 10:57:26 pc kernel: [ 9475.308864] CR2: 00007fb083e9b4a5 CR3: 000000012c110000 CR4: 00000000003406e0
Oct 21 10:57:26 pc kernel: [ 9475.308865] Call Trace:
Oct 21 10:57:26 pc kernel: [ 9475.308977] amdgpu_dm_atomic_commit_tail+0x96f/0x1030 [amdgpu]
Oct 21 10:57:26 pc kernel: [ 9475.308991] commit_tail+0x50/0xc0 [drm_kms_helper]
Oct 21 10:57:26 pc kernel: [ 9475.309000] ? commit_tail+0x50/0xc0 [drm_kms_helper]
Oct 21 10:57:26 pc kernel: [ 9475.309009] drm_atomic_helper_commit+0x118/0x120 [drm_kms_helper]
Oct 21 10:57:26 pc kernel: [ 9475.309115] amdgpu_dm_atomic_commit+0x95/0xa0 [amdgpu]
Oct 21 10:57:26 pc kernel: [ 9475.309135] drm_atomic_commit+0x4a/0x50 [drm]
Oct 21 10:57:26 pc kernel: [ 9475.309144] drm_atomic_helper_set_config+0x89/0xa0 [drm_kms_helper]
Oct 21 10:57:26 pc kernel: [ 9475.309159] drm_mode_setcrtc+0x1cd/0x7a0 [drm]
Oct 21 10:57:26 pc kernel: [ 9475.309234] ? amdgpu_cs_wait_ioctl+0xd6/0x150 [amdgpu]
Oct 21 10:57:26 pc kernel: [ 9475.309249] ? drm_mode_getcrtc+0x190/0x190 [drm]
Oct 21 10:57:26 pc kernel: [ 9475.309262] drm_ioctl_kernel+0xae/0xf0 [drm]
Oct 21 10:57:26 pc kernel: [ 9475.309276] drm_ioctl+0x234/0x3d0 [drm]
Oct 21 10:57:26 pc kernel: [ 9475.309290] ? drm_mode_getcrtc+0x190/0x190 [drm]
Oct 21 10:57:26 pc kernel: [ 9475.309370] amdgpu_drm_ioctl+0x4e/0x80 [amdgpu]
Oct 21 10:57:26 pc kernel: [ 9475.309376] do_vfs_ioctl+0x407/0x670
Oct 21 10:57:26 pc kernel: [ 9475.309379] ? do_futex+0x10f/0x1e0
Oct 21 10:57:26 pc kernel: [ 9475.309382] ksys_ioctl+0x67/0x90
Oct 21 10:57:26 pc kernel: [ 9475.309384] __x64_sys_ioctl+0x1a/0x20
Oct 21 10:57:26 pc kernel: [ 9475.309388] do_syscall_64+0x57/0x190
Oct 21 10:57:26 pc kernel: [ 9475.309392] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Oct 21 10:57:26 pc kernel: [ 9475.309394] RIP: 0033:0x7f2a1bd0c67b
Oct 21 10:57:26 pc kernel: [ 9475.309397] Code: 0f 1e fa 48 8b 05 15 28 0d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e5 27 0d 00 f7 d8 64 89 01 48
Oct 21 10:57:26 pc kernel: [ 9475.309398] RSP: 002b:00007ffe0e27c518 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Oct 21 10:57:26 pc kernel: [ 9475.309400] RAX: ffffffffffffffda RBX: 00007ffe0e27c550 RCX: 00007f2a1bd0c67b
Oct 21 10:57:26 pc kernel: [ 9475.309401] RDX: 00007ffe0e27c550 RSI: 00000000c06864a2 RDI: 000000000000000d
Oct 21 10:57:26 pc kernel: [ 9475.309402] RBP: 00000000c06864a2 R08: 0000000000000000 R09: 0000562bcc215600
Oct 21 10:57:26 pc kernel: [ 9475.309403] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
Oct 21 10:57:26 pc kernel: [ 9475.309404] R13: 000000000000000d R14: 0000562bcb275db0 R15: 0000000000000000
Oct 21 10:57:26 pc kernel: [ 9475.309407] ---[ end trace cae28d1e69119104 ]---
Oct 21 10:57:26 pc kernel: [ 9475.309432] ------------[ cut here ]------------
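
To see at a glance which frames of a trace like this belong to the amdgpu module, a small filter may help. It is a sketch that assumes the log lines keep the `symbol+0xOFF/0xLEN [module]` layout shown above; `amdgpu_frames` is a hypothetical helper name. Frames that the kernel marks with `?` (unreliable entries) are printed as well.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log lines on stdin and print the symbol
# name of every stack frame that is attributed to the amdgpu module.
amdgpu_frames() {
    grep -o '[A-Za-z0-9_.]*+0x[0-9a-f]*/0x[0-9a-f]* \[amdgpu\]' | cut -d+ -f1
}
```

Usage, for example: `journalctl -k -b -1 | amdgpu_frames`.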

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in kwin (Ubuntu):
status: New → Confirmed
Changed in plasma-desktop (Ubuntu):
status: New → Confirmed
Changed in xserver-xorg-video-amdgpu (Ubuntu):
status: New → Confirmed
Mike Lykov (combr) wrote :

Yes, this bug also affects me.
I upgraded from Kubuntu 19.04 with the 5.0 kernel to 19.10 with the 5.3 kernel and got the described behaviour:

1. On 5.0 there are no problems (if I boot it in 19.10).
2. On 5.3 with default settings there is a black screen when trying to resume.
3. If I choose XRender in the Plasma settings with 5.3, there are no problems.

Current kernel: 5.3.0-19-generic #20-Ubuntu SMP Fri Oct 18 09:04:39
I have no traces of an oops in the log; journalctl -b -1 -k shows only
kernel: PM: suspend entry (deep)
as the last line before the resume/hang at the black screen.
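
A quick way to check a saved kernel log for this pattern (a suspend entry that is never followed by an exit) might look like the sketch below. It assumes the kernel prints a "PM: suspend exit" line after a completed resume; `suspend_without_exit` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read a kernel log on stdin and succeed (exit 0) when
# the last "PM: suspend entry" has no later "PM: suspend exit", i.e. the
# machine apparently never finished resuming during that boot.
suspend_without_exit() {
    awk '/PM: suspend entry/ { pending = 1 }
         /PM: suspend exit/  { pending = 0 }
         END { exit !pending }'
}
```

Usage, for example: `journalctl -b -1 -k | suspend_without_exit && echo "previous boot hung in suspend/resume"`.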

Current driver: xserver-xorg-video-amdgpu 19.0.1-1ubuntu1 amd64
Xorg log:
[ 10.353] (II) AMDGPU(0): glamor X acceleration enabled on AMD RAVEN (DRM 3.33.0, 5.3.0-19-generic, LLVM 9.0.0)
[ 10.353] (II) AMDGPU(0): glamor detected, initialising EGL layer.
[ 10.353] (==) AMDGPU(0): TearFree property default: auto
[ 10.353] (==) AMDGPU(0): VariableRefresh: disabled
[ 10.353] (II) AMDGPU(0): KMS Pageflipping: enabled

Notebook with an attached monitor:
[ 10.396] (II) AMDGPU(0): EDID for output eDP
[ 10.396] (II) AMDGPU(0): Manufacturer: BOE Model: 6a5 Serial#: 0
[ 10.396] (II) AMDGPU(0): Year: 2015 Week: 1
[ 10.396] (II) AMDGPU(0): EDID Version: 1.4
[ 10.396] (II) AMDGPU(0): Digital Display Input
[ 10.396] (II) AMDGPU(0): 6 bits per channel
[ 10.396] (II) AMDGPU(0): Digital interface is DisplayPort
[ 10.397] (II) AMDGPU(0): EDID for output HDMI-A-0
[ 10.397] (II) AMDGPU(0): Manufacturer: BNQ Model: 801b Serial#: 21573
[ 10.397] (II) AMDGPU(0): Year: 2017 Week: 15
[ 10.397] (II) AMDGPU(0): EDID Version: 1.3
[ 10.397] (II) AMDGPU(0): Digital Display Input
[ 10.398] (II) AMDGPU(0): Output eDP connected
[ 10.398] (II) AMDGPU(0): Output HDMI-A-0 connected
[ 10.398] (II) AMDGPU(0): Using spanning desktop for initial modes
[ 10.398] (II) AMDGPU(0): Output eDP using initial mode 1366x768 +0+0
[ 10.398] (II) AMDGPU(0): Output HDMI-A-0 using initial mode 2560x1440 +1366+0
[ 10.399] (II) AMDGPU(0): [DRI2] Setup complete
[ 10.399] (II) AMDGPU(0): [DRI2] DRI driver: radeonsi
[ 10.399] (II) AMDGPU(0): [DRI2] VDPAU driver: radeonsi
[ 10.533] (II) AMDGPU(0): Front buffer pitch: 15872 bytes
[ 10.534] (II) AMDGPU(0): SYNC extension fences enabled
[ 10.534] (II) AMDGPU(0): Present extension enabled
[ 10.534] (==) AMDGPU(0): DRI3 enabled
[ 10.534] (==) AMDGPU(0): Backing store enabled
[ 10.534] (II) AMDGPU(0): Direct rendering enabled
[ 10.541] (II) AMDGPU(0): Use GLAMOR acceleration.
[ 10.541] (II) AMDGPU(0): Acceleration enabled
[ 10.541] (==) AMDGPU(0): DPMS enabled
[ 10.541] (==) AMDGPU(0): Silken mouse enabled
[ 10.541] (II) AMDGPU(0): Set up textured video (glamor)

Mike Lykov (combr) wrote :

I see a miracle :)
Following https://01.org/blogs/rzhang/2015/best-practice-debug-linux-suspend/hibernate-issues

I tried two options:
1. initcall_debug
2. echo 0 > /sys/power/pm_async

Before, I could reproduce the bug every time I pressed the power button for suspend/resume (after resume I got a black screen 100% of the time when the conditions described above were met).

Then I disabled pm_async, and after suspend/resume the bug did not occur. BUT after restoring pm_async to 1 via a reboot, I cannot reproduce the bug anymore. It no longer happens in the same situation as before.
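
For repeating this experiment, reading and switching the setting can be wrapped as in the sketch below. The helper names `get_pm_async` and `set_pm_async` are hypothetical, and the optional file argument exists only so the helpers can be tried on an ordinary file. Writing the real sysfs file needs root, and the value does not survive a reboot.

```shell
#!/bin/sh
# Hypothetical helper: print the current setting (1 = asynchronous device
# suspend/resume, 0 = synchronous).
get_pm_async() {
    cat "${1:-/sys/power/pm_async}"
}

# Hypothetical helper: write 0 or 1 and refuse anything else.
set_pm_async() {
    case "$1" in
        0|1) printf '%s\n' "$1" > "${2:-/sys/power/pm_async}" ;;
        *)   echo "usage: set_pm_async 0|1 [file]" >&2; return 2 ;;
    esac
}
```

Run as root, `set_pm_async 0` followed by a suspend/resume cycle and then `set_pm_async 1` would reproduce the steps described above without a reboot.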

Mike Lykov (combr) wrote : apport information

ProblemType: Bug
ApportVersion: 2.20.11-0ubuntu8.1
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC1: combr 3311 F.... pulseaudio
 /dev/snd/controlC0: combr 3311 F.... pulseaudio
CurrentDesktop: KDE
DistroRelease: Ubuntu 19.10
InstallationDate: Installed on 2018-11-22 (346 days ago)
InstallationMedia: Kubuntu 18.10 "Cosmic Cuttlefish" - Release amd64 (20181017.2)
MachineType: HP HP Laptop 15-db0xxx
Package: linux-image-5.3.0-19-generic 5.3.0-19.20
PackageArchitecture: amd64
ProcFB: 0 amdgpudrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.3.0-19-generic root=UUID=ff84e034-43ff-4c6f-914b-42dcdd53cf31 ro quiet splash vt.handoff=7
ProcVersionSignature: Ubuntu 5.3.0-19.20-generic 5.3.1
RelatedPackageVersions:
 linux-restricted-modules-5.3.0-19-generic N/A
 linux-backports-modules-5.3.0-19-generic N/A
 linux-firmware 1.183.1
Tags: eoan
Uname: Linux 5.3.0-19-generic x86_64
UpgradeStatus: Upgraded to eoan on 2019-11-02 (1 days ago)
UserGroups: adm cdrom dip disk docker libvirt lpadmin plugdev sambashare sudo wireshark
_MarkForUpload: True
dmi.bios.date: 04/06/2018
dmi.bios.vendor: Insyde
dmi.bios.version: F.02
dmi.board.asset.tag: Type2 - Board Asset Tag
dmi.board.name: 84AE
dmi.board.vendor: HP
dmi.board.version: 86.19
dmi.chassis.asset.tag: Chassis Asset Tag
dmi.chassis.type: 10
dmi.chassis.vendor: HP
dmi.chassis.version: Chassis Version
dmi.modalias: dmi:bvnInsyde:bvrF.02:bd04/06/2018:svnHP:pnHPLaptop15-db0xxx:pvrType1ProductConfigId:rvnHP:rn84AE:rvr86.19:cvnHP:ct10:cvrChassisVersion:
dmi.product.family: 103C_5335KV HP Notebook
dmi.product.name: HP Laptop 15-db0xxx
dmi.product.sku: 4MK59EA#ACB
dmi.product.version: Type1ProductConfigId
dmi.sys.vendor: HP

tags: added: apport-collected eoan
Mike Lykov (combr) wrote : apport information (attachments)

AlsaInfo.txt, CRDA.txt, CurrentDmesg.txt, Dependencies.txt, IwConfig.txt, Lspci.txt, Lsusb.txt, ProcCpuinfo.txt, ProcCpuinfoMinimal.txt, ProcEnviron.txt, ProcInterrupts.txt, ProcModules.txt, PulseList.txt, RfKill.txt, UdevDb.txt, WifiSyslog.txt

Mike Lykov (combr) wrote :

Fixed by a BIOS upgrade...

-kernel: DMI: HP HP Laptop 15-db0xxx/84AE, BIOS F.02 04/06/2018
+kernel: DMI: HP HP Laptop 15-db0xxx/84AE, BIOS F.20 05/15/2019

Also interesting:
-kernel: amdgpu 0000:04:00.0: VRAM: 256M 0x000000F400000000 - 0x000000F40FFFFFFF (256M used)
+kernel: amdgpu 0000:04:00.0: VRAM: 1024M 0x000000F400000000 - 0x000000F43FFFFFFF (1024M used)

-kernel: AMD-Vi: [Firmware Bug]: : IOAPIC[4] not in IVRS table
-kernel: AMD-Vi: [Firmware Bug]: : IOAPIC[5] not in IVRS table
-kernel: AMD-Vi: [Firmware Bug]: : No southbridge IOAPIC found
-kernel: AMD-Vi: Disabling interrupt remapping

and other changes.
With the previous BIOS, kernel 5.4 did not boot (black screen at boot).
With the current BIOS it boots now:
Linux 5.4.0-050400rc5-generic #201910271430

khitschler (klaus-hitschler) wrote :

Back to freeze on resume.

I'm running Ubuntu 18.04.4 LTS on a Lenovo Desktop V530-15ARR with an AMD Ryzen 5 2400G with Vega 11 graphics and kernel 5.3.0-28-generic.

With kernel 5.0.0-29, suspend and resume were perfect (from the user's point of view), but I sometimes had graphics freezes during normal usage such as browsing. So I upgraded the kernel to 5.3.0-28. This kernel shows suspend and resume problems.
I think I've found a workaround for the failing suspend, described at https://bugs.launchpad.net/ubuntu/+source/linux-signed-hwe/+bug/1851054 . But now resume sometimes ends in a black screen.

It seems that only the first resume after a cold start fails. When resume fails, I'm able to log in via SSH and initiate a reboot. Subsequent suspends and resumes succeed. Since I've repeated the experiment only a few times, I'm not 100% sure about this.

Working resume: please see attachment suspend-resume-ok.txt

Failing resume: please see attachment suspend-resume-blank-screen.txt

khitschler (klaus-hitschler) wrote :
khitschler (klaus-hitschler) wrote :

Please note the lines
* Feb 16 15:47:43 Lin kernel: [drm] pstate TEST_DEBUG_DATA: 0xB7F60000
and
* Feb 16 15:47:43 Lin kernel: [drm] pstate TEST_DEBUG_DATA: 0x37F60000

(I've cut the repeating parts)

The lines are emitted in the kernel source code at ../drivers/gpu/drm/amd/display/dc/dcn10/ by the function hubbub1_verify_allow_pstate_change_high()
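
To collect these values from a longer log and compare them, something like the following sketch could be used. The helper names `pstate_values` and `xor_hex` are hypothetical; the only assumption is the message format shown in the two lines above.

```shell
#!/bin/sh
# Hypothetical helper: read log lines on stdin and print only the hex value
# of each "pstate TEST_DEBUG_DATA" message.
pstate_values() {
    sed -n 's/.*pstate TEST_DEBUG_DATA: \(0x[0-9A-Fa-f]*\).*/\1/p'
}

# Hypothetical helper: print the bitwise XOR of two hex values, which shows
# the bits in which they differ.
xor_hex() {
    printf '0x%08X\n' $(( $1 ^ $2 ))
}
```

For example, `journalctl -k | pstate_values | sort | uniq -c` counts how often each value occurs, and `xor_hex 0xB7F60000 0x37F60000` prints `0x80000000`, so the two values quoted above differ only in bit 31. What that bit means is not stated in this report.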

khitschler (klaus-hitschler) wrote :

Sorry, I made a mistake: I'm running Kubuntu (not Ubuntu as mentioned in #24).

Indeed, I can confirm that resume works with the compositor switched off. And I can confirm that resume fails only the first time after a power-up reset of the computer. When the screen goes blank after the first resume, I log in via SSH and initiate a reboot. After this, all suspend/resume cycles work perfectly.

It is strange that a compositor bug would affect the amdgpu driver module. This snippet of the journalctl log shows a line emitted by the amdgpu module where a test runs into a timeout:

Feb 27 10:43:02 Lin kernel: PM: suspend exit
Feb 27 10:43:02 Lin systemd-sleep[2125]: System resumed.
Feb 27 10:43:02 Lin kernel: [drm] pstate TEST_DEBUG_DATA: 0xB7F60000
Feb 27 10:43:02 Lin kernel: ------------[ cut here ]------------
Feb 27 10:43:02 Lin kernel: WARNING: CPU: 2 PID: 2191 at /build/linux-hwe-LNyirI/linux-hwe-5.3.0/drivers/gpu/drm/amd/amdgpu/../display/dc/dcn10/dcn10_hw_sequencer.c:932 dcn10_verify_allow_pstate_change_high+0x37/0x2d0 [amdgpu]

I've attached the log parts of a failing resume (compositor is switched on):
* 10:40 Start
* 10:42 Suspend 1st time
* 10:43 Resume 1st time
* 10:45 reboot
* 10:46 Suspend 2nd time
* 10:47 Resume second time

I think I'll try the workaround mentioned in https://forum.manjaro.org/t/lenovo-laptop-wont-resume-after-sleep/105233/27

khitschler (klaus-hitschler) wrote :

Yes, it was only a workaround. The workaround ran well for a few days until this update:

$ cat /var/log/apt/history.log

Start-Date: 2020-03-02 21:13:02
Commandline: packagekit role='update-packages'
Requested-By: klaus (1000)
Upgrade: libegl-mesa0:amd64 (19.2.8-0ubuntu0~18.04.2, 19.2.8-0ubuntu0~18.04.3), devolo-dlan-cockpit:amd64 (5.1.1-0, 5.1.2-0), libglapi-mesa:amd64 (19.2.8-0ubuntu0~18.04.2, 19.2.8-0ubuntu0~18.04.3), libarchive13:amd64 (3.2.2-3.1ubuntu0.5, 3.2.2-3.1ubuntu0.6), libxatracker2:amd64 (19.2.8-0ubuntu0~18.04.2, 19.2.8-0ubuntu0~18.04.3), libegl1-mesa:amd64 (19.2.8-0ubuntu0~18.04.2, 19.2.8-0ubuntu0~18.04.3), libgbm1:amd64 (19.2.8-0ubuntu0~18.04.2, 19.2.8-0ubuntu0~18.04.3), libwayland-egl1-mesa:amd64 (19.2.8-0ubuntu0~18.04.2, 19.2.8-0ubuntu0~18.04.3), libgl1-mesa-dri:amd64 (19.2.8-0ubuntu0~18.04.2, 19.2.8-0ubuntu0~18.04.3), libgl1-mesa-glx:amd64 (19.2.8-0ubuntu0~18.04.2, 19.2.8-0ubuntu0~18.04.3), mesa-vdpau-drivers:amd64 (19.2.8-0ubuntu0~18.04.2, 19.2.8-0ubuntu0~18.04.3), mesa-va-drivers:amd64 (19.2.8-0ubuntu0~18.04.2, 19.2.8-0ubuntu0~18.04.3), libglx-mesa0:amd64 (19.2.8-0ubuntu0~18.04.2, 19.2.8-0ubuntu0~18.04.3)
End-Date: 2020-03-02 21:13:07

Now the behavior is the same as before. Start of a new game :-)

khitschler (klaus-hitschler) wrote :

The above-mentioned workaround (#27) is working again after these updates:

$ cat /var/log/apt/history.log

Start-Date: 2020-03-16 19:54:48
Commandline: packagekit role='update-packages'
Requested-By: klaus (1000)
Install: linux-image-5.3.0-42-generic:amd64 (5.3.0-42.34~18.04.1, automatic), linux-headers-5.3.0-42-generic:amd64 (5.3.0-42.34~18.04.1, automatic), linux-modules-extra-5.3.0-42-generic:amd64 (5.3.0-42.34~18.04.1, automatic), linux-headers-5.3.0-42:amd64 (5.3.0-42.34~18.04.1, automatic), linux-modules-5.3.0-42-generic:amd64 (5.3.0-42.34~18.04.1, automatic)
Upgrade: linux-libc-dev:amd64 (4.15.0-88.88, 4.15.0-91.92), linux-source-4.15.0:amd64 (4.15.0-88.88, 4.15.0-91.92), linux-headers-generic-hwe-18.04:amd64 (5.3.0.40.97, 5.3.0.42.99), linux-source-5.3.0:amd64 (5.3.0-40.32~18.04.1, 5.3.0-42.34~18.04.1), linux-source:amd64 (4.15.0.88.80, 4.15.0.91.83), linux-image-generic-hwe-18.04:amd64 (5.3.0.40.97, 5.3.0.42.99), linux-signed-generic-hwe-18.04:amd64 (5.3.0.40.97, 5.3.0.42.99), linux-generic-hwe-18.04:amd64 (5.3.0.40.97, 5.3.0.42.99)
End-Date: 2020-03-16 19:55:27

Start-Date: 2020-03-17 18:41:03
Commandline: packagekit role='update-packages'
Requested-By: klaus (1000)
Upgrade: libicu60:amd64 (60.2-3ubuntu3, 60.2-3ubuntu3.1), libicu60:i386 (60.2-3ubuntu3, 60.2-3ubuntu3.1)
End-Date: 2020-03-17 18:41:05

P. B. (pembpb) wrote :

I don't know if this will help find the solution to this problem, but for what it's worth, here's a solution that worked in my instance & I would very much like to know why it worked.
System; Lenovo V155-15API (V155 Series)
Processor; AMD Ryzen 5 3500U
Graphics adapter; AMD Radeon RX Vega 8
Memory; 12288 MB (Windows partition)

Dual booting Windows 10 & Ubuntu 19.10 (Ubuntu being main OS on separate partition)

Problem that developed in Ubuntu, hibernate/sleep/resume (including lid close) not working correctly consistently, requiring a power off reboot.

After failing to find an answer to the problem in Linux, I eventually booted into Windows to find the problem even worse there, no hibernate/sleep/resume at all, a power off reboot the only option.

Solution (So far); I downloaded the AMD Driver Auto-detect tool & let it install the drivers it detected as needed or needing updating.

After this I rebooted into Windows & found hibernate/sleep/resume to be working perfectly.

Thinking this to be a Windows only fix, I rebooted into Ubuntu with the intention of finding drivers suitable for fixing the problem in Linux, but before doing so, found to my surprise that hibernate/sleep/resume were now working perfectly in Ubuntu as well.

Are these drivers somehow being shared via the EFI partition?

khitschler (klaus-hitschler) wrote :

This is a very interesting observation! In one of my previous posts I described that after the first unsuccessful resume and a subsequent soft reset with Alt+SysRq+B, all later resumes succeed.
My guess was that something changed in the GPU's hardware registers before or after the first resume. It cannot be a software influence, since the software was started afresh. After a cold reset the game starts again. My assumption was that the change to the GPU's hardware registers was volatile.

Maybe the "AMD Driver Auto-detect tool" stores this one-time change in some kind of non-volatile configuration storage within the GPU. Then it would behave as you described. I currently don't know of any drivers shared between Linux and Windows, but a built-in non-volatile configuration store supporting the GPU is very likely.

Unfortunately I do not have a dual boot machine. I'm running only Linux. So I'll try to create a USB-stick based Windows 10 to verify your experience. It will take some time.

Again, very interesting ...

khitschler (klaus-hitschler) wrote :

P.B., I tried to verify your observation, without luck. First I created a bootable Windows 10 USB drive and managed to install and run the "AMD Driver Auto-detect tool". On Windows, putting the computer into standby and resuming worked. Then I did a warm start of the computer and booted into Linux. Now I was able to put the machine into standby and resume successfully.

The next experiment was to do a power-down reset and try standby/resume with Linux again. It stopped with a blank black screen with the backlight switched on. After a warm start (Alt-SysRq-B) all subsequent standby/resume cycles worked perfectly.

My guess is that after a warm start some GPU register settings remain untouched, which lets the resume succeed. In other words, the initial setup of the GPU is missing something that is done by the first (unsuccessful) standby/resume after power-up.

My Ubuntu kernel version is now 5.3.0-46-generic.

jman6495 (jman6495) wrote :

I also believe I'm experiencing this or something similar. The issue is much wider than Plasma, though; I believe this is a general issue with the amdgpu driver.

If it could be reclassified as such, that would probably help get it to the right people.

I'm experiencing the same issues on Ubuntu 18.04 LTS, Ubuntu 20.04 Beta and Manjaro.

jman6495 (jman6495) wrote :

Also, here is my dmesg from when I attempt the suspend.

jman6495 (jman6495) wrote :

This additionally crashes gnome-shell (see further dmesg)

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1849084

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
jman6495 (jman6495)
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
khitschler (klaus-hitschler) wrote :

I can confirm that resume works with the compositor switched off (KDE, Shift-Alt-F12). But I think the compositor acts only as a trigger and is not the cause of the bug.

I've attached the relevant journalctl output of a recent unsuccessful resume. I started the sleep on 13 April at about 20:28:06 and tried to resume at about 20:28:20. Some minutes later I logged in via SSH and initiated a reboot.

khitschler (klaus-hitschler) wrote : apport information

ProblemType: Bug
ApportVersion: 2.20.9-0ubuntu7.14
Architecture: amd64
CurrentDesktop: KDE
DistroRelease: Ubuntu 18.04
InstallationDate: Installed on 2019-06-17 (301 days ago)
InstallationMedia: Kubuntu 18.04.2 LTS "Bionic Beaver" - Release amd64 (20190210)
Package: xserver-xorg-video-amdgpu
PackageArchitecture: all
ProcVersionSignature: Ubuntu 5.3.0-46.38~18.04.1-generic 5.3.18
Tags: bionic third-party-packages
Uname: Linux 5.3.0-46-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo
_MarkForUpload: True

tags: added: bionic third-party-packages
khitschler (klaus-hitschler) wrote : apport information (attachments)

Dependencies.txt, ProcCpuinfoMinimal.txt, ProcEnviron.txt

jose (o1485726) wrote :

This bug also affects xserver-xorg-video-amdgpu 18.0.1-1 on Ubuntu 18.04 + HWE packages. It's not related to plasma/kwin (I think).

Fix is https://github.com/freedesktop/xorg-xf86-video-amdgpu/commit/a2b32e72fdaff3007a79b84929997d8176c2d512 as shown here: https://bbs.archlinux.org/viewtopic.php?id=247761&p=3

BTW, you can find similar bug reports by googling the following: site:bugs.launchpad.net "dcn10_hw_sequencer.c:* dcn10_verify_allow_pstate_change_high.cold"

jose (o1485726) wrote :

To be clear: I'm talking about the "dcn10_verify_allow_pstate_change_high" bug, not about the original report...

Bug #1842954 seems to be the same bug.

Richard Baka (bakarichard91) wrote :

Hi, is this issue still present? 20.04 seems OK for me. No freezes.

khitschler (klaus-hitschler) wrote :

@Richard: Did you mean Ubuntu 20.04? Which kernel are you running (uname -r)?

khitschler (klaus-hitschler) wrote :

@jose: In the next few days I'll try to verify whether this fix removes the bug on my computer.

khitschler (klaus-hitschler) wrote :

@jose: I wasn't able to put together a setup to verify the bug fix you reported. Sorry.

Richard Baka (bakarichard91) wrote :

Hi khitschler,

I have the default Ubuntu 20.04 kernel and the freeze still occurs. My assumption that it was fixed was wrong. However, I think I have just found a workaround. Could you or somebody else test it?

From here (by tbg - Admired ): https://forum.manjaro.org/t/kde-nvidia-screen-corruption-after-resume/60064/44

Apr '19
Here’s my latest version of a service to restart KDE Plasma after suspending. This version seems to work very well, and unlike my last version it does not depend on crashing Plasma, so it doesn’t throw an error when executed.

Plasma Restart Service

With a root text editor create:

/etc/systemd/system/plasma-restart@.service
Service file contents:

#/etc/systemd/system/plasma-restart@.service
#sudo systemctl enable plasma-restart@$USER.service
#sudo systemctl start plasma-restart@$USER.service
#sudo systemctl stop plasma-restart@$USER.service
#sudo systemctl disable plasma-restart@$USER.service

[Unit]
Description=Plasma Restart Service
After=suspend.target
StopWhenUnneeded=yes

[Service]
User=%i
WorkingDirectory=/home/%i
Type=oneshot
Slice=user-%i.slice
RemainAfterExit=yes
ExecStart=/bin/bash -alc "sudo -Hiu %i pkill -ABRT plasmashell"

[Install]
WantedBy=suspend.target
Alias=plasma-restart@%i.service

Save the service file with root permissions, and exit the text editor.

Then, enable the service:

sudo systemctl enable plasma-restart@$USER.service
Then restart.

-----------------------------

Why is this not the default setting if it even fixes the NVIDIA problems?

khitschler (klaus-hitschler) wrote :

@Richard:

I tried your suggestion without luck. The screen remained black with the backlight switched on. To verify, I tried multiple times.

The only true workaround (with KDE) is to switch the compositor off before going to sleep and to switch it back on after returning from suspend. Somewhere I found a systemd script that manages this automatically, but in my experience it did not work reliably.

With KDE you have to press Shift-Alt-F12 to toggle the compositor's state manually.

This is the output after returning from suspend with compositor switched off:

klaus@Lin:~$ sudo systemctl status plasma-restart@$USER.service
● <email address hidden> - Plasma Restart Service
   Loaded: loaded (/etc/systemd/system/plasma-restart@.service; indirect; vendor preset: enabled)
   Active: inactive (dead)

Mai 03 16:20:59 Lin systemd[1]: <email address hidden>: Unit not needed anymore. Stopping.
Mai 03 16:20:59 Lin systemd[1]: <email address hidden>: Failed to enqueue stop job, ignoring: Transaction is destructive.
Mai 03 16:20:59 Lin systemd[1]: Starting Plasma Restart Service...
Mai 03 16:20:59 Lin sudo[3652]: klaus : TTY=unknown ; PWD=/home/klaus ; USER=klaus ; COMMAND=/bin/bash -c pkill -ABRT plasmashell
Mai 03 16:20:59 Lin sudo[3652]: pam_unix(sudo:session): session opened for user klaus by (uid=0)
Mai 03 16:20:59 Lin sudo[3652]: pam_unix(sudo:session): session closed for user klaus
Mai 03 16:20:59 Lin systemd[1]: Started Plasma Restart Service.
Mai 03 16:20:59 Lin systemd[1]: <email address hidden>: Unit not needed anymore. Stopping.
Mai 03 16:20:59 Lin systemd[1]: Stopped Plasma Restart Service.

Something seems to be wrong, but I'm no specialist in systemd units.

Regards, Klaus

Richard Baka (bakarichard91) wrote :

I think I have exactly the same problem, so this script should work! I'm trying to figure out what could be wrong with your systemd setup.

jman6495 (jman6495) wrote :

The issue persists for me on Plasma, and also happens on GNOME Shell. I really think this is an issue with the display driver/kernel more than a userspace issue.

Richard Baka (bakarichard91) wrote :

Yes, that's correct. This is the reason why the combination of KWin's OpenGL compositor and suspend/resume triggers it. What I wrote is just a workaround/trick to disable the OpenGL compositor on resume.

khitschler (klaus-hitschler) wrote :

@jose: It seems that I already have an "amdgpu_drv.so" installed that contains the patch you mentioned.

My package manager reports that "xserver-xorg-video-amdgpu-hwe-18.04_19.0.1-1ubuntu1~18.04.1_amd64.deb" is installed with version "19.0.1-1ubuntu1~18.04.1". I compared the binary deb package in my local apt archive with the package provided at launchpad.net and tracked down the originating sources. They already contain the patch. Unfortunately, the patch does not seem to work for me.

khitschler (klaus-hitschler) wrote :

At least for me, I found a reproducible workaround for the resume bug. I found it more or less by trial and error.

I added the option "EnablePageFlip" to the configuration file "/usr/share/X11/xorg.conf.d/10-amdgpu.conf" like this:

Section "OutputClass"
        Identifier "AMDgpu"
        MatchDriver "amdgpu"
        Driver "amdgpu"
        Option "EnablePageFlip" "off"
EndSection

and restarted. With this option set my computer always returns from suspend. I do not really know what impact "EnablePageFlip" has, but I do not notice any significant degradation of video performance.

Maybe this will help to catch the annoying bug.
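
The workaround above edits a file under /usr/share/X11/xorg.conf.d, which is owned by a package and may be replaced on upgrade. Below is a minimal sketch that puts the same option into a separate drop-in file instead. It assumes Xorg on the system also reads /etc/X11/xorg.conf.d. The file name 20-amdgpu-noflip.conf, the identifier "AMDgpu-noflip" and both function names are made up for illustration.

```shell
#!/bin/sh
# Print an OutputClass section that switches page flipping off for amdgpu.
# The Identifier is a hypothetical name chosen for this sketch.
amdgpu_pageflip_off_conf() {
    cat <<'EOF'
Section "OutputClass"
        Identifier "AMDgpu-noflip"
        MatchDriver "amdgpu"
        Driver "amdgpu"
        Option "EnablePageFlip" "off"
EndSection
EOF
}

# Write the section to a drop-in file inside the directory given as $1.
write_amdgpu_dropin() {
    conf_dir=$1
    mkdir -p "$conf_dir" || return 1
    amdgpu_pageflip_off_conf > "$conf_dir/20-amdgpu-noflip.conf"
}
```

Run as root, this would be called as `write_amdgpu_dropin /etc/X11/xorg.conf.d`, followed by a restart of the X session. Whether the drop-in behaves exactly like the edited stock file was not tested here.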

khitschler (klaus-hitschler) wrote :

@jose: The above workaround seems to be related to the patch you mentioned in #42.

The function "amdgpu_present_check_flip()" returns early if the variable "info->allowPageFlip" is false. "info->allowPageFlip" holds the value of the configuration option "OPTION_PAGE_FLIP", which is read from the "EnablePageFlip" setting. So with the option off, the part of the code where the patch is located is never reached.

Maybe this helps a bit more.

gnowak (gnowak-gmail) wrote :

Confirming that #54 works: Option "EnablePageFlip" "off"

Pietrek B. (ptrbrzozowski) wrote :

Hi. I experienced this issue on an HP Envy X360 (Ryzen 2500U) and can confirm that #54 fixed resuming from sleep. It also fixed resuming from hibernation for me.

Pietrek B. (ptrbrzozowski) wrote :

It's also worth mentioning that using "EnablePageFlip" "off" causes serious screen tearing when interacting with the desktop... Switching KWin's compositing engine to xrender helps the tearing somewhat, but still - this is terrible.
That platform (Raven Ridge) has been around for over two years now and Linux still can't support it properly.

jman6495 (jman6495) wrote :

Sadly, adding Option "EnablePageFlip" "off" doesn't appear to resolve the issue on the 3500U. However, I've made quite a few modifications to this install.

I'll give it a go on a fresh install this evening and get back to you.

Pietrek B. (ptrbrzozowski) wrote :

Good news. I was able to work around this problem by disabling DRI 3:

# nano /usr/share/X11/xorg.conf.d/10-amdgpu.conf
        Option "DRI" "2"

I also added:

        Option "TearFree" "true"

to mitigate any screen tearing, although that was for purely cosmetic reasons.

See if this works for you and report back, please.
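
For reference, here is a small sketch of what the complete section might look like with both options in place. It assumes they go into the same OutputClass section shown in #54. The helper name amdgpu_section is hypothetical and only prints text; it does not touch any file.

```shell
#!/bin/sh
# Print an amdgpu OutputClass section with the given Option pairs.
# Usage: amdgpu_section NAME VALUE [NAME VALUE ...]
amdgpu_section() {
    printf 'Section "OutputClass"\n'
    printf '        Identifier "AMDgpu"\n'
    printf '        MatchDriver "amdgpu"\n'
    printf '        Driver "amdgpu"\n'
    # Emit one Option line per NAME VALUE pair.
    while [ "$#" -ge 2 ]; do
        printf '        Option "%s" "%s"\n' "$1" "$2"
        shift 2
    done
    printf 'EndSection\n'
}

# The two options from the comment above.
amdgpu_section DRI 2 TearFree true
```

The printed text can be compared against the contents of 10-amdgpu.conf after editing.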

Pietrek B. (ptrbrzozowski) wrote :

Apologies for spamming, guys.

I tried another fix for this bug, as suggested here: https://gitlab.freedesktop.org/drm/amd/-/issues/695#note_313722
and it seems to be working.

I rebuilt the xserver-xorg-video-amdgpu-hwe-18.04_19.0.1 package with https://github.com/freedesktop/xorg-xf86-video-amdgpu/commit/a2b32e72fdaff3007a79b84929997d8176c2d512 reverted and my machine is able to resume from suspend/hibernate just fine while keeping both DRI3 and KWin's OpenGL compositing enabled.

Here's the deb for you to test: https://drive.google.com/file/d/1AK-nKuSkYqGKT09I7juddfIiJNViO86T/view?usp=sharing

Note that this is for 18.04 and you have to have the HWE stack enabled first: apt-get install --install-recommends linux-generic-hwe-18.04 xserver-xorg-hwe-18.04
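
For anyone who prefers to rebuild the package locally rather than install a third-party deb, the steps could look roughly like the sketch below. It assumes deb-src entries are enabled and that dpkg-dev and wget are installed. Whether the reverse patch applies cleanly to the packaged 19.0.1 source has not been checked here. The names run, rebuild_with_revert, DRY_RUN and revert.patch are made up for illustration.

```shell
#!/bin/sh
# Sketch: rebuild the amdgpu X driver package with one upstream commit reverted.
# Set DRY_RUN=1 to print the commands instead of running them.

PKG="xserver-xorg-video-amdgpu-hwe-18.04"
COMMIT="a2b32e72fdaff3007a79b84929997d8176c2d512"
# GitHub serves a commit as a patch file when ".patch" is appended to its URL.
PATCH_URL="https://github.com/freedesktop/xorg-xf86-video-amdgpu/commit/$COMMIT.patch"

# Run a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

rebuild_with_revert() {
    run sudo apt-get build-dep -y "$PKG" || return 1
    run apt-get source "$PKG" || return 1
    run wget -O revert.patch "$PATCH_URL" || return 1
    # apt-get source unpacks into a directory named <source>-<upstream version>;
    # this assumes exactly one such directory exists.
    run cd "$PKG"-*/ || return 1
    # -R applies the patch in reverse, which undoes the commit.
    run patch -R -p1 -i ../revert.patch || return 1
    # Build unsigned binary packages only.
    run dpkg-buildpackage -us -uc -b
}
```

Calling `rebuild_with_revert` with DRY_RUN=1 set first shows the commands without changing anything. The resulting .deb files would end up in the parent directory.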

khitschler (klaus-hitschler) wrote :

I can confirm that #61 removes the bug.

I'm running Kubuntu 18.04 with a hwe kernel 5.3.0-51-generic #44~18.04.2-Ubuntu. (Not 5.3.0-53 because it contains a power off bug.)

Great! Now I hope it will find its way into an "official" update.

Pietrek B. (ptrbrzozowski) wrote :

The power-off bug can be fixed by installing the .56 kernel. I'm using the following PPA to get that release until it makes it into the official repos: https://launchpad.net/~canonical-kernel-team/+archive/ubuntu/proposed.

khitschler (klaus-hitschler) wrote :

For all who used the workaround #61: The current update of xserver-xorg-video-amdgpu-hwe-18.04:amd64 (19.0.1-1ubuntu1~18.04.1 to 19.1.0-1~18.04.1) overwrites the file /usr/lib/xorg/modules/drivers/amdgpu_drv.so (file date May 21 09:16) and the bug returns.

It is one step forward, one step back ...

Hence either apply the workaround #61 again or fall back to my suggested workaround #54.
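
Below is a small sketch for noticing when an update has replaced the patched driver file. It assumes sha256sum from coreutils is available. The function names and the location of the checksum file are hypothetical.

```shell
#!/bin/sh
# Path of the driver file that the package update overwrites.
DRIVER=/usr/lib/xorg/modules/drivers/amdgpu_drv.so

# Store the checksum of a file. $1: driver file, $2: file to keep the sum in.
record_driver_sum() {
    sha256sum "$1" | cut -d' ' -f1 > "$2"
}

# Succeed (exit status 0) if the file no longer matches the stored checksum.
driver_was_replaced() {
    [ "$(sha256sum "$1" | cut -d' ' -f1)" != "$(cat "$2")" ]
}
```

After installing the patched package one would run `record_driver_sum "$DRIVER" "$HOME/.amdgpu_drv.sha256"`. Later, `driver_was_replaced "$DRIVER" "$HOME/.amdgpu_drv.sha256" && echo "driver was overwritten"` reports a change. Putting the package on hold with `sudo apt-mark hold xserver-xorg-video-amdgpu-hwe-18.04` may also keep apt from replacing it, at the cost of missing later driver updates.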

Pietrek B. (ptrbrzozowski) wrote :
jman6495 (jman6495) wrote :

Has anyone built this package for 20.04?

Kenji (dude2k5) wrote :

New to the thread, HP envy, ryzen 4500. Running 5.7.10, Mint 20.

Had the same issues as everyone else: suspending by closing the lid won't allow it to wake, but oddly, if I manually suspend, it comes back fine.

Tried fix #54 and it did seem to fix my issues, thank you khitschler.

I don't have any screen tearing or any artifacts in 4k videos. So I might be lucky, it's a decent fix for me.

I'll keep an eye on the thread to see if a better fix comes out, not sure what EnablePageFlip actually does for my laptop.

Kenji (dude2k5) wrote :

Anyone try 5.8.0rc6 yet? I decided to try it on a whim since this is a really new laptop. It got rid of many error messages I was getting. And I removed the "EnablePageFlip" entry (so back to default) and sleep does work normally in 5.8.0rc6. Maybe try that and see, or wait until the stable release. Furthermore, VLC used to get stuck in the background after opening and I had to kill the process, but now I no longer have to do that either!

Mike Lykov (combr) wrote :

Wow, I now have the same bug with VLC. I can start it, but cannot quit it. It does not respond to quit and stays in the tray and in the background. I need to kill it before I can start it again to watch anything else.
If this is not a VLC bug but an amdgpu driver bug, I will check again when 5.8 is released.

Uriel Tunn (u2n) wrote :

The post 54 workaround fixed it for me, thanks much @khitschler. (https://bugs.launchpad.net/ubuntu/+source/plasma-desktop/+bug/1849084/comments/54)

Running Ubuntu 18.04.1 with latest stable kernel (5.4.0-42), but this has been a problem since the upgrade from 16.04.

Since there are no issues with the workaround, it will remain in place until the upgrade to 20.04.

khitschler (klaus-hitschler) wrote :

Since the beginning of September the bug has disappeared for me. I'm not sure which update fixed it, but the usual suspects are: linux-image-5.4.0-45-generic:amd64 (5.4.0-45.49~18.04.2, automatic) with its kernel module amdgpu.ko, some of the xorg-...-hwe parts, or a combination of them.

But when I booted kernel 5.3.0-40, the bug appeared again. So it is more likely that it was a kernel-related problem.

Note that I removed the "EnablePageFlip" line of my suggested workaround #54 from the configuration file "/usr/share/X11/xorg.conf.d/10-amdgpu.conf", so that it now looks like this:

Section "OutputClass"
        Identifier "AMDgpu"
        MatchDriver "amdgpu"
        Driver "amdgpu"
EndSection

Uriel Tunn (u2n) wrote :

After the upgrade to 20.04.1, the problem vanished (also with kernel 5.4.0-45).

So maybe the bug was triggered by a backport snafu? Unknown, but after upgrading three systems to 20.04, no more resume crashes. Doesn't help much to find the fault, but maybe a good, 'permanent' workaround.

Saren Taşçıyan (sarentasciyan) wrote :

I am still experiencing this issue on AMD Ryzen 5 4500u with radeon graphics x 6 (Lenovo ideapad 5).

Saren Taşçıyan (sarentasciyan) wrote :

Forgot to mention that I am using an up-to-date 20.04 LTS with the 5.4.0-47-generic kernel.
