[amdgpu] Graphics driver issue: Display goes black for a second at random: [drm:link_enc_cfg_validate [amdgpu]] *ERROR* link_enc_cfg_validate: Invalid link encoder assignments - 0x1c

Bug #2070096 reported by Nikhil Kaushik
This bug affects 10 people
Affects                  Status     Importance  Assigned to  Milestone
linux-oem-6.5 (Ubuntu)   Won't Fix  Undecided   Unassigned
linux-oem-6.8 (Ubuntu)   New        Undecided   Unassigned

Bug Description

After a recent update, my display randomly goes black for about a second and then comes back. I looked into journalctl and found this error message:

Jun 22 15:45:25 nikhil-T14 rtkit-daemon[1192]: Supervising 8 threads of 4 processes of 1 users.
Jun 22 15:45:25 nikhil-T14 rtkit-daemon[1192]: Supervising 8 threads of 4 processes of 1 users.
Jun 22 15:45:55 nikhil-T14 systemd[1579]: Started Application launched by gnome-shell.
Jun 22 15:46:12 nikhil-T14 gnome-shell[1735]: meta_window_set_stack_position_no_sync: assertion 'window->stack_position >= 0' failed
Jun 22 15:46:20 nikhil-T14 kernel: [drm:link_enc_cfg_validate [amdgpu]] *ERROR* link_enc_cfg_validate: Invalid link encoder assignments - 0x1c

ProblemType: Bug
DistroRelease: Ubuntu 22.04
Package: xorg 1:7.7+23ubuntu2
ProcVersionSignature: Ubuntu 6.5.0-1024.25-oem 6.5.13
Uname: Linux 6.5.0-1024-oem x86_64
ApportVersion: 2.20.11-0ubuntu82.5
Architecture: amd64
BootLog: Error: [Errno 13] Permission denied: '/var/log/boot.log'
CasperMD5CheckResult: pass
CompositorRunning: None
CurrentDesktop: ubuntu:GNOME
Date: Sat Jun 22 15:50:25 2024
DistUpgraded: Fresh install
DistroCodename: jammy
DistroVariant: ubuntu
ExtraDebuggingInterest: Yes
GraphicsCard:
 Advanced Micro Devices, Inc. [AMD/ATI] Device [1002:15bf] (rev dd) (prog-if 00 [VGA controller])
   Subsystem: Lenovo Device [17aa:50d9]
InstallationDate: Installed on 2024-05-22 (30 days ago)
InstallationMedia: Ubuntu 22.04.4 LTS "Jammy Jellyfish" - Release amd64 (20240220)
MachineType: LENOVO 21K4CTO1WW
ProcEnviron:
 LANGUAGE=en_IN:en
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_IN
 SHELL=/usr/bin/zsh
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-6.5.0-1024-oem root=UUID=d21844b5-a890-4fbf-9467-81c69986a31d ro quiet splash vt.handoff=7
SourcePackage: xorg
Symptom: display
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 02/20/2024
dmi.bios.release: 1.35
dmi.bios.vendor: LENOVO
dmi.bios.version: R2FET55W (1.35 )
dmi.board.asset.tag: Not Available
dmi.board.name: 21K4CTO1WW
dmi.board.vendor: LENOVO
dmi.board.version: Not Defined
dmi.chassis.asset.tag: No Asset Tag
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: None
dmi.ec.firmware.release: 1.16
dmi.modalias: dmi:bvnLENOVO:bvrR2FET55W(1.35):bd02/20/2024:br1.35:efr1.16:svnLENOVO:pn21K4CTO1WW:pvrThinkPadT14Gen4:rvnLENOVO:rn21K4CTO1WW:rvrNotDefined:cvnLENOVO:ct10:cvrNone:skuLENOVO_MT_21K4_BU_Think_FM_ThinkPadT14Gen4:
dmi.product.family: ThinkPad T14 Gen 4
dmi.product.name: 21K4CTO1WW
dmi.product.sku: LENOVO_MT_21K4_BU_Think_FM_ThinkPad T14 Gen 4
dmi.product.version: ThinkPad T14 Gen 4
dmi.sys.vendor: LENOVO
version.compiz: compiz N/A
version.libdrm2: libdrm2 2.4.113-2~ubuntu0.22.04.1
version.libgl1-mesa-dri: libgl1-mesa-dri 23.2.1-1ubuntu3.1~22.04.2
version.libgl1-mesa-glx: libgl1-mesa-glx N/A
version.xserver-xorg-core: xserver-xorg-core 2:21.1.4-2ubuntu1.7~22.04.10
version.xserver-xorg-input-evdev: xserver-xorg-input-evdev N/A
version.xserver-xorg-video-ati: xserver-xorg-video-ati 1:19.1.0-2ubuntu1
version.xserver-xorg-video-intel: xserver-xorg-video-intel 2:2.99.917+git20210115-1
version.xserver-xorg-video-nouveau: xserver-xorg-video-nouveau 1:1.0.17-2build1

Nikhil Kaushik (nikhilkaushik) wrote :
summary: - Display goes black for a second at random
+ Graphics driver issue: Display goes black for a second at random
Daniel van Vugt (vanvugt) wrote : Re: [amdgpu] Graphics driver issue: Display goes black for a second at random

Thanks for the bug report. CurrentDmesg.txt appears to show the amdgpu kernel driver crashed in display-related functions, so that is almost certainly the problem here.

Given how new the hardware is, I would recommend installing Ubuntu 24.04 instead, which includes a newer kernel and newer graphics drivers.

https://ubuntu.com/download/desktop

tags: added: amdgpu
summary: - Graphics driver issue: Display goes black for a second at random
+ [amdgpu] Graphics driver issue: Display goes black for a second at
+ random
affects: xorg (Ubuntu) → linux-oem-6.5 (Ubuntu)
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux-oem-6.5 (Ubuntu):
status: New → Confirmed
Mark Chambers (mwchambers) wrote :

Hi,

I believe the blank screen always corresponds with a dmesg entry:

 *ERROR* link_enc_cfg_validate: Invalid link encoder assignments - 0x1c

This is not necessarily accompanied by a call trace.

It appears my hardware is similar to nikhilkaushik's; both are fairly new AMD ThinkPads.

I have had 22.04 running on this without the black screen issue since 2024-05-11 (May 11th).
The problem has only started recently (in last week or so).

Looking at what has changed I would guess it was the update to linux-firmware 20220329.git681281e4-0ubuntu3.31 which is when this started happening.

dpkg.log: 2024-06-20 17:52:45 upgrade linux-firmware:all 20220329.git681281e4-0ubuntu3.30 20220329.git681281e4-0ubuntu3.31

I've copied the amdgpu firmware blobs from 20220329.git681281e4-0ubuntu3.30 into /lib/firmware/amdgpu
to see if it is the firmware change that has caused it. I will have to wait to see if it continues to happen.
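
For anyone who wants to check their own upgrade history the same way, here is a minimal sketch that pulls upgrade entries for one package out of dpkg.log text. It assumes the usual six-field "date time upgrade package:arch old new" layout seen in the line above; the function name find_upgrades is only illustrative.

```python
def find_upgrades(dpkg_log_text, package):
    """Return (timestamp, old_version, new_version) for each upgrade of `package`."""
    results = []
    for line in dpkg_log_text.splitlines():
        fields = line.split()
        # Expected layout: date, time, action, package:arch, old version, new version
        if len(fields) == 6 and fields[2] == "upgrade":
            name = fields[3].split(":")[0]  # drop the ":all" / ":amd64" suffix
            if name == package:
                results.append((fields[0] + " " + fields[1], fields[4], fields[5]))
    return results


# The entry quoted above, used as sample input.
sample_log = (
    "2024-06-20 17:52:45 upgrade linux-firmware:all "
    "20220329.git681281e4-0ubuntu3.30 20220329.git681281e4-0ubuntu3.31\n"
)

for when, old, new in find_upgrades(sample_log, "linux-firmware"):
    print(when, old, "->", new)
```

On a real system the text would come from /var/log/dpkg.log (and its rotated copies) rather than a sample string.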

System:
  Host: ThinkPad Kernel: 6.5.0-1024-oem x86_64 bits: 64 Desktop: GNOME 42.9
    Distro: Ubuntu 22.04.4 LTS (Jammy Jellyfish)
Machine:
  Type: Laptop System: LENOVO product: 21K9CTO1WW v: ThinkPad P16s Gen 2
    serial: <superuser required>
  Mobo: LENOVO model: 21K9CTO1WW serial: <superuser required> UEFI: LENOVO
    v: R2FET55W (1.35 ) date: 02/20/2024
CPU:
  Info: 8-core model: AMD Ryzen 7 PRO 7840U w/ Radeon 780M Graphics bits: 64
    type: MT MCP cache: L2: 8 MiB
  Speed (MHz): avg: 819 min/max: 400/5447:6076:5132:5760:5918:5605:5289
    cores: 1: 400 2: 400 3: 1516 4: 1403 5: 400 6: 400 7: 400 8: 400 9: 1446
    10: 400 11: 1462 12: 400 13: 1443 14: 400 15: 1835 16: 400
Graphics:
  Device-1: AMD driver: amdgpu v: kernel
  Device-2: Luxvisions Innotech Integrated Camera type: USB
    driver: uvcvideo
  Display: wayland server: X.Org v: 1.22.1.1 with: Xwayland v: 22.1.1
    compositor: gnome-shell driver: gpu: amdgpu resolution: 1920x1200~60Hz
  OpenGL:
    renderer: GFX1103_R1 (gfx1103_r1 LLVM 15.0.7 DRM 3.54 6.5.0-1024-oem)
    v: 4.6 Mesa 23.2.1-1ubuntu3.1~22.04.2

Mark Chambers (mwchambers) wrote :

Update: The problem occurred again with the changed firmware, so please ignore my earlier message about the firmware.

I will run with the older 6.5.0-1023-oem for now to see if it works.

Others with similar hardware might find this of interest:

https://bugs.launchpad.net/ubuntu/+source/linux-oem-6.5/+bug/2069357

Especially the note that "6.5 OEM kernel will be retired soon."

Hope this helps someone.

Kyle Fazzari (kyrofa) wrote (last edit ):

> I would recommend installing Ubuntu 24.04 instead

Daniel, I'm seeing this same issue on the Lenovo Z13 gen2, which is certified by Canonical to work with 22.04 (and that is indeed what I'm running):

https://ubuntu.com/certified/202310-32232

Also, the kernel for this machine in 24.04 has its own issues, e.g. https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2069416 .

Anyway, I logged a dupe at https://bugs.launchpad.net/ubuntu/+source/linux-signed-oem-6.5/+bug/2070960 before Mark caught it (thanks Mark). Note that this does not appear to be happening with 6.5.0-1023-oem, so it appears that 6.5.0-1024-oem introduced this issue.

summary: [amdgpu] Graphics driver issue: Display goes black for a second at
- random
+ random: [drm:link_enc_cfg_validate [amdgpu]] *ERROR*
+ link_enc_cfg_validate: Invalid link encoder assignments - 0x1c
Mario Limonciello (superm1) wrote :

Here's at least part of the problem.

commit 15c983d0cbb5a158eafb9cb88e6d8dfc4477d9c2
Author: Melissa Wen <email address hidden>
Date: Fri Dec 29 15:25:00 2023 -0100

    drm/amd/display: fix bandwidth validation failure on DCN 2.1

    BugLink: https://bugs.launchpad.net/bugs/2059068

    commit 3a0fa3bc245ef92838a8296e0055569b8dff94c4 upstream.

    IGT `amdgpu/amd_color/crtc-lut-accuracy` fails right at the beginning of
    the test execution, during atomic check, because DC rejects the
    bandwidth state for a fb sizing 64x64. The test was previously working
    with the deprecated dc_commit_state(). Now using
    dc_validate_with_context() approach, the atomic check needs to perform a
    full state validation. Therefore, set fast_validation to false in the
    dc_validate_global_state call for atomic check.

    Cc: <email address hidden>
    Fixes: b8272241ff9d ("drm/amd/display: Drop dc_commit_state in favor of dc_commit_streams")
    Signed-off-by: Melissa Wen <email address hidden>
    Signed-off-by: Hamza Mahfooz <email address hidden>
    Signed-off-by: Alex Deucher <email address hidden>
    Signed-off-by: Greg Kroah-Hartman <email address hidden>
    Signed-off-by: Portia Stephens <email address hidden>
    Signed-off-by: Stefan Bader <email address hidden>

That commit was reverted upstream.

https://github.com/torvalds/linux/commit/c2ab9ce0ee7225fc05f58a6671c43b8a3684f530

Mario Limonciello (superm1) wrote :

And that commit did go back to stable: https://git.kernel.org/stable/c/6266b3a312b7f69c883c2d7c82d85772464421d2

So I guess the Canonical team missed it.

Petter Reinholdtsen (pere-hungry) wrote :

I suspect I am experiencing the same problem with a Vivobook Go. I get messages like "[174397.518333] [drm:link_enc_cfg_validate [amdgpu]] *ERROR* link_enc_cfg_validate: Invalid link encoder assignments - 0x1c" in dmesg, and the screen regularly goes black and stays like that for a while.

Daniel van Vugt (vanvugt) wrote :

CC Stefan Bader

Kyle Fazzari (kyrofa) wrote :

Hey folks, I'm still locked on 6.5.0-1023-oem, which is working perfectly. Any ETA for a fix so I can actually receive kernel updates? Doesn't Canonical have a QA process for certified machines? How did this break and how has it remained broken for so long?

Petter Reinholdtsen (pere-hungry) wrote :

After having a look at drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c in linux-hwe-6.5 version 6.5.0-45.45~22.04.1, I find this line:

  status = dc_validate_global_state(dc, dm_state->context, false);

As far as I can tell, this is the exact line changed in https://git.kernel.org/stable/c/6266b3a312b7f69c883c2d7c82d85772464421d2 mentioned above, as a fix that might solve this issue. I'll try to build and test a kernel with this fix in place, but am unsure if I will be able to test with an unsigned kernel on the machine in question.
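
A small sketch for checking a source tree for this call, in case it helps others compare kernels. It assumes the revert simply restores the previous value of the last argument (true, i.e. fast validation), which is my reading of the commit message quoted earlier and not something verified against every tree. The regex and function name are illustrative only.

```python
import re

# Matches the call quoted above and captures its last (fast_validation) argument.
CALL_RE = re.compile(
    r"dc_validate_global_state\(\s*dc\s*,\s*dm_state->context\s*,\s*(true|false)\s*\)"
)


def fast_validation_args(source_text):
    """Return the fast_validation argument of every matching call in the text."""
    return CALL_RE.findall(source_text)


# The line found in linux-hwe-6.5 6.5.0-45.45~22.04.1, as quoted above.
snippet = "  status = dc_validate_global_state(dc, dm_state->context, false);\n"

args = fast_validation_args(snippet)
if "false" in args:
    # Assumption: "false" means the revert has not been applied to this tree.
    print("call uses full validation (false); revert probably not applied")
else:
    print("no call with 'false' found")
```

To run it on a real tree, read drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c into a string and pass that instead of the snippet.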

Fredrik Bakke (bakkefredrik) wrote :

I am getting the same with kernel 6.5.0-44.44. The screen goes black roughly every 20 seconds, accompanied by:

    "[drm:link_enc_cfg_validate [amdgpu]] *ERROR* link_enc_cfg_validate: Invalid link
    encoder assignments - 0x1c"

in the log.

timur (ba.timur) wrote :

As I understand it, this should be OK in 6.8, though I'm not sure. Since 6.8 is available on my Ubuntu-based system, I removed 6.5.
Personally, even 6.8 has other issues for me, so I decided to use the original 5.15.

Petter Reinholdtsen (pere-hungry) wrote :

After upgrading to Ubuntu 24.04 LTS on the Vivobook Go, the problem with the messed-up display seems to be solved. About 24 hours of use so far has been a lot better than before the upgrade.

Timo Jyrinki (timo-jyrinki) wrote (last edit ):

I have access to a certified laptop which has this problem.

If you have any test kernels one could use, please let me know. Or better yet, a fixed kernel from Canonical/Lenovo (is there any proposed section in the OEM archives?).

I'm not willing to upgrade to 24.04, since the machine is in production use and upgrading might bring other regressions (also linked here). Plus, 22.04 is still supported for 3 more years.

edit: To add, linux-oem kernels come from the main archives, not OEM repositories. But the last functional version was 6.5.0-1023-oem and we are already at 1027, so it looks like this omission is still being overlooked?

Timo Aaltonen (tjaalton) wrote :

OEM-6.5 is dead now, along with mantic/23.10. HWE-6.8 will replace it any time now, as long as the latest version in jammy-proposed is ready to move to jammy-updates.

Torsten Krah (tkrah) wrote :

Using a Lenovo ThinkPad T14 Gen 3, model 21CFCTO1WW:

[20354.474671] [drm:link_enc_cfg_validate [amdgpu]] *ERROR* link_enc_cfg_validate: Invalid link encoder assignments - 0x1c
[20355.134825] [drm:link_enc_cfg_validate [amdgpu]] *ERROR* link_enc_cfg_validate: Invalid link encoder assignments - 0x1c
[20355.165092] [drm:link_enc_cfg_validate [amdgpu]] *ERROR* link_enc_cfg_validate: Invalid link encoder assignments - 0x1c
[20995.690828] [drm:link_enc_cfg_validate [amdgpu]] *ERROR* link_enc_cfg_validate: Invalid link encoder assignments - 0x1c
[20999.503163] [drm] DM_MST: stopping TM on aconnector: 000000008c25c9ea [id: 102]
[20999.860981] [drm:link_enc_cfg_validate [amdgpu]] *ERROR* link_enc_cfg_validate: Invalid link encoder assignments - 0x1c
[21001.947648] [drm] DM_MST: starting TM on aconnector: 000000008c25c9ea [id: 102]
[21001.950605] [drm] DM_MST: DP14, 2-lane link detected
[21002.064165] [drm] Downstream port present 1, type 2
[21002.123743] [drm] Downstream port present 1, type 2
[21002.194721] [drm:link_enc_cfg_validate [amdgpu]] *ERROR* link_enc_cfg_validate: Invalid link encoder assignments - 0x1c
[23564.711532] [drm:link_enc_cfg_validate [amdgpu]] *ERROR* link_enc_cfg_validate: Invalid link encoder assignments - 0x1c
[23565.357847] [drm:link_enc_cfg_validate [amdgpu]] *ERROR* link_enc_cfg_validate: Invalid link encoder assignments - 0x1c
[23565.388147] [drm:link_enc_cfg_validate [amdgpu]] *ERROR* link_enc_cfg_validate: Invalid link encoder assignments - 0x1c
[30159.783983] [drm:link_enc_cfg_validate [amdgpu]] *ERROR* link_enc_cfg_validate: Invalid link encoder assignments - 0x1c
[30163.588495] [drm] DM_MST: stopping TM on aconnector: 000000008c25c9ea [id: 102]
[30163.794029] [drm:link_enc_cfg_validate [amdgpu]] *ERROR* link_enc_cfg_validate: Invalid link encoder assignments - 0x1c
[30164.013842] [drm:link_enc_cfg_validate [amdgpu]] *ERROR* link_enc_cfg_validate: Invalid link encoder assignments - 0x1c
[30166.031511] [drm] DM_MST: starting TM on aconnector: 000000008c25c9ea [id: 102]
[30166.034441] [drm] DM_MST: DP14, 2-lane link detected
[30166.181684] [drm] Downstream port present 1, type 2
[30166.265200] [drm] Downstream port present 1, type 2
[30166.292833] [drm:link_enc_cfg_validate [amdgpu]] *ERROR* link_enc_cfg_validate: Invalid link encoder assignments - 0x1c
[30166.606601] [drm:link_enc_cfg_validate [amdgpu]] *ERROR* link_enc_cfg_validate: Invalid link encoder assignments - 0x1c

Same here on 6.5 (hwe) with 22.04 base. 6.8 has other issues (freeze / hang) so I am unable to use that unfortunately.
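
To compare how often this happens across machines, a rough sketch like the following can tally the error lines from saved dmesg output and group them into bursts. The 5-second gap used to separate bursts is an arbitrary assumption, and the helper names are illustrative.

```python
import re

MSG = ("[drm:link_enc_cfg_validate [amdgpu]] *ERROR* "
       "link_enc_cfg_validate: Invalid link encoder assignments - 0x1c")

# dmesg lines start with "[seconds.micros]"; capture that timestamp.
ERROR_RE = re.compile(
    r"^\[\s*(\d+\.\d+)\]\s.*link_enc_cfg_validate: Invalid link encoder assignments"
)


def error_timestamps(dmesg_text):
    """Return the timestamp (seconds since boot) of each matching error line."""
    stamps = []
    for line in dmesg_text.splitlines():
        match = ERROR_RE.match(line.strip())
        if match:
            stamps.append(float(match.group(1)))
    return stamps


def group_bursts(stamps, gap=5.0):
    """Group timestamps into bursts; a new burst starts after `gap` quiet seconds."""
    bursts = []
    for t in stamps:
        if bursts and t - bursts[-1][-1] <= gap:
            bursts[-1].append(t)
        else:
            bursts.append([t])
    return bursts


# First few lines of the log above, used as sample input.
sample = "\n".join([
    f"[20354.474671] {MSG}",
    f"[20355.134825] {MSG}",
    f"[20355.165092] {MSG}",
    f"[20995.690828] {MSG}",
    "[20999.503163] [drm] DM_MST: stopping TM on aconnector: 000000008c25c9ea [id: 102]",
    f"[20999.860981] {MSG}",
])

stamps = error_timestamps(sample)
bursts = group_bursts(stamps)
print(len(stamps), "errors in", len(bursts), "bursts")  # 5 errors in 2 bursts
```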

Timo Aaltonen (tjaalton) wrote :

try the version in jammy-proposed

Kyle Fazzari (kyrofa) wrote :

Timo, does that also contain the fix for https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2069416 ?

Timo Aaltonen (tjaalton) wrote :

yes, it is based on -40

tags: added: regression-update
Anthony Wong (anthonywong) wrote :

Re: comment #8
It turns out the tooling missed picking up the revert that's in the same stable update as the original commit. The tooling has now been fixed.

Kyle Fazzari (kyrofa) wrote :

@anthonywong, does that mean the OEM-6.5 kernel will see a new release, then? Or is it still dead?

Timo Aaltonen (tjaalton) wrote :

it's dead, replaced by hwe-6.8 now

Changed in linux-oem-6.5 (Ubuntu):
status: Confirmed → Won't Fix
Timo Jyrinki (timo-jyrinki) wrote :

Thank you for the fixes on the 6.8 series!

Kyle Fazzari (kyrofa) wrote (last edit ):

@tjaalton, it looks like 6.8 is discovered last by /etc/grub.d/10_linux, so it ends up at the bottom of the GRUB menu, under the "advanced" submenu. The first boot option still uses 6.5.0-1027-oem (i.e. the one selected with the default /etc/default/grub, which uses GRUB_DEFAULT=0). It looks like folks will still boot into 6.5 by default unless those kernels are automatically removed (something I disabled so I could stay on 1023). That was a bit surprising to me. Is that intentional?

Austin Esquirell (ajesquirell) wrote :

I am seeing the same thing - also thought it was something I must have done to stay on 1023, but maybe it is a different issue.

Timo Jyrinki (timo-jyrinki) wrote :

Right now I only have access to another laptop, but on it I can confirm that 6.8 is discovered last. So basically anyone who has this bug still has it unless they know how to configure GRUB. That is, for the majority of people (especially those who prefer vendor-supported, pre-installed laptops), the bug is not fixed.

If the intent is for people running pre-installed Ubuntu 22.04 LTS to move from the linux-6.5.0-*-oem kernel to the standard, non-oem linux-6.8.0-40 (and newer), then the file that the /etc/default/grub.d/oem-flavour.cfg symlink points to (in this laptop's case, /usr/share/oem-sutton-carr-meta/oem-flavour.cfg) should be updated so that it no longer says "GRUB_FLAVOUR_ORDER=oem".

I'm sorry for using linux-oem-6.8 as the package. There are many "oem-sutton-*-meta" packages available in LP, but not oem-sutton-carr-meta in particular, for example. All affected oem-*-meta packages should be updated, unless the linux-oem-6.8 series will also be offered to Ubuntu 22.04 LTS users and a transition to that is planned instead.
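
To illustrate why a flavour preference keeps 6.5 on top, here is a toy model of the ordering described above. It is not GRUB's actual code: it only assumes that kernels whose flavour appears in the preferred list are placed first, newest first, followed by everything else. The "-generic" suffix and the exact version strings are illustrative assumptions.

```python
import re


def version_key(kernel):
    """Turn '6.5.0-1027-oem' into (6, 5, 0, 1027) for numeric comparison."""
    return tuple(int(n) for n in re.findall(r"\d+", kernel))


def flavour(kernel):
    """Return the suffix after the last dash, e.g. 'oem' or 'generic'."""
    return kernel.rsplit("-", 1)[-1]


def menu_order(kernels, flavour_order):
    """Preferred flavours first (newest first), then the remaining kernels."""
    ordered = []
    for wanted in flavour_order:
        group = [k for k in kernels if flavour(k) == wanted]
        ordered += sorted(group, key=version_key, reverse=True)
    rest = [k for k in kernels if k not in ordered]
    return ordered + sorted(rest, key=version_key, reverse=True)


installed = ["6.8.0-40-generic", "6.5.0-1027-oem", "6.5.0-1023-oem"]

print(menu_order(installed, ["oem"]))  # oem kernels listed first
print(menu_order(installed, []))       # plain newest-first ordering
```

In this model, the first entry (the one GRUB_DEFAULT=0 would pick) is 6.5.0-1027-oem when the preferred list contains "oem", and 6.8.0-40-generic when the list is empty.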
