External monitor does not wake up on Titan Ridge laptops when docked (9500, TB16)

Bug #1922334 reported by Georgi Boiko
This bug affects 1 person
Affects                       Status        Importance   Assigned to   Milestone
Dell Sputnik                  New           Undecided    Unassigned
linux (Ubuntu)                In Progress   Undecided    koba
linux-meta-oem-5.6 (Ubuntu)   Invalid       Undecided    Unassigned
linux-oem-5.10 (Ubuntu)       Won't Fix     Undecided    Unassigned
linux-oem-5.13 (Ubuntu)       Won't Fix     Undecided    Unassigned
linux-oem-5.14 (Ubuntu)       Won't Fix     Undecided    Unassigned

Bug Description

I've recently upgraded my workhorse from XPS 9560 (2016) to a newer generation XPS 9500 (2020) and ran into several things that feel like regressions, but are probably related to hardware changes. This is one of them.

tl;dr:
When using XPS 9500 with a TB16 dock and an external monitor (Benq EX2780Q) connected to the dock via USB-C to USB-C (DP alternate mode), most of the time the monitor fails to wake up after going blank.

details:
The simplest way to reproduce is to hit Super+L, wait for the monitor to go fully into standby, then type something or move the mouse to wake it up. About 8 times out of 10 it does not wake up when using the XPS 9500. Most of the time this can be fixed by power cycling the monitor; however, this has a seemingly random chance of triggering two other issues:

1. The system loses sight of the monitor for a brief period, which sometimes causes it to hang with a black screen. Recovering then needs some combination of restarting the monitor, the dock, and the laptop, because a separate USB issue makes hot-plugging unusable.
2. The monitor itself gets stuck in a weird standby state where it stops reacting to button presses, and I need to hold the power button a bit longer for it to sort of hard reset?

Notably, it wakes up fine every time when using XPS 9560 or Precision 5520 in the same setup which I have been using for ages.

This becomes particularly troublesome when screens go into short-lived standby during boot: after the Dell logo, after entering the LUKS password, after logging in. Each of these points has a chance of triggering the bug, because it looks like there is some sort of mode change and waking signal submission happening between them that triggers it. I had to disable screen timeout as a temporary workaround to be able to work on this system at all without having to play "monitor wake lottery" every time I go brew a cuppa. The workaround at boot time is to either re-roll the lottery, or boot with the dock connected and lid open, then close the lid and keep working from there.
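To put a rough number on the "monitor wake lottery" at boot, here is a minimal sketch of the arithmetic. It assumes the three wake points (after the Dell logo, after LUKS, after login) fail independently and at the same rate as the Super+L test above; both are simplifying assumptions rather than measurements, and `boot_success_probability` is a hypothetical helper name.

```python
def boot_success_probability(fail_rate, wake_points=3):
    """Chance that the monitor wakes at every wake point during one boot.

    Assumes each wake point fails independently with the same probability,
    which is a simplification and not something measured in this report.
    """
    if not 0.0 <= fail_rate <= 1.0:
        raise ValueError("fail_rate must be between 0 and 1")
    return (1.0 - fail_rate) ** wake_points


if __name__ == "__main__":
    # 0.8 is the informal "8 times out of 10" estimate from the description.
    print(f"{boot_success_probability(0.8):.1%}")
```

Under those assumptions, a failure rate of 8 in 10 leaves under 1% of boots where the external screen wakes at all three points, which matches how unusable this feels in practice.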

The dock, the monitor, all USB peripherals are the same in both cases - it's literally just the TB16 cable plugged into a different laptop.

I have tried the 5.8 Ubuntu generic, 5.6 OEM (-20.04) and 5.10 OEM (-20.04b and -20.04-edge) kernels on the 9500 with the same results. I have tried disabling USB autosuspend via a GRUB kernel parameter to usbcore. I have tried both the NVIDIA and Intel GPUs. I have tried playing with BIOS settings: wake on Dell USB-C docks, disabling early sign of life for both checkboxes, disabling SGX and SMM, checking all 3 boxes for Thunderbolt and switching off Thunderbolt security. None of these make a noticeable difference.

Since it was fine with the 9560 and 5520, and a friend with a 9570 has no issues either, my gut feeling is that this is due to the move from an Alpine Ridge to a Titan Ridge Thunderbolt controller that happened in this generation - something is wrong with the driver, or the firmware may have missed some of the "lessons learned" in Alpine Ridge and caused this regression. That would also make it applicable to the 9300, which has a "developer edition" option under Project Sputnik and several reports of the same problem scattered across the internet:

https://www.dell.com/community/XPS/XPS-13-9300-with-WD19TB-External-display-not-coming-on-after/td-p/7676922
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1889342 (may be a duplicate, but for WD19TB dock on 9300, so may be slightly different too)
https://www.reddit.com/r/linuxquestions/comments/ka0w2j/monitor_doesnt_wake_from_sleep/

etc etc. Those reports suggest high bandwidth usage sometimes affects it, so it's worth noting that my external monitor is a 1440p 144Hz one that uses tons of bandwidth indeed.

The only hint in terms of logs seems to be this message in dmesg that I see when power cycling the monitor and getting back into the system:

[ 250.777684] pcieport 0000:09:04.0: pciehp: Slot(0-1): Card not present
[ 250.777695] xhci_hcd 0000:0b:00.0: can't change power state from D3cold to D0 (config space inaccessible)
[ 250.778293] xhci_hcd 0000:0b:00.0: can't change power state from D3hot to D0 (config space inaccessible)
[ 250.778336] xhci_hcd 0000:0b:00.0: Controller not ready at resume -19
[ 250.778340] xhci_hcd 0000:0b:00.0: PCI post-resume error -19!
[ 250.778343] xhci_hcd 0000:0b:00.0: HC died; cleaning up
[ 250.778374] xhci_hcd 0000:0b:00.0: remove, state 4
[ 250.778380] usb usb6: USB disconnect, device number 1
[ 250.778895] xhci_hcd 0000:0b:00.0: USB bus 6 deregistered
[ 250.778905] xhci_hcd 0000:0b:00.0: remove, state 4
[ 250.778910] usb usb5: USB disconnect, device number 1
[ 250.779396] xhci_hcd 0000:0b:00.0: Host halt failed, -19
[ 250.779400] xhci_hcd 0000:0b:00.0: Host not accessible, reset failed.
[ 250.779484] xhci_hcd 0000:0b:00.0: USB bus 5 deregistered
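For anyone comparing this against their own logs, below is a small sketch that groups dmesg lines by driver and PCI address and picks out the failed power-state transitions. The regular expression is an assumption based only on the line format pasted above, and `group_by_device` and `power_state_failures` are hypothetical helper names, not part of any existing tool.

```python
import re
from collections import defaultdict

# Matches lines shaped like the excerpt above, for example:
# [ 250.777695] xhci_hcd 0000:0b:00.0: can't change power state from ...
LINE_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<driver>\S+)\s+"
    r"(?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]):\s+(?P<msg>.*)$"
)


def group_by_device(dmesg_text):
    """Return {(driver, pci_address): [(timestamp, message), ...]}."""
    grouped = defaultdict(list)
    for line in dmesg_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:  # lines without a PCI address (e.g. "usb usb6:") are skipped
            key = (match["driver"], match["bdf"])
            grouped[key].append((float(match["ts"]), match["msg"]))
    return dict(grouped)


def power_state_failures(dmesg_text):
    """Return (timestamp, message) pairs for failed power-state transitions."""
    return [
        (ts, msg)
        for events in group_by_device(dmesg_text).values()
        for ts, msg in events
        if "can't change power state" in msg
    ]


SAMPLE = """\
[ 250.777684] pcieport 0000:09:04.0: pciehp: Slot(0-1): Card not present
[ 250.777695] xhci_hcd 0000:0b:00.0: can't change power state from D3cold to D0 (config space inaccessible)
[ 250.778293] xhci_hcd 0000:0b:00.0: can't change power state from D3hot to D0 (config space inaccessible)
[ 250.778343] xhci_hcd 0000:0b:00.0: HC died; cleaning up
[ 250.778380] usb usb6: USB disconnect, device number 1
"""

if __name__ == "__main__":
    for ts, msg in power_state_failures(SAMPLE):
        print(f"{ts:.6f} {msg}")
```

Feeding it the full output of `dmesg` instead of `SAMPLE` should make it easier to see whether the same controller address shows up on every failed wake.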

And there was one time when the i915 driver crashed in the process of power cycling the external monitor, which I could not reproduce. Just in case, I attached the kernel logs for it too.

System config:

XPS 9500, i7, 32GB RAM
BIOS 1.6.1, TB3 firmware NVM60
Ubuntu 20.04.2 LTS, kernels 5.6-oem, 5.8-generic, 5.10-oem (same behaviour)
TB16 dock firmware 1.0.4 (MST 3.12.02)
---
ProblemType: Bug
ApportVersion: 2.20.11-0ubuntu27.16
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC4: gboiko 2233 F.... pulseaudio
 /dev/snd/controlC3: gboiko 2233 F.... pulseaudio
 /dev/snd/controlC1: gboiko 2233 F.... pulseaudio
 /dev/snd/controlC0: gboiko 2233 F.... pulseaudio
CasperMD5CheckResult: skip
CurrentDesktop: ubuntu:GNOME
DistroRelease: Ubuntu 20.04
InstallationDate: Installed on 2021-03-31 (6 days ago)
InstallationMedia: Ubuntu 20.04.2.0 LTS "Focal Fossa" - Release amd64 (20210209.1)
MachineType: Dell Inc. XPS 15 9500
NonfreeKernelModules: nvidia_modeset nvidia
Package: linux-meta-oem-5.6
ProcFB: 0 i915drmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-5.6.0-1052-oem root=/dev/mapper/vgubuntu-root ro net.ifnames=0 biosdevname=0 ipv6.disable=1 quiet splash pcie_aspm=off vt.handoff=7
ProcVersionSignature: Ubuntu 5.6.0-1052.56-oem 5.6.19
RelatedPackageVersions:
 linux-restricted-modules-5.6.0-1052-oem N/A
 linux-backports-modules-5.6.0-1052-oem N/A
 linux-firmware 1.187.10
Tags: focal
Uname: Linux 5.6.0-1052-oem x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip lpadmin lxd plugdev sambashare sudo
_MarkForUpload: True
dmi.bios.date: 12/24/2020
dmi.bios.vendor: Dell Inc.
dmi.bios.version: 1.6.1
dmi.board.name: 0RHXRG
dmi.board.vendor: Dell Inc.
dmi.board.version: A03
dmi.chassis.type: 10
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvr1.6.1:bd12/24/2020:svnDellInc.:pnXPS159500:pvr:rvnDellInc.:rn0RHXRG:rvrA03:cvnDellInc.:ct10:cvr:
dmi.product.family: XPS
dmi.product.name: XPS 15 9500
dmi.product.sku: 097D
dmi.sys.vendor: Dell Inc.

Revision history for this message
Georgi Boiko (pandasauce) wrote :
Revision history for this message
Georgi Boiko (pandasauce) wrote :

Kernel logs with the xhci_hcd power state line when power cycling the monitor

Revision history for this message
Georgi Boiko (pandasauce) wrote :

Kernel logs for the non-reproducible i915 crash during power cycling of the monitor

Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1922334

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Revision history for this message
Georgi Boiko (pandasauce) wrote : AlsaInfo.txt

apport information

tags: added: apport-collected focal
description: updated
Revision history for this message
Georgi Boiko (pandasauce) wrote : CRDA.txt

apport information

Revision history for this message
Georgi Boiko (pandasauce) wrote : CurrentDmesg.txt

apport information

Revision history for this message
Georgi Boiko (pandasauce) wrote : IwConfig.txt

apport information

Revision history for this message
Georgi Boiko (pandasauce) wrote : Lspci.txt

apport information

Revision history for this message
Georgi Boiko (pandasauce) wrote : Lspci-vt.txt

apport information

Revision history for this message
Georgi Boiko (pandasauce) wrote : Lsusb.txt

apport information

Revision history for this message
Georgi Boiko (pandasauce) wrote : Lsusb-t.txt

apport information

Revision history for this message
Georgi Boiko (pandasauce) wrote : Lsusb-v.txt

apport information

Revision history for this message
Georgi Boiko (pandasauce) wrote : ProcCpuinfo.txt

apport information

Revision history for this message
Georgi Boiko (pandasauce) wrote : ProcCpuinfoMinimal.txt

apport information

Revision history for this message
Georgi Boiko (pandasauce) wrote : ProcEnviron.txt

apport information

Revision history for this message
Georgi Boiko (pandasauce) wrote : ProcInterrupts.txt

apport information

Revision history for this message
Georgi Boiko (pandasauce) wrote : ProcModules.txt

apport information

Revision history for this message
Georgi Boiko (pandasauce) wrote : PulseList.txt

apport information

Revision history for this message
Georgi Boiko (pandasauce) wrote : RfKill.txt

apport information

Revision history for this message
Georgi Boiko (pandasauce) wrote : UdevDb.txt

apport information

Revision history for this message
Georgi Boiko (pandasauce) wrote : WifiSyslog.txt

apport information

Revision history for this message
Georgi Boiko (pandasauce) wrote : acpidump.txt

apport information

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
Georgi Boiko (pandasauce) wrote :

The apport-collect run for this was actually perfect: the external screen didn't wake for LUKS and didn't wake for the greeter afterwards, so I had to get the laptop out of the stand and open the lid to be able to finish booting. I hope the logs have captured all relevant information.

Revision history for this message
koba (kobako) wrote :
Revision history for this message
Georgi Boiko (pandasauce) wrote :

@koba, please find it attached and thanks for looking into this.

In this instance, the lid was closed and the laptop suspended itself after losing the monitor on logon (@100.856966)

Revision history for this message
koba (kobako) wrote :

@Georgi,
Please help to collect another one.
Change kernel to 5.10-oem-1025 and enable drm debug(0x10e)
please dump dmesg after issue is triggered.

Thanks
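For reference, the requested drm.debug value is a bitmask. The sketch below decodes it using the category bits as I understand them from enum drm_debug_category in the kernel's include/drm/drm_print.h; the exact set of categories varies between kernel versions, so treat the table as an assumption to verify against the kernel being tested. `decode_drm_debug` is a hypothetical helper name.

```python
# Category bits as listed in include/drm/drm_print.h (names shortened).
# Assumption: newer or older kernels may define a different set.
DRM_DEBUG_BITS = {
    0x001: "CORE",
    0x002: "DRIVER",
    0x004: "KMS",
    0x008: "PRIME",
    0x010: "ATOMIC",
    0x020: "VBL",
    0x040: "STATE",
    0x080: "LEASE",
    0x100: "DP",
    0x200: "DRMRES",
}


def decode_drm_debug(mask):
    """Return the names of the debug categories enabled by a drm.debug mask."""
    names = [name for bit, name in DRM_DEBUG_BITS.items() if mask & bit]
    leftover = mask & ~sum(DRM_DEBUG_BITS)  # bits not in the table above
    if leftover:
        names.append(f"UNKNOWN({leftover:#x})")
    return names


if __name__ == "__main__":
    print(decode_drm_debug(0x10E))
```

With this table, 0x10e selects the DRIVER, KMS, PRIME and DP categories, which fits a display wake problem on a DisplayPort link.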

Revision history for this message
Georgi Boiko (pandasauce) wrote :

@koba, please find it attached.

In this instance, it took several monitor power cycles and reopenings of the lid for it to put output on the external monitor after logon, even though it was clearly showing up in Display Settings.

Revision history for this message
koba (kobako) wrote : Re: [Bug 1922334] Re: External monitor does not wake up on Titan Ridge laptops when docked (9500, TB16)

@Georgi,
Would you please dump a log for s2idle that triggers the issue with
5.10-oem-1025 kernel?
Please also enable drm.debug=0x10e

Thanks
Koba Ko

Revision history for this message
Georgi Boiko (pandasauce) wrote :

@Koba,

Not sure what you meant by that, could you elaborate? Do you want me to send the laptop into s2idle suspend?

That is not necessary to trigger the issue; merely locking the screen or letting the monitor go blank after a timeout triggers it, although it may take a couple of attempts sometimes.

Revision history for this message
koba (kobako) wrote :

@Georgi,
Sorry for the confusion,
would you please also try 5.12 and help to collect a log with drm.debug=0x10e.
https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.12/

Revision history for this message
koba (kobako) wrote :

@Georgi,
would you please also collect dmesg(drm.debug=0x10e) after issue is
triggered?
Thanks a lot

Revision history for this message
Georgi Boiko (pandasauce) wrote :

@koba,

5.12 logs attached.

It was triggered a few times during boot, but I checked dmesg before locking the screen and triggering the issue - it was at timestamp 20:18:43, so the lines you are interested in will be shortly after that.

Revision history for this message
koba (kobako) wrote :

@Georgi,
if you plug only the external monitor into the TB16, is the issue still seen?

Changed in linux (Ubuntu):
assignee: nobody → koba (kobako)
Revision history for this message
Georgi Boiko (pandasauce) wrote :

@koba,
Yes. Even if I disconnect everything but the monitor from TB16 and the remaining USB-Cs on the laptop, keep the laptop lid open and use the built-in kb/touchpad to lock/unlock the screen, it triggers about 1 in 3 times. With the lid open, the laptop screen wakes up, but the external remains stuck asleep.

Start up -> Log on -> Super+L -> Wait for the monitor to show "No signal detected" -> Wait for the monitor to sleep, the LED indicator on it goes orange -> Wait 3-5 seconds -> Press any key -> About 1 in 3 times (but not every third, there can be many working/not working in a row) this triggers the issue and the monitor does not wake up.
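To check from the laptop side whether the connector really disappears during this sequence, something like the following sketch could be run before locking and again after pressing a key. It assumes the usual sysfs layout where each connector directory under /sys/class/drm contains a `status` file; the connector names in the usage note are examples only and `missing_after_wake` is a hypothetical helper.

```python
from pathlib import Path


def connector_states(sysfs_root="/sys/class/drm"):
    """Return {connector_name: status} for every DRM connector found."""
    states = {}
    for status_file in sorted(Path(sysfs_root).glob("card*-*/status")):
        states[status_file.parent.name] = status_file.read_text().strip()
    return states


def missing_after_wake(before, after):
    """Connectors that were connected before blanking but are not afterwards."""
    return sorted(
        name
        for name, status in before.items()
        if status == "connected" and after.get(name) != "connected"
    )


if __name__ == "__main__":
    before = connector_states()
    input("Lock the screen, wait for standby, wake it, then press Enter... ")
    after = connector_states()
    print("Did not come back:", missing_after_wake(before, after))
```

For example, if `before` has a hypothetical `card0-DP-1` as `connected` and `after` reports it as `disconnected`, that connector is listed as not having come back.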

koba (kobako)
Changed in linux (Ubuntu):
status: Confirmed → In Progress
Revision history for this message
Georgi Boiko (pandasauce) wrote :

A couple of other behaviours I have observed now that may be useful for figuring this one out:

1. When this monitor (Benq EX2780Q) goes to standby, it stops showing signs of life on the signal line. This manifests as it disappearing completely from the system, including the sound output and the display device. Must be a power saving feature. I have not seen this on other monitors I've owned, so I thought this may be a contributing factor. Still, this does not result in the "fail to wake" issue on the XPS 9560 and on my Windows desktop connected to another input, so I don't think it is the monitor's fault.

2. When 2 external monitors are connected (not my original post's use case), Benq EX2780Q and LG 24MP57, and I power cycle the Benq for it to wake up, the LG 24MP57 goes into a slow loop where it blanks for a second, comes back for about 20 seconds, blanks for a second, repeat. If I get lucky and everything boots up on first attempt, without power cycling the Benq, this does not happen - which is why I thought it may be related.

Revision history for this message
koba (kobako) wrote :

@Georgi, would you please try the latest drm-tip (amd64)?
https://kernel.ubuntu.com/~kernel-ppa/mainline/drm-tip/current/

Revision history for this message
Georgi Boiko (pandasauce) wrote :

@koba, the latest drm-tip triggers it more reliably. When I lock the screen, the issue triggers every single time after the external monitor goes into standby, and it needs to be power-cycled to wake up.

Revision history for this message
Georgi Boiko (pandasauce) wrote :

@koba, would it be better to move this issue to https://gitlab.freedesktop.org/drm/intel/-/issues, or is this more likely related to Thunderbolt in this generation of XPS going to Titan Ridge?

Revision history for this message
Georgi Boiko (pandasauce) wrote :

No change in linux-oem-5.13 (5.13.0-1009-oem), issue still applies as described.

Revision history for this message
Georgi Boiko (pandasauce) wrote :

Latest BIOS 1.8.1, TBT firmware v65, TB16 firmware 1.0.5 and OEM kernel 5.13 still affected

Revision history for this message
Georgi Boiko (pandasauce) wrote :

Latest BIOS 1.9.1, TBT firmware v65, TB16 firmware 1.0.5 and kernel 5.13.0-1012-oem still affected

Revision history for this message
koba (kobako) wrote :

@Georgi, would you please try 5.15? Please get the amd64 generic files and the headers file with the "all" suffix.
https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.15-rc2/

Revision history for this message
Georgi Boiko (pandasauce) wrote :

@koba, the issue is still present on that one. Logging seems borked in that RC; it only started ~1s into boot.

In the attached log,

1. the affected monitor (BenQ) "went away" after logging in at boot
2. came back after getting power cycled
3. suspended and woke fine on first Super+L
4. went away again on second Super+L
5. came back after getting power cycled

The second monitor (LG) suspended and woke up fine in each case.

This is with:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash i915.enable_psr=0 i915.modeset=1 nvidia_drm.modeset=1 usbcore.autosuspend=-1 drm.debug=0x10e"
GRUB_CMDLINE_LINUX="net.ifnames=0 biosdevname=0 ipv6.disable=1"

The PSR=0 flag is needed to avoid random hard freezes on this laptop model (unrelated issue even with Windows).
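When swapping kernels this often, it is easy to boot one without the intended parameters. The sketch below compares a kernel command line against the values listed above; `parse_cmdline` and `missing_or_different` are hypothetical helpers, and reading /proc/cmdline assumes a running Linux system.

```python
import shlex


def parse_cmdline(cmdline):
    """Split a kernel command line into {parameter: value or None}."""
    params = {}
    for token in shlex.split(cmdline):
        key, sep, value = token.partition("=")
        params[key] = value if sep else None  # bare flags such as "quiet"
    return params


def missing_or_different(expected, cmdline):
    """Return {parameter: (wanted, actual)} for every mismatch."""
    actual = parse_cmdline(cmdline)
    return {
        key: (wanted, actual.get(key))
        for key, wanted in expected.items()
        if actual.get(key) != wanted
    }


# The GRUB_CMDLINE_LINUX_DEFAULT value quoted in this comment.
REPORTED = (
    "quiet splash i915.enable_psr=0 i915.modeset=1 nvidia_drm.modeset=1 "
    "usbcore.autosuspend=-1 drm.debug=0x10e"
)

if __name__ == "__main__":
    expected = {k: v for k, v in parse_cmdline(REPORTED).items() if v is not None}
    with open("/proc/cmdline") as handle:
        print(missing_or_different(expected, handle.read()))
```

An empty dictionary means the running kernel was booted with every valued parameter from the list above.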

Revision history for this message
Georgi Boiko (pandasauce) wrote :

If that would help, I could run the same scenario on a 9560 laptop to maybe see what's different in the logs?

The 9560 does not have this problem, although it comes with a previous-gen Intel GPU, HD630 vs UHD630 on the 9500.

Revision history for this message
koba (kobako) wrote :

@Georgi, could you try these:
1. prime-select intel and reboot, check the issue.
2. prime-select nvidia and reboot, check the issue.
please remove "i915.enable_psr=0 i915.modeset=1 nvidia_drm.modeset=1" from cmdline

Revision history for this message
Timo Aaltonen (tjaalton) wrote :

let's just focus on stock linux for this

Changed in linux-oem-5.10 (Ubuntu):
status: New → Won't Fix
Changed in linux-meta-oem-5.6 (Ubuntu):
status: New → Invalid
Changed in linux-oem-5.13 (Ubuntu):
status: New → Won't Fix
Changed in linux-oem-5.14 (Ubuntu):
status: New → Won't Fix
Revision history for this message
Georgi Boiko (pandasauce) wrote :

@koba

nvidia module won't even build against oem kernel 5.14, so I can't test that. Removing i915.enable_psr=0 is not really a supported configuration on this machine. Keeping PSR on causes it to occasionally hard freeze after hours of work.

tbh after all this time I've lost any hope of this getting fixed and any will to continue trying different configurations. It must be something very non-trivial if we haven't even been able to pin this down after 8 months of back and forth with all the debug logs switched on, so I am thinking of just ditching this laptop and getting a Mac instead.

Revision history for this message
Georgi Boiko (pandasauce) wrote :

Got a newer nvidia driver (470) to work with this kernel over the holidays.

The issue still applies, regardless of the GPU selected in prime-select.

I started getting new ACPI errors with it, which could be unrelated (empty lines are in actual output):

[ 196.405777] ACPI Error: Thread 25047680 cannot release Mutex [ECMX] acquired by thread 14745600 (20210604/exmutex-378)

[ 196.405804] No Local Variables are initialized for Method [_Q66]

[ 196.405809] No Arguments are initialized for method [_Q66]

[ 196.405815] ACPI Error: Aborting method \_SB.PCI0.LPCB.ECDV._Q66 due to previous error (AE_AML_NOT_OWNER) (20210604/psparse-529)
[ 196.426883] input: U as /devices/pci0000:00/0000:00:1c.0/0000:02:00.0/0000:03:04.0/0000:39:00.0/0000:3a:04.0/0000:3c:00.0/0000:3d:01.0/0000:3e:00.0/usb5/5-1/5-1.1/5-1.1.4/5-1.1.4:1.0/0003:2A7A:4103.000F/input/input44
[ 196.483631] hid-generic 0003:2A7A:4103.000F: input,hidraw4: USB HID v1.10 Keyboard [U] on usb-0000:3e:00.0-1.1.4/input0
[ 196.489779] input: U Consumer Control as /devices/pci0000:00/0000:00:1c.0/0000:02:00.0/0000:03:04.0/0000:39:00.0/0000:3a:04.0/0000:3c:00.0/0000:3d:01.0/0000:3e:00.0/usb5/5-1/5-1.1/5-1.1.4/5-1.1.4:1.1/0003:2A7A:4103.0010/input/input45
[ 196.547646] input: U System Control as /devices/pci0000:00/0000:00:1c.0/0000:02:00.0/0000:03:04.0/0000:39:00.0/0000:3a:04.0/0000:3c:00.0/0000:3d:01.0/0000:3e:00.0/usb5/5-1/5-1.1/5-1.1.4/5-1.1.4:1.1/0003:2A7A:4103.0010/input/input46
[ 196.547809] hid-generic 0003:2A7A:4103.0010: input,hidraw5: USB HID v1.10 Device [U] on usb-0000:3e:00.0-1.1.4/input1
[ 196.563812] ACPI Error: Thread 24988608 cannot release Mutex [ECMX] acquired by thread 14745600 (20210604/exmutex-378)

[ 196.563851] No Local Variables are initialized for Method [_Q66]

[ 196.563859] No Arguments are initialized for method [_Q66]

[ 196.563869] ACPI Error: Aborting method \_SB.PCI0.LPCB.ECDV._Q66 due to previous error (AE_AML_NOT_OWNER) (20210604/psparse-529)
