TB16 dock freezes X on hotplug when used with external displays

Bug #1752165 reported by Georgi Boiko
This bug affects 11 people
Affects          Status         Importance   Assigned to   Milestone
Dell Sputnik     Fix Released   Undecided    Unassigned
linux (Ubuntu)   Fix Released   Undecided    Unassigned

Bug Description

Precision 5520 with Quadro GPU. Latest Ubuntu 16.04, kernel Linux REDACTED 4.13.0-36-generic #40~16.04.1-Ubuntu SMP Fri Feb 16 23:25:58 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux, latest BIOS 1.7.0 released 12/15/2017. If it can be of any help, I can get this tested on an XPS 9560 (GTX 1050) too.

Steps to reproduce:

1. Plug monitors into the DP and miniDP ports on the dock. In my case, DP-to-DVI cables are used, but I doubt that it matters.
2. Ubuntu 16.04 GNOME + nVidia drivers with PRIME.
3. Select nVidia GPU in PRIME settings, reboot if needed.
4. Either boot with the dock connected or connect it after booting into user session.
5. Set up monitors if necessary. At this point everything should be working fine.
6. Disconnect the dock, give it a moment to adjust to the new window layout etc.
7. Re-connect the dock. Within a few seconds laptop screen should freeze and external monitors should remain blank, as if not connected.
8. Disconnect the dock. Within a few seconds laptop screen should unfreeze. There is a small chance that it won't if you repeat this procedure multiple times.

Errors in dmesg indicate that i915 driver is somehow involved:

[ 328.966128] [drm:intel_dp_set_idle_link_train [i915]] *ERROR* Timed out waiting for DP idle patterns
[ 328.967848] [drm:intel_wait_ddi_buf_idle [i915]] *ERROR* Timeout waiting for DDI BUF B idle bit
[ 329.048839] [drm:intel_dp_set_idle_link_train [i915]] *ERROR* Timed out waiting for DP idle patterns
[ 329.050547] [drm:intel_wait_ddi_buf_idle [i915]] *ERROR* Timeout waiting for DDI BUF C idle bit

The kernel logs are flooded with these between steps 7 and 8 above.
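
To put a number on the flood, the errors can be tallied per kernel function. This is only a rough sketch: the function name `summarize_i915_errors` and the regular expression are illustrative assumptions, and the sample text is the four lines quoted above. On a live system the input would be saved `dmesg` output.

```python
import re
from collections import Counter

# Matches lines such as:
# [ 328.966128] [drm:intel_dp_set_idle_link_train [i915]] *ERROR* Timed out waiting for DP idle patterns
I915_ERROR = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+\[drm:(?P<func>\w+) \[i915\]\] \*ERROR\* (?P<msg>.+)$"
)


def summarize_i915_errors(dmesg_text):
    """Count i915 *ERROR* lines per function; return (counts, first_ts, last_ts)."""
    counts = Counter()
    timestamps = []
    for line in dmesg_text.splitlines():
        match = I915_ERROR.match(line.strip())
        if match:
            counts[match.group("func")] += 1
            timestamps.append(float(match.group("ts")))
    if not timestamps:
        return counts, None, None
    return counts, min(timestamps), max(timestamps)


# Sample copied from the dmesg excerpt in this report.
SAMPLE = """\
[ 328.966128] [drm:intel_dp_set_idle_link_train [i915]] *ERROR* Timed out waiting for DP idle patterns
[ 328.967848] [drm:intel_wait_ddi_buf_idle [i915]] *ERROR* Timeout waiting for DDI BUF B idle bit
[ 329.048839] [drm:intel_dp_set_idle_link_train [i915]] *ERROR* Timed out waiting for DP idle patterns
[ 329.050547] [drm:intel_wait_ddi_buf_idle [i915]] *ERROR* Timeout waiting for DDI BUF C idle bit
"""

counts, first_ts, last_ts = summarize_i915_errors(SAMPLE)
for func, count in sorted(counts.items()):
    print(func, count)
```

Comparing the first and last timestamps against the moments the dock was connected and disconnected should show whether the errors are confined to the window between steps 7 and 8.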

This behaviour does not occur when Intel GPU is selected in PRIME settings or when nouveau is used. However, nouveau performance leaves a lot to be desired, particularly with 3D acceleration in Windows 10 VMs.

I have tried nvidia_drm.modeset=1 in boot options, but it doesn't make any difference.
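
For anyone comparing setups, a minimal sketch for checking that the option actually reached the kernel command line. The helper name `parse_kernel_cmdline` is an illustrative assumption, and the sample string is the ProcKernelCmdLine value from the apport data in this report. Finding the option here only shows it was passed at boot, not that the driver acted on it.

```python
import shlex


def parse_kernel_cmdline(cmdline):
    """Return a dict of kernel parameters; bare flags such as 'quiet' map to None."""
    params = {}
    for token in shlex.split(cmdline):
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


# Command line copied from the ProcKernelCmdLine field of this report.
REPORTED_CMDLINE = (
    "BOOT_IMAGE=/vmlinuz-4.13.0-36-generic root=/dev/mapper/ubuntu--gnome--vg-root ro "
    "ipv6.disable=1 net.ifnames=0 biosdevname=0 nvidia_drm.modeset=1 quiet splash vt.handoff=7"
)

params = parse_kernel_cmdline(REPORTED_CMDLINE)
print("nvidia_drm.modeset =", params.get("nvidia_drm.modeset"))
```

On a running system the same function can be fed the contents of /proc/cmdline instead of the hard-coded string.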

Other reports that may be related:

https://devtalk.nvidia.com/default/topic/989704/linux/plugging-a-docking-station-w-two-monitors-into-a-quadro-m1000m-laptop-crashes-hangs-the-whole-machine/ - same error messages

https://www.reddit.com/r/Dell/comments/5nas3t/tb16_dock_with_5510_ubuntulinux/ddzjlu8/?st=je63h7lw&sh=40bf8189 - "Connecting and disconnecting the TB16 sometimes freeze the laptop."

https://www.dell.com/community/Sputnik/TB16-Dock-Linux-Support/m-p/5109128/highlight/true#M7466 - "Hot-plugging the screen does not work."
---
ApportVersion: 2.20.1-0ubuntu2.15
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: gboiko 2491 F.... pulseaudio
CurrentDesktop: GNOME
DistroRelease: Ubuntu 16.04
HibernationDevice: RESUME=UUID=08bba264-8658-442f-995d-6a745925ac6c
InstallationDate: Installed on 2017-10-01 (149 days ago)
InstallationMedia: Ubuntu-GNOME 16.04.3 LTS "Xenial Xerus" - Release amd64 (20170801)
Lsusb:
 Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
 Bus 001 Device 003: ID 04f3:24a1 Elan Microelectronics Corp.
 Bus 001 Device 002: ID 8087:0a2b Intel Corp.
 Bus 001 Device 004: ID 1bcf:2b95 Sunplus Innovation Technology Inc.
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
MachineType: Dell Inc. Precision 5520
NonfreeKernelModules: nvidia_uvm nvidia_drm nvidia_modeset nvidia
Package: linux (not installed)
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.13.0-36-generic root=/dev/mapper/ubuntu--gnome--vg-root ro ipv6.disable=1 net.ifnames=0 biosdevname=0 nvidia_drm.modeset=1 quiet splash vt.handoff=7
ProcVersionSignature: Ubuntu 4.13.0-36.40~16.04.1-generic 4.13.13
RelatedPackageVersions:
 linux-restricted-modules-4.13.0-36-generic N/A
 linux-backports-modules-4.13.0-36-generic N/A
 linux-firmware 1.170
Tags: xenial
Uname: Linux 4.13.0-36-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dialout dip lpadmin plugdev sambashare sudo
_MarkForUpload: True
dmi.bios.date: 12/15/2017
dmi.bios.vendor: Dell Inc.
dmi.bios.version: 1.7.0
dmi.board.name: 0R6JFH
dmi.board.vendor: Dell Inc.
dmi.board.version: A00
dmi.chassis.type: 10
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvr1.7.0:bd12/15/2017:svnDellInc.:pnPrecision5520:pvr:rvnDellInc.:rn0R6JFH:rvrA00:cvnDellInc.:ct10:cvr:
dmi.product.family: Precision
dmi.product.name: Precision 5520
dmi.sys.vendor: Dell Inc.

Georgi Boiko (pandasauce) wrote :

Xorg logs

Georgi Boiko (pandasauce) wrote :

xrandr output

Georgi Boiko (pandasauce) wrote :

lspci output

Georgi Boiko (pandasauce) wrote :

lsusb output

description: updated
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1752165

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: artful
Georgi Boiko (pandasauce) wrote : apport information

Attached files: AlsaInfo.txt, CRDA.txt, CurrentDmesg.txt, IwConfig.txt, JournalErrors.txt, Lspci.txt, ProcCpuinfo.txt, ProcCpuinfoMinimal.txt, ProcEnviron.txt, ProcInterrupts.txt, ProcModules.txt, PulseList.txt, RfKill.txt, UdevDb.txt, WifiSyslog.txt

tags: added: apport-collected
description: updated

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Georgi Boiko (pandasauce) wrote :

Tested this on XPS 15 9560 (GTX 1050) and it does not have this problem, hotplugging works fine on it. I was using this very setup - just plugged the thunderbolt cable into a different laptop. Exact same drivers and the same linux-firmware package from upstream (1.170) that I have installed for WiFi and Bluetooth reasons.

Added the apport logs as requested.

Kai-Heng Feng (kaihengfeng) wrote :

Can you try the latest Bionic daily image?

Also, do your Precision 5520 and XPS 15 9560 have the same display resolution?

Georgi Boiko (pandasauce) wrote :

5520 native resolution is 2160p, 9560 is 1080p. Both are configured to run 1080p through GNOME settings.

I will try out Bionic tonight or on Monday at the latest - I don't have it at hand at the moment.

Georgi Boiko (pandasauce) wrote :

Spent this evening attempting to get Bionic to work, with no success. The installer crashes halfway (#1752535), so the only way to test it was to hot-swap nouveau for nvidia while running in a text-mode VT from a LiveUSB. However, prime-select is not working on Bionic: it shows "unknown" on "query" and complains about a lack of alternatives on "prime-select nvidia". Consequently, X won't start after unloading nouveau and replacing it with nvidia.

Is there any alternative test I could run that does not require a working Bionic installer?

Georgi Boiko (pandasauce) wrote :

I did some additional testing to see if this is thunderbolt in general or just the TB16. The issue does not occur when using external monitors via HDMI + thunderbolt-to-HDMI connections directly on the laptop. It looks like the issue is indeed specific to TB16.

Kai-Heng Feng (kaihengfeng) wrote :

Have the two external monitors remained the same during your different setups?

Georgi Boiko (pandasauce) wrote :

Different monitors used in #26, same monitors in all other cases. Hopefully Bionic installer will be fixed in the .4 release and I will be able to try it out.

Possibly helpful: I was trying out 4.15 kernels from xenial-proposed for another ticket and the issue remains the same on 4.15. nVidia drivers have gone up to 390.25 since the original post too and the issue remains.

Piotr Kołaczkowski (pkolaczk-u) wrote :

I have the same issue. Tried today on a daily Ubuntu 18.10 build run from a live USB.
The system boots fine or resumes fine with the TB16 dock attached. It also survives disconnecting the dock.

But it does not survive hotplugging the dock - I get a freeze where I can only move the mouse pointer, but the desktop doesn't respond. External monitors are blank and go into powersave mode.

Tested on: Ubuntu 18.04 with the official kernel (I don't remember exact version, but it was a week ago I tried it), then also with 4.18.6, 4.18.7 and 4.19.rc3. All kernels, same problem.

I have recent Dell Precision 5520 BIOS and fully updated firmware of the TB16 dock (1.0.0).

Piotr Kołaczkowski (pkolaczk-u) wrote :

Looks like it got solved at least for me on recent Ubuntu 18.10 with the official kernel. Looks like now hotplugging Dell TB16 dock works fine on Precision 5520.

Kai-Heng Feng (kaihengfeng) wrote :

Thanks for your update!

Georgi Boiko (pandasauce) wrote :

I can confirm this is fixed on the combination of:

- mainline kernel 4.18.16
- installed on 18.04 LTS via UKUU
- running nvidia 390.77

The issue still occurs on 4.15.0-36, which is the latest kernel available in Bionic repositories. We won't be getting a "clean" 4.18 until HWE update in February 2019 (3 months to go).

Christoph (chrisdeath) wrote :

I tried Piotr's setup and it worked only sometimes:

- Ubuntu 18.10 default install
- Kernel 4.18.10
- Dell Precision 5530 (similar XPS 15 9570)

Xorg freezes, with udev spamming the following messages, when I remove the dock:
KERNEL[1058.705239] change /devices/pci0000:00/0000:00:02.0/drm/card0 (drm)
UDEV [1058.705254] change /devices/pci0000:00/0000:00:02.0/drm/card0 (drm)
KERNEL[1058.705267] change /devices/pci0000:00/0000:00:02.0/drm/card0 (drm)
UDEV [1058.705280] change /devices/pci0000:00/0000:00:02.0/drm/card0 (drm)
KERNEL[1058.705291] change /devices/pci0000:00/0000:00:02.0/drm/card0 (drm)
UDEV [1058.705305] change /devices/pci0000:00/0000:00:02.0/drm/card0 (drm)
KERNEL[1058.705317] change /devices/pci0000:00/0000:00:02.0/drm/card0 (drm)
UDEV [1058.705330] change /devices/pci0000:00/0000:00:02.0/drm/card0 (drm)

The system can only be rescued by being very patient and putting it to sleep (which takes a while...). Afterwards it recovers.
Additionally, the USB ports never work again once the dock has been disconnected.

In Manjaro GNOME or Cinnamon the udev issue occurs every time :(
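
To gauge how fast these events arrive, here is a rough sketch that parses saved `udevadm monitor` output. The name `summarize_events` and the regular expression are illustrative assumptions, and the sample is the excerpt quoted above.

```python
import re

# Matches `udevadm monitor` lines such as:
# KERNEL[1058.705239] change /devices/pci0000:00/0000:00:02.0/drm/card0 (drm)
EVENT = re.compile(
    r"^(?P<source>KERNEL|UDEV)\s*\[(?P<ts>\d+\.\d+)\]\s+(?P<action>\w+)\s+"
    r"(?P<devpath>\S+)\s+\((?P<subsystem>[^)]+)\)$"
)


def summarize_events(monitor_text, source="KERNEL"):
    """Return (count, span_seconds) for events reported by one source (KERNEL or UDEV)."""
    stamps = []
    for line in monitor_text.splitlines():
        match = EVENT.match(line.strip())
        if match and match.group("source") == source:
            stamps.append(float(match.group("ts")))
    if not stamps:
        return 0, 0.0
    return len(stamps), max(stamps) - min(stamps)


# Sample copied from the udev output quoted in this comment.
SAMPLE = """\
KERNEL[1058.705239] change /devices/pci0000:00/0000:00:02.0/drm/card0 (drm)
UDEV [1058.705254] change /devices/pci0000:00/0000:00:02.0/drm/card0 (drm)
KERNEL[1058.705267] change /devices/pci0000:00/0000:00:02.0/drm/card0 (drm)
UDEV [1058.705280] change /devices/pci0000:00/0000:00:02.0/drm/card0 (drm)
KERNEL[1058.705291] change /devices/pci0000:00/0000:00:02.0/drm/card0 (drm)
UDEV [1058.705305] change /devices/pci0000:00/0000:00:02.0/drm/card0 (drm)
KERNEL[1058.705317] change /devices/pci0000:00/0000:00:02.0/drm/card0 (drm)
UDEV [1058.705330] change /devices/pci0000:00/0000:00:02.0/drm/card0 (drm)
"""

count, span = summarize_events(SAMPLE)
print(f"{count} kernel events within {span * 1e6:.0f} microseconds")
```

In the excerpt the kernel change events are only tens of microseconds apart, which fits the description of a flood rather than ordinary hotplug activity.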

Christoph (chrisdeath) wrote :

Small addition:
With the mentioned kernel 4.18.16-041816-generic I get the same result, maybe even more often, and also while plugging in the dock. But that may also happen with 4.18.10.

Georgi Boiko (pandasauce) wrote :

@Christoph,

That looks like an entirely different issue with different errors. You should create a separate bug report for it.

tags: added: bionic
Christoph (chrisdeath) wrote :

Hi Georgi

Hmm, strange, as it has the same steps to reproduce and fills dmesg with the same messages:
[ 1294.752820] [drm:intel_ddi_prepare_link_retrain [i915]] *ERROR* Timeout waiting for DDI BUF B idle bit
[ 1294.880066] [drm:intel_dp_start_link_train [i915]] *ERROR* Timed out waiting for DP idle patterns
[ 1294.888072] [drm:intel_ddi_prepare_link_retrain [i915]] *ERROR* Timeout waiting for DDI BUF B idle bit
[ 1295.015471] [drm:intel_dp_start_link_train [i915]] *ERROR* Timed out waiting for DP idle patterns
[ 1295.017266] [drm:intel_ddi_prepare_link_retrain [i915]] *ERROR* Timeout waiting for DDI BUF B idle bit

So what does "entirely" mean here?

Georgi Boiko (pandasauce) wrote :

@Christoph,

Your previous post referred to a different error. If i915 drm errors are what you are actually getting please ignore me :)

Christoph (chrisdeath) wrote :

No problem,
they just come along with each other... so "DP idle" errors in dmesg and "change /devices/pci0000:00/0000:00:02.0/drm/card0" with "$ udevadm monitor".

I think it has something to do with the adding and removing of the additional DP4 to DP6 connectors (udev reports this). This seems to trigger something in X that interferes with the i915 link-training preparation :(. And as the nvidia card goes through the Intel one, it is affected too. But I am not that familiar with this subject :(

Shaun Crampton (fasaxc) wrote :

Still seems to be an issue; I tried 4.18.16 on 18.10 and it didn't help (nor did the most recent 4.19 kernel in UKUU).

Shaun Crampton (fasaxc) wrote :

Following kernel seems to fix the issue (running on 18.10 with kernel installed by UKUU):

4.20.0-042000-generic #201812232030 SMP Mon Dec 24 01:32:58 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

I still see one or two of these errors when hotplugging but the others are gone and the external screen works:

[drm:intel_mst_pre_enable_dp [i915]] *ERROR* failed to allocate vcpi

Slava Koyfman (slavakrl) wrote :

I'm also experiencing this issue on a Thinkpad T480 (20L5) connecting to a ThinkPad Thunderbolt 3 Dock. It worked fine on Ubuntu 18.04 LTS until about a week ago, when I started experiencing this issue and the bug in https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1813663 more or less simultaneously.

The latter issue was resolved by downgrading to kernel 4.15.0-43, and later upgrading to 4.15.0-45.

This one remained a problem even after upgrading to Ubuntu 18.10 and kernel 4.18.0-14. I can confirm that installing kernel 4.20.6-042006 via Ukuu fixes it.

Georgi Boiko (pandasauce) wrote :

We've reached February now and HWE kernel on 18.04.1 (4.18.0-14-generic) with nvidia-390.77 works fine, no need for running mainline through Ukuu.

Changed in dell-sputnik:
status: New → Fix Released
Changed in linux (Ubuntu):
status: Confirmed → Fix Released
Slava Koyfman (slavakrl) wrote :

As I said, I was still experiencing this issue on 4.18.0-14-generic. However, my laptop doesn't use Nvidia graphics; it's running on the Intel UHD 620.

Georgi Boiko (pandasauce) wrote :

@Slava, this ticket was specifically related to issues when using nvidia drivers on machines with nvidia GPUs. If you are experiencing this on an Intel-only machine, please raise a new one.

Tim Serowski (palmino) wrote :

The problem still persists for me with Kernel 4.18.0-15-generic on Ubuntu 18.10.

I'm using a Lenovo Thinkpad T480 with a Nvidia MX150.

Shaun Crampton (fasaxc) wrote :

I tried the latest kernel on 18.10 but still see the issue: 4.18.0-17. Dell Precision 5530, Intel+NVIDIA graphics with Intel graphics selected.

I'm also using the workaround from https://bugs.freedesktop.org/show_bug.cgi?id=109675, which prevents the screen from "shuffling" back and forth after plugging in the dock.

Shaun Crampton (fasaxc) wrote :

Another data point: kernel 5.0.4 from UKUU doesn't show the issue.

Georgi Boiko (pandasauce) wrote :

Started seeing this again on nvidia-396.54 with 4.18.0-16 and 4.18.0-17, possibly 4.18.0-15 - I don't recall having hot-plugged it on -15.

It's worse now: the system never recovers any of the displays after hot-plugging, and even after removing the dock following a hot-plug the built-in laptop display remains blank. There is no error spam in the kernel logs either. In fact, USB devices seem to recover fine after hot-plugging, just not the monitors.

Once I have a chance I will try out kernel 5.0 and see if it's any different. Since this is potentially a new issue/regression, do we need a new ticket?

Tim Serowski (palmino) wrote :

I can confirm the issue is fixed for me after upgrading to Kernel 5.0.7 from 4.18.0-17

Georgi Boiko (pandasauce) wrote :

Updating to 5.0.x does not fix it; it disables the nvidia driver because that kernel version is not supported in non-bleeding edge drivers yet. If you update to 5.0.x you will end up with nouveau/intel, depending on your blacklisting setup.

Tim Serowski (palmino) wrote :

No, my integrated MX150 with Nvidia driver 390.116 is working fine with 5.0.7. I'm using Ubuntu 18.10.

Georgi Boiko (pandasauce) wrote :

Thanks for tipping me off. Support for 5.x kernels was added in 390.>=116 (legacy branch) and the 418.>=43 (bleeding edge) drivers, but not to the current stable or any other branch, so it was failing for me.

Mainline 5.0.8 with 418.56 also works fine, so it's "just" the LTS kernel branches that have the bug. Back to running mainline Frankenbuntu on prod I suppose.
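
The branch rule described in this comment can be written down as a small check. This is a sketch only: the thresholds come solely from the comment above (390 branch at 116 or later, 418 branch at 43 or later), not from NVIDIA documentation, and the names `MIN_RELEASE_FOR_5X` and `driver_supports_5x` are hypothetical.

```python
# Minimum release per driver branch that builds against 5.x kernels.
# Assumption: values are taken only from the comment above, not from NVIDIA release notes.
MIN_RELEASE_FOR_5X = {390: 116, 418: 43}


def driver_supports_5x(version):
    """Return True if a driver version string such as '390.116' meets the thresholds above."""
    parts = version.split(".")
    branch, release = int(parts[0]), int(parts[1])
    minimum = MIN_RELEASE_FOR_5X.get(branch)
    return minimum is not None and release >= minimum


# Driver versions mentioned in this thread.
for version in ("390.77", "390.116", "396.54", "418.56"):
    print(version, driver_supports_5x(version))
```

Any branch not listed in the table is treated as unsupported, which mirrors the statement that no other branch had received 5.x support at the time of writing.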
