thinkpad thunderbolt 3 dock gen2 with pci memory allocation errors on Yoga C940 unless plugged in before boot

Bug #1860284 reported by Benoit Grégoire
This bug affects 3 people
Affects          Status      Importance   Assigned to   Milestone
Linux            Confirmed   Medium
linux (Ubuntu)   Confirmed   Undecided    Unassigned

Bug Description

I have a ThinkPad Thunderbolt 3 Dock Gen 2 that I am trying to use with a new Lenovo Yoga C940 laptop.

- The dock works fine when plugged-in before boot.
- The dock does NOT work when plugged after the system booted.
- The dock does NOT work when plugged-in at boot, subsequently unplugged and plugged back in.

When it fails, it fails with memory allocation messages such as:

[ 342.507320] pci 0000:2b:00.0: BAR 14: no space for [mem size 0x0c200000]
[ 342.507323] pci 0000:2b:00.0: BAR 14: failed to assign [mem size 0x0c200000]
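
A minimal sketch for pulling these failures out of a saved dmesg log, assuming the message format shown above. The function name `parse_bar_failures` is made up for illustration and is not part of any tool.

```python
import re

# Matches lines such as:
# [ 342.507320] pci 0000:2b:00.0: BAR 14: no space for [mem size 0x0c200000]
BAR_FAILURE = re.compile(
    r"pci (?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"BAR (?P<bar>\d+): (?:no space for|failed to assign) "
    r"\[(?P<kind>mem|io) size (?P<size>0x[0-9a-f]+)"
)


def parse_bar_failures(dmesg_text):
    """Return (device, bar, kind, size_in_bytes) for each failure line."""
    failures = []
    for line in dmesg_text.splitlines():
        match = BAR_FAILURE.search(line)
        if match:
            failures.append((
                match["dev"],
                int(match["bar"]),
                match["kind"],
                int(match["size"], 16),
            ))
    return failures


sample = """\
[ 342.507320] pci 0000:2b:00.0: BAR 14: no space for [mem size 0x0c200000]
[ 342.507323] pci 0000:2b:00.0: BAR 14: failed to assign [mem size 0x0c200000]
"""

for dev, bar, kind, size in parse_bar_failures(sample):
    # 0x0c200000 bytes is 194 MiB
    print(f"{dev} BAR {bar}: {kind} window of {size // (1 << 20)} MiB not assigned")
```

The same pattern also accepts `io` failures, so it can be run over a full log to list every device that could not get a window.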

Things I tried:
- Kernel mainline 5.4.12, same symptoms
- Kernel mainline 5.5-rc6, same symptoms
- Plugging it after powering up the laptop, but at the grub screen before boot. In this case the dock works fine after boot.

Other potentially useful information to narrow it down:

- The tests were done with only an ethernet cable and power plugged into the dock to minimize the number of moving parts...
- Dock and laptop both have the very latest firmware as of 2020-01-17.
- The DisplayPort part of the dock always works, but all other ports (USB, ethernet, card readers) fail when the dock is plugged in after boot.
- Doesn't seem to be a thunderbolt authorization problem:
benoitg@benoitg-Yoga-C940:~$ boltctl
 ● Lenovo ThinkPad Thunderbolt 3 Dock
   ├─ type: peripheral
   ├─ name: ThinkPad Thunderbolt 3 Dock
   ├─ vendor: Lenovo
   ├─ uuid: 001730c5-7042-0801-ffff-ffffffffffff
   ├─ status: authorized
   │ ├─ domain: c06e823d-af8a-8680-ffff-ffffffffffff
   │ └─ authflags: none
   ├─ authorized: Sun Jan 19 17:41:04 2020
   ├─ connected: Sun Jan 19 17:41:04 2020
   └─ stored: Thu Jan 16 07:27:43 2020
      ├─ policy: iommu
      └─ key: no

ProblemType: Bug
DistroRelease: Ubuntu 19.10
Package: linux-image-5.3.0-26-generic 5.3.0-26.28
ProcVersionSignature: Ubuntu 5.3.0-26.28-generic 5.3.13
Uname: Linux 5.3.0-26-generic x86_64
ApportVersion: 2.20.11-0ubuntu8.2
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: benoitg 1182 F.... pulseaudio
CurrentDesktop: KDE
Date: Sun Jan 19 12:38:17 2020
InstallationDate: Installed on 2020-01-16 (3 days ago)
InstallationMedia: Kubuntu 19.10 "Eoan Ermine" - Release amd64 (20191017)
MachineType: LENOVO 81Q9
ProcFB: 0 i915drmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.3.0-26-generic root=UUID=078b76d6-6b72-4de4-9e10-f6ea33d9bc1a ro
RelatedPackageVersions:
 linux-restricted-modules-5.3.0-26-generic N/A
 linux-backports-modules-5.3.0-26-generic N/A
 linux-firmware 1.183.3
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 08/22/2019
dmi.bios.vendor: LENOVO
dmi.bios.version: AUCN45WW
dmi.board.asset.tag: NO Asset Tag
dmi.board.name: LNVNB161216
dmi.board.vendor: LENOVO
dmi.board.version: SDK0J40709 WIN
dmi.chassis.asset.tag: NO Asset Tag
dmi.chassis.type: 31
dmi.chassis.vendor: LENOVO
dmi.chassis.version: Lenovo Yoga C940-14IIL
dmi.modalias: dmi:bvnLENOVO:bvrAUCN45WW:bd08/22/2019:svnLENOVO:pn81Q9:pvrLenovoYogaC940-14IIL:rvnLENOVO:rnLNVNB161216:rvrSDK0J40709WIN:cvnLENOVO:ct31:cvrLenovoYogaC940-14IIL:
dmi.product.family: Yoga C940-14IIL
dmi.product.name: 81Q9
dmi.product.sku: LENOVO_MT_81Q9_BU_idea_FM_Yoga C940-14IIL
dmi.product.version: Lenovo Yoga C940-14IIL
dmi.sys.vendor: LENOVO

Revision history for this message
Benoit Grégoire (benoitg) wrote :
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
summary: - thinkpad thunderbolt 3 dock gen2 does not work on Yoga C940 unless
- plugged in before boot
+ thinkpad thunderbolt 3 dock gen2 with pci memory allocation errors on
+ Yoga C940 unless plugged in before boot
Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote :

Please test latest mainline kernel:
https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.5/

Revision history for this message
Benoit Grégoire (benoitg) wrote :

With kernel mainline 5.5.1, it does get a little bit further, only BAR 14 and 0 fail to assign on the second try. See attached dmesg. But symptoms are still identical: no peripherals work except for video.

Any way I can systematically search for a workaround using pci= kernel parameters?

Revision history for this message
Benoit Grégoire (benoitg) wrote :

Still the same on 5.5.2

Revision history for this message
In , benoitg (benoitg-linux-kernel-bugs) wrote :

Created attachment 287231
acpidump

I have a ThinkPad Thunderbolt 3 Dock Gen 2 that I am trying to use with a new Lenovo Yoga C940-14IIL laptop. The laptop is very recent hardware, with a 10th gen Intel CPU and a BIOS with very few options :(

- The dock works fine when plugged-in before boot.
- The dock does NOT work when plugged after the system booted.
- The dock does NOT work when plugged-in at boot, subsequently unplugged and plugged back in.
- The dock works fine in Windows, in all the above scenarios

When it fails, it fails with memory allocation messages such as:

[ 342.507320] pci 0000:2b:00.0: BAR 14: no space for [mem size 0x0c200000]
[ 342.507323] pci 0000:2b:00.0: BAR 14: failed to assign [mem size 0x0c200000]

Things I tried:
- Ubuntu kernel 5.3.0-26, same symptoms
- Kernel mainline 5.4.12, same symptoms
- Kernel mainline 5.5.2, same symptoms, but gets a little further allocating memory on the second pass.
- Plugging the dock after powering up the laptop, but at the grub screen before boot. In this case the dock works fine after boot.

Other potentially useful information to narrow it down:

- The tests were done with only an ethernet cable and power plugged into the dock to minimize the number of moving parts...

- Dock and laptop both have the very latest firmware as of 2020-02-07
cat /sys/bus/thunderbolt/devices/0-0/nvm_version
72.0
cat /sys/bus/thunderbolt/devices/0-3/nvm_version
50.0

- Unfortunately I cannot procure older firmware for the dock to know if the laptop or the dock is the source of the problem (As this dock was released over a year ago, and I cannot find any specific relevant problems with Linux)

- The screens connected to the DisplayPort outputs on the dock always work, but all other ports (USB, ethernet, sound) fail when the dock is plugged in after boot.

- Doesn't seem to be a thunderbolt authorization problem:
tbtadm devices
0-3 Lenovo ThinkPad Thunderbolt 3 Dock authorized not in ACL
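
For anyone comparing firmware levels, here is a small sketch that collects the `nvm_version` values quoted earlier in this comment for every Thunderbolt device at once. It assumes the sysfs layout shown there (`/sys/bus/thunderbolt/devices/<device>/nvm_version`); the helper name `read_nvm_versions` is hypothetical.

```python
from pathlib import Path


def read_nvm_versions(devices_dir="/sys/bus/thunderbolt/devices"):
    """Map each Thunderbolt device name to the contents of its nvm_version file."""
    versions = {}
    for version_file in sorted(Path(devices_dir).glob("*/nvm_version")):
        versions[version_file.parent.name] = version_file.read_text().strip()
    return versions


if __name__ == "__main__":
    # Prints nothing on a machine without Thunderbolt devices.
    for name, version in read_nvm_versions().items():
        print(f"{name}: nvm_version {version}")
```

The directory is a parameter so the function can be pointed at a copy of the sysfs tree taken from another machine.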

Originally reported to ubuntu in: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1860284

Revision history for this message
In , benoitg (benoitg-linux-kernel-bugs) wrote :

Created attachment 287233
mainline_5.5.2_notworking_dmesg_dock_plugged_after_boot

Revision history for this message
In , benoitg (benoitg-linux-kernel-bugs) wrote :

Created attachment 287235
mainline_5.5.2_notworking_lspci_vvvv_dock_plugged_after_boot

Revision history for this message
In , benoitg (benoitg-linux-kernel-bugs) wrote :

Created attachment 287237
mainline_5.5.2_notworking_lsusb_dock_plugged_after_boot

Revision history for this message
In , benoitg (benoitg-linux-kernel-bugs) wrote :

Created attachment 287239
mainline_5.5.2_working_dmesg_dock_plugged_before_boot

Revision history for this message
In , benoitg (benoitg-linux-kernel-bugs) wrote :

Created attachment 287241
mainline_5.5.2_working_lspci_vvvv_dock_plugged_before_boot

Revision history for this message
In , benoitg (benoitg-linux-kernel-bugs) wrote :

Created attachment 287243
mainline_5.5.2_working_lsusb_dock_plugged_before_boot

Revision history for this message
Benoit Grégoire (benoitg) wrote :
Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote :

Please test latest BIOS AUCN54WW.

Revision history for this message
Benoit Grégoire (benoitg) wrote :

Unfortunately, same results with BIOS AUCN54WW (manually installed, since it is offered by neither Lenovo System Update nor Windows Update). Same symptoms.

Same symptoms also with kernel mainline 5.5.3

For reference, dock firmware is V3.1.66

Revision history for this message
In , benoitg (benoitg-linux-kernel-bugs) wrote :

Still no luck on 5.5.4, and with an updated BIOS (AUCN54WW)

Is there any other information I could provide?

Revision history for this message
Benoit Grégoire (benoitg) wrote :

Still no luck with 5.5.4

Revision history for this message
In , nicholas.johnson-opensource (nicholas.johnson-opensource-linux-kernel-bugs) wrote :

Hi Benoit,

Please try Linux v5.6-rc2: https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.6-rc2/

I have seven patches directly relating to Thunderbolt PCI native enumeration in the v5.6 release, which may help.

In the future, please note that the output of "sudo lspci -xxxx", saved to a file, contains all the information we need and lets us run any lspci command against that file as if it were on your system, for example "lspci -F file -vt". I like to have -vt to get a feel for the topology, especially for Thunderbolt.
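
To illustrate why the hex dump is enough on its own, here is a rough sketch that reads the config-space bytes back out of `lspci -x` style text and decodes the vendor and device IDs (the first two little-endian 16-bit fields of PCI config space). The sample entry below is invented for the example and does not come from the attachments; `parse_lspci_hex_dump` is a hypothetical helper, and the sketch assumes plain `-x`/`-xxxx` output without `-v`.

```python
import re

# A hex row looks like "00: 86 80 34 12 ...": an offset, a colon, then bytes.
HEX_ROW = re.compile(r"^([0-9a-f]{2,3}):((?: [0-9a-f]{2})+)\s*$")


def parse_lspci_hex_dump(text):
    """Return {device header line: config-space bytes} from a hex dump."""
    devices = {}
    current = None
    for line in text.splitlines():
        row = HEX_ROW.match(line)
        if row and current is not None:
            devices[current] += bytes.fromhex(row.group(2).replace(" ", ""))
        elif line.strip():
            # Any other non-empty line starts a new device entry.
            current = line.strip()
            devices[current] = b""
    return devices


# Invented sample entry, for illustration only.
sample = """\
00:07.1 PCI bridge: hypothetical example device
00: 86 80 34 12 07 04 10 00 03 00 04 06 00 00 81 00
"""

for header, config in parse_lspci_hex_dump(sample).items():
    vendor = int.from_bytes(config[0:2], "little")
    device = int.from_bytes(config[2:4], "little")
    print(f"{header.split()[0]} -> {vendor:04x}:{device:04x}")
```

In practice "lspci -F file" does all of this decoding itself; the sketch only shows that nothing beyond the dump is required.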

Thanks for reporting.

Regards,
Nicholas

Revision history for this message
In , mika.westerberg (mika.westerberg-linux-kernel-bugs) wrote :

In case the v5.6-rcX kernel does not help, can you boot the system without device connected and attach 'sudo lspci -vv' and also full dmesg? It looks like the root port (07.1) gets misconfigured by Linux for some reason upon hotplug.

Another question, if you plug the device to another port, does it work any better? Can you attach 'sudo lspci -vv' and dmesg output of that run as well?

Revision history for this message
In , benoitg (benoitg-linux-kernel-bugs) wrote :

Created attachment 287469
mainline_5.6rc2_working_dmesg_dock_plugged_before_boot

Revision history for this message
In , benoitg (benoitg-linux-kernel-bugs) wrote :

Created attachment 287471
mainline_5.6rc2_working_lspci_xxxx_dock_plugged_before_boot

Revision history for this message
In , benoitg (benoitg-linux-kernel-bugs) wrote :

Created attachment 287473
mainline_5.6rc2_working_lspci_vt_dock_plugged_before_boot

Revision history for this message
In , benoitg (benoitg-linux-kernel-bugs) wrote :

Created attachment 287475
mainline_5.6rc2_reference_lspci_vv_dock_not_plugged

Revision history for this message
In , benoitg (benoitg-linux-kernel-bugs) wrote :

Created attachment 287477
mainline_5.6rc2_reference_dmesg_dock_not_plugged

Revision history for this message
In , benoitg (benoitg-linux-kernel-bugs) wrote :

Created attachment 287479
mainline_5.6rc2_notworking_lspci_vv_dock_plugged_second_port_after_boot

Revision history for this message
In , benoitg (benoitg-linux-kernel-bugs) wrote :

Created attachment 287481
mainline_5.6rc2_notworking_dmesg_dock_plugged_second_port_after_boot

Revision history for this message
In , benoitg (benoitg-linux-kernel-bugs) wrote :

Created attachment 287483
mainline_5.6rc2_notworking_dmesg_dock_plugged_after_boot

Revision history for this message
In , benoitg (benoitg-linux-kernel-bugs) wrote :

Created attachment 287485
mainline_5.6rc2_notworking_dmesg_dock_replugged_after_boot

Revision history for this message
In , benoitg (benoitg-linux-kernel-bugs) wrote :

Hello Nicholas and Mika,

Unfortunately, 5.6rc2 didn't help.

See the new attachments, I believe I included all the information requested.

In addition, I included separate dmesg for when the dock is plugged after boot, and when it was plugged before boot and subsequently re-plugged.

Thanks for your help!

Revision history for this message
In , nicholas.johnson-opensource (nicholas.johnson-opensource-linux-kernel-bugs) wrote :

Thanks for the additional information, Benoit.

If you have other Thunderbolt 3 devices, do they also cause issues with this computer?

Do you have another Thunderbolt 3 computer to boot Linux to try the dock?

Please give "lspci -vnnt" with dock attached before boot and working so I can be sure of topology.

Mika, do you think it could be worth changing the ACPI OSI name to mimic Windows to see if ACPI is treating us differently?

I see there is a conflict with reserved memory (I have never seen this before) but it is with the SPI controller, not Thunderbolt.

The dmesg suggests booting with pci=realloc. It is worth noting that with Ice Lake, Linux refuses to reassign (my theory is that the ACPI _DSM method evaluates to zero).

I would really like the struct resource to be changed in Linux so that the desired alignment is preserved after assignment, so that we can see it. I suspect the dock has funny alignment expectations which we cannot easily see.

For future tests, you may want to pass pci.dyndbg to kernel parameters to give more information.

This is a bunch of random thoughts and observations for now. I will continue to scour the logs for clues.

Revision history for this message
In , nicholas.johnson-opensource (nicholas.johnson-opensource-linux-kernel-bugs) wrote :

Benoit, are you comfortable compiling and running your own kernel?

Revision history for this message
In , benoitg (benoitg-linux-kernel-bugs) wrote :

Created attachment 287493
mainline_5.6rc2_working_dmesg_pci_dyndbg_dock_plugged_before_boot

Revision history for this message
In , benoitg (benoitg-linux-kernel-bugs) wrote :

Created attachment 287495
mainline_5.6rc2_working_lspci_vnnt_dock_plugged_before_boot

Revision history for this message
In , benoitg (benoitg-linux-kernel-bugs) wrote :

Nicholas, see the two new files with the info you requested (dmesg with pci.dyndbg, and lspci -vnnt)

Unfortunately, I do not have another thunderbolt3 peripheral or other machine with thunderbolt3 on hand.

Yes, I can compile my own kernel to test things if it helps.

Revision history for this message
In , nicholas.johnson-opensource (nicholas.johnson-opensource-linux-kernel-bugs) wrote :

Thanks Benoit, I will have a look at them.

Here is another person who was having issues specifically with MMIO resource window when hot plugging. I think it could be related (same bug?):

https://<email address hidden>/T/

Revision history for this message
In , nicholas.johnson-opensource (nicholas.johnson-opensource-linux-kernel-bugs) wrote :

Sorry, you had already given an lspci -vt which was sufficient; I forgot to drop that request before posting.

Could we please have dyndbg after the dock has been hot-added after boot? Thanks

Revision history for this message
In , benoitg (benoitg-linux-kernel-bugs) wrote :

Created attachment 287501
mainline_5.6rc2_notworking_dmesg_pci_dyndbg_dock_plugged_after_boot

Revision history for this message
In , benoitg (benoitg-linux-kernel-bugs) wrote :

Sure, see new attachment

Revision history for this message
In , nicholas.johnson-opensource (nicholas.johnson-opensource-linux-kernel-bugs) wrote :

Hi Benoit,

It does not contain the information I am expecting.

I need to see the pci_dbg() calls at lines 1855 and 1859 here:

https://elixir.bootlin.com/linux/v5.6-rc2/source/drivers/pci/setup-bus.c

Perhaps your log level is excluding them. Can you please see if you can adjust dmesg log level to see "extended by" and "shrunken by"?

Thanks!

Revision history for this message
In , nicholas.johnson-opensource (nicholas.johnson-opensource-linux-kernel-bugs) wrote :

There could be a possibility that they all have new_size = size and are skipping the pci_dbg(), but I find that unlikely. But if this is the case then I apologise.

Revision history for this message
In , nicholas.johnson-opensource (nicholas.johnson-opensource-linux-kernel-bugs) wrote :

Could I please also have "sudo cat /proc/iomem" before and after dock attached? Must be sudo or else it excludes address information. This gives a complete overview of resources. Thanks
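
A short sketch of how the two /proc/iomem captures could be parsed for comparison, assuming the usual layout of one `start-end : name` entry per line with two spaces of indentation per nesting level. The sample text is a hypothetical excerpt built from addresses mentioned in this bug, not taken from the attachments, and the function names are made up. The second helper flags a capture taken without root, where every range reads as zero.

```python
import re

IOMEM_LINE = re.compile(
    r"^(?P<indent> *)(?P<start>[0-9a-f]+)-(?P<end>[0-9a-f]+) : (?P<name>.*)$"
)


def parse_iomem(text):
    """Return (depth, start, end, name) for each /proc/iomem entry."""
    entries = []
    for line in text.splitlines():
        match = IOMEM_LINE.match(line)
        if match:
            entries.append((
                len(match["indent"]) // 2,
                int(match["start"], 16),
                int(match["end"], 16),
                match["name"],
            ))
    return entries


def addresses_hidden(entries):
    """True when every range is 0-0, as seen when reading without root."""
    return bool(entries) and all(s == 0 and e == 0 for _, s, e, _ in entries)


# Hypothetical excerpt, for illustration only.
sample = """\
65400000-bfffffff : PCI Bus 0000:00
  66000000-721fffff : PCI Bus 0000:2b
"""

entries = parse_iomem(sample)
for depth, start, end, name in entries:
    print(f"{'  ' * depth}{name}: {(end - start + 1) // (1 << 20)} MiB")
print("addresses hidden:", addresses_hidden(entries))
```

Running the parser on the before and after captures and diffing the two lists shows which windows appeared or changed when the dock was attached.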

Revision history for this message
In , benoitg (benoitg-linux-kernel-bugs) wrote :

Created attachment 287503
mainline_5.6rc2_cat_proc_iomem_before_attach

Revision history for this message
In , benoitg (benoitg-linux-kernel-bugs) wrote :

Created attachment 287505
mainline_5.6rc2_cat_proc_iomem_after_attach

Revision history for this message
In , benoitg (benoitg-linux-kernel-bugs) wrote :

I don't know, I seem to get the messages generated at https://elixir.bootlin.com/linux/v5.6-rc2/source/drivers/pci/pci.c#L1378 , line 1378.
I really don't know what could be filtering the specific ones you want.

The two iomem files are attached above.

Revision history for this message
In , mika.westerberg (mika.westerberg-linux-kernel-bugs) wrote :

(In reply to Nicholas Johnson from comment #20)
> Mika, do you think it could it be worth changing the ACPI OSI name to mimic
> Windows to see if ACPI is treating us differently?

Linux should do that by default, i.e. Linux looks like Windows to the ACPI code.

Revision history for this message
In , mika.westerberg (mika.westerberg-linux-kernel-bugs) wrote :

Can you check if you have CONFIG_PCI_DEBUG=y enabled in your .config? If not please enable it and attach full dmesg of the failure. I think that option is also needed to see the additional debugging information regarding resource allocation and more.

Revision history for this message
In , benoitg (benoitg-linux-kernel-bugs) wrote :

Created attachment 287511
mainline_5.6rc2_working_dmesg_pci_dyndbg_dock_plugged_before_boot

Revision history for this message
In , benoitg (benoitg-linux-kernel-bugs) wrote :

Created attachment 287513
mainline_5.6rc2_notworking_dmesg_pci_dyndbg_dock_plugged_after_boot

Revision history for this message
In , benoitg (benoitg-linux-kernel-bugs) wrote :

Ok, thanks Mika. I compiled my own kernel and the attachments above now have the information Nicholas wanted.

Revision history for this message
In , mika.westerberg (mika.westerberg-linux-kernel-bugs) wrote :

I'm still going through your log but in the meantime one option you could try is to put "pci=hpmemsize=0" into the kernel command line and see if it makes any difference.

Revision history for this message
In , benoitg (benoitg-linux-kernel-bugs) wrote :

Created attachment 287523
mainline_5.6rc2_notworking_dmesg_pci_dyndbg_dock_plugged_after_boot_hpmensize_0

Attempt with pci=hpmemsize=0

Revision history for this message
In , mika.westerberg (mika.westerberg-linux-kernel-bugs) wrote :

Created attachment 287533
Don't align upstream port resources

Can you try the attached patch? For some reason we fail already when the upstream port (2b:00.0) resources are assigned, which is weird because it should simply get all the resources. This patch also adds a couple more debug prints, so please attach the full dmesg.

You can also remove "pci=hpmemsize=0" from the command line since it did not help.

Revision history for this message
In , benoitg (benoitg-linux-kernel-bugs) wrote :

Created attachment 287551
mainline_5.6rc2_notworking_dmesg_dock_plugged_after_boot_2020-02-21_patch

Unfortunately, the patch did not help

Revision history for this message
In , mika.westerberg (mika.westerberg-linux-kernel-bugs) wrote :

I have been trying to reproduce this on my reference ICL system without success but today I got my hands on a recent Lenovo Yoga and it has the same issue so now I can reproduce it :) I'll update this as soon as I have some idea what the root cause might be.

Revision history for this message
In , benoitg (benoitg-linux-kernel-bugs) wrote :

That's great news for me! Thanks a lot Mika, and good luck...

Revision history for this message
In , mika.westerberg (mika.westerberg-linux-kernel-bugs) wrote :

Created attachment 287619
Skip clipping e820 regions

It looks like the Yoga BIOS-e820 memory map includes some of the memory space reserved for root bridge and the devices below it:

4bc50000-cfffffff BIOS-e820 reserved area
  65400000-bfffffff Root bridge
    66000000-721fffff PCIe root port 07.1

There is code in arch/x86/kernel/resource.c (arch_remove_reservations()) that clips the resource so that it avoids these regions. This is why we can't find memory space for the upstream port.

I wonder if you can try the attached hack patch that skips the clipping?

The changelog in 4dc2287c1805 ("x86: avoid E820 regions when allocating address space") says that Windows seems to ignore these reserved regions which might explain why this works in Windows.
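
The effect of the clipping can be sketched with the addresses above. This is a simplified model of the behaviour described in this comment, not the kernel's code: the `clip` function is hypothetical, and its rule of keeping the larger leftover piece is an assumption made for the sketch.

```python
def clip(res_start, res_end, avoid_start, avoid_end):
    """Shrink [res_start, res_end] so it no longer overlaps the avoided range.

    Keeps whichever leftover piece (below or above the conflict) is larger
    and returns None when nothing is left.
    """
    if res_end < avoid_start or res_start > avoid_end:
        return (res_start, res_end)  # no overlap, nothing to clip
    low = max(0, avoid_start - res_start)  # bytes left below the conflict
    high = max(0, res_end - avoid_end)     # bytes left above the conflict
    if low == 0 and high == 0:
        return None
    if low > high:
        return (res_start, avoid_start - 1)
    return (avoid_end + 1, res_end)


E820_RESERVED = (0x4BC50000, 0xCFFFFFFF)      # BIOS-e820 reserved area
ROOT_PORT_WINDOW = (0x66000000, 0x721FFFFF)   # PCIe root port 07.1

# The reserved area covers the whole window, so nothing usable remains.
print(clip(*ROOT_PORT_WINDOW, *E820_RESERVED))
```

Because the reserved area spans the entire root port window, any allocation that must avoid it has no address space left, which matches the "no space for" messages.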

Revision history for this message
In , mika.westerberg (mika.westerberg-linux-kernel-bugs) wrote :

Created attachment 287661
Do not exclude regions marked as MMIO in EFI memmap

This patch is slightly better. Can you try this one?

Bjorn, can you check if this makes sense? The original code is from you so you know this much better than I :) This fixes the issue on Yoga S740 I have here.

Revision history for this message
In , nicholas.johnson-opensource (nicholas.johnson-opensource-linux-kernel-bugs) wrote :

Nice catch. Does this affect all Thunderbolt peripherals with MMIO BAR? It sounds like it does.

More abstract questions for thought (not necessarily expecting any answers):
- Why did they do this, why does Windows ignore the reserved region, and why only Lenovo?
- Could this suggest Linux needs to be added into the certification requirements someday?

The thing I love about Ice Lake is it will hopefully give the OEMs less chance to mess up the Thunderbolt implementation than with external chips. However, clearly mistakes can still be made.

Revision history for this message
In , benoitg (benoitg-linux-kernel-bugs) wrote :

Created attachment 287663
mainline_5.6rc3_working_dmesg_dock_plugged_after_boot_patch_287619

Result of trying patch 287619, it works!

Revision history for this message
In , benoitg (benoitg-linux-kernel-bugs) wrote :

Created attachment 287665
mainline_5.6rc3_working_dmesg_dock_plugged_after_boot_patch_287661

Result of trying patch 287661, it ALSO works! Many thanks!

Revision history for this message
Benoit Grégoire (benoitg) wrote :

There is now a working patch on the upstream bug

Revision history for this message
In , mika.westerberg (mika.westerberg-linux-kernel-bugs) wrote :

Great, thanks for testing. I submitted the patch upstream now:

https://<email address hidden>/

Revision history for this message
In , mika.westerberg (mika.westerberg-linux-kernel-bugs) wrote :

(In reply to Nicholas Johnson from comment #48)
> Nice catch. Does this affect all Thunderbolt peripherals with MMIO BAR? It
> sounds like it does.

Yes, I think it does.

Revision history for this message
In , benoitg (benoitg-linux-kernel-bugs) wrote :

Any chance this will make it into 5.8 ?

Changed in linux:
importance: Unknown → Medium
status: Unknown → Confirmed
Revision history for this message
In , mika.westerberg (mika.westerberg-linux-kernel-bugs) wrote :

I just resent the patch. Hopefully it lands mainline at some point :)

Revision history for this message
In , benoitg (benoitg-linux-kernel-bugs) wrote :

Just a testing update: As of today (2020-11-12), the patch:
- Still hasn't landed
- Still applies cleanly on kernel 5.10-rc3
- Is still needed, otherwise thunderbolt doesn't work at all on affected hardware upon reconnect.

Revision history for this message
In , mika.westerberg (mika.westerberg-linux-kernel-bugs) wrote :

Hi, sorry about this. I did not get any comments from the x86 maintainers on this, and the comment from Bjorn (the author of the original code) seems to suggest a rather big rework, so I simply haven't had time to look at it. Can you send a ping on that thread? Maybe we will get some x86 maintainers to comment on it then.

Revision history for this message
In , mumblingdrunkard (mumblingdrunkard-linux-kernel-bugs) wrote :

Any updates on whether a fix for this has arrived? So far I've just been applying the patch (287661) and compiling the kernel myself for my Yoga C940, but it's rather finicky.

Revision history for this message
In , mika.westerberg (mika.westerberg-linux-kernel-bugs) wrote :

I suggest you reply on that email thread to say that there is a real problem that needs to be solved, so we get some traction from the maintainers.

Revision history for this message
In , wse (wse-linux-kernel-bugs) wrote :

Just wanted to leave a note here that this issue continues to be a problem with discrete Thunderbolt 4 chips: https://bugzilla.kernel.org/show_bug.cgi?id=214259

The hack from https://bugzilla.kernel.org/show_bug.cgi?id=206459#c46 fixes the dock on that laptop; I have yet to try the cleaner hack.

Revision history for this message
In , wse (wse-linux-kernel-bugs) wrote :

*** Bug 214259 has been marked as a duplicate of this bug. ***

Revision history for this message
Kolargol00 (kolargol00) wrote :

I'm also affected by this bug on 20.04 with the HWE 5.13 kernel (currently: 5.13.0-30-generic #33~20.04.1-Ubuntu SMP Mon Feb 7 14:25:10 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux). My PCIe Thunderbolt card (ASUS ThunderboltEX 4, Intel JHL8540 controller) only partially works: the USB part works and USB devices plugged into the dock are recognised and functional (including the dock's built-in Ethernet interface), however there is no video signal coming from the dock's DisplayPort ports. I've tried the patch mentioned in the kernel bug tracker [1], but it doesn't fix the issue.

Possibly relevant kernel messages:
[ 0.602662] pci 0000:38:00.0: BAR 1: assigned to efifb
[ 0.629016] pci 0000:02:00.0: BAR 13: no space for [io size 0x2000]
[ 0.629017] pci 0000:02:00.0: BAR 13: failed to assign [io size 0x2000]
[ 0.629019] pci 0000:02:00.0: BAR 13: no space for [io size 0x2000]
[ 0.629020] pci 0000:02:00.0: BAR 13: failed to assign [io size 0x2000]
[ 0.629021] pci 0000:03:00.0: BAR 13: no space for [io size 0x2000]
[ 0.629022] pci 0000:03:00.0: BAR 13: failed to assign [io size 0x2000]
[ 0.629023] pci 0000:03:00.0: BAR 13: no space for [io size 0x2000]
[ 0.629024] pci 0000:03:00.0: BAR 13: failed to assign [io size 0x2000]
[ 0.629026] pci 0000:04:01.0: BAR 13: no space for [io size 0x1000]
[ 0.629027] pci 0000:04:01.0: BAR 13: failed to assign [io size 0x1000]
[ 0.629028] pci 0000:04:03.0: BAR 13: no space for [io size 0x1000]
[ 0.629029] pci 0000:04:03.0: BAR 13: failed to assign [io size 0x1000]
[ 0.629030] pci 0000:04:03.0: BAR 13: no space for [io size 0x1000]
[ 0.629031] pci 0000:04:03.0: BAR 13: failed to assign [io size 0x1000]
[ 0.629032] pci 0000:04:01.0: BAR 13: no space for [io size 0x1000]
[ 0.629033] pci 0000:04:01.0: BAR 13: failed to assign [io size 0x1000]
[ 4.271107] thunderbolt 0000:05:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0xcb0f6500 flags=0x0020]
[ 25.977578] thunderbolt 0000:05:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0xcb0f6600 flags=0x0020]
[ 46.457436] thunderbolt 0000:05:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0xcb0f6700 flags=0x0020]
[ 66.937458] thunderbolt 0000:05:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0xcb0f6800 flags=0x0020]
[ 87.413107] thunderbolt 0000:05:00.0: failed to send driver ready to ICM
[ 87.414436] thunderbolt: probe of 0000:05:00.0 failed with error -110

[1] https://<email address hidden>/

Revision history for this message
In , bjorn (bjorn-linux-kernel-bugs) wrote :

From the dmesg log in https://bugzilla.kernel.org/attachment.cgi?id=287483,

  BIOS-e820: [mem 0x000000004bc50000-0x00000000cfffffff] reserved
  pci_bus 0000:00: root bus resource [mem 0x65400000-0xbfffffff window]
  pci 0000:00:07.1: PCI bridge to [bus 2b-54]
  pci 0000:00:07.1: bridge window [mem 0x66000000-0x721fffff]
  # add dock
  pcieport 0000:00:07.1: pciehp: Slot(0-1): Card present
  pcieport 0000:00:07.1: pciehp: Slot(0-1): Link Up
  pci 0000:2b:00.0: BAR 14: no space for [mem size 0x0c200000]

From the log in https://bugzilla.kernel.org/attachment.cgi?id=287665, which includes the patch in https://bugzilla.kernel.org/attachment.cgi?id=287661 to "not exclude EFI MMIO regions":

  pci 0000:2b:00.0: BAR 14: assigned [mem 0x66000000-0x721fffff]

The 00:07.1 bridge window was the same in both cases, so the same space is available on bus 2b. I think the reason the first one failed even though the space was available was because the entire MMIO aperture was marked "reserved" in E820, so PCI avoids assigning space from it. The patch basically avoids that E820 checking if the region is EfiMemoryMappedIO.
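
As a quick arithmetic check of the numbers in the two logs (a sketch only; the helper names are made up), the 00:07.1 bridge window is exactly the size the upstream port asked for, and it lies entirely inside the E820 reserved range:

```python
def window_size(start, end):
    """Size in bytes of an inclusive address range."""
    return end - start + 1


def contains(outer, inner):
    """True when the inclusive range `inner` lies entirely inside `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


bridge_window = (0x66000000, 0x721FFFFF)   # 00:07.1 bridge window
e820_reserved = (0x4BC50000, 0xCFFFFFFF)   # BIOS-e820 reserved
requested = 0x0C200000                     # BAR 14 request from 2b:00.0

print(hex(window_size(*bridge_window)))               # size of the window
print(window_size(*bridge_window) == requested)       # request fits exactly
print(contains(e820_reserved, bridge_window))         # window is "reserved"
```

So the space was there; it was only the E820 marking that kept it from being handed out.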

The patch https://git.kernel.org/linus/d341838d776a ("x86/PCI: Disable E820 reserved region clipping via quirks") appeared in v5.19 and should work around this problem for this machine and others.

Revision history for this message
In , bjorn (bjorn-linux-kernel-bugs) wrote :

Created attachment 303237
experimental patch

That patch (d341838d776a ("x86/PCI: Disable E820 reserved region clipping via quirks")) relies on quirks that match DMI Vendor, Product Version, Product Name, and Board Name. This isn't an ideal solution because there are likely other systems we don't know about that need a similar fix.

The patch I'm attaching here is an experimental idea to work around this issue without the maintenance burden of the quirks.

If anybody would be willing to test this patch, I would be very grateful. To test it, either start with a v5.18 or earlier kernel, or on a v5.19 or newer kernel, revert d341838d776a (or just remove your system's ID from the list in pci_crs_quirks[]). Then apply this patch, boot with the "efi=debug" parameter, connect a dock, see whether it works, and attach the dmesg log.

Revision history for this message
In , benoitg (benoitg-linux-kernel-bugs) wrote :

I'd love to help you out Bjorn; Unfortunately, I no longer have easy access to the original hardware.

Revision history for this message
In , wse (wse-linux-kernel-bugs) wrote :

Tested on Clevo X170KM seems to work. Attached dmesg with dock attached during boot and dock attached after boot.

Revision history for this message
In , wse (wse-linux-kernel-bugs) wrote :

Created attachment 303308
dmsg of experimental patch, dock attached before boot

Revision history for this message
In , wse (wse-linux-kernel-bugs) wrote :

Created attachment 303309
dmsg of experimental patch, dock attached after boot

Revision history for this message
In , bjorn (bjorn-linux-kernel-bugs) wrote :

Created attachment 303314
add resource clip debug

Thank you very much, Werner! I was confused about why your machine has DMI_BOARD_NAME "X170KM-G", but didn't match the quirk in d341838d776a ("x86/PCI: Disable E820 reserved region clipping via quirks"), but I see that in comment #62 I suggested reverting it, so I assume that's why I don't see "PCI: %s detected: not clipping E820 regions from _CRS" in your logs.

So the fact that it works as expected with the comment #62 patch but without the d341838d776a quirk is great news.

I do want to figure out the "clipped [mem size 0x00000000 64bit] to [mem size 0xfffffffffffa0000 64bit]" messages, which seem sort of bogus. Can I trouble you to add this patch and attach the dmesg (from either dock scenario)?

Revision history for this message
In , wse (wse-linux-kernel-bugs) wrote :

Created attachment 303326
dmsg of experimental patch, dock attached before boot 2

Revision history for this message
In , wse (wse-linux-kernel-bugs) wrote :

Created attachment 303327
dmsg of experimental patch, dock attached after boot 2

Revision history for this message
In , wse (wse-linux-kernel-bugs) wrote :

ofc uploaded the dmesg with the new patch

Revision history for this message
In , kjhambrick (kjhambrick-linux-kernel-bugs) wrote :

Bjorn --

Not to butt in, but ...

If you want it, I've built 6.0.10 with your Patches and I ran the same tests.

I've got a Sager NP9672M Laptop, which is a rebranded Clevo X170KM-G.

I've been working with Mika on Bug 214259 - Discrete Thunderbolt Controller 8086:1137 throws DMAR and XHCI errors and is only partially functional

-- kjh

In , bjorn (bjorn-linux-kernel-bugs) wrote :

Hi Konrad, you're not butting in at all!

Things got a little tangled up here. I think bug 214259 describes two issues:

1) Hot-added devices, e.g., an ethernet NIC in a dock, don't work. This should be helped by https://git.kernel.org/linus/d341838d776a, which appeared in v5.19, but only for the machines specifically listed in that patch.

2) I/O page faults related to IOMMU and USB. I think this one is still unresolved, and I don't think the comment #62 patch will help.

Your laptop is a rebranded Clevo. Does it match the d341838d776a list? You can tell by looking for "PCI: %s detected: not clipping E820 regions from _CRS" in your dmesg log.

If your laptop does not match the list, hot-added devices probably will not work with v6.0. If the comment #62 patch makes them work better, that information would be extremely valuable. A dmesg log from a boot with "pci=use_e820 efi=debug" would be very helpful.
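A minimal sketch of the check described above, for anyone who wants to script it: it scans saved dmesg text for the quirk message. The "%s" in the quoted message is a format placeholder that the kernel fills in, so the pattern accepts any text there. The function name is made up for illustration.

```python
import re

# The kernel substitutes the quirk's identifier for %s, so match any text there.
QUIRK_RE = re.compile(r"PCI: .+ detected: not clipping E820 regions from _CRS")


def matches_quirk(dmesg_text: str) -> bool:
    """Return True if any line of the dmesg text contains the quirk message."""
    return any(QUIRK_RE.search(line) for line in dmesg_text.splitlines())
```

For example, after saving the log with "dmesg > dmesg.txt", calling matches_quirk(open("dmesg.txt").read()) reports whether the machine matched the d341838d776a list.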

In , kjhambrick (kjhambrick-linux-kernel-bugs) wrote :

Bjorn --

My Laptop does match d341838d776a

# dmesg -t |grep DMI:
DMI: Notebook X170KM-G/X170KM-G, BIOS 1.07.08LS1 01/11/2020

I've already built 6.0.10 with your patches and I booted yesterday afternoon with these CommandLine Args:

ro nvidia-drm.modeset=1 thunderbolt.dyndbg=+p efi=debug loglevel=3 udev.log_level=3

Note that Mika gave me the thunderbolt.dyndbg=+p Arg and it does print some 'interesting but mysterious' info :)

I've already gathered the two sets of logs, so ...

Q1: Would you prefer a single .tgz file with the logs in sub-directories or do you prefer multiple attachments in this thread ?

Q2: Would you prefer logs for linux 6.1-rc7 ?

I can build that one and gather logs for 6.1-rc7 instead of 6.0.y if you prefer.

Thanks again Bjorn !

-- kjh

In , bjorn (bjorn-linux-kernel-bugs) wrote :

Since your laptop does match d341838d776a, I don't think the comment #62 patch will make much difference. It will likely change MMCONFIG messages from "reserved in E820" to "reserved in ACPI motherboard resources", but won't change the behavior.

No need to repeat with v6.1-rc7. A single .tgz or even just the single dmesg log would be great. I don't think lsmod/lspci/etc are relevant for this bug, but no harm in including them.

In , kjhambrick (kjhambrick-linux-kernel-bugs) wrote :

Created attachment 303339
6.0.10 Logs

Bjorn --

Attached anker-tests-e..f-tar.gz

Manifest is below.

I went ahead and included all the logs I had already gathered yesterday after reading this thread.

Notable:

1. The Patches I applied are in kernel_config_debug/linux-6.0.10.kjh_dp.patch

2. Test(e) - Boot with Thunderbolt Attached ...

Read performance (measured with hdparm) dropped from ~800 MB/sec to 39 MB/sec after unplugging and then replugging the TBT dock.

This did not happen when I booted with TBT unplugged ...

Thank You !

-- kjh

Manifest of anker-tests-e..f-tar.gz in temporal order:

kernel_config_debug/

   .config
   linux-6.0.10.kjh_dp.patch

test-e-with-tbt4-on-at-boot-thunderbolt-debug/

   dmesg-boot-with-tbt4-on.txt
   lsusb-vv-boot-with-tbt4-on.txt
   lsmod-boot-with-tbt4-on.txt
   lspci-vv-boot-with-tbt4-on.txt
   hdparm--direct-t_sda3-boot-with-tbt4-on.txt

   dmesg-boot-with-tbt4-on-unplugged.txt
   lsusb-vv-boot-with-tbt4-on-unplugged.txt
   lspci-vv-boot-with-tbt4-on-unplugged.txt
   lsmod-boot-with-tbt4-on-unplugged.txt

   dmesg-boot-with-tbt4-on-replugged.txt
   lsmod-boot-with-tbt4-on-replugged.txt
   lspci-vv-boot-with-tbt4-on-replugged.txt
   hdparm--direct-t_sda3-boot-with-tbt4-on-replugged.txt
   lsmod-boot-with-tbt4-on-replugged-after-hdparm.txt

test-f-without-tbt4-on-at-boot-thunderbolt-debug/

   dmesg-boot-without-tbt4-on.txt
   lspci-vv-boot-without-tbt4-on.txt
   lsusb-vv-boot-without-tbt4-on.txt
   lsmod-boot-without-tbt4-on.txt

   dmesg-boot-without-tbt4-on-plugged.txt
   lsmod-boot-without-tbt4-on-plugged.txt
   lspci-vv-boot-without-tbt4-on-plugged.txt
   lsusb-vv-boot-without-tbt4-on-plugged.txt
   hdparm--direct-t_sda3-boot-without-tbt4-on-plugged.txt

   dmesg-boot-without-tbt4-on-unplugged.txt
   lsmod-boot-without-tbt4-on-unplugged.txt
   lspci-vv-boot-without-tbt4-on-unplugged.txt
   lsusb-vv-boot-without-tbt4-on-unplugged.txt

   dmesg-boot-without-tbt4-on-replugged.txt
   lsmod-boot-without-tbt4-on-replugged.txt
   lspci-vv-boot-without-tbt4-on-replugged.txt
   lsusb-vv-boot-without-tbt4-on-replugged.txt
   hdparm--direct-t_sda3-boot-without-tbt4-on-replugged.txt

In , bjorn (bjorn-linux-kernel-bugs) wrote :

Thanks for those! It's a little overwhelming, but I don't see any issues related to the comment #62 patch. It seems to work as I expect on your system.

There are definitely dock hotplug issues and the performance issue, but those are problems for different reports.

In , kjhambrick (kjhambrick-linux-kernel-bugs) wrote :

Yes, it is overwhelming, isn't it.

The only thing that sticks out to me shows up when I sdiff pairs of lsmod and lsusb files taken at points where it seems the devices should have returned to their initial states after plugging or unplugging the TBT4 Dock.

For example, when I sdiff these two files, there are just a few diffs in the TBT-related Flags:

test-f-without-tbt4-on-at-boot-thunderbolt-debug/lspci-vv-boot-without-tbt4-on.txt
test-f-without-tbt4-on-at-boot-thunderbolt-debug/lspci-vv-boot-without-tbt4-on-unplugged.txt

It seems that these two files should show the TBT PCI Devices in the same state ?
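For anyone who wants to repeat that comparison without sdiff, here is a minimal sketch using Python's difflib. It assumes the two lspci outputs have already been saved as text; the function name and default labels are made up for illustration.

```python
import difflib


def diff_lspci(before_text: str, after_text: str,
               before_name: str = "before", after_name: str = "after") -> list:
    """Return unified-diff lines between two saved `lspci -vv` outputs."""
    return list(difflib.unified_diff(
        before_text.splitlines(),
        after_text.splitlines(),
        fromfile=before_name,
        tofile=after_name,
        lineterm="",  # input lines carry no newline, so emit none either
        n=0,          # no context lines: show only what changed
    ))
```

Reading the two files listed above and printing "\n".join(diff_lspci(a, b)) shows only the lines that changed; an empty list means the two states are identical.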

But it's been almost 40 years since I messed with any low level PeeCee firmware programming and I am just a tad behind the times :)

Anyhow, thanks for what you ( and all the Kernel Devs ) do !

I appreciate you !!

-- kjh

In , kjhambrick (kjhambrick-linux-kernel-bugs) wrote :

p.s. Yes Bjorn, your patch from comment #62 worked perfectly, even after I removed the Clevo Entry from the pci_crs_quirks[] array.

In , knyffen (knyffen-linux-kernel-bugs) wrote :

Thanks for the great work!

Hotplugging thunderbolt docks definitely works better, but I still have a couple of issues.

Before I start, here is some info about my setup.

I use KDE on Arch Linux, but it shouldn't affect the problem. uname -a prints:
Linux MyPC 6.1.1-arch1-1 #1 SMP PREEMPT_DYNAMIC Wed, 21 Dec 2022 22:27:55 +0000 x86_64 GNU/Linux

Regarding BIOS, I am running the non-standard Lenovo C940 BIOS that some Lenovo engineer temporarily released on their forum to fix the speakers on Linux (version AUCN57WW).

My dock is a "Kensington SD5500T/SD5550T Thunderbolt 3 and USB-C Docking Station".

But now to the issues.

First: The fix currently mainlined in the kernel doesn't work for me. After comparing the file arch/x86/pci/acpi.c on GitHub with the output of dmidecode (see below), I think the check for DMI_PRODUCT_VERSION should be changed to check the product family instead. It could potentially be related to my non-standard BIOS version, but I can't check that.

Second: Since the current fix didn't apply to my computer, I tried compiling the kernel myself and applying the patch from comment #62.
My dock has both a USB hub and display outputs, and on the non-patched kernel, if I tried hotplugging it, the USB hub didn't work, but the external displays always did (and sometimes the computer crashed).
On the patched kernel, if I hotplug, the USB hub always works, and I haven't experienced any crashes, but the external displays aren't recognized. More specifically, if the dock is unplugged when booting and I then hotplug it, they aren't recognized. If instead the computer is docked while booting, the displays work, and if I unplug the dock and then replug it within ~5-10 seconds, the displays are recognized again. If I instead have the dock unplugged for a minute or so, the displays are not recognized when replugging the dock.
I have tried running lspci -vv. The output differs between a boot with the dock attached and the state after unplugging and replugging it, but it does not differ between the cases where the external displays are recognized and where they are not.

I would attach dmesg and lspci -xxxx, but I cannot figure out how to create attachments, so please help me if you need that info. And please specify under which conditions you want the logs. :)

--- dmidecode ---
Handle 0x0001, DMI type 1, 27 bytes
System Information
        Manufacturer: LENOVO
        Product Name: 81Q9
        Version: Yoga C940
        Serial Number: [...]
        UUID: [...]
        Wake-up Type: Power Switch
        SKU Number: LENOVO_MT_81Q9_BU_idea_FM_Yoga C940-14IIL
        Family: Yoga C940-14IIL
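To illustrate the mismatch with the values above, here is a small sketch. It assumes the quirk does a substring match (which is how the kernel's DMI_MATCH behaves) and that the substring it looks for is "IIL"; that string is an assumption taken for illustration and should be checked against arch/x86/pci/acpi.c. The function and dictionary names are made up.

```python
def dmi_substring_matches(fields: dict, needle: str) -> list:
    """Return the names of the DMI fields whose value contains needle."""
    return [name for name, value in fields.items() if needle in value]


# Values copied from the dmidecode output above.
yoga_c940 = {
    "product_name": "81Q9",
    "product_version": "Yoga C940",
    "product_family": "Yoga C940-14IIL",
}

# "IIL" is an assumed quirk substring, used here only for illustration.
print(dmi_substring_matches(yoga_c940, "IIL"))  # -> ['product_family']
```

Under that assumption, a check against the product version cannot match on this BIOS, while a check against the product family would.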

In , bjorn (bjorn-linux-kernel-bugs) wrote :

Thanks, Jonas. This bugzilla has a lot of stuff going on, and it's not clear yet whether the issue you're seeing is the same, so can you please open a new report?

Use the "File a Bug" button at https://bugzilla.kernel.org/ and the Drivers/PCI product/component. After you open the issue, the "Add an attachment" link is near the top of the issue, just above your "Description". Make the attachments text/plain.

If you can attach a complete dmesg log and output of "sudo lspci -vv", that would be great. Since you can compile your own kernel, testing with v6.2-rc1 would be a great place to start. It includes the equivalent of the comment #62 patch, so I expect you'll see the display issues. I would start with "boot undocked, collect lspci, dock, collect lspci, collect dmesg".
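The collection sequence above could be scripted roughly as follows. This is only a sketch: the output file names and function names are made up, and it needs to run as root so that lspci -vv and dmesg return complete output.

```python
import subprocess
from pathlib import Path


def collect(label: str, command: list, outdir: Path) -> Path:
    """Run command and save its stdout to outdir/<label>.txt."""
    outdir.mkdir(parents=True, exist_ok=True)
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    path = outdir / f"{label}.txt"
    path.write_text(result.stdout)
    return path


def collect_sequence(outdir: Path) -> None:
    """Boot undocked, then call this; it pauses while the dock is plugged in."""
    collect("lspci-undocked", ["lspci", "-vv"], outdir)
    input("Plug in the dock, wait for it to settle, then press Enter... ")
    collect("lspci-docked", ["lspci", "-vv"], outdir)
    # dmesg is collected last so it covers both the boot and the hotplug.
    collect("dmesg", ["dmesg"], outdir)
```

Calling collect_sequence(Path("logs")) leaves three text files that can be attached as text/plain.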

In , knyffen (knyffen-linux-kernel-bugs) wrote :

First of all, I forgot that this issue was for "thinkpad thunderbolt 3 dock gen2" specifically. Sorry.

Anyway, I leave this as a note for other people who came here from the Arch Linux wiki: The mainline kernel (6.2-rc2) works without bugs (in my case). I have been testing it for a week, and I haven't had a single issue.

Basically, either one of these things was the problem:
1. I messed up when applying the patch. :P
2. The Arch Linux 6.1 kernel has some other patch that interferes with the patch from comment #62 and causes my problems.
3. Something other than the patch from comment #62 was introduced in 6.2-rc1 (or -rc2) that helps fix the bug.

I don't know which one is the problem, but this doesn't seem to be an issue for the mainline kernel. I am sorry for the confusion.
