thunderbolt / PCIe hotplug gets confused after a few cycles on X1 Yoga 2nd gen

Bug #1825395 reported by Roland Dreier
This bug affects 2 people
Affects          Status     Importance  Assigned to  Milestone
linux (Ubuntu)   Confirmed  Undecided   Unassigned

Bug Description

I have a Lenovo X1 Yoga 2nd gen along with the Lenovo Thunderbolt docking station. The initial connection to the dock works great, but after a few plug/unplug and suspend/resume cycles, the system gets into a state where a reboot is needed to make the dock work. The DisplayPort connection still works, so my external monitor gets the right display, but the Thunderbolt/PCIe connection does not, so the USB and network ports on the dock aren't usable. I assume this is connected to kernel messages like:

[37269.423750] thunderbolt 0-1: new device found, vendor=0x108 device=0x1630
[37269.423751] thunderbolt 0-1: Lenovo ThinkPad Thunderbolt 3 Dock

[37270.096713] pci 0000:09:00.0: [8086:15d3] type 01 class 0x060400
[37270.096820] pci 0000:09:00.0: enabling Extended Tags
[37270.096975] pci 0000:09:00.0: supports D1 D2
[37270.096976] pci 0000:09:00.0: PME# supported from D0 D1 D2 D3hot D3cold
[37270.097068] pci 0000:09:00.0: 8.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s x4 link at 0000:07:01.0 (capable of 31.504 Gb/s with 8 GT/s x4 link)
[37270.097187] pcieport 0000:07:01.0: ASPM: current common clock configuration is broken, reconfiguring
[37270.108588] pci 0000:0a:00.0: [8086:15d3] type 01 class 0x060400
[37270.108705] pci 0000:0a:00.0: enabling Extended Tags
[37270.108865] pci 0000:0a:00.0: supports D1 D2
[37270.108866] pci 0000:0a:00.0: PME# supported from D0 D1 D2 D3hot D3cold
[37270.109066] pcieport 0000:07:02.0: ASPM: current common clock configuration is broken, reconfiguring
[37270.109147] pci 0000:0b:00.0: [1b73:1100] type 00 class 0x0c0330
[37270.109221] pci 0000:0b:00.0: reg 0x10: [mem 0xbc000000-0xbc00ffff 64bit]
[37270.109258] pci 0000:0b:00.0: reg 0x18: [mem 0xbc010000-0xbc010fff 64bit]
[37270.109295] pci 0000:0b:00.0: reg 0x20: [mem 0xbc011000-0xbc011fff 64bit]
[37270.109499] pci 0000:0b:00.0: supports D1
[37270.109500] pci 0000:0b:00.0: PME# supported from D0 D1 D3hot D3cold
[37270.109686] pcieport 0000:07:04.0: ASPM: current common clock configuration is broken, reconfiguring
[37270.109724] pci 0000:0a:00.0: devices behind bridge are unusable because [bus 0b] cannot be assigned for them
[37270.109738] pci 0000:09:00.0: devices behind bridge are unusable because [bus 0a] cannot be assigned for them
[37270.109766] pci 0000:0a:00.0: devices behind bridge are unusable because [bus 0b] cannot be assigned for them
[37270.109779] pcieport 0000:07:02.0: bridge has subordinate 0a but max busn 0b
[37270.109831] pci_bus 0000:3d: busn_res: can not insert [bus 3d-70] under [bus 07-0b] (conflicts with (null) [bus 07-0b])
[37270.109834] pcieport 0000:07:04.0: PCI bridge to [bus 3d-70]
[37270.109843] pcieport 0000:07:04.0: bridge window [mem 0xd4000000-0xe9ffffff]
[37270.109848] pcieport 0000:07:04.0: bridge window [mem 0x90000000-0xb9ffffff 64bit pref]
[37270.109849] pcieport 0000:07:04.0: devices behind bridge are unusable because [bus 3d-70] cannot be assigned for them
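The bandwidth line for 0000:09:00.0 is informational. As a sketch of where its two numbers come from, assuming the kernel multiplies the per-lane transfer rate by the line-code efficiency (8b/10b at 2.5 GT/s, 128b/130b at 8 GT/s) using integer division in Mb/s, the following reproduces both figures. The names `ENCODING`, `lane_mbps` and `link_gbps` are illustrative only, not kernel identifiers.

```python
# Line-code efficiency per PCIe transfer rate (in MT/s), as (payload, line) bits.
# 2.5 and 5 GT/s use 8b/10b encoding; 8 and 16 GT/s use 128b/130b.
ENCODING = {
    2500: (8, 10),
    5000: (8, 10),
    8000: (128, 130),
    16000: (128, 130),
}


def lane_mbps(rate_mt_s: int) -> int:
    """Usable Mb/s for one lane, truncated with integer division."""
    payload, line = ENCODING[rate_mt_s]
    return rate_mt_s * payload // line


def link_gbps(rate_mt_s: int, width: int) -> str:
    """Total link bandwidth formatted like the dmesg line (x.yyy Gb/s)."""
    total = lane_mbps(rate_mt_s) * width
    return f"{total // 1000}.{total % 1000:03d} Gb/s"


print(link_gbps(2500, 4))  # available, 2.5 GT/s x4 -> 8.000 Gb/s
print(link_gbps(8000, 4))  # capable, 8 GT/s x4 -> 31.504 Gb/s
```

Both results match the log line, which supports reading it as a report of the negotiated link speed rather than as an error in its own right.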

ProblemType: Bug
DistroRelease: Ubuntu 19.04
Package: linux-image-5.0.0-13-generic 5.0.0-13.14
ProcVersionSignature: Ubuntu 5.0.0-13.14-generic 5.0.6
Uname: Linux 5.0.0-13-generic x86_64
ApportVersion: 2.20.10-0ubuntu27
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: roland 1690 F.... pulseaudio
CurrentDesktop: ubuntu:GNOME
Date: Thu Apr 18 09:40:22 2019
InstallationDate: Installed on 2019-03-29 (20 days ago)
InstallationMedia: Ubuntu 19.04 "Disco Dingo" - Alpha amd64 (20190326.2)
MachineType: LENOVO 20JGS01000
ProcEnviron:
 TERM=screen
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.0.0-13-generic root=UUID=323a265b-35d4-4a11-82f0-e47b38cac797 ro quiet splash vt.handoff=1
RelatedPackageVersions:
 linux-restricted-modules-5.0.0-13-generic N/A
 linux-backports-modules-5.0.0-13-generic N/A
 linux-firmware 1.178
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 03/11/2019
dmi.bios.vendor: LENOVO
dmi.bios.version: N1NET45W (1.32 )
dmi.board.asset.tag: Not Available
dmi.board.name: 20JGS01000
dmi.board.vendor: LENOVO
dmi.board.version: SDK0J40697 WIN
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 31
dmi.chassis.vendor: LENOVO
dmi.chassis.version: None
dmi.modalias: dmi:bvnLENOVO:bvrN1NET45W(1.32):bd03/11/2019:svnLENOVO:pn20JGS01000:pvrThinkPadX1Yoga2nd:rvnLENOVO:rn20JGS01000:rvrSDK0J40697WIN:cvnLENOVO:ct31:cvrNone:
dmi.product.family: ThinkPad X1 Yoga 2nd
dmi.product.name: 20JGS01000
dmi.product.sku: LENOVO_MT_20JG_BU_Think_FM_ThinkPad X1 Yoga 2nd
dmi.product.version: ThinkPad X1 Yoga 2nd
dmi.sys.vendor: LENOVO

Roland Dreier (roland.dreier) wrote :
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Kai-Heng Feng (kaihengfeng) wrote :

Would it be possible for you to test the latest upstream kernel? Refer
to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest
v5.1-rc5 kernel [0].

If this bug is fixed in the mainline kernel, please add the following
tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag:
'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as
"Confirmed" and attach dmesg.

Thanks in advance.

[0] https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.1-rc5/

Roland Dreier (roland.dreier) wrote :

I will try the upstream kernel, although it will be a few days before I can confidently report if the issue is not present.

Roland Dreier (roland.dreier) wrote :

So I can confirm that this bug happens with upstream build 5.1.0-050100rc5-generic. Will attach full dmesg.

Roland Dreier (roland.dreier) wrote :

Attaching dmesg showing failed thunderbolt attachment. There is a good hotplug starting with

[35304.659358] thunderbolt 0-1: new device found, vendor=0x108 device=0x1630
[35304.659361] thunderbolt 0-1: Lenovo ThinkPad Thunderbolt 3 Dock

For example, you can see that the system discovers the downstream USB controller and the USB audio device that are part of the dock:

[35305.406824] xhci_hcd 0000:0b:00.0: xHCI Host Controller
[35305.406831] xhci_hcd 0000:0b:00.0: new USB bus registered, assigned bus number 3
...
[35306.102685] usb 3-1: New USB device found, idVendor=17ef, idProduct=306a, bcdDevice=28.00
[35306.102688] usb 3-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[35306.102691] usb 3-1: Product: ThinkPad Thunderbolt 3 Dock USB Audio

Then there are a few suspend/resume cycles, and a second hotplug of the Thunderbolt dock:

[40222.796009] thunderbolt 0-1: new device found, vendor=0x108 device=0x1630
[40222.796010] thunderbolt 0-1: Lenovo ThinkPad Thunderbolt 3 Dock

In this case, PCI bus enumeration seems to fail:

[40223.479357] pci 0000:0a:00.0: devices behind bridge are unusable because [bus 0b] cannot be assigned for them
[40223.479371] pci 0000:09:00.0: devices behind bridge are unusable because [bus 0a] cannot be assigned for them
[40223.479397] pci 0000:0a:00.0: devices behind bridge are unusable because [bus 0b] cannot be assigned for them
[40223.479411] pcieport 0000:07:02.0: bridge has subordinate 0a but max busn 0b
[40223.479459] pci_bus 0000:3d: busn_res: can not insert [bus 3d-70] under [bus 07-0b] (conflicts with (null) [bus 07-0b])
[40223.479462] pcieport 0000:07:04.0: PCI bridge to [bus 3d-70]
[40223.479470] pcieport 0000:07:04.0: bridge window [mem 0xd4000000-0xe9ffffff]
[40223.479475] pcieport 0000:07:04.0: bridge window [mem 0x90000000-0xb9ffffff 64bit pref]
[40223.479476] pcieport 0000:07:04.0: devices behind bridge are unusable because [bus 3d-70] cannot be assigned for them
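The `busn_res` line says the kernel tried to place bridge 07:04.0's bus range 3d-70 under a parent whose range is only 07-0b. A PCI bridge's bus numbers must lie inside its parent's range, so that insertion cannot succeed. A minimal sketch that parses this one line format and applies the containment check; the regex and helper names are hypothetical and only cover the exact wording quoted above.

```python
import re

# Matches e.g. "busn_res: can not insert [bus 3d-70] under [bus 07-0b]".
BUSN_RE = re.compile(
    r"can not insert \[bus ([0-9a-f]{2})-([0-9a-f]{2})\] "
    r"under \[bus ([0-9a-f]{2})-([0-9a-f]{2})\]"
)


def parse_busn_conflict(line):
    """Return ((child_lo, child_hi), (parent_lo, parent_hi)) or None."""
    m = BUSN_RE.search(line)
    if m is None:
        return None
    lo, hi, plo, phi = (int(g, 16) for g in m.groups())
    return (lo, hi), (plo, phi)


def fits(child, parent):
    """A child bridge's bus range must lie entirely inside its parent's."""
    return parent[0] <= child[0] and child[1] <= parent[1]


line = ("pci_bus 0000:3d: busn_res: can not insert [bus 3d-70] "
        "under [bus 07-0b] (conflicts with (null) [bus 07-0b])")
child, parent = parse_busn_conflict(line)
print(f"child {child[0]:#04x}-{child[1]:#04x}, "
      f"parent {parent[0]:#04x}-{parent[1]:#04x}")  # child 0x3d-0x70, parent 0x07-0x0b
print("fits" if fits(child, parent) else "does not fit")  # does not fit
```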

tags: added: kernel-bug-exists-upstream
Kai-Heng Feng (kaihengfeng) wrote :

Subscribing Mika.

Mika Westerberg (mika-westerberg) wrote :

Can you also attach full dmesg of the boot so that we can see the initial PCI configuration? Now all the dmesgs are missing that information.

When it initially works, do you boot with the dock connected or not?

Roland Dreier (roland.dreier) wrote :

I can boot either connected or not, and the first connection will usually work. I'd say my more common workflow is to boot away from my dock, use my laptop for a bit, suspend, and go to my desk and resume while docked. That usually works, but if I then undock, use my laptop for a bit, suspend, come back to my desk and resume, then I hit the PCI issue.

I simulated that workflow and indeed reproduced the issue. This morning I:
 - shut down my laptop
 - booted it while undocked (and captured dmesg and lspci)
 - suspended and connected the thunderbolt cable
 - resumed by pressing the power button on the dock
 - thunderbolt worked - captured dmesg and lspci again
 - unplugged the dock and captured dmesg and lspci while undocked
 - suspended and connected thunderbolt again
 - hit the issue described in this bug - captured dmesg and lspci one more time
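With a dmesg capture saved at each step above, a small filter makes it quick to see which captures contain the failure. This is a sketch under the assumption that the messages quoted in this report are the distinguishing ones; `FAILURE_MARKERS`, `failure_lines` and `summarize` are hypothetical helpers, not part of any tool.

```python
# Message fragments quoted in this report's failing hotplugs.
FAILURE_MARKERS = (
    "devices behind bridge are unusable",
    "busn_res: can not insert",
    "but max busn",
)


def failure_lines(dmesg_text):
    """Return the dmesg lines that contain any known failure marker."""
    return [line for line in dmesg_text.splitlines()
            if any(marker in line for marker in FAILURE_MARKERS)]


def summarize(name, dmesg_text):
    """One-line verdict for a single capture, e.g. one file per step."""
    hits = failure_lines(dmesg_text)
    status = "BROKEN" if hits else "ok"
    return f"{name}: {status} ({len(hits)} failure lines)"
```

For each saved capture, `summarize(filename, text)` would print `ok` for the working dock session and `BROKEN` for the final one, if the markers hold.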

One mildly interesting thing is that while my system is in the final "broken" state, lspci prints the following to stderr:

pcilib: Cannot open /sys/bus/pci/devices/0000:0b:00.0/config
lspci: Unable to read the standard configuration space header of device 0000:0b:00.0

I will attach the dmesg and lspci output I captured at each step.
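The lspci errors above mean the config space of 0000:0b:00.0 could not be read through sysfs. A sketch that roughly approximates that check for every device, assuming the usual `/sys/bus/pci/devices/<address>/config` layout; it treats a device as unreadable if the file cannot be opened or returns less than the 64-byte standard header. The function name is hypothetical.

```python
import os


def unreadable_config(root="/sys/bus/pci/devices"):
    """List PCI devices under `root` whose config header cannot be read.

    A device counts as unreadable if its `config` file is missing, cannot
    be opened, or yields fewer than the 64 bytes of the standard header.
    """
    bad = []
    for dev in sorted(os.listdir(root)):
        path = os.path.join(root, dev, "config")
        try:
            with open(path, "rb") as f:
                if len(f.read(64)) < 64:
                    bad.append(dev)
        except OSError:
            bad.append(dev)
    return bad
```

Calling `print(unreadable_config())` in the broken state would be expected to include 0000:0b:00.0, matching what lspci reported.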

Roland Dreier (roland.dreier) wrote :
Roland Dreier (roland.dreier) wrote :
Roland Dreier (roland.dreier) wrote :
Roland Dreier (roland.dreier) wrote :
Roland Dreier (roland.dreier) wrote :
Mika Westerberg (mika-westerberg) wrote :

Thanks for the logs. It seems like the BIOS does not handle S3 exit properly and leaves the Thunderbolt host router unconfigured. I wonder if there is a BIOS update for this system; have you tried it?

Roland Dreier (roland.dreier) wrote :

No, unfortunately I am on the latest BIOS. dmidecode reports

BIOS Information
        Vendor: LENOVO
        Version: N1NET45W (1.32 )
        Release Date: 03/11/2019

and https://pcsupport.lenovo.com/us/en/products/laptops-and-netbooks/thinkpad-x-series-laptops/thinkpad-x1-yoga-type-20jd-20je-20jf-20jg/downloads/ds121063 lists BIOS 1.32 as the latest.

Roland Dreier (roland.dreier) wrote :

I see from https://pcsupport.lenovo.com/us/en/downloads/DS506115 that there is a firmware update available for my dock. I am going to try to borrow a Windows system to perform that update.

Mika Westerberg (mika-westerberg) wrote :

BTW, did you enable any Linux-specific configuration in the BIOS? For example, to enable S3 (many systems I've seen default to s2idle nowadays).

Roland Dreier (roland.dreier) wrote :

No BIOS options changed. I am on X1 Yoga 2nd gen - the BIOS exposes S3 by default. X1 Yoga 3rd gen is the first Lenovo generation that defaults to S0i3 for suspend.
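One way to confirm which suspend variant Linux actually uses is `/sys/power/mem_sleep`, which lists the supported variants and brackets the active one, for example `s2idle [deep]`, where `deep` is S3. A minimal parser sketch; `parse_mem_sleep` and `read_mem_sleep` are hypothetical helper names.

```python
def parse_mem_sleep(text):
    """Split /sys/power/mem_sleep contents into (selected, available).

    The kernel lists every supported variant and brackets the active one,
    e.g. "s2idle [deep]"; "deep" corresponds to ACPI S3.
    """
    modes = text.split()
    selected = next((m.strip("[]") for m in modes if m.startswith("[")), None)
    available = [m.strip("[]") for m in modes]
    return selected, available


def read_mem_sleep(path="/sys/power/mem_sleep"):
    """Read and parse the live setting on a running Linux system."""
    with open(path) as f:
        return parse_mem_sleep(f.read())
```

On this X1 Yoga 2nd gen, with S3 exposed by default, `read_mem_sleep()` would be expected to report `deep` as selected.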

Roland Dreier (roland.dreier) wrote :

I've come up with a theory that I think matches what I observe. It seems that my laptop gets into the bad state if I suspend too quickly after hot unplug from thunderbolt. If I wait long enough after hot unplug that all the thunderbolt devices are fully removed, then the system remains fine. If I suspend while cleanup is still going on, then upon resume the system confuses itself trying to deal with devices that are no longer there (and that never get cleaned up).
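If this theory holds, one possible workaround is to suspend only after the dock's devices have disappeared from sysfs. A sketch, untested on this hardware: the paths are assumptions taken from this report's logs (the dock appears as Thunderbolt device `0-1` and its upstream PCIe port as `0000:09:00.0`) and will differ on other machines; `wait_for_removal` and `suspend_after_undock` are hypothetical names.

```python
import os
import subprocess
import time

# Assumed paths for this machine, based on the dmesg in this report.
DOCK_PATHS = (
    "/sys/bus/thunderbolt/devices/0-1",
    "/sys/bus/pci/devices/0000:09:00.0",
)


def wait_for_removal(paths, timeout=30.0, interval=0.5):
    """Poll until none of `paths` exist; return False if `timeout` expires."""
    deadline = time.monotonic() + timeout
    while any(os.path.exists(p) for p in paths):
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True


def suspend_after_undock(paths=DOCK_PATHS, timeout=30.0):
    """Suspend via systemd only once the dock's devices are gone."""
    if not wait_for_removal(paths, timeout):
        return False
    subprocess.run(["systemctl", "suspend"], check=True)
    return True
```

Polling sysfs keeps the sketch dependency-free; a udev rule or pyudev monitor reacting to the remove event would be a less wasteful alternative.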
