Kernel bug when unplugging Thunderbolt 3 cable, leaves xHCI host controller dead

Bug #1768852 reported by Alfred Krohmer on 2018-05-03
This bug affects 9 people
Affects: linux (Ubuntu), status tracked in Cosmic
  Bionic: Importance Undecided, Unassigned
  Cosmic: Importance Medium, Unassigned

Bug Description

===SRU Justification===
[Impact]
Unplugging the Thunderbolt 3 cable from the TBT controller causes a
kernel oops.

[Test]
The user confirms this patch works.

[Fix]
tty_unregister_driver may be called more than once in some hotplug
cases, which causes the kernel oops. This patch checks dbc_tty_driver
to make sure it is unregistered only once.
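
The guard described in [Fix] can be illustrated with a toy model. This is a minimal sketch in Python, not the kernel patch itself: DriverRegistry, DbcTty and their methods are hypothetical names standing in for the TTY core and the module-level dbc_tty_driver pointer, and None stands in for NULL.

```python
class DriverRegistry:
    """Toy stand-in for the TTY core; not a kernel API."""

    def __init__(self):
        self.registered = set()

    def unregister(self, driver):
        # Models the crash: a NULL (None) driver cannot be dereferenced.
        if driver is None:
            raise RuntimeError("NULL driver dereference")
        self.registered.discard(driver)


class DbcTty:
    """Hypothetical model of the single dbc_tty_driver pointer."""

    def __init__(self, registry):
        self.registry = registry
        self.driver = None

    def register(self):
        self.driver = "dbc_tty_driver"
        self.registry.registered.add(self.driver)

    def unregister_unguarded(self):
        # Behaviour without the guard: a second call passes None down.
        self.registry.unregister(self.driver)
        self.driver = None

    def unregister_guarded(self):
        # Behaviour with the guard: act only while a driver is still set,
        # then clear the pointer so later calls become no-ops.
        if self.driver is not None:
            self.registry.unregister(self.driver)
            self.driver = None
```

In this model, calling unregister_guarded() twice is harmless, while the second call to unregister_unguarded() raises, which mirrors the hotplug case where several host controllers are removed in a row.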

[Regression Potential]
Low. The change only guards against a NULL pointer, which is the
correct behavior.

===Original Bugreport===
When unplugging the Thunderbolt 3 cable that was connected to a Lenovo Thunderbolt 3 Dock:

[78402.194718] xhci_hcd 0000:0f:00.0: remove, state 4
[78402.194726] usb usb8: USB disconnect, device number 1
[78402.194727] usb 8-2: USB disconnect, device number 2
[78402.195072] xhci_hcd 0000:0f:00.0: USB bus 8 deregistered
[78402.195077] xhci_hcd 0000:0f:00.0: xHCI host controller not responding, assume dead
[78402.195086] xhci_hcd 0000:0f:00.0: remove, state 1
[78402.195091] usb usb7: USB disconnect, device number 1
[78402.195092] usb 7-2: USB disconnect, device number 2
[78402.195094] usb 7-2.1: USB disconnect, device number 3
[78402.242648] usb 7-2.2: USB disconnect, device number 4
[78402.246827] xhci_hcd 0000:0f:00.0: Host halt failed, -19
[78402.246829] xhci_hcd 0000:0f:00.0: Host not accessible, reset failed.
[78402.246917] xhci_hcd 0000:0f:00.0: USB bus 7 deregistered
[78402.247998] pcieport 0000:0a:03.0: Refused to change power state, currently in D3
[78402.255841] xhci_hcd 0000:0d:00.0: remove, state 1
[78402.255847] usb usb6: USB disconnect, device number 1
[78402.255849] usb 6-1: USB disconnect, device number 2
[78402.255900] xhci_hcd 0000:0d:00.0: xHCI host controller not responding, assume dead
[78402.255920] r8152 5-3.4.3:1.0 enx00e04c6814c6: Stop submitting intr, status -108
[78402.302674] xhci_hcd 0000:0d:00.0: USB bus 6 deregistered
[78402.302679] xhci_hcd 0000:0d:00.0: remove, state 1
[78402.302685] usb usb5: USB disconnect, device number 1
[78402.302687] usb 5-3: USB disconnect, device number 2
[78402.302688] usb 5-3.4: USB disconnect, device number 3
[78402.302689] usb 5-3.4.1: USB disconnect, device number 4
[78402.430677] usb 5-3.4.2: USB disconnect, device number 5
[78402.470512] usb 5-3.4.3: USB disconnect, device number 6
[78402.506481] usb 5-3.4.4: USB disconnect, device number 7
[78402.507533] BUG: unable to handle kernel NULL pointer dereference at 0000000000000034
[78402.507540] IP: tty_unregister_driver+0xd/0x70
[78402.507542] PGD 0 P4D 0
[78402.507544] Oops: 0000 [#1] SMP PTI
[78402.507546] Modules linked in: xt_nat xt_tcpudp veth rfcomm ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter xt_conntrack nf_nat nf_conntrack libcrc32c br_netfilter bridge stp llc ccm cmac bnep binfmt_misc nls_iso8859_1 intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc arc4 aesni_intel aes_x86_64 crypto_simd glue_helper cryptd intel_cstate intel_rapl_perf snd_hda_codec_hdmi snd_soc_skl snd_soc_skl_ipc snd_hda_ext_core snd_soc_sst_dsp snd_soc_sst_ipc snd_hda_codec_conexant snd_soc_acpi snd_hda_codec_generic snd_soc_core joydev snd_compress serio_raw ac97_bus snd_pcm_dmaengine wmi_bmof intel_wmi_thunderbolt
[78402.507582] snd_hda_intel snd_hda_codec snd_hda_core iwlmvm input_leds mac80211 snd_usb_audio snd_usbmidi_lib cdc_ether snd_hwdep r8152 iwlwifi snd_pcm uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core videodev thinkpad_acpi media cfg80211 nvram cdc_mbim qcserial cdc_wdm cdc_ncm snd_seq_midi usb_wwan snd_seq_midi_event usbnet rtsx_pci_ms btusb snd_rawmidi btrtl memstick btbcm usbserial mii btintel snd_seq bluetooth snd_seq_device snd_timer mei_me ucsi_acpi mei shpchp intel_pch_thermal typec_ucsi ecdh_generic snd typec soundcore acpi_pad mac_hid sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) hid_generic usbhid rtsx_pci_sdmmc i915 psmouse i2c_algo_bit e1000e drm_kms_helper syscopyarea
[78402.507629] ptp sysfillrect sysimgblt pps_core nvme rtsx_pci fb_sys_fops thunderbolt nvme_core drm wmi i2c_hid video hid
[78402.507639] CPU: 0 PID: 15421 Comm: kworker/u8:3 Tainted: P O 4.15.0-20-generic #21-Ubuntu
[78402.507640] Hardware name: LENOVO 20HRCTO1WW/20HRCTO1WW, BIOS N1MET38W (1.23 ) 08/30/2017
[78402.507644] Workqueue: kacpi_hotplug acpi_hotplug_work_fn
[78402.507648] RIP: 0010:tty_unregister_driver+0xd/0x70
[78402.507649] RSP: 0018:ffffa16e94de3af0 EFLAGS: 00010246
[78402.507651] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[78402.507652] RDX: ffff8c4e4e972f00 RSI: fffff687520f9380 RDI: 0000000000000000
[78402.507654] RBP: ffffa16e94de3af8 R08: ffff8c4e43e4e110 R09: 00000001801e0013
[78402.507655] R10: fffff687512bfa00 R11: 0000000000000000 R12: ffff8c4e43e0e230
[78402.507656] R13: ffff8c4e43e0e27c R14: ffff8c4e43e0e390 R15: 0000000000000060
[78402.507658] FS: 0000000000000000(0000) GS:ffff8c4e61400000(0000) knlGS:0000000000000000
[78402.507659] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[78402.507661] CR2: 0000000000000034 CR3: 00000003ef40a001 CR4: 00000000003606f0
[78402.507662] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[78402.507664] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[78402.507665] Call Trace:
[78402.507671] xhci_dbc_tty_unregister_driver+0x15/0x30
[78402.507673] xhci_dbc_exit+0x2e/0x50
[78402.507676] xhci_stop+0x5b/0x1e0
[78402.507679] usb_remove_hcd+0x105/0x250
[78402.507681] usb_hcd_pci_remove+0x74/0x130
[78402.507683] xhci_pci_remove+0x6b/0x70
[78402.507686] pci_device_remove+0x3e/0xb0
[78402.507694] device_release_driver_internal+0x15b/0x220
[78402.507696] device_release_driver+0x12/0x20
[78402.507699] pci_stop_bus_device+0x7f/0xa0
[78402.507701] pci_stop_bus_device+0x30/0xa0
[78402.507703] pci_stop_bus_device+0x41/0xa0
[78402.507705] pci_stop_and_remove_bus_device+0x12/0x20
[78402.507708] trim_stale_devices+0x11d/0x150
[78402.507711] trim_stale_devices+0xa9/0x150
[78402.507713] trim_stale_devices+0xbb/0x150
[78402.507715] ? get_slot_status+0xa3/0xe0
[78402.507718] acpiphp_check_bridge.part.7+0x100/0x140
[78402.507720] acpiphp_hotplug_notify+0x18e/0x220
[78402.507723] ? free_bridge+0x100/0x100
[78402.507725] acpi_device_hotplug+0xa4/0x4b0
[78402.507727] acpi_hotplug_work_fn+0x1e/0x30
[78402.507730] process_one_work+0x1de/0x410
[78402.507732] worker_thread+0x32/0x410
[78402.507735] kthread+0x121/0x140
[78402.507737] ? process_one_work+0x410/0x410
[78402.507739] ? kthread_create_worker_on_cpu+0x70/0x70
[78402.507742] ? do_syscall_64+0x73/0x130
[78402.507744] ? SyS_exit_group+0x14/0x20
[78402.507746] ret_from_fork+0x35/0x40
[78402.507748] Code: c2 bf 2c 94 b6 48 c7 c7 90 0d e4 b6 e8 ed 92 ee ff 48 89 df e8 85 c7 c6 ff 5b 5d c3 66 90 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb <8b> 77 34 8b 7f 2c c1 e7 14 0b 7b 30 e8 22 15 ca ff 48 c7 c7 e0
[78402.507776] RIP: tty_unregister_driver+0xd/0x70 RSP: ffffa16e94de3af0
[78402.507777] CR2: 0000000000000034
[78402.507779] ---[ end trace 5ed527061c666404 ]---
[78402.808128] thinkpad_acpi: EC reports that Thermal Table has changed
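
The register dump and the Code line in the oops above can be cross-checked mechanically. The following is a minimal sketch, assuming the shortened excerpt embedded below (three lines copied from the log, with the Code line trimmed); decode_fault is a hypothetical helper name. The byte marked <8b> starts a MOV whose ModRM byte 0x77 selects RDI plus an 8-bit displacement, so the faulting address should equal RDI + 0x34.

```python
import re

# Excerpt of the oops above; the Code line is trimmed to the relevant bytes.
OOPS = """\
BUG: unable to handle kernel NULL pointer dereference at 0000000000000034
RDX: ffff8c4e4e972f00 RSI: fffff687520f9380 RDI: 0000000000000000
Code: 55 48 89 e5 53 48 89 fb <8b> 77 34 8b 7f 2c
"""


def decode_fault(oops):
    """Check that the fault address equals RDI plus the MOV displacement."""
    fault = int(re.search(r"dereference at ([0-9a-f]+)", oops).group(1), 16)
    rdi = int(re.search(r"RDI: ([0-9a-f]+)", oops).group(1), 16)

    # The byte wrapped in <..> is the first byte of the faulting instruction.
    m = re.search(r"<([0-9a-f]{2})>((?: [0-9a-f]{2})*)", oops)
    insn = [int(m.group(1), 16)] + [int(b, 16) for b in m.group(2).split()]
    opcode, modrm, disp8 = insn[0], insn[1], insn[2]

    mod, rm = modrm >> 6, modrm & 0b111
    # 0x8b is MOV r32, r/m32; mod == 1 means an 8-bit displacement follows;
    # rm == 7 selects RDI as the base register (there is no REX prefix).
    if not (opcode == 0x8B and mod == 1 and rm == 7):
        raise ValueError("unexpected instruction encoding")

    return {
        "fault": fault,
        "rdi": rdi,
        "disp": disp8,
        "matches": rdi + disp8 == fault,
    }
```

With RDI holding the first function argument on x86-64, a match here is consistent with tty_unregister_driver having been handed a NULL pointer and reading a field at offset 0x34 from it.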

Some time later (cable *not* plugged back in yet):

[79559.939390] xhci_hcd 0000:0b:00.0: xHCI host controller not responding, assume dead
[79559.939423] xhci_hcd 0000:0b:00.0: HC died; cleaning up
[79559.939448] xhci_hcd 0000:0b:00.0: Timeout while waiting for configure endpoint command
[79559.939494] usb 3-1: Not enough bandwidth for altsetting 1
[79559.939504] usb 3-1: 2:1: usb_set_interface failed (-62)
[79559.940534] usb 3-1: Not enough bandwidth for altsetting 1
[79559.940546] usb 3-1: 2:1: usb_set_interface failed (-19)
[79559.940777] usb 3-1: Not enough bandwidth for altsetting 1
[79559.940787] usb 3-1: 2:1: usb_set_interface failed (-19)
[79559.941181] usb 3-1: Not enough bandwidth for altsetting 1
[79559.941188] usb 3-1: 2:1: usb_set_interface failed (-19)
[79559.941510] usb 3-1: Not enough bandwidth for altsetting 1
[79559.941517] usb 3-1: 2:1: usb_set_interface failed (-19)
[79559.941863] usb 3-1: USB disconnect, device number 2
[79560.000039] usb 3-4: USB disconnect, device number 3
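
The negative numbers in both logs are kernel errno values. As a reading aid, here is a minimal sketch that annotates them, assuming the standard Linux errno numbering; the table is hard-coded to the three codes that appear in this report, and annotate is a hypothetical helper name.

```python
import re

# Linux errno numbers for the codes seen in the logs above.
LINUX_ERRNO = {
    19: "ENODEV",      # No such device
    62: "ETIME",       # Timer expired
    108: "ESHUTDOWN",  # Cannot send after transport endpoint shutdown
}


def annotate(line):
    """Append the symbolic errno name after each known negative code."""

    def repl(match):
        name = LINUX_ERRNO.get(int(match.group(1)))
        return f"{match.group(0)} [{name}]" if name else match.group(0)

    # The lookbehind skips USB port paths such as "5-3.4.3" or "3-1",
    # where the dash follows a digit and is not a minus sign.
    return re.sub(r"(?<![\w.])-(\d+)\b", repl, line)
```

For example, annotate("xhci_hcd 0000:0f:00.0: Host halt failed, -19") appends [ENODEV] after the code, which fits a controller that has already disappeared from the bus.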

When I plug the Thunderbolt 3 cable back in, the monitor ports of the Thunderbolt 3 Dock work, but USB does not.

Hardware is a Lenovo ThinkPad X1 from 2017. This started happening after updating to Ubuntu 18.04. It did not happen on Ubuntu 17.10.
---
ApportVersion: 2.20.9-0ubuntu7
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC1: alfred 5177 F.... pulseaudio
 /dev/snd/controlC0: alfred 5177 F.... pulseaudio
CurrentDesktop: MATE
DistroRelease: Ubuntu 18.04
InstallationDate: Installed on 2017-10-04 (210 days ago)
InstallationMedia: Ubuntu-MATE 17.04 "Zesty Zapus" - Release amd64 (20170412)
MachineType: LENOVO 20HRCTO1WW
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
Package: linux (not installed)
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/root/default@/boot/vmlinuz-4.15.0-20-generic root=ZFS=zroot/root/default ro quiet splash vt.handoff=1
ProcVersionSignature: Ubuntu 4.15.0-20.21-generic 4.15.17
RelatedPackageVersions:
 linux-restricted-modules-4.15.0-20-generic N/A
 linux-backports-modules-4.15.0-20-generic N/A
 linux-firmware 1.173
Tags: bionic
Uname: Linux 4.15.0-20-generic x86_64
UpgradeStatus: Upgraded to bionic on 2018-05-02 (1 days ago)
UserGroups: adm cdrom dip docker lpadmin plugdev sambashare sudo
WifiSyslog:

_MarkForUpload: True
dmi.bios.date: 08/30/2017
dmi.bios.vendor: LENOVO
dmi.bios.version: N1MET38W (1.23 )
dmi.board.asset.tag: Not Available
dmi.board.name: 20HRCTO1WW
dmi.board.vendor: LENOVO
dmi.board.version: SDK0J40697 WIN
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: None
dmi.modalias: dmi:bvnLENOVO:bvrN1MET38W(1.23):bd08/30/2017:svnLENOVO:pn20HRCTO1WW:pvrThinkPadX1Carbon5th:rvnLENOVO:rn20HRCTO1WW:rvrSDK0J40697WIN:cvnLENOVO:ct10:cvrNone:
dmi.product.family: ThinkPad X1 Carbon 5th
dmi.product.name: 20HRCTO1WW
dmi.product.version: ThinkPad X1 Carbon 5th
dmi.sys.vendor: LENOVO

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1768852

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: cosmic

apport information

tags: added: apport-collected bionic
description: updated

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Joseph Salisbury (jsalisbury) wrote :

Did this issue start happening after an update/upgrade? Was there a prior kernel version where you were not having this particular problem?

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.17 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.17-rc4

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
Alfred Krohmer (devkid) wrote :

As stated in the original post, the issue started happening after the upgrade from 17.10 to 18.04.

Additional info: the system seems to hang / become unresponsive when I trigger suspend to RAM after unplugging the TB3 cable.

Will test with an upstream kernel soon.

Alfred Krohmer (devkid) wrote :

Found the following ticket in Red Hat's bug tracker:
https://bugzilla.redhat.com/show_bug.cgi?id=1565131

This seems to be the fix:
https://patchwork.kernel.org/patch/10340045/

This has been merged into torvalds/linux as of v4.17-rc3.

I'm assuming this is enough to mark the bug as confirmed? Would it be possible to port this back into Ubuntu 18.04?

tags: added: kernel-bug-exists-upstream
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Kai-Heng Feng (kaihengfeng) wrote :

I built a kernel with the patch. Please try it:
https://people.canonical.com/~khfeng/lp1768852/

Alfred Krohmer (devkid) wrote :

Thank you very much, it's working great!

description: updated
Tommy Nevtelen (dal) wrote :

I can also confirm that this fixes the issue.

Changed in linux (Ubuntu Bionic):
status: New → Fix Committed
Changed in linux (Ubuntu Cosmic):
status: Confirmed → In Progress
cardonator (bcardon) wrote :

Just want to confirm that this fixed the problem for me as well. And just to be absolutely clear, here are some of the problems I was having when attaching/detaching Thunderbolt 3 on my XPS 13 9360:

1) Re-attaching the dock, USB would not function
2) Unable to suspend, reboot, or shutdown after detaching the dock
3) Eventual system hang/freeze
4) Occasional black screen in KDE

After running this kernel for a bit, it seems all of these problems are gone. Looking forward to having the patch merged! For what it's worth, I began having these issues on the KDE Neon distribution sometime after the 4.10 kernel was released.

Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
cardonator (bcardon) wrote :

I can confirm that:

1) Rolling back to the latest mainstream kernel for 18.04 (4.15.0-22-generic) reproduces the problem

2) Installing the updated kernel from bionic-proposed does fix the problem (4.15.0-23-generic)
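
The two kernel versions above suggest a simple check for whether a Bionic 4.15 kernel is new enough to carry the fix. This is a minimal sketch, assuming only what this thread reports (4.15.0-22 affected, 4.15.0-23 fixed); parse_release and has_fix are hypothetical helper names, and the function deliberately gives no answer for other kernel series.

```python
import re

# First Bionic kernel ABI reported as fixed in this bug.
FIXED = (4, 15, 0, 23)


def parse_release(release):
    """Turn '4.15.0-23-generic' into (4, 15, 0, 23)."""
    m = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if not m:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) for part in m.groups())


def has_fix(release):
    """Return True/False for 4.15.0-N kernels, None for any other series."""
    version = parse_release(release)
    if version[:3] != FIXED[:3]:
        return None  # this thread says nothing certain about other series
    return version >= FIXED
```

On a running system the input would typically come from platform.release() or uname -r, for example has_fix("4.15.0-22-generic") for the affected kernel mentioned above.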

tags: added: verification-done-bionic
removed: verification-needed-bionic
rathboma (matthew-rathbone) wrote :

Thanks all for working on this. I've been trying to nail down the issues on my Thinkpad X1 Carbon for a couple of weeks and this seems to be the solution.

It's a little unclear to me from the bug report when this fix will be released. Will it be in July with the 18.04.1 update?

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.15.0-23.25

---------------
linux (4.15.0-23.25) bionic; urgency=medium

  * linux: 4.15.0-23.25 -proposed tracker (LP: #1772927)

  * arm64 SDEI support needs trampoline code for KPTI (LP: #1768630)
    - arm64: mmu: add the entry trampolines start/end section markers into
      sections.h
    - arm64: sdei: Add trampoline code for remapping the kernel

  * Some PCIe errors not surfaced through rasdaemon (LP: #1769730)
    - ACPI: APEI: handle PCIe AER errors in separate function
    - ACPI: APEI: call into AER handling regardless of severity

  * qla2xxx: Fix page fault at kmem_cache_alloc_node() (LP: #1770003)
    - scsi: qla2xxx: Fix session cleanup for N2N
    - scsi: qla2xxx: Remove unused argument from qlt_schedule_sess_for_deletion()
    - scsi: qla2xxx: Serialize session deletion by using work_lock
    - scsi: qla2xxx: Serialize session free in qlt_free_session_done
    - scsi: qla2xxx: Don't call dma_free_coherent with IRQ disabled.
    - scsi: qla2xxx: Fix warning in qla2x00_async_iocb_timeout()
    - scsi: qla2xxx: Prevent relogin trigger from sending too many commands
    - scsi: qla2xxx: Fix double free bug after firmware timeout
    - scsi: qla2xxx: Fixup locking for session deletion

  * Several hisi_sas bug fixes (LP: #1768974)
    - scsi: hisi_sas: dt-bindings: add an property of signal attenuation
    - scsi: hisi_sas: support the property of signal attenuation for v2 hw
    - scsi: hisi_sas: fix the issue of link rate inconsistency
    - scsi: hisi_sas: fix the issue of setting linkrate register
    - scsi: hisi_sas: increase timer expire of internal abort task
    - scsi: hisi_sas: remove unused variable hisi_sas_devices.running_req
    - scsi: hisi_sas: fix return value of hisi_sas_task_prep()
    - scsi: hisi_sas: Code cleanup and minor bug fixes

  * [bionic] machine stuck and bonding not working well when nvmet_rdma module
    is loaded (LP: #1764982)
    - nvmet-rdma: Don't flush system_wq by default during remove_one
    - nvme-rdma: Don't flush delete_wq by default during remove_one

  * Warnings/hang during error handling of SATA disks on SAS controller
    (LP: #1768971)
    - scsi: libsas: defer ata device eh commands to libata

  * Hotplugging a SATA disk into a SAS controller may cause crash (LP: #1768948)
    - ata: do not schedule hot plug if it is a sas host

  * ISST-LTE:pKVM:Ubuntu1804: rcu_sched self-detected stall on CPU follow by CPU
    ATTEMPT TO RE-ENTER FIRMWARE! (LP: #1767927)
    - powerpc/powernv: Handle unknown OPAL errors in opal_nvram_write()
    - powerpc/64s: return more carefully from sreset NMI
    - powerpc/64s: sreset panic if there is no debugger or crash dump handlers

  * fsnotify: Fix fsnotify_mark_connector race (LP: #1765564)
    - fsnotify: Fix fsnotify_mark_connector race

  * Hang on network interface removal in Xen virtual machine (LP: #1771620)
    - xen-netfront: Fix hang on device removal

  * HiSilicon HNS NIC names are truncated in /proc/interrupts (LP: #1765977)
    - net: hns: Avoid action name truncation

  * Ubuntu 18.04 kernel crashed while in degraded mode (LP: #1770849)
    - SAUCE: powerpc/perf: Fix memory allocation for...

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Changed in linux (Ubuntu Cosmic):
status: In Progress → Fix Released