Almost all USB ports suddenly stopped working; unbootable

Bug #1956849 reported by dgatwood
This bug affects 6 people
Affects: linux (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

This package needs to be pulled NOW. It disables almost all USB-3.0 and USB-C ports completely.

Even though I had automatic software updates turned OFF (or so I thought), my Mac Pro suddenly got a new kernel when I rebooted it this morning:

Linux macpro-obs 5.11.0-44-generic #48~20.04.2-Ubuntu SMP Tue Dec 14 15:36:44 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

and it contains a P0 showstopper bug. Upon rebooting, I got dropped into the initrd prompt because Linux could not see the external USB drive that I'm booting from (WDBAGF5000AGY-WESN).

I wasted an entire day figuring out what was wrong, and even went so far as to order a replacement SSD, planning to rebuild everything from scratch, because the drive failed to appear in every single USB port I tried.

Then I discovered that a different SSD didn't work, either. At that point, I realized that something else was wrong, and I kept trying until I found one other port that worked. I was then able to boot and get dmesg and lsusb output.

This kernel update broke not only the built-in ports, but also the ports on a generic USB-C PCIe card (Amazon B08PF8XR73).

Mac Pro built-in USB-3.0(A) ports (2x): working
Mac Pro built-in USB-C ports (4x): dead
USB-C PCIe card USB-C ports (2x): dead
USB-C PCIe card USB-3.0(A) ports (5x): dead

All devices fail to appear in lsusb when attached to the port, including an Apple USB keyboard, an Anker USB-C Ethernet adapter, a WD SSD, and a Sandisk SSD.

I'm going to roll back my kernel to a working kernel, but this package needs to be pulled NOW before it affects too many people. This is too catastrophic a bug to wait even a day.

Tags: hirsute
dgatwood (dgatwood) wrote :
Aaron (aklinker1) wrote :

What kernel version did you have to revert to? I'm also experiencing this bug.

Jazz (jazzzz) wrote :

I had the same issue, also with 5.11.0-44-generic; reverting to kernel 5.11.0-41-generic fixes it. Before that, I upgraded from Ubuntu 21.04 to 21.10, and no USB port worked with kernel 5.13.0-23-generic.

dgatwood (dgatwood) wrote :

-38 was fine.
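
To make the reports so far easier to compare, here is a minimal Python sketch that brackets the regression from the kernel releases mentioned in this thread. The helper names (abi_number, regression_window) are made up for illustration, and it assumes all releases share the same base version and follow the "X.Y.Z-ABI-flavour" naming seen above.

```python
import re

# Matches Ubuntu kernel release strings such as "5.11.0-44-generic".
RELEASE = re.compile(r"(\d+\.\d+\.\d+)-(\d+)-([\w-]+)")


def abi_number(release):
    """Split a release string into (base version, ABI number)."""
    m = RELEASE.fullmatch(release)
    if m is None:
        raise ValueError(f"unrecognised release string: {release!r}")
    return m.group(1), int(m.group(2))


def regression_window(good, bad):
    """Return (last known-good ABI, first known-bad ABI) for one base version."""
    parsed_good = [abi_number(r) for r in good]
    parsed_bad = [abi_number(r) for r in bad]
    bases = {base for base, _ in parsed_good + parsed_bad}
    if len(bases) != 1:
        raise ValueError("releases must share one base version")
    return max(n for _, n in parsed_good), min(n for _, n in parsed_bad)


if __name__ == "__main__":
    # Releases reported as working and as broken in the comments above.
    print(regression_window(
        ["5.11.0-38-generic", "5.11.0-41-generic"],
        ["5.11.0-44-generic"],
    ))
```

With the reports above, the window is between -41 and -44, which is the range of patches worth reviewing first.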

Heitor Alves de Siqueira (halves) wrote :

@dgatwood would you be able to upload kernel logs and lsusb output from the -44 kernel? From what you're describing, it seems like a possible issue with host controllers, but it's hard to tell without looking at what is going on in the kernel.

You can use `apport-collect 1956849` to upload relevant data directly to this LP bug.

Matthew Ruffell (mruffell) wrote :

Hi @dgatwood, @Jazz. I took a look at the USB patches that landed between 5.11.0-41-generic and 5.11.0-44-generic in the Focal HWE tree:

https://paste.ubuntu.com/p/mfJCw5XmBv/

One commit which I thought was interesting was:

commit 5caa90d5fedfcdf57a42c272896cfd0ef6b5c667
commit-upstream: 5255660b208aebfdb71d574f3952cf48392f4306
Author: Jonathan Bell <email address hidden>
Date: Fri Oct 8 12:25:44 2021 +0300
Subject: xhci: add quirk for host controllers that don't update endpoint DCS
Link: https://github.com/torvalds/linux/commit/5255660b208aebfdb71d574f3952cf48392f4306

I took a look at the Amazon B08PF8XR73 card you have:

https://www.amazon.com/YEELIYA-PCI-7-Port-USB-C-USB/dp/B08PF8XR73

The page mentions "Advanced Dual Chip Uses Fresco Logic 1100 +VL820 dual chips".

The above commit is a quirk for VIA VL8xx chipsets, and looking at the code:

 if (pdev->vendor == PCI_VENDOR_ID_VIA && pdev->device == 0x3483) {
  xhci->quirks |= XHCI_LPM_SUPPORT;
  xhci->quirks |= XHCI_EP_CTX_BROKEN_DCS;
 }

The quirk is applied when pdev->vendor == PCI_VENDOR_ID_VIA and pdev->device == 0x3483. Looking these IDs up:

https://devicehunt.com/view/type/pci/vendor/1106/device/3483

We see that the VL820 is listed:

VX800/820-Series PCI-Express Root Port 1

So this new piece of code would apply to your PCIe card.
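
As a rough illustration of the matching logic quoted above, here is a small Python model of the conditional. It is only a sketch: the Quirk flag values are placeholders chosen for this example (the real bit values are defined in the kernel's xhci driver headers), and quirks_for is a hypothetical helper, not a kernel function. The vendor ID 0x1106 is the one that appears in the devicehunt URL above.

```python
from enum import Flag, auto

PCI_VENDOR_ID_VIA = 0x1106  # VIA's PCI vendor ID, as in the lookup URL above


class Quirk(Flag):
    # Placeholder flags for illustration only; not the kernel's real bit values.
    NONE = 0
    LPM_SUPPORT = auto()
    EP_CTX_BROKEN_DCS = auto()


def quirks_for(vendor, device):
    """Mirror the quoted conditional: both IDs must match for the quirks to be set."""
    quirks = Quirk.NONE
    if vendor == PCI_VENDOR_ID_VIA and device == 0x3483:
        quirks |= Quirk.LPM_SUPPORT
        quirks |= Quirk.EP_CTX_BROKEN_DCS
    return quirks


if __name__ == "__main__":
    print(quirks_for(0x1106, 0x3483))  # both quirks set
    print(quirks_for(0x1106, 0x0001))  # no quirks: device ID differs
```

The point is that the quirk is keyed on an exact PCI vendor/device pair, so any host controller reporting that pair picks up the new behaviour, and nothing else does.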

There was a similar bug filed a few days ago:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1956542

Looking at the lsusb output, they are also using a VIA Labs VL813 Hub, and USB 3 no longer works.

Bus 003 Device 056: ID 2109:2813 VIA Labs, Inc. VL813 Hub

In this case though, the VL813 is under a different vendor and device ID, and the code in that commit would not apply:

https://devicehunt.com/view/type/usb/vendor/2109/device/2813

That is because the ID is 0x2813 and not 0x3483.

Still, this is very interesting.

@Jazz, does your system have any "VIA Labs, Inc" USB host devices? Check "lsusb -v".
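
For anyone checking their own machine, a minimal Python sketch for picking VIA Labs devices out of lsusb output follows. The function names are invented for this example. Note that lsusb prints USB vendor:product IDs (0x2109 for VIA Labs), which are a different ID space from the PCI vendor/device pair that the quirk matches on, so a hit here only suggests VIA hardware is involved.

```python
import re

USB_VENDOR_VIA_LABS = 0x2109  # USB vendor ID that lsusb shows for "VIA Labs, Inc."

# Matches the summary line lsusb prints per device, e.g.
# "Bus 003 Device 056: ID 2109:2813 VIA Labs, Inc. VL813 Hub"
ID_LINE = re.compile(
    r"^Bus (\d+) Device (\d+): ID ([0-9a-fA-F]{4}):([0-9a-fA-F]{4})[ \t]*(.*)$",
    re.MULTILINE,
)


def parse_lsusb(text):
    """Return one dict per 'Bus ... Device ...: ID vvvv:pppp name' line."""
    return [
        {
            "bus": int(bus),
            "device": int(dev),
            "vendor": int(vid, 16),
            "product": int(pid, 16),
            "name": name.strip(),
        }
        for bus, dev, vid, pid, name in ID_LINE.findall(text)
    ]


def via_labs_devices(text):
    """Keep only the devices whose USB vendor ID is VIA Labs."""
    return [d for d in parse_lsusb(text) if d["vendor"] == USB_VENDOR_VIA_LABS]


if __name__ == "__main__":
    sample = "Bus 003 Device 056: ID 2109:2813 VIA Labs, Inc. VL813 Hub\n"
    for dev in via_labs_devices(sample):
        print(f"{dev['vendor']:04x}:{dev['product']:04x} {dev['name']}")
```

The same functions work on "lsusb -v" output, because only the unindented "Bus ..." summary lines match the pattern.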

Note that the commit was also present in 5.13.0-23-generic on Impish, but is not in 5.13.0-22-generic.

@dgatwood, could you please collect "lsusb -v" as well, and maybe try 5.11.0-41-generic?

Thanks,
Matthew

dgatwood (dgatwood) wrote :
dgatwood (dgatwood) wrote :

Also attached lsusb output from a working kernel for comparison. And yes, it looks like this is a VIA-chipset-specific regression.

I stand corrected about the Mac's built-in USB-C ports. Those don't work even after rolling back the kernel to the previous version. I'll file a separate bug for that.

Jazz (jazzzz) wrote :

Yes, I have "VIA Labs, Inc. Hub" device:

Bus 002 Device 010: ID 2109:2811 VIA Labs, Inc. Hub
Device Descriptor:
  bLength 18
  bDescriptorType 1
  bcdUSB 2.10
  bDeviceClass 9 Hub
  bDeviceSubClass 0
  bDeviceProtocol 1 Single TT
  bMaxPacketSize0 64
  idVendor 0x2109 VIA Labs, Inc.
  idProduct 0x2811 Hub
  bcdDevice 90.90
  iManufacturer 1 VIA Labs, Inc.
  iProduct 2 USB2.0 Hub
  iSerial 0
  bNumConfigurations 1
  Configuration Descriptor:
    bLength 9
    bDescriptorType 2
    wTotalLength 0x0019
    bNumInterfaces 1
    bConfigurationValue 1
    iConfiguration 0
    bmAttributes 0xe0
      Self Powered
      Remote Wakeup
    MaxPower 0mA
    Interface Descriptor:
      bLength 9
      bDescriptorType 4
      bInterfaceNumber 0
      bAlternateSetting 0
      bNumEndpoints 1
      bInterfaceClass 9 Hub
      bInterfaceSubClass 0
      bInterfaceProtocol 0 Full speed (or root) hub
      iInterface 0
      Endpoint Descriptor:
        bLength 7
        bDescriptorType 5
        bEndpointAddress 0x81 EP 1 IN
        bmAttributes 3
          Transfer Type Interrupt
          Synch Type None
          Usage Type Data
        wMaxPacketSize 0x0001 1x 1 bytes
        bInterval 12

Matthew Ruffell (mruffell) wrote :

Hi @dgatwood, @jazzzz,

I think we should try a test kernel with the VIA chipset changes reverted, to see if it is the commit causing you issues.

I have a couple of test kernels based on 5.11.0-44-generic for Focal HWE and Hirsute, and 5.13.0-23-generic for Impish currently building, and they have the below commit reverted:

commit 5caa90d5fedfcdf57a42c272896cfd0ef6b5c667
commit-upstream: 5255660b208aebfdb71d574f3952cf48392f4306
Author: Jonathan Bell <email address hidden>
Date: Fri Oct 8 12:25:44 2021 +0300
Subject: xhci: add quirk for host controllers that don't update endpoint DCS
Link: https://github.com/torvalds/linux/commit/5255660b208aebfdb71d574f3952cf48392f4306

They are building in https://launchpad.net/~mruffell/+archive/ubuntu/lp1956849-test and will be ready in a few hours; I will post proper instructions on how to install the test kernels tomorrow.

I will write back once the kernels have finished building, and I have had a chance to boot test them.

Thanks,
Matthew

Matthew Ruffell (mruffell) wrote :

Hi @dgatwood, @jazzzz,

The test kernels have finished building and are ready for you to try.

Like I said before, they are 5.11.0-44-generic for Focal HWE and Hirsute, and 5.13.0-23-generic for Impish, with just the one commit reverted:

commit 5caa90d5fedfcdf57a42c272896cfd0ef6b5c667
commit-upstream: 5255660b208aebfdb71d574f3952cf48392f4306
Author: Jonathan Bell <email address hidden>
Date: Fri Oct 8 12:25:44 2021 +0300
Subject: xhci: add quirk for host controllers that don't update endpoint DCS
Link: https://github.com/torvalds/linux/commit/5255660b208aebfdb71d574f3952cf48392f4306

Let's see if this is the commit which is causing you issues. Could you please install a test kernel and let me know if your USB ports work again?

Please note, these test packages are NOT SUPPORTED by Canonical and are for TEST PURPOSES ONLY. ONLY install in a dedicated test environment.

Instructions to install:
1) sudo add-apt-repository ppa:mruffell/lp1956849-test
2) sudo apt update

For Focal HWE and Hirsute users:
3) sudo apt install linux-image-unsigned-5.11.0-44-generic linux-modules-5.11.0-44-generic linux-modules-extra-5.11.0-44-generic linux-headers-5.11.0-44-generic
For Impish users:
3) sudo apt install linux-image-unsigned-5.13.0-23-generic linux-modules-5.13.0-23-generic linux-modules-extra-5.13.0-23-generic linux-headers-5.13.0-23-generic

4) sudo reboot
5) uname -rv
Focal:
5.11.0-44-generic #48~20.04.2+TEST1956849v20220112b1-Ubuntu SMP Wed Jan 12 02:26:0
Hirsute:
5.11.0-44-generic #48+TEST1956849v20220112b1-Ubuntu SMP Wed Jan 12 02:31:07 UTC 20
Impish:
5.13.0-23-generic #23+TEST1956849v20220112b1-Ubuntu SMP Wed Jan 12 02:37:08 UTC 20

If you are asked to abort the current kernel removal, say no.

You may need to change your grub config to boot the correct kernel. You can follow these instructions to do that: https://paste.ubuntu.com/p/WGpCWTPyTj/

Please make sure the uname is correct on boot. Sometimes newer kernels get pulled in due to metapackage dependencies not liking the linux-image-unsigned package.
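
The uname check can also be scripted. The following is a minimal Python sketch, assuming the test builds can be recognised by the "TEST1956849" marker visible in the version strings above; is_test_kernel is a hypothetical helper name. On Linux, platform.release() and platform.version() correspond to "uname -r" and "uname -v".

```python
import platform

TEST_TAG = "TEST1956849"  # marker that appears in the test kernels' version strings


def is_test_kernel(release, version, expected_release):
    """True only if the expected release is running AND it is the test build."""
    return release == expected_release and TEST_TAG in version


if __name__ == "__main__":
    # Use "5.13.0-23-generic" instead on Impish.
    expected = "5.11.0-44-generic"
    if is_test_kernel(platform.release(), platform.version(), expected):
        print("Running the test kernel")
    else:
        print("Not the test kernel:", platform.release(), platform.version())
```

Checking the version string as well as the release matters here, because the stock and test kernels share the same release name.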

Let me know if your USB ports start to work again. If they do, we can revert the commit in the Ubuntu kernels, report the issue upstream, and get it fixed. If they don't, then we will need to do some more tests to try to find the commit which introduced the regression.

Thanks,
Matthew

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1956849

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: hirsute
Isaac True (itrue) wrote :

This problem seems to occur at random for me on 20.04 with the 5.13 HWE kernel (5.13.0-27-generic). Here's the relevant dmesg output from after the problem occurs:

[ 1.751416] wacom 0003:056A:00EC.0002: hidraw0: USB HID v1.10 Mouse [Tablet ISD-V4] on usb-0000:00:1d.0-1.8/input0
[ 6.738809] usb 3-3: device descriptor read/8, error -110
[ 6.846682] usb 3-3: new SuperSpeed USB device number 2 using xhci_hcd
[ 12.114819] usb 3-3: device descriptor read/8, error -110
[ 12.358687] usb 2-4: new full-speed USB device number 3 using xhci_hcd
[ 12.508071] usb 2-4: New USB device found, idVendor=8087, idProduct=07dc, bcdDevice= 0.01
[ 12.508086] usb 2-4: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[ 12.598690] usb 2-3.2: new high-speed USB device number 4 using xhci_hcd
[ 12.711168] usb 2-3.2: New USB device found, idVendor=1a40, idProduct=0101, bcdDevice= 1.00
[ 12.711184] usb 2-3.2: New USB device strings: Mfr=0, Product=1, SerialNumber=0
[ 12.711190] usb 2-3.2: Product: USB 2.0 Hub [MTT]
[ 12.712162] hub 2-3.2:1.0: USB hub found
[ 12.712243] hub 2-3.2:1.0: 4 ports detected
[ 23.002688] xhci_hcd 0000:00:14.0: Abort failed to stop command ring: -110
[ 23.002707] xhci_hcd 0000:00:14.0: xHCI host controller not responding, assume dead
[ 23.002744] xhci_hcd 0000:00:14.0: HC died; cleaning up
[ 23.002767] clocksource: timekeeping watchdog on CPU0: Marking clocksource 'tsc' as unstable because the skew is too large:
[ 23.002771] clocksource: 'acpi_pm' wd_now: af831a wd_last: a565c0 mask: ffffff
[ 23.002775] clocksource: 'tsc' cs_now: 10bdde5e61 cs_last: e237c090a mask: ffffffffffffffff
[ 23.002778] tsc: Marking TSC unstable due to clocksource watchdog
[ 23.002791] xhci_hcd 0000:00:14.0: Timeout while waiting for setup device command
[ 23.002798] xhci_hcd 0000:00:14.0: Error while assigning device slot ID
[ 23.002801] xhci_hcd 0000:00:14.0: Max number of devices this xHCI host supports is 32.
[ 23.002805] usb 2-3.2-port1: couldn't allocate usb_device
[ 23.003022] TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
[ 23.003024] sched_clock: Marking unstable (23002093033, 928871)<-(23061485899, -58464196)
[ 23.003286] clocksource: Checking clocksource tsc synchronization from CPU 2.
[ 23.003348] clocksource: Switched to clocksource acpi_pm
[ 23.418688] usb 3-3: device not accepting address 3, error -108
[ 23.418792] usb usb3-port3: attempt power cycle
[ 23.418799] usb usb3-port3: failed to disable port power
[ 23.418804] usb usb3-port3: couldn't allocate usb_device
[ 23.418855] usb usb2-port5: couldn't allocate usb_device
[ 23.418902] usb 2-3: USB disconnect, device number 2
[ 23.418909] usb 2-3.2: USB disconnect, device number 4
[ 23.419931] usb 2-4: USB disconnect, device number 3
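
To spot this failure signature quickly in a long log, here is a minimal Python sketch that looks for the two xhci_hcd messages shown above and reports which controller they refer to. The function name is invented for this example, and the pattern only covers the exact message wording seen in this log.

```python
import re

# The two messages in the log above that mark a host controller as dead.
HC_DEATH = re.compile(
    r"xhci_hcd (\S+): "
    r"(xHCI host controller not responding, assume dead|HC died; cleaning up)"
)


def dead_controllers(log_text):
    """Return PCI addresses of xHCI controllers reported dead, in order seen."""
    seen = []
    for line in log_text.splitlines():
        m = HC_DEATH.search(line)
        if m and m.group(1) not in seen:
            seen.append(m.group(1))
    return seen


if __name__ == "__main__":
    sample = (
        "[ 23.002707] xhci_hcd 0000:00:14.0: xHCI host controller not responding, assume dead\n"
        "[ 23.002744] xhci_hcd 0000:00:14.0: HC died; cleaning up\n"
    )
    print(dead_controllers(sample))
```

Because the pattern does not depend on the timestamp prefix, it should work on both dmesg and journalctl output.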

Jazz (jazzzz) wrote :

@mruffell I tried to boot your custom kernel, but could not because of an invalid signature. This seems to be because of Secure Boot, but I don't know how to work around it.

Peter Maffter (pmaff) wrote :

I am experiencing the same problem with a brand-new Brother DS-740D scanner. When I connect the scanner to a USB 3.0 port on an MSI GE63 7RD laptop running Ubuntu 18.04 LTS, the mouse and keyboard connected via USB are dead afterwards.

The laptop's built-in touchpad and keyboard still work.

On Win10 the scanner works.

journalctl | grep -7 "controller not responding"
Feb 07 14:02:26 machine5 kernel: xhci_hcd 0000:00:14.0: Abort failed to stop command ring: -110
Feb 07 14:02:26 machine5 kernel: xhci_hcd 0000:00:14.0: xHCI host controller not responding, assume dead
Feb 07 14:02:26 machine5 kernel: xhci_hcd 0000:00:14.0: HC died; cleaning up
Feb 07 14:02:26 machine5 kernel: xhci_hcd 0000:00:14.0: Timeout while waiting for setup device command
Feb 07 14:02:26 machine5 kernel: usb 1-3: USB disconnect, device number 2
Feb 07 14:02:26 machine5 kernel: usb 1-3.1: USB disconnect, device number 4
Feb 07 14:02:26 machine5 kernel: usb 1-3.1.3: USB disconnect, device number 7
Feb 07 14:02:26 machine5 kernel: usb 1-3.1.3.1: USB disconnect, device number 10

No "VIA Labs, Inc" USB host devices.

Linux machine5 4.15.0-167-generic #175-Ubuntu SMP Wed Jan 5 01:56:07 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

So having a brand-new scanner and not getting it to work because of Ubuntu is quite a new experience.
:-/

Peter Maffter (pmaff)
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Peter Maffter (pmaff) wrote :

The following devices disappear after connecting the scanner
(found by comparing lsusb -v -v -v output before and afterwards):

Bus 002 Device 003: ID 05e3:0612 Genesys Logic, Inc.
  idVendor 0x05e3 Genesys Logic, Inc.

Bus 002 Device 002: ID 05e3:0612 Genesys Logic, Inc.
  idVendor 0x05e3 Genesys Logic, Inc.

Bus 001 Device 003: ID 1038:1122 SteelSeries ApS
  idVendor 0x1038 SteelSeries ApS

Bus 001 Device 009: ID 05e3:0610 Genesys Logic, Inc. 4-port hub
  idVendor 0x05e3 Genesys Logic, Inc.

Bus 001 Device 006: ID 1a40:0101 Terminus Technology Inc. Hub
  idVendor 0x1a40 Terminus Technology Inc.

Bus 001 Device 007: ID 0b95:6802 ASIX Electronics Corp.
  idVendor 0x0b95 ASIX Electronics Corp.

Bus 001 Device 004: ID 1a40:0101 Terminus Technology Inc. Hub
  idVendor 0x1a40 Terminus Technology Inc.

Bus 001 Device 002: ID 05e3:0610 Genesys Logic, Inc. 4-port hub
  idVendor 0x05e3 Genesys Logic, Inc.

Wireless also seems to be kicked out
Bus 001 Device 014: ID 2357:0009
  idVendor 0x2357
      bFunctionClass 224 Wireless

Bus 001 Device 008: ID 5986:0547 Acer, Inc
seems to be the video camera
  idVendor 0x5986 Acer, Inc
      bFunctionClass 14 Video

Bus 001 Device 005: ID 8087:0aa7 Intel Corp.
  idVendor 0x8087 Intel Corp.
  bDeviceClass 224 Wireless
  bDeviceSubClass 1 Radio Frequency
  bDeviceProtocol 1 Bluetooth
So Bluetooth is dead as well. :-(

I am using an LTE/WLAN router that is connected via USB, so that connection is lost as well.

The USB keyboard and the Logitech, Inc. Unifying Receiver for the mouse also drop out, as mentioned above.

I have to reboot after that error.
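
The before/after lsusb comparison described above can be sketched in Python as a set difference. This is only an illustration: the function names are invented for this example, and it assumes devices keep their bus and device numbers between the two captures.

```python
import re

# Matches the unindented summary line of each device in lsusb (or lsusb -v) output.
ID_LINE = re.compile(
    r"^Bus (\d+) Device (\d+): ID ([0-9a-fA-F]{4}:[0-9a-fA-F]{4})",
    re.MULTILINE,
)


def devices(lsusb_text):
    """Return the set of (bus, device number, 'vendor:product') tuples."""
    return {(int(b), int(d), ids) for b, d, ids in ID_LINE.findall(lsusb_text)}


def vanished(before, after):
    """Devices present in the first capture but missing from the second."""
    return sorted(devices(before) - devices(after))


if __name__ == "__main__":
    before = (
        "Bus 002 Device 003: ID 05e3:0612 Genesys Logic, Inc.\n"
        "Bus 001 Device 005: ID 8087:0aa7 Intel Corp.\n"
    )
    after = "Bus 002 Device 003: ID 05e3:0612 Genesys Logic, Inc.\n"
    for bus, dev, ids in vanished(before, after):
        print(f"Bus {bus:03d} Device {dev:03d}: ID {ids}")
```

In practice the two inputs would be lsusb output saved to files before and after plugging in the scanner.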

If I connect the scanner via an adapter to the Type-C USB 3.1 Gen2 port, then everything works and I can scan.

Bus 004 Device 005: ID 04f9:0469 Brother Industries, Ltd
Device Descriptor:
  bLength 18
  bDescriptorType 1
  bcdUSB 3.20
  bDeviceClass 0 (Defined at Interface level)
  bDeviceSubClass 0
  bDeviceProtocol 0
  bMaxPacketSize0 9
  idVendor 0x04f9 Brother Industries, Ltd
  idProduct 0x0469
...
So that is at least one workaround.

Jazz (jazzzz) wrote :

I no longer have the issue with 5.13.0-52-generic; it looks like something was fixed.
