[regression] USB device is not detected during boot

Bug #1939638 reported by Chris Chiu
This bug affects 4 people
Affects                    Status        Importance  Assigned to  Milestone
HWE Next                   Fix Released  Low         Unassigned
linux (Ubuntu)             Fix Released  Critical    Unassigned
  Focal                    Fix Released  Undecided   Unassigned
  Hirsute                  Fix Released  Undecided   Unassigned
linux-intel (Ubuntu)       Invalid       Undecided   Unassigned
  Focal                    Confirmed     Undecided   Jesse Sung
  Hirsute                  Invalid       Undecided   Unassigned
linux-intel-5.13 (Ubuntu)  Fix Released  Undecided   Unassigned
  Focal                    Fix Released  Undecided   Jesse Sung
  Hirsute                  Invalid       Undecided   Unassigned

Bug Description

[SRU Justification]

[Impact]
USB devices (keyboard, storage, ...) fail to be detected when connected to problematic root hubs that need a longer PowerOn-to-PowerGood delay than the hub descriptor claims. The regression is caused by the upstream commit 90d28fb53d4a ("usb: core: reduce power-on-good delay time of root hub").

[Fix]
Revert the upstream commit until a proper fix is in place.

[Test Case]
1. Plug the USB device into a port of the problematic root hub.
2. Power on the machine.
3. Check whether the USB device works after boot.

[Regression Potential]
Low. The longer delay has proven safe in older kernels.

========== Original Bug Description ==========

[Summary]

USB devices (keyboard, storage, ...) fail to be detected during boot after upgrading to the Ubuntu focal kernel 5.4.0-78 or the hirsute kernel 5.11.0-26. However, they are detected and work fine after re-plugging. The kernel log shows the following during boot:
[ 39.350435] hub 1-0:1.0: USB hub found
[ 39.398835] hub 1-0:1.0: 12 ports detected
[ 39.622744] usb usb1-port3: couldn't allocate usb_device

When I unplug and then re-plug the same device, it shows:
[57210.794140] usb 1-3: new low-speed USB device number 4 using xhci_hcd
[57210.951289] usb 1-3: New USB device found, idVendor=17ef, idProduct=6099, bcdDevice= 1.14
[57210.951293] usb 1-3: New USB device strings: Mfr=1, Product=2, SerialNumber=0

After bisecting the kernel, we found that the upstream commit https://github.com/torvalds/linux/commit/90d28fb53d4a51299ff324dede015d5cb11b88a2 makes the difference. It indicates that the delay for the root hub from power-on to power-good is not long enough. There was no problem when the delay was 100 ms. According to the Hub Descriptor of the root hub, the value is 10 * 2 milliseconds, i.e. 20 ms. The xHCI spec also says in section 5.4.8:
"""
The host is required to have power stable to the port within 20 milliseconds of the '0' to '1' transition of PP. If PPC = '1', software is responsible for waiting 20ms.
"""

The commit seems to follow the spec but can cause problems on some hubs.
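
The delay arithmetic described above can be sketched as follows. This is an illustration only, not the kernel code: the function names are hypothetical, and the 100 ms floor is simply the delay reported in this bug to work.

```shell
#!/bin/sh
# Illustrative helpers (hypothetical names), not the kernel implementation.

# The hub descriptor stores the power-on-to-power-good time in units of 2 ms,
# so a raw value of 10 means 20 ms.
descriptor_delay_ms() {
    echo $(( $1 * 2 ))
}

# Same delay, but never shorter than 100 ms, the value reported here to work.
floored_delay_ms() {
    delay=$(( $1 * 2 ))
    if [ "$delay" -lt 100 ]; then
        delay=100
    fi
    echo "$delay"
}
```

For the root hub in this report, `descriptor_delay_ms 10` prints 20, while `floored_delay_ms 10` prints 100.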

[Reproduce Steps]
1. Plug USB devices into physical ports #1 and #4, which belong to the high-speed hub.
2. Power on the machine.
3. Check whether the USB devices work after boot.
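
The check in step 3 can also be done from the kernel log. The following is a minimal sketch, assuming the failure always produces the "couldn't allocate usb_device" line shown in the summary; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads kernel log text on stdin and prints the name of
# every port for which the "couldn't allocate usb_device" message was logged.
find_failed_usb_ports() {
    sed -n "s/.*usb \(usb[0-9]*-port[0-9]*\): couldn't allocate usb_device.*/\1/p"
}
```

Typical use would be `dmesg | find_failed_usb_ports`; empty output means the message was not logged during this boot.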

[Results]
Expected:
  All USB devices connected to the hub should work.

Actual:
  USB devices connected to the high-speed hub cannot be probed.

[Additional Information]
Kernel Version: focal 5.4.0-78 and hirsute 5.11.0-26

[Upstream bug]
https://bugzilla.kernel.org/show_bug.cgi?id=214021

Revision history for this message
Chris Chiu (mschiu77) wrote :
summary: - USB device is not detected during boot
+ [regression] USB device is not detected during boot
Chris Chiu (mschiu77)
tags: added: austin oem-priority originate-from-1939113
Changed in linux (Ubuntu Focal):
status: New → In Progress
Changed in linux (Ubuntu Hirsute):
status: New → In Progress
Changed in linux (Ubuntu):
importance: Undecided → Critical
Chris Chiu (mschiu77)
description: updated
Revision history for this message
Chris Chiu (mschiu77) wrote :
Changed in linux (Ubuntu Focal):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Hirsute):
status: In Progress → Fix Committed
Changed in linux-intel (Ubuntu Hirsute):
status: New → Invalid
Changed in linux-intel (Ubuntu Focal):
assignee: nobody → Jesse Sung (wenchien)
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-hirsute' to 'verification-done-hirsute'. If the problem still exists, change the tag 'verification-needed-hirsute' to 'verification-failed-hirsute'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-hirsute
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1939638

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Chris Chiu (mschiu77)
tags: added: verification-done-hirsute
removed: verification-needed-hirsute
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-focal
Chris Chiu (mschiu77)
tags: added: verification-done-focal
removed: verification-needed-focal
Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux-intel (Ubuntu Focal):
status: New → Confirmed
Changed in linux-intel (Ubuntu):
status: New → Confirmed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.4.0-84.94

---------------
linux (5.4.0-84.94) focal; urgency=medium

  * focal/linux: 5.4.0-84.94 -proposed tracker (LP: #1941767)

  * Server boot failure after adding checks for ACPI IRQ override (LP: #1941657)
    - Revert "ACPI: resources: Add checks for ACPI IRQ override"

linux (5.4.0-83.93) focal; urgency=medium

  * focal/linux: 5.4.0-83.93 -proposed tracker (LP: #1940159)

  * fails to launch linux L2 guests on AMD (LP: #1940134) // CVE-2021-3653
    - KVM: nSVM: avoid picking up unsupported bits from L2 in int_ctl
      (CVE-2021-3653)

  * fails to launch linux L2 guests on AMD (LP: #1940134)
    - SAUCE: Revert "UBUNTU: SAUCE: KVM: nSVM: avoid picking up unsupported bits
      from L2 in int_ctl"

linux (5.4.0-82.92) focal; urgency=medium

  * focal/linux: 5.4.0-82.92 -proposed tracker (LP: #1939799)

  * Packaging resync (LP: #1786013)
    - debian/dkms-versions -- update from kernel-versions (main/2021.08.16)

  * CVE-2021-3656
    - SAUCE: KVM: nSVM: always intercept VMLOAD/VMSAVE when nested

  * CVE-2021-3653
    - SAUCE: KVM: nSVM: avoid picking up unsupported bits from L2 in int_ctl

  * [regression] USB device is not detected during boot (LP: #1939638)
    - SAUCE: Revert "usb: core: reduce power-on-good delay time of root hub"

  * dev_forward_skb: do not scrub skb mark within the same name space
    (LP: #1935040)
    - dev_forward_skb: do not scrub skb mark within the same name space

  * XPS 9510 (TGL) Screen Brightness could not be changed (LP: #1933566)
    - SAUCE: drm/i915: Force DPCD backlight mode for Dell XPS 9510(TGL)

  * Acer Aspire 5 sound driver issues (LP: #1930188)
    - ALSA: hda/realtek: headphone and mic don't work on an Acer laptop

  * Sony Dualshock 4 usb dongle crashes the whole system (LP: #1935846)
    - HID: sony: Workaround for DS4 dongle hotplug kernel crash.

  * [21.10 FEAT] KVM: Provide a secure guest indication (LP: #1933173)
    - s390/uv: add prot virt guest/host indication files
    - s390/uv: fix prot virt host indication compilation

  * Skip rtcpie test in kselftests/timers if the default RTC device does not
    exist (LP: #1937991)
    - selftests: timers: rtcpie: skip test if default RTC device does not exist

  * Focal update: v5.4.133 upstream stable release (LP: #1938713)
    - drm/mxsfb: Don't select DRM_KMS_FB_HELPER
    - drm/zte: Don't select DRM_KMS_FB_HELPER
    - drm/amd/amdgpu/sriov disable all ip hw status by default
    - drm/vc4: fix argument ordering in vc4_crtc_get_margins()
    - net: pch_gbe: Use proper accessors to BE data in pch_ptp_match()
    - drm/amd/display: fix use_max_lb flag for 420 pixel formats
    - hugetlb: clear huge pte during flush function on mips platform
    - atm: iphase: fix possible use-after-free in ia_module_exit()
    - mISDN: fix possible use-after-free in HFC_cleanup()
    - atm: nicstar: Fix possible use-after-free in nicstar_cleanup()
    - net: Treat __napi_schedule_irqoff() as __napi_schedule() on PREEMPT_RT
    - drm/mediatek: Fix PM reference leak in mtk_crtc_ddp_hw_init()
    - reiserfs: add check for invalid 1st journal block
    - drm/virtio: Fix double free on probe failure
    - dr...

Changed in linux (Ubuntu Focal):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.11.0-34.36

---------------
linux (5.11.0-34.36) hirsute; urgency=medium

  * hirsute/linux: 5.11.0-34.36 -proposed tracker (LP: #1941766)

  * Server boot failure after adding checks for ACPI IRQ override (LP: #1941657)
    - Revert "ACPI: resources: Add checks for ACPI IRQ override"

linux (5.11.0-33.35) hirsute; urgency=medium

  * hirsute/linux: 5.11.0-33.35 -proposed tracker (LP: #1940101)

  * libvirtd fails to create VM (LP: #1940107)
    - sched: Stop PF_NO_SETAFFINITY from being inherited by various init system
      threads

linux (5.11.0-32.34) hirsute; urgency=medium

  * hirsute/linux: 5.11.0-32.34 -proposed tracker (LP: #1939769)

  * Packaging resync (LP: #1786013)
    - debian/dkms-versions -- update from kernel-versions (main/2021.08.16)

  * CVE-2021-3656
    - SAUCE: KVM: nSVM: always intercept VMLOAD/VMSAVE when nested

  * CVE-2021-3653
    - SAUCE: KVM: nSVM: avoid picking up unsupported bits from L2 in int_ctl

  * [regression] USB device is not detected during boot (LP: #1939638)
    - SAUCE: Revert "usb: core: reduce power-on-good delay time of root hub"

  * Support builtin revoked certificates (LP: #1932029)
    - [Packaging] build canonical-revoked-certs.pem from branch/arch certs
    - [Packaging] Revoke 2012 UEFI signing certificate as built-in
    - [Config] Configure CONFIG_SYSTEM_REVOCATION_KEYS with revoked keys

  * Support importing mokx keys into revocation list from the mok table
    (LP: #1928679)
    - SAUCE: integrity: add informational messages when revoking certs

  * Support importing mokx keys into revocation list from the mok table
    (LP: #1928679) // CVE-2020-26541 when certificates are revoked via
    MokListXRT.
    - SAUCE: integrity: Load mokx certs from the EFI MOK config table

  * Include product_sku info to modalias (LP: #1938143)
    - firmware/dmi: Include product_sku info to modalias

  * Fix Ethernet not working by hotplug - RTL8106E (LP: #1930645)
    - net: phy: rename PHY_IGNORE_INTERRUPT to PHY_MAC_INTERRUPT
    - SAUCE: r8169: Use PHY_POLL when RTL8106E enable ASPM

  * [SRU][H/OEM-5.10/OEM-5.13/U] Fix system hang after unplug tbt dock
    (LP: #1938689)
    - SAUCE: igc: fix page fault when thunderbolt is unplugged

  * [Regression] Audio card [8086:9d71] not detected after upgrade from linux
    5.4 to 5.8 (LP: #1915117)
    - [Config] set CONFIG_SND_SOC_INTEL_SKYLAKE_HDAUDIO_CODEC to y

  * Backlight (screen brightness) on Lenovo P14s AMD Gen2 inop (LP: #1934557)
    - drm/amdgpu/display: only enable aux backlight control for OLED panels

  * Touchpad not working with ASUS TUF F15 (LP: #1937056)
    - pinctrl: tigerlake: Fix GPIO mapping for newer version of software

  * dev_forward_skb: do not scrub skb mark within the same name space
    (LP: #1935040)
    - dev_forward_skb: do not scrub skb mark within the same name space

  * Fix display output on HP hybrid GFX laptops (LP: #1936296)
    - drm/i915: Invoke another _DSM to enable MUX on HP Workstation laptops

  * [SRU][OEM-5.10/H] UBUNTU: SAUCE: Fix backlight control on Samsung 16727
    panel (LP: #1930527)
    - SAUCE: drm/i915: Force DPCD backlight mode for Samsung 16727 pa...

Changed in linux (Ubuntu Hirsute):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.13.0-16.16

---------------
linux (5.13.0-16.16) impish; urgency=medium

  * impish/linux: 5.13.0-16.16 -proposed tracker (LP: #1942611)

  * Miscellaneous Ubuntu changes
    - [Config] update toolchain in configs

  * Miscellaneous upstream changes
    - Revert "UBUNTU: [Config] Enable CONFIG_UBSAN_BOUNDS"

 -- Andrea Righi <email address hidden> Fri, 03 Sep 2021 16:21:14 +0200

Changed in linux (Ubuntu):
status: Incomplete → Fix Released
Changed in linux-intel-5.13 (Ubuntu Hirsute):
status: New → Invalid
Changed in linux-intel-5.13 (Ubuntu Focal):
assignee: nobody → Jesse Sung (wenchien)
Changed in linux-intel-5.13 (Ubuntu Focal):
status: New → Fix Released
Changed in linux-intel-5.13 (Ubuntu):
status: New → Fix Released
Revision history for this message
Jarkko Toivonen (jttoivon) wrote :

This got broken again in 5.4.0-90 (Ubuntu 20.04). USB2 ports work but USB-C ports don't. Re-plugging the USB device doesn't help, but after suspending and resuming, USB starts working again until the next time I restart the computer. The reason seems to be that the temporary fix was reverted:
* Fix cold plugged USB device on certain PCIe USB cards (LP: #1945211)
 - Revert "UBUNTU: SAUCE: Revert "usb: core: reduce power-on-good delay time of
      root hub""
And if some proper fix was made, it doesn't work in my case.

Revision history for this message
Tilman Schmidt (tgs-bonn) wrote :

Broken again on Ubuntu 20.04LTS after updating to kernel 5.4.0-105-generic.
USB2 devices not detected on USB3 ports after reboot, but appearing after:

$ echo '0000:00:14.0' | sudo tee /sys/bus/pci/drivers/xhci_hcd/unbind
$ sleep 1
$ echo '0000:00:14.0' | sudo tee /sys/bus/pci/drivers/xhci_hcd/bind

Revision history for this message
Dries Oeyen (driesoeyen) wrote :

Confirmed, this seems to be broken again on Ubuntu Core 20 devices using the latest pc-kernel Snap on the 20/stable channel (5.4.0-104.118.1).

Revision history for this message
Luke Nowakowski-Krijger (lukenow) wrote :

Hi, would it be possible to provide the dmesg output after boot?
Could you also provide the output of 'lspci -nn | grep USB', since a lot of these issues seem to be USB card specific and I am unfamiliar with your hardware.

As Jarkko mentioned, in 5.4.0-90 there were some patches that were reverted to do with https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1945211 that might be now affecting your machines.

Thanks,
- Luke

Revision history for this message
Tilman Schmidt (tgs-bonn) wrote :

a-schmidt@ulanbator:/var/log$ lspci -nn | grep USB
00:14.0 USB controller [0c03]: Intel Corporation C620 Series Chipset Family USB 3.0 xHCI Controller [8086:a1af] (rev 09)

Output of dmesg is attached.
At kernel timestamp 2.544086 you can see the telltale "couldn't allocate usb_device" message.
Starting at 4181.282035 there's the manual unbind, then at 4185.895352 the bind and subsequent detection of the previously missing Smart-UPS device.

Revision history for this message
Dries Oeyen (driesoeyen) wrote :

See the attached dmesg output. Here's the relevant line:

[ 37.400119] usb usb1-port4: couldn't allocate usb_device

The lspci utility is not available on Ubuntu Core 20, but when I boot Ubuntu Desktop 20.04 LTS from a live USB drive it yields the following output on the affected machine:

$ lspci -nn | grep -i USB
00:15.0 USB controller [0c03]: Intel Corporation Celeron N3350/Pentium N4200/Atom E3900 Series USB xHCI [8086:5aa8] (rev 0b)

Some additional observations:

1. Very, very occasionally, the "couldn't allocate usb_device" line won't be present in dmesg output and in this case the affected USB device *will* work immediately after boot. So this does seem to be a race condition of some kind.
2. I ran a quick check just in case: this is still present on the latest pc-kernel Snap update from a couple of days ago on the 20/stable channel (5.4.0-107.121.1).
3. I can confirm that the unbind-bind operation posted by tgs-bonn above makes the affected USB devices appear on my system as well:

$ echo '0000:00:15.0' | sudo tee /sys/bus/pci/drivers/xhci_hcd/unbind
$ sleep 1
$ echo '0000:00:15.0' | sudo tee /sys/bus/pci/drivers/xhci_hcd/bind
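
The unbind/bind workaround above can be wrapped in a small helper. This is a sketch under assumptions: the function name and the XHCI_DRIVER_DIR override are hypothetical (the override exists only so the helper can be dry-run against a scratch directory), and on a real system it has to run in a root shell because it writes to sysfs directly.

```shell
#!/bin/sh
# Hypothetical helper: unbind and re-bind one xHCI controller by PCI address.
# XHCI_DRIVER_DIR may point at a scratch directory for a dry run.
rebind_xhci() {
    addr="$1"
    dir="${XHCI_DRIVER_DIR:-/sys/bus/pci/drivers/xhci_hcd}"
    # Detach the controller from the xhci_hcd driver ...
    printf '%s\n' "$addr" > "$dir/unbind" || return 1
    sleep 1
    # ... then attach it again so the root hub ports are re-enumerated.
    printf '%s\n' "$addr" > "$dir/bind"
}
```

On the machine above this would be called as `rebind_xhci 0000:00:15.0`; the PCI address comes from the lspci output and differs per system.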

Revision history for this message
Luke Nowakowski-Krijger (lukenow) wrote :

Hello Dries and Tilman,

Thank you for providing the logs and other relevant information. For clarification, when you were testing if the issue still exists, was this from cold boot or warm boot?

And if you sometimes see it working from cold boot, it does indeed seem to be very racy. Since 5.4.0-97 there should be a 100 ms power-on-to-power-good delay on all USB 3.0 hubs, which seems to be what used to hold off whatever race is going on when the root hubs are being registered (as per the discussion here https://bugzilla.kernel.org/show_bug.cgi?id=214021), but maybe that isn't enough to avoid the race in your case?
P.S. If you run lsusb -v as root, you should see the bPwrOn2PwrGood value of the root hubs reported as 100 ms.
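
To pull that value out of the lsusb output, something like the following could be used. It is a sketch that assumes the hub descriptor line has the form "bPwrOn2PwrGood <n> * 2 milli seconds"; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads `lsusb -v` text on stdin and prints, for each
# hub descriptor found, the power-on-to-power-good delay in milliseconds.
# Assumes lines shaped like "bPwrOn2PwrGood       10 * 2 milli seconds".
pwr_on_good_ms() {
    awk '$1 == "bPwrOn2PwrGood" { print $2 * 2 }'
}
```

Typical use would be `sudo lsusb -v 2>/dev/null | pwr_on_good_ms`, which prints one number per hub.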

Also, the patchset that seemed to directly address this issue got reverted due to regressions and never seemed to be re-addressed upstream... there seem to be a lot of fixes for those introduced problems since then though so maybe it would make sense to reintroduce those patches.

I will send an email to the upstream usb mailing list asking about this and see what their thoughts are.

Thanks,
- Luke

Revision history for this message
Tilman Schmidt (tgs-bonn) wrote :

The log I attached in comment #15 is from a cold boot, but I'm seeing all four combinations of outcomes:
cold boot -> failing
cold boot -> working
warm boot -> failing
warm boot -> working
on ~40 Focal servers with identical hardware.
(Except for a different RAID controller in three of them which doesn't seem to matter.)

Revision history for this message
Luke Nowakowski-Krijger (lukenow) wrote :

Is it possible to test on an affected machine that is not in a default Secure Boot environment, i.e. a non-Core device? This would just be to confirm whether the fix does anything.

If so, I have put up the kernel .deb files with the possible fix here https://kernel.ubuntu.com/~lukenow/lp1939638/deferred_xhci. If you are able to test them, please let me know whether you are still experiencing these issues.

Revision history for this message
Tilman Schmidt (tgs-bonn) wrote :

Sorry, I can't help with that.
All machines on which I have seen the issue so far are production machines under configuration control where I cannot install an experimental kernel.

Revision history for this message
Kleber Sacilotto de Souza (kleber-souza) wrote :

Hi,

We have identified a fix for this issue and opened a new bug report for it: bug 1968210. Please use this bug from now on for updates and feedback about this regression.

Thank you.

Revision history for this message
Tilman Schmidt (tgs-bonn) wrote :

Good to know. Thanks for the information. I guess I should subscribe to that other bug then. Would it be an option to mark this one a duplicate of the new one?

Revision history for this message
Kleber Sacilotto de Souza (kleber-souza) wrote :

Hi Tilman,

The fix for this bug report was already released in the past and marked as "Fix Released". In this case the regression is handled by a new bug report and we usually keep the original one without duplicating it.

Timo Aaltonen (tjaalton)
Changed in hwe-next:
importance: Undecided → Low
Timo Aaltonen (tjaalton)
Changed in linux-intel (Ubuntu):
status: Confirmed → Invalid
Changed in hwe-next:
status: New → Fix Released