Nvidia RTX 2060 USB bus timeouts cause a ~30s delay during boot

Bug #1830905 reported by Dan Watkins
This bug affects 2 people
Affects         Status   Importance  Assigned to  Milestone
linux (Ubuntu)  Invalid  Undecided   Unassigned

Bug Description

From dmesg:

[ 0.132667] kernel: pci 0000:07:00.1: Linked as a consumer to 0000:07:00.0
[ 5.882727] kernel: pci 0000:07:00.2: xHCI HW not ready after 5 sec (HC bug?) status = 0x801
[ 5.882781] kernel: pci 0000:07:00.2: quirk_usb_early_handoff+0x0/0x6a6 took 5615333 usecs

<snip>

[ 9.984204] kernel: usb 1-7: new full-speed USB device number 4 using xhci_hcd
[ 10.308689] kernel: usb 1-7: New USB device found, idVendor=8087, idProduct=0a2b, bcdDevice= 0.10
[ 10.308690] kernel: usb 1-7: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[ 34.654721] kernel: xhci_hcd 0000:07:00.2: can't setup: -110
[ 34.654730] kernel: xhci_hcd 0000:07:00.2: USB bus 3 deregistered
[ 34.654769] kernel: xhci_hcd 0000:07:00.2: init 0000:07:00.2 fail, -110
[ 34.654771] kernel: xhci_hcd: probe of 0000:07:00.2 failed with error -110

$ lspci -k -s 0000:07:00
07:00.0 VGA compatible controller: NVIDIA Corporation TU106 [GeForce RTX 2060 Rev. A] (rev a1)
 Subsystem: Gigabyte Technology Co., Ltd TU106
 Kernel driver in use: nvidia
 Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
07:00.1 Audio device: NVIDIA Corporation TU106 High Definition Audio Controller (rev a1)
 Subsystem: Gigabyte Technology Co., Ltd TU106 High Definition Audio Controller
 Kernel driver in use: snd_hda_intel
 Kernel modules: snd_hda_intel
07:00.2 USB controller: NVIDIA Corporation TU106 USB 3.1 Host Controller (rev a1)
 Subsystem: Gigabyte Technology Co., Ltd TU106 USB 3.1 Host Controller
07:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU106 USB Type-C Port Policy Controller (rev a1)
 Subsystem: Gigabyte Technology Co., Ltd TU106 USB Type-C Port Policy Controller
 Kernel driver in use: nvidia-gpu
 Kernel modules: i2c_nvidia_gpu

ProblemType: Bug
DistroRelease: Ubuntu 19.10
Package: linux-image-5.0.0-15-generic 5.0.0-15.16
ProcVersionSignature: Ubuntu 5.0.0-15.16-generic 5.0.6
Uname: Linux 5.0.0-15-generic x86_64
NonfreeKernelModules: nvidia_modeset nvidia
ApportVersion: 2.20.11-0ubuntu2
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: daniel 4470 F.... pulseaudio
 /dev/snd/controlC1: daniel 4470 F.... pulseaudio
 /dev/snd/controlC2: daniel 4470 F.... pulseaudio
CurrentDesktop: i3
Date: Wed May 29 09:36:51 2019
InstallationDate: Installed on 2019-05-07 (21 days ago)
InstallationMedia: Ubuntu 18.04.2 LTS "Bionic Beaver" - Release amd64 (20190210)
MachineType: Gigabyte Technology Co., Ltd. B450M DS3H
ProcFB: 0 EFI VGA
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-5.0.0-15-generic root=/dev/mapper/ubuntu--vg-root ro quiet splash resume=UUID=73909634-a75d-42c9-8f66-a69138690756 vt.handoff=1
RelatedPackageVersions:
 linux-restricted-modules-5.0.0-15-generic N/A
 linux-backports-modules-5.0.0-15-generic N/A
 linux-firmware 1.179
SourcePackage: linux
UpgradeStatus: Upgraded to eoan on 2019-05-08 (20 days ago)
dmi.bios.date: 01/25/2019
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: F4
dmi.board.asset.tag: Default string
dmi.board.name: B450M DS3H-CF
dmi.board.vendor: Gigabyte Technology Co., Ltd.
dmi.board.version: x.x
dmi.chassis.asset.tag: Default string
dmi.chassis.type: 3
dmi.chassis.vendor: Default string
dmi.chassis.version: Default string
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvrF4:bd01/25/2019:svnGigabyteTechnologyCo.,Ltd.:pnB450MDS3H:pvrDefaultstring:rvnGigabyteTechnologyCo.,Ltd.:rnB450MDS3H-CF:rvrx.x:cvnDefaultstring:ct3:cvrDefaultstring:
dmi.product.family: Default string
dmi.product.name: B450M DS3H
dmi.product.sku: Default string
dmi.product.version: Default string
dmi.sys.vendor: Gigabyte Technology Co., Ltd.

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote: Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Kai-Heng Feng (kaihengfeng) wrote:

Nvidia sent a patch series [1] to linux-usb. I wonder if it helps.

[1] https://patchwork.kernel.org/project/linux-usb/list/?series=124541

Kai-Heng Feng (kaihengfeng) wrote:

A test kernel can be found here:
https://people.canonical.com/~khfeng/lp1830905/

Dan Watkins (oddbloke) wrote:

Hi Kai-Heng,

Thanks for the test kernel! Unfortunately, it doesn't appear to have changed the behaviour. :(

```
$ uname -a
Linux surprise 5.0.0-16-generic #17~lp1830905 SMP Fri May 31 13:39:26 CST 2019 x86_64 x86_64 x86_64 GNU/Linux

$ grep 0000:07:00.2 /var/log/dmesg
[ 0.086533] kernel: pci 0000:07:00.2: [10de:1ada] type 00 class 0x0c0330
[ 0.086556] kernel: pci 0000:07:00.2: reg 0x10: [mem 0xf2000000-0xf203ffff 64bit pref]
[ 0.086576] kernel: pci 0000:07:00.2: reg 0x1c: [mem 0xf2040000-0xf204ffff 64bit pref]
[ 0.086639] kernel: pci 0000:07:00.2: PME# supported from D0 D3hot
[ 5.878588] kernel: pci 0000:07:00.2: xHCI HW not ready after 5 sec (HC bug?) status = 0x801
[ 5.878636] kernel: pci 0000:07:00.2: quirk_usb_early_handoff+0x0/0x6a0 took 5611519 usecs
[ 6.542191] kernel: iommu: Adding device 0000:07:00.2 to group 15
[ 8.682608] kernel: xhci_hcd 0000:07:00.2: xHCI Host Controller
[ 8.682611] kernel: xhci_hcd 0000:07:00.2: new USB bus registered, assigned bus number 3
[ 34.483186] kernel: xhci_hcd 0000:07:00.2: can't setup: -110
[ 34.483204] kernel: xhci_hcd 0000:07:00.2: USB bus 3 deregistered
[ 34.483250] kernel: xhci_hcd 0000:07:00.2: init 0000:07:00.2 fail, -110
[ 34.483253] kernel: xhci_hcd: probe of 0000:07:00.2 failed with error -110
```

Cheers,

Dan
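The ~30s figure in the title can be checked against the log above: the probe of 0000:07:00.2 starts at 8.682608 ("xHCI Host Controller") and fails at 34.483186 ("can't setup: -110"), a gap of about 25.8 s, on top of the roughly 5.6 s spent in quirk_usb_early_handoff. A minimal bash/awk sketch for measuring such a gap is shown below; `dmesg_gap` is a hypothetical helper, and it assumes `[ seconds]` timestamps as in /var/log/dmesg. The patterns include the PCI address because other xHCI controllers on the same machine print similar messages.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of any package): print the seconds elapsed
# between the first line containing START and the next line containing END.
# Assumes dmesg-style lines such as "[ 8.682608] kernel: ...".
dmesg_gap() {
  local file=$1 start=$2 end=$3
  awk -v s="$start" -v e="$end" '
    match($0, /\[ *[0-9]+\.[0-9]+\]/) {
      # Strip the surrounding brackets and convert the timestamp to a number.
      ts = substr($0, RSTART + 1, RLENGTH - 2) + 0
      if (!found && index($0, s)) { t0 = ts; found = 1 }
      else if (found && index($0, e)) { printf "%.6f\n", ts - t0; exit }
    }
  ' "$file"
}

# Example against the boot log (path as used earlier in this report):
# dmesg_gap /var/log/dmesg "07:00.2: xHCI Host Controller" "07:00.2: can't setup"
```

On the log above this reports 25.800578 seconds for the failing probe alone.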

Dan Watkins (oddbloke) wrote:

Hi Kai-Heng,

As an extra data point, when I attempted to boot the latest mainline kernel for bug 1830910, it didn't fully boot but it did get far enough to display the "kernel: xhci_hcd 0000:07:00.2: can't setup: -110" messages. So the mainline kernel doesn't address this issue either.

Thanks!

Dan

Kai-Heng Feng (kaihengfeng) wrote:

One possibility is that it doesn't need to do the xHCI handoff; please test:
https://people.canonical.com/~khfeng/lp1830905-2/

Dan Watkins (oddbloke) wrote:

OK, this improves things slightly; the first timeout is gone:

[ 0.124444] kernel: pci 0000:07:00.1: Linked as a consumer to 0000:07:00.0
[ 0.124592] kernel: PCI: CLS 64 bytes, default 64
[ 0.124621] kernel: Unpacking initramfs...

But the second (longer) timeout is still there, and now has a trace:

[ 4.601812] kernel: usb 1-7: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[ 28.016007] kernel: watchdog: BUG: soft lockup - CPU#4 stuck for 23s! [swapper/0:1]
[ 28.016007] kernel: Modules linked in:
[ 28.016007] kernel: CPU: 4 PID: 1 Comm: swapper/0 Not tainted 5.0.0-16-generic #17~lp1830905+2
[ 28.016007] kernel: Hardware name: Gigabyte Technology Co., Ltd. B450M DS3H/B450M DS3H-CF, BIOS F4 01/25/2019
[ 28.016007] kernel: RIP: 0010:xhci_reset+0x7f/0x1a0
[ 28.016007] kernel: Code: 00 00 00 4d 8b 6c 24 18 bb 80 96 98 00 eb 17 a8 02 74 43 bf c7 10 00 00 e8 4e 30 25 00 83 eb 01 0f 84 e9 00 00 00 41 8b 45 00 <83> f8 ff 75 e0 b8 ed ff ff ff 5b 41 5c 41 5d 5d c3 48 8b 07 48 c7
[ 28.016007] kernel: RSP: 0018:ffffacb30004bad0 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
[ 28.016007] kernel: RAX: 0000000000000002 RBX: 0000000000050ef2 RCX: 0000000000000002
[ 28.016007] kernel: RDX: 00000000000013d4 RSI: 00000024d0b051d8 RDI: 0000000000000ded
[ 28.016007] kernel: RBP: ffffacb30004bae8 R08: 00000000ffffffff R09: 0000000000000000
[ 28.016007] kernel: R10: 0000000000000002 R11: 000000000000000f R12: ffff89803bf62230
[ 28.016007] kernel: R13: ffffacb302480020 R14: ffffffff8bbddee0 R15: ffff89803bf62230
[ 28.016007] kernel: FS: 0000000000000000(0000) GS:ffff89804eb00000(0000) knlGS:0000000000000000
[ 28.016007] kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 28.016007] kernel: CR2: 00007f3995075e04 CR3: 00000001ec00e000 CR4: 00000000003406e0
[ 28.016007] kernel: Call Trace:
[ 28.016007] kernel: xhci_gen_setup+0x224/0x3e0
[ 28.016007] kernel: ? pci_bus_read_config_byte+0x40/0x60
[ 28.016007] kernel: xhci_pci_setup+0x56/0x120
[ 28.016007] kernel: usb_add_hcd+0x2cb/0x8a0
[ 28.016007] kernel: usb_hcd_pci_probe+0x283/0x480
[ 28.016007] kernel: xhci_pci_probe+0x30/0x230
[ 28.016007] kernel: local_pci_probe+0x47/0xa0
[ 28.016007] kernel: pci_device_probe+0x145/0x1b0
[ 28.016007] kernel: really_probe+0xfe/0x3f0
[ 28.016007] kernel: ? set_debug_rodata+0x17/0x17
[ 28.016007] kernel: driver_probe_device+0x11a/0x130
[ 28.016007] kernel: __driver_attach+0xe3/0x110
[ 28.016007] kernel: ? driver_probe_device+0x130/0x130
[ 28.016007] kernel: ? driver_probe_device+0x130/0x130
[ 28.016007] kernel: bus_for_each_dev+0x74/0xb0
[ 28.016007] kernel: ? kmem_cache_alloc_trace+0x1a6/0x1c0
[ 28.016007] kernel: driver_attach+0x1e/0x20
[ 28.016007] kernel: bus_add_driver+0x167/0x260
[ 28.016007] kernel: ? xhci_debugfs_create_root+0x25/0x25
[ 28.016007] kernel: driver_register+0x60/0x100
[ 28.016007] kernel: ? xhci_debugfs_create_root+0x25/0x25
[ 28.016007] kernel: __pci_register_driver+0x5a/0x60
[ 28.016007] kernel: xhci_pci_init+0x47/0x49
[ 28.016007] kernel: do_one_initcall+0x4a/0x1c9
[ 28.016007] kernel: kernel_init_freeable+0x1cb/0x272
[ 28.01...

Kai-Heng Feng (kaihengfeng) wrote:

We can add the Gigabyte subsystem ID to a blacklist so that xhci_pci and i2c_nvidia_gpu won't bind to those PCI functions. But before doing that, please raise the issue on the Nvidia forum; they may have a better solution.
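For illustration, the condition such a blacklist would key on can be read from sysfs. This is a minimal sketch, not how the kernel implements PCI quirks: `matches_gigabyte_blacklist` is a hypothetical name, and Gigabyte's PCI subsystem vendor ID 0x1458 is an assumption not stated in this report. NVIDIA's vendor ID 0x10de and the USB controller class 0x0c0330 come from the dmesg output for 07:00.2, and the 0x0c80 class prefix from the lspci line for 07:00.3.

```shell
#!/usr/bin/env bash
# Hypothetical check: does this PCI function match the proposed blacklist,
# i.e. an NVIDIA (0x10de) xHCI (class 0x0c0330) or Type-C policy controller
# (class 0x0c80xx) on a card whose subsystem vendor is Gigabyte (assumed 0x1458)?
# Argument: a sysfs device directory, e.g. /sys/bus/pci/devices/0000:07:00.2
matches_gigabyte_blacklist() {
  local dev=$1 vendor subvendor class
  vendor=$(cat "$dev/vendor") || return 1
  subvendor=$(cat "$dev/subsystem_vendor") || return 1
  class=$(cat "$dev/class") || return 1
  [ "$vendor" = 0x10de ] && [ "$subvendor" = 0x1458 ] || return 1
  case $class in
    0x0c0330 | 0x0c80??) return 0 ;;  # USB 3.1 host or Type-C policy controller
    *) return 1 ;;
  esac
}

# Example:
# matches_gigabyte_blacklist /sys/bus/pci/devices/0000:07:00.2 && echo "would be skipped"
```

The VGA (07:00.0) and audio (07:00.1) functions fall through to the default case, since only the USB-related functions were proposed for the blacklist.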

Dan Watkins (oddbloke) wrote:

I will do, thank you! Can you direct me towards the appropriate Nvidia forum to use? And is there any way I could configure my system to perform that blacklisting in the meantime?

Thanks again!

Kai-Heng Feng (kaihengfeng) wrote:

> Can you direct me towards the appropriate Nvidia forum to use?
https://devtalk.nvidia.com/default/board/98/linux/

> And is there any way I could configure my system to perform that blacklisting in the meantime?
Not really. Module xhci_hcd is built into the kernel, so a custom-compiled kernel is needed here.
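Whether a driver is built in, and therefore out of reach of modprobe blacklist entries, can be checked against the kernel's modules.builtin list under /lib/modules/$(uname -r)/. A minimal sketch follows; `module_is_builtin` is a hypothetical helper. Because the kernel treats '-' and '_' in module names as equivalent, both sides are normalized to '_' before comparing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if module NAME appears in a modules.builtin file.
# Entries look like "kernel/drivers/usb/host/xhci-hcd.ko".
module_is_builtin() {
  local name=${1//-/_} list=$2
  # Keep only the file name, drop ".ko", and normalize '-' to '_'.
  sed -e 's#.*/##' -e 's/\.ko$//' -e 's/-/_/g' "$list" | grep -qx -- "$name"
}

# Example:
# module_is_builtin xhci_hcd "/lib/modules/$(uname -r)/modules.builtin" && echo "xhci_hcd is built in"
```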

Dan Watkins (oddbloke) wrote:

Applying the latest VBIOS update (F52) made this problem go away; this isn't a kernel issue.

Changed in linux (Ubuntu):
status: Confirmed → Invalid
Somogyi (zolee99) wrote:

Same problem here, but with a Gainward RTX 2060. Gainward has no VBIOS update. Can I disable this USB port?
