xHCI host not responding to stop endpoint command

Bug #1313279 reported by Olli Salonen
This bug affects 10 people
Affects          Status       Importance  Assigned to  Milestone
Linux            Unknown      Unknown
linux (Ubuntu)   Incomplete   Medium      Unassigned

Bug Description

Description: Ubuntu 14.04 LTS
Release: 14.04
Hardware: Intel NUC D34010WYK (Core i3 4010U)
Memory: 4 gigabytes
USB device: PCTV 292e DVB-T tuner

I have two PCTV 292e DVB-T tuners connected to the system. When I start using either of the tuners, the xHCI host crashes. Since I'm booting from a USB disk, this brings the system down as well. If I connect only one of the tuners, the system works OK. It does not matter which one of the tuners I connect.

[ 68.990396] xhci_hcd 0000:00:14.0: xHCI host not responding to stop endpoint command.
[ 68.990402] xhci_hcd 0000:00:14.0: Assuming host is dying, halting host.

Sometimes this also leads to a kernel oops:

[ 638.183002] IP: [<ffffffff815325dd>] usb_enable_link_state+0x2d/0x2f0
[ 638.183057] PGD 0
[ 638.183077] Oops: 0000 [#1] SMP

I can also observe the same fault with the OpenELEC Linux distribution, which runs kernel 3.14, when running the system from an internal SATA SSD (not from an external USB disk).

lsusb -v: http://paste.ubuntu.com/7343384/
dmesg: http://sprunge.us/HMXH
dmesg (crash type 2): http://sprunge.us/PSiX

ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: lubuntu-desktop 0.55
ProcVersionSignature: Ubuntu 3.13.0-24.46-generic 3.13.9
Uname: Linux 3.13.0-24-generic x86_64
ApportVersion: 2.14.1-0ubuntu3
Architecture: amd64
Date: Sun Apr 27 11:12:22 2014
InstallationDate: Installed on 2014-04-07 (19 days ago)
InstallationMedia: Lubuntu 13.10 "Saucy Salamander" - Release amd64 (20131016.1)
SourcePackage: lubuntu-meta
UpgradeStatus: Upgraded to trusty on 2014-04-26 (0 days ago)

Olli Salonen (olli-salonen) wrote:

The NUC is running the latest available BIOS version (25).

lsmod: http://paste.ubuntu.com/7343416/

Olli Salonen (olli-salonen) wrote:

Curiously, if I install kernel 3.13.3 or 3.13.4 (from http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.13.4-trusty/ ), it works, but with 3.13.5 it is broken again.

Olli Salonen (olli-salonen) wrote:

This seems to be a problem in the kernel. I ran git bisect, and here is the result:

47f467a is the first bad commit
commit 47f467a
Author: Sarah Sharp <email address hidden>
Date: Fri Jan 31 11:45:02 2014 -0800

    Revert "xhci: Set scatter-gather limit to avoid failed block writes."

    commit 1386ff7 upstream.

    This reverts commit f2d9b99.

    We are ripping out commit 35773da "usb:
    xhci: Link TRB must not occur within a USB payload burst" because it's a
    hack that caused regressions in the usb-storage and userspace USB
    drivers that use usbfs and libusb. This commit attempted to fix the
    issues with that patch.

    Signed-off-by: Sarah Sharp <email address hidden>
    Signed-off-by: Greg Kroah-Hartman <email address hidden>
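The bisection above narrowed the regression to one commit between the good 3.13.4 and the bad 3.13.5 kernels. `git bisect` does this bookkeeping itself; the following is only a minimal Python sketch of the underlying first-bad search, assuming an ordered list of candidates where everything after the first bad entry is also bad. The `is_bad` predicate is hypothetical and stands in for "build, boot, attach both tuners and watch dmesg".

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")

def first_bad(candidates: Sequence[T], is_bad: Callable[[T], bool]) -> T:
    """Return the first candidate for which is_bad() is True.

    Assumes candidates[0] is known good, candidates[-1] is known bad,
    and that once a candidate is bad every later one is bad too
    (the same monotonicity assumption git bisect relies on).
    """
    lo, hi = 0, len(candidates) - 1  # lo: last known good, hi: first known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(candidates[mid]):  # hypothetical: build, boot and test this one
            hi = mid
        else:
            lo = mid
    return candidates[hi]
```

Each `is_bad` call is one build-boot-test cycle, so the number of kernels to test grows only logarithmically with the number of commits in the range.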

Launchpad Janitor (janitor) wrote:

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in lubuntu-meta (Ubuntu):
status: New → Confirmed
ProfYaffle (profyaffle) wrote:

Does anyone have any further comments on this one? I've just come crashing into the same problem - a clean install of Xubuntu 14.04 on an MSI H97M-G43 motherboard, and all is well until I connect my two 460e and two 290e tuners. At that point, I can watch the USB devices drop off as the xHCI subsystem dies... it's like that scene from Terminator 2 as the red light slowly fades on my mouse... :-(

Nov 8 17:42:17 Server kernel: [ 95.306697] xhci_hcd 0000:00:14.0: xHCI host not responding to stop endpoint command.
Nov 8 17:42:17 Server kernel: [ 95.306706] xhci_hcd 0000:00:14.0: Assuming host is dying, halting host.
Nov 8 17:42:17 Server kernel: [ 95.306829] em28174 #1: writing to i2c device at 0xaa failed (error=-5)
Nov 8 17:42:17 Server kernel: [ 95.306837] xhci_hcd 0000:00:14.0: HC died; cleaning up

and then syslog fills up with:

Nov 8 17:42:17 Server kernel: [ 95.307142] em28174 #1: writing to i2c device at 0xaa failed (error=-19)
Nov 8 17:42:17 Server kernel: [ 95.307150] i2c i2c-16: tda10071: i2c wr failed=-19 reg=00 len=2
Nov 8 17:42:17 Server kernel: [ 95.307163] em28174 #1: writing to i2c device at 0xaa failed (error=-19)
Nov 8 17:42:17 Server kernel: [ 95.307170] i2c i2c-16: tda10071: i2c rd failed=-19 reg=3a len=2
Nov 8 17:42:17 Server kernel: [ 95.307433] em28174 #0: writing to i2c device at 0xd8 failed (error=-19)
Nov 8 17:42:17 Server kernel: [ 95.307440] i2c i2c-14: cxd2820r: i2c rd failed=-19 reg=10 len=1
Nov 8 17:42:17 Server kernel: [ 95.307480] em28174 #2: writing to i2c device at 0xd8 failed (error=-19)
Nov 8 17:42:17 Server kernel: [ 95.307485] i2c i2c-18: cxd2820r: i2c rd failed=-19 reg=10 len=1
Nov 8 17:42:17 Server kernel: [ 95.326994] hid-generic 0003:051D:0002.0002: usb_submit_urb(ctrl) failed: -19
Nov 8 17:42:17 Server kernel: [ 95.347130] hid-generic 0003:051D:0002.0002: usb_submit_urb(ctrl) failed: -19

... while lan0, sshd and everything non-USB keeps purring on nicely.
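The first failure in this log is error -5, and everything after `HC died; cleaning up` is -19. The kernel reports these as negated errno values; on Linux 5 is EIO and 19 is ENODEV, which fits the sequence: an I/O error while the controller is dying, then "no such device" once it has been torn down. Below is a minimal Python sketch that decodes such values from log lines with the standard `errno` module; the regular expression is an assumption tuned to the `error=-N`, `failed=-N`, `failed: -N` and `err = -N` forms quoted in this report, not an exhaustive parser.

```python
import errno
import os
import re
from typing import List, Tuple

# Matches the error forms quoted in this bug report (assumption, not exhaustive).
_ERR_RE = re.compile(r"\b(?:error|err|failed)\s*[=:]?\s*(-\d+)")

def decode_errors(line: str) -> List[Tuple[int, str, str]]:
    """Return (errno value, symbolic name, description) for each code in line."""
    results = []
    for match in _ERR_RE.finditer(line):
        code = -int(match.group(1))  # the kernel logs -errno; flip the sign
        name = errno.errorcode.get(code, "UNKNOWN")
        results.append((code, name, os.strerror(code)))
    return results
```

For example, `decode_errors("i2c i2c-16: tda10071: i2c wr failed=-19 reg=00 len=2")` yields a single entry whose name is `ENODEV`.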

I've tried various kernels, all seem to have the same issue (although I'm yet to try 3.13.3 or 3.13.4 as above, that's on my to-do):

user@NewServer:~$ ll /boot/vm*
-rw------- 1 root root 5807968 Nov 5 19:52 /boot/vmlinuz-3.12.32-031232-generic
-rw-r--r-- 1 root root 5798112 Nov 1 15:47 /boot/vmlinuz-3.13.0-32-generic
-rw------- 1 root root 5704000 Nov 8 19:40 /boot/vmlinuz-3.13.0-39-generic
-rw------- 1 root root 5810456 Nov 1 15:51 /boot/vmlinuz-3.13.0-39-generic.efi.signed
-rw------- 1 root root 5927568 Apr 14 2014 /boot/vmlinuz-3.14.1-031401-generic
-rw------- 1 root root 6491024 Nov 2 23:47 /boot/vmlinuz-3.18.0-031800rc3-generic

... one of which (the unsigned 3.13.0-39) I compiled myself, modularising EHCI and xHCI and then blacklisting the latter. Sadly, this just shut down all my USB ports, which is hardly convenient.

From what I've read, it's a fundamental issue with Linux and USB 3.0, and the best way to resolve it is to disable xHCI at the BIOS level and fall back to EHCI/USB 2 throughout. Sadly, the MSI board doesn't support disabling xHCI, so that's not an option (although buying a different motherboard on which I can disable it - Gigabyte, for example - is looking increasingly attractive).

Interesti...

ProfYaffle (profyaffle) wrote:

FWIW - I replicated the issue on a Gigabyte H97M-D3H board, exactly the same symptoms and effect.

However, with xHCI disabled in the BIOS (so all USB 3 ports fall back to USB 2/EHCI), the problem no longer appears.

My suspicion therefore falls on the newer chipset: both of my boards are H97, and the OP's NUC is, I think, a Haswell unit, so it will have similar chips and/or microcode in it.

Robin Becker (robin-reportlab) wrote:

I have exactly this issue with two PCTV 290e USB devices on Arch Linux (see https://bbs.archlinux.org/viewtopic.php?id=190000). The kernel is 3.17.3-1. Has anyone pushed this upstream?

ProfYaffle (profyaffle) wrote:

I did find an upstream bug report from earlier this year:

https://bugzilla.kernel.org/show_bug.cgi?id=65021

... although I haven't had a chance to progress it, as getting my system working was more of an imperative.

Might be worth everyone piling on the back of that one, or opening a new report and linking it back?

affects: lubuntu-meta (Ubuntu) → linux (Ubuntu)
affects: xubuntu-meta (Ubuntu) → linux
Changed in linux:
importance: Undecided → Unknown
status: New → Unknown
Joseph Salisbury (jsalisbury) wrote:

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v3.18 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-unable-to-test-upstream'.
Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.18-rc6-vivid/

Changed in linux (Ubuntu):
importance: Undecided → Medium
tags: added: kernel-da-key
ProfYaffle (profyaffle) wrote:

I will as soon as I can, but it obviously makes the system unserviceable so it's not something I can do lightly.

Give me a few days and I'll find a quiet time in which I can potentially destroy my system :-)

(Note that I did test 3.18.0-031800rc3-generic previously, so unless there have been specific xHCI commits in the last couple of rc revisions, I wouldn't hold your breath.)

Naser Ali (naserali01) wrote:

I have an Intel NUC D54250WYK running BIOS V32, Ubuntu 14.04 and Unity with a very similar issue. If I boot with the 290e tuner attached and then turn on an external DAC, the Cambridge Audio DacMagic 100, all the USB ports fail. Logs show the same xHCI host crash.

I found two ways to deal with the problem.

1. Uninstall the MythTV backend - undesirable
2. Remove the 290e USB tuner while booting. Plugging it in about a minute after boot lets the DAC and the 290e coexist. They used to coexist without this complication at the beginning of the year, on the same NUC running Trusty.

Still looks like a kernel bug.

Jonas Adler (jns-adlr) wrote:

I have the exact same bug on a Gigabyte GA-Z170X-UD5 TH motherboard.

ProfYaffle (profyaffle) wrote:

@jonas - OS version, kernel version? The output of uname -a and lsb_release -a would help... I've not come back to this, having run on USB 2 for ~18 months, but it'd be interesting to know if it's still a problem...

Keerthi Raj (keerthiraj) wrote:

I am using kernel 4.4.0-127-lowlatency (Ubuntu 14.04) and I have this bug. When we run a UVC camera, the host controller dies at a random point after a while. The symptoms are exactly as described by other users above.

I would like to know which kernels this bug affects and where a patch has been applied.

The link below describes a fix but I couldn't figure out which kernel it has been applied to. May I know which kernel fixes it?
https://www.spinics.net/lists/linux-usb/msg122678.html

Hardware:
Asus X99-E-10G WS Motherboard
Intel Xeon E5-2687W v4 3.0 GHz

Keerthi Raj (keerthiraj) wrote:

Output of lsb_release -a

No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 14.04.5 LTS
Release: 14.04
Codename: trusty

output of uname -a

Linux <user> 4.4.0-127-lowlatency #153~14.04.1-Ubuntu SMP PREEMPT Sat May 19 15:05:22 UTC 2018 x86_64 x86_64 x8

ProfYaffle (profyaffle) wrote:

I'm way out of date on this, but that patch looks like it was committed to the Linux kernel in March 2015:

https://github.com/torvalds/linux/commit/227a4fd801c8a9fa2c4700ab98ec1aec06e3b44d

... which would suggest that it would have appeared "in the wild" with 3.19.2 or thereabouts?

Even if my timeline isn't completely accurate, if that patch fixes the problem, it should be ancient history by your 4.4.0 kernel.
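Whether a running kernel is at least as new as a given release can be checked by comparing the numeric prefix of `uname -r` (for example `4.4.0-127-lowlatency` above). This is a minimal Python sketch; the threshold is the caller's choice, and "3.19.2" is only the rough estimate from this comment, not a confirmed fix version. Suffixes such as `-rc3` or `-lowlatency` are ignored.

```python
import re
from typing import Tuple

def kernel_version(release: str) -> Tuple[int, ...]:
    """Parse the leading dotted numbers of a `uname -r` string.

    "4.4.0-127-lowlatency" -> (4, 4, 0); "3.18.0-031800rc3-generic" -> (3, 18, 0).
    """
    match = re.match(r"(\d+(?:\.\d+)*)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) for part in match.group(1).split("."))

def at_least(release: str, minimum: str) -> bool:
    """True if `release` is numerically >= `minimum` (tuples compare element-wise)."""
    return kernel_version(release) >= kernel_version(minimum)
```

On the machine itself this would be called as `at_least(platform.release(), "3.19.2")`. A version check only shows that a kernel is newer than the estimate; it does not show that the patch is present or that it fixes this particular symptom.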

Keerthi Raj (keerthiraj) wrote:

How can I figure out whether the symptoms I am seeing are caused by this bug or by something else?
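A rough first check is whether the kernel log contains the specific messages quoted throughout this report. Below is a minimal Python sketch that scans dmesg text for those lines and groups them by the PCI address of the affected controller. It only matches the wording seen in this thread (older kernels print "Assuming host is dying, halting host", newer ones "xHCI host controller not responding, assume dead"); a match is consistent with this report but does not prove the same root cause.

```python
import re
from typing import Dict, List

# Message fragments quoted in this report; other kernels may word them differently.
_SIGNATURES = (
    "xHCI host not responding to stop endpoint command",
    "Assuming host is dying, halting host",
    "xHCI host controller not responding, assume dead",
    "HC died; cleaning up",
)
# e.g. "xhci_hcd 0000:00:14.0: ..." -> "0000:00:14.0"
_ADDR_RE = re.compile(r"xhci_hcd ([0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7])")

def find_xhci_deaths(dmesg_text: str) -> Dict[str, List[str]]:
    """Map each xHCI PCI address to the signature lines it logged."""
    hits: Dict[str, List[str]] = {}
    for line in dmesg_text.splitlines():
        if not any(sig in line for sig in _SIGNATURES):
            continue
        match = _ADDR_RE.search(line)
        address = match.group(1) if match else "unknown"
        hits.setdefault(address, []).append(line.strip())
    return hits
```

The text can come from `subprocess.run(["dmesg"], capture_output=True, text=True).stdout`; on systems with `kernel.dmesg_restrict` set, reading it needs root.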

Carlo Wood (carlo-alinoe) wrote:

I'm running Ubuntu 18.04 and have the same problems with USB 3.0.
The ONLY way to get it to work at all is to turn the IOMMU off in the BIOS
and pass iommu=soft as a kernel parameter (as described as the "solution"
on many forums) AND to have a certain device plugged into USB and/or
something else magical that I haven't pinpointed.
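To confirm that the `iommu=soft` workaround actually reached the running kernel, the boot command line in `/proc/cmdline` can be checked. This is a minimal Python sketch; it only looks for the literal whole-token parameter and says nothing about the BIOS IOMMU setting.

```python
from pathlib import Path

def has_kernel_param(cmdline: str, param: str) -> bool:
    """True if `param` (e.g. "iommu=soft") appears as a whole token."""
    return param in cmdline.split()

def booted_with(param: str, cmdline_path: str = "/proc/cmdline") -> bool:
    """Check the running kernel's command line (Linux only)."""
    return has_kernel_param(Path(cmdline_path).read_text(), param)
```

On Ubuntu the parameter is usually added to `GRUB_CMDLINE_LINUX_DEFAULT` in `/etc/default/grub`, followed by `sudo update-grub` and a reboot; `booted_with("iommu=soft")` then shows whether it took effect.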

Anyway, once USB 3 works, it keeps working until I turn off a device
(the Valve Index headset), or rather, when I quit Steam I get the following
dmesg output:

[ 7211.963003] xhci_hcd 0000:04:00.0: xHCI host not responding to stop endpoint command.
[ 7211.963019] xhci_hcd 0000:04:00.0: xHCI host controller not responding, assume dead
[ 7211.963037] xhci_hcd 0000:04:00.0: HC died; cleaning up
[ 7211.963053] usb 10-1: USB disconnect, device number 9
[ 7211.963222] usb 11-2: USB disconnect, device number 7
[ 7212.211386] usb 10-2: USB disconnect, device number 10
[ 7212.211390] usb 10-2.3: USB disconnect, device number 11
[ 7212.211392] usb 10-2.3.1: USB disconnect, device number 12
[ 7212.212041] usb 10-2.3.2: USB disconnect, device number 14
[ 7212.212529] usb 10-2.3.3: USB disconnect, device number 15
[ 7212.213010] usb 10-2.3.5: USB disconnect, device number 13

which were the connected USB 3 devices.
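The `USB disconnect` lines identify which devices were dropped along with the controller: the token before the colon is the bus-port path (for `10-2.3.1`: bus 10, root port 2, then hub ports 3 and 1), and the device number follows at the end. This is a minimal Python sketch that collects them from a dmesg excerpt like the one above; the regular expression is an assumption matched to this exact wording.

```python
import re
from typing import List, NamedTuple

class Disconnect(NamedTuple):
    bus: int
    ports: List[int]  # root-hub port followed by any downstream hub ports
    devnum: int

# e.g. "usb 10-2.3.1: USB disconnect, device number 12"
_DISC_RE = re.compile(r"usb (\d+)-([\d.]+): USB disconnect, device number (\d+)")

def parse_disconnects(dmesg_text: str) -> List[Disconnect]:
    """Return one Disconnect per 'USB disconnect' line, in log order."""
    events = []
    for match in _DISC_RE.finditer(dmesg_text):
        bus, path, devnum = match.groups()
        ports = [int(p) for p in path.split(".")]
        events.append(Disconnect(int(bus), ports, int(devnum)))
    return events
```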

After that, those USB ports are dead. I have not figured out how,
or even whether it is possible, to revive them without a reboot.

Carlo Wood (carlo-alinoe) wrote:

I bought a PCI Express USB 3.0 card to overcome the above... in vain.
It performed better, in the sense that the xHCI host controller becomes
unresponsive less often and that it works with the IOMMU enabled in
the BIOS, just as all the other USB ports did.

However, I STILL get a reproducible freeze up:

[ 5964.638609] xhci_hcd 0000:08:00.0: xHCI host not responding to stop endpoint command.
[ 5964.639671] xhci_hcd 0000:08:00.0: xHCI host controller not responding, assume dead
[ 5964.639679] xhci_hcd 0000:08:00.0: HC died; cleaning up

I compiled kernel 5.6-rc7 with xhci_pci and xhci_hcd as modules.
This allows me to recover without rebooting: rmmod xhci_pci xhci_hcd
followed by modprobe xhci_pci restores functionality.

Carlo Wood (carlo-alinoe) wrote:

I can now 100% reproduce this:

1) Run an application to test that the connected USB3 device (a 3D camera) works (cheese).
   It works.
2) Power off the 3D camera.
   This produces the well-known dmesg lines.
3) Power on the 3D camera; run cheese again: "No device detected".
4) sudo rmmod xhci_pci
5) sudo modprobe xhci_pci
6) Verify that dmesg printed detection of the 3D camera again, go to 1.

tags: added: bionic kernel-bug-exists-upstream
removed: kernel-da-key
Carlo Wood (carlo-alinoe) wrote:

I bought a PCI Express card (PEXUSB3S44V from Startech, see https://www.startech.com/nl/en/Cards-Adapters/USB-3.0/Cards/PCI-Express-USB-3-Card-4-Dedicated-Channels-4-Port~PEXUSB3S44V) basically because it is the most expensive card you can get (I was hoping to rule out certain things with that).

As a result, I no longer need to disable the IOMMU in the BIOS or pass iommu=soft as a kernel boot parameter.

However, I still get the following dmesg:

POWER DOWN CONNECTED DEVICE (Valve Index):

[77859.996165] retire_capture_urb: 99 callbacks suppressed
[77859.996475] usb 14-1.3: USB disconnect, device number 3
[77859.996479] usb 14-1.3.1: USB disconnect, device number 4
[77859.997165] usb 14-1.3.3: cannot submit urb (err = -19)
[77859.999429] usb 14-1: USB disconnect, device number 2
[77860.002716] usb 14-1.3.2: USB disconnect, device number 5
[77860.005249] usb 14-1.3.3: cannot submit urb 0, error -19: no device
[77860.009536] usb 14-1.3.3: USB disconnect, device number 6
[77860.013694] usb 14-1.3.5: USB disconnect, device number 7
[77860.737515] usb 15-1: USB disconnect, device number 2
[77860.737520] usb 15-1.1: USB disconnect, device number 3

POWER UP CONNECTED DEVICE

[77865.729313] xhci_hcd 0000:08:00.0: xHCI host not responding to stop endpoint command.
[77865.729989] xhci_hcd 0000:08:00.0: xHCI host controller not responding, assume dead
[77865.729995] xhci_hcd 0000:08:00.0: HC died; cleaning up

This reproduces 100% of the time.

Kernel: 5.6.0-rc7-lowlatlocxhci

This is 5.6.0-rc7 with a .config from a lowlatency kernel, run through
make oldconfig, then make localmodconfig to get rid of unused modules, and
make menuconfig to turn xhci_hcd and xhci_pci into kernel modules.
I can assure you, however, that this bug happens with any kernel and
any reasonable .config (I tried 4.15-generic and several other kernels
and configs too).

Recovery is now possible with:

sudo rmmod xhci_pci
sudo modprobe xhci_pci

This allows me to recover without having to reboot, but also to
experiment with changes to the drivers.
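The reload workaround can be wrapped in a small script. This is a minimal Python sketch that issues the same two commands through `subprocess`; it assumes root privileges and that `xhci_pci` is built as a module, as in the custom kernel above (where it is built in, `rmmod` fails and the sketch stops there). The `run` parameter is a hook added here so the command sequence can be checked without touching the system.

```python
import subprocess
from typing import Callable, List, Sequence

Runner = Callable[[Sequence[str]], None]

def _run_checked(cmd: Sequence[str]) -> None:
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure

def reload_xhci(run: Runner = _run_checked, module: str = "xhci_pci") -> List[List[str]]:
    """Unload and reload the xHCI PCI driver module; returns the commands issued.

    Every device behind the controller is disconnected while the module is
    unloaded, so this is unsafe if the root filesystem lives on that bus.
    """
    commands = [["rmmod", module], ["modprobe", module]]
    for cmd in commands:
        run(cmd)  # stops at the first failure, so modprobe never runs after a failed rmmod
    return commands
```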

Sergey Galtsev (sam-j811) wrote:

FWIW, I used the method described here https://bbs.archlinux.org/viewtopic.php?id=236536 and it works for me:

    echo -n "0000:00:14.0" | sudo tee /sys/bus/pci/drivers/xhci_hcd/unbind
    sleep 5
    echo -n "0000:00:14.0" | sudo tee /sys/bus/pci/drivers/xhci_hcd/bind
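The same unbind/bind sequence can be written as a minimal Python sketch. Writing a PCI address to a driver's `unbind` and `bind` files in sysfs is the kernel's generic driver-binding interface; `0000:00:14.0` is the Intel controller from this report and must be replaced with the address from your own `xhci_hcd` log lines. The `sysfs_root` and `sleep` parameters are additions so the sequence can be exercised against a scratch directory; on a real system root privileges are required.

```python
import time
from pathlib import Path
from typing import Callable

def rebind_pci_device(
    address: str = "0000:00:14.0",
    driver: str = "xhci_hcd",
    delay: float = 5.0,
    sysfs_root: str = "/sys",
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Detach `address` from `driver`, wait `delay` seconds, then attach it again."""
    driver_dir = Path(sysfs_root) / "bus" / "pci" / "drivers" / driver
    # Equivalent to: echo -n ADDRESS > .../unbind (no trailing newline written)
    (driver_dir / "unbind").write_text(address)
    sleep(delay)
    (driver_dir / "bind").write_text(address)
```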

Chris Guiver (guiverc) wrote:

Thank you for reporting this bug to Ubuntu.

Ubuntu 14.04 (trusty) reached end-of-life on April 25, 2019.

See this document for currently supported Ubuntu releases:
https://wiki.ubuntu.com/Releases

We appreciate that this bug may be old and you might not be interested in discussing it any more. But if you are then please upgrade to the latest Ubuntu version and re-test. If you then find the bug is still present in the newer Ubuntu version, please add a comment here telling us which new version it is in.

Changed in linux (Ubuntu):
status: Confirmed → Incomplete