USB 3.0 connection is unreliable + xHCI xhci_drop_endpoint called with disabled ep

Bug #1371233 reported by Karl-Philipp Richter
This bug affects 34 people
Affects          Status      Importance   Assigned to   Milestone
linux (Ubuntu)   Expired     High         Unassigned
Trusty           Expired     High         Unassigned
Utopic           Won't Fix   High         Unassigned
Vivid            Expired     High         Unassigned

Bug Description

Based on https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1358871/comments/7 I tested the IcyBox IB-351 series HDD enclosure over its USB 3.0 connection, with the same result: the connection drops after reading about 300 MB, and the error message "xHCI xhci_drop_endpoint called with disabled ep" sometimes appears. Other devices (e.g. a Samsung HD103SI 1 TB HDD connected with a USB 3.0 adapter to the eSATA port of the enclosure) read hundreds of GB before failing, but always fail before 1 TB has been read. Reading was tested with `dd`, `gpart` and `btrfsck`.
Also confirmed on a Lenovo IdeaPad Z500 after a BIOS update to 71CN51WW(V1.21) (the changelog didn't mention any USB issues anyway). Also confirmed with 3.16.3 and 3.16.0-14 on Ubuntu 14.10-beta1 after updates.
The issue seems to cause failure of the ASIX AX179 gigabit ethernet chip as well, but occurs regardless of whether the ethernet adapter is in use.
I have no clue how to proceed and am stuck with unreliable USB and ethernet connections, which basically means no I/O out of the machine!

ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: linux-image-3.13.0-35-generic 3.13.0-35.62
ProcVersionSignature: Ubuntu 3.13.0-35.62-generic 3.13.11.6
Uname: Linux 3.13.0-35-generic x86_64
ApportVersion: 2.14.1-0ubuntu3.4
Architecture: amd64
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/by-path', '/dev/snd/controlC0', '/dev/snd/hwC0D0', '/dev/snd/hwC0D3', '/dev/snd/pcmC0D0c', '/dev/snd/pcmC0D0p', '/dev/snd/pcmC0D3p', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CurrentDesktop: Unity
Date: Thu Sep 18 19:58:24 2014
EcryptfsInUse: Yes
InstallationDate: Installed on 2014-09-10 (8 days ago)
InstallationMedia: Ubuntu 14.04 LTS "Trusty Tahr" - Release amd64 (20140417)
MachineType: LENOVO 20221
ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=de_DE.UTF-8
 SHELL=/bin/bash
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/@/boot/vmlinuz-3.13.0-35-generic.efi.signed root=UUID=5e999111-7efe-4818-b9e8-a950ad6d3296 ro rootflags=subvol=@ quiet splash nomdmonddf nomdmonisw vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-3.13.0-35-generic N/A
 linux-backports-modules-3.13.0-35-generic N/A
 linux-firmware 1.127.5
SourcePackage: linux
StagingDrivers: rts5139
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 07/12/2013
dmi.bios.vendor: LENOVO
dmi.bios.version: 71CN51WW(V1.21)
dmi.board.asset.tag: No Asset Tag
dmi.board.name: INVALID
dmi.board.vendor: LENOVO
dmi.board.version: 31900003WIN8 STD MLT
dmi.chassis.asset.tag: No Asset Tag
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: Lenovo IdeaPad Z500 Touch
dmi.modalias: dmi:bvnLENOVO:bvr71CN51WW(V1.21):bd07/12/2013:svnLENOVO:pn20221:pvrLenovoIdeaPadZ500Touch:rvnLENOVO:rnINVALID:rvr31900003WIN8STDMLT:cvnLENOVO:ct10:cvrLenovoIdeaPadZ500Touch:
dmi.product.name: 20221
dmi.product.version: Lenovo IdeaPad Z500 Touch
dmi.sys.vendor: LENOVO

Revision history for this message
Karl-Philipp Richter (krichter722) wrote :
Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
description: updated
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v3.17 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-unable-to-test-upstream'.
Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.17-rc5-utopic/

Changed in linux (Ubuntu):
importance: Undecided → Medium
tags: added: kernel-da-key
Changed in linux (Ubuntu):
importance: Medium → High
Karl-Philipp Richter (krichter722) wrote :

Tested in 3.17-rc5 and experienced the issue. Due to the high number of connection failures, one partition used for reproduction has vanished, but this shouldn't matter because data rescue in gparted has been a main reproduction scenario. I had to remove the apt packages `multipath-*` in order for all devices on USB ports to be recognized.

penalvch (penalvch)
tags: added: latest-bios-1.21
tags: added: kernel-bug-exists-upstream-3.15-rc7 utopic
Joseph Salisbury (jsalisbury) wrote :

This issue appears to be an upstream bug, since you tested the latest upstream kernel. Would it be possible for you to open an upstream bug report[0]? That will allow the upstream Developers to examine the issue, and may provide a quicker resolution to the bug.

Please follow the instructions on the wiki page[0]. The first step is to email the appropriate mailing list. If no response is received, then a bug may be opened on bugzilla.kernel.org.

Once this bug is reported upstream, please add the tag: 'kernel-bug-reported-upstream'.

[0] https://wiki.ubuntu.com/Bugs/Upstream/kernel

Joseph Salisbury (jsalisbury) wrote :

Also, was there a prior kernel version that did not exhibit this bug? If there is, we can perform a bisect to identify the commit that introduced this.

Karl-Philipp Richter (krichter722) wrote :

After obtaining an Orico A3H7 USB 3.0 hub with a (sufficient) power supply, the bug became reproducible only with the broken HDD (where reproducibility is 100 %: the disk is recognized by gparted and doesn't have a /dev/sdxY file). Failure of the ethernet adapter no longer occurs, which makes me think the original issue was related to insufficient power on the USB connections, and that the error `xHCI xhci_drop_endpoint called with disabled ep` is a consequence of the power failure rather than the error that reports it.

In my view it'd make sense to get the HDD working again (there's probably an issue with the partition table) and then test whether it works in 3.13.0-36 and in other versions (including mainline).

Sorry for the delay of my response.

Karl-Philipp Richter (krichter722) wrote :

At the same time as the upgrade from 13.10 to 14.04 I switched to btrfs, which causes incredible trouble (it can't be said often enough that it's irresponsible to offer it as a root filesystem without(!) a warning). There are so many possible causes for the bug (most of them bugs in themselves) that it's impossible for a non-developer to provide accurate reports.

Karl-Philipp Richter (krichter722) wrote :

Found a working kernel: on the Ubuntu 14.04 live system with 3.13.0-24-generic I can run ddrescue, which has found > 50 errors on the HDD so far, but the error doesn't occur and ddrescue proceeds. Slowly, but at least it does! Here's the thing: running the 3.13.0-24-generic kernel on the installed Ubuntu system doesn't work, so I assume it's not just a kernel issue, or not a kernel issue at all, but related to an upgraded apt package. How to proceed? How is bisecting done?

legolas558 (legolas558) wrote :

Running kernel 3.13.0-35-generic here.
I was affected by the same issue. Using "echo -1 >/sys/module/usbcore/parameters/autosuspend" seemed to help a bit, although the reset entries "xHCI xhci_drop_endpoint called with disabled ep *******" were still there and system was blocking because of these continuous drops, leading to an unreliable (and incredibly slow) writing to the disk.
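
To check what the autosuspend setting currently is before or after writing -1 to it, a minimal Python sketch could read the same sysfs file. The function names and the `path` override are hypothetical and only there for illustration; the sysfs path itself is the one from the command above.

```python
from pathlib import Path

# Path taken from the echo command above; everything else is illustrative.
AUTOSUSPEND_PARAM = "/sys/module/usbcore/parameters/autosuspend"


def read_usb_autosuspend(path=AUTOSUSPEND_PARAM):
    """Return the usbcore autosuspend delay in seconds, or None if unreadable.

    A negative value (such as the -1 written above) means USB autosuspend
    is disabled by default for newly detected devices.
    """
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def autosuspend_disabled(path=AUTOSUSPEND_PARAM):
    """True only when the parameter could be read and is negative."""
    value = read_usb_autosuspend(path)
    return value is not None and value < 0
```

Calling `autosuspend_disabled()` with no argument reads the live system value; the setting does not survive a reboot unless it is also passed as a kernel or module parameter.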

The best way to trigger the issue was for example to run an rsync reading from one partition on the external disk and writing to another partition on the same disk.

Plugging the disk in with a USB 2.0 cable showed no problems at all with the same tests.

After plugging the external disk into a USB port on the back, I can confirm it works perfectly (and still in SuperSpeed mode, "new SuperSpeed USB device number 2 using xhci_hcd").

Upon inspection, I found that the USB 3.0 cable connecting the front USB ports to the motherboard was a bit loose at the USB 3.0 part of its connector. USB 2.0 traffic would always work perfectly fine, but problems would arise when establishing 3.0 links.

So, long story short, please check other USB 3.0 ports and that those you are using are well connected. Although I was suspecting the PSU of the external disk (12V 1.5A) and a bug in the kernel (as I read here and elsewhere), in the end it was just a cable problem.

Hope it helps.

Karl-Philipp Richter (krichter722) wrote :

Thanks @legolas558, I checked all cables and exchanged most of my USB equipment (hubs and cables). I tested in combinations which would reveal a faulty USB part. Inspired by your comment I opened the machine and checked the internal cable connections as well. They're all fine.

My `ddrescue` test case now causes a kernel panic in 3.17-rc6, photo attached[1]. This occurs ~90 % of the time. I attach the log file of `ddrescue` 1.7.0 as well[2]; maybe it can serve to produce a test file for you. It definitely works in the non-persistent live system based on 3.13.0-24-generic.

---
[1] I'm currently stuck with `linux-crashtools`, please help me http://askubuntu.com/questions/529452/how-to-cause-a-test-crash-with-kdump for a text output of the panic stack.
[2] Although the wiki states no compressed attachments, there's no way of figuring out how to attach a second file...

Karl-Philipp Richter (krichter722) wrote :

Panic confirmed on 3.17-rc7.

Karl-Philipp Richter (krichter722) wrote :

In the meantime I tested with 3.2.64, 3.4.104, 3.10.60 and 3.12.32 and had no problem, i.e. GNU `ddrescue` 1.17 processes the device without a kernel panic (1.5 TB with > 7000 errors recognized and copied on a damaged device).

In 3.14.24 I don't get a kernel panic, but `ddrescue` gets stuck reading a damaged block while `dmesg` shows

    [ 2074.174135] sdh: sdh1 sdh9
    [ 2462.587818] usb 4-1.3.1.3: reset SuperSpeed USB device number 9 using xhci_hcd
    [ 2462.603164] xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called with disabled ep ffff88041d586d80
    [ 2462.603168] xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called with disabled ep ffff88041d586dc0
    [ 2502.601314] usb 4-1.3.1.3: reset SuperSpeed USB device number 9 using xhci_hcd
    [ 2502.616594] xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called with disabled ep ffff88041d586d80
    [ 2502.616602] xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called with disabled ep ffff88041d586dc0
    [ 2647.252571] INFO: task usb-storage:604 blocked for more than 120 seconds.
    [ 2647.252580] Tainted: PF W O 3.14.24-031424-generic #201411141736
    [ 2647.252591] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [ 2647.252593] usb-storage D ffffffff81811ae0 0 604 2 0x00000000
    [ 2647.252596] ffff88041cb55af8 0000000000000046 ffff88041cb55af8 ffff88041cb55fd8
    [ 2647.252599] 0000000000014540 0000000000014540 ffffffff81c144a0 ffff88041cb0a7c0
    [ 2647.252601] 0000000100000000 ffff880422b9d808 7fffffffffffffff 7fffffffffffffff
    [ 2647.252603] Call Trace:
    [ 2647.252609] [<ffffffff8177a739>] schedule+0x29/0x70
    [ 2647.252612] [<ffffffff817799e5>] schedule_timeout+0x1e5/0x250
    [ 2647.252616] [<ffffffff8156bd58>] ? usb_hcd_submit_urb+0x88/0x1b0
    [ 2647.252618] [<ffffffff8177b9d7>] wait_for_completion+0xa7/0x160
    [ 2647.252620] [<ffffffff8156cece>] ? usb_alloc_urb+0x1e/0x50
    [ 2647.252624] [<ffffffff810a4da0>] ? try_to_wake_up+0x210/0x210
    [ 2647.252626] [<ffffffff8156f14a>] usb_sg_wait+0x13a/0x1f0
    [ 2647.252646] [<ffffffffa019f531>] usb_stor_bulk_transfer_sglist.part.5+0x51/0xc0 [usb_storage]
    [ 2647.252651] [<ffffffffa019f637>] usb_stor_bulk_transfer_sglist+0x97/0xa0 [usb_storage]
    [ 2647.252655] [<ffffffffa019f66e>] usb_stor_bulk_srb+0x2e/0x50 [usb_storage]
    [ 2647.252659] [<ffffffffa019f7d7>] usb_stor_Bulk_transport+0x147/0x3f0 [usb_storage]
    [ 2647.252662] [<ffffffff817799e5>] ? schedule_timeout+0x1e5/0x250
    [ 2647.252666] [<ffffffffa01a006e>] usb_stor_invoke_transport+0x3e/0x570 [usb_storage]
    [ 2647.252668] [<ffffffff8177b1bd>] ? wait_for_completion_interruptible+0xcd/0x1c0
    [ 2647.252672] [<ffffffffa019ee5e>] usb_stor_transparent_scsi_command+0xe/0x10 [usb_storage]
    [ 2647.252676] [<ffffffffa01a172a>] usb_stor_control_thread+0x1ba/0x310 [usb_storage]
    [ 2647.252681] [<ffffffffa01a1570>] ? fill_inquiry_response+0x20/0x20 [usb_storage]
    [ 2647.252683] [<ffffffff81093079>] kthread+0xc9/0xe0
    [ 2647.252685] [<ffffffff81092fb0>] ? flush_kthread_worker+0xb0/0xb0
    [ 2647.252687] [<ffffffff817875bc>] ret_from_fork+0x7c/0xb0
    [ 2647....
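
When comparing kernels it can help to count how often these messages occur instead of eyeballing `dmesg`. The following is a minimal Python sketch that tallies the `xhci_drop_endpoint` warnings per controller and the SuperSpeed resets per USB port. The regular expressions are assumptions derived only from the log lines quoted in this report, and the function name `summarize_xhci_log` is made up for illustration.

```python
import re
from collections import Counter

# Patterns modelled on the dmesg lines quoted above (an assumption, not a
# complete description of what xhci_hcd can print).
DROP_EP = re.compile(
    r"xhci_hcd (?P<pci>[0-9a-f:.]+): xHCI xhci_drop_endpoint called with disabled ep"
)
RESET = re.compile(r"usb (?P<port>[\d.-]+): reset SuperSpeed USB device number \d+")


def summarize_xhci_log(text):
    """Count drop-endpoint warnings per PCI address and resets per USB port."""
    drops = Counter()
    resets = Counter()
    for line in text.splitlines():
        match = DROP_EP.search(line)
        if match:
            drops[match.group("pci")] += 1
            continue
        match = RESET.search(line)
        if match:
            resets[match.group("port")] += 1
    return {"drops": dict(drops), "resets": dict(resets)}
```

Feeding it the saved output of `dmesg` from two different kernels gives numbers that are easier to compare in a report than raw log excerpts.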

gazhay (gazhay) wrote :

I'm finding xhci unreliable across the board since 14.x, worse in 14.10

Scanners (usb2) that previously worked in usb3 ports now do not work, causing seg faults in applications and all kinds of errors in dmesg. (segfault at 0 ip 00007f1f79b9dd3f sp 00007f1fa283ca10 error 4 in libsane-genesys.so.1.0.24[7f1f79b4f000+6e000])

There are also unexpected length errors, and device not found errors.

Karl-Philipp Richter (krichter722) wrote :

I've found this fixed in 3.12.32 and in 3.17.4 (no issues in 3.17.5 and 3.17.6 either); in all kernels in between, issues occurred. I reported more information on different kernels, and the different error messages that occur when reproducing the issue on them, on the same issue reported by another person (Ubuntu guidelines discourage this sort of helpful cross-posting), but Launchpad's linking and search facilities are so bad that I don't have the energy to compensate for them now. @gazhay try > 3.17.4 or 3.12.x with x >= 32. Maybe 3.18.0 contains a regression again so that it stopped working...
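
For anyone who wants to check a kernel version against these observations, here is a rough Python sketch. It encodes only what this one comment reports (3.12.x with x >= 32 and 3.17.4 through 3.17.6 worked, everything in between failed) and treats all other versions as untested. Treating every 3.17.x from 3.17.4 on as working is an assumption, the function names are hypothetical, and only plain dotted versions such as `3.14.24` are handled (no `-rc` suffixes).

```python
def parse_kernel_version(version):
    """Turn '3.17.4' into (3, 17, 4); plain dotted numeric versions only."""
    return tuple(int(part) for part in version.split("."))


def reporter_verdict(version):
    """Classify a version according to this single reporter's observations."""
    v = parse_kernel_version(version)
    if v[:2] == (3, 12):
        # Only 3.12.32 and later 3.12.x were reported as working.
        return "ok" if v >= (3, 12, 32) else "untested"
    if v[:2] >= (3, 13) and v < (3, 17, 4):
        # "In all kernels in between, issues occurred."
        return "affected"
    if (3, 17, 4) <= v < (3, 18):
        return "ok"
    # Anything else (older series, 3.18 and later) was not covered here.
    return "untested"
```

Other commenters in this report saw different results on the same versions, so the verdict is a starting point for testing, not a guarantee.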

technik007_cz (technik007-cz) wrote :

I'm running kernel 3.13.0-43-lowlatency on a Lenovo IdeaPad Y500 laptop. I have had these "xHCI xhci_drop_endpoint called with disabled ep" problems with different USB 3.0 devices since I bought the laptop at the end of 2013. What increases the likelihood of these failures is a long USB 3.0 cable or a USB 3.0 extension. Even though I keep my system/kernel updated, this problem has never been solved by a new kernel. I have tried different USB 3.0 hubs, cables and enclosures, and what is sad is that plugging in a "noncompatible device" drags down other devices too, such as the webcam that is internally connected to the USB 3.0 hub.

technik007_cz (technik007-cz) wrote :

I'm going to try kernel 3.17.4 or later, Karl...

Changed in linux (Ubuntu):
status: Confirmed → Fix Committed
Scott (e2e8e2) wrote :

The only kernel version that works for me is 3.13.0.24.28 . I just tried upgrading Ubuntu to 3.13.0.44.51 and the problem came back with a vengeance (destroyed my software raid configuration of 2 usb 3 drives).

Alan Pope 🍺🐧🐱 🦄 (popey) wrote :

This is marked "Fix committed" but I'm still seeing it on 3.19.0-9-generic on 15.04.

Running dd to a USB3 key attached to a USB3 hub causes my syslog to be spammed with a bunch of:-

Mar 16 11:54:24 deep-thought kernel: [ 5515.310662] usb 2-1.1: reset SuperSpeed USB device number 7 using xhci_hcd
Mar 16 11:54:24 deep-thought kernel: [ 5515.327096] xhci_hcd 0000:0e:00.0: xHCI xhci_drop_endpoint called with disabled ep ffff880032dc5cc0

As well as some:-

Mar 16 11:55:26 deep-thought kernel: [ 5576.916045] hub 2-1:1.0: hub_port_status failed (err = -71)

Joseph Salisbury (jsalisbury) wrote :

The v4.0 -rc4 kernel is now available. Can folks affected by this bug test the latest kernel? It can be downloaded from:

http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.0-rc4-vivid/

Changed in linux (Ubuntu Vivid):
status: Fix Committed → Confirmed
Changed in linux (Ubuntu Trusty):
status: New → Confirmed
Changed in linux (Ubuntu Utopic):
status: New → Confirmed
importance: Undecided → High
Changed in linux (Ubuntu Trusty):
importance: Undecided → High
tags: added: vivid
Alan Pope 🍺🐧🐱 🦄 (popey) wrote :

Tried 4.0-rc4 on vivid and saw none of the xhci error messages I saw on 3.19. I performed exactly the same operation using the same hardware, namely ddrescuing an ~7.5GB image to an 8GB stick.

alan@deep-thought:/data/usb⟫ dmesg -T | grep xhci_hcd
[Mon Mar 16 19:48:40 2015] xhci_hcd 0000:0e:00.0: xHCI Host Controller
[Mon Mar 16 19:48:40 2015] xhci_hcd 0000:0e:00.0: new USB bus registered, assigned bus number 1
[Mon Mar 16 19:48:40 2015] xhci_hcd 0000:0e:00.0: hcc params 0x014042cb hci version 0x96 quirks 0x00000004
[Mon Mar 16 19:48:40 2015] xhci_hcd 0000:0e:00.0: xHCI Host Controller
[Mon Mar 16 19:48:40 2015] xhci_hcd 0000:0e:00.0: new USB bus registered, assigned bus number 2
[Mon Mar 16 19:49:49 2015] usb 2-1: new SuperSpeed USB device number 2 using xhci_hcd
[Mon Mar 16 19:49:50 2015] usb 1-1: new high-speed USB device number 2 using xhci_hcd
[Mon Mar 16 19:49:59 2015] usb 2-1.3: new SuperSpeed USB device number 3 using xhci_hcd
[Mon Mar 16 19:54:59 2015] usb 2-1.3: new SuperSpeed USB device number 4 using xhci_hcd

Joseph Salisbury (jsalisbury) wrote :

I'd like to perform a bisect to figure out what commit caused this regression. We need to identify the earliest kernel where the issue started happening as well as the latest kernel that did not have this issue. Reading the comments, it sounds like this first started in the 3.13.0-25 Ubuntu kernel, which is based on upstream 3.13.10.

Can folks affected by this bug test the following kernels and post back?

3.13.9: http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.13.9-trusty/
3.13.10: http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.13.10-trusty/

Joseph Salisbury (jsalisbury) wrote :

I re-read comment #21, and you 'Saw none' of the errors with 4.0, which means this bug might be fixed there. It might be better to perform a 'Reverse' bisect to identify the commit that fixes this bug and request it in the prior stable kernels.

Can you test the following kernels and report back? We are looking for the last kernel version that has the bug and the first that does not:

v4.0-rc1: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.0-rc1-vivid/
v4.0-rc2: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.0-rc2-vivid/
v4.0-rc3: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.0-rc3-vivid/
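
The search for "the last kernel version that has the bug and the first that does not" is a binary search over an ordered list of builds. A minimal Python sketch of that idea follows. It assumes the oldest listed build shows the bug, the newest does not, and the fix is never lost again in between. The `is_fixed` callback is hypothetical: in practice it stands for a person booting that kernel, running the reproduction, and reporting the result.

```python
def first_fixed(versions, is_fixed):
    """Return (last_bad, first_good) from kernel versions ordered oldest first.

    versions[0] is assumed to show the bug and versions[-1] is assumed to be
    fixed; neither endpoint is re-tested here.
    """
    if len(versions) < 2:
        raise ValueError("need at least one known-bad and one known-good version")
    lo, hi = 0, len(versions) - 1  # versions[lo] is bad, versions[hi] is good
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_fixed(versions[mid]):
            hi = mid  # fix is at mid or earlier
        else:
            lo = mid  # bug still present, fix is later
    return versions[lo], versions[hi]
```

With 3.19 reported as affected and v4.0-rc4 reported as clean, the three release candidates in between need at most two test boots to narrow down. The same halving is then repeated on individual commits between the two resulting versions, which is what `git bisect` automates.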

tags: added: performing-bisect
Brad Figg (brad-figg) wrote : Test with newer development kernel (3.19.0-9.9)

Thank you for taking the time to file a bug report on this issue.

However, given the number of bugs that the Kernel Team receives during any development cycle it is impossible for us to review them all. Therefore, we occasionally resort to using automated bots to request further testing. This is such a request.

We have noted that there is a newer version of the development kernel than the one you last tested when this issue was found. Please test again with the newer kernel and indicate in the bug if this issue still exists or not.

You can update to the latest development kernel by simply running the following commands in a terminal window:

    sudo apt-get update
    sudo apt-get dist-upgrade

If the bug still exists, change the bug status from Incomplete to Confirmed. If the bug no longer exists, change the bug status from Incomplete to Fix Released.

If you want this bot to quit automatically requesting kernel tests, add a tag named: bot-stop-nagging.

 Thank you for your help, we really do appreciate it.

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
tags: added: kernel-request-3.19.0-9.9
tehownt (tehownt)
Changed in linux (Ubuntu Vivid):
status: Incomplete → Confirmed
Stefan (steffel) wrote :

Experiencing a problem on xhci module with an Opticon Barcode Scanner NLV-1001 (ttyUSB device). After a few barcode-scan-triggers, it doesn't return from opening the device.

Tried:

- 3.13.0-35-generic
- 3.16.0-34-generic
- 3.17.0-031700-generic
- 3.19.1-031901-generic

Without USB3 (when disabled in BIOS) - ehci module is used, then, and no problems show up.

Could anyone tell me whether my problem with following trace is related to this issue?
Thank you for your help!

(kernel.log)
Apr 27 10:32:10 myhost123 kernel: [ 3721.041230] INFO: task myapp:2920 blocked for more than 120 seconds.
Apr 27 10:32:10 myhost123 kernel: [ 3721.041235] Tainted: G OE 3.19.1-031901-generic #201504091335
Apr 27 10:32:10 myhost123 kernel: [ 3721.041236] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 27 10:32:10 myhost123 kernel: [ 3721.041237] myapp D e4995b74 0 2920 2477 0x00000004
Apr 27 10:32:10 myhost123 kernel: [ 3721.041240] e4995be0 00200086 00000000 e4995b74 e85ba000 0899fc06 000002fb 00000001
Apr 27 10:32:10 myhost123 kernel: [ 3721.041244] 00000001 e4995fec c15511bf c1b88f40 e51b0e00 e926cf40 e48cee40 e5730620
Apr 27 10:32:10 myhost123 kernel: [ 3721.041247] 00000000 00000000 e4cfb940 00000001 00000000 e863f070 e5376610 00000c01
Apr 27 10:32:10 myhost123 kernel: [ 3721.041250] Call Trace:
Apr 27 10:32:10 myhost123 kernel: [ 3721.041258] [<c15511bf>] ? xhci_queue_ctrl_tx+0x1ef/0x260
Apr 27 10:32:10 myhost123 kernel: [ 3721.041261] [<c1548d5d>] ? xhci_urb_enqueue+0x16d/0x420
Apr 27 10:32:10 myhost123 kernel: [ 3721.041263] [<c16eac03>] schedule+0x23/0x60
Apr 27 10:32:10 myhost123 kernel: [ 3721.041266] [<c16ed135>] schedule_timeout+0x165/0x1c0
Apr 27 10:32:10 myhost123 kernel: [ 3721.041270] [<c1509180>] ? usb_hcd_submit_urb+0x80/0x180
Apr 27 10:32:10 myhost123 kernel: [ 3721.041272] [<c150a480>] ? usb_submit_urb.part.9+0x1e0/0x520
Apr 27 10:32:10 myhost123 kernel: [ 3721.041275] [<c11808f4>] ? vmap_pmd_range+0x94/0xe0
Apr 27 10:32:10 myhost123 kernel: [ 3721.041277] [<c16ebbb5>] wait_for_completion_timeout+0x85/0x140
Apr 27 10:32:10 myhost123 kernel: [ 3721.041280] [<c1089770>] ? try_to_wake_up+0x210/0x210
Apr 27 10:32:10 myhost123 kernel: [ 3721.041282] [<c150b541>] usb_start_wait_urb+0x71/0x150
Apr 27 10:32:10 myhost123 kernel: [ 3721.041284] [<c1194feb>] ? __kmalloc+0xab/0x230
Apr 27 10:32:10 myhost123 kernel: [ 3721.041286] [<c1509ff9>] ? usb_alloc_urb+0x19/0x40
Apr 27 10:32:10 myhost123 kernel: [ 3721.041288] [<c150b83b>] usb_control_msg+0xbb/0xe0
Apr 27 10:32:10 myhost123 kernel: [ 3721.041295] [<f058b6fa>] send_control_msg.isra.4+0x7a/0xa0 [opticon]
Apr 27 10:32:10 myhost123 kernel: [ 3721.041297] [<f058b830>] opticon_open+0x40/0x84 [opticon]
Apr 27 10:32:10 myhost123 kernel: [ 3721.041300] [<f05d50d1>] serial_port_activate+0x61/0x90 [usbserial]
Apr 27 10:32:10 myhost123 kernel: [ 3721.041303] [<c1416791>] tty_port_open+0x71/0xf0
Apr 27 10:32:10 myhost123 kernel: [ 3721.041307] [<f05d5b7c>] serial_open+0x2c/0x70 [usbserial]
Apr 27 10:32:10 myhost123 kernel: [ 3721.041309] [<c140f22e>] tty_open+0x3e/0x3f0
Apr 27 10:32:10 myhost123 kernel: [ 3721.041312] [<c11ae...

Brad Figg (brad-figg) wrote : Test with newer development kernel (3.19.0-15.15)

Thank you for taking the time to file a bug report on this issue.

However, given the number of bugs that the Kernel Team receives during any development cycle it is impossible for us to review them all. Therefore, we occasionally resort to using automated bots to request further testing. This is such a request.

We have noted that there is a newer version of the development kernel than the one you last tested when this issue was found. Please test again with the newer kernel and indicate in the bug if this issue still exists or not.

You can update to the latest development kernel by simply running the following commands in a terminal window:

    sudo apt-get update
    sudo apt-get dist-upgrade

If the bug still exists, change the bug status from Incomplete to Confirmed. If the bug no longer exists, change the bug status from Incomplete to Fix Released.

If you want this bot to quit automatically requesting kernel tests, add a tag named: bot-stop-nagging.

 Thank you for your help, we really do appreciate it.

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
tags: added: kernel-request-3.19.0-15.15
NightShade (tim-night-shade) wrote :

I was having the same problem with a number of USB3 harddisks, upgrading to 4.0.1 from the mainline PPA has fixed it for me.

I think it might be triggered by my Yubikey security key. I get "WARN Event TRB for slot 8 ep 4 with no TDs queued?" when gpg accesses the key.

Marcos Vives Del Sol (socram8888) wrote :

I am still experiencing this issue as of now:

marcos@S4X8-MANNY:~$ sudo apt-get update && sudo apt-get upgrade -y && uname -a
Ign http://es.archive.ubuntu.com vivid InRelease
Ign http://es.archive.ubuntu.com vivid-updates InRelease
Ign http://es.archive.ubuntu.com vivid-backports InRelease
Obj http://es.archive.ubuntu.com vivid Release.gpg
Des:1 http://es.archive.ubuntu.com vivid-updates Release.gpg [933 B]
Obj http://es.archive.ubuntu.com vivid-backports Release.gpg
Obj http://es.archive.ubuntu.com vivid Release
Des:2 http://es.archive.ubuntu.com vivid-updates Release [63,5 kB]
Ign http://security.ubuntu.com vivid-security InRelease
Obj http://security.ubuntu.com vivid-security Release.gpg
Obj http://es.archive.ubuntu.com vivid-backports Release
Obj http://es.archive.ubuntu.com vivid/main Sources
Obj http://es.archive.ubuntu.com vivid/restricted Sources
Obj http://security.ubuntu.com vivid-security Release
Obj http://es.archive.ubuntu.com vivid/universe Sources
Obj http://es.archive.ubuntu.com vivid/multiverse Sources
Obj http://es.archive.ubuntu.com vivid/main amd64 Packages
Obj http://es.archive.ubuntu.com vivid/restricted amd64 Packages
Obj http://security.ubuntu.com vivid-security/main Sources
Obj http://es.archive.ubuntu.com vivid/universe amd64 Packages
Obj http://es.archive.ubuntu.com vivid/multiverse amd64 Packages
Obj http://es.archive.ubuntu.com vivid/main i386 Packages
Obj http://security.ubuntu.com vivid-security/restricted Sources
Obj http://es.archive.ubuntu.com vivid/restricted i386 Packages
Obj http://security.ubuntu.com vivid-security/universe Sources
Obj http://es.archive.ubuntu.com vivid/universe i386 Packages
Obj http://es.archive.ubuntu.com vivid/multiverse i386 Packages
Obj http://security.ubuntu.com vivid-security/multiverse Sources
Obj http://es.archive.ubuntu.com vivid/main Translation-es
Obj http://es.archive.ubuntu.com vivid/main Translation-en
Obj http://es.archive.ubuntu.com vivid/multiverse Translation-es
Obj http://security.ubuntu.com vivid-security/main amd64 Packages
Obj http://es.archive.ubuntu.com vivid/multiverse Translation-en
Obj http://es.archive.ubuntu.com vivid/restricted Translation-es
Obj http://security.ubuntu.com vivid-security/restricted amd64 Packages
Obj http://es.archive.ubuntu.com vivid/restricted Translation-en
Obj http://es.archive.ubuntu.com vivid/universe Translation-es
Obj http://es.archive.ubuntu.com vivid/universe Translation-en
Obj http://security.ubuntu.com vivid-security/universe amd64 Packages
Des:3 http://es.archive.ubuntu.com vivid-updates/main Sources [9.756 B]
Des:4 http://es.archive.ubuntu.com vivid-updates/restricted Sources [28 B]
Obj http://security.ubuntu.com vivid-security/multiverse amd64 Packages
Des:5 http://es.archive.ubuntu.com vivid-updates/universe ...

Scott (e2e8e2) wrote :

I know people are looking at this, but it's been a long time, and since this bug can and does cause data corruption (it happened to me) I'd hope that this one would be near the top of the heap.

Joseph Salisbury (jsalisbury) wrote :

Can folks affected by this bug test the following kernels and report back? We are looking for the last kernel version that has the bug and the first that does not:

v4.0-rc1: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.0-rc1-vivid/
v4.0-rc2: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.0-rc2-vivid/
v4.0-rc3: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.0-rc3-vivid/

NightShade (tim-night-shade) wrote :

tim@desktop-tim:~$ uname -a
Linux desktop-tim 4.0.1-040001-generic #201504290935 SMP Wed Apr 29 09:36:55 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

No bug, I'll pull the other kernels down and test them over the next few days

Changed in linux (Ubuntu Utopic):
status: Confirmed → Incomplete
Changed in linux (Ubuntu Trusty):
status: Confirmed → Incomplete
Manuel Iglesias Alonso (glesialo) wrote :

I had to plug my USB3 external drive into a (much slower) USB2 socket to avoid data being corrupted by this bug.

Could you please reactivate this report and start looking for a solution?

Thanks.

gazhay (gazhay) wrote :

Recently bought a new machine and put 15.04 on it fresh.

I can confirm that my scanner which I had got working thanks to a usb2 card, no longer works as the motherboard of my new machine forces you to use xhci for all ports regardless.

There still seems to be some sort of bug in vivid.

Mikhail (mikhail-kuzmin) wrote :

I'm having the same problem on 3.13.0-57-lowlatency on Ubuntu 14.04.

Scott (e2e8e2) wrote :

If anyone has the time to test this (the system it's happening to me on is off site and I won't be back to it for a while), see if the problem still occurs if you connect the device through a USB 3 hub rather than directly to the USB port. It just occurred to me that somewhere in the process of trying to circumvent this problem I added a USB 3 hub to the mix, and now I'm wondering if it's the hub that's circumventing the problem or if backing down to 3.13.0.24.28 did the trick.

As it doesn't look like this bug is going to receive attention anytime soon I thought it might be worth someone doing a test. If there aren't any results by the time I get back to my system that was having trouble in a couple of months I'll try it myself.

Yuriy Vidineev (adeptg) wrote :

Dell XPS 13 9333, Ubuntu 14.04, 3.19.0-26-generic, USB 3.0 HDD enclosure (ASMedia AS2105). Connected directly to a USB3 port: not detected at all. Connected to a USB 2.0 hub: detected and worked perfectly. Connected to a USB 3.0 hub (ASIX AX88179): detected ~50% of the time (the other ~50% of the time no line at all appears in syslog after plugging in). Once it is successfully detected on the USB3 hub, it works without any problem (however, my test was short: a 15 min rsync of a lot of files (my home folder)).

Scott (e2e8e2) wrote :

So there was a difference between being plugged into a hub and not. Mine is working with 3.13.0.24.28, but I also have the hard drives connected through a USB 3 hub. I'm not sure if this information will help find the bug, but it would be nice if we could determine that using a hub can circumvent the problem.

I also am quite perplexed as to why this bug hasn't been addressed as it's about a year old now and it can cause data corruption. One would think that bugs which have the potential to cause data corruption would be among the highest priority to be fixed.

Revision history for this message
daniel lopez (dlopez7892) wrote :

I'm using an ASUS Rampage IV Black Edition with Ubuntu and am also experiencing this issue. I've tested several kernel versions and distros, and this is not an issue unique to Ubuntu. It will happen on pretty much any kernel >= 3.15 with certain ASMedia USB 3.0 controllers. I'm not sure if it is isolated to ASMedia ICs, but it does impact at least two ASMedia products. The upstream kernel devs are aware of the issue and don't seem very optimistic about being able to fix it. Apparently these controllers use a poorly designed PCI-to-USB bridge, and the chance of fixing it without ASMedia or the motherboard manufacturers contributing code is fairly slim. I had to disable USB 3.0 entirely to even get Linux >= 3.15 to boot on this board without a kernel panic. While I'd love to see this get fixed, I'm not sure that chasing this down is worth the Ubuntu kernel team's time.

Revision history for this message
Scott (e2e8e2) wrote :

It's not exclusive to that controller. It's happening to me with a NEC uPD720200 USB 3.0 Host Controller, which is a very common USB 3 controller chip.

Revision history for this message
Malachi de AElfweald (malachid) wrote :

I see the same problem with a hub plugged in and nothing attached to it. This happens repeatedly. I've reported it to them as well, but it seems to be more of a problem with the xhci_hcd. This is with 3.19.0-30.

[ 3323.263466] usb 2-1.1: USB disconnect, device number 103
[ 3323.284305] usb 2-1.4: USB disconnect, device number 104
[ 3323.398767] usb 2-1: reset SuperSpeed USB device number 2 using xhci_hcd
[ 3323.416724] xhci_hcd 0000:04:00.0: xHCI xhci_drop_endpoint called with disabled ep ffff8804288644e0
[ 3323.697914] usb 2-1.1: new SuperSpeed USB device number 105 using xhci_hcd
[ 3323.713630] usb 2-1.1: New USB device found, idVendor=05e3, idProduct=0612
[ 3323.713633] usb 2-1.1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 3323.713635] usb 2-1.1: Product: USB3.0 Hub
[ 3323.713636] usb 2-1.1: Manufacturer: SKIVA TECHNOLOGIES INC
[ 3323.713638] usb 2-1.1: SerialNumber: t2
[ 3323.715331] hub 2-1.1:1.0: USB hub found
[ 3323.715596] hub 2-1.1:1.0: 4 ports detected
[ 3323.789786] usb 2-1.4: new SuperSpeed USB device number 106 using xhci_hcd
[ 3323.805724] usb 2-1.4: New USB device found, idVendor=05e3, idProduct=0612
[ 3323.805727] usb 2-1.4: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 3323.805729] usb 2-1.4: Product: USB3.0 Hub
[ 3323.805731] usb 2-1.4: Manufacturer: SKIVA TECHNOLOGIES INC
[ 3323.805732] usb 2-1.4: SerialNumber: t3
[ 3323.807841] hub 2-1.4:1.0: USB hub found
[ 3323.808171] hub 2-1.4:1.0: 4 ports detected

Revision history for this message
Malachi de AElfweald (malachid) wrote :

Bug did not go away with 15.10 / 4.2.0-16

Revision history for this message
Malachi de AElfweald (malachid) wrote :

Regarding comment #37

If I plug an Android phone (HTC One m7 GPe) into the same port instead of the USB3 hub, it stays connected.
If I connect the USB hub instead, I get this problem.
If I connect the phone through the USB hub, I lose access to the phone rather quickly during one of the drop cycles.

Revision history for this message
Malachi de AElfweald (malachid) wrote :

I believe I have fixed mine. Perhaps someone else can test the fix on theirs?
From this http://unix.stackexchange.com/questions/91027/how-to-disable-usb-autosuspend-on-kernel-3-7-10-or-above

Edit the /etc/default/grub file and append to the GRUB_CMDLINE_LINUX_DEFAULT line:
usbcore.autosuspend=-1

sudo update-grub
reboot

I don't appear to be getting the disconnects anymore.
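
For anyone who would rather script the edit than change the file by hand, here is a minimal sketch of what "append to the GRUB_CMDLINE_LINUX_DEFAULT line" means. It works only on the text of /etc/default/grub and does not write anything back. The function name `append_kernel_param` is made up for illustration, and it assumes the value is wrapped in double quotes, as in a stock Ubuntu file.

```python
import re


def append_kernel_param(grub_text, param):
    """Return grub_text with `param` appended to GRUB_CMDLINE_LINUX_DEFAULT.

    Assumption: the value is double-quoted, e.g.
    GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
    The parameter is not added a second time if it is already present.
    """
    def add(match):
        current = match.group(2).split()
        if param not in current:
            current.append(param)
        return match.group(1) + '"' + " ".join(current) + '"'

    # MULTILINE so that ^ anchors at the start of each line of the file.
    return re.sub(
        r'^(GRUB_CMDLINE_LINUX_DEFAULT=)"([^"]*)"',
        add,
        grub_text,
        flags=re.MULTILINE,
    )
```

For example, `append_kernel_param(open("/etc/default/grub").read(), "usbcore.autosuspend=-1")` returns the modified text. Writing it back needs root, and `sudo update-grub` plus a reboot are still required afterwards.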

Revision history for this message
Robert Oswald (robert-oswald) wrote :

Tried your workaround but problem still exists.

Log:
Nov 6 21:35:38 rodomp01 kernel: [ 289.927150] sd 4:0:0:0: [sdb] uas_eh_abort_handler ffff88021311cd80 tag 0, inflight: CMD IN
Nov 6 21:35:41 rodomp01 kernel: [ 292.927539] scsi host4: uas_eh_task_mgmt: ABORT TASK timed out
Nov 6 21:35:41 rodomp01 kernel: [ 292.927588] sd 4:0:0:0: uas_eh_device_reset_handler
Nov 6 21:35:41 rodomp01 kernel: [ 292.927597] scsi host4: uas_eh_task_mgmt: LOGICAL UNIT RESET: error already running a task
Nov 6 21:35:41 rodomp01 kernel: [ 292.927603] scsi host4: uas_eh_bus_reset_handler start
Nov 6 21:35:41 rodomp01 kernel: [ 292.927669] usb 3-1: stat urb: killed, stream 2
Nov 6 21:35:41 rodomp01 kernel: [ 292.927763] sd 4:0:0:0: [sdb] uas_data_cmplt ffff88021311cd80 tag 0, inflight: CMD abort
Nov 6 21:35:41 rodomp01 kernel: [ 292.927770] sd 4:0:0:0: [sdb] data cmplt err -2 stream 2
Nov 6 21:35:41 rodomp01 kernel: [ 292.927804] sd 4:0:0:0: [sdb] uas_zap_dead ffff88021311cd80 tag 0, inflight: CMD abort
Nov 6 21:35:41 rodomp01 kernel: [ 292.927822] sd 4:0:0:0: [sdb] abort completed
Nov 6 21:35:41 rodomp01 kernel: [ 293.039920] usb 3-1: reset SuperSpeed USB device number 2 using xhci_hcd
Nov 6 21:35:41 rodomp01 kernel: [ 293.056271] xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called with disabled ep ffff8800d620f000
Nov 6 21:35:41 rodomp01 kernel: [ 293.056277] xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called with disabled ep ffff8800d620f048
Nov 6 21:35:41 rodomp01 kernel: [ 293.056280] xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called with disabled ep ffff8800d620f090
Nov 6 21:35:41 rodomp01 kernel: [ 293.056283] xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called with disabled ep ffff8800d620f0d8
Nov 6 21:35:41 rodomp01 kernel: [ 293.057750] scsi host4: uas_eh_bus_reset_handler success

cat /proc/cmdline:
BOOT_IMAGE=/vmlinuz-3.16.0-51-generic root=UUID=cee44af9-c463-4cf1-ac47-36add5959072 ro quiet splash usbcore.autosuspend=-1 vt.handoff=7
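
To compare how often the warning fires before and after trying a workaround, it may help to count the messages per controller instead of eyeballing the log. This is only a sketch: the function name `count_drop_endpoint` is made up, and the regular expression is based solely on the message format seen in the logs pasted in this thread.

```python
import re
from collections import Counter

# Matches kernel log lines such as:
#   xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called with disabled ep ffff8800d620f000
# and captures the PCI address of the controller.
DROP_EP = re.compile(
    r"xhci_hcd (\S+): xHCI xhci_drop_endpoint called with disabled ep"
)


def count_drop_endpoint(lines):
    """Count 'disabled ep' warnings per xHCI controller PCI address."""
    counts = Counter()
    for line in lines:
        match = DROP_EP.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

It accepts any iterable of log lines, for example `count_drop_endpoint(open("/var/log/kern.log"))` (that path is the usual Ubuntu location and is an assumption here) or the captured output of `dmesg`.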

Revision history for this message
user (user-3) wrote :

hi there,

I had the same issue, and adding kernel boot parameters via GRUB resolved it for me:

BOOT_IMAGE=/vmlinuz-3.19.0-33-generic root=UUID=...... ro quiet splash nox2apic usbcore.autosuspend=-1

Revision history for this message
Scott (e2e8e2) wrote :

Since my original post a year ago I have had a couple of occurrences of this problem even on the older kernel version that I'm using. I tried setting nox2apic as a boot parameter, and it appears to work. The only time I was getting the error on that kernel version was when I was resyncing a RAID 1 array on two 2 TB USB 3 drives on the same controller. I got the error today when doing that, so I aborted the resync and rebooted with nox2apic. It resynced all 2 TB with no occurrences of the error. Now I'm going to try upgrading the kernel to the current version and see if I can cause the error again with nox2apic set.

Revision history for this message
Scott (e2e8e2) wrote :

I was able to upgrade to kernel version 3.13.0-74-generic (Ubuntu 14.04 LTS) from 3.13.0.24.28 and everything appears to work fine (i.e. no messages and no lost USB connections) with nox2apic set. I copied hundreds of gigabytes back and forth on USB 3 drives without any problems and a binary compare of the results showed the copies were exact. Unfortunately I can't upgrade any further than that as the next version, Ubuntu 14.10, is now obsolete and the next LTS version isn't out yet; so my installation won't upgrade at all right now. I'm stuck unless I want to do an install from scratch.

Revision history for this message
databill (julienjut) wrote :

I have been affected by this USB 3.0 problem for years. At one point a kernel 3.12 test release (linux-headers-3.12.0-031200rc2-generic_3.12.0-031200rc2.201309231935_amd64.deb, etc.) resolved the issue and worked for a year, but since I updated Ubuntu to 15.04 it has been happening again and again.

I see the same symptoms as described here: http://unix.stackexchange.com/questions/91027/how-to-disable-usb-autosuspend-on-kernel-3-7-10-or-above
I'm testing the suggestion from #44, and I hope it will get me past the issue. Thanks, Malachi de AElfweald (malachid).

I will report the test result tomorrow.

Revision history for this message
databill (julienjut) wrote :

Issue is resolved! It has been working for over 24 hours.

#uptime
11:02:54 up 1 day, 1:37, 2 users, load average: 0.28, 0.25, 0.24

Thanks again!

The solution is the one from #44:

Edit the /etc/default/grub file and append to the GRUB_CMDLINE_LINUX_DEFAULT line:
usbcore.autosuspend=-1

sudo update-grub
sudo reboot
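
After the reboot it is worth confirming that the parameter actually reached the kernel, since at least one reporter above still saw the problem with it set. A minimal sketch, with made-up helper names `kernel_param` and `read_text`; the two paths mentioned in the note below are the standard procfs/sysfs locations, but treat the whole thing as an illustration rather than a tested tool.

```python
from typing import Optional


def kernel_param(cmdline: str, name: str) -> Optional[str]:
    """Return the value of `name=value` in a kernel command line, or None.

    If the parameter appears more than once, the last occurrence is returned
    (an assumption made for this sketch).
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if sep and key == name:
            value = val
    return value


def read_text(path: str) -> str:
    """Read a small procfs/sysfs file and strip the trailing newline."""
    with open(path) as handle:
        return handle.read().strip()
```

`kernel_param(read_text("/proc/cmdline"), "usbcore.autosuspend")` should give `"-1"` once the change is active, and `read_text("/sys/module/usbcore/parameters/autosuspend")` shows the value usbcore is currently using.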

Revision history for this message
Scott (e2e8e2) wrote :

I have found another setting that in my case was the real cause of the problem, namely ASPM (Active State Power Management). I had noticed that I was getting the disabled endpoint error message at odd times, like in the middle of the night, when there was little or no activity to the USB 3 disks attached to the controller. It always struck me as odd that the problem would occur randomly like that and not be clustered around the periods of high activity.

It turns out that ASPM was active by default in the BIOS, buried in a power management setting for PCI devices (i.e. it didn't say ASPM in the setting title). Once I turned that setting off all the errors stopped, both when the devices were active and when they were idle. I'm sort of assuming that the issue happens when ASPM tries to turn off the USB controller or put it into a lower power state. I don't know if this is a bug in the BIOS, the firmware of the USB controller, the Linux kernel device driver, or a combination of those. Whichever it is though, turning this off in the BIOS solved everything in every kernel version that I've tested. I also installed Arch Linux so that I could try a recent kernel version and it works fine there too. So if you are having this problem and the other solutions haven't worked for you, look for this setting in your BIOS. It might be in the power management section, but also might be in the peripherals, devices, or advanced PCI settings area.
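
Before digging through BIOS menus, it may be possible to see from inside Linux which ASPM policy the kernel is using. The kernel exposes it in `/sys/module/pcie_aspm/parameters/policy` as a list of names with the active one in square brackets. The sketch below only parses that text; the function name `active_aspm_policy` is made up, and whether the reported policy reflects what the BIOS configured on a given machine is an assumption nobody in this thread has checked.

```python
import re
from typing import Optional


def active_aspm_policy(policy_text: str) -> Optional[str]:
    """Return the bracketed (active) entry from the pcie_aspm policy file.

    Example content: "default performance [powersave] powersupersave"
    Returns None if no entry is marked as active.
    """
    match = re.search(r"\[(\w+)\]", policy_text)
    return match.group(1) if match else None
```

Usage would be along the lines of `active_aspm_policy(open("/sys/module/pcie_aspm/parameters/policy").read())`. The kernel also documents a `pcie_aspm=off` boot parameter, which could be worth a try if the BIOS offers no such setting, but no one here has reported testing it.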

tags: removed: performing-bisect
Revision history for this message
Rolf Leggewie (r0lf) wrote :

utopic has seen the end of its life and is no longer receiving any updates. Marking the utopic task for this ticket as "Won't Fix".

Changed in linux (Ubuntu Utopic):
status: Incomplete → Won't Fix
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu Vivid) because there has been no activity for 60 days.]

Changed in linux (Ubuntu Vivid):
status: Incomplete → Expired
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu Trusty) because there has been no activity for 60 days.]

Changed in linux (Ubuntu Trusty):
status: Incomplete → Expired
Revision history for this message
sibulini (sibulini) wrote :

Hi everyone, from this discussion I can't work out the root cause of the problem. I hit this issue once on kernel 3.14.55 with Android. I hope someone can find the root cause. Thanks a lot!
