USB 3.0 mass storage devices intermittently disconnect/reconnect

Bug #1332722 reported by Bob McChesney
This bug affects 6 people
Affects: Linux Mint
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

I am using Linux Mint 17 Cinnamon 64-bit, with all updates, and also updated to kernel 3.13.0-29-generic #53, on a system with a Gigabyte 990FXA-UD3 motherboard (revision 4.0, and latest BIOS).

The issue I am having is with USB 3.0 mass storage devices intermittently disconnecting/reconnecting while in use. This issue causes corruption of NTFS volumes. The issue does not occur if the drive is connected to USB 2.0 ports.

There's a high chance that this issue is upstream, or it might just be a poor caddy (I only currently have one to test with). However, that may depend on whether certain kernel backports have occurred or not, so I feel it is wise to get support at the distro level before moving on to other avenues. Any help would be appreciated.

Here are the details of my devices:
bob@BOB1 ~ $ lspci | grep USB
00:12.0 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
00:12.2 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB EHCI Controller
00:13.0 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
00:13.2 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB EHCI Controller
00:14.5 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB OHCI2 Controller
00:16.0 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
00:16.2 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB EHCI Controller
02:00.0 USB controller: VIA Technologies, Inc. Device 3483 (rev 01)

I am pretty sure that the VIA controller is the USB 3.0 XHCI controller.
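A minimal sketch of how that guess could be checked from the listing itself: it parses `lspci` lines like the ones above and flags any USB controller whose description does not name OHCI, EHCI, UHCI or xHCI. The function name `classify_usb_controllers` is made up for illustration, and an unlabelled entry is only a candidate; running `lspci -k` and looking at the kernel driver in use would be the more direct confirmation.

```python
import re

# Matches lines such as "02:00.0 USB controller: VIA Technologies, Inc. Device 3483 (rev 01)"
LSPCI_RE = re.compile(
    r"^(?P<addr>[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]) USB controller: (?P<desc>.+)$"
)
# Host controller interface names that lspci sometimes includes in the description.
IFACE_RE = re.compile(r"\b(OHCI|EHCI|UHCI|xHCI)", re.IGNORECASE)

def classify_usb_controllers(lines):
    """Return (address, interface or None, description) for each USB controller line."""
    result = []
    for line in lines:
        match = LSPCI_RE.match(line.strip())
        if match is None:
            continue  # not a USB controller line
        iface = IFACE_RE.search(match.group("desc"))
        result.append((
            match.group("addr"),
            iface.group(1).upper() if iface else None,
            match.group("desc"),
        ))
    return result

sample = [
    "00:12.0 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB OHCI0 Controller",
    "00:12.2 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB EHCI Controller",
    "02:00.0 USB controller: VIA Technologies, Inc. Device 3483 (rev 01)",
]
controllers = classify_usb_controllers(sample)
for addr, iface, desc in controllers:
    label = iface if iface else "unlabelled (candidate USB 3.0 controller)"
    print(addr, label)
```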

Here is the output of dmesg at the time of an issue (if full dmesg output is required, I will supply):
[ 651.450348] ------------[ cut here ]------------
[ 651.450363] WARNING: CPU: 5 PID: 0 at /build/buildd/linux-3.13.0/drivers/usb/host/xhci-ring.c:1589 handle_cmd_completion+0xde4/0xe40()
[ 651.450367] Modules linked in: vmnet(OF) parport_pc vmw_vsock_vmci_transport vsock vmw_vmci vmmon(OF) ctr ccm rfcomm bnep bluetooth dm_crypt binfmt_misc snd_hda_codec_hdmi ip6t_REJECT xt_hl ip6t_rt nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT xt_LOG xt_limit xt_tcpudp xt_addrtype nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack joydev snd_usb_audio snd_usbmidi_lib ip6table_filter ip6_tables nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack iptable_filter ip_tables x_tables mxm_wmi kvm_amd kvm crct10dif_pclmul crc32_pclmul dm_multipath arc4 scsi_dh ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd rt2800pci rt2800mmio rt2800lib rt2x00pci rt2x00mmio rt2x00lib snd_hda_codec_realtek mac80211 serio_raw edac_core fam15h_power edac_mce_amd k10temp snd_hda_intel snd_hda_codec snd_seq_midi cfg80211 snd_seq_midi_event snd_hwdep sp5100_tco snd_rawmidi snd_pcm eeprom_93cx6 crc_ccitt snd_seq i2c_piix4 snd_page_alloc nvidia(POF) snd_seq_device snd_timer snd drm soundcore wmi mac_hid ppdev lp parport dm_mirror dm_region_hash dm_log hid_microsoft hid_generic usbhid hid usb_storage firewire_ohci psmouse r8169 firewire_core ahci mii crc_itu_t libahci [last unloaded: vmnet]
[ 651.450503] CPU: 5 PID: 0 Comm: swapper/5 Tainted: PF O 3.13.0-29-generic #53-Ubuntu
[ 651.450507] Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./990FXA-UD3, BIOS F2 07/15/2013
[ 651.450510] 0000000000000009 ffff88042ed43da8 ffffffff8171a214 0000000000000000
[ 651.450518] ffff88042ed43de0 ffffffff810676bd ffff88041812f510 0000000000000000
[ 651.450525] ffff88041812f330 000000041812f330 ffff880412e40000 ffff88042ed43df0
[ 651.450532] Call Trace:
[ 651.450536] <IRQ> [<ffffffff8171a214>] dump_stack+0x45/0x56
[ 651.450551] [<ffffffff810676bd>] warn_slowpath_common+0x7d/0xa0
[ 651.450557] [<ffffffff8106779a>] warn_slowpath_null+0x1a/0x20
[ 651.450563] [<ffffffff81577d14>] handle_cmd_completion+0xde4/0xe40
[ 651.450570] [<ffffffff81097fb5>] ? check_preempt_curr+0x75/0xa0
[ 651.450576] [<ffffffff8109801c>] ? ttwu_do_wakeup+0x3c/0xc0
[ 651.450583] [<ffffffff815797b3>] xhci_irq+0x333/0xa60
[ 651.450589] [<ffffffff81579ef1>] xhci_msi_irq+0x11/0x20
[ 651.450595] [<ffffffff810bf6be>] handle_irq_event_percpu+0x3e/0x1d0
[ 651.450600] [<ffffffff810bf88d>] handle_irq_event+0x3d/0x60
[ 651.450606] [<ffffffff810c2267>] handle_edge_irq+0x77/0x130
[ 651.450614] [<ffffffff81015cde>] handle_irq+0x1e/0x30
[ 651.450620] [<ffffffff8172cf0d>] do_IRQ+0x4d/0xc0
[ 651.450627] [<ffffffff817226ad>] common_interrupt+0x6d/0x6d
[ 651.450630] <EOI> [<ffffffff815cd0ef>] ? cpuidle_enter_state+0x4f/0xc0
[ 651.450642] [<ffffffff815cd219>] cpuidle_idle_call+0xb9/0x1f0
[ 651.450649] [<ffffffff8101ce9e>] arch_cpu_idle+0xe/0x30
[ 651.450654] [<ffffffff810beb95>] cpu_startup_entry+0xc5/0x290
[ 651.450662] [<ffffffff81040fb8>] start_secondary+0x218/0x2c0
[ 651.450667] ---[ end trace 67a8e5f2b63d5882 ]---
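
When comparing a trace like this one against other reports, the useful parts are the source file, line number and function named on the WARNING line. Here is a small sketch that pulls those out; the helper name `parse_warning` is hypothetical and the pattern is only fitted to the line format shown above, so other kernel versions may format the line differently.

```python
import re

# Pattern fitted to the WARNING header line in the trace above.
WARNING_RE = re.compile(
    r"WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) at "
    r"(?P<path>\S+):(?P<line>\d+) (?P<func>\w+)\+"
)

def parse_warning(dmesg_line):
    """Return the CPU, PID, source path, line and function of a WARNING, or None."""
    match = WARNING_RE.search(dmesg_line)
    if match is None:
        return None
    return {
        "cpu": int(match.group("cpu")),
        "pid": int(match.group("pid")),
        "path": match.group("path"),
        "line": int(match.group("line")),
        "function": match.group("func"),
    }

sample = (
    "[ 651.450363] WARNING: CPU: 5 PID: 0 at "
    "/build/buildd/linux-3.13.0/drivers/usb/host/xhci-ring.c:1589 "
    "handle_cmd_completion+0xde4/0xe40()"
)
info = parse_warning(sample)
print(info["path"], info["line"], info["function"])
```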

It's worth noting that there is a setting in the motherboard for XHCI hand-off; the issue occurs whether this is Enabled or Disabled. It's also worth knowing that I had issues getting USB ports to work initially. In order to get both USB 2.0 and 3.0 ports to work, I had to enable IOMMU in the BIOS and add iommu=soft to the kernel options (as advised by many discussions).

Based on the above, I thought the issue could be related to this: https://lists.debian.org/debian-kernel/2014/03/msg00153.html

However - if my reading of changelogs is correct - then the changes under suspicion have already been reverted in the kernel and backported to Ubuntu 3.13.0-18.38 (namely "xhci 1.0: Limit arbitrarily-aligned scatter gather" and "USBNET: ax88179_178a: enable tso if usb host supports sg dma"): http://changelogs.ubuntu.com/changelogs/pool/main/l/linux/linux_3.13.0-30.54/changelog

I'm kind of new to this. Am I on the right track?

Prophet6 (prophet6) wrote :

Similar bug reported here:
https://bugs.launchpad.net/linuxmint/+bug/1353050

Although in my case, the USB3 drives will not mount. I'm now wondering if this has something to do with the filesystem on those external drives? Apparently NTFS works. What type of drive do you have: USB flash memory or a spinning drive? I've got a Seagate GoFlex 3TB, and it is formatted as ext4.

Bob McChesney (bmcchesney) wrote :

My drive showing the issue is a spinning drive... It's a Western Digital 3TB SATA WD30EZRX.

Filesystem doesn't seem to matter. The drive has a number of partitions (ext4 and NTFS). My issue, the drive intermittently disconnecting and reconnecting, happens regardless of the activity: using dd to write to the partition raw, or copying files to a mounted ext3 or NTFS partition. The problem seems to be more related to the throughput being experienced. I've got nothing definitive, but I get the *feeling* it might happen either when there is high usage (i.e. performing a dd or copy operation that is going at full speed) or perhaps when usage drops off (a copy operation is happening and another process needs to use CPU for a moment).

I'm open to providing more information or tests if requested, but I do need some guidance about the useful information to provide as I'm a beginner at hardware issues in Linux.

Prophet6 (prophet6) wrote :

Thanks Bob. Ditto beginner status with respect to Linux in general, and hardware issues specifically, so there's not anything useful I can suggest.

I noticed this little exchange among kernel developers working on USB issues related to our USB controller chip (VL805):

https://www.mail-archive.com/linux-usb%40vger.kernel.org/msg43326.html

It could be that they borked something and have patched it. Perhaps there might be a kernel patch in the near future that addresses our issues? I'm trying to figure out how to track that patch down and apply it. Let me know if you find anything.

Cheers

Roger Lawhorn (rll-m) wrote :

I have this same issue.
USB 3.0 drives disconnect randomly.
It is driving me nuts.

Roger Lawhorn (rll-m) wrote :

BTW: Linux Mint 17.1/17.2 64bit
MSI Gt70-2pe Dominator Pro laptop.

Bob McChesney (bmcchesney) wrote :

Hi,

I've not seen the problem in a long time, myself. I forgot about this issue but at some point the issue stopped, probably with a kernel update. I just presumed it had been fixed.

In the time since I reported it, I've done quite a number of updates and unfortunately I don't remember when my USB3 issues stopped. I've upgraded the 3.13 kernel to the latest 17.0 offered, then upgraded to 17.1 and moved to latest 3.16 kernels, and now I'm on 17.2 and have been using the latest 3.19 kernels (right now I'm running 3.19.0-22).

It's been working for a while so I can only presume that it was fixed upstream and that the fixes were put into later 3.16 kernels and even possibly backported into later 3.13 kernels. However, that's not to diminish your issues. Are you able to report your current kernel versions and also the output of lsusb and dmesg (hopefully showing the stack trace of any exceptions)? If your devices and symptoms are the same as or similar to mine, I'd be happy to run the same kernel version as you and attempt to reproduce here.

Roger Lawhorn (rll-m) wrote :

dad@dad-pc ~ $ lsusb
Bus 004 Device 002: ID 8087:8000 Intel Corp.
Bus 004 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 003 Device 002: ID 8087:8008 Intel Corp.
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 003: ID 1770:ff00
Bus 001 Device 006: ID 046d:c52b Logitech, Inc. Unifying Receiver
Bus 001 Device 004: ID 0cf3:3004 Atheros Communications, Inc.
Bus 001 Device 011: ID 1f75:0621 Innostor Technology Corporation
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

Happening on USB 3.0 only. USB 2.0 is working.
Honestly, I think this is an Intel issue.

dad@dad-pc ~ $ inxi -Fx
System: Host: dad-pc Kernel: 3.18.6-031806-generic x86_64 (64 bit, gcc: 4.6.3)
           Desktop: Cinnamon 2.6.13 (Gtk 3.10.8) Distro: Linux Mint 17.2 Rafaela
Machine: Mobo: Micro-Star model: MS-1763 version: REV:0.C Bios: American Megatrends version: E1763IMS.51B date: 01/29/2015
CPU: Quad core Intel Core i7-4810MQ CPU (-HT-MCP-) cache: 6144 KB flags: (lm nx sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx) bmips: 22348.9
           Clock Speeds: 1: 2801.00 MHz 2: 2801.00 MHz 3: 2801.00 MHz 4: 2801.00 MHz 5: 2801.00 MHz 6: 2801.00 MHz 7: 2801.00 MHz 8: 2801.00 MHz
Graphics: Card-1: Intel 4th Gen Core Processor Integrated Graphics Controller bus-ID: 00:02.0
           Card-2: NVIDIA GK104M [GeForce GTX 880M] bus-ID: 01:00.0
           X.Org: 1.15.1 driver: nvidia Resolution: 1920x1080@60.0hz, 1920x1080@60.0hz
           GLX Renderer: GeForce GTX880M/PCIe/SSE2 GLX Version: 4.5.0 NVIDIA 346.72 Direct Rendering: Yes
Audio: Card: Intel 8 Series/C220 Series High Definition Audio Controller driver: snd_hda_intel bus-ID: 00:1b.0
           Sound: Advanced Linux Sound Architecture ver: k3.18.6-031806-generic
Network: Card-1: Qualcomm Atheros AR9462 Wireless Network Adapter driver: ath9k bus-ID: 04:00.0
           IF: wlan0 state: up mac: 48:5a:b6:19:7f:2f
           Card-2: Qualcomm Atheros Killer E2200 Gigabit Ethernet Controller driver: alx port: d000 bus-ID: 03:00.0
           IF: eth1 state: down mac: 44:8a:5b:44:0e:b5
           Card-3: Atheros usb-ID: 001-004
           IF: N/A state: N/A speed: N/A duplex: N/A mac: N/A
Drives: HDD Total Size: 4500.9GB (41.7% used) 1: id: /dev/sda model: ST2000LM003_HN size: 2000.4GB temp: 28C
           2: id: /dev/sdb model: ST2000LM003_HN size: 2000.4GB temp: 28C 3: USB id: /dev/sdc model: Ext._HDD size: 500.1GB temp: 0C
Partition: ID: / size: 50G used: 12G (25%) fs: ext4 ID: /boot size: 938M used: 282M (32%) fs: ext2
           ID: /home size: 1.5T used: 1.3T (90%) fs: ext4 ID: swap-1 size: 25.77GB used: 0.00GB (0%) fs: swap
RAID: Device-1: /dev/md1 - active components: online: sda4[2] sdb4[1]
           Info: raid: 1 report: 2/2 blocks: 1690262464 chunk size: N/A
           Device-2: /dev/md0 - active components: online: sda3[0] sdb3[1]
           Info: raid: 1 report: 2/2 blocks: 976832 chunk size: N/A
Sensors: System Temperatures: cpu: 42.0C mobo: N/A gpu: 0.0:42C
           Fan Speeds (in rpm): cpu: N/A
Info: Processes: 283 Uptime: 1 day Memory: 1846.0/11934.9MB Runlevel: 2 G...

Anthony Hoppe (anthony-hoppe) wrote :

I am having this problem as well, however, the only difference is that once the USB 3.0 device drops, no devices are recognized via any USB 3.0 port until a reboot. When this happens, USB 2.0 ports are unaffected.

Mint Cinnamon 17.2, up to date (according to Update Manager).

Motherboard: ASRock 970M Pro3

lightningmcqueen ~ $ lsusb
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 007 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 006 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 005 Device 002: ID 045e:071d Microsoft Corp.
Bus 005 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 004 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 009 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 008 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
lightningmcqueen ~ $

lightningmcqueen ~ $ inxi -Fx
System: Host: lightningmcqueen Kernel: 3.16.0-38-generic x86_64 (64 bit, gcc: 4.8.2)
           Desktop: Cinnamon 2.6.13 Distro: Linux Mint 17.2 Rafaela
Machine: Mobo: ASRock model: 970M Pro3 Bios: American Megatrends version: P1.00 date: 01/21/2015
CPU: Octa core AMD FX-8370E Eight-Core (-MCP-) cache: 16384 KB flags: (lm nx sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm) bmips: 52685.4
           Clock Speeds: 1: 1400.00 MHz 2: 1400.00 MHz 3: 1900.00 MHz 4: 1400.00 MHz 5: 1400.00 MHz 6: 1400.00 MHz 7: 1400.00 MHz 8: 1400.00 MHz
Graphics: Card: NVIDIA Device 0f02 bus-ID: 01:00.0
           X.Org: 1.15.1 drivers: nvidia (unloaded: fbdev,vesa) Resolution: 1920x1080@60.0hz, 1920x1080@60.0hz
           GLX Renderer: GeForce GT 730/PCIe/SSE2 GLX Version: 4.5.0 NVIDIA 352.30 Direct Rendering: Yes
Audio: Card-1: NVIDIA GF108 High Definition Audio Controller driver: snd_hda_intel bus-ID: 01:00.1
           Card-2: Advanced Micro Devices [AMD/ATI] SBx00 Azalia (Intel HDA) driver: snd_hda_intel bus-ID: 00:14.2
           Sound: Advanced Linux Sound Architecture ver: k3.16.0-38-generic
Network: Card: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
           driver: r8169 ver: 2.3LK-NAPI port: d000 bus-ID: 05:00.0
           IF: eth0 state: up speed: 1000 Mbps duplex: full mac: d0:50:99:72:85:d3
Drives: HDD Total Size: 512.1GB (3.9% used) 1: id: /dev/sda model: Samsung_SSD_850 size: 512.1GB temp: 0C
Partition: ID: / size: 454G used: 19G (5%) fs: ext4 ID: swap-1 size: 17.13GB used: 0.00GB (0%) fs: swap
RAID: No RAID devices detected - /proc/mdstat and md_mod kernel raid module present
Sensors: System Temperatures: cpu: 16.5C mobo: N/A gpu: 0.0:44C
           Fan Speeds (in rpm): cpu: N/A
Info: Processes: 237 Uptime: 1:11 Memory: 6138.9/15941.3MB Runlevel: 2 Gcc sys: 4.8.4
           Client: Shell (bash 4.3.11) inxi: 1.9.17
lightningmcqueen ~ $

Bob McChesney (bmcchesney) wrote :

Hello,

Apologies for the late reply. Been really busy and not had much time for computer troubleshooting recently.

My device is:
Bus 009 Device 003: ID 174c:5136 ASMedia Technology Inc.

Here's my system information:
System: Host: BOB1 Kernel: 3.19.0-26-generic x86_64 (64 bit, gcc: 4.8.2)
           Desktop: Cinnamon 2.6.13 Distro: Linux Mint 17.2 Rafaela
Machine: System: Gigabyte product: N/A
           Mobo: Gigabyte model: 990FXA-UD3 version: x.x Bios: American Megatrends version: F2 date: 07/15/2013
CPU: Octa core AMD FX-8350 Eight-Core (-MCP-) cache: 16384 KB flags: (lm nx sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm) bmips: 64286.7
           Clock Speeds: 1: 1400.00 MHz 2: 1400.00 MHz 3: 1400.00 MHz 4: 1400.00 MHz 5: 1400.00 MHz 6: 1400.00 MHz 7: 1400.00 MHz 8: 1400.00 MHz
Graphics: Card: NVIDIA GK106 [GeForce GTX 660] bus-ID: 01:00.0
           X.Org: 1.15.1 drivers: nvidia (unloaded: fbdev,vesa,nouveau) Resolution: 1680x1050@59.9hz, 1680x1050@59.9hz
           GLX Renderer: GeForce GTX 660/PCIe/SSE2 GLX Version: 4.5.0 NVIDIA 346.72 Direct Rendering: Yes
Audio: Card-1: NVIDIA GK106 HDMI Audio Controller driver: snd_hda_intel bus-ID: 01:00.1
           Card-2: Advanced Micro Devices [AMD/ATI] SBx00 Azalia (Intel HDA) driver: snd_hda_intel bus-ID: 00:14.2
           Sound: Advanced Linux Sound Architecture ver: k3.19.0-26-generic
Network: Card-1: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
           driver: r8169 ver: 2.3LK-NAPI port: b000 bus-ID: 05:00.0
           IF: eth0 state: down mac: 74:d4:35:5e:c0:be
           Card-2: Ralink RT2800 802.11n PCI driver: rt2800pci ver: 2.3.0 bus-ID: 04:06.0
           IF: wlan0 state: up mac: 68:7f:74:e2:0f:a4
           Card-3: Microsoft Xbox 360 Wireless Adapter usb-ID: 007-004
           IF: N/A state: N/A mac: N/A
Drives: HDD Total Size: 6501.3GB (20.6% used) 1: id: /dev/sda model: WDC_WD10EADS size: 1000.2GB
           2: id: /dev/sdb model: WDC_WD20EARS size: 2000.4GB 3: USB id: /dev/sdc model: SNA size: 500.1GB
           4: USB id: /dev/sdd model: 2105 size: 3000.6GB
Partition: ID: / size: 25G used: 17G (70%) fs: ext4 ID: /home size: 731G used: 660G (96%) fs: ext4
           ID: swap-1 size: 4.36GB used: 0.00GB (0%) fs: swap
RAID: No RAID devices detected - /proc/mdstat and md_mod kernel raid module present
Sensors: System Temperatures: cpu: 25.5C mobo: 41.0C gpu: 0.0:51C
           Fan Speeds (in rpm): cpu: 997 fan-2: 0 fan-3: 0 fan-4: 0 fan-5: 0
Info: Processes: 250 Uptime: 5 min Memory: 1708.8/16012.9MB Runlevel: 2 Gcc sys: 4.8.4
           Client: Shell (bash 4.3.11) inxi: 1.9.17

Also here's the kernel logs produced when the device is connected:
[ 237.728033] usb 2-1: new SuperSpeed USB device number 2 using xhci_hcd
[ 237.744780] usb 2-1: New USB device found, idVendor=174c, idProduct=5136
[ 237.744783] usb 2-1: New USB device strings: Mfr=2, Product=3, SerialNumber=1
[ 237.744785] usb 2-1: Product: AS2105
[ 237.744786] usb 2-1: Manufacturer: ASMedia
[ 237.744788] usb 2-1: SerialNumber: 00000000000000000000
[ 237.745252] usb-storage 2-1:1.0: USB Mass Storage device detected
[ 237.745413] scsi host9: u...


Bob McChesney (bmcchesney) wrote :

Hi,

The problem I was experiencing last year and that I reported here definitely appears to be resolved. It appears that the following commit introduced the issue with VIA chipsets and this was included in 3.13:

https://github.com/torvalds/linux/commit/20e7acb13ff48fbc884d5918c3697c27de63922a

I think the following commit resolved the problem:

https://github.com/torvalds/linux/commit/6fcfb0d682a8212d321a6131adc94daf0905992a

This appears to have been pulled into 3.16-rc4, so both of you (Anthony and Roger) should have that; I think you're probably seeing a different issue than I had.
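
As a rough illustration of that reasoning, here is a sketch that compares a kernel release string against 3.16. The threshold comes only from the "pulled into 3.16-rc4" observation above, and the names `kernel_tuple` and `has_fix_by_version` are hypothetical. A version check like this cannot see distro backports, so an older kernel reporting False may still carry the fix.

```python
import re

# Assumption: the fix first shipped in mainline 3.16, per the comment above.
FIX_FIRST_MAINLINE = (3, 16, 0)

def kernel_tuple(release):
    """Turn a release string such as '3.16.0-38-generic' into (3, 16, 0)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return tuple(int(part) if part else 0 for part in match.groups())

def has_fix_by_version(release):
    """True if the upstream version number alone implies the fix is present."""
    return kernel_tuple(release) >= FIX_FIRST_MAINLINE

# Kernel releases mentioned in this report.
for release in ("3.13.0-29-generic", "3.16.0-38-generic",
                "3.18.6-031806-generic", "3.19.0-26-generic"):
    print(release, has_fix_by_version(release))
```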

Can you possibly get the dmesg output immediately after the problem occurs, and also the output of lspci to see what USB controller make/model you have?

(Not saying I'm going to be able to help, but I'll try and see if I have any ideas.)

Regards,
Bob

Roger Lawhorn (rll-m) wrote :

I am still being driven nuts by this.
Random usb 3.0 hard drive disconnection.

>lspci
00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor DRAM Controller (rev 06)
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor PCI Express x16 Controller (rev 06)
00:02.0 VGA compatible controller: Intel Corporation 4th Gen Core Processor Integrated Graphics Controller (rev 06)
00:14.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB xHCI (rev 05)
00:16.0 Communication controller: Intel Corporation 8 Series/C220 Series Chipset Family MEI Controller #1 (rev 04)
00:1a.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #2 (rev 05)
00:1b.0 Audio device: Intel Corporation 8 Series/C220 Series Chipset High Definition Audio Controller (rev 05)
00:1c.0 PCI bridge: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #1 (rev d5)
00:1c.2 PCI bridge: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #3 (rev d5)
00:1c.3 PCI bridge: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #4 (rev d5)
00:1c.4 PCI bridge: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #5 (rev d5)
00:1d.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #1 (rev 05)
00:1f.0 ISA bridge: Intel Corporation HM87 Express LPC Controller (rev 05)
00:1f.2 SATA controller: Intel Corporation 8 Series/C220 Series Chipset Family 6-port SATA Controller 1 [AHCI mode] (rev 05)
00:1f.3 SMBus: Intel Corporation 8 Series/C220 Series Chipset Family SMBus Controller (rev 05)
01:00.0 VGA compatible controller: NVIDIA Corporation GK104M [GeForce GTX 880M] (rev a1)
03:00.0 Ethernet controller: Qualcomm Atheros Killer E2200 Gigabit Ethernet Controller (rev 13)
04:00.0 Network controller: Qualcomm Atheros AR9462 Wireless Network Adapter (rev 01)
05:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS5227 PCI Express Card Reader (rev 01)

>dmesg
[ 145.156136] EXT4-fs (dm-4): recovery complete
[ 145.156257] EXT4-fs (dm-4): mounted filesystem with ordered data mode. Opts: (null)
[ 196.482958] usb 2-1: USB disconnect, device number 2
[ 196.483807] scsi 6:0:0:0: rejecting I/O to offline device
[ 196.483812] scsi 6:0:0:0: [sdc] killing request
[ 196.483815] scsi 6:0:0:0: rejecting I/O to offline device
[ 196.483817] scsi 6:0:0:0: [sdc] killing request
[ 196.483819] scsi 6:0:0:0: rejecting I/O to dead device
[ 196.483829] scsi 6:0:0:0: rejecting I/O to dead device
[ 196.483839] scsi 6:0:0:0: [sdc]
[ 196.483840] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 196.483842] scsi 6:0:0:0: [sdc] CDB:
[ 196.483844] Read(10): 28 00 00 06 4a 00 00 00 f0 00
[ 196.483852] blk_update_request: I/O error, dev sdc, sector 412160
[ 196.483863] scsi 6:0:0:0: [sdc]
[ 196.483864] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 196.483866] scsi 6:0:0:0: [sdc] CDB:
[ 196.483867] Read(10): 28 00 00 06 4a f0 00 00 10 00
[ 196.483873] blk_update_request: I/O error, dev sdc, sector 412400
[ 196.540789] Buffer I/O error on dev dm-4, logical block 244190112, async p...

Roger Lawhorn (rll-m) wrote :

I suspect my usb external case. It is made by Innostor.

Roger Lawhorn (rll-m) wrote :

p.s.
My laptop is new.
I am using cryptsetup with my external drives, though not always and it doesn't seem to make a difference.
Changing the xhci settings in the BIOS has no effect with the default firmware.
I paid for a custom unlocked BIOS build and can install the unlocked firmware if you think some sort of custom BIOS setting will fix this.

Roger Lawhorn (rll-m) wrote :

I think I got this. It is a poor quality usb 3.0 cable. I twisted the cable once to make it tight and even laid something on top of the external drive to keep it from moving around. I am now not disconnecting. It is not software, it is hardware. I will try to find a good quality usb 3.0 cable and test again.

Bob McChesney (bmcchesney) wrote :

Good find! Glad you've got your issue resolved.

Roger Lawhorn (rll-m) wrote :

I have tried different cables. No good. Maybe all of my usb cables are poor quality? I tried a different usb hd external enclosure but same results. This time I caught all of the dmesg errors as they occurred. BTW: I disabled all usb 3.0 support in the BIOS. This means Mint is handling everything and the BIOS is not handling anything. Most of the BIOS settings are for when you have an OS that cannot handle usb 3.0 properly and you need the BIOS to handle it instead. I am going to try this all out in Windows 7 as my next test.

Here they are:

Unplug usb cable while drive is mounted:
[ 3003.621425] usb 2-1: USB disconnect, device number 12
[ 3003.653969] buffer_io_error: 10 callbacks suppressed
[ 3003.653972] Buffer I/O error on dev dm-4, logical block 0, lost sync page write
[ 3003.658322] Buffer I/O error on dev dm-4, logical block 244190112, async page read
[ 3003.658352] Buffer I/O error on dev dm-4, logical block 244190132, async page read
[ 3003.658362] Buffer I/O error on dev dm-4, logical block 0, async page read
[ 3003.658369] Buffer I/O error on dev dm-4, logical block 1, async page read
[ 3003.658377] Buffer I/O error on dev dm-4, logical block 244190133, async page read
[ 3003.658491] Buffer I/O error on dev dm-4, logical block 244190133, async page read
[ 3003.658499] Buffer I/O error on dev dm-4, logical block 244190133, async page read
[ 3003.658505] Buffer I/O error on dev dm-4, logical block 244190133, async page read
[ 3003.658509] Buffer I/O error on dev dm-4, logical block 244190133, async page read

Plug back in:
[ 3046.080228] usb 2-1: new SuperSpeed USB device number 13 using xhci_hcd
[ 3046.099130] usb 2-1: New USB device found, idVendor=1f75, idProduct=0621
[ 3046.099133] usb 2-1: New USB device strings: Mfr=4, Product=5, SerialNumber=6
[ 3046.099135] usb 2-1: Product: Ext. HDD
[ 3046.099136] usb 2-1: Manufacturer: Innostor
[ 3046.099137] usb 2-1: SerialNumber: 20140313
[ 3046.317277] usb-storage 2-1:1.0: USB Mass Storage device detected
[ 3046.317525] scsi host11: usb-storage 2-1:1.0
[ 3048.536611] scsi 11:0:0:0: Direct-Access Innostor Ext. HDD PQ: 0 ANSI: 6
[ 3048.537303] sd 11:0:0:0: [sdc] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
[ 3048.537652] sd 11:0:0:0: [sdc] Write Protect is off
[ 3048.537655] sd 11:0:0:0: [sdc] Mode Sense: 3b 00 00 00
[ 3048.538004] sd 11:0:0:0: [sdc] No Caching mode page found
[ 3048.538008] sd 11:0:0:0: [sdc] Assuming drive cache: write through
[ 3048.540006] sd 11:0:0:0: Attached scsi generic sg3 type 0
[ 3048.601280] sd 11:0:0:0: [sdc] Attached SCSI disk
[ 3050.408024] EXT4-fs (dm-4): recovery complete
[ 3050.408027] EXT4-fs (dm-4): mounted filesystem with ordered data mode. Opts: (null)

Copy files....Disconnect occurs during copy:
[ 3262.494967] usb 2-1: USB disconnect, device number 13
[ 3262.495822] scsi 11:0:0:0: rejecting I/O to offline device
[ 3262.495827] scsi 11:0:0:0: [sdc] killing request
[ 3262.495830] scsi 11:0:0:0: rejecting I/O to offline device
[ 3262.495832] scsi 11:0:0:0: [sdc] killing request
[ 3262.495835] scsi 11:0:0:0: rejecting I/O to dead device
[ 3262.495846] scsi 11:0:0:0: rejecting I/O to dead device
[ 3262.495858] scsi 11:0:0:0: [...
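
The timestamps in logs like these make it possible to measure how long the drive stayed attached before each drop. Below is a minimal sketch that pairs each "new ... USB device number N" line with the matching "USB disconnect, device number N" line; the function name `connection_lifetimes` is made up for illustration, and the patterns are fitted only to the message formats shown in this report.

```python
import re

ATTACH_RE = re.compile(
    r"\[\s*(\d+\.\d+)\] usb (\S+): new \S+ USB device number (\d+)"
)
DETACH_RE = re.compile(
    r"\[\s*(\d+\.\d+)\] usb (\S+): USB disconnect, device number (\d+)"
)

def connection_lifetimes(lines):
    """Return (port, device number, seconds attached) for each attach/disconnect pair."""
    attached = {}   # (port, device number) -> attach timestamp
    lifetimes = []
    for line in lines:
        match = ATTACH_RE.search(line)
        if match:
            attached[(match.group(2), int(match.group(3)))] = float(match.group(1))
            continue
        match = DETACH_RE.search(line)
        if match:
            key = (match.group(2), int(match.group(3)))
            start = attached.pop(key, None)
            if start is not None:  # skip disconnects whose attach was not logged
                lifetimes.append((key[0], key[1], float(match.group(1)) - start))
    return lifetimes

sample = [
    "[ 3003.621425] usb 2-1: USB disconnect, device number 12",
    "[ 3046.080228] usb 2-1: new SuperSpeed USB device number 13 using xhci_hcd",
    "[ 3262.494967] usb 2-1: USB disconnect, device number 13",
]
lifetimes = connection_lifetimes(sample)
for port, number, seconds in lifetimes:
    print("usb %s device %d stayed attached for %.1f s" % (port, number, seconds))
```

On the three lines quoted here, device 13 was attached for roughly 216 seconds before the drop. Collecting these durations over several runs might show whether drops cluster around a particular interval or workload.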


Roger Lawhorn (rll-m) wrote :

ok, I now have traced the issue down.

All of my external drives are encrypted with LUKS (cryptsetup).

I hooked up an NTFS formatted drive and the issue stopped completely.

ONLY drives encrypted with LUKS are randomly disconnecting.

Who do I contact about this issue?

I don't want to leave external drives unencrypted.

Talk about a huge security flaw.

Roger Lawhorn (rll-m) wrote :

p.s. I tested the NTFS formatted drive using the exact same cables and enclosures.

It is NOT hardware, it is related to using LUKS.

Bob McChesney (bmcchesney) wrote :

I'm afraid I'm not really in a position to help here. Can't think why LUKS would cause random disconnecting but other use wouldn't. I would have thought the USB drivers would present an encapsulated view of the disk to LUKS so there shouldn't really be disconnects; I could be very wrong though. To my mind it is more plausible that LUKS causes more disk seeking or burstier I/O than other scenarios, and so more regularly reveals a bug in the USB drivers (or, more likely, a bug in the device you're using that needs a kernel patch).

Have you tried this in Ubuntu, out of interest? If not, I think it might be worth testing, and if it is reproducible there then you might get a better response or ideas from the Ubuntu community. No offence to Linux Mint - I love it - but I don't think they're as able to deal with core Ubuntu or Linux kernel issues. It's been a while since I originally created this issue, but in that time I think I've learned that this is not the best place to direct this type of problem.

Please let me know how you get on.

Roger Lawhorn (rll-m) wrote :

Well, I am at a standstill.
I am using EXT4 on the encrypted drive so I plan to format a drive with EXT4 and no encryption to see if the issue goes away.
It would be very logical that encryption adds a great deal of delay to any drive and the delay is causing a timeout.
However, I'm not sure how that results in a total disconnect of the USB device. I thought that would be hardware.

Roger Lawhorn (rll-m) wrote :

Update:

Some absolutes seem to be that:
a) This is a Linux Mint issue (other distros work fine)
b) This is a usb 3 issue (plugging drive into a usb 2 port works fine)
c) NTFS volumes work fine (other filesystems disconnect).

I am wondering if this has anything to do with external usb 3 drives being 'win' devices without the knowledge of the owner.
In other words, NTFS is programmed into the usb->sata controller in your external drive case.

Also, I tested a drive with EXT4 and no LUKS encryption, same issue occurs.
Only NTFS never disconnects.

Bob McChesney (bmcchesney) wrote :

Hi,

You've obviously got a frustrating issue so please don't let the following suggestion annoy you; just want to make sure the test you did is fair...

Not sure how you tested other distros but I don't think testing several would be particularly useful; the variances in the kernel bases and in what patches each distro has backported create too many variables. My suspicion would lie with the kernel, so replicating the problem with the same upstream kernel would be my priority.

(For example, if it was me, currently running Linux Mint 17.2 with kernel 3.19.0-26-generic, I would probably want to test with Ubuntu 14.04 LTS with the vivid kernel added (i.e. the LTSEnablementStack) at version 3.19.0-26; I think that's the test that removes the most variables.)

However, looking back on your comments, you're using kernel 3.18.6-031806-generic. Where did this come from? It doesn't look distro standard. I'd want to understand where you got this from. If it's your own build, would you be willing to retest Linux Mint but with the latest 3.19 kernel from the Software Manager? (Linux Mint should provide the latest kernels pulled from the same LTS Enablement Stack I believe.)
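
A small sketch of why that release string stands out. As far as I know, builds from the Ubuntu mainline kernel archive put a zero-padded copy of the version in the field where distro kernels carry a short ABI number (3.18.6 becomes 031806). That naming convention is an assumption here, not something confirmed in this thread, and `looks_like_mainline_build` is a made-up helper name.

```python
import re

RELEASE_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)-(\d+)-(\S+)$")

def looks_like_mainline_build(release):
    """Heuristic: True if the ABI field is the version written as zero-padded digits."""
    match = RELEASE_RE.match(release)
    if match is None:
        return False
    major, minor, patch = (int(match.group(i)) for i in (1, 2, 3))
    # e.g. 3.18.6 -> "031806"
    expected = "%02d%02d%02d" % (major, minor, patch)
    return match.group(4) == expected

# Kernel releases quoted in this report.
for release in ("3.18.6-031806-generic", "3.19.0-26-generic", "3.16.0-38-generic"):
    print(release, looks_like_mainline_build(release))
```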

Sorry if I'm way off mark, but this is what I would do.

Regards,
Bob

Roger Lawhorn (rll-m) wrote :

Resolved:

Bad usb 3 external enclosure or enclosure is a 'win' device.

I have now stopped the issue by finding an external usb 3 enclosure that does work.

So either:
a) My enclosure is flaky as hell.
b) My enclosure is a 'win' device.

It is an Innostor. I only paid $8 on ebay for it.
That might say it all.

I have seen my fair share of cheap Chinese devices over the years that do not meet standards and employ various tricks that make these cheap devices work in Windows and no other OS.
For those who posted and were using a Western Digital external drive, it might be a 'win' device issue also.

Thanks for all of the help.
I learned a lot about things like IOMMU and other things.

Hardware issues are damn hard to diagnose.
It looks like software so much of the time.

BTW:
In order to even use the other usb 3 enclosure all of my drives have to be reformatted.
Anything formatted by the Innostor cannot be seen by the other enclosure.
However, after reformatting the drives now work.

Bob McChesney (bmcchesney) wrote :

Glad to hear you got this fixed. Any objections to me marking this bug as Invalid? It appears all commenters were experiencing different issues from the reporter (me), and those still commenting have resolved theirs.
