Unable to Connect Third HDD via USB Hub

Bug #1663991 reported by Simon Davis on 2017-02-12
This bug affects 1 person
Affects: Linux. Status: Unknown. Importance: Unknown.
Affects: linux (Ubuntu). Importance: Medium. Assigned to: Joseph Salisbury.
Affects: Yakkety. Importance: Medium. Assigned to: Joseph Salisbury.

Bug Description

Hello, I am running:

(K)Ubuntu 16.10
4.8.0-37-generic #39-Ubuntu SMP Thu Jan 26 02:27:07 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

A bug has been found in 4.8 that causes a kernel call trace when I connect a third USB HDD to a USB hub. After the trace, the kswapd0 process uses 100% of a CPU.

The bug has been fixed in 4.10. Is it possible to roll out new kernel packages, please?
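
The version comparison implied here (running 4.8, fixed upstream in 4.10) can be sketched as below. This is only a rough illustration: the function names are made up for this example, and the check looks at the upstream major.minor numbers only, so it cannot tell whether a distribution kernel has had the fix backported.

```python
import re


def upstream_version(release):
    """Parse (major, minor) from a release string such as '4.8.0-37-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return int(match.group(1)), int(match.group(2))


def predates_upstream_fix(release, fixed_in=(4, 10)):
    """True if the upstream version is older than the release carrying the fix.

    Assumption: only the upstream major.minor is compared. A distribution
    kernel with a backported fix would still be reported as older.
    """
    return upstream_version(release) < fixed_in


if __name__ == "__main__":
    import platform

    running = platform.release()
    print(running, "predates 4.10:", predates_upstream_fix(running))
```

For the kernel quoted above, `predates_upstream_fix("4.8.0-37-generic")` evaluates to `True`.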

References:
https://ubuntuforums.org/showthread.php?t=2345213
https://bugzilla.kernel.org/show_bug.cgi?id=177551
http://marc.info/?l=linux-mm-commits&m=148650422714993&w=2

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1663991

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Simon Davis (davis-decent) wrote :

Sorry, I was unable to get apport-collect working.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Vej (vej) wrote :

Hello,

I added a Bug Watch for the upstream bug you mentioned.

This should trigger some "fix available" mechanisms in a few minutes.
Please let me know, if these bugs are not identical.

Best Regards

Vej

Simon Davis (davis-decent) wrote :

Thank you very much.

It is exactly the same bug I am facing. I see the same results on different hardware.

Joseph Salisbury (jsalisbury) wrote :

The commit that fixes this bug was cc'd to upstream stable. The fix should make its way into Ubuntu through the normal stable update process.

Do you have a workaround for this issue?

Changed in linux (Ubuntu):
importance: Undecided → Medium
tags: added: kernel-da-key
Changed in linux (Ubuntu):
status: Confirmed → Triaged
Simon Davis (davis-decent) wrote :

Amazing, thank you very much.

I do not know about any workaround. Plugging the third HDD into another USB port brings the same result. On the other hand, it is possible to use USB flash drives. Probably because they are not UAS devices.

I had been running a setup with three USB HDDs until the day the bug hit me. The setup was 2x USB 3 HDD plus 1x HDD in a USB 2 enclosure. Everything was working.

I bought a USB hub and a new USB 3 enclosure. With this setup, I tried plugging all three HDDs into the hub, and also two of them into the hub and one into another USB port. It failed every single time.

Either the presence of the hub or all three HDDs being USB 3 could have triggered the bug. In that case, using USB 2 or removing the hub would be an extremely ugly workaround.

If there is no reasonable workaround, and I have not found any yet, I will wait for the kernel update.

Simon Davis (davis-decent) wrote :

Since this bug is really ruining my life, I have been checking for a kernel update every day. I updated to 4.8.0-38-generic today but, sadly, the bug is still present. May I ask when the bug will be fixed and a patched kernel released as an update? Thank you in advance.

Simon Davis (davis-decent) wrote :

Can someone please explain the Ubuntu update process to me? My systems have received two kernel updates since, and the bug is still present.

Today, it made the system disk unresponsive on my server, damaging its filesystem and dpkg database in the process. I think that such a bug is critical and should be fixed and pushed to updates immediately.

Why is Ubuntu always late with fixing serious bugs which e.g. Debian has fixed a long time ago? I really love Ubuntu but it looks like it is time to move over.

  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
   74 root 20 0 0 0 0 R 100.0 0.0 0:18.28 kswapd0

Mar 8 10:22:01 server-1 kernel: [ 116.176399] Workqueue: usb_hub_wq hub_event
Mar 8 10:22:01 server-1 kernel: [ 116.176402] 0000000000000286 00000000cbca6753 ffff98c7d3967150 ffffffffaae30e92
Mar 8 10:22:01 server-1 kernel: [ 116.176406] 0000000000000000 0000000000000000 ffff98c7d39671e0 ffffffffaabab0c1
Mar 8 10:22:01 server-1 kernel: [ 116.176410] 0260400100000287 ffff98c7d3967178 0000000000000040 0000000000000012
Mar 8 10:22:01 server-1 kernel: [ 116.176415] Call Trace:
Mar 8 10:22:01 server-1 kernel: [ 116.176420] [<ffffffffaae30e92>] dump_stack+0x63/0x81
Mar 8 10:22:01 server-1 kernel: [ 116.176424] [<ffffffffaabab0c1>] warn_alloc_failed+0x101/0x160
Mar 8 10:22:01 server-1 kernel: [ 116.176428] [<ffffffffaae46ed8>] ? find_next_bit+0x18/0x20
Mar 8 10:22:01 server-1 kernel: [ 116.176431] [<ffffffffaabab39f>] __alloc_pages_slowpath+0x1ff/0x9c0
Mar 8 10:22:01 server-1 kernel: [ 116.176434] [<ffffffffaababe15>] __alloc_pages_nodemask+0x2b5/0x300
Mar 8 10:22:01 server-1 kernel: [ 116.176438] [<ffffffffaabff945>] alloc_pages_current+0x95/0x140
Mar 8 10:22:01 server-1 kernel: [ 116.176442] [<ffffffffaac093c9>] new_slab+0x419/0x6e0
Mar 8 10:22:01 server-1 kernel: [ 116.176445] [<ffffffffaac0af40>] ___slab_alloc+0x3a0/0x4b0
Mar 8 10:22:01 server-1 kernel: [ 116.176449] [<ffffffffab0b24ab>] ? xhci_segment_alloc.isra.25+0xfb/0x140
Mar 8 10:22:01 server-1 kernel: [ 116.176453] [<ffffffffaaa652e5>] ? x86_swiotlb_alloc_coherent+0x25/0x50
Mar 8 10:22:01 server-1 kernel: [ 116.176456] [<ffffffffaac0b070>] __slab_alloc+0x20/0x40
Mar 8 10:22:01 server-1 kernel: [ 116.176459] [<ffffffffaac0cb22>] __kmalloc+0x182/0x1e0
Mar 8 10:22:01 server-1 kernel: [ 116.176462] [<ffffffffab0b24ab>] ? xhci_segment_alloc.isra.25+0xfb/0x140
Mar 8 10:22:01 server-1 kernel: [ 116.176465] [<ffffffffab0b24ab>] xhci_segment_alloc.isra.25+0xfb/0x140
Mar 8 10:22:01 server-1 kernel: [ 116.176469] [<ffffffffab0b2533>] xhci_alloc_segments_for_ring+0x43/0x100
Mar 8 10:22:01 server-1 kernel: [ 116.176473] [<ffffffffab0b26ae>] xhci_ring_alloc.constprop.32+0xbe/0x140
Mar 8 10:22:01 server-1 kernel: [ 116.176476] [<ffffffffab0b3d6f>] xhci_alloc_stream_info+0x1df/0x3e0
Mar 8 10:22:01 server-1 kernel: [ 116.176480] [<ffffffffab0b3af0>] ? xhci_alloc_command+0x100/0x140
Mar 8 10:22:01 server-1 kernel: [ 116.176484] [<ffffffffab0af3b7>] xhci_alloc_streams+0x447/0x840
Mar 8 10:22:01 server-1 kernel: [ 116.176487] [<ffffffffab0aef70>] ? xhci_check_bandwidth+0x370/0x370
Mar 8 10:22...
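
For anyone comparing this trace with the upstream report, the frames can be pulled out of the syslog lines with a short script. The following is a minimal sketch, not an official tool: the regular expression and the function name `parse_frames` are assumptions made for this example, and it targets the 4.8-era `[<address>] symbol+0xOFF/0xLEN` format shown above. Entries prefixed with `?` are addresses found on the stack that the unwinder could not confirm as part of the call chain, so they are flagged as unreliable.

```python
import re

# Matches entries such as "[<ffffffffaae46ed8>] ? find_next_bit+0x18/0x20".
# The "[ 116.176420]" timestamp is skipped because it lacks the "<" after "[".
FRAME = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def parse_frames(lines):
    """Return (symbol, reliable) pairs for every stack frame found in lines."""
    frames = []
    for line in lines:
        match = FRAME.search(line)
        if match:
            # group(1) is the optional "? " marker; its absence means reliable.
            frames.append((match.group(2), match.group(1) is None))
    return frames


if __name__ == "__main__":
    import sys

    # Usage: python3 frames.py < /var/log/syslog
    for symbol, reliable in parse_frames(sys.stdin):
        print(symbol if reliable else symbol + " (unreliable)")
```

Applied to the lines above, the reliable frames read from `xhci_alloc_streams` through `xhci_segment_alloc` and `__kmalloc` down to `warn_alloc_failed`, i.e. a failed memory allocation while the xHCI driver sets up streams.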

Vej (vej) wrote :

> Can someone please explain me the Ubuntu update process?
Please read this first: https://wiki.ubuntu.com/StableReleaseUpdates.

For kernel updates you can find further information here: https://wiki.ubuntu.com/KernelTeam/KernelUpdates

Simon Davis (davis-decent) wrote :

Vej, Thank you very much for posting the links.

"What sort of updates are allowed for post-release kernels?"

"It fixes a critical issue (data-loss, OOPs, crashes) or is security related. Security related issues might be covered by security releases which are special in handling and publication."

...

"2. When"

"Stable release updates will, in general, only be issued in order to fix high-impact bugs. Examples of such bugs include:"

"Bugs which may, under realistic circumstances, directly cause a loss of user data"

This is exactly the case but Ubuntu probably does not care until a massive data loss happens to thousands of people. I encountered serious data loss yesterday. Fortunately, I managed to recover the important part of data from backups.

Even my main disk, SATA connected SSD (!), was left stuck as a result of this bug. The USB connected drives hung first, the rest of the system hung after I tried to unplug them. They were not mounted at the moment, the SATA SSD was, of course.

USB is a very important part of computers and using USB connected storage devices is critically important functionality as well. I really do not understand how such a serious bug can be left unfixed.

Vej (vej) wrote :

To my knowledge (which is limited, when it comes to the kernel package with all its extra rules) this falls under:

3. The patch is included in a corresponding upstream stable or extended stable release. For the lifetime of both LTS and non-LTS releases we will be pulling upstream stable updates from the corresponding series. There will be one tracking bug report for each stable update but additional references to existing bugs will be added to the contained patches (on best can do base).

As Joseph Salisbury wrote, it has already been cc'd to the upstream stable mailing list. So you might want to find out how far along it is there.

Simon Davis (davis-decent) wrote :

Thank you again. I have to say that I am not experienced in the standard Ubuntu or kernel update processes thus I was unsure how to interpret Joseph's message.

I also do not want to waste your time, but may I ask where I can check, or what I should search for? Thank you in advance.

Vej (vej) on 2017-03-09
tags: added: yakkety
Joseph Salisbury (jsalisbury) wrote :

To submit your patch, send your patch with the detailed description/changelog and your Signoff (ending with Signed-off-by: your name <email>), to the emails listed from ./scripts/get_maintainer.pl drivers/SUBSYSTEM-DETAILS (the get_maintainer.pl is from the kernel sources). Once you have sent the patch upstream and it's accepted, please drop a note here so that we can cherry-pick/include the patch into Ubuntu kernel.

Simon Davis (davis-decent) wrote :

A link from my first message:

http://marc.info/?l=linux-mm-commits&m=148650422714993&w=2

Is this what you are talking about?

Sorry, I am not a kernel developer in any way. I do not make patches, nor do I know how to submit them. I did not know that reporting a bug has such requirements.

Please forget it and, if you want, close this bug. I will install another distribution which has this bug already fixed. At least that is something I know how to do.

Thank you all for your time, I am very sorry for the disturbance.

Vej (vej) wrote :

@Simon I am sorry if Simon's comment has upset you. It was triggered by my asking on the IRC channel #ubuntu-kernel for help finding the best resource to answer comment #12.

That caused some misunderstandings I guess.

@Joseph: That patch is already accepted into 4.10.1 (see https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=a810007afe239d59c1115fcaa06eb5b480f876e9). Do we need to submit it again to get it integrated into 4.4.X, or will you cherry-pick/include from 4.10.1?

Vej (vej) wrote :

@Simon I meant David's comment of course.

Simon Davis (davis-decent) wrote :

Vej, I was not upset, I was just feeling sad and looking for a light at the end of the tunnel.
I always loved Ubuntu, converted to this distro a long time ago with all my servers and desktops, but I cannot put my data at risk.

I really appreciate your help and personal involvement. Thank you, and thanks to all the people involved in fixing this bug and making Ubuntu better.

Really, thank you very much!

Joseph Salisbury (jsalisbury) wrote :

Hi Vej/Simon,

No need to resubmit the patch. It was cc'd to stable when it was submitted to mainline. However, the patch has not landed in the upstream 4.4 kernel as of yet. I can cherry pick it into the Ubuntu Xenial kernel in the meantime. Let me do that and post a test kernel to ensure it fixes the bug. I'll post the test kernel here shortly.

Changed in linux (Ubuntu Yakkety):
status: New → In Progress
Changed in linux (Ubuntu):
status: Triaged → In Progress
Changed in linux (Ubuntu Yakkety):
importance: Undecided → Medium
Changed in linux (Ubuntu):
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Yakkety):
assignee: nobody → Joseph Salisbury (jsalisbury)
Joseph Salisbury (jsalisbury) wrote :

I built a Yakkety test kernel with the patch. It can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1663991/

Can you test this kernel and see if it resolves this bug?

Also, it does not look like this patch is needed in the 4.4 based kernel, just 4.8 and newer.

Simon Davis (davis-decent) wrote :

Joseph, I have installed the patched kernel. As of now, I have a USB hub connected with four drives attached to it. That is twice as many as I could connect before. The syslog is clean and kswapd does not hang at 100% CPU. I am stressing the drives a bit and everything is working fine so far. The problem looks completely fixed.

Thank you very much!

Two more questions please, ...

The kernel you provided replaced (removed) the only kernel I had installed. When a new kernel is released, will the update process take care of this automatically, or should I run some pre/post update procedures to keep my system working and clean?

Can I expect the patch to be included in the next kernel update? I do not want to make my system unstable again.

Thank you once more, Joseph and Vej ...

Simon Davis (davis-decent) wrote :

Several hours and a terabyte later, still going strong. I would consider it definitely fixed.

Joseph Salisbury (jsalisbury) wrote :

Thanks for testing and the update Simon. I will submit an SRU request so the patch is included in the next kernel release. Simon, you would not have to do anything special, the new kernel will just replace your existing kernel.

Simon Davis (davis-decent) wrote :

In fact, it already offers to "update" me to the previous kernel because the system sees this new one as a downgrade. No problem at all, I will not let the system update the kernel until the new release is out.

In the end, I tested six USB HDD drives plus two flash disks connected all at once and everything was perfect.

Thank you once more. It helped me a lot and I believe it will save other users from having a hard time, too. I cannot say how happy you have made me.

Simon Davis (davis-decent) wrote :

Hello again, the gods must definitely hate me.

Your test kernel got replaced immediately and automatically. I do not understand it. I have never used any automatic updater and I always do all updates by hand. Anyway, the original kernel was back after the next reboot. OK, no problem, I told myself, I will wait for the official update. I also did not write to you about it because I was extremely happy that the problem was fixed and did not want to disturb anyone with something that was going to be fixed by the standard update schedule.

Well, the 4.8.0-42 kernel showed up today along with some other updates. Great! I updated my system with the most recent packages as I always did for years and years. But it was different this time. The system did not boot with the new kernel. I have never experienced this. I must say that my system is pretty clean and perfectly stable. I usually keep from doing anything potentially hazardous or dirty.

The good thing is that I still can boot the older 4.8.0-41 kernel. So I am definitely cursed and sentenced to stay with the bug forever. :) No, really, I was not expecting this.

Why am I writing this most probably off-topic information here? First, I wanted to share my complete bad luck with someone. Second, I want to ask you if there is any possibility that it was caused by switching kernels back and forth for the test. I do not think so, but who knows.

I searched the net, checked ubuntuforums.org and I could not find anyone else with these symptoms. Maybe it is too early. I was checking the logs and failed to identify the problem. I will try harder later because I have to work now.

Vej (vej) wrote :

@Joseph Do you have an open SRU request for this somewhere?

@Simon I have to admit that I do not know whether your problem might be triggered by the earlier test kernel.

However, it sometimes happens that fixes for kernel bugs create regressions. I think you are better off filing another bug report for that. You might link to this report, but the new report should contain all available information (so it should not be necessary to read this one to understand the other bug report).

Simon Davis (davis-decent) wrote :

Thanks, I will do when I have more info. I will try one more boot and analyze the syslog closely.

Joseph Salisbury (jsalisbury) wrote :

@Vej Yes the SRU request can be viewed in this mail thread on the kernel team mailing list:

https://lists.ubuntu.com/archives/kernel-team/2017-March/082946.html

Changed in linux (Ubuntu Yakkety):
status: In Progress → Fix Committed
Changed in linux (Ubuntu):
status: In Progress → Fix Committed

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-yakkety' to 'verification-done-yakkety'. If the problem still exists, change the tag 'verification-needed-yakkety' to 'verification-failed-yakkety'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-yakkety
Simon Davis (davis-decent) wrote :

Hello, I tested the patch using a testing kernel from Joseph Salisbury and it completely fixed the bug. Later, I was unable to boot to 4.8.0-42 which came with updates. I do not know why. I have never before experienced such a problem.

What can I do to test it again properly?

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.8.0-49.52

---------------
linux (4.8.0-49.52) yakkety; urgency=low

  * linux: 4.8.0-49.52 -proposed tracker (LP: #1684427)

  * [Hyper-V] hv: util: move waiting for release to hv_utils_transport itself
    (LP: #1682561)
    - Drivers: hv: util: move waiting for release to hv_utils_transport itself

linux (4.8.0-48.51) yakkety; urgency=low

  * linux: 4.8.0-48.51 -proposed tracker (LP: #1682034)

  * [Hyper-V] hv: vmbus: Raise retry/wait limits in vmbus_post_msg()
    (LP: #1681893)
    - Drivers: hv: vmbus: Raise retry/wait limits in vmbus_post_msg()

linux (4.8.0-47.50) yakkety; urgency=low

  * linux: 4.8.0-47.50 -proposed tracker (LP: #1679678)

  * CVE-2017-6353
    - sctp: deny peeloff operation on asocs with threads sleeping on it

  * CVE-2017-5986
    - sctp: avoid BUG_ON on sctp_wait_for_sndbuf

  * vfat: missing iso8859-1 charset (LP: #1677230)
    - [Config] NLS_ISO8859_1=y

  * [Hyper-V] pci-hyperv: Use device serial number as PCI domain (LP: #1667527)
    - net/mlx4_core: Use cq quota in SRIOV when creating completion EQs

  * Regression: KVM modules should be on main kernel package (LP: #1678099)
    - [Config] powerpc: Add kvm-hv and kvm-pr to the generic inclusion list

  * linux-lts-xenial 4.4.0-63.84~14.04.2 ADT test failure with linux-lts-xenial
    4.4.0-63.84~14.04.2 (LP: #1664912)
    - SAUCE: apparmor: fix link auditing failure due to, uninitialized var

  * regession tests failing after stackprofile test is run (LP: #1661030)
    - SAUCE: fix regression with domain change in complain mode

  * Permission denied and inconsistent behavior in complain mode with 'ip netns
    list' command (LP: #1648903)
    - SAUCE: fix regression with domain change in complain mode

  * unexpected errno=13 and disconnected path when trying to open /proc/1/ns/mnt
    from a unshared mount namespace (LP: #1656121)
    - SAUCE: apparmor: null profiles should inherit parent control flags

  * apparmor refcount leak of profile namespace when removing profiles
    (LP: #1660849)
    - SAUCE: apparmor: fix ns ref count link when removing profiles from policy

  * tor in lxd: apparmor="DENIED" operation="change_onexec"
    namespace="root//CONTAINERNAME_<var-lib-lxd>" profile="unconfined"
    name="system_tor" (LP: #1648143)
    - SAUCE: apparmor: Fix no_new_privs blocking change_onexec when using stacked
      namespaces

  * apparmor oops in bind_mnt when dev_path lookup fails (LP: #1660840)
    - SAUCE: apparmor: fix oops in bind_mnt when dev_path lookup fails

  * apparmor auditing denied access of special apparmor .null file
    (LP: #1660836)
    - SAUCE: apparmor: Don't audit denied access of special apparmor .null file

  * apparmor label leak when new label is unused (LP: #1660834)
    - SAUCE: apparmor: fix label leak when new label is unused

  * apparmor reference count bug in label_merge_insert() (LP: #1660833)
    - SAUCE: apparmor: fix reference count bug in label_merge_insert()

  * apparmor's raw_data file in securityfs is sometimes truncated (LP: #1638996)
    - SAUCE: apparmor: fix replacement race in reading rawdata

  * unix domain socket cross permission check failing with n...
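
To check whether a changelog like the one above references a particular Launchpad bug, the `LP: #NNNNNNN` markers can be collected with a few lines of Python. This is a minimal sketch with made-up helper names (`referenced_bugs`, `mentions_bug`). Note that the changelog shown here is truncated, so a bug missing from this excerpt may still appear in the full text.

```python
import re

# Launchpad bug references in Ubuntu changelogs look like "(LP: #1684427)".
LP_REF = re.compile(r"LP:\s*#(\d+)")


def referenced_bugs(changelog):
    """Return the set of Launchpad bug numbers referenced in changelog text."""
    return {int(number) for number in LP_REF.findall(changelog)}


def mentions_bug(changelog, bug_id):
    """True if the changelog text references the given Launchpad bug."""
    return bug_id in referenced_bugs(changelog)


if __name__ == "__main__":
    import sys

    # Usage: python3 lpbugs.py 1663991 < changelog.txt
    wanted = int(sys.argv[1])
    text = sys.stdin.read()
    print("referenced" if mentions_bug(text, wanted) else "not found in this text")
```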

Changed in linux (Ubuntu Yakkety):
status: Fix Committed → Fix Released
Po-Hsu Lin (cypressyew) on 2019-10-03
Changed in linux (Ubuntu):
status: Fix Committed → Fix Released