Linux 4.15 and onwards fails to initialize some hard drives

Bug #1783906 reported by danieru on 2018-07-26
This bug affects 1 person
Affects: linux (Ubuntu)

Bug Description

I have two hard drives: the main drive is a TOSHIBA DT01ACA200 and the second, backup drive is a Western Digital WD5003AZEX. I installed Lubuntu 18.04.1 on the Toshiba HDD and it boots just fine. The issue is with the second drive: during installation the WD HDD did not even show up as an install target, and after boot it still does not come up. This is the dmesg with the stock kernel (4.15):

ata6 is the WD HDD that refuses to work. The messages:
[ 302.107650] ata6: SError: { CommWake 10B8B Dispar DevExch }
[ 302.107658] ata6: hard resetting link
[ 307.860291] ata6: link is slow to respond, please be patient (ready=0)
[ 312.120898] ata6: COMRESET failed (errno=-16)
[ 363.445120] INFO: task kworker/u8:5:201 blocked for more than 120 seconds.
[ 363.445131] Not tainted 4.15.0-29-generic #31-Ubuntu
[ 363.445135] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 363.445140] kworker/u8:5 D 0 201 2 0x80000000
[ 363.445155] Workqueue: events_unbound async_run_entry_fn
[ 363.445157] Call Trace:
[ 363.445171] __schedule+0x291/0x8a0
[ 363.445177] schedule+0x2c/0x80
[ 363.445182] ata_port_wait_eh+0x7c/0xf0
[ 363.445186] ? wait_woken+0x80/0x80
[ 363.445189] ata_port_probe+0x28/0x40
[ 363.445192] async_port_probe+0x2e/0x52
[ 363.445196] async_run_entry_fn+0x3c/0x150
[ 363.445199] process_one_work+0x1de/0x410
[ 363.445203] worker_thread+0x32/0x410
[ 363.445207] kthread+0x121/0x140
[ 363.445210] ? process_one_work+0x410/0x410
[ 363.445214] ? kthread_create_worker_on_cpu+0x70/0x70
[ 363.445218] ret_from_fork+0x22/0x40

These messages repeat constantly. Also, when I try to turn off the computer, it seems to freeze: the lights of the keyboard and mouse turn off, but the computer just stays on.

I tried Tiny Core 9.0, which has Linux 4.14.10, and I didn't have this issue. I also installed Linux 4.14 on this Lubuntu 18.04 using the Ukuu Kernel Update Utility, and with this kernel version, or any previous version, the WD HDD works again. Here's a dmesg of Lubuntu 18.04 with Linux 4.14 and the WD HDD finally coming up at the end:

I also tried Linux 4.17, but the WD HDD refuses to work on this version as well. Here's another dmesg with this version:

ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: linux-image-4.15.0-29-generic 4.15.0-29.31
ProcVersionSignature: Ubuntu 4.15.0-29.31-generic 4.15.18
Uname: Linux 4.15.0-29-generic x86_64
AlsaVersion: Advanced Linux Sound Architecture Driver Version k4.15.0-29-generic.
ApportVersion: 2.20.9-0ubuntu7.2
Architecture: amd64
 /dev/snd/controlC0: testtest 756 F.... pulseaudio
 /dev/snd/controlC1: testtest 756 F.... pulseaudio
 Card hw:0 'NVidia_1'/'HDA NVidia at 0xfe020000 irq 22'
   Mixer name : 'Realtek ALC1200'
   Components : 'HDA:10ec0888,10ec0000,00100101 HDA:10de0002,10de0101,00100000'
   Controls : 56
   Simple ctrls : 21
 Card hw:1 'NVidia'/'HDA NVidia at 0xfcffc000 irq 16'
   Mixer name : 'Nvidia GPU 42 HDMI/DP'
   Components : 'HDA:10de0042,38422651,00100100'
   Controls : 21
   Simple ctrls : 3
CurrentDesktop: LXDE
Date: Thu Jul 26 17:10:58 2018
HibernationDevice: RESUME=UUID=17e70869-516d-4b63-b900-e92e3c4b73b6
InstallationDate: Installed on 2018-07-26 (0 days ago)
InstallationMedia: Lubuntu 18.04.1 LTS "Bionic Beaver" - Release amd64 (20180725)
MachineType: 113 1
ProcFB: 0 nouveaufb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-29-generic root=UUID=71cf0a32-7827-49be-a2c0-cd50a72c26a1 ro quiet splash vt.handoff=1
 linux-restricted-modules-4.15.0-29-generic N/A
 linux-backports-modules-4.15.0-29-generic N/A
 linux-firmware 1.173.1
 0: phy0: Wireless LAN
  Soft blocked: no
  Hard blocked: no
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 09/30/2008
dmi.bios.vendor: Phoenix Technologies, LTD
dmi.bios.version: 6.00 PG 113-M2-E113
dmi.board.vendor: EVGA
dmi.board.version: 1
dmi.chassis.asset.tag: Unknow
dmi.chassis.type: 3
dmi.chassis.vendor: EVGA
dmi.chassis.version: 113-M2-E113
dmi.modalias: dmi:bvnPhoenixTechnologies,LTD:bvr6.00PG:bd09/30/2008:svn113:pn1:pvr1:rvnEVGA:rn113-M2-E113:rvr1:cvnEVGA:ct3:cvr113-M2-E113:
dmi.product.name: 1
dmi.product.version: 1
dmi.sys.vendor: 113

danieru (danigaritarojas) wrote :

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
danieru (danigaritarojas) wrote :

I forgot to mention the Western Digital hard drive has always been slow to actually show up, and I’ve always gotten those "COMRESET failed, link is slow to respond" messages since I bought it.

And two details I forgot to mention about the hard drives:
1. The Toshiba hard drive that works is formatted with GPT.
2. The Western Digital hard drive that doesn't work with Linux 4.15+ is formatted with MBR.
And just for the record, USB flash drives formatted with MBR work just fine on Linux 4.15.

Joseph Salisbury (jsalisbury) wrote :

Did this issue start happening after an update/upgrade? Was there a prior kernel version where you were not having this particular problem?

Would it be possible for you to test the latest upstream kernel? Refer to . Please test the latest v4.18 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.


tags: added: kernel-da-key
Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
danieru (danigaritarojas) wrote :

I first experienced this issue while testing the second beta of Ubuntu 18.04. As I explained in the original bug report, this issue doesn't happen if I use Linux 4.14.

I've installed Linux 4.18rc6 as explained in the wiki, but the WD HDD still won't come up.
Here's the dmesg with this kernel:
Aside from the WD HDD still not working with this kernel, I noticed that, unlike with Linux 4.15 and 4.17, with 4.18rc6 my computer doesn't freeze while trying to turn it off.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
tags: added: kernel-bug-exists-upstream
danieru (danigaritarojas) wrote :

I went ahead and tested this bug from Linux 4.15rc1 up to Linux 4.15rc5.
What I found is that Linux 4.15rc3 is the last RC version where this bug doesn't occur, and Linux 4.15rc4 is the first RC version where it does.

Here's the dmesg with linux 4.15rc3 and the WD HDD working:
And here's the dmesg with linux 4.15rc4 and the WD HDD failing:

Joseph Salisbury (jsalisbury) wrote :

Thanks for the testing. I'll review the commits between -rc3 and -rc4. If nothing sticks out, I'll start a kernel bisect and build a test kernel.

Changed in linux (Ubuntu Bionic):
status: New → In Progress
Changed in linux (Ubuntu Cosmic):
status: Confirmed → In Progress
Changed in linux (Ubuntu Bionic):
importance: Undecided → Medium
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Cosmic):
assignee: nobody → Joseph Salisbury (jsalisbury)
Joseph Salisbury (jsalisbury) wrote :

Commit 2dc0b46b5ea3 in v4.15-rc4 looks like it could be related. I built a Bionic test kernel with this commit reverted. The test kernel can be downloaded from:

Can you test this kernel and see if it resolves this bug?

Thanks in advance!

Note about installing test kernels:
• If the test kernel is prior to 4.15(Bionic) you need to install the linux-image and linux-image-extra .deb packages.
• If the test kernel is 4.15(Bionic) or newer, you need to install the linux-modules, linux-modules-extra and linux-image-unsigned .deb packages.

danieru (danigaritarojas) wrote :

Your test kernel with commit 2dc0b46b5ea3 reverted does indeed fix the issue with the WD HDD. Here's the dmesg with your test kernel:

As you can see at the top, I was using: "[ 0.000000] Linux version 4.15.0-29-generic (root@kathleen) (gcc version 7.3.0 (Ubuntu 7.3.0-16ubuntu3)) #32~lp1783906Commit2dc0b46b5ea3Reverted SMP Mon Jul 30 14:47:35 (Ubuntu 4.15.0-29.32~lp1783906Commit2dc0b46b5ea3Reverted-generic 4.15.18)"

And at the end you can see the WD HDD working: "[ 57.091948] ata6.00: ATA-8: WDC WD5003AZEX-00K1GA0, 80.00A80, max UDMA/133"

This, however, did not fix the freeze when rebooting. That bug also seems to have been introduced in Linux 4.15, but I'll have to do more testing on that one and report it as a separate bug. Any ideas on how to collect information (dmesg, logs) after the computer freezes before rebooting would be helpful, as I have absolutely no idea what causes the freeze. The only thing I know is that it also doesn't happen on Linux 4.14.

Hi David,

A kernel bug report was opened against Ubuntu [0].  This bug is a
regression introduced in v4.15-rc4.  The following commit was identified
as the cause of the regression:

        2dc0b46b5ea3 ("libata: sata_down_spd_limit should return if
driver has not recorded sstatus speed")

I was hoping to get your feedback, since you are the patch author.  Do
you think gathering any additional data will help diagnose this issue,
or would it be best to submit a revert request?



David Milburn (dmilburn) wrote :

Hi Joe,

Can we put some debug in sata_down_spd_limit() and see the values of
spdlimit, sstatus, spd and mask right before the change to not force the mask?
Also, can we track the exact path of calls into sata_down_spd_limit()?

The intent of the patch was not to force the speed down before reading the
link speed from SStatus; it doesn't change mask. Thanks.

Joseph Salisbury (jsalisbury) wrote :

I built a test kernel with debug output as requested by upstream. Can you test this kernel and post your syslog or dmesg output?

The test kernel can be downloaded from:

This kernel should exhibit the bug, but will write to syslog with output like:
***-> Value of spd_limit: N
***-> Value of sstatus: N
***-> Value of spd: N
***-> Value of mask: N

I did a dump_stack in the function, so you should see a new stack trace as well.

danieru (danigaritarojas) wrote :

Here's dmesg with 4.15.0-30-generic #33~lp1783906DEBUG running for 10 mins:

David Milburn (dmilburn) wrote :

Ok, thanks, please let me look through the output.

David Milburn (dmilburn) wrote :

Hi Joe,

The original intent of the patch was to avoid forcing a 6Gbps drive down to 1.5Gbps after hotplug.


SSTATUS = 275 = 0x113 = 0001 0001 0011


spd = 1 (corresponds to 1.5Gbps)
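For readers following along, the decoding above can be reproduced with a small userspace sketch. It assumes the usual SStatus layout (DET in bits 3:0, SPD in bits 7:4, IPM in bits 11:8); the struct and function names are made up for illustration and are not libata code.

```c
#include <stdint.h>

/* Illustrative only: fields of the SATA SStatus register. */
struct sstatus_fields {
	unsigned det; /* bits 3:0, device detection (3 = device present, PHY up) */
	unsigned spd; /* bits 7:4, negotiated speed (1 = 1.5Gbps, 2 = 3Gbps, 3 = 6Gbps) */
	unsigned ipm; /* bits 11:8, interface power management (1 = active) */
};

/* Split a raw SStatus value into its three 4-bit fields. */
static struct sstatus_fields decode_sstatus(uint32_t sstatus)
{
	struct sstatus_fields f;

	f.det = sstatus & 0xf;
	f.spd = (sstatus >> 4) & 0xf;
	f.ipm = (sstatus >> 8) & 0xf;
	return f;
}
```

Calling decode_sstatus(275) yields det = 3, spd = 1, ipm = 1, which matches the spd = 1 (1.5Gbps) reading above.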

One question, the print for mask came after these 2 lines of code, right?

       /* unconditionally mask off the highest bit */
        bit = fls(mask) - 1;
        mask &= ~(1 << bit);

In your debug kernel, would you please remove the following 2 lines of code (so the code falls through)?

        if (spd > 1)
                mask &= (1 << (spd - 1)) - 1;
        else <=====
                return -EINVAL; <===== Remove these 2 lines of code.

And finally, at the end of __sata_set_spd_needed(), would you please print out these values?


The original patch didn't force a change to mask, but it does "return -EINVAL". I think it
may fix the problem to just let it fall through to the end of sata_down_spd_limit(), but it would
still help to see the original debug values and these new ones with the possible fix. Thank you.

Joseph Salisbury (jsalisbury) wrote :

Thanks for the response, David! Correct, the print for the mask came after those 2 lines of code. Here is the snippet:

 /* unconditionally mask off the highest bit */
        bit = fls(mask) - 1;
        mask &= ~(1 << bit);

        /* Debug added for lp1783906: */
        pr_info("***-> Function calling sata_down_spd_limit: %pf", __builtin_return_address(0));
        printk(KERN_DEBUG "***-> Value of spd_limit: %u\n", spd_limit);
        printk(KERN_DEBUG "***-> Value of sstatus: %u\n", sstatus);
        printk(KERN_DEBUG "***-> Value of spd: %u\n", spd);
        printk(KERN_DEBUG "***-> Value of mask: %u\n", mask);

        /*
         * Mask off all speeds higher than or equal to the current one. At
         * this point, if current SPD is not available and we previously
         * recorded the link speed from SStatus, the driver has already
         * masked off the highest bit so mask should already be 1 or 0.
         * Otherwise, we should not force 1.5Gbps on a link where we have
         * not previously recorded speed from SStatus. Just return in this
         * case.
         */
        if (spd > 1)
                mask &= (1 << (spd - 1)) - 1;

Joseph Salisbury (jsalisbury) wrote :

I'll build another test kernel with your suggestions and ask @danieru to test.

Joseph Salisbury (jsalisbury) wrote :

I built a second test kernel with additional debug output as requested by David. This kernel also has the two lines removed, as requested by David.

Can you test this kernel and post your syslog or dmesg output?

The test kernel can be downloaded from:

danieru (danigaritarojas) wrote :

Here's dmesg with #33~lp1783906DEBUGv2:
I expected this to have the bug and only print additional debug info, but it seems this fixes the bug. The second drive (WDC) came up and I could mount its partitions and read data from them.

Just to make sure, I did a second run and everything still seemed to work fine. Here's the dmesg of the second run:

(I didn't test writing, now that I think about it.)

David Milburn (dmilburn) wrote :


I think the fix is removing the "return" and letting the code fall through in sata_down_spd_limit().
Please give me some time to review the latest log, and I will need to reconfigure a couple of local
systems and re-test with that change. Thanks.

David Milburn (dmilburn) wrote :


May I ask for one more test? Looking at the code some more, I don't think I can just remove
the return. The root of the problem is that the hard reset fails and sata_link_hardreset() is never able
to reconfigure the speed. This patch sets link->sata_spd_limit before returning. I have been
testing linux-4.19-rc1 successfully with a 6Gb drive on an AHCI platform.

Would you mind testing this patch with no debug?

If all goes well, I will submit upstream. Thanks.

tags: added: patch
Joseph Salisbury (jsalisbury) wrote :

I built a test kernel with the patch from David posted in comment #21. The test kernel can be downloaded from:

Can you test this kernel and see if it resolves this bug?

Note about installing test kernels:
• If the test kernel is prior to 4.15(Bionic) you need to install the linux-image and linux-image-extra .deb packages.
• If the test kernel is 4.15(Bionic) or newer, you need to install the linux-modules, linux-modules-extra and linux-image-unsigned .deb packages.

Thanks in advance!

Changed in linux (Ubuntu Cosmic):
status: In Progress → Confirmed
Changed in linux (Ubuntu Bionic):
status: In Progress → Confirmed
Changed in linux (Ubuntu):
status: In Progress → Confirmed
Changed in linux (Ubuntu Cosmic):
assignee: Joseph Salisbury (jsalisbury) → nobody
Changed in linux (Ubuntu Bionic):
assignee: Joseph Salisbury (jsalisbury) → nobody
Changed in linux (Ubuntu):
assignee: Joseph Salisbury (jsalisbury) → nobody
Michał Wadowski (wadosm) wrote :


I have a similar issue to the one described in this report. I have two drives: a primary SSD and a second HDD inside a DVD enclosure.

Both drives worked fine until kernel version 4.15.0-rc4 (exact commit

Removing the added "return -EINVAL;" in the sata_down_spd_limit function fixes the problem.

Michał Wadowski (wadosm) wrote :

Unfortunately, the patch "0001-libata-sata_down_spd_limit-should-record-link-speed-.patch" doesn't work.

Here is my dmesg dump with the proposed debug info:

I'm not familiar with libata and the sata_down_spd_limit function, but "return -EINVAL;" breaks functionality for me. I also tried "return 0;" instead of "return -EINVAL", but it didn't work either.

David Milburn (dmilburn) wrote :

Ok, thanks. Please let me look at this some more; I will try to get back to you soon.

Michał Wadowski (wadosm) wrote :

I forgot to say that even on Linux 4.14 I have to run this command manually while booting:

echo "- - -" > /sys/class/scsi_host/host1/scan

So only after this SATA reset can my laptop see the external HDD. Since Linux 4.15, the reset has no effect.

I am analyzing how the libata setup works, so I will try to fix this issue later.

Michał Wadowski (wadosm) wrote :

I can split my issue into two cases:

1. My HDD inside the DVD enclosure is not discovered. I then have to reset the SATA device 3 or 4 times with the "echo "- - -" > /sys/class/scsi_host/host1/scan" command.

2. After the SATA resets, the function sata_down_spd_limit() is called.

 * In Linux 4.14 this function changes link->sata_spd_limit from 4294967295 to 1. After that, the HDD is discovered and works fine.
 * Since the patch in 4.15.0-rc4, the value of link->sata_spd_limit is not changed, so the HDD does not work at all.

Some debug information:
Inside sata_down_spd_limit(), ata_sstatus_online() is called and returns 0, which means the link is offline.
The value of "link->sata_spd" and "spd" is 0, the value of "link->sata_spd_limit" is 4294967295 and the value of "bit" is 31. After the statement "mask &= ~(1 << bit);" mask is set to 2147483647.

Because "spd" is 0, the condition "if (spd > 1)" is false and "else return -EINVAL;" is executed.

The patch "else link->sata_spd_limit = mask;" doesn't work, because mask is 2147483647, which does not seem to be a valid value (I think it should be 1, 2, or 4).
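The arithmetic behind these numbers can be checked in userspace with a minimal sketch. The helper fls32() below is a portable stand-in for the kernel's fls() (1-based index of the highest set bit, 0 for zero); both function names are illustrative and this is not the libata source.

```c
#include <stdint.h>

/* Stand-in for the kernel's fls(): position of the highest set bit, 1-based. */
static int fls32(uint32_t x)
{
	int n = 0;

	while (x) {
		n++;
		x >>= 1;
	}
	return n;
}

/* Mirror of the "unconditionally mask off the highest bit" step. */
static uint32_t clear_highest_bit(uint32_t mask)
{
	int bit = fls32(mask) - 1;

	if (bit < 0)
		return 0; /* nothing set, nothing to clear */
	return mask & ~(UINT32_C(1) << bit);
}
```

Starting from 4294967295 (all 32 bits set), fls32() - 1 gives bit 31 and clear_highest_bit() returns 2147483647, the values reported above. Starting from 0x7 it would return 0x3 instead.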

Michał Wadowski (wadosm) wrote :

I think that I fixed the second part of my issue.

I modified patch 0001-libata-sata_down_spd_limit-should-record-link-speed-.patch

David Milburn (dmilburn) wrote :

I am not sure upstream will let us change the value of mask.

Will you get the same result if you set sata_spd_limit?

link->sata_spd_limit = 0x7

I believe that corresponds to 6Gbps but I don't see a #define
we can use.

Also, is the comment accurate? Before you started debugging
did you see a hard reset fail? Thanks.

Michał Wadowski (wadosm) wrote :

No, my device doesn't work at 6Gbps. In kernel 4.14, if my drive was offline, sata_down_spd_limit() cut link->sata_spd_limit down to 1, which corresponds to 1.5Gbps, and that worked. But it didn't even try to set the limit to 3Gbps.

On my device, unfortunately, link->sata_spd_limit starts at UINT_MAX and mask ends up as INT_MAX, so the following bitwise operations don't make any sense, because mask is far outside the range of valid values.

Setting link->sata_spd_limit = 0x7 doesn't work (my device is not 6Gbps capable). In my patch I cap mask so that it is not greater than 0x7, rather than setting it to exactly 0x7. After masking, the kernel actually tries to connect at 3Gbps (mask 0x3) and succeeds.
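One way to read the capping described above is sketched here in userspace C. It assumes the cap to 0x7 is applied before the highest bit is cleared, and that the highest remaining bit selects the speed that is tried next; the name SPD_MASK_ALL and both functions are hypothetical and are not taken from the actual patch.

```c
#include <stdint.h>

/* Hypothetical name: bits 0..2 stand for 1.5Gbps, 3Gbps and 6Gbps. */
#define SPD_MASK_ALL UINT32_C(0x7)

/* Number of the highest allowed generation: 0 = none, 1 = 1.5Gbps, 2 = 3Gbps, 3 = 6Gbps. */
static int highest_allowed_gen(uint32_t mask)
{
	int gen = 0;

	while (mask) {
		gen++;
		mask >>= 1;
	}
	return gen;
}

/* Cap the limit to the known speeds first, then drop the fastest one. */
static uint32_t lower_spd_limit(uint32_t limit)
{
	uint32_t mask = limit & SPD_MASK_ALL;
	int gen = highest_allowed_gen(mask);

	if (gen == 0)
		return 0; /* no speed left to drop */
	return mask & ~(UINT32_C(1) << (gen - 1));
}
```

With an uninitialized limit of 4294967295, lower_spd_limit() gives 0x3, whose highest allowed generation is 2 (3Gbps). Without the cap, the same step would leave 2147483647.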

About the comment and the hard reset: I took your patch and modified it, leaving the commit message and the comment in the code untouched. I haven't investigated this topic enough yet. I have to check how the __sata_set_spd_needed() and sata_link_hardreset() functions work.

Michał Wadowski (wadosm) wrote :

Ok, I found out what happens.

During standard SATA setup, a chain of functions is called: ... -> ata_eh_recover -> ata_eh_reset. If SATA is not initialized, a hard reset is performed (function ata_do_reset()).

With both drives I have, ata_do_reset() returns 0, even if the device is not working after the reset (like my second drive, the HDD).

There are no sata_down_spd_limit() calls at all. After the hard reset the second device is not working, and there are no attempts to recover it.

The first time, or every time I physically hotplug the device or run "echo "- - -" > /sys/class/scsi_host/host1/scan", this chain of functions is called: ata_eh_recover -> ata_eh_schedule_probe, and the variable trials is incremented. After a few hotplugs, when trials > ATA_EH_PROBE_TRIALS, sata_down_spd_limit(link, 1) is called and cuts down the SATA bandwidth. After the bandwidth is limited, a hard reset is performed and then the device works.

I think it is wrong behavior that it tries to limit the bandwidth only after many hotplugs and hard resets. It could try that within a single ata_eh_recover() call.

For my own use, I changed the code of ata_eh_reset() a little to check whether the device is online after the reset:

rc = ata_do_reset(link, reset, classes, deadline, true);
if( ata_link_offline(link) )
  rc = -EPIPE;

At the bottom of ata_eh_reset(), if rc == -EPIPE, sata_down_spd_limit() is called and the reset is retried after that. This completely fixes my problem with the non-working drive: I no longer have to reconnect the device manually to get it working. The only issue is a delay before the next reset (the schedule_timeout_uninterruptible function).
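The effect of treating "reset returned 0 but the link is offline" as a failure can be illustrated with a toy userspace model. Everything here is hypothetical (the toy_link struct, its max_working_gen field and both functions); it only models the retry-at-lower-speed idea and does not call or reproduce libata.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy link: comes online only at or below max_working_gen (1 = 1.5Gbps, 2 = 3Gbps, 3 = 6Gbps). */
struct toy_link {
	uint32_t spd_limit;  /* bitmask of allowed generations, bit 0 = 1.5Gbps */
	int max_working_gen; /* fastest generation this device really handles */
};

/* Highest generation allowed by the mask, 0 if the mask is empty. */
static int toy_top_gen(uint32_t mask)
{
	int gen = 0;

	while (mask) {
		gen++;
		mask >>= 1;
	}
	return gen;
}

/* Returns how many resets were needed, or -1 if the link never came online. */
static int toy_recover(struct toy_link *link, int max_tries)
{
	for (int attempt = 1; attempt <= max_tries; attempt++) {
		int gen = toy_top_gen(link->spd_limit);
		bool online = gen > 0 && gen <= link->max_working_gen;

		if (online)
			return attempt;
		if (gen <= 1)
			return -1; /* already at the slowest speed */
		/* Link offline after reset: drop the fastest speed and retry. */
		link->spd_limit &= ~(UINT32_C(1) << (gen - 1));
	}
	return -1;
}
```

For a device limited to 3Gbps and an initial limit of 0x7, this model needs two resets and ends with a limit of 0x3, without any manual rescan in between.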


Michał Wadowski (wadosm) wrote :

Maybe this conversation should be moved to the linux-ide mailing list. There are some technical decisions to make, and it may be better to discuss them with other developers.

Michał Wadowski (wadosm) wrote :

Here is a patch for the issue I described above.

David Milburn (dmilburn) wrote :

Yes, it may also be good to test on different configurations. The original patch fixed a hotplug-related
problem where 6Gbps drives connected back at lower speeds.

Michał Wadowski (wadosm) wrote :

I tuned the previous patch a little.

Kai-Heng Feng (kaihengfeng) wrote :

Have you sent the patch upstream?

Michał Wadowski (wadosm) wrote :

Yes, I have sent the patch, but so far no one has answered.

Michał Wadowski (wadosm) wrote :

I'm not familiar with the kernel mailing list, so I don't know why my patch was ignored. Maybe the format of the email was not proper, or I sent it to the wrong recipients.

I also updated the patch, so I am attaching the latest version.

Kai-Heng Feng (kaihengfeng) wrote :

No, I think you did everything right.

Brad Figg (brad-figg) on 2019-07-24
tags: added: ubuntu-certified