Ubuntu
linux package

Linux 4.15 and onwards fails to initialize some hard drives

Bug #1783906 reported by danieru on 2018-07-26

This bug affects 3 people

	Status	Importance	Assigned to
linux (Ubuntu)	Confirmed	Medium	Unassigned
Bionic	Confirmed	Medium	Unassigned
Cosmic	Won't Fix	Medium	Unassigned

Bug Description

I have two hard drives: the main one is a TOSHIBA DT01ACA200 and the second (backup) one is a Western Digital WD5003AZEX. I installed Lubuntu 18.04.1 on the Toshiba HDD and it boots just fine. The issue is with the second hard drive: during installation the WD HDD didn't even show up as an option, and after booting it still doesn't show up. This is the dmesg with the stock kernel (4.15): https://paste.ubuntu.com/p/kpxh94v2SK/

ata6 is the WD HDD that refuses to work. The messages:
[ 302.107650] ata6: SError: { CommWake 10B8B Dispar DevExch }
[ 302.107658] ata6: hard resetting link
[ 307.860291] ata6: link is slow to respond, please be patient (ready=0)
[ 312.120898] ata6: COMRESET failed (errno=-16)
[ 363.445120] INFO: task kworker/u8:5:201 blocked for more than 120 seconds.
[ 363.445131] Not tainted 4.15.0-29-generic #31-Ubuntu
[ 363.445135] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 363.445140] kworker/u8:5 D 0 201 2 0x80000000
[ 363.445155] Workqueue: events_unbound async_run_entry_fn
[ 363.445157] Call Trace:
[ 363.445171] __schedule+0x291/0x8a0
[ 363.445177] schedule+0x2c/0x80
[ 363.445182] ata_port_wait_eh+0x7c/0xf0
[ 363.445186] ? wait_woken+0x80/0x80
[ 363.445189] ata_port_probe+0x28/0x40
[ 363.445192] async_port_probe+0x2e/0x52
[ 363.445196] async_run_entry_fn+0x3c/0x150
[ 363.445199] process_one_work+0x1de/0x410
[ 363.445203] worker_thread+0x32/0x410
[ 363.445207] kthread+0x121/0x140
[ 363.445210] ? process_one_work+0x410/0x410
[ 363.445214] ? kthread_create_worker_on_cpu+0x70/0x70
[ 363.445218] ret_from_fork+0x22/0x40

This repeats constantly. Also, when I try to turn off the computer, it seems to freeze: the keyboard and mouse lights turn off, but the computer itself stays on.

I tried Tiny Core 9.0, which has Linux 4.14.10, and I didn't have this issue. I also installed Linux 4.14 on this Lubuntu 18.04 using the Ukuu Kernel Update Utility. With this kernel version, or any previous version, the WD HDD works again. Here's a dmesg of Lubuntu 18.04 with Linux 4.14, with the WD HDD finally coming up at the end: https://paste.ubuntu.com/p/Gd3cGFbjTJ/

I also tried Linux 4.17, but the WD HDD refused to work with this version too. Here's another dmesg with this version: https://paste.ubuntu.com/p/PmNn96vZZv/

ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: linux-image-4.15.0-29-generic 4.15.0-29.31
ProcVersionSignature: Ubuntu 4.15.0-29.31-generic 4.15.18
Uname: Linux 4.15.0-29-generic x86_64
AlsaVersion: Advanced Linux Sound Architecture Driver Version k4.15.0-29-generic.
ApportVersion: 2.20.9-0ubuntu7.2
Architecture: amd64
AudioDevicesInUse:
USER PID ACCESS COMMAND
/dev/snd/controlC0: testtest 756 F.... pulseaudio
/dev/snd/controlC1: testtest 756 F.... pulseaudio
Card0.Amixer.info:
Card hw:0 'NVidia_1'/'HDA NVidia at 0xfe020000 irq 22'
   Mixer name : 'Realtek ALC1200'
   Components : 'HDA:10ec0888,10ec0000,00100101 HDA:10de0002,10de0101,00100000'
   Controls : 56
   Simple ctrls : 21
Card1.Amixer.info:
Card hw:1 'NVidia'/'HDA NVidia at 0xfcffc000 irq 16'
   Mixer name : 'Nvidia GPU 42 HDMI/DP'
   Components : 'HDA:10de0042,38422651,00100100'
   Controls : 21
   Simple ctrls : 3
CurrentDesktop: LXDE
Date: Thu Jul 26 17:10:58 2018
HibernationDevice: RESUME=UUID=17e70869-516d-4b63-b900-e92e3c4b73b6
InstallationDate: Installed on 2018-07-26 (0 days ago)
InstallationMedia: Lubuntu 18.04.1 LTS "Bionic Beaver" - Release amd64 (20180725)
MachineType: 113 1
ProcFB: 0 nouveaufb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-29-generic root=UUID=71cf0a32-7827-49be-a2c0-cd50a72c26a1 ro quiet splash vt.handoff=1
RelatedPackageVersions:
linux-restricted-modules-4.15.0-29-generic N/A
linux-backports-modules-4.15.0-29-generic N/A
linux-firmware 1.173.1
RfKill:
0: phy0: Wireless LAN
  Soft blocked: no
  Hard blocked: no
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 09/30/2008
dmi.bios.vendor: Phoenix Technologies, LTD
dmi.bios.version: 6.00 PG
dmi.board.name: 113-M2-E113
dmi.board.vendor: EVGA
dmi.board.version: 1
dmi.chassis.asset.tag: Unknow
dmi.chassis.type: 3
dmi.chassis.vendor: EVGA
dmi.chassis.version: 113-M2-E113
dmi.modalias: dmi:bvnPhoenixTechnologies,LTD:bvr6.00PG:bd09/30/2008:svn113:pn1:pvr1:rvnEVGA:rn113-M2-E113:rvr1:cvnEVGA:ct3:cvr113-M2-E113:
dmi.product.name: 1
dmi.product.version: 1
dmi.sys.vendor: 113

Tags:

danieru (danigaritarojas) wrote on 2018-07-26:

lspci-vnvn.log (32.0 KiB, text/plain)
AlsaDevices.txt (929 bytes, text/plain; charset="utf-8")
AplayDevices.txt (695 bytes, text/plain; charset="utf-8")
ArecordDevices.txt (412 bytes, text/plain; charset="utf-8")
CRDA.txt (289 bytes, text/plain; charset="utf-8")
Card0.Amixer.values.txt (4.0 KiB, text/plain; charset="utf-8")
Card0.Codecs.codec.0.txt (13.9 KiB, text/plain; charset="utf-8")
Card0.Codecs.codec.3.txt (3.4 KiB, text/plain; charset="utf-8")
Card1.Amixer.values.txt (356 bytes, text/plain; charset="utf-8")
Card1.Codecs.codec.0.txt (3.7 KiB, text/plain; charset="utf-8")
CurrentDmesg.txt (75.1 KiB, text/plain; charset="utf-8")
Dependencies.txt (2.5 KiB, text/plain; charset="utf-8")
IwConfig.txt (516 bytes, text/plain; charset="utf-8")
Lspci.txt (16.2 KiB, text/plain; charset="utf-8")
Lsusb.txt (486 bytes, text/plain; charset="utf-8")
PciMultimedia.txt (1.3 KiB, text/plain; charset="utf-8")
ProcCpuinfo.txt (3.9 KiB, text/plain; charset="utf-8")
ProcCpuinfoMinimal.txt (993 bytes, text/plain; charset="utf-8")
ProcEnviron.txt (115 bytes, text/plain; charset="utf-8")
ProcInterrupts.txt (2.7 KiB, text/plain; charset="utf-8")
ProcModules.txt (3.3 KiB, text/plain; charset="utf-8")
PulseList.txt (24.6 KiB, text/plain; charset="utf-8")
UdevDb.txt (163.9 KiB, text/plain; charset="utf-8")
WifiSyslog.txt (113.1 KiB, text/plain; charset="utf-8")

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote on 2018-07-27: Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status:	New → Confirmed

danieru (danigaritarojas) wrote on 2018-07-27:

I forgot to mention that the Western Digital hard drive has always been slow to show up, and I've always gotten those "COMRESET failed, link is slow to respond" messages since I bought it.

And two details I forgot to mention about the hard drives:
1. The Toshiba hard drive that works is formatted with GPT.
2. The Western Digital hard drive that doesn't work with Linux 4.15+ is formatted with MBR.
And just for the record, USB flash drives formatted with MBR work just fine on Linux 4.15.

Joseph Salisbury (jsalisbury) wrote on 2018-07-27:

Did this issue start happening after an update/upgrade? Was there a prior kernel version where you were not having this particular problem?

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.18 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.18-rc6

tags:	added: kernel-da-key
Changed in linux (Ubuntu):
importance:	Undecided → Medium
status:	Confirmed → Incomplete

danieru (danigaritarojas) wrote on 2018-07-27:

I first experienced this issue while testing the second beta of Ubuntu 18.04. As I explained in the original bug report, this issue doesn't happen if I use Linux 4.14.

I've installed Linux 4.18rc6 as explained in the wiki, but the WD HDD still won't come up.
Here's the dmesg with this kernel: https://paste.ubuntu.com/p/dz8rZZNmHP/
Aside from the WD HDD still not working with this kernel, I noticed that, unlike with Linux 4.15 and 4.17, my computer doesn't freeze while shutting down with 4.18rc6.

Changed in linux (Ubuntu):
status:	Incomplete → Confirmed

danieru (danigaritarojas) on 2018-07-27

tags:	added: kernel-bug-exists-upstream

danieru (danigaritarojas) wrote on 2018-07-27:

I went ahead and tested this bug on Linux 4.15rc1 through 4.15rc5.
What I found is that 4.15rc3 is the last RC version where this bug doesn't occur, and 4.15rc4 is the first RC version where it does.

Here's the dmesg with linux 4.15rc3 and the WD HDD working: https://paste.ubuntu.com/p/6DX5TfzMkW/
And here's the dmesg with linux 4.15rc4 and the WD HDD failing: https://paste.ubuntu.com/p/vF7Zs8xgjT/

Joseph Salisbury (jsalisbury) wrote on 2018-07-30:

Thanks for the testing. I'll review the commits between -rc3 and -rc4. If nothing sticks out, I'll start a kernel bisect and build a test kernel.

Changed in linux (Ubuntu Bionic):
status:	New → In Progress
Changed in linux (Ubuntu Cosmic):
status:	Confirmed → In Progress
Changed in linux (Ubuntu Bionic):
importance:	Undecided → Medium
assignee:	nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Cosmic):
assignee:	nobody → Joseph Salisbury (jsalisbury)

Joseph Salisbury (jsalisbury) wrote on 2018-07-30:

Commit 2dc0b46b5ea3 in v4.15-rc4 looks like it could be related. I built a Bionic test kernel with this commit reverted. The test kernel can be downloaded from:

http://kernel.ubuntu.com/~jsalisbury/lp1783906

Can you test this kernel and see if it resolves this bug?

Thanks in advance!

Note about installing test kernels:
• If the test kernel is prior to 4.15 (Bionic), you need to install the linux-image and linux-image-extra .deb packages.
• If the test kernel is 4.15 (Bionic) or newer, you need to install the linux-modules, linux-modules-extra and linux-image-unsigned .deb packages.

danieru (danigaritarojas) wrote on 2018-07-30:

Your test kernel with commit 2dc0b46b5ea3 reverted does indeed fix the issue with the WD HDD. Here's the dmesg with your test kernel: https://paste.ubuntu.com/p/745NscYFJh/

As you can see at the top, I was using: "[ 0.000000] Linux version 4.15.0-29-generic (root@kathleen) (gcc version 7.3.0 (Ubuntu 7.3.0-16ubuntu3)) #32~lp1783906Commit2dc0b46b5ea3Reverted SMP Mon Jul 30 14:47:35 (Ubuntu 4.15.0-29.32~lp1783906Commit2dc0b46b5ea3Reverted-generic 4.15.18)"

And at the end you can see the WD HDD working: "[ 57.091948] ata6.00: ATA-8: WDC WD5003AZEX-00K1GA0, 80.00A80, max UDMA/133"

This, however, did not fix the freeze when rebooting. That bug also seems to have been introduced in Linux 4.15, but I'll have to do more tests on it and report it as a separate bug. Any ideas on how to get information (dmesg, logs) after the computer freezes, before rebooting, would be helpful, as I have absolutely no idea what causes that freeze. The only thing I know is that, like the HDD issue, it doesn't happen on Linux 4.14.

Joseph Salisbury (jsalisbury) wrote on 2018-08-10: [Regression] libata: sata_down_spd_limit should return if driver has not recorded sstatus speed

#10

Hi David,

A kernel bug report was opened against Ubuntu [0]. This bug is a
regression introduced in v4.15-rc4. The following commit was identified
as the cause of the regression:

2dc0b46b5ea3 ("libata: sata_down_spd_limit should return if
driver has not recorded sstatus speed")

I was hoping to get your feedback, since you are the patch author. Do
you think gathering any additional data will help diagnose this issue,
or would it be best to submit a revert request?

Thanks,

Joe

http://pad.lv/1783906

David Milburn (dmilburn) wrote on 2018-08-10:

#11

Hi Joe,

Can we put some debug in sata_down_spd_limit() and see some of the values
for spdlimit, sstatus, spd, mask, right before the change to not force the mask.
Also, can we track the exact path of calling sata_down_spd_limit().

The intent of the patch was not to force the speed down before reading the
link speed from SStatus; it doesn't change mask. Thanks.

Joseph Salisbury (jsalisbury) wrote on 2018-08-13:

#12

I built a test kernel with debug output as requested by upstream. Can you test this kernel and post your syslog or dmesg output?

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1783906

This kernel should exhibit the bug, but will write to syslog with output like:
***-> Value of spd_limit: N
***-> Value of sstatus: N
***-> Value of spd: N
***-> Value of mask: N

I did a dump_stack in the function, so you should see a new stack trace as well.

danieru (danigaritarojas) wrote on 2018-08-24:

#13

Here's dmesg with 4.15.0-30-generic #33~lp1783906DEBUG running for 10 mins:
https://paste.ubuntu.com/p/Qc9Kpb62ch/

David Milburn (dmilburn) wrote on 2018-08-24:

#14

Ok, thanks, please let me look through the output.

David Milburn (dmilburn) wrote on 2018-08-24:

#15

Hi Joe,

The original intent of the patch was to avoid forcing a 6Gbps drive down to 1.5Gbps after hotplug.

Noting,

SSTATUS = 275 = 0x113 = 0001 0001 0011

That corresponds to ACTIVE PM STATE | GEN1 SPEED | DEVICE DETECTED

spd = 1 (corresponds to 1.5Gbps)

One question, the print for mask came after these 2 lines of code, right?

        /* unconditionally mask off the highest bit */
        bit = fls(mask) - 1;
        mask &= ~(1 << bit);

In your debug kernel, would you please remove the following 2 lines of code (so the code falls through)?

        if (spd > 1)
                mask &= (1 << (spd - 1)) - 1;
        else <=====
                return -EINVAL; <===== Remove these 2 lines of code.

And finally, at the end of __sata_set_spd_needed(), would you please print out these values?

spd
target
*scontrol

The original patch didn't force changing mask, but, it does "return -EINVAL", I think it
may fix the problem just letting it fall thru to the end of sata_down_spd_limit(), but it would
still help to see the original debug values and these new ones with possible fix. Thank you.

Joseph Salisbury (jsalisbury) wrote on 2018-08-29:

#16

Thanks for the response, David! Correct, the print for the mask came after those 2 lines of code. Here is the snippet:

        /* unconditionally mask off the highest bit */
        bit = fls(mask) - 1;
        mask &= ~(1 << bit);

        /* Debug added for lp1783906: */
        dump_stack();
        pr_info("***-> Function calling sata_down_spd_limit: %pf", __builtin_return_address(0));
        printk(KERN_DEBUG "***-> Value of spd_limit: %u\n", spd_limit);
        printk(KERN_DEBUG "***-> Value of sstatus: %u\n", sstatus);
        printk(KERN_DEBUG "***-> Value of spd: %u\n", spd);
        printk(KERN_DEBUG "***-> Value of mask: %u\n", mask);

        /*
         * Mask off all speeds higher than or equal to the current one. At
         * this point, if current SPD is not available and we previously
         * recorded the link speed from SStatus, the driver has already
         * masked off the highest bit so mask should already be 1 or 0.
         * Otherwise, we should not force 1.5Gbps on a link where we have
         * not previously recorded speed from SStatus. Just return in this
         * case.
         */
        if (spd > 1)
                mask &= (1 << (spd - 1)) - 1;

Joseph Salisbury (jsalisbury) wrote on 2018-08-29:

#17

I'll build another test kernel with your suggestions and ask @danieru to test.

Joseph Salisbury (jsalisbury) wrote on 2018-08-29:

#18

I built a second test kernel with additional debug output as requested by David. This kernel also has the two lines removed, as David requested.

Can you test this kernel and post your syslog or dmesg output?

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1783906

danieru (danigaritarojas) wrote on 2018-08-29:

#19

Here's dmesg with #33~lp1783906DEBUGv2: https://paste.ubuntu.com/p/jK6zjKjtqc/
I expected this to have the bug and only print additional debug info, but it seems this fixes the bug. The second drive (WDC) came up, and I could mount its partitions and read data from them.

Just to make sure, I did a second run and everything still seemed to work fine. Here's the dmesg of the second run: https://paste.ubuntu.com/p/Tj5gnTWbs3/

(I didn't test writing, now that I think about it.)

David Milburn (dmilburn) wrote on 2018-08-30:

#20

Hi,

I think the fix is removing the "return" and letting the code fall through in sata_down_spd_limit().
Please give me some time to review the latest log, and I will need to reconfigure a couple of local
systems and re-test with that change. Thanks.

David Milburn (dmilburn) wrote on 2018-09-06:

#21

0001-libata-sata_down_spd_limit-should-record-link-speed-.patch (1.6 KiB, text/plain)

Hi,

May I ask for one more test? Looking at the code some more, I don't think I can just remove
the return. The root of the problem is that the hard reset fails and sata_link_hardreset() is never able
to reconfigure the speed. This patch sets link->sata_spd_limit before returning. I have been
testing linux-4.19-rc1 successfully with a 6Gb drive on an AHCI platform.

Would you mind testing this patch with no debug?

If all goes well, I will submit upstream. Thanks.

Ubuntu Foundations Team Bug Bot (crichton) on 2018-09-07

tags:	added: patch

Joseph Salisbury (jsalisbury) wrote on 2018-09-11:

#22

I built a test kernel with the patch from David posted in comment #21. The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1783906

Can you test this kernel and see if it resolves this bug?

Thanks in advance!

Joseph Salisbury (jsalisbury) on 2019-01-23

Changed in linux (Ubuntu Cosmic):
status:	In Progress → Confirmed
Changed in linux (Ubuntu Bionic):
status:	In Progress → Confirmed
Changed in linux (Ubuntu):
status:	In Progress → Confirmed
Changed in linux (Ubuntu Cosmic):
assignee:	Joseph Salisbury (jsalisbury) → nobody
Changed in linux (Ubuntu Bionic):
assignee:	Joseph Salisbury (jsalisbury) → nobody
Changed in linux (Ubuntu):
assignee:	Joseph Salisbury (jsalisbury) → nobody

Michał Wadowski (wadosm) wrote on 2019-05-07:

#23

Hi!

I have a similar issue to the one described here. I have two drives: a primary SSD and a second HDD inside a DVD enclosure.

Both drives worked fine until kernel version 4.15.0-rc4 (specifically, commit https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a83cb7e6ada140f8b932c223511f1063c4150870).

Removing the added "return -EINVAL;" in the sata_down_spd_limit function fixes the problem.

Michał Wadowski (wadosm) wrote on 2019-05-07:

#24

Unfortunately, the patch "0001-libata-sata_down_spd_limit-should-record-link-speed-.patch" doesn't work.

Here is my dmesg dump with the proposed debug info: https://paste.ubuntu.com/p/vxTM3jdMv9/

I'm not familiar with libata or the sata_down_spd_limit function, but the "return -EINVAL;" breaks things for me. I also tried "return 0;" instead of "return -EINVAL;", but that didn't work either.

David Milburn (dmilburn) wrote on 2019-05-10:

#25

OK, thanks. Let me look at this some more; I will try to get back to you soon.

Michał Wadowski (wadosm) wrote on 2019-05-14:

#26

I forgot to say that even on Linux 4.14 I have to run this command manually while booting:

echo "- - -" > /sys/class/scsi_host/host1/scan

Only after this SATA reset can my laptop see the external HDD. Since Linux 4.15, the reset has no effect.

I am analyzing how libata setup works, so I will try to fix this issue later.

Michał Wadowski (wadosm) wrote on 2019-05-14:

#27

I can split my issue into two cases:

1. My HDD inside the DVD enclosure is not discovered. I then have to reset the SATA device 3 or 4 times with the "echo "- - -" > /sys/class/scsi_host/host1/scan" command.

2. After the SATA resets, sata_down_spd_limit() is called.

* In Linux 4.14 this function changes link->sata_spd_limit from 4294967295 to 1. After this change, the HDD is discovered and works fine.
* Since 4.15.0-rc4, the value of link->sata_spd_limit is not changed, so the HDD does not work at all.

Some debug information:
Inside sata_down_spd_limit(), ata_sstatus_online() is called and returns 0, which means the link is offline.
The values of "link->sata_spd" and "spd" are 0, "link->sata_spd_limit" is 4294967295, and "bit" is 31. After the statement "mask &= ~(1 << bit);", mask is 2147483647.

Because "spd" is 0, the condition "if (spd > 1)" is false, so "else return -EINVAL;" is executed.

The patch "else link->sata_spd_limit = mask;" doesn't work, because mask is 2147483647, which does not seem to be a valid value (I think it should be 1, 2, or 4).

Michał Wadowski (wadosm) wrote on 2019-05-15:

#28

0001-libata-sata_down_spd_limit-should-record-link-speed-.patch (2.1 KiB, text/plain)

I think I fixed the second part of my issue.

I modified patch 0001-libata-sata_down_spd_limit-should-record-link-speed-.patch

David Milburn (dmilburn) wrote on 2019-05-15:

#29

I'm not sure upstream will let us change the value of mask.

Will you get the same result if you set sata_spd_limit?

link->sata_spd_limit = 0x7

I believe that corresponds to 6Gbps but I don't see a #define
we can use.

Also, is the comment accurate? Before you started debugging
did you see a hard reset fail? Thanks.

Michał Wadowski (wadosm) wrote on 2019-05-15:

#30

No, my device doesn't work at 6Gbps. In kernel 4.14, if my drive was offline, sata_down_spd_limit() cut link->sata_spd_limit down to 1, which corresponds to 1.5Gbps, and the drive worked. But it didn't even try setting the limit to 3Gbps.

Unfortunately, on my device link->sata_spd_limit is UINT_MAX, and mask is INT_MAX after its highest bit is masked off. The subsequent bitwise operations therefore don't make sense, because mask is far outside the range of valid values.

Setting link->sata_spd_limit = 0x7 doesn't work (my device is not 6Gbps capable). In my patch I clamp mask to be no greater than 0x7, rather than setting it to exactly 0x7. After that masking, the kernel tries to connect at 3Gbps (mask 0x3), and it succeeds.

About the comment and the hard reset: I took your patch and modified it, leaving the commit message and the code comment untouched. I haven't investigated this topic enough yet. I still have to check how the __sata_set_spd_needed() and sata_link_hardreset() functions work.

Michał Wadowski (wadosm) wrote on 2019-05-16:

#31

OK, I found out what happens.

During standard SATA setup, this chain of functions is called: ... -> ata_eh_recover -> ata_eh_reset. If SATA is not initialized, a hard reset is performed (function ata_do_reset()).

For both drives I have, ata_do_reset() returns 0, even when the device is not working after the reset (as with my second drive, the HDD).

There are no sata_down_spd_limit() calls at all. After the hard reset, the second device is not working, and there are no attempts to recover it.

The first time, or every time I physically hotplug the device or run "echo "- - -" > /sys/class/scsi_host/host1/scan", the chain ata_eh_recover -> ata_eh_schedule_probe is called, and the variable trials is incremented. After a few hotplugs, once trials > ATA_EH_PROBE_TRIALS, sata_down_spd_limit(link, 1) is called and cuts down the SATA bandwidth. After the bandwidth is limited, a hard reset is performed and the device then works.

I think it's wrong to limit the bandwidth only after many hotplugs and hard resets. It could be tried within a single ata_eh_recover() call.

For my own use, I changed the code of ata_eh_reset() a little to check whether the device is online after the reset:

        rc = ata_do_reset(link, reset, classes, deadline, true);
        if (ata_link_offline(link))
                rc = -EPIPE;

At the bottom of ata_eh_reset(), if rc == -EPIPE, sata_down_spd_limit() is called, and after that the reset is retried. This completely fixes my problem with the non-working drive: I no longer have to manually reconnect the device to get it working. The only issue is a delay before the next reset (the schedule_timeout_uninterruptible function).

Michał Wadowski (wadosm) wrote on 2019-05-16:

#32

Maybe this conversation should be moved to the Linux linux-ide mailing list. There are some technical decisions to make, and it may be better to consult other developers about them.

Michał Wadowski (wadosm) wrote on 2019-05-16:

#33

0001-libata-Fix-for-initialize-drives-not-capable-to-hand.patch (2.4 KiB, text/plain)

Here is a patch for the issue I described above.

David Milburn (dmilburn) wrote on 2019-05-16:

#34

Yes, it may also be good to test on different configurations; the original patch fixed a hotplug-related
problem where 6Gbps drives reconnected at lower speeds.

Michał Wadowski (wadosm) wrote on 2019-05-17:

#35

0001-Fix-for-initialize-drives-not-capable-to-handle-maxi.patch (2.8 KiB, text/plain)

I tweaked the previous patch a little.

Kai-Heng Feng (kaihengfeng) wrote on 2019-06-24:

#36

Have you sent the patch upstream?

Michał Wadowski (wadosm) wrote on 2019-06-24:

#37

Yes, I have sent the patch, but so far no one has answered.

Michał Wadowski (wadosm) wrote on 2019-07-05:

#38

0001-Fix-for-initialize-drives-not-capable-to-handle-maxi.patch (2.8 KiB, text/plain)

I'm not familiar with the kernel mailing list, so I don't know why my patch was ignored. Maybe the email format was not proper, or I sent it to the wrong recipients.

I also updated the patch, so I attach the latest version.

Kai-Heng Feng (kaihengfeng) wrote on 2019-07-09:

#39

No, I think you did everything right.

Brad Figg (brad-figg) on 2019-07-24

tags:	added: ubuntu-certified

Kristijan Žic  (kristijan-zic) wrote on 2020-06-13:

#40

I experience this issue on Ubuntu 20.04

$neofetch
OS: Ubuntu 20.04 LTS x86_64
Kernel: 5.4.0-37-generic
Uptime: 2 mins
Packages: 2210 (dpkg), 36 (flatpak), 32 (snap)
Shell: bash 5.0.16
Resolution: 1920x1200, 1680x1050
DE: GNOME
WM: Mutter
WM Theme: Adwaita
Theme: Adwaita-dark [GTK2/3]
Icons: Yaru [GTK2/3]
Terminal: gnome-terminal
CPU: AMD Ryzen Threadripper 1900X (16) @ 3.800GHz
GPU: AMD ATI Radeon RX Vega 56/64
Memory: 2642MiB / 15930MiB

I have 2 Samsung NVMe SSDs:
SSD 1 has Ubuntu 20.04, SSD 2 has Windows 10.
Ubuntu 20.04 is on EXT4 (just to clarify that I didn't use ZFS on this machine).

Brian Murray (brian-murray) wrote on 2024-07-26:

#41

Ubuntu 18.10 (Cosmic Cuttlefish) has reached end of life, so this bug will not be fixed for that specific release.

Changed in linux (Ubuntu Cosmic):
status:	Confirmed → Won't Fix
