sata hard drive connection fails with link is slow to respond, please be patient (ready=0) ; SRST failed (errno=-16) ; hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK

Bug #409639 reported by LimCore
This bug affects 4 people
Affects: linux (Ubuntu)
Status: Expired
Importance: Undecided
Assigned to: Unassigned

Bug Description

When using a new 1 TB SATA hard drive on my NV-based mainboard (dual core, 6 GB RAM, amd64 Ubuntu 9.04),
I get 2 types of errors:

1) After some time I get device-lost errors in dmesg, and the device stops working until reboot.

2) I always see one more device named /dev/sdd (sdc is the new hard drive) that shows up as a 2 TB device.

Let me go into some details:

1) The device gets lost; dmesg shows errors like:

[ 5764.955034] sd 5:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
[ 5764.955039] end_request: I/O error, dev sdc, sector 413808352
(...)
[ 5872.227484] Buffer I/O error on device sdc, logical block 52804512
[ 5872.227486] lost page write due to I/O error on sdc

These errors repeat continuously, and from then on the device is dead: all I/O reports errors, hdparm -i cannot see it, and so on.

Sometimes this is preceded by:

[ 4841.804585] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 4841.804602] ata6.00: cmd 35/00:00:30:65:7a/00:02:14:00:00/e0 tag 0 dma 262144 out
[ 4841.804604] res 40/00:ff:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
[ 4841.804610] ata6.00: status: { DRDY }
[ 4841.804646] ata6: soft resetting link
[ 4847.016549] ata6: link is slow to respond, please be patient (ready=0)
[ 4851.852058] ata6: SRST failed (errno=-16)
[ 4851.852074] ata6: soft resetting link
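
For readers following the log above, the `cmd` field and the errno can be decoded by hand. The sketch below is not part of the original report; it assumes libata prints the taskfile as command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device, and the helper name `decode_taskfile` is made up for illustration. The decoded transfer length agrees with the `dma 262144 out` shown in the log, which supports that reading.

```python
import errno

# A few ATA opcodes that appear in this report (small, non-exhaustive table).
ATA_COMMANDS = {
    0x35: "WRITE DMA EXT",
    0xEC: "IDENTIFY DEVICE",
    0xA1: "IDENTIFY PACKET DEVICE",
}


def decode_taskfile(cmd_field):
    """Decode the hex 'cmd' field of a libata exception line.

    Hypothetical helper; the byte layout is an assumption stated in the
    lead-in. A sector count of 0 has a special meaning in ATA and is not
    handled here.
    """
    command, low, high, device = cmd_field.split("/")
    feature, nsect, lbal, lbam, lbah = (int(b, 16) for b in low.split(":"))
    hob = [int(b, 16) for b in high.split(":")]
    hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah = hob
    # 48-bit LBA: the "hob" (high order) bytes carry bits 24..47.
    lba = (hob_lbah << 40 | hob_lbam << 32 | hob_lbal << 24
           | lbah << 16 | lbam << 8 | lbal)
    sectors = hob_nsect << 8 | nsect
    opcode = int(command, 16)
    return {
        "command": ATA_COMMANDS.get(opcode, hex(opcode)),
        "lba": lba,
        "sectors": sectors,
        "bytes": sectors * 512,  # assumes 512-byte logical sectors
    }


timed_out = decode_taskfile("35/00:00:30:65:7a/00:02:14:00:00/e0")
print(timed_out)

# "SRST failed (errno=-16)": the kernel reports negated errno values.
print(errno.errorcode[16])
```

Under these assumptions the command that timed out was a 512-sector write, and errno 16 is EBUSY, i.e. the device never reported ready after the soft reset.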

This happens after 2-10 hours of intensive hard drive use (shred, dd if=/dev/zero of=device, and so on).
It has happened 3 times so far (at least once with the above ata6.00 exception error).

It seemed to occur when the computer was overheating (sensors, I think for the CPU and mobo, showing 80 C).
Nothing worked to fix it:
  - I put big fans next to the computer; the temperature now reaches just 60-65 C instead of 80 C
  - I used another SATA cable
  - I plugged it into another on-board SATA controller (3rd instead of 2nd)

Also:
  - badblocks reports no problems
  - sata log and sata --test long don't show any problems
  - the disc stays at a maximum temperature of 35 C

The disc reaches speeds of up to 150 MiB/s.

So I guess it could be a hardware issue (faulty mobo?), but perhaps it is a driver/kernel problem?
I found similar reports of "link is slow to respond, please be patient" related to installing Ubuntu, but there it seemed to happen always, not just sometimes, although it was also an NV-based mobo chip, AFAIR.

2)
The second problem is the extra unknown device.

Technical details follow.

AFTER THE FAILURE (after I/O errors and DID_BAD_TARGET etc stuff):
# hdparm -I /dev/sdc
/dev/sdc:
 HDIO_DRIVE_CMD(identify) failed: Input/output error

# hdparm -i /dev/sdc
/dev/sdc:
 HDIO_GET_IDENTITY failed: No message of desired type

# hdparm -a /dev/sdc
/dev/sdc:
 readahead = 256 (on)

# hdparm /dev/sdc
/dev/sdc:
 IO_support = 0 (default)
 readonly = 0 (off)
 readahead = 256 (on)
 geometry = 56065/255/63, sectors = 1953525168, start = 0

And the extra ghost device (2 TB) that appears for an unknown reason:

root@lcwood:~# hdparm -i /dev/sdd

/dev/sdd:
 HDIO_GET_IDENTITY failed: Invalid argument
root@lcwood:~# hdparm -I /dev/sdd

/dev/sdd:
 HDIO_DRIVE_CMD(identify) failed: Input/output error
root@lcwood:~# hdparm /dev/sdd

/dev/sdd:
 readonly = 0 (off)
 readahead = 256 (on)
 geometry = 65535/255/63, sectors = 4294967296, start = 0
root@lcwood:~# fdisk /dev/sdd
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel with disk identifier 0xc3abd5d3.
Changes will remain in memory only, until you decide to write them.
After that, of course, the previous content won't be recoverable.

The number of cylinders for this disk is set to 267349.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)
Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

Command (m for help): p

Disk /dev/sdd: 2199.0 GB, 2199023255552 bytes
255 heads, 63 sectors/track, 267349 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0xc3abd5d3

   Device Boot Start End Blocks Id System

Command (m for help): q

Yes, I do NOT have any 2 TB devices here, this one is some ghost O_o.
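
The sizes reported by hdparm and fdisk above can be cross-checked with a little arithmetic. This is a sketch added for illustration, assuming 512-byte sectors as in the fdisk output; the variable names are made up. It shows that the ghost device's sector count is exactly 2^32, which may hint at a placeholder or wrapped value rather than a real disk, although the report does not establish the cause.

```python
SECTOR_BYTES = 512             # from "Units = cylinders of 16065 * 512"
CYLINDER_BYTES = 16065 * 512   # 255 heads * 63 sectors * 512 bytes

sdc_sectors = 1953525168       # hdparm /dev/sdc (the real 1 TB drive)
sdd_sectors = 4294967296       # hdparm /dev/sdd (the ghost device)

sdc_bytes = sdc_sectors * SECTOR_BYTES
sdd_bytes = sdd_sectors * SECTOR_BYTES
sdd_cylinders = sdd_bytes // CYLINDER_BYTES

print(f"sdc: {sdc_bytes} bytes ({sdc_bytes / 1e12:.3f} TB)")
print(f"sdd: {sdd_bytes} bytes ({sdd_bytes / 1e12:.3f} TB)")
print(f"sdd sector count is 2**32: {sdd_sectors == 2**32}")
print(f"sdd cylinders: {sdd_cylinders}")
```

The byte count and cylinder count computed for sdd match the 2199023255552 bytes and 267349 cylinders that fdisk printed.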

# uname -a
Linux lcwood 2.6.28-13-generic #45-Ubuntu SMP Tue Jun 30 22:12:12 UTC 2009 x86_64 GNU/Linux

Please tell me what further tests I could run.

LimCore (limcore) wrote :

More info:

I ran badblocks -w and so on after the errors; all is fine.
There are tons of these I/O errors. Every time the device fails it stops working completely, and after a reboot it works fully again.
If the device does not die in between, I can write/read the entire hard drive up to the last sector without any problems.

Therefore it is NOT a device media/surface problem; it must be a SATA link/driver/etc. problem.

More examples of reports from the logs:

Aug 5 15:05:18 lcwood kernel: [75189.807205] ata3: hard resetting link
Aug 5 15:05:28 lcwood kernel: [75199.816547] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 5 15:05:33 lcwood kernel: [75204.816528] ata3.00: qc timeout (cmd 0xec)
Aug 5 15:05:33 lcwood kernel: [75204.816547] ata3.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Aug 5 15:05:33 lcwood kernel: [75204.816555] ata3: hard resetting link
Aug 5 15:05:43 lcwood kernel: [75214.820531] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 5 15:05:53 lcwood kernel: [75224.820525] ata3.00: qc timeout (cmd 0xec)
Aug 5 15:05:53 lcwood kernel: [75224.820539] ata3.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Aug 5 15:05:53 lcwood kernel: [75224.820547] ata3: limiting SATA link speed to 1.5 Gbps
Aug 5 15:05:53 lcwood kernel: [75224.820551] ata3: hard resetting link
Aug 5 15:06:03 lcwood kernel: [75234.828041] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Aug 5 15:06:33 lcwood kernel: [75264.828042] ata3.00: qc timeout (cmd 0xec)
Aug 5 15:06:33 lcwood kernel: [75264.828062] ata3.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Aug 5 15:06:33 lcwood kernel: [75264.828074] ata3.00: disabled
Aug 5 15:06:33 lcwood kernel: [75264.828215] ata3: hard resetting link
Aug 5 15:06:43 lcwood kernel: [75274.872552] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Aug 5 15:06:43 lcwood kernel: [75274.872617] ata3: EH complete
Aug 5 15:06:43 lcwood kernel: [75274.872683] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Aug 5 15:06:43 lcwood kernel: [75274.872704] lost page write due to I/O error on sdc
Aug 5 15:06:43 lcwood kernel: [75274.872722] lost page write due to I/O error on sdc
Aug 5 15:06:43 lcwood kernel: [75274.872732] lost page write due to I/O error on sdc
Aug 5 15:06:43 lcwood kernel: [75274.872741] lost page write due to I/O error on sdc
Aug 5 15:06:43 lcwood kernel: [75274.872750] lost page write due to I/O error on sdc
Aug 5 15:06:43 lcwood kernel: [75274.872759] lost page write due to I/O error on sdc
Aug 5 15:06:43 lcwood kernel: [75274.872768] lost page write due to I/O error on sdc
Aug 5 15:06:43 lcwood kernel: [75274.872777] lost page write due to I/O error on sdc
Aug 5 15:06:43 lcwood kernel: [75274.872787] lost page write due to I/O error on sdc
Aug 5 15:06:43 lcwood kernel: [75274.872796] lost page write due to I/O error on sdc
Aug 5 15:06:43 lcwood kernel: [75274.873139] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Aug 5 15:06:43 lcwood kernel: [75274.873511] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Aug 5 15:06:43 lcwood kernel: [75274.873828] sd 2:0:0:0: [sdc] Result:...
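
The "SStatus" and "SControl" values in the "SATA link up" lines above are hexadecimal register dumps. As a rough aid, the sketch below splits them into the three 4-bit fields defined for these registers in the SATA specification (DET in bits 0-3, SPD in bits 4-7, IPM in bits 8-11). The function name `split_sata_register` and the small speed table are illustrative, not taken from the report.

```python
# Meaning of the SPD field in SStatus (negotiated speed); in SControl the
# same field is the highest speed the host allows (0 = no restriction).
SPEEDS = {0: "none / no limit", 1: "1.5 Gbps", 2: "3.0 Gbps"}


def split_sata_register(hex_text):
    """Split an SStatus/SControl value (hex, as printed by libata) into fields."""
    value = int(hex_text, 16)
    return {
        "ipm": (value >> 8) & 0xF,  # interface power management
        "spd": (value >> 4) & 0xF,  # speed
        "det": value & 0xF,         # device detection
    }


for name, text in [("SStatus", "123"), ("SControl", "300"),
                   ("SStatus", "113"), ("SControl", "310")]:
    fields = split_sata_register(text)
    print(name, text, fields, "speed:", SPEEDS.get(fields["spd"], "unknown"))
```

Read this way, the change from SControl 300 to 310 is the SPD limit going from 0 to 1, which is consistent with the "limiting SATA link speed to 1.5 Gbps" message, and DET=3 in SStatus means the link itself came up each time even though IDENTIFY kept timing out.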

Matt Darcy (matt-darcy) wrote :

If possible try it on another motherboard - one that is not NV based.

If it only happens when it's overheating, take out some components and run the machine with as little as possible. If the system is overheating, the board may be having failures rather than the disk.

Make sure you're not overclocking or anything like that.

kernel-janitor (kernel-janitor) wrote :

[This is an automated message. Apologies if it has reached you inappropriately.]

This bug was reported against the linux-meta package when it likely should have been reported against the linux package instead. We are automatically transitioning this to the linux kernel package so that the appropriate teams are notified and made aware of this issue.

If this bug really is a bug in the linux-meta package you can move it back to linux-meta and set the Status to Confirmed, or contact us on the #ubuntu-kernel channel on the FreeNode IRC server. Thanks.

affects: linux-meta (Ubuntu) → linux (Ubuntu)
LimCore (limcore) wrote :

I tried another mainboard, another cable, etc.: same problem (almost identical log messages).

While googling I found that such problems were caused by some kernel bug or something, but that was around 2007.

Marko Lerota (mlerota) wrote :

I have the same problem. I recently bought a new computer, and this is happening. Here is the log:

Linux cosmos 2.6.28-15-generic #52-Ubuntu SMP Wed Sep 9 10:49:34 UTC 2009 i686 GNU/Linux
Motherboard Asus P7P55D, Intel(R) Core(TM) i5 CPU

Oct 20 20:54:49 cosmos -- MARK --
Oct 20 20:59:35 cosmos kernel: [ 1499.935680] ata4: hard resetting link
Oct 20 20:59:35 cosmos kernel: [ 1500.408047] ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Oct 20 20:59:35 cosmos kernel: [ 1500.432223] ata4.00: configured for UDMA/100
Oct 20 20:59:35 cosmos kernel: [ 1500.432749] ata4: EH complete
Oct 20 20:59:37 cosmos kernel: [ 1501.927593] ata4: hard resetting link
Oct 20 20:59:37 cosmos kernel: [ 1502.400047] ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Oct 20 20:59:37 cosmos kernel: [ 1502.434818] ata4.00: failed to IDENTIFY (I/O error, err_mask=0x2)
Oct 20 20:59:42 cosmos kernel: [ 1507.400019] ata4: hard resetting link
Oct 20 20:59:42 cosmos kernel: [ 1507.876049] ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Oct 20 20:59:47 cosmos kernel: [ 1512.876022] ata4.00: qc timeout (cmd 0xa1)
Oct 20 20:59:47 cosmos kernel: [ 1512.876027] ata4.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Oct 20 20:59:47 cosmos kernel: [ 1512.876034] ata4: limiting SATA link speed to 1.5 Gbps
Oct 20 20:59:47 cosmos kernel: [ 1512.876039] ata4: hard resetting link
Oct 20 20:59:53 cosmos kernel: [ 1518.396014] ata4: link is slow to respond, please be patient (ready=0)
Oct 20 20:59:58 cosmos kernel: [ 1522.942643] ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Oct 20 21:00:08 cosmos kernel: [ 1532.984131] ata4.00: qc timeout (cmd 0xa1)
Oct 20 21:00:08 cosmos kernel: [ 1532.984136] ata4.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Oct 20 21:00:08 cosmos kernel: [ 1532.984141] ata4.00: disabled
Oct 20 21:00:08 cosmos kernel: [ 1532.984156] ata4: hard resetting link
Oct 20 21:00:08 cosmos kernel: [ 1533.460047] ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Oct 20 21:00:08 cosmos kernel: [ 1533.460059] ata4: EH complete
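
To compare logs like the two above, it can help to count the error-handling events per ATA port instead of reading them line by line. The following is a minimal sketch, assuming the message wording seen in this thread; the pattern table and the `summarize` helper are hypothetical and cover only the four message types shown here.

```python
import re
from collections import Counter

# Message patterns copied from the log excerpts in this thread.
EVENT_PATTERNS = {
    "hard_reset": re.compile(r"(ata\d+): hard resetting link"),
    "identify_failed": re.compile(r"(ata\d+)\.\d+: failed to IDENTIFY"),
    "speed_limited": re.compile(r"(ata\d+): limiting SATA link speed"),
    "disabled": re.compile(r"(ata\d+)\.\d+: disabled"),
}


def summarize(lines):
    """Count known libata error-handling events, keyed by (port, event)."""
    counts = Counter()
    for line in lines:
        for event, pattern in EVENT_PATTERNS.items():
            match = pattern.search(line)
            if match:
                counts[(match.group(1), event)] += 1
    return counts


SAMPLE = [
    "[ 1512.876027] ata4.00: failed to IDENTIFY (I/O error, err_mask=0x4)",
    "[ 1512.876034] ata4: limiting SATA link speed to 1.5 Gbps",
    "[ 1512.876039] ata4: hard resetting link",
    "[ 1532.984136] ata4.00: failed to IDENTIFY (I/O error, err_mask=0x4)",
    "[ 1532.984141] ata4.00: disabled",
    "[ 1532.984156] ata4: hard resetting link",
]

counts = summarize(SAMPLE)
for (port, event), n in sorted(counts.items()):
    print(port, event, n)
```

To run it over a real log, the lines of a saved dmesg or syslog file can be passed in place of `SAMPLE`.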

Jeremy Foshee (jeremyfoshee) wrote :

Hi LimCore,

This bug was reported a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue? Can you try with the latest development release of Ubuntu? ISO CD images are available from http://cdimage.ubuntu.com/releases/ .

If it remains an issue, could you run the following command from a Terminal (Applications->Accessories->Terminal). It will automatically gather and attach updated debug information to this report.

apport-collect -p linux 409639

Also, if you could test the latest upstream kernel available that would be great. It will allow additional upstream developers to examine the issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text. Please let us know your results.

Thanks in advance.

[This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: kernel-workflow
tags: added: needs-kernel-logs
tags: added: needs-upstream-testing
tags: added: kj-triage
Changed in linux (Ubuntu):
status: New → Incomplete
Jeremy Foshee (jeremyfoshee) wrote :

This bug report was marked as Incomplete and has not had any updated comments for quite some time. As a result this bug is being closed. Please reopen if this is still an issue in the current Ubuntu release http://www.ubuntu.com/getubuntu/download . Also, please be sure to provide any requested information that may have been missing. To reopen the bug, click on the current status under the Status column and change the status back to "New". Thanks.

[This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: kj-expired
Changed in linux (Ubuntu):
status: Incomplete → Expired