hardy / ibex - raid5 - ata#: hard resetting link

Bug #263160 reported by q on 2008-08-31
This bug affects 2 people
Affects          Status         Importance  Assigned to  Milestone
linux (Fedora)   Fix Released   Critical
linux (Ubuntu)                  Medium      Unassigned
  Hardy                         Medium      Bryan Wu
  Intrepid                      Medium      Unassigned

Bug Description

Running a 7-disk RAID 5 array on the following card:
SCSI storage controller: Marvell Technology Group Ltd. MV88SX6081 8-port SATA II PCI-X Controller (rev 09)

The file system is XFS.

Copying files with 'cp' from the system drive (IDE, XFS) to the RAID consistently makes the process stall (state: D+), and the machine then needs a cold reset. I believe network transfers are suffering from this as well.

Hardy wasn't reporting _any_ of these errors in dmesg or /var/log/messages. Upgraded to Ibex to try and help track down what was going on and got the following _when_ transferring to the raid.

dmesg:
[11285.918535] ata9.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
[11285.918567] ata9.00: cmd 61/03:00:49:00:00/00:00:00:00:00/40 tag 0 ncq 1536 out
[11285.918568] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[11285.918619] ata9.00: status: { DRDY }
[11285.918635] ata9: hard resetting link
[11286.420039] ata9: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[11286.460065] ata9.00: max_sectors limited to 256 for NCQ
[11286.520054] ata9.00: max_sectors limited to 256 for NCQ
[11286.520059] ata9.00: configured for UDMA/133
[11286.520077] ata9: EH complete
[11286.520119] sd 8:0:0:0: [sdd] 976773168 512-byte hardware sectors (500108 MB)
[11286.520132] sd 8:0:0:0: [sdd] Write Protect is off
[11286.520134] sd 8:0:0:0: [sdd] Mode Sense: 00 3a 00 00
[11286.520154] sd 8:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[11326.988529] ata8.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
[11326.988554] ata8.00: cmd 61/03:00:49:00:00/00:00:00:00:00/40 tag 0 ncq 1536 out
[11326.988555] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[11326.988606] ata8.00: status: { DRDY }
[11326.988623] ata8: hard resetting link
[11327.500037] ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[11327.580053] ata8.00: max_sectors limited to 256 for NCQ
[11327.657199] ata8.00: max_sectors limited to 256 for NCQ
[11327.657202] ata8.00: configured for UDMA/133
[11327.657207] ata8: EH complete
[11327.657257] sd 7:0:0:0: [sdc] 976773168 512-byte hardware sectors (500108 MB)
[11327.657272] sd 7:0:0:0: [sdc] Write Protect is off
[11327.657273] sd 7:0:0:0: [sdc] Mode Sense: 00 3a 00 00
[11327.657296] sd 7:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[11377.938532] ata7.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
[11377.938557] ata7.00: cmd 61/03:00:49:00:00/00:00:00:00:00/40 tag 0 ncq 1536 out
[11377.938558] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[11377.938608] ata7.00: status: { DRDY }
[11377.938624] ata7: hard resetting link
[11378.440037] ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[11378.520056] ata7.00: max_sectors limited to 256 for NCQ
[11378.600065] ata7.00: max_sectors limited to 256 for NCQ
[11378.600068] ata7.00: configured for UDMA/133
[11378.600073] ata7: EH complete
[11378.600120] sd 6:0:0:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
[11378.600133] sd 6:0:0:0: [sdb] Write Protect is off
[11378.600135] sd 6:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[11378.600155] sd 6:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[11711.718523] ata9.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
[11711.718548] ata9.00: cmd 61/03:00:49:00:00/00:00:00:00:00/40 tag 0 ncq 1536 out
[11711.718549] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[11711.718600] ata9.00: status: { DRDY }
[11711.718616] ata9: hard resetting link
[11712.220041] ata9: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[11712.260058] ata9.00: max_sectors limited to 256 for NCQ
[11712.320057] ata9.00: max_sectors limited to 256 for NCQ
[11712.320066] ata9.00: configured for UDMA/133
[11712.320072] ata9: EH complete
[11712.320112] sd 8:0:0:0: [sdd] 976773168 512-byte hardware sectors (500108 MB)
[11712.320125] sd 8:0:0:0: [sdd] Write Protect is off
[11712.320127] sd 8:0:0:0: [sdd] Mode Sense: 00 3a 00 00
[11712.320148] sd 8:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[11849.328524] ata7.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
[11849.328549] ata7.00: cmd 61/03:00:49:00:00/00:00:00:00:00/40 tag 0 ncq 1536 out
[11849.328549] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[11849.328600] ata7.00: status: { DRDY }
[11849.328617] ata7: hard resetting link
[11849.830037] ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[11849.910070] ata7.00: max_sectors limited to 256 for NCQ
[11849.990053] ata7.00: max_sectors limited to 256 for NCQ
[11849.990057] ata7.00: configured for UDMA/133
[11849.990069] ata7: EH complete
[11849.990109] sd 6:0:0:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
[11849.990123] sd 6:0:0:0: [sdb] Write Protect is off
[11849.990125] sd 6:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[11849.990147] sd 6:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[11909.629773] ata9.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
[11909.629797] ata9.00: cmd 61/03:00:49:00:00/00:00:00:00:00/40 tag 0 ncq 1536 out
[11909.629798] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[11909.629849] ata9.00: status: { DRDY }
[11909.629865] ata9: hard resetting link
[11910.131295] ata9: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[11910.180068] ata9.00: max_sectors limited to 256 for NCQ
[11910.231316] ata9.00: max_sectors limited to 256 for NCQ
[11910.231319] ata9.00: configured for UDMA/133
[11910.231327] ata9: EH complete
[11910.231381] sd 8:0:0:0: [sdd] 976773168 512-byte hardware sectors (500108 MB)
[11910.231394] sd 8:0:0:0: [sdd] Write Protect is off
[11910.231396] sd 8:0:0:0: [sdd] Mode Sense: 00 3a 00 00
[11910.231417] sd 8:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[11996.729773] ata7.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
[11996.729797] ata7.00: cmd 61/03:00:49:00:00/00:00:00:00:00/40 tag 0 ncq 1536 out
[11996.729798] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[11996.729848] ata7.00: status: { DRDY }
[11996.729865] ata7: hard resetting link
[11997.231291] ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[11997.311308] ata7.00: max_sectors limited to 256 for NCQ
[11997.391306] ata7.00: max_sectors limited to 256 for NCQ
[11997.391316] ata7.00: configured for UDMA/133
[11997.391322] ata7: EH complete
[11997.391366] sd 6:0:0:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
[11997.391378] sd 6:0:0:0: [sdb] Write Protect is off
[11997.391380] sd 6:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[11997.391400] sd 6:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

/var/log/messages:
Aug 30 20:12:43 isis kernel: [11285.918635] ata9: hard resetting link
Aug 30 20:12:43 isis kernel: [11286.420039] ata9: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 30 20:12:43 isis kernel: [11286.460065] ata9.00: max_sectors limited to 256 for NCQ
Aug 30 20:12:43 isis kernel: [11286.520054] ata9.00: max_sectors limited to 256 for NCQ
Aug 30 20:12:43 isis kernel: [11286.520059] ata9.00: configured for UDMA/133
Aug 30 20:12:43 isis kernel: [11286.520077] ata9: EH complete
Aug 30 20:12:43 isis kernel: [11286.520119] sd 8:0:0:0: [sdd] 976773168 512-byte hardware sectors (500108 MB)
Aug 30 20:12:43 isis kernel: [11286.520132] sd 8:0:0:0: [sdd] Write Protect is off
Aug 30 20:12:43 isis kernel: [11286.520154] sd 8:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Aug 30 20:13:24 isis kernel: [11326.988623] ata8: hard resetting link
Aug 30 20:13:24 isis kernel: [11327.500037] ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 30 20:13:24 isis kernel: [11327.580053] ata8.00: max_sectors limited to 256 for NCQ
Aug 30 20:13:24 isis kernel: [11327.657199] ata8.00: max_sectors limited to 256 for NCQ
Aug 30 20:13:24 isis kernel: [11327.657202] ata8.00: configured for UDMA/133
Aug 30 20:13:24 isis kernel: [11327.657207] ata8: EH complete
Aug 30 20:13:24 isis kernel: [11327.657257] sd 7:0:0:0: [sdc] 976773168 512-byte hardware sectors (500108 MB)
Aug 30 20:13:24 isis kernel: [11327.657272] sd 7:0:0:0: [sdc] Write Protect is off
Aug 30 20:13:24 isis kernel: [11327.657296] sd 7:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Aug 30 20:14:15 isis kernel: [11377.938624] ata7: hard resetting link
Aug 30 20:14:15 isis kernel: [11378.440037] ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 30 20:14:15 isis kernel: [11378.520056] ata7.00: max_sectors limited to 256 for NCQ
Aug 30 20:14:15 isis kernel: [11378.600065] ata7.00: max_sectors limited to 256 for NCQ
Aug 30 20:14:15 isis kernel: [11378.600068] ata7.00: configured for UDMA/133
Aug 30 20:14:15 isis kernel: [11378.600073] ata7: EH complete
Aug 30 20:14:15 isis kernel: [11378.600120] sd 6:0:0:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
Aug 30 20:14:15 isis kernel: [11378.600133] sd 6:0:0:0: [sdb] Write Protect is off
Aug 30 20:14:15 isis kernel: [11378.600155] sd 6:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Aug 30 20:19:48 isis kernel: [11711.718616] ata9: hard resetting link
Aug 30 20:19:49 isis kernel: [11712.220041] ata9: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 30 20:19:49 isis kernel: [11712.260058] ata9.00: max_sectors limited to 256 for NCQ
Aug 30 20:19:49 isis kernel: [11712.320057] ata9.00: max_sectors limited to 256 for NCQ
Aug 30 20:19:49 isis kernel: [11712.320066] ata9.00: configured for UDMA/133
Aug 30 20:19:49 isis kernel: [11712.320072] ata9: EH complete
Aug 30 20:19:49 isis kernel: [11712.320112] sd 8:0:0:0: [sdd] 976773168 512-byte hardware sectors (500108 MB)
Aug 30 20:19:49 isis kernel: [11712.320125] sd 8:0:0:0: [sdd] Write Protect is off
Aug 30 20:19:49 isis kernel: [11712.320148] sd 8:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Aug 30 20:22:06 isis kernel: [11849.328617] ata7: hard resetting link
Aug 30 20:22:06 isis kernel: [11849.830037] ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 30 20:22:06 isis kernel: [11849.910070] ata7.00: max_sectors limited to 256 for NCQ
Aug 30 20:22:07 isis kernel: [11849.990053] ata7.00: max_sectors limited to 256 for NCQ
Aug 30 20:22:07 isis kernel: [11849.990057] ata7.00: configured for UDMA/133
Aug 30 20:22:07 isis kernel: [11849.990069] ata7: EH complete
Aug 30 20:22:07 isis kernel: [11849.990109] sd 6:0:0:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
Aug 30 20:22:07 isis kernel: [11849.990123] sd 6:0:0:0: [sdb] Write Protect is off
Aug 30 20:22:07 isis kernel: [11849.990147] sd 6:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Aug 30 20:23:06 isis kernel: [11909.629865] ata9: hard resetting link
Aug 30 20:23:07 isis kernel: [11910.131295] ata9: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 30 20:23:07 isis kernel: [11910.180068] ata9.00: max_sectors limited to 256 for NCQ
Aug 30 20:23:07 isis kernel: [11910.231316] ata9.00: max_sectors limited to 256 for NCQ
Aug 30 20:23:07 isis kernel: [11910.231319] ata9.00: configured for UDMA/133
Aug 30 20:23:07 isis kernel: [11910.231327] ata9: EH complete
Aug 30 20:23:07 isis kernel: [11910.231381] sd 8:0:0:0: [sdd] 976773168 512-byte hardware sectors (500108 MB)
Aug 30 20:23:07 isis kernel: [11910.231394] sd 8:0:0:0: [sdd] Write Protect is off
Aug 30 20:23:07 isis kernel: [11910.231417] sd 8:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Aug 30 20:24:33 isis kernel: [11996.729865] ata7: hard resetting link
Aug 30 20:24:34 isis kernel: [11997.231291] ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 30 20:24:34 isis kernel: [11997.311308] ata7.00: max_sectors limited to 256 for NCQ
Aug 30 20:24:34 isis kernel: [11997.391306] ata7.00: max_sectors limited to 256 for NCQ
Aug 30 20:24:34 isis kernel: [11997.391316] ata7.00: configured for UDMA/133
Aug 30 20:24:34 isis kernel: [11997.391322] ata7: EH complete
Aug 30 20:24:34 isis kernel: [11997.391366] sd 6:0:0:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
Aug 30 20:24:34 isis kernel: [11997.391378] sd 6:0:0:0: [sdb] Write Protect is off
Aug 30 20:24:34 isis kernel: [11997.391400] sd 6:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
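
The resets above are spread over more than one port. A minimal sketch for tallying them from saved dmesg or /var/log/messages text, assuming the "ataN: hard resetting link" message format shown above. The function name count_hard_resets and the short sample are illustrative only, not part of any existing tool:

```python
import re
from collections import Counter

# Matches libata lines such as "[11285.918635] ata9: hard resetting link".
RESET_RE = re.compile(r"\b(ata\d+): hard resetting link")

def count_hard_resets(log_text):
    """Return a Counter mapping port name (e.g. 'ata9') to reset count."""
    return Counter(RESET_RE.findall(log_text))

# Sample: three of the reset lines from the report above.
sample = """\
[11285.918635] ata9: hard resetting link
[11326.988623] ata8: hard resetting link
[11711.718616] ata9: hard resetting link
"""

if __name__ == "__main__":
    for port, n in sorted(count_hard_resets(sample).items()):
        print(port, n)
```

Feeding it the full log instead of the sample gives a quick view of whether one port or several are affected.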

I've replaced the card and cables and I'm still getting the issue.

This card and RAID were working under CentOS last week (2.6.18, 32-bit).
Since then I have replaced the OS (Ubuntu 64-bit), the CPU (Core 2 Duo) and the motherboard (ASUS P5K Pro).

other info:
1) ubuntu release:
Description: Ubuntu intrepid (development branch)
Release: 8.10
2) package versions:
linux-server:
  Installed: 2.6.27.2.2
mdadm:
  Installed: 2.6.7-3ubuntu4
  Candidate: 2.6.7-3ubuntu4

I'm really at a loss here and not sure what else to do. I stressed the other components of the system in Windows and they seemed fine. I'm not sure if it's the card or something with the newer kernels.

Also, these issues are not causing my RAID to fail:
q@test:/storage$ sudo mdadm -D /dev/md1
/dev/md1:
        Version : 01.02
  Creation Time : Sat Jan 19 13:29:40 2008
     Raid Level : raid5
     Array Size : 2930302464 (2794.55 GiB 3000.63 GB)
  Used Dev Size : 976767488 (931.52 GiB 1000.21 GB)
   Raid Devices : 7
  Total Devices : 7
Preferred Minor : 1
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Sat Aug 30 20:49:05 2008
          State : active
 Active Devices : 7
Working Devices : 7
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 128K

           Name : big500raid
           UUID : 51ba59f2:45e85c89:53a81444:b210e1c6
         Events : 46

    Number Major Minor RaidDevice State
       0 8 17 0 active sync /dev/sdb1
       1 8 33 1 active sync /dev/sdc1
       2 8 97 2 active sync /dev/sdg1
       3 8 49 3 active sync /dev/sdd1
       4 8 81 4 active sync /dev/sdf1
       5 8 65 5 active sync /dev/sde1
       7 8 113 6 active sync /dev/sdh1
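
The mdadm -D output above reports all seven members active and none failed. A rough sketch for checking that automatically, assuming the "Key : Value" layout shown above. parse_mdadm_detail is a hypothetical helper, not part of mdadm, and the sample is a shortened copy of the output:

```python
def parse_mdadm_detail(text):
    """Collect the 'Key : Value' header fields of `mdadm -D` output."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if sep:  # lines without " : " (device table, blank lines) are skipped
            fields[key.strip()] = value.strip()
    return fields

sample = """\
     Raid Level : raid5
   Raid Devices : 7
          State : active
 Active Devices : 7
 Failed Devices : 0
"""

info = parse_mdadm_detail(sample)
# Healthy here means: no failed members and every RAID slot is active.
healthy = (info["Failed Devices"] == "0"
           and info["Active Devices"] == info["Raid Devices"])
print("array healthy:", healthy)
```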

q (qr7atgwu) on 2008-09-02
description: updated

Chris (billytwowilly) wrote :

I am having the exact same problem in 8.10 (Kubuntu, fresh install) with my six-disk RAID 5 array. I'm using software RAID, so I'm not sure if that is exactly the same. What motherboard are you using? I'm using an ASUS P5Q, and all drives are plugged into the onboard SATA controller (I believe it's an Intel SATA controller), not the onboard Xpress backup ports.

Here's the relevant bit of my /var/log/messages:

Nov 5 12:49:12 serverv2 -- MARK --
Nov 5 13:09:12 serverv2 -- MARK --
Nov 5 13:29:12 serverv2 -- MARK --
Nov 5 13:49:12 serverv2 -- MARK --
Nov 5 14:09:12 serverv2 -- MARK --
Nov 5 14:29:12 serverv2 -- MARK --
Nov 5 14:49:12 serverv2 -- MARK --
Nov 5 15:09:12 serverv2 -- MARK --
Nov 5 15:29:12 serverv2 -- MARK --
Nov 5 15:49:12 serverv2 -- MARK --
Nov 5 15:50:55 serverv2 python: hp-systray(init)[6671]: warning: No hp: or hpfax: devices found in any installed CUPS queue. Exiting.
Nov 5 15:52:07 serverv2 kernel: [12192.326735] type=1503 audit(1225925527.559:4): operation="inode_permission" requested_mask="r::" denied_mask="r::" fsuid=7 name="/proc/6778/net/" pid=6778 profile="/usr/sbin/cupsd"
Nov 5 15:52:08 serverv2 kernel: [12193.222565] type=1503 audit(1225925528.454:5): operation="inode_permission" requested_mask="r::" denied_mask="r::" fsuid=7 name="/proc/6782/net/" pid=6782 profile="/usr/sbin/cupsd"
Nov 5 15:52:08 serverv2 kernel: [12193.222606] type=1503 audit(1225925528.454:6): operation="socket_create" family="ax25" sock_type="dgram" protocol=0 pid=6782 profile="/usr/sbin/cupsd"
Nov 5 15:52:08 serverv2 kernel: [12193.222614] type=1503 audit(1225925528.454:7): operation="socket_create" family="netrom" sock_type="seqpacket" protocol=0 pid=6782 profile="/usr/sbin/cupsd"
Nov 5 15:52:08 serverv2 kernel: [12193.222621] type=1503 audit(1225925528.454:8): operation="socket_create" family="rose" sock_type="dgram" protocol=0 pid=6782 profile="/usr/sbin/cupsd"
Nov 5 15:52:08 serverv2 kernel: [12193.222628] type=1503 audit(1225925528.454:9): operation="socket_create" family="ipx" sock_type="dgram" protocol=0 pid=6782 profile="/usr/sbin/cupsd"
Nov 5 15:52:08 serverv2 kernel: [12193.222634] type=1503 audit(1225925528.454:10): operation="socket_create" family="appletalk" sock_type="dgram" protocol=0 pid=6782 profile="/usr/sbin/cupsd"
Nov 5 15:52:08 serverv2 kernel: [12193.222641] type=1503 audit(1225925528.454:11): operation="socket_create" family="econet" sock_type="dgram" protocol=0 pid=6782 profile="/usr/sbin/cupsd"
Nov 5 15:52:08 serverv2 kernel: [12193.222648] type=1503 audit(1225925528.454:12): operation="socket_create" family="ash" sock_type="dgram" protocol=0 pid=6782 profile="/usr/sbin/cupsd"
Nov 5 15:52:08 serverv2 kernel: [12193.222654] type=1503 audit(1225925528.454:13): operation="socket_create" family="x25" sock_type="seqpacket" protocol=0 pid=6782 profile="/usr/sbin/cupsd"
Nov 5 16:05:21 serverv2 kernel: [12986.184075] ata3: hard resetting link
Nov 5 16:05:21 serverv2 kernel: [12986.184077] ata4: hard resetting link
Nov 5 16:05:21 serverv2 kernel: [12986.668023] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Nov 5 16:05:21 serverv2 kernel: [12986.668709] ata3: SATA link up 3.0 Gbps (SStatus 123...


Chris (billytwowilly) wrote :

http://forums.seagate.com/stx/board/message?board.id=ata_drives&thread.id=2390&view=by_date_ascending&page=1

I'm probably hitting the problem described at the link above; perhaps you are too? Are your drives Seagate drives?

q (qr7atgwu) wrote :

2 of my drives are Seagate, another is a Western Digital. They're 500GB drives.

I just did a clean install of 8.10 and it's still happening. This wasn't an issue back in January when I ran RHEL5...

Richard Ayotte (rich-ayotte) wrote :

I have the same problem.

rich@cheetah:~$ sudo hdparm -I /dev/sda

/dev/sda:

ATA device, with non-removable media
 Model Number: Hitachi HDS721010KLA330
 Serial Number: GTE005PAJXM1PL
 Firmware Revision: GKAOAB0A
Standards:
 Used: ATA/ATAPI-7 T13 1532D revision 1
 Supported: 7 6 5 4 & some of 8
Configuration:
 Logical max current
 cylinders 16383 16383
 heads 16 16
 sectors/track 63 63
 --
 CHS current addressable sectors: 16514064
 LBA user addressable sectors: 268435455
 LBA48 user addressable sectors: 1953525168
 device size with M = 1024*1024: 953869 MBytes
 device size with M = 1000*1000: 1000204 MBytes (1000 GB)
Capabilities:
 LBA, IORDY(can be disabled)
 Queue depth: 32
 Standby timer values: spec'd by Standard, no device specific minimum
 R/W multiple sector transfer: Max = 16 Current = 1
 Advanced power management level: disabled
 Recommended acoustic management value: 128, current value: 254
 DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
      Cycle time: min=120ns recommended=120ns
 PIO: pio0 pio1 pio2 pio3 pio4
      Cycle time: no flow control=120ns IORDY flow control=120ns
Commands/features:
 Enabled Supported:
    * SMART feature set
      Security Mode feature set
    * Power Management feature set
    * Write cache
    * Look-ahead
    * Host Protected Area feature set
    * WRITE_BUFFER command
    * READ_BUFFER command
    * DOWNLOAD_MICROCODE
      Advanced Power Management feature set
      Power-Up In Standby feature set
      SET_FEATURES required to spinup after power up
      Address Offset Reserved Area Boot
      SET_MAX security extension
      Automatic Acoustic Management feature set
    * 48-bit Address feature set
    * Device Configuration Overlay feature set
    * Mandatory FLUSH_CACHE
    * FLUSH_CACHE_EXT
    * SMART error logging
    * SMART self-test
      Media Card Pass-Through
    * General Purpose Logging feature set
    * WRITE_{DMA|MULTIPLE}_FUA_EXT
    * 64-bit World wide name
    * URG for READ_STREAM[_DMA]_EXT
    * URG for WRITE_STREAM[_DMA]_EXT
    * WRITE_UNCORRECTABLE_EXT command
    * Segmented DOWNLOAD_MICROCODE
    * SATA-I signaling speed (1.5Gb/s)
    * SATA-II signaling speed (3.0Gb/s)
    * Native Command Queueing (NCQ)
    * Host-initiated interface power management
    * Phy event counters
    * unknown 76[12]
      Non-Zero buffer offsets in DMA Setup FIS
      DMA Setup Auto-Activate optimization
      Device-initiated interface power management
      In-order data delivery
    * Software settings preservation
    * SMART Command Transport (SCT) feature set
    * SCT Long Sector Access (AC1)
    * SCT LBA Segment Access (AC2)
    * SCT Error Recovery Control (AC3)
    * SCT Features Control (AC4)
    * SCT Data Tables (AC5)
Security:
 Master password revision code = 65534
  supported
 not enabled
 not locked
 not frozen
 not expired: security count
 not supported: enhanced erase
 340min for SECURITY ERASE UNIT.
Logical Unit WWN Device Identifier: 5000cca216e930ed
 NAA : 5
 IEEE OUI : cca
 Unique ID : 216e930ed
Checksum: correct

q (qr7atgwu) wrote :

I'm pretty fed up with people saying this could be so many different issues, so much so that I finally decided to risk my data to prove it. Read the following.

***___This has got to be the card / chipset / sata_mv driver._____***

Short and simple version of my issues:
    - This does not depend on drive types
    - Appears to be caused by MV88SX6081 chipset
    - Could be a problem in SATA_MV driver
    - I need replacement controller suggestions

Details to all non believers (it’s not a power / hardware issue):
I moved 5 of the 7 drives to my onboard controller (have 6 sata ports on the mobo, last was used by the system drive).
Left 2 of the western digital drives on the MV88SX6081 8-port SATA II:
    - sdg
    - sdh

On advice I received by email, I unplugged everything that wasn't needed. The assumption was that it could have been a power problem, given the number of drives I had in the machine. What was left on a Corsair TX750W power supply:
    - mobo (c2d, 4gb ram)
    - 7 sata raid drives - spread across multiple power supply rails
    - 1 sata system drive
    - Super Micro SAT2-MV8 (MV88SX6081 8-port SATA II)
    - intel pcie 10/100/1000 network card

Then I replaced the SATA cables one more time with old cables I knew worked. I also swapped in a brand new controller card (I have a few spares lying around).
I brought everything up and upgraded to:

Then I started to rebuild the raid. Everything went fine, no freezes.
**This was the first indication that this only happens under heavy load on multiple ports as has been brought up before.
So then I started copying data over. About 180 GB in, the card hard reset both of the drives attached to it and knocked them both out of the RAID.
**This was also significantly different from before when I was utilizing all the ports as it seemed to work great for quite some time, it wasn't until I was well into the process that the card finally gave up.
See the attached dmesg and /var/log/messages. This is the 2nd time I’ve had this card degrade my raid and almost give me a heart attack.

The cards are going in the trash at this point. I'm open to suggestions for a possible replacement. I don’t need a hardware RAID card, just a decent controller with great *nix support and lots of ports.
::sigh:: I don’t know who to contact but this is the end of the line for me with this controller and hopefully my issues.

Attempting to get my data back as we speak with 2 failed drives in a raid 5... wonderful times.

dmesg of the event:
[ 1061.040118] md: recovery of RAID array md1
[ 1061.040120] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
[ 1061.040122] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[ 1061.040126] md: using 128k window, over a total of 488383744 blocks.
[11208.852220] md: md1: recovery done.
[11209.020072] RAID5 conf printout:
[11209.020076] --- rd:7 wd:7
[11209.020079] disk 0, o:1, dev:sdd1
[11209.020080] disk 1, o:1, dev:sdb1
[11209.020081] disk 2, o:1, dev:sdh1
[11209.020082] disk 3, o:1, dev:sdc1
[11209.020083] disk 4, o:1, dev:sdf1
[11209.020084] disk 5, o:1, dev:sde1
[11209.020085] disk 6, o:1, dev:sdg1
[19844.431690] SGI XFS with AC...

q (qr7atgwu) wrote :

sorry, forgot to put in the kernel ver that i upgraded to:
Linux isis 2.6.27-9-server #1 SMP Thu Nov 20 22:56:07 UTC 2008 x86_64 GNU/Linux

Kytrix (kytrix) wrote :

I got it to work with my SATA2 drive on nForce4 by disabling the disk write cache and NCQ.

look here for details:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/301893/comments/10
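
The linked comment is not reproduced here, so the exact steps used there are not known. One common way to do both, as a sketch under that caveat: hdparm -W0 turns off a drive's write cache, and writing 1 to /sys/block/<dev>/device/queue_depth limits the queue to a single command, which effectively disables NCQ. The helper names and the device name sdb are placeholders. Both changes need root and affect performance, so the script below only prints the commands:

```python
from pathlib import Path

def write_cache_off_cmd(dev):
    """Command line that disables the drive write cache via hdparm."""
    return ["hdparm", "-W0", f"/dev/{dev}"]

def queue_depth_path(dev):
    """sysfs file holding the queue depth; a depth of 1 disables NCQ."""
    return Path("/sys/block") / dev / "device" / "queue_depth"

def disable_ncq(dev):
    # Requires root; the setting does not survive a reboot.
    queue_depth_path(dev).write_text("1\n")

if __name__ == "__main__":
    dev = "sdb"  # placeholder device name, adjust to the affected disk
    print(" ".join(write_cache_off_cmd(dev)))
    print(f"echo 1 > {queue_depth_path(dev)}")
```

Neither setting is persistent, so they would have to be reapplied at boot for as long as the workaround is needed.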

Dan Helfman (witten-torsion) wrote :

I just wanted to point out that this issue appears to have been fixed by a one-line change to the sata_mv kernel source by Mark Lord. Discussion is here towards the bottom of the bug report page:

  https://bugzilla.redhat.com/show_bug.cgi?id=462425

And the patch itself is here:

  https://bugzilla.redhat.com/attachment.cgi?id=329048

I haven't confirmed that the fix works yet. Reportedly with the fix in place, you can safely re-enable your disk write cache.

Changed in linux:
status: Unknown → In Progress

Pitabred (ubuntu-pitabred) wrote :

I just wanted to add that I'm also seeing this bug with 4 Hitachi drives in a RAID5 array on an ATI SB700/SB800 chipset (64-bit Intrepid Mythbuntu, fully updated, generic kernel). So it's not chipset specific. I'm going to compile a kernel with the patch mentioned above. I can trigger the error at will with a large data copy, so it will be apparent whether the fix works or not. I can provide any logs anyone needs for debugging, and will be watching changes to this bug.

Pitabred (ubuntu-pitabred) wrote :

Just wanted to comment that after getting the kernel compiled (generic except for the patch mentioned by Dan Helfman above), the drives do not crash under a load that previously would have crashed them. I'll continue testing, but I have high hopes for it.

TJ (tj) wrote :

The patch from Mark Lord is included in Jaunty as commit

c42fae333255b08b8d4bc03e5853023145208d45 sata_mv: fix 8-port timeouts on 508x/6081 chips

Other non-Marvell chipsets may be affected by similar bugs in other drivers.

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: New → Fix Released

TJ (tj) wrote :

Stefan, is this a candidate for back-porting to Hardy/Intrepid?

Stefan Bader (smb) wrote :

Patch fixes a real bug, is isolated to only sata_mv. So thumbs up. It has to go through the paperwork, though. I will see this gets done.

Stefan Bader (smb) on 2009-03-17
Changed in linux (Ubuntu Hardy):
assignee: nobody → cooloney
importance: Undecided → Medium
status: New → Triaged
Changed in linux (Ubuntu Intrepid):
assignee: nobody → cooloney
importance: Undecided → Medium
status: New → Triaged

krot (ubuntu-communitare) wrote :

Same problem with an ATI SB700/SB800 sata controller running a RAID-5 on Ubuntu 8.10. Under heavy load the system stalls and I see these errors

[255670.268058] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[255670.268082] ata5.00: cmd b0/da:00:00:4f:c2/00:00:00:00:00/00 tag 0
[255670.268085] res 40/00:00:2f:7b:a8/00:00:ae:00:00/e0 Emask 0x4 (timeout)
[255670.268092] ata5.00: status: { DRDY }
[255670.268103] ata5: hard resetting link
[255670.752537] ata5: softreset failed (device not ready)
[255670.752551] ata5: failed due to HW bug, retry pmp=0
[255670.916053] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[255671.611176] ata5.00: configured for UDMA/133
[255671.611226] ata5: EH complete

Driver used seems to be libata, not sata_mv.

If needed I can provide more information.
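
The command that timed out in this trace is not the same as in the original report. The first byte of the cmd field is the ATA command opcode: 0xb0 here is SMART, while 0x61 in the original trace is WRITE FPDMA QUEUED (an NCQ write). A small sketch for pulling the opcode out of such lines. The table covers only a handful of opcodes and the function name decode_cmd is made up for illustration:

```python
import re

# A few ATA command opcodes; deliberately incomplete.
ATA_OPCODES = {
    0x25: "READ DMA EXT",
    0x35: "WRITE DMA EXT",
    0x60: "READ FPDMA QUEUED",
    0x61: "WRITE FPDMA QUEUED",
    0xB0: "SMART",
    0xC8: "READ DMA",
    0xCA: "WRITE DMA",
}

# libata prints the taskfile as "cmd <opcode>/<feature>:..." in lowercase hex.
CMD_RE = re.compile(r"cmd ([0-9a-f]{2})/")

def decode_cmd(line):
    """Return the command name for a libata 'cmd' line, or None if absent."""
    m = CMD_RE.search(line)
    if m is None:
        return None
    opcode = int(m.group(1), 16)
    return ATA_OPCODES.get(opcode, f"unknown (0x{opcode:02x})")

if __name__ == "__main__":
    print(decode_cmd("ata5.00: cmd b0/da:00:00:4f:c2/00:00:00:00:00/00 tag 0"))
```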

Bryan Wu (cooloney) wrote :

It needs a big change to the Hardy kernel, so we won't fix this issue in Hardy.

-Bryan

Changed in linux (Ubuntu Hardy):
status: Triaged → Won't Fix

Bryan Wu (cooloney) wrote :

Stefan has already checked the patch into the Intrepid kernel. Changed the status.

-Bryan

Changed in linux (Ubuntu Intrepid):
status: Triaged → Fix Committed

I am seeing a very similar issue with a Via VT6421 SATA Controller (non-RAID BIOS).
Jaunty with Kernel: 2.6.28-11-server on 32-bit i386 (Pentium 4)
I have two disks: WDC WD3200AAJS-00L7A0 configured in RAID 1 using 'md' software RAID.
Linear operations, like rebuilding the RAID mirror, work like a dream with no errors, but random access causes lots of errors like those above (*both* drives give lots of errors). The easiest way to reproduce it is simply to apt-get install a package: even for just a few megs of data, the disks go nuts.

This is brand new hardware, new HBA, new disks, new SATA cards. I've tried two different PSUs and refuse to believe that this is a power issue when a brand new 280W PSU has only the Pentium 4 motherboard and the two disks attached. Maybe the Via driver has adopted the broken code from the Marvell driver and needs fixing too?

Sure enough - If I disable the write cache on the disks, the problem is gone. As it happens I want the write cache disabled anyway but this was very concerning when I first installed the box.

Example of Errors:
-------------------------
[316730.629755] ata4.00: exception Emask 0x12 SAct 0x0 SErr 0x1000500 action 0x6
[316730.629793] ata4.00: BMDMA stat 0x5
[316730.629820] ata4: SError: { UnrecovData Proto TrStaTrns }
[316730.629853] ata4.00: cmd c8/00:18:af:f1:51/00:00:00:00:00/e0 tag 0 dma 12288 in
[316730.629855] res 51/84:07:c0:f1:51/84:01:00:00:00/e0 Emask 0x12 (ATA bus error)
[316730.629948] ata4.00: status: { DRDY ERR }
[316730.629974] ata4.00: error: { ICRC ABRT }
[316730.630021] ata4: hard resetting link
[316730.980054] ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[316731.040809] ata4.00: configured for UDMA/33
[316731.040825] ata4: EH complete
[316731.045995] sd 3:0:0:0: [sdb] 625142448 512-byte hardware sectors: (320 GB/298 GiB)
[316731.046472] sd 3:0:0:0: [sdb] Write Protect is off
[316731.046475] sd 3:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[316731.046649] sd 3:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[316762.000273] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[316762.000315] ata3.00: cmd c8/00:20:2f:f3:51/00:00:00:00:00/e0 tag 0 dma 16384 in
[316762.000317] res 40/00:00:56:f1:51/00:00:00:00:00/e0 Emask 0x4 (timeout)
[316762.000409] ata3.00: status: { DRDY }
[316762.000442] ata3: hard resetting link
[316762.350049] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[316762.390408] ata3.00: configured for UDMA/133
[316762.390422] ata3: EH complete
[316762.412082] sd 2:0:0:0: [sda] 625142448 512-byte hardware sectors: (320 GB/298 GiB)
[316762.412267] sd 2:0:0:0: [sda] Write Protect is off
[316762.412271] sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
[316762.438540] sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

draknet (n638jl66) wrote :

I still see this problem on Ubuntu 9.04 running 2.6.28-11-generic with an ATI SB700/SB800 sata controller running a RAID-5.

This is not fixed in Jaunty.

Andrew Davison (darkinnit) wrote :

I also still see this problem on Ubuntu Server 9.04 running 2.6.28-11. Is this the patched Intrepid kernel, or do I need to enable backports or do some other thing to resolve this issue?

I have a RAID 5 running with 3 disks across two controllers:
VIA Technologies, Inc. VIA VT6420 SATA RAID Controller (rev 80)
Silicon Image, Inc. SiI 3512 [SATALink/SATARaid] Serial ATA Controller (rev 01)

/var/log/messages:
[ 590.274538] ata1: hard resetting link
[ 590.594161] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 590.819220] ata1.00: configured for UDMA/100
[ 590.819266] ata1: EH complete
[ 590.848089] sd 0:0:0:0: [sda] 1250263728 512-byte hardware sectors: (640 GB/596 GiB)
[ 590.854831] sd 0:0:0:0: [sda] Write Protect is off
[ 590.861069] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 607.360924] ata1: hard resetting link
[ 607.679134] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 607.695647] ata1.00: configured for UDMA/100
[ 607.695693] ata1: EH complete
[ 607.700588] sd 0:0:0:0: [sda] 1250263728 512-byte hardware sectors: (640 GB/596 GiB)
[ 607.700662] sd 0:0:0:0: [sda] Write Protect is off
[ 607.700764] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 637.277044] ata2: hard resetting link
[ 637.596839] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 637.829733] ata2.00: configured for UDMA/100
[ 637.829780] ata2: EH complete
[ 637.844617] sd 1:0:0:0: [sdb] 1250263728 512-byte hardware sectors: (640 GB/596 GiB)
[ 637.853532] sd 1:0:0:0: [sdb] Write Protect is off
[ 637.858736] ata2: hard resetting link
[ 638.176875] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 638.194087] ata2.00: configured for UDMA/100
[ 638.194139] ata2: EH complete
[ 638.198664] sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 638.201640] sd 1:0:0:0: [sdb] 1250263728 512-byte hardware sectors: (640 GB/596 GiB)
[ 638.201737] sd 1:0:0:0: [sdb] Write Protect is off
[ 638.201967] sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

*However*, disabling the write cache does not fix this for me:

/var/log/messages:
[ 1387.362851] sd 0:0:0:0: [sda] 1250263728 512-byte hardware sectors: (640 GB/596 GiB)
[ 1387.363165] sd 0:0:0:0: [sda] Write Protect is off
[ 1387.363330] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 1387.363638] sd 0:0:0:0: [sda] 1250263728 512-byte hardware sectors: (640 GB/596 GiB)
[ 1387.363748] sd 0:0:0:0: [sda] Write Protect is off
[ 1387.363897] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 1416.321442] ata2: hard resetting link
[ 1416.641237] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 1416.874078] ata2.00: configured for UDMA/100
[ 1416.874124] ata2: EH complete
[ 1416.914095] ata2.00: configured for UDMA/100
[ 1416.914149] ata2: EH complete
[ 1416.928420] sd 1:0:0:0: [sdb] 1250263728 512-byte hardware sectors: (640 GB/596 GiB)
[ 1416.938974] sd 1:0:0:0: [sdb] Write Protect is off
[ 1416.939283] sd 1:0:0:0: [sdb] Write cache: di...

Shawn Ostapuk (flagg) wrote :

I believe I also have this problem with Jaunty (9.04) running 2.6.28-11-server.

I am using six Seagate 1.5 TB disks (which do NOT have the notorious freezing firmware) and Promise SATA controllers. Under heavy load I get ATA resets and two drives drop out of my RAID.

Drive Info:

 Model Number: ST31500341AS
 Serial Number: 9VS1F7AR
 Firmware Revision: CC1H
 Transport: Serial

Controllers:

00:08.0 Mass storage controller: Promise Technology, Inc. PDC40718 (SATA 300 TX4) (rev 02)
00:09.0 Mass storage controller: Promise Technology, Inc. PDC40718 (SATA 300 TX4) (rev 02)

Jun 22 21:57:52 ralph -- MARK --
Jun 22 22:17:52 ralph -- MARK --
Jun 22 22:37:52 ralph -- MARK --
Jun 22 22:39:30 ralph kernel: [98628.073762] ata6: hard resetting link
Jun 22 22:39:31 ralph kernel: [98628.430232] ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jun 22 22:39:31 ralph kernel: [98628.566502] ata6.00: configured for UDMA/133
Jun 22 22:39:31 ralph kernel: [98628.566545] ata6: EH complete
Jun 22 22:40:01 ralph kernel: [98659.059722] ata6: hard resetting link
Jun 22 22:40:02 ralph kernel: [98659.400116] ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jun 22 22:40:02 ralph kernel: [98659.547905] ata6.00: configured for UDMA/133
Jun 22 22:40:02 ralph kernel: [98659.547951] ata6: EH complete
Jun 22 22:40:32 ralph kernel: [98690.065025] ata6: hard resetting link
Jun 22 22:40:33 ralph kernel: [98690.410095] ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jun 22 22:40:33 ralph kernel: [98690.549672] ata6.00: configured for UDMA/133
Jun 22 22:40:33 ralph kernel: [98690.549705] ata6: EH complete
Jun 22 22:41:03 ralph kernel: [98721.000265] ata6: limiting SATA link speed to 1.5 Gbps
Jun 22 22:41:03 ralph kernel: [98721.067312] ata6: hard resetting link
Jun 22 22:41:04 ralph kernel: [98721.410108] ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jun 22 22:41:04 ralph kernel: [98721.537792] ata6.00: configured for UDMA/133
Jun 22 22:41:04 ralph kernel: [98721.537832] ata6: EH complete
Jun 22 22:41:34 ralph kernel: [98752.071181] ata6: hard resetting link
Jun 22 22:41:35 ralph kernel: [98752.430117] ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jun 22 22:41:35 ralph kernel: [98752.557827] ata6.00: configured for UDMA/133
Jun 22 22:41:35 ralph kernel: [98752.557867] ata6: EH complete
Jun 22 22:42:05 ralph kernel: [98783.078988] ata6: hard resetting link
Jun 22 22:42:06 ralph kernel: [98783.420202] ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jun 22 22:42:06 ralph kernel: [98783.547890] ata6.00: configured for UDMA/133
Jun 22 22:42:06 ralph kernel: [98783.547925] ata6: EH complete
Jun 22 22:42:36 ralph kernel: [98814.071513] ata6: hard resetting link
Jun 22 22:42:37 ralph kernel: [98814.440122] ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jun 22 22:42:37 ralph kernel: [98814.567858] ata6.00: configured for UDMA/133
Jun 22 22:42:37 ralph kernel: [98814.567913] sd 5:0:0:0: [sdf] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
Jun 22 22:42:37 ralph kernel: [98814.567956] sd 5:0:0:0: [sdf] Sense Key : Aborted Command [current] [descriptor]
Jun 22 22:42:37 ralph kernel: ...
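
For anyone comparing logs like the ones in this thread, here is a minimal sketch that counts "hard resetting link" events per ATA port. It is only an illustration: the function name `count_hard_resets` and the `sample` list are made up for this example, and the regular expression assumes the message format shown in the logs above.

```python
import re
from collections import Counter

# Matches kernel messages such as "ata6: hard resetting link",
# with or without a syslog timestamp/host prefix in front.
RESET_RE = re.compile(r"\b(ata\d+): hard resetting link")


def count_hard_resets(lines):
    """Return a Counter mapping port name (e.g. 'ata6') to reset count."""
    counts = Counter()
    for line in lines:
        match = RESET_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical sample built from lines quoted in this thread.
sample = [
    "Jun 22 22:39:30 ralph kernel: [98628.073762] ata6: hard resetting link",
    "Jun 22 22:39:31 ralph kernel: [98628.430232] ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)",
    "Jun 22 22:40:01 ralph kernel: [98659.059722] ata6: hard resetting link",
    "[ 590.274538] ata1: hard resetting link",
]

for port, total in sorted(count_hard_resets(sample).items()):
    print(port, total)
```

To run it against a real log, pass an open file instead of `sample`, for example `count_hard_resets(open("/var/log/messages"))`. A port that dominates the counts points at a specific drive, cable, or controller channel rather than the whole array.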

Andrew Davison (darkinnit) wrote :

I'm just posting this in case it is useful to anyone with this problem. I tried the latest kernel (2.6.30), which seems to include the patch linked above, and I still had this problem.

I've now given up on my Silicon Image, Inc. SiI 3512 controller and bought a Promise SATA-300TX4 controller as I've never had a problem with a Promise card yet. Sure enough, after installing the Promise card I no longer have this problem.

Changed in linux (Fedora):
status: In Progress → Fix Released

Alex Valavanis (valavanisalex) wrote :

Intrepid Ibex reached end-of-life on 30 April 2010 so I am closing the report. The bug has been fixed in newer releases of Ubuntu.

Changed in linux (Ubuntu Intrepid):
assignee: Bryan Wu (cooloney) → nobody
status: Fix Committed → Invalid
Changed in linux (Fedora):
importance: Unknown → Critical