hardy / ibex - raid5 - ata#: hard resetting link
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Fedora) | Fix Released | Critical | |
linux (Ubuntu) | Fix Released | Medium | Unassigned |
Hardy | Won't Fix | Medium | Bryan Wu |
Intrepid | Invalid | Medium | Unassigned |
Bug Description
I'm running a 7-disk RAID 5 array on the following card:
SCSI storage controller: Marvell Technology Group Ltd. MV88SX6081 8-port SATA II PCI-X Controller (rev 09)
The file system is XFS.
A 'cp' from the system drive (IDE, XFS) to the RAID consistently stalls (process state: D+), and the machine then needs a cold reset. I believe network transfers suffer from the same problem.
Hardy wasn't reporting _any_ of these errors in dmesg or /var/log/messages. I upgraded to Ibex to help track down what was going on and got the following _when_ transferring to the RAID.
dmesg:
[11285.918535] ata9.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
[11285.918567] ata9.00: cmd 61/03:00:
[11285.918568] res 40/00:00:
[11285.918619] ata9.00: status: { DRDY }
[11285.918635] ata9: hard resetting link
[11286.420039] ata9: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[11286.460065] ata9.00: max_sectors limited to 256 for NCQ
[11286.520054] ata9.00: max_sectors limited to 256 for NCQ
[11286.520059] ata9.00: configured for UDMA/133
[11286.520077] ata9: EH complete
[11286.520119] sd 8:0:0:0: [sdd] 976773168 512-byte hardware sectors (500108 MB)
[11286.520132] sd 8:0:0:0: [sdd] Write Protect is off
[11286.520134] sd 8:0:0:0: [sdd] Mode Sense: 00 3a 00 00
[11286.520154] sd 8:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[11326.988529] ata8.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
[11326.988554] ata8.00: cmd 61/03:00:
[11326.988555] res 40/00:00:
[11326.988606] ata8.00: status: { DRDY }
[11326.988623] ata8: hard resetting link
[11327.500037] ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[11327.580053] ata8.00: max_sectors limited to 256 for NCQ
[11327.657199] ata8.00: max_sectors limited to 256 for NCQ
[11327.657202] ata8.00: configured for UDMA/133
[11327.657207] ata8: EH complete
[11327.657257] sd 7:0:0:0: [sdc] 976773168 512-byte hardware sectors (500108 MB)
[11327.657272] sd 7:0:0:0: [sdc] Write Protect is off
[11327.657273] sd 7:0:0:0: [sdc] Mode Sense: 00 3a 00 00
[11327.657296] sd 7:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[11377.938532] ata7.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
[11377.938557] ata7.00: cmd 61/03:00:
[11377.938558] res 40/00:00:
[11377.938608] ata7.00: status: { DRDY }
[11377.938624] ata7: hard resetting link
[11378.440037] ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[11378.520056] ata7.00: max_sectors limited to 256 for NCQ
[11378.600065] ata7.00: max_sectors limited to 256 for NCQ
[11378.600068] ata7.00: configured for UDMA/133
[11378.600073] ata7: EH complete
[11378.600120] sd 6:0:0:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
[11378.600133] sd 6:0:0:0: [sdb] Write Protect is off
[11378.600135] sd 6:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[11378.600155] sd 6:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[11711.718523] ata9.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
[11711.718548] ata9.00: cmd 61/03:00:
[11711.718549] res 40/00:00:
[11711.718600] ata9.00: status: { DRDY }
[11711.718616] ata9: hard resetting link
[11712.220041] ata9: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[11712.260058] ata9.00: max_sectors limited to 256 for NCQ
[11712.320057] ata9.00: max_sectors limited to 256 for NCQ
[11712.320066] ata9.00: configured for UDMA/133
[11712.320072] ata9: EH complete
[11712.320112] sd 8:0:0:0: [sdd] 976773168 512-byte hardware sectors (500108 MB)
[11712.320125] sd 8:0:0:0: [sdd] Write Protect is off
[11712.320127] sd 8:0:0:0: [sdd] Mode Sense: 00 3a 00 00
[11712.320148] sd 8:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[11849.328524] ata7.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
[11849.328549] ata7.00: cmd 61/03:00:
[11849.328549] res 40/00:00:
[11849.328600] ata7.00: status: { DRDY }
[11849.328617] ata7: hard resetting link
[11849.830037] ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[11849.910070] ata7.00: max_sectors limited to 256 for NCQ
[11849.990053] ata7.00: max_sectors limited to 256 for NCQ
[11849.990057] ata7.00: configured for UDMA/133
[11849.990069] ata7: EH complete
[11849.990109] sd 6:0:0:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
[11849.990123] sd 6:0:0:0: [sdb] Write Protect is off
[11849.990125] sd 6:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[11849.990147] sd 6:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[11909.629773] ata9.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
[11909.629797] ata9.00: cmd 61/03:00:
[11909.629798] res 40/00:00:
[11909.629849] ata9.00: status: { DRDY }
[11909.629865] ata9: hard resetting link
[11910.131295] ata9: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[11910.180068] ata9.00: max_sectors limited to 256 for NCQ
[11910.231316] ata9.00: max_sectors limited to 256 for NCQ
[11910.231319] ata9.00: configured for UDMA/133
[11910.231327] ata9: EH complete
[11910.231381] sd 8:0:0:0: [sdd] 976773168 512-byte hardware sectors (500108 MB)
[11910.231394] sd 8:0:0:0: [sdd] Write Protect is off
[11910.231396] sd 8:0:0:0: [sdd] Mode Sense: 00 3a 00 00
[11910.231417] sd 8:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[11996.729773] ata7.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
[11996.729797] ata7.00: cmd 61/03:00:
[11996.729798] res 40/00:00:
[11996.729848] ata7.00: status: { DRDY }
[11996.729865] ata7: hard resetting link
[11997.231291] ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[11997.311308] ata7.00: max_sectors limited to 256 for NCQ
[11997.391306] ata7.00: max_sectors limited to 256 for NCQ
[11997.391316] ata7.00: configured for UDMA/133
[11997.391322] ata7: EH complete
[11997.391366] sd 6:0:0:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
[11997.391378] sd 6:0:0:0: [sdb] Write Protect is off
[11997.391380] sd 6:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[11997.391400] sd 6:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
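For anyone reading the "SATA link up 3.0 Gbps (SStatus 123 SControl 300)" lines above, here is a minimal sketch that decodes the two register values. It assumes the numbers are printed in hexadecimal and uses the usual SATA bit layout (DET in bits 3:0, SPD in bits 7:4, IPM in bits 11:8). The function and table names are made up for illustration and are not part of any kernel tool.

```python
# Assumed bit layout of the SATA SStatus/SControl registers:
# bits 3:0 = DET, bits 7:4 = SPD, bits 11:8 = IPM.
DET_STATUS = {
    0x0: "no device detected",
    0x1: "device present, no PHY communication",
    0x3: "device present, PHY communication established",
    0x4: "PHY offline",
}
SPD_STATUS = {0x0: "no speed negotiated", 0x1: "1.5 Gbps", 0x2: "3.0 Gbps"}
IPM_STATUS = {0x0: "no device / no communication", 0x1: "active",
              0x2: "partial", 0x6: "slumber"}
IPM_CONTROL = {0x0: "no restrictions", 0x1: "partial disabled",
               0x2: "slumber disabled", 0x3: "partial and slumber disabled"}


def split_fields(hex_text):
    """Split a register value (hex string, as printed in the log) into DET, SPD, IPM."""
    value = int(hex_text, 16)
    return value & 0xF, (value >> 4) & 0xF, (value >> 8) & 0xF


def decode_sstatus(hex_text):
    det, spd, ipm = split_fields(hex_text)
    return {
        "det": DET_STATUS.get(det, "reserved"),
        "spd": SPD_STATUS.get(spd, "reserved or newer generation"),
        "ipm": IPM_STATUS.get(ipm, "reserved"),
    }


def decode_scontrol(hex_text):
    det, spd, ipm = split_fields(hex_text)
    # DET and SPD are left numeric here: 0 means "no action" / "no speed limit".
    return {"det": det, "spd_limit": spd, "ipm": IPM_CONTROL.get(ipm, "reserved")}


if __name__ == "__main__":
    print(decode_sstatus("123"))
    print(decode_scontrol("300"))
```

Read this way, SStatus 123 says the link came back up normally after each reset (device present, 3.0 Gbps, interface active), and SControl 300 only says that link power-management transitions are disabled.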
/var/log/messages:
Aug 30 20:12:43 isis kernel: [11285.918635] ata9: hard resetting link
Aug 30 20:12:43 isis kernel: [11286.420039] ata9: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 30 20:12:43 isis kernel: [11286.460065] ata9.00: max_sectors limited to 256 for NCQ
Aug 30 20:12:43 isis kernel: [11286.520054] ata9.00: max_sectors limited to 256 for NCQ
Aug 30 20:12:43 isis kernel: [11286.520059] ata9.00: configured for UDMA/133
Aug 30 20:12:43 isis kernel: [11286.520077] ata9: EH complete
Aug 30 20:12:43 isis kernel: [11286.520119] sd 8:0:0:0: [sdd] 976773168 512-byte hardware sectors (500108 MB)
Aug 30 20:12:43 isis kernel: [11286.520132] sd 8:0:0:0: [sdd] Write Protect is off
Aug 30 20:12:43 isis kernel: [11286.520154] sd 8:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Aug 30 20:13:24 isis kernel: [11326.988623] ata8: hard resetting link
Aug 30 20:13:24 isis kernel: [11327.500037] ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 30 20:13:24 isis kernel: [11327.580053] ata8.00: max_sectors limited to 256 for NCQ
Aug 30 20:13:24 isis kernel: [11327.657199] ata8.00: max_sectors limited to 256 for NCQ
Aug 30 20:13:24 isis kernel: [11327.657202] ata8.00: configured for UDMA/133
Aug 30 20:13:24 isis kernel: [11327.657207] ata8: EH complete
Aug 30 20:13:24 isis kernel: [11327.657257] sd 7:0:0:0: [sdc] 976773168 512-byte hardware sectors (500108 MB)
Aug 30 20:13:24 isis kernel: [11327.657272] sd 7:0:0:0: [sdc] Write Protect is off
Aug 30 20:13:24 isis kernel: [11327.657296] sd 7:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Aug 30 20:14:15 isis kernel: [11377.938624] ata7: hard resetting link
Aug 30 20:14:15 isis kernel: [11378.440037] ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 30 20:14:15 isis kernel: [11378.520056] ata7.00: max_sectors limited to 256 for NCQ
Aug 30 20:14:15 isis kernel: [11378.600065] ata7.00: max_sectors limited to 256 for NCQ
Aug 30 20:14:15 isis kernel: [11378.600068] ata7.00: configured for UDMA/133
Aug 30 20:14:15 isis kernel: [11378.600073] ata7: EH complete
Aug 30 20:14:15 isis kernel: [11378.600120] sd 6:0:0:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
Aug 30 20:14:15 isis kernel: [11378.600133] sd 6:0:0:0: [sdb] Write Protect is off
Aug 30 20:14:15 isis kernel: [11378.600155] sd 6:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Aug 30 20:19:48 isis kernel: [11711.718616] ata9: hard resetting link
Aug 30 20:19:49 isis kernel: [11712.220041] ata9: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 30 20:19:49 isis kernel: [11712.260058] ata9.00: max_sectors limited to 256 for NCQ
Aug 30 20:19:49 isis kernel: [11712.320057] ata9.00: max_sectors limited to 256 for NCQ
Aug 30 20:19:49 isis kernel: [11712.320066] ata9.00: configured for UDMA/133
Aug 30 20:19:49 isis kernel: [11712.320072] ata9: EH complete
Aug 30 20:19:49 isis kernel: [11712.320112] sd 8:0:0:0: [sdd] 976773168 512-byte hardware sectors (500108 MB)
Aug 30 20:19:49 isis kernel: [11712.320125] sd 8:0:0:0: [sdd] Write Protect is off
Aug 30 20:19:49 isis kernel: [11712.320148] sd 8:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Aug 30 20:22:06 isis kernel: [11849.328617] ata7: hard resetting link
Aug 30 20:22:06 isis kernel: [11849.830037] ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 30 20:22:06 isis kernel: [11849.910070] ata7.00: max_sectors limited to 256 for NCQ
Aug 30 20:22:07 isis kernel: [11849.990053] ata7.00: max_sectors limited to 256 for NCQ
Aug 30 20:22:07 isis kernel: [11849.990057] ata7.00: configured for UDMA/133
Aug 30 20:22:07 isis kernel: [11849.990069] ata7: EH complete
Aug 30 20:22:07 isis kernel: [11849.990109] sd 6:0:0:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
Aug 30 20:22:07 isis kernel: [11849.990123] sd 6:0:0:0: [sdb] Write Protect is off
Aug 30 20:22:07 isis kernel: [11849.990147] sd 6:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Aug 30 20:23:06 isis kernel: [11909.629865] ata9: hard resetting link
Aug 30 20:23:07 isis kernel: [11910.131295] ata9: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 30 20:23:07 isis kernel: [11910.180068] ata9.00: max_sectors limited to 256 for NCQ
Aug 30 20:23:07 isis kernel: [11910.231316] ata9.00: max_sectors limited to 256 for NCQ
Aug 30 20:23:07 isis kernel: [11910.231319] ata9.00: configured for UDMA/133
Aug 30 20:23:07 isis kernel: [11910.231327] ata9: EH complete
Aug 30 20:23:07 isis kernel: [11910.231381] sd 8:0:0:0: [sdd] 976773168 512-byte hardware sectors (500108 MB)
Aug 30 20:23:07 isis kernel: [11910.231394] sd 8:0:0:0: [sdd] Write Protect is off
Aug 30 20:23:07 isis kernel: [11910.231417] sd 8:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Aug 30 20:24:33 isis kernel: [11996.729865] ata7: hard resetting link
Aug 30 20:24:34 isis kernel: [11997.231291] ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 30 20:24:34 isis kernel: [11997.311308] ata7.00: max_sectors limited to 256 for NCQ
Aug 30 20:24:34 isis kernel: [11997.391306] ata7.00: max_sectors limited to 256 for NCQ
Aug 30 20:24:34 isis kernel: [11997.391316] ata7.00: configured for UDMA/133
Aug 30 20:24:34 isis kernel: [11997.391322] ata7: EH complete
Aug 30 20:24:34 isis kernel: [11997.391366] sd 6:0:0:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
Aug 30 20:24:34 isis kernel: [11997.391378] sd 6:0:0:0: [sdb] Write Protect is off
Aug 30 20:24:34 isis kernel: [11997.391400] sd 6:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
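To see which ports are resetting and how often, the log above can be summarised with a small script. This is only a sketch: the regular expression assumes the "[seconds] ataN: hard resetting link" format shown in this report, and `SAMPLE` holds just the reset lines copied from the /var/log/messages excerpt.

```python
import re
from collections import defaultdict

# Matches e.g. "[11285.918635] ata9: hard resetting link"
RESET_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+(ata\d+): hard resetting link")


def resets_by_port(log_text):
    """Return {port: [kernel timestamps in seconds]} for every hard link reset."""
    resets = defaultdict(list)
    for line in log_text.splitlines():
        match = RESET_RE.search(line)
        if match:
            resets[match.group(2)].append(float(match.group(1)))
    return dict(resets)


# Reset lines copied from the /var/log/messages excerpt in this report.
SAMPLE = """\
Aug 30 20:12:43 isis kernel: [11285.918635] ata9: hard resetting link
Aug 30 20:13:24 isis kernel: [11326.988623] ata8: hard resetting link
Aug 30 20:14:15 isis kernel: [11377.938624] ata7: hard resetting link
Aug 30 20:19:48 isis kernel: [11711.718616] ata9: hard resetting link
Aug 30 20:22:06 isis kernel: [11849.328617] ata7: hard resetting link
Aug 30 20:23:06 isis kernel: [11909.629865] ata9: hard resetting link
Aug 30 20:24:33 isis kernel: [11996.729865] ata7: hard resetting link
"""

if __name__ == "__main__":
    for port, stamps in sorted(resets_by_port(SAMPLE).items()):
        print(port, len(stamps), stamps)
```

On the excerpt above this gives three resets each for ata7 and ata9 and one for ata8 over roughly twelve minutes, so the problem is spread across several ports rather than tied to a single disk.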
I've replaced the card and cables and I'm still getting the issue.
This card and RAID were working on CentOS last week (2.6.18, 32-bit).
Since then I've replaced the OS (Ubuntu 64-bit), CPU (Core 2 Duo), and motherboard (ASUS P5K Pro).
other info:
1) ubuntu release:
Description: Ubuntu intrepid (development branch)
Release: 8.10
2) package versions:
linux-server:
Installed: 2.6.27.2.2
mdadm:
Installed: 2.6.7-3ubuntu4
Candidate: 2.6.7-3ubuntu4
I'm really at a loss here and not sure what else to do. I stressed the other components of the system in Windows and they seemed fine. I'm not sure if it's the card or something in the newer kernels.
Also, these issues are not causing my RAID to fail.
q@test:/storage$ sudo mdadm -D /dev/md1
/dev/md1:
Version : 01.02
Creation Time : Sat Jan 19 13:29:40 2008
Raid Level : raid5
Array Size : 2930302464 (2794.55 GiB 3000.63 GB)
Used Dev Size : 976767488 (931.52 GiB 1000.21 GB)
Raid Devices : 7
Total Devices : 7
Preferred Minor : 1
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Sat Aug 30 20:49:05 2008
State : active
Active Devices : 7
Working Devices : 7
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 128K
Name : big500raid
UUID : 51ba59f2:
Events : 46
Number Major Minor RaidDevice State
0 8 17 0 active sync /dev/sdb1
1 8 33 1 active sync /dev/sdc1
2 8 97 2 active sync /dev/sdg1
3 8 49 3 active sync /dev/sdd1
4 8 81 4 active sync /dev/sdf1
5 8 65 5 active sync /dev/sde1
7 8 113 6 active sync /dev/sdh1
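The "not failing" claim can be checked mechanically from the `mdadm -D` output above. The sketch below assumes the "Key : value" layout shown in this report and that Array Size is given in KiB (2930302464 KiB works out to the 2794.55 GiB printed next to it). The helper names are hypothetical.

```python
def parse_mdadm_detail(text):
    """Collect the 'Key : value' lines of `mdadm -D` output into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def summarize(fields):
    raid_devices = int(fields["Raid Devices"])
    array_kib = int(fields["Array Size"].split()[0])
    return {
        # Degraded if any member is missing or marked failed.
        "degraded": (int(fields["Active Devices"]) < raid_devices
                     or int(fields["Failed Devices"]) > 0),
        # RAID 5 keeps one member's worth of parity, so n - 1 members hold data.
        "per_member_kib": array_kib // (raid_devices - 1),
    }


# Relevant lines copied from the output in this report.
SAMPLE = """\
/dev/md1:
Raid Level : raid5
Array Size : 2930302464 (2794.55 GiB 3000.63 GB)
Raid Devices : 7
Active Devices : 7
Working Devices : 7
Failed Devices : 0
"""

if __name__ == "__main__":
    print(summarize(parse_mdadm_detail(SAMPLE)))
```

This reports the array as not degraded, with about 488383744 KiB (roughly 500 GB) of data per member, which fits the 500108 MB drives in the dmesg output. The Used Dev Size line shows exactly twice that figure; my guess is that this mdadm version prints that field in 512-byte sectors, but I have not confirmed it.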
description: updated
Changed in linux:
status: Unknown → In Progress
Changed in linux (Ubuntu Hardy):
assignee: nobody → cooloney
importance: Undecided → Medium
status: New → Triaged
Changed in linux (Ubuntu Intrepid):
assignee: nobody → cooloney
importance: Undecided → Medium
status: New → Triaged
Changed in linux (Fedora):
status: In Progress → Fix Released
Changed in linux (Fedora):
importance: Unknown → Critical
I am having the exact same problem in 8.10 (fresh Kubuntu install) with my six-disk RAID 5 array. I'm using software RAID, so I'm not sure whether it's exactly the same issue. What motherboard are you using? I'm using an ASUS P5Q, and all drives are plugged into the onboard SATA controller (I believe it's an Intel SATA controller), not the onboard xpress backup ports.
Here's the relevant bit of my /var/log/messages:
Nov 5 12:49:12 serverv2 -- MARK --
Nov 5 13:09:12 serverv2 -- MARK --
Nov 5 13:29:12 serverv2 -- MARK --
Nov 5 13:49:12 serverv2 -- MARK --
Nov 5 14:09:12 serverv2 -- MARK --
Nov 5 14:29:12 serverv2 -- MARK --
Nov 5 14:49:12 serverv2 -- MARK --
Nov 5 15:09:12 serverv2 -- MARK --
Nov 5 15:29:12 serverv2 -- MARK --
Nov 5 15:49:12 serverv2 -- MARK --
Nov 5 15:50:55 serverv2 python: hp-systray(init)[6671]: warning: No hp: or hpfax: devices found in any installed CUPS queue. Exiting.
Nov 5 15:52:07 serverv2 kernel: [12192.326735] type=1503 audit(1225925527.559:4): operation="inode_permission" requested_mask="r::" denied_mask="r::" fsuid=7 name="/proc/6778/net/" pid=6778 profile="/usr/sbin/cupsd"
Nov 5 15:52:08 serverv2 kernel: [12193.222565] type=1503 audit(1225925528.454:5): operation="inode_permission" requested_mask="r::" denied_mask="r::" fsuid=7 name="/proc/6782/net/" pid=6782 profile="/usr/sbin/cupsd"
Nov 5 15:52:08 serverv2 kernel: [12193.222606] type=1503 audit(1225925528.454:6): operation="socket_create" family="ax25" sock_type="dgram" protocol=0 pid=6782 profile="/usr/sbin/cupsd"
Nov 5 15:52:08 serverv2 kernel: [12193.222614] type=1503 audit(1225925528.454:7): operation="socket_create" family="netrom" sock_type="seqpacket" protocol=0 pid=6782 profile="/usr/sbin/cupsd"
Nov 5 15:52:08 serverv2 kernel: [12193.222621] type=1503 audit(1225925528.454:8): operation="socket_create" family="rose" sock_type="dgram" protocol=0 pid=6782 profile="/usr/sbin/cupsd"
Nov 5 15:52:08 serverv2 kernel: [12193.222628] type=1503 audit(1225925528.454:9): operation="socket_create" family="ipx" sock_type="dgram" protocol=0 pid=6782 profile="/usr/sbin/cupsd"
Nov 5 15:52:08 serverv2 kernel: [12193.222634] type=1503 audit(1225925528.454:10): operation="socket_create" family="appletalk" sock_type="dgram" protocol=0 pid=6782 profile="/usr/sbin/cupsd"
Nov 5 15:52:08 serverv2 kernel: [12193.222641] type=1503 audit(1225925528.454:11): operation="socket_create" family="econet" sock_type="dgram" protocol=0 pid=6782 profile="/usr/sbin/cupsd"
Nov 5 15:52:08 serverv2 kernel: [12193.222648] type=1503 audit(1225925528.454:12): operation="socket_create" family="ash" sock_type="dgram" protocol=0 pid=6782 profile="/usr/sbin/cupsd"
Nov 5 15:52:08 serverv2 kernel: [12193.222654] type=1503 audit(1225925528.454:13): operation="socket_create" family="x25" sock_type="seqpacket" protocol=0 pid=6782 profile="/usr/sbin/cupsd"
Nov 5 16:05:21 serverv2 kernel: [12986.184075] ata3: hard resetting link
Nov 5 16:05:21 serverv2 kernel: [12986.184077] ata4: hard resetting link
Nov 5 16:05:21 serverv2 kernel: [12986.668023] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Nov 5 16:05:21 serverv2 kernel: [12986.668709] ata3: SATA link up 3.0 Gbps (SStatus 123...