I believe I have a related issue. I installed Ubuntu Server from the Jaunty alternate CD, creating two partitions on each of two disks (one swap, one for everything else), and then built a RAID1 array from them. Right from the start I saw similar errors on /dev/sda: if the RAID is working, that disk is eventually faulted out of both /dev/md devices. If I pull the other disk (/dev/sdb), the RAID tries to continue, but the same error occurs and the file system mounts read-only.

The most recent error I saw is in the attachment; it starts with:

Sep  9 20:24:33 conway2 kernel: [23146.732090] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Sep  9 20:24:34 conway2 kernel: [23146.732185] ata1.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Sep  9 20:24:34 conway2 kernel: [23146.732188]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep  9 20:24:34 conway2 kernel: [23146.732319] ata1.00: status: { DRDY }
Sep  9 20:24:34 conway2 kernel: [23146.732371] ata1: hard resetting link
Sep  9 20:24:34 conway2 kernel: [23152.652048] ata1: link is slow to respond, please be patient (ready=-19)
Sep  9 20:24:34 conway2 kernel: [23156.740061] ata1: SRST failed (errno=-16)
Sep  9 20:24:34 conway2 kernel: [23156.740130] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Sep  9 20:24:34 conway2 kernel: [23161.740077] ata1.00: qc timeout (cmd 0xec)
Sep  9 20:24:34 conway2 kernel: [23161.740087] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x5)
Sep  9 20:24:34 conway2 kernel: [23161.740092] ata1.00: revalidation failed (errno=-5)

smartctl (from smartmontools) gives me:

jhowison@conway2:~$ sudo smartctl -a /dev/sda
[sudo] password for jhowison: 
smartctl version 5.38 [x86_64-unknown-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Short INQUIRY response, skip product id
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.

If I reboot, the disk comes up OK and the RAID begins to resync. At some point, usually a few hours later (there is almost no activity on this freshly installed server), it dies in the same way.

dmesg output for ata1 shows:

[    1.831368] sata_nv 0000:00:05.0: version 3.5
[    1.831696] ACPI: PCI Interrupt Link [LSA0] enabled at IRQ 23
[    1.831706] sata_nv 0000:00:05.0: PCI INT A -> Link[LSA0] -> GSI 23 (level, low) -> IRQ 23
[    1.831709] sata_nv 0000:00:05.0: Using SWNCQ mode
[    1.831775] sata_nv 0000:00:05.0: setting latency timer to 64
[    1.831931] scsi0 : sata_nv
[    1.832009] scsi1 : sata_nv
[    1.832117] ata1: SATA max UDMA/133 cmd 0xd480 ctl 0xd400 bmdma 0xcc00 irq 23
[    1.832120] ata2: SATA max UDMA/133 cmd 0xd080 ctl 0xd000 bmdma 0xcc08 irq 23
[    2.712051] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[    2.746186] ata1.00: ATA-8: ST31000528AS, CC34, max UDMA/133
[    2.746189] ata1.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 31/32)
[    2.802201] ata1.00: configured for UDMA/133

...

[    3.770300] scsi 0:0:0:0: Direct-Access     ATA      ST31000528AS     CC34 PQ: 0 ANSI: 5
[    3.770375] sd 0:0:0:0: [sda] 1953525168 512-byte hardware sectors: (1.00 TB/931 GiB)
[    3.770388] sd 0:0:0:0: [sda] Write Protect is off
[    3.770390] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[    3.770405] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    3.770454] sd 0:0:0:0: [sda] 1953525168 512-byte hardware sectors: (1.00 TB/931 GiB)
[    3.770462] sd 0:0:0:0: [sda] Write Protect is off
[    3.770464] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[    3.770478] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    3.770481]  sda: sda1 sda2
[    3.785595] sd 0:0:0:0: [sda] Attached SCSI disk

Later I see the error (I had thought the disk didn't come up at boot, but it did mount after this reboot):

[23146.732090] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[23146.732185] ata1.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
[23146.732188]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[23146.732319] ata1.00: status: { DRDY }
[23146.732371] ata1: hard resetting link
[23152.652048] ata1: link is slow to respond, please be patient (ready=-19)
[23156.740061] ata1: SRST failed (errno=-16)
[23156.740130] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[23161.740077] ata1.00: qc timeout (cmd 0xec)
[23161.740087] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x5)
[23161.740092] ata1.00: revalidation failed (errno=-5)
[23161.740150] ata1: hard resetting link
[23167.660057] ata1: link is slow to respond, please be patient (ready=-19)
[23171.748046] ata1: SRST failed (errno=-16)
[23171.748113] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[23181.748055] ata1.00: qc timeout (cmd 0xec)
[23181.748065] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x5)
[23181.748070] ata1.00: revalidation failed (errno=-5)
[23181.748122] ata1: limiting SATA link speed to 1.5 Gbps
[23181.748132] ata1: hard resetting link
[23187.668049] ata1: link is slow to respond, please be patient (ready=-19)
[23191.756047] ata1: SRST failed (errno=-16)
[23191.756114] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[23221.756057] ata1.00: qc timeout (cmd 0xec)
[23221.756067] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x5)
[23221.756071] ata1.00: revalidation failed (errno=-5)
[23221.756122] ata1.00: disabled
[23221.756151] ata1: hard resetting link
[23227.676046] ata1: link is slow to respond, please be patient (ready=-19)
[23231.764046] ata1: SRST failed (errno=-16)
[23231.764113] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[23231.764135] end_request: I/O error, dev sda, sector 1949616063
[23231.764190] md: super_written gets error=-5, uptodate=0
[23231.764196] raid1: Disk failure on sda2, disabling device.
[23231.764199] raid1: Operation continuing on 1 devices.
[23231.802974] sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
[23231.802984] end_request: I/O error, dev sda, sector 31145279
[23231.803129] sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
[23231.803137] end_request: I/O error, dev sda, sector 31146303
[23231.803276] sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
[23231.803283] end_request: I/O error, dev sda, sector 31147327
[23231.803404] sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
[23231.803410] end_request: I/O error, dev sda, sector 31148351
[23231.803472] sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
[23231.803479] end_request: I/O error, dev sda, sector 31148479
[23231.929793] RAID1 conf printout:
[23231.929799]  --- wd:1 rd:2
[23231.929804]  disk 1, wo:0, o:1, dev:sdb2
[59381.242235] UDP: short packet: From 0.0.0.0:65535 47596/280 to 255.255.255.255:25
[84257.788600] sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
[84257.788613] end_request: I/O error, dev sda, sector 1953519886
[84257.788672] Buffer I/O error on device sda1, logical block 3903616
[84257.788727] Buffer I/O error on device sda1, logical block 3903617
[84257.788780] Buffer I/O error on device sda1, logical block 3903618
[84257.788833] Buffer I/O error on device sda1, logical block 3903619
[84257.788885] Buffer I/O error on device sda1, logical block 3903620
[84257.788938] Buffer I/O error on device sda1, logical block 3903621
[84257.788994] Buffer I/O error on device sda1, logical block 3903622
[84257.789047] Buffer I/O error on device sda1, logical block 3903623

Anyway, I've attached that log too.
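To line up when and where the failures land, a small Python script like the one below could pull the `end_request` lines out of a saved kernel log and convert the dmesg timestamps (seconds since boot) into hours. This is only a sketch: it assumes the log is plain text in the same format as the excerpts above, and the function name `failed_sectors` and the `SAMPLE` text are illustrative, not part of any existing tool.

```python
import re

# Matches block-layer failure lines like the ones in the log above, e.g.
# "[23231.764135] end_request: I/O error, dev sda, sector 1949616063".
# The leading "[seconds.micro]" stamp is the kernel's time since boot.
END_REQUEST = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+end_request: I/O error, "
    r"dev (?P<dev>\w+), sector (?P<sector>\d+)"
)


def failed_sectors(log_text):
    """Return (uptime_hours, device, sector) for every end_request I/O error."""
    results = []
    for line in log_text.splitlines():
        m = END_REQUEST.search(line)
        if m:
            hours = float(m.group("ts")) / 3600.0  # seconds since boot -> hours
            results.append((hours, m.group("dev"), int(m.group("sector"))))
    return results


# A few lines copied from the dmesg excerpt above (the md line is ignored).
SAMPLE = """\
[23231.764135] end_request: I/O error, dev sda, sector 1949616063
[23231.764190] md: super_written gets error=-5, uptodate=0
[23231.802984] end_request: I/O error, dev sda, sector 31145279
[84257.788613] end_request: I/O error, dev sda, sector 1953519886
"""

if __name__ == "__main__":
    for hours, dev, sector in failed_sectors(SAMPLE):
        print(f"{hours:6.2f} h since boot  {dev}  sector {sector}")
```

On the excerpt above, the first failing request comes at about 6.45 hours after boot, which fits the "a few hours later" pattern.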

As this is a server I don't particularly want to run the latest and greatest Ubuntu, but I have about a day to help debug this. I'm going to install the latest Karmic build and report back. Does the apport-collect command need to be running while the error occurs, or can it be run afterwards?

--J


I'm not entirely