I believe I have a related issue. I installed Ubuntu Server from the Jaunty alternate CD, creating two partitions on each of two disks (one swap, one for everything else), and then built a RAID1 arrangement. Right from the start I saw similar errors on /dev/sda: while the RAID is running, that disk is eventually faulted out of both /dev/md devices. If I pull the other disk (/dev/sdb), the RAID tries to continue, but the same error occurs and the file system goes read-only. The most recent error is in the attachment; it starts with:

Sep 9 20:24:33 conway2 kernel: [23146.732090] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Sep 9 20:24:34 conway2 kernel: [23146.732185] ata1.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Sep 9 20:24:34 conway2 kernel: [23146.732188]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 9 20:24:34 conway2 kernel: [23146.732319] ata1.00: status: { DRDY }
Sep 9 20:24:34 conway2 kernel: [23146.732371] ata1: hard resetting link
Sep 9 20:24:34 conway2 kernel: [23152.652048] ata1: link is slow to respond, please be patient (ready=-19)
Sep 9 20:24:34 conway2 kernel: [23156.740061] ata1: SRST failed (errno=-16)
Sep 9 20:24:34 conway2 kernel: [23156.740130] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Sep 9 20:24:34 conway2 kernel: [23161.740077] ata1.00: qc timeout (cmd 0xec)
Sep 9 20:24:34 conway2 kernel: [23161.740087] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x5)
Sep 9 20:24:34 conway2 kernel: [23161.740092] ata1.00: revalidation failed (errno=-5)

smartctl gives me:

jhowison@conway2:~$ sudo smartctl -a /dev/sda
[sudo] password for jhowison:
smartctl version 5.38 [x86_64-unknown-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Short INQUIRY response, skip product id
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.

If I reboot, the disk comes up fine and the RAID begins to resync.
At some point, usually a few hours later (there is almost no activity on this freshly installed server), it dies in the same way. The dmesg output for ata1 at boot shows:

[ 1.831368] sata_nv 0000:00:05.0: version 3.5
[ 1.831696] ACPI: PCI Interrupt Link [LSA0] enabled at IRQ 23
[ 1.831706] sata_nv 0000:00:05.0: PCI INT A -> Link[LSA0] -> GSI 23 (level, low) -> IRQ 23
[ 1.831709] sata_nv 0000:00:05.0: Using SWNCQ mode
[ 1.831775] sata_nv 0000:00:05.0: setting latency timer to 64
[ 1.831931] scsi0 : sata_nv
[ 1.832009] scsi1 : sata_nv
[ 1.832117] ata1: SATA max UDMA/133 cmd 0xd480 ctl 0xd400 bmdma 0xcc00 irq 23
[ 1.832120] ata2: SATA max UDMA/133 cmd 0xd080 ctl 0xd000 bmdma 0xcc08 irq 23
[ 2.712051] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 2.746186] ata1.00: ATA-8: ST31000528AS, CC34, max UDMA/133
[ 2.746189] ata1.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 31/32)
[ 2.802201] ata1.00: configured for UDMA/133
...
[ 3.770300] scsi 0:0:0:0: Direct-Access ATA ST31000528AS CC34 PQ : 0 ANSI: 5
[ 3.770375] sd 0:0:0:0: [sda] 1953525168 512-byte hardware sectors: (1.00 TB/931 GiB)
[ 3.770388] sd 0:0:0:0: [sda] Write Protect is off
[ 3.770390] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 3.770405] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 3.770454] sd 0:0:0:0: [sda] 1953525168 512-byte hardware sectors: (1.00 TB/931 GiB)
[ 3.770462] sd 0:0:0:0: [sda] Write Protect is off
[ 3.770464] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 3.770478] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 3.770481] sda: sda1 sda2
[ 3.785595] sd 0:0:0:0: [sda] Attached SCSI disk

Later I see the error (I had thought it didn't occur at boot, but the disk did mount after this reboot):

[23146.732090] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[23146.732185] ata1.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
[23146.732188]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[23146.732319] ata1.00: status: { DRDY }
[23146.732371] ata1: hard resetting link
[23152.652048] ata1: link is slow to respond, please be patient (ready=-19)
[23156.740061] ata1: SRST failed (errno=-16)
[23156.740130] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[23161.740077] ata1.00: qc timeout (cmd 0xec)
[23161.740087] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x5)
[23161.740092] ata1.00: revalidation failed (errno=-5)
[23161.740150] ata1: hard resetting link
[23167.660057] ata1: link is slow to respond, please be patient (ready=-19)
[23171.748046] ata1: SRST failed (errno=-16)
[23171.748113] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[23181.748055] ata1.00: qc timeout (cmd 0xec)
[23181.748065] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x5)
[23181.748070] ata1.00: revalidation failed (errno=-5)
[23181.748122] ata1: limiting SATA link speed to 1.5 Gbps
[23181.748132] ata1: hard resetting link
[23187.668049] ata1: link is slow to respond, please be patient (ready=-19)
[23191.756047] ata1: SRST failed (errno=-16)
[23191.756114] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[23221.756057] ata1.00: qc timeout (cmd 0xec)
[23221.756067] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x5)
[23221.756071] ata1.00: revalidation failed (errno=-5)
[23221.756122] ata1.00: disabled
[23221.756151] ata1: hard resetting link
[23227.676046] ata1: link is slow to respond, please be patient (ready=-19)
[23231.764046] ata1: SRST failed (errno=-16)
[23231.764113] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[23231.764135] end_request: I/O error, dev sda, sector 1949616063
[23231.764190] md: super_written gets error=-5, uptodate=0
[23231.764196] raid1: Disk failure on sda2, disabling device.
[23231.764199] raid1: Operation continuing on 1 devices.
[23231.802974] sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
[23231.802984] end_request: I/O error, dev sda, sector 31145279
[23231.803129] sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
[23231.803137] end_request: I/O error, dev sda, sector 31146303
[23231.803276] sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
[23231.803283] end_request: I/O error, dev sda, sector 31147327
[23231.803404] sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
[23231.803410] end_request: I/O error, dev sda, sector 31148351
[23231.803472] sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
[23231.803479] end_request: I/O error, dev sda, sector 31148479
[23231.929793] RAID1 conf printout:
[23231.929799]  --- wd:1 rd:2
[23231.929804]  disk 1, wo:0, o:1, dev:sdb2
[59381.242235] UDP: short packet: From 0.0.0.0:65535 47596/280 to 255.255.255.255:25
[84257.788600] sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
[84257.788613] end_request: I/O error, dev sda, sector 1953519886
[84257.788672] Buffer I/O error on device sda1, logical block 3903616
[84257.788727] Buffer I/O error on device sda1, logical block 3903617
[84257.788780] Buffer I/O error on device sda1, logical block 3903618
[84257.788833] Buffer I/O error on device sda1, logical block 3903619
[84257.788885] Buffer I/O error on device sda1, logical block 3903620
[84257.788938] Buffer I/O error on device sda1, logical block 3903621
[84257.788994] Buffer I/O error on device sda1, logical block 3903622
[84257.789047] Buffer I/O error on device sda1, logical block 3903623

Anyway, I've attached that log too. As this is a server, I don't particularly want to run the latest and greatest Ubuntu, but I have about a day to help debug this. I'm going to install the latest Karmic build and report back.
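To put a number on "a few hours later", a short script can pull three milestones out of a saved dmesg or syslog: the first libata exception on the port, the point where the kernel disables the drive, and the md failure. This is a minimal sketch, not part of apport or any Ubuntu tool; the function name failure_timeline and the marker strings are hypothetical, chosen only to match the messages in the log above.

```python
import re

# Matches the "[seconds.micros] message" part of a dmesg or syslog kernel line.
LINE_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+(.*)")

# Hypothetical markers, chosen to match the messages in the log above.
MARKERS = {
    "first_exception": "exception Emask",
    "device_disabled": ": disabled",
    "raid_failure": "raid1: Disk failure",
}


def failure_timeline(lines, port="ata1"):
    """Hours since boot at which each marker first appears (hypothetical helper)."""
    timeline = {}
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue
        seconds, message = float(m.group(1)), m.group(2)
        for key, marker in MARKERS.items():
            if key in timeline or marker not in message:
                continue
            # libata messages must come from the chosen port; md messages name no port.
            if key != "raid_failure" and not message.startswith((port + ".", port + ":")):
                continue
            timeline[key] = round(seconds / 3600, 2)
    return timeline


sample = [
    "Sep 9 20:24:33 conway2 kernel: [23146.732090] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen",
    "[23181.748122] ata1: limiting SATA link speed to 1.5 Gbps",
    "[23221.756122] ata1.00: disabled",
    "[23231.764196] raid1: Disk failure on sda2, disabling device.",
]
print(failure_timeline(sample))
# {'first_exception': 6.43, 'device_disabled': 6.45, 'raid_failure': 6.45}
```

On the lines quoted here, this puts the first failure roughly 6.4 hours after boot, with the drive disabled and dropped from the array about 85 seconds later.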
Does the apport-collect command need to be running while the error occurs? Or can it run after the error occurs? --J I'm not entirely