Port slow to respond on SiI3512 with sata_sil
Binary package hint: linux-source-2.6.22
Fresh install of the latest Ubuntu 7.10 Server i386 (Gutsy), with all fixes applied.
Linux house 2.6.22-14-server #1 SMP Sun Oct 14 23:34:23 GMT 2007 i686 GNU/Linux
System is a Jetway J7F4 with onboard VIA RAID and a Silicon Image-based PCI SATA RAID card. It has 3x500GB SATA drives installed: two on the motherboard and one on the PCI card. The system was installed from a USB CD drive, as it does not have a permanently attached CD-ROM drive. Each drive was partitioned as 1GB, 512MB and "the rest" (about 499GB). Using software RAID, these were then formed into a 3-partition RAID1 set for /boot, a 3-partition RAID1 set for swap, and a 3-partition RAID5 set for / respectively. The install appeared to go normally, and on first reboot the RAID arrays were rebuilt. Some errors from the ata driver (?) were reported on the console, but apart from significant slowdowns in the rebuild rate (drops from nearly 50MB/s to less than 8MB/s) there appeared to be no problems. The system was then lightly used for a couple of days (some minor initial configuration work), and again I noticed a very occasional error message on the console.
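For reference, the usable space this layout gives can be worked out with a rough sketch like the one below. It assumes the approximate partition sizes quoted above and ignores md metadata overhead; the function name `raid_usable_gb` is only illustrative.

```python
def raid_usable_gb(level, member_gb, members):
    """Approximate usable size of an md array built from equal-sized members."""
    if members < 2:
        raise ValueError("need at least two members")
    if level == 1:
        # RAID1 mirrors the data, so capacity equals one member.
        return member_gb
    if level == 5:
        if members < 3:
            raise ValueError("RAID5 needs at least three members")
        # RAID5 spends one member's worth of space on parity.
        return member_gb * (members - 1)
    raise ValueError("only RAID1 and RAID5 are handled in this sketch")


# Sizes below are the approximate figures from the report, in GB.
layout = {
    "/boot": raid_usable_gb(1, 1, 3),
    "swap": raid_usable_gb(1, 0.5, 3),
    "/": raid_usable_gb(5, 499, 3),
}
```

With three members, both RAID1 sets survive the loss of two drives, while the RAID5 set for / survives the loss of only one.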
Stupidly, I didn't take note of the exact errors, but on examining my kern.log I can see that they were probably like the following (extracted from that file). These errors always relate to ata1, which is the SATA drive plugged into the PCI RAID card (sda):
Oct 31 12:08:51 house kernel: [ 318.940000] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Oct 31 12:08:51 house kernel: [ 318.940000] ata1.00: cmd c8/00:00:
Oct 31 12:08:51 house kernel: [ 318.940000] res 40/00:00:
Oct 31 12:08:51 house kernel: [ 319.270000] ata1: soft resetting port
Oct 31 12:08:51 house kernel: [ 319.430000] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Oct 31 12:08:51 house kernel: [ 319.490000] ata1.00: configured for UDMA/100
Oct 31 12:08:51 house kernel: [ 319.490000] ata1: EH complete
Oct 31 12:08:51 house kernel: [ 319.510000] sd 0:0:0:0: [sda] 976773168 512-byte hardware sectors (500108 MB)
Oct 31 12:08:51 house kernel: [ 319.520000] sd 0:0:0:0: [sda] Write Protect is off
Oct 31 12:08:51 house kernel: [ 319.520000] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
Oct 31 12:08:51 house kernel: [ 319.550000] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
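To check whether these errors really are confined to ata1, the log can be tallied per port. The following is a minimal sketch, assuming the message formats seen in the excerpt above; the "hard resetting" and "link" variants in the pattern are assumptions about how other kernels word the same events, and `count_ata_events` is just an illustrative name.

```python
import re
from collections import Counter

# Matches libata lines such as "ata1.00: exception Emask ..." or
# "ata1: soft resetting port". The hard-reset and "link" wordings are
# assumptions and do not appear in the excerpt above.
ATA_EVENT = re.compile(
    r"\b(ata\d+)(?:\.\d+)?: (exception|(?:soft|hard) resetting (?:port|link))"
)


def count_ata_events(lines):
    """Return a Counter keyed by (port, event) for the matching log lines."""
    counts = Counter()
    for line in lines:
        match = ATA_EVENT.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


# Example usage (the path is the usual Ubuntu location of kern.log):
#     with open("/var/log/kern.log") as log:
#         for (port, event), n in sorted(count_ata_events(log).items()):
#             print(port, event, n)
```

If every key in the result names ata1, that supports the observation that only the drive on the PCI card is affected.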
Then, last night I tried to copy some 12-15GB of data to the system, and I noticed *many* errors being echoed to the console, again all related to ata1. At some point in the process the ata driver was apparently unable to reset the port, even with a hard reset, and the drive was "disabled", which caused the software RAID system to remove that drive's partitions from the RAID sets. Fortunately the system continued to run on the other drives, but I couldn't bring the ata1 drive up again and had to reboot the box to regain access to it. I left the system rebuilding the RAID sets in single-user mode this morning. No errors were apparent in the log or on the console at that time, but I will add anything I find when I get home this evening.
This problem looks similar to several other bugs in the system, though there are differences between this and them, as follows:
I will attach kern.log (from the time of install to the failure of the RAID system last night), and the output from lspci -vv and hdparm -I next.
Changed in linux (Ubuntu): status: Incomplete → Fix Released