Nvidia MCP67 AHCI ata timeout exception with data loss
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | Invalid | Medium | TJ | 
Bug Description
Binary package hint: linux-image-
I have now hit about 10 of these ATA timeout exceptions on Jaunty.
It never happened on Intrepid. I also use the disk heavily from Windows and have never seen any problems there.
It has only happened on my ext4 root filesystem, usually during a big dist-upgrade. My guess is that ext4 has nothing to do with the problem: a dist-upgrade is simply very disk-intensive, and my Jaunty root filesystem happens to be ext4. But of course I can't be sure.
My BIOS lets me choose whether SATA runs in AHCI or IDE mode, and the problem has happened in both modes.
I've also tried attaching the hard drive to a different SATA port on my motherboard; that didn't help.
It's very hard to reproduce. I ran this loop
while true; do
git clone /home/ernst/
sync
rm -r linux-2.6
sync
done
for an hour, and the error never occurred.
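Because the error is rare, a variant of the loop above that stops by itself as soon as the kernel logs a new ATA exception could save watching it by hand. The following is a minimal sketch, not something from the report: the function names, the source-repository and work-directory arguments, and the round limit are hypothetical placeholders, and it assumes `dmesg` is readable by the current user (some systems restrict it to root).

```shell
#!/usr/bin/env bash
# Hypothetical helpers for an unattended stress run; names and arguments are
# placeholders, not taken from the bug report.

# count_ata_exceptions: read kernel log text on stdin and print how many
# libata "exception Emask" lines it contains (prints 0 when there are none).
count_ata_exceptions() {
  grep -c 'exception Emask' || true
}

# stress_until_ata_error SRC_REPO WORK_DIR [MAX_ROUNDS]
# Repeats the clone/sync/rm cycle and stops as soon as the kernel log
# contains more ATA exceptions than it did before the first round.
stress_until_ata_error() {
  local src_repo=$1 work_dir=$2 max_rounds=${3:-1000}
  local before after round
  before=$(dmesg | count_ata_exceptions)
  for ((round = 1; round <= max_rounds; round++)); do
    git clone --quiet "$src_repo" "$work_dir" || return 2
    sync
    rm -rf -- "$work_dir"
    sync
    after=$(dmesg | count_ata_exceptions)
    if ((after > before)); then
      echo "new ATA exception after $round round(s)"
      return 0
    fi
  done
  echo "no new ATA exception in $max_rounds round(s)"
  return 1
}
```

Usage would look like `stress_until_ata_error /path/to/linux-2.6 /tmp/stress-clone 500`, where both paths are placeholders. Comparing counts rather than grepping for any match means exceptions logged before the run started do not end it immediately.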
lspci -vv output (SATA controller only):
00:09.0 IDE interface: nVidia Corporation MCP67 AHCI Controller (rev a2) (prog-if 85 [Master SecO PriO])
Subsystem: ABIT Computer Corp. Device 1c2f
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0 (750ns min, 250ns max)
Interrupt: pin A routed to IRQ 2296
Region 0: I/O ports at 09f0 [size=8]
Region 1: I/O ports at 0bf0 [size=4]
Region 2: I/O ports at 0970 [size=8]
Region 3: I/O ports at 0b70 [size=4]
Region 4: I/O ports at dc00 [size=16]
Region 5: Memory at fe026000 (32-bit, non-prefetchable) [size=8K]
Capabilities: [44] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [8c] SATA HBA <?>
Capabilities: [b0] Message Signalled Interrupts: Mask- 64bit+ Queue=0/3 Enable+
Address: 00000000fee0300c Data: 4189
Capabilities: [cc] HyperTransport: MSI Mapping Enable+ Fixed+
Kernel driver in use: ahci
[ 1375.804551] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 1375.804566] ata1.00: cmd ea/00:00:
[ 1375.804568] res 40/00:00:
[ 1375.804574] ata1.00: status: { DRDY }
[ 1375.804584] ata1: hard resetting link
[ 1376.288035] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 1376.297962] ata1.00: configured for UDMA/133
[ 1376.297984] end_request: I/O error, dev sda, sector 476567766
[ 1376.298013] ata1: EH complete
[ 1376.298021] Aborting journal on device sda4:8.
[ 1376.298144] sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors: (250 GB/232 GiB)
[ 1376.298181] sd 0:0:0:0: [sda] Write Protect is off
[ 1376.298186] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 1376.298236] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 1376.301499] ext4_abort called.
[ 1376.301504] EXT4-fs error (device sda4): ext4_journal_
[ 1376.301511] Remounting filesystem read-only
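To compare several incidents (for example, to confirm that the failing sector differs each time), the relevant fields can be pulled out of the log mechanically. In the excerpt above, the first byte after `cmd` should be the ATA command opcode: `ea` is FLUSH CACHE EXT. Likewise, `res 40` corresponds to status 0x40 (DRDY), which agrees with the `status: { DRDY }` line. The sketch below assumes that reading of the libata message format; the function names are hypothetical, and the opcode table covers only a handful of common commands.

```shell
#!/usr/bin/env bash
# decode_ata_cmd HEX: name a few common ATA opcodes (needs bash 4 for ${1,,}).
decode_ata_cmd() {
  case "${1,,}" in
    25) echo "READ DMA EXT" ;;
    35) echo "WRITE DMA EXT" ;;
    60) echo "READ FPDMA QUEUED" ;;
    61) echo "WRITE FPDMA QUEUED" ;;
    c8) echo "READ DMA" ;;
    ca) echo "WRITE DMA" ;;
    e7) echo "FLUSH CACHE" ;;
    ea) echo "FLUSH CACHE EXT" ;;
    *) echo "unknown" ;;
  esac
}

# summarize_ata_errors: read dmesg text on stdin and print the opcode of each
# failed libata command and the sector of each end_request I/O error.
summarize_ata_errors() {
  local line op
  local re_cmd='ata[0-9.]+: cmd ([0-9a-fA-F]{2})/'
  local re_io='end_request: I/O error, dev ([a-z]+), sector ([0-9]+)'
  while IFS= read -r line; do
    if [[ $line =~ $re_cmd ]]; then
      op=${BASH_REMATCH[1],,}
      echo "cmd 0x$op ($(decode_ata_cmd "$op"))"
    elif [[ $line =~ $re_io ]]; then
      echo "I/O error on ${BASH_REMATCH[1]} at sector ${BASH_REMATCH[2]}"
    fi
  done
}
```

Running `dmesg | summarize_ata_errors`, or feeding it a saved log such as `/var/log/kern.log`, prints one line per failed command and per failed sector, so incidents can be lined up side by side.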
Tags: added ext4
Changed in linux (Ubuntu): status In Progress → Invalid
As you can see, it is always a "timeout" exception. My first reactions to that are:
1) Could the driver simply wait a little longer?
2) Is a timeout really that bad? Could it just retry?
3) Has some timeout value changed between 2.6.27 and 2.6.28?
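Questions 1 and 3 concern the command timeout. libata disks are exposed through the SCSI layer, and the per-device command timeout (in seconds, 30 by default) can be read and changed at `/sys/block/<dev>/device/timeout`. The sketch below is only a way to inspect and experiment with that value. The helper names and the optional sysfs-root argument (there so the functions can be tried on a scratch directory) are hypothetical. Writing the value requires root, the change does not survive a reboot, and it is untested whether a longer timeout would avoid this particular frozen-port exception.

```shell
#!/usr/bin/env bash
# get_cmd_timeout DEV [SYSFS_ROOT]: print the SCSI command timeout in seconds.
get_cmd_timeout() {
  local dev=$1 sysfs=${2:-/sys}
  cat "$sysfs/block/$dev/device/timeout"
}

# set_cmd_timeout DEV SECONDS [SYSFS_ROOT]: change the timeout (root needed
# for the real /sys); rejects anything that is not a positive integer.
set_cmd_timeout() {
  local dev=$1 seconds=$2 sysfs=${3:-/sys}
  if ! [[ $seconds =~ ^[0-9]+$ ]] || ((10#$seconds == 0)); then
    echo "timeout must be a positive integer" >&2
    return 1
  fi
  echo "$seconds" > "$sysfs/block/$dev/device/timeout"
}
```

For example, `get_cmd_timeout sda` shows the current value. Running `set_cmd_timeout sda 60` from a root shell doubles the default; the same file can be compared under 2.6.27 and 2.6.28 to check question 3 directly.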
The failing sector is different each time.
"smartctl --all" for the disk looks fine, with no errors ever logged.