So. It just happened again. My server crashed. This time I am sure it
has nothing to do with the USB drive I had since it is no longer attached.
It seems to be some unfortunate timing of a kernel(?) problem and
heavy disk use.
I just suddenly get these messages in the log:
Oct 23 00:56:13 matrix kernel: [14573759.262982] ata1: link is slow to respond, please be patient (ready=0)
Oct 23 00:56:13 matrix kernel: [14573764.242683] ata1: device not ready (errno=-16), forcing hardreset
Oct 23 00:56:13 matrix kernel: [14573764.242721] ata1: soft resetting link
Oct 23 00:56:13 matrix kernel: [14573765.081129] ata1.00: configured for UDMA/133
Oct 23 00:56:13 matrix kernel: [14573765.081188] ata1: EH complete
Oct 23 00:56:13 matrix kernel: [14573765.082422] sd 0:0:0:0: [sda] 312581808 512-byte hardware sectors (160042 MB)
Oct 23 00:56:13 matrix kernel: [14573765.126583] sd 0:0:0:0: [sda] Write Protect is off
Oct 23 00:56:53 matrix kernel: [14573765.127506] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
These just repeat until about 01:19, then it goes quiet until a final log entry at
7:54, where the server finally crashes (it just stops responding to network requests, keyboard and so on).
I just checked the kern.log, which has a lot of entries like:
Oct 23 00:54:12 matrix kernel: [14573754.220270] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Oct 23 00:56:13 matrix kernel: [14573754.220348] ata1.00: cmd ca/00:50:14:9f:8d/00:00:00:00:00/e1 tag 0 dma 40960 out
Oct 23 00:56:13 matrix kernel: [14573754.220352] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 23 00:56:13 matrix kernel: [14573754.220465] ata1.00: status: { DRDY }
Oct 23 00:56:13 matrix kernel: [14573759.262982] ata1: link is slow to respond, please be patient (ready=0)
Oct 23 00:56:13 matrix kernel: [14573764.242683] ata1: device not ready (errno=-16), forcing hardreset
Oct 23 00:56:13 matrix kernel: [14573764.242721] ata1: soft resetting link
Oct 23 00:56:13 matrix kernel: [14573765.081129] ata1.00: configured for UDMA/133
Oct 23 00:56:13 matrix kernel: [14573765.081188] ata1: EH complete
Oct 23 00:56:13 matrix kernel: [14573765.082422] sd 0:0:0:0: [sda] 312581808 512-byte hardware sectors (160042 MB)
Oct 23 00:56:13 matrix kernel: [14573765.126583] sd 0:0:0:0: [sda] Write Protect is off
Oct 23 00:56:13 matrix kernel: [14573765.126598] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
Oct 23 00:56:53 matrix kernel: [14573765.127506] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
This adds some more info about an exception?
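A rough way to see when the resets start and stop is to tally the matching kern.log lines per minute. This is only a minimal sketch, assuming the syslog format shown above; the function names and the pattern list are made up for illustration and do not come from any existing tool:

```python
import re
from collections import Counter

# Substrings copied from the log excerpts above; extend as needed.
PATTERNS = (
    "exception Emask",
    "link is slow to respond",
    "forcing hardreset",
)

# Matches the syslog prefix, e.g. "Oct 23 00:56:13 matrix kernel:",
# and captures it down to the minute ("Oct 23 00:56").
STAMP = re.compile(r"^(\w{3}\s+\d+\s+\d{2}:\d{2}):\d{2}\s")


def tally_ata_events(lines):
    """Count lines containing any of PATTERNS, grouped by 'Mon DD HH:MM'."""
    counts = Counter()
    for line in lines:
        m = STAMP.match(line)
        if m and any(p in line for p in PATTERNS):
            counts[m.group(1)] += 1
    return counts


def tally_file(path):
    """Hypothetical helper: tally one log file, e.g. /var/log/kern.log."""
    with open(path, errors="replace") as fh:
        return tally_ata_events(fh)
```

Calling `tally_file("/var/log/kern.log")` (the path is an assumption, adjust it for your system) and printing the items gives one count per minute in log order, which makes it easy to spot where the errors begin and where they go quiet.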
Searching for these entries turns up a lot of people reporting the same problem:
And probably a solution: http://ubuntuforums.org/showthread.php?t=1145513
(The guy in that post wonders why there haven't been many reports on this issue...)
Also:
https://bugzilla.redhat.com/show_bug.cgi?id=462425
https://bugzilla.redhat.com/show_bug.cgi?id=404851
http://lkml.org/lkml/2008/11/9/22
http://forums.fedoraforum.org/showthread.php?t=219746
I'm running kernel 2.6.27-11-server. Someone suggested running kernel-rt instead:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/279693 (comment #23)
I haven't tried that. I will see if a kernel 2.6.27-14 is available, or otherwise try the -rt
suggestion.
It seems it is possible to crash the system just by running "ls -lR /". Not what I expect from a Linux system...
Kind regards
Torben