ata timeout freezes system partly

Bug #37382 reported by Martin Ammermüller on 2006-03-30
Affects                       Status     Importance  Assigned to
linux (Ubuntu)                Won't Fix  Undecided   Unassigned
linux-source-2.6.15 (Ubuntu)  Invalid    Medium      Unassigned
linux-source-2.6.20 (Ubuntu)  Invalid    Medium      Unassigned

Bug Description

When running several instances of bonnie++ simultaneously, I get the following error (mostly while bonnie++ shows its "Rewriting..." message):

ata1: command 0x25 timeout, stat 0xd8 host_stat 0x21
ata1: command translated ATA stat/err 0xd8/00 to SCSI SK/ASC/ASCQ 0xb/47/00

This message is repeated every few seconds. After that, the HD activity light stays on permanently. I can still switch consoles, but nothing that involves hard drive access works (including a safe shutdown). The only option is a hard reset.

Sometimes the following error also occurs, but with no noticeable effect (mostly in bursts of 3-5):

[4295874.351000] ATA: abnormal status 0x58 on port 0xDC870087
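
The "stat" and "status" values in these messages are raw ATA status register bytes. As a reading aid only, here is a minimal Python sketch that names the bits set in such a byte, assuming the classic ATA status bit layout. The function name decode_ata_status is made up for this illustration and is not part of any kernel tool.

```python
# Classic ATA status register bits, most significant first.
ATA_STATUS_BITS = [
    (0x80, "BSY"),   # device busy
    (0x40, "DRDY"),  # device ready
    (0x20, "DF"),    # device fault
    (0x10, "DSC"),   # seek complete (legacy meaning of bit 4)
    (0x08, "DRQ"),   # data request
    (0x04, "CORR"),  # corrected data (obsolete)
    (0x02, "IDX"),   # index (obsolete)
    (0x01, "ERR"),   # error
]


def decode_ata_status(status):
    """Return the names of the bits set in an ATA status byte (hypothetical helper)."""
    return [name for mask, name in ATA_STATUS_BITS if status & mask]


if __name__ == "__main__":
    # Status values quoted in the messages above.
    for value in (0xD8, 0x58):
        print(hex(value), decode_ata_status(value))
```

Read this way, 0xd8 has BSY set together with DRQ, which would fit a device that stays busy in the middle of a data transfer, while 0x58 is the same pattern without BSY. This is only an interpretation of the bit values, not a diagnosis.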

Additionally (but I don't know if it's related):

[4295164.189000] ACPI-0307: *** Error: No installed handler for fixed event [00000000]

I don't know if this was intended, but this error leads to the root filesystem being remounted read-only (which cannot be reversed, so the only option is a reboot) due to the "errors=remount-ro" flag in /etc/fstab.

The filesystem is ext3, set up by installing Breezy 5.10 and updating over the network (since the Flight 5 live and install CDs cannot be booted on this system).

For dmesg and lspci output see https://launchpad.net/distros/ubuntu/+source/linux-source-2.6.15/+bug/20781

What I forgot to mention: it's a Toshiba A100-512 notebook.

Workaround someone mailed me is to turn apic and acpi off. This is done with the kernel boot parameters "noapic acpi=off".

I tried to add this with kopt="noapic acpi=off" in my menu.lst, but kopt is somehow ignored (by the way, it's not mentioned in the GRUB docs), so I have to modify the automagic kernel list every time I install or update a kernel package, which is somewhat bad. Is there a better way to do this?

Nevertheless, I hope this bug will be fixed as soon as possible, since a notebook without ACPI is really bad.

This problem seems to be resolved in vanilla kernel version 2.6.17-rc6-mm2

Not really fixed: even with recent 2.6.20 kernels, my SATA controller occasionally freezes for about 30 s until libata's EH hard resets it.

Couldn't detect any data loss, though.

Ben Collins (ben-collins) wrote :

Wondering if this is actually a hw issue.

Changed in linux-source-2.6.15:
assignee: nobody → ubuntu-kernel-team
importance: Undecided → Medium
status: Unconfirmed → Confirmed
Changed in linux-source-2.6.20:
assignee: nobody → ubuntu-kernel-team
status: Unconfirmed → Confirmed
StiffMe (stiffme) wrote :

My kernel is 2.6.20-15-generic.
My laptop works fine on Edgy and Dapper.
When I use Feisty, my system freezes from time to time. Every time it freezes, I have to wait for about half a minute, and then it works fine again.
When I then switch to a console, the kernel shows the following output:
ata1: failed to respond(30 secs, status 0xd0)

Sometimes the kernel showed:
ata1.01:exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata1:01cmd a0/01:00:00:00:00/00:00:00:00:00/b0 tag 0 cdb 0x43 data 12 in

I have a PATA hard disk, but in Feisty my hard disk is recognized as /dev/sda.
I tried compiling the 2.6.20 kernel source from kernel.org, and the problem remains.

PS: I am using ICH7. In Feisty there was no sound output at first, although the kernel detected my sound card and everything looked fine. I then compiled the ALSA driver and library from the ALSA website, and that problem was resolved.

Sorry for my English. I am not good at it.

Peter Shaw (tr-spam) wrote :

I can confirm this error under Feisty. I reinstalled Edgy, and the problem doesn't show up any longer.

I don't have the log files any more, but the symptoms are the same: the notebook freezes, and after about 30 seconds it is back to normal. The notebook is an Asus A6JM with a Fujitsu 120GB HDD.

Michael Boyle (mb-corfleet) wrote :

Acer 5612 Intel Core Duo Feisty.

The freezes occurred at the following times:

May 29 07:10:53
May 29 07:19:50
May 29 07:30:36
May 29 07:33:51
May 29 07:42:37
May 29 12:02:15
May 29 12:05:00
May 29 12:07:08
May 29 12:12:30
May 29 12:17:58
May 29 12:43:27
May 29 13:12:50
May 29 13:20:44
May 29 13:46:54
May 29 13:49:30
May 29 13:50:37
May 29 13:57:28
May 29 15:54:45
May 29 15:59:23
May 29 16:02:45
May 29 16:09:03
May 29 16:12:00
May 29 16:17:57
May 29 16:30:20
May 29 16:47:33
May 29 16:51:26
May 29 17:03:53
May 29 17:15:57
May 29 17:21:48
May 29 17:28:55
May 29 18:01:36
May 29 18:12:07
May 29 18:13:18

Full dump of one (typical) incident:

May 29 18:13:18 mb-server kernel: [40529.799911] ata1.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
May 29 18:13:18 mb-server kernel: [40529.799927] ata1.01: cmd a0/00:00:00:00:20/00:00:00:00:00/b0 tag 0 cdb 0x1e data 0
May 29 18:13:18 mb-server kernel: [40529.799930] res 40/00:03:00:00:00/00:00:00:00:00/b0 Emask 0x4 (timeout)
May 29 18:13:25 mb-server kernel: [40536.793812] ata1: port is slow to respond, please be patient (Status 0xd0)
May 29 18:13:48 mb-server kernel: [40559.776592] ata1: port failed to respond (30 secs, Status 0xd0)
May 29 18:13:48 mb-server kernel: [40559.776603] ata1: soft resetting port
May 29 18:13:49 mb-server kernel: [40560.120794] ata1.00: ata_hpa_resize 1: sectors = 117210240, hpa_sectors = 117210240
May 29 18:13:49 mb-server kernel: [40560.128781] ata1.00: ata_hpa_resize 1: sectors = 117210240, hpa_sectors = 117210240
May 29 18:13:49 mb-server kernel: [40560.128788] ata1.00: configured for UDMA/100
May 29 18:13:49 mb-server kernel: [40560.308249] ata1.01: configured for UDMA/33
May 29 18:13:49 mb-server kernel: [40560.308265] ata1: EH complete
May 29 18:13:49 mb-server kernel: [40560.316298] SCSI device sda: 117210240 512-byte hdwr sectors (60012 MB)
May 29 18:13:49 mb-server kernel: [40560.316524] sda: Write Protect is off
May 29 18:13:49 mb-server kernel: [40560.316530] sda: Mode Sense: 00 3a 00 00
May 29 18:13:49 mb-server kernel: [40560.316861] SCSI device sda: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
May 29 18:13:49 mb-server kernel: [40560.317241] SCSI device sda: 117210240 512-byte hdwr sectors (60012 MB)
May 29 18:13:49 mb-server kernel: [40560.317483] sda: Write Protect is off
May 29 18:13:49 mb-server kernel: [40560.317489] sda: Mode Sense: 00 3a 00 00
May 29 18:13:49 mb-server kernel: [40560.317973] SCSI device sda: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
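
The length of each freeze can be read off the bracketed kernel timestamps in a dump like this one. The following is a minimal Python sketch, not a tool used by anyone in this report, that measures the time from a "frozen" exception line to the next "EH complete" line. The function name stall_durations and the matching rules are assumptions made for illustration, and the sample holds three lines copied from the dump above.

```python
import re

# Matches the bracketed kernel timestamp and the message that follows it.
LINE_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+(.*)")


def stall_durations(lines):
    """Yield seconds between each 'frozen' exception and the next 'EH complete'."""
    start = None
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue
        stamp, message = float(match.group(1)), match.group(2)
        if start is None and "exception" in message and "frozen" in message:
            start = stamp            # error handling begins here
        elif start is not None and "EH complete" in message:
            yield stamp - start      # error handling finished
            start = None


sample = [
    "May 29 18:13:18 mb-server kernel: [40529.799911] ata1.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen",
    "May 29 18:13:48 mb-server kernel: [40559.776603] ata1: soft resetting port",
    "May 29 18:13:49 mb-server kernel: [40560.308265] ata1: EH complete",
]

if __name__ == "__main__":
    for seconds in stall_durations(sample):
        print(f"stall lasted about {seconds:.1f} s")
```

For the incident above this gives roughly 30.5 seconds, which matches the half-minute freezes described in the earlier comments.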

Michael Boyle (mb-corfleet) wrote :

Sorry I forgot
Linux mb-server 2.6.20-15-generic #2 SMP Sun Apr 15 07:36:31 UTC 2007 i686 GNU/Linux

Giuseppe Calà (jiveaxe) wrote :

My system has been afflicted by the same problem since installing Feisty (Edgy was fine). Upgrading to kernel 2.6.20-16 has not resolved the bug.

My system:
Linux kubuntu-home 2.6.20-16-generic #2 SMP Wed May 23 01:46:23 UTC 2007 i686 GNU/Linux

Giuseppe Calà (jiveaxe) wrote :

And here is the output of 'sudo lspci -vvnn'

Thanks

tuxrox (long-id-2-avoid-spam) wrote :

I would like to confirm the bug, occurring on an HP5100 desktop that I use at work. I started using Edgy on it last January, and it worked fine. After upgrading to Feisty two weeks ago, the bug made the machine very difficult to use, requiring a power cycle 3 or 4 times a day.

When the bug occurs, neither the "top" shell command nor System Monitor shows what is locking the machine; you can only see the HD activity LED blinking like crazy. In System Monitor, the CPU, RAM and network usage percentages show low values, but this may not be valid because they are frozen, possibly at values from before the bug occurred. "top" also freezes, or nearly so, since it updates its data only every 40 seconds or more. I have dclock (a digital clock KDE application) running all the time, and this clock also freezes on these occasions; not even the colon blinks...

The box has plenty of disk space and swap area, which is rarely used. I tried unplugging the network cable to stop network traffic, but it makes no difference. If I leave the box in peace for several minutes, let's say some 40 minutes, it returns to normal.

kernel 2.6.20-16-generic
512MB RAM - 1GB swap

Typical applications running when the bug occurs are:
wine 0.9.37, running Lotus Notes
vmware, emulating a Windoze 98 (128MB) running Novell Netware Administrator
java 1.5.0_11, running SAP/R3 Front End (PlatinGUI)

tuxrox (long-id-2-avoid-spam) wrote :

Tried the workaround suggested by tenco (posting dated 05-23-2007):

"Workaround someone mailed me is to turn apic and acpi off. This is done with the kernel boot parameters "noapic acpi=off"."

Maybe it was this, or maybe the two lib updates received early this morning through Update Manager, but the problem is completely solved. More than 10 hours without the bug occurring.

Thank you, tenco.

Adam (adam.russell) wrote :

Kubuntu Feisty Fawn
kernel 2.6.20-16-generic

I believe the problem is occurring with my SATA hard drive, which is a Samsung SP0812C. It stops responding quite often. It seems that most people who have this problem are fixing it with 'piix', but I have a VIA chipset rather than an Intel one. I will attach some logs below. Relevant dmesg lines:

[ 2416.505383] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
[ 2416.505394] ata1.00: cmd c8/00:90:e6:54:5b/00:00:00:00:00/e2 tag 0 cdb 0x0 data 73728 in
[ 2416.505396] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 2416.505414] ata1: soft resetting port
[ 2416.671775] ATA: abnormal status 0x7F on port 0x0001e007
[ 2416.682280] ATA: abnormal status 0x7F on port 0x0001e007
[ 2416.693332] ata1.00: ata_hpa_resize 1: sectors = 156368016, hpa_sectors = 156368016
[ 2416.705299] ata1.00: ata_hpa_resize 1: sectors = 156368016, hpa_sectors = 156368016
[ 2416.705303] ata1.00: configured for UDMA/100
[ 2416.705314] ata1: EH complete
[ 2416.727567] SCSI device sda: 156368016 512-byte hdwr sectors (80060 MB)
[ 2416.727651] sda: Write Protect is off
[ 2416.727653] sda: Mode Sense: 00 3a 00 00
[ 2416.741889] SCSI device sda: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 2446.704938] ata1.00: limiting speed to UDMA/33:PIO4
[ 2446.704945] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
[ 2446.704953] ata1.00: cmd c8/00:88:e6:7d:5d/00:00:00:00:00/e1 tag 0 cdb 0x0 data 69632 in
[ 2446.704955] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 2453.697731] ata1: port is slow to respond, please be patient (Status 0xd0)
[ 2476.684824] ata1: port failed to respond (30 secs, Status 0xd0)
[ 2476.684831] ata1: soft resetting port
[ 2476.855210] ATA: abnormal status 0x7F on port 0x0001e007
[ 2476.865714] ATA: abnormal status 0x7F on port 0x0001e007
[ 2476.876786] ata1.00: ata_hpa_resize 1: sectors = 156368016, hpa_sectors = 156368016
[ 2476.888756] ata1.00: ata_hpa_resize 1: sectors = 156368016, hpa_sectors = 156368016
[ 2476.888760] ata1.00: configured for UDMA/33
[ 2476.888771] ata1: EH complete
[ 2476.909818] SCSI device sda: 156368016 512-byte hdwr sectors (80060 MB)
[ 2476.912474] sda: Write Protect is off
[ 2476.912478] sda: Mode Sense: 00 3a 00 00
[ 2476.916458] SCSI device sda: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 2507.060115] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
[ 2507.060126] ata1.00: cmd c8/00:20:b6:2d:04/00:00:00:00:00/e2 tag 0 cdb 0x0 data 16384 in
[ 2507.060128] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 2514.052911] ata1: port is slow to respond, please be patient (Status 0xd0)
[ 2537.032018] ata1: port failed to respond (30 secs, Status 0xd0)
[ 2537.032025] ata1: soft resetting port
[ 2537.198411] ATA: abnormal status 0x7F on port 0x0001e007
[ 2537.208918] ATA: abnormal status 0x7F on port 0x0001e007
[ 2537.215986] ata1.00: ata_hpa_resize 1: sectors = 156368016, hpa_sectors = 156368016
[ 2537.223961] ata1.00: ata_hpa_resize 1: sectors = 156368016, hpa_sectors = 156368016
[ 2537.223964] ata1.00: configured for U...
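
The "cmd" strings in these dmesg lines pack the ATA taskfile registers into one field. As a hedged reading aid, the Python sketch below splits such a string, assuming the field order command/feature:nsect:lbal:lbam:lbah/(five HOB bytes)/device, and derives the 28-bit LBA and transfer size. The function names and the small opcode table are made up for this illustration and cover only commands quoted in this report.

```python
# Only the opcodes that appear in this bug report.
COMMAND_NAMES = {0xC8: "READ DMA", 0x25: "READ DMA EXT", 0xA0: "PACKET"}


def parse_taskfile(cmd):
    """Split a libata 'cmd' string into named register values (assumed field order)."""
    command, low, hob, device = cmd.split("/")
    feature, nsect, lbal, lbam, lbah = (int(x, 16) for x in low.split(":"))
    return {
        "command": int(command, 16),
        "feature": feature,
        "nsect": nsect,
        "lbal": lbal,
        "lbam": lbam,
        "lbah": lbah,
        "hob": [int(x, 16) for x in hob.split(":")],
        "device": int(device, 16),
    }


def lba28(tf):
    """LBA of a 28-bit command: low nibble of device, then lbah:lbam:lbal."""
    return ((tf["device"] & 0x0F) << 24) | (tf["lbah"] << 16) | (tf["lbam"] << 8) | tf["lbal"]


def transfer_bytes(tf, sector_size=512):
    """Bytes requested by a 28-bit read/write; a sector count of 0 means 256."""
    return (tf["nsect"] or 256) * sector_size


if __name__ == "__main__":
    tf = parse_taskfile("c8/00:90:e6:54:5b/00:00:00:00:00/e2")
    name = COMMAND_NAMES.get(tf["command"], "unknown")
    print(name, "LBA", lba28(tf), "bytes", transfer_bytes(tf))
```

For the first failing command above, the sector count 0x90 (144 sectors of 512 bytes) works out to 73728 bytes, which agrees with the "data 73728 in" that the kernel printed on the same line.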


Kevin Woo (kwoo) wrote :

I had a similar problem. I have a PATA CD-ROM drive and a SATA hard drive on an ICHM6 chipset. According to the libata FAQ, Intel chipsets like this combine the PATA and SATA functions, and the IDE and libata drivers fight over them. I fixed it by adding "combined_mode=libata" to the kernel parameters in GRUB. The machine has now booted multiple times without this problem.

Got the information from:
http://linux-ata.org/faq.html#combined

Can anybody else confirm this solution?

Peter Lewis (prlewis) wrote :

I tried this and it made no difference... sorry.

kiev1 (sys-sys-admin) wrote :

This problem has been around for a whole year already (((

For me it showed up about once every half hour. As a result of this problem I lost a MySQL database: MySQL InnoDB would not start ("Assertion error"), and even "innodb_force_recovery = 4" did not help. The backup was a week old, so the whole department lost several days of work. Management is simply in shock, and I am about to be fired (((

Reports of the same problem from the past year:
-----------
I'm stumped trying to track down the below intermittent problem.....
I've confirmed this problem on 2.6.19, 2.6.20 and 2.6.21.
http://lkml.org/lkml/2007/6/14/154
http://kerneltrap.org/mailarchive/linux-kernel/2007/6/14/103765
http://kerneltrap.org/node/16175
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/217920
https://bugs.launchpad.net/ubuntu/+bug/164183
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/229747
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/159521
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/187146
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/221437
https://bugs.launchpad.net/ubuntu/+bug/226600

SUSE:

ata errors, system freeze
https://bugzilla.novell.com/show_bug.cgi?id=393675

System lockup with concurrent acces to SATA disks on Promise PDC20378
http://lists.opensuse.org/opensuse-bugs/2008-02/msg03458.html

Kernel panic / system hang / sata_promise
https://bugzilla.novell.com/show_bug.cgi?id=350907

DELL Poweredge 2970 hangs sometimes (ata1)
https://bugzilla.novell.com/show_bug.cgi?id=359333

Fedora:
ata device crashing system in Fedora 8
http://www.experts-exchange.com/OS/Linux/Distributions/Fedora/Q_23125450.html

problème de mise à jour
http://forums.fedora-fr.org/viewtopic.php?pid=253930

Kernel 2.6.24.x boot problem - Anyone , Any idea
http://fcp.surfsite.org/modules/newbb/viewtopic.php?viewmode=flat&order=ASC&topic_id=54760&forum=10

I thought that with the newest hard drives with NCQ support this would not occur, ... but it is the same:

"With this kernel I’m getting frequent temporary freezes (system comes back responsive after a minute or so…)."
http://kerneltrap.org/mailarchive/linux-kernel/2008/1/8/546296

This also occurs with Hardy here. dmesg output:

[212076.589397] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x400000 action 0x2 frozen
[212076.589406] ata1.00: BMDMA2 stat 0x82c0009
[212076.589410] ata1: SError: { Handshk }
[212076.589418] ata1.00: cmd c8/00:10:de:6f:3f/00:00:00:00:00/e9 tag 0 dma 8192 in
[212076.589420] res 58/00:00:ed:6f:3f/00:00:00:00:00/e9 Emask 0x2 (HSM violation)
[212076.589424] ata1.00: status: { DRDY DRQ }
[212076.900733] ata1: soft resetting link
[212082.090468] ata1: port is slow to respond, please be patient (Status 0xd8)
[212086.900665] ata1: SRST failed (errno=-16)
[212086.900675] ata1: hard resetting link
[212087.376126] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[212087.761562] ata1.00: configured for UDMA/100
[212087.761585] ata1: EH complete
[212087.815544] sd 0:0:0:0: [sda] 156301488 512-byte hardware sectors (80026 MB)
[212087.838172] sd 0:0:0:0: [sda] Write Protect is off
[212087.838180] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[212087.880892] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

Changed in linux:
status: New → Confirmed

Well, I wanted to add it to the list through "also affected", but Launchpad insists on telling me that linux-source-2.6.24 is a binary package and assigns it to the source package linux.

John Woods (bamboowarrior) wrote :

Has anyone solved this? I'm having the same problem. It started today when I rebooted, probably due to some software upgrade (not kernel--using earlier kernels does not help).

Peter Lewis (prlewis) wrote :

Hi John,

Are you using KDE 4?

This had pretty much gone away for me over the last couple of months, but I've been noticing it again since I've been using KDE 4.1 RC1.....

Just a thought.

kiev1 (sys-sys-admin) wrote :

The problem is still not solved:

[212076.900733] ata1: soft resetting link
or
[212086.900675] ata1: hard resetting link

The Ubuntu Kernel Team is planning to move to the 2.6.27 kernel for the upcoming Intrepid Ibex 8.10 release. As a result, the kernel team would appreciate it if you could please test this newer 2.6.27 Ubuntu kernel. There are two ways you should be able to test:

1) If you are comfortable installing packages on your own, the linux-image-2.6.27-* package is currently available for you to install and test.

--or--

2) The upcoming Alpha5 for Intrepid Ibex 8.10 will contain this newer 2.6.27 Ubuntu kernel. Alpha5 is set to be released Thursday Sept 4. Please watch http://www.ubuntu.com/testing for Alpha5 to be announced. You should then be able to test via a LiveCD.

Please let us know immediately if this newer 2.6.27 kernel resolves the bug reported here or if the issue remains. More importantly, please open a new bug report for each new bug/regression introduced by the 2.6.27 kernel and tag the bug report with 'linux-2.6.27'. Also, please specifically note if the issue does or does not appear in the 2.6.26 kernel. Thanks again, we really appreciate your help and feedback.

John Woods (bamboowarrior) wrote :

For me, problem was solved by removing the ATA hard disk, so I can no longer comment.

Peter Shaw (tr-spam) wrote :

I can't comment on this either. I built my own kernel and reactivated the old hard disk modules. Devices show up as hdX again, which solved the problem for me.

Trax (forum-traxbyte) wrote :

I have a similar problem when using Hardy and the 2.6.24-19 kernel. With 2.6.27, my situation has not improved at all. The latest flood of error messages reads:

[ 1435.198438] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
[ 1435.198486] ata3.00: BMDMA stat 0x26
[ 1435.198521] ata3.00: cmd 25/00:00:00:17:3c/00:04:00:00:00/e0 tag 0 dma 524288 in
[ 1435.198525] res 51/84:4f:b1:18:3c/84:02:00:00:00/e0 Emask 0x30 (host bus error)
[ 1435.198607] ata3.00: status: { DRDY ERR }
[ 1435.198632] ata3.00: error: { ICRC ABRT }
[ 1435.198670] ata3: soft resetting link

Adam (adam.russell) wrote :

I am unable to contribute any further to this bug, as I am no longer using the hardware in question. I will be unsubscribing.

RubbelDieCatc (rubbel) wrote :

Using 2.6.27-6-generic with a VT8237A SATA 2-Port Controller in Combination with JMicron 20360/20363 AHCI Controller

The problem is still there.
* SATA Drive SAMSUNG HD400LJ freezes after approx 30 Minutes

The error occurred the first time with Kernel 2.6.24.
In between one of the "noapic" kernel boot parameters worked. But one kernel version later the error returned.
Meanwhile I am really pissed.

Sergio Zanchetta (primes2h) wrote :

The 18 month support period for Feisty Fawn 7.04 has reached its end of life. As a result, we are closing the linux-source-2.6.20 Feisty Fawn kernel task. However, please note that this report will remain open against the actively developed kernel. Thank you for your continued support and help as we debug this issue.

Changed in linux-source-2.6.20:
status: Confirmed → Invalid

Per a decision made by the Ubuntu Kernel Team, bugs will no longer be assigned to the ubuntu-kernel-team in Launchpad as part of the bug triage process. The ubuntu-kernel-team is being unassigned from this bug report. Refer to https://wiki.ubuntu.com/KernelTeamBugPolicies for more information. Thanks.

Changed in linux-source-2.6.15 (Ubuntu):
status: Confirmed → Invalid
Changed in linux (Ubuntu):
status: Confirmed → Won't Fix