ata exception and hang when booting, at T=n.81606 s

Bug #397096 reported by Tormod Volden on 2009-07-08
This bug affects 19 people
Affects: Linux. Status: Expired. Importance: Medium.
Affects: linux (Ubuntu). Importance: Undecided. Assigned to: Unassigned.

Bug Description

This seems similar to the issues in bug 352197, bug 270794, bug 286380 and bug 103780; I just don't know which one is closest.

Sometimes the machine continues after a ~30 second pause with the HD lamp constantly lit. Sometimes the lamp goes off again but the machine is still hung.

I have also tried blacklisting the tg3 and ipw2200 network drivers, since it often hangs just after loading or dealing with these, which is also just when gdm is starting (some people reported that readahead stresses the disk).

The number of similar reports makes me optimistic it is not the hard drive dying, but I can not be sure I guess.

ProblemType: Bug
Architecture: i386
Date: Wed Jul 8 19:27:09 2009
DistroRelease: Ubuntu 9.10
HibernationDevice: RESUME=UUID=d6223bfb-d64e-4243-96d5-5d5a22c7d50f
Lsusb:
 Bus 005 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 Bus 004 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
 Bus 003 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
 Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
MachineType: Acer, inc. TravelMate 8100
Package: linux-image-2.6.31-2-generic 2.6.31-2.16
ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.31-2-generic root=UUID=59b5106f-825d-4931-92d0-b6cd1eee4f49 ro radeon.modeset=1
ProcEnviron:
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 2.6.31-2.16-generic
RelatedPackageVersions: linux-backports-modules-2.6.31-2-generic N/A
SourcePackage: linux
Uname: Linux 2.6.31-2-generic i686
dmi.bios.date: 01/20/06
dmi.bios.vendor: Acer
dmi.bios.version: 3C25
dmi.board.name: Kingfisher
dmi.board.vendor: Acer, Inc.
dmi.board.version: Not Applicable
dmi.chassis.type: 1
dmi.chassis.vendor: , Inc.
dmi.chassis.version: N/A
dmi.modalias: dmi:bvnAcer:bvr3C25:bd01/20/06:svnAcer,inc.:pnTravelMate8100:pvrNotApplicable:rvnAcer,Inc.:rnKingfisher:rvrNotApplicable:cvn,Inc.:ct1:cvrN/A:
dmi.product.name: TravelMate 8100
dmi.product.version: Not Applicable
dmi.sys.vendor: Acer, inc.

Tormod Volden (tormodvolden) wrote :

Forgot to say I have been seeing this for a while (several releases) but it is worse now. I don't know if it is because I now use ext4 or radeon KMS, or 2.6.31.

Tormod Volden (tormodvolden) wrote :

Funny how this exception happens at the exact same time (except the first one, maybe it did a fsck at that boot):

$ grep frozen syslog.hangs.txt
Jul 7 22:16:53 acer-tormod kernel: [ 117.816060] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jul 7 23:15:05 acer-tormod kernel: [ 47.816052] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jul 8 18:24:51 acer-tormod kernel: [ 47.816052] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jul 8 19:12:58 acer-tormod kernel: [ 46.816058] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jul 8 19:15:12 acer-tormod kernel: [ 46.816056] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
$ dmesg|grep frozen
[ 46.816050] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen

Tormod Volden (tormodvolden) wrote :

I also moved away /etc/readahead/desktop, but it does not help. Most often it hangs before X has started, I still see the _ cursor in the upper left corner. A few times X has started so I see the mouse pointer.

One thing that seems to help is to use ctrl-S and ctrl-Q during boot to slow down things. I let it sit for a while before I let gdm loose, and then it will go on without hanging.

Tormod Volden (tormodvolden) wrote :

I tried libata.force=udma/66 which did not help, but from the below I am not sure it was correctly set anyway:

[ 1.702986] ata1.00: ATA-8: SAMSUNG HM160HC, LQ100-10, max UDMA/100
[ 1.702990] ata1.00: 312581808 sectors, multi 16: LBA48
[ 1.703006] ata1.00: applying bridge limits
[ 1.703022] ata1.00: FORCE: xfer_mask set to udma/66
[ 1.716789] ata2.00: failed to set xfermode (err_mask=0x1)
[ 1.718828] ata1.00: configured for UDMA/66

Tormod Volden (tormodvolden) wrote :

Now I have a "sleep 7" script before and after /etc/rc2.d/gdm and that works.

When I look through the syslogs, the exception always happens between
[ 90.066878] tg3 0000:06:06.0: firmware: requesting tigon/tg3_tso5.bin
and
[ 128.062549] tg3 0000:06:06.0: wake-up capability disabled by ACPI
[ 128.062558] tg3 0000:06:06.0: PME# disabled

so I suspected it was something like in bug 331415. However, it also happens when I blacklist tg3.

Tormod Volden (tormodvolden) wrote :

I notice that both the tg3 and the radeon modules report "PCI INT A -> GSI 16 (level, low) -> IRQ 16". Could it be an interrupt conflict? There is of course the case when I have blacklisted tg3, but other modules (?) also report this, like "HDA Intel" and uhci_hcd.

Tormod Volden (tormodvolden) wrote :

It seems like "nolapic" makes it stable. I have removed the "sleep 7" stuff and reinstated /etc/readahead/desktop.

Tormod Volden (tormodvolden) wrote :

No, nolapic does not work every time either.

Tormod Volden (tormodvolden) wrote :

Neither do noapic, acpi=noirq, acpi_irq_nobalance, acpi_irq_balance.

Tormod Volden (tormodvolden) wrote :

Sometimes it continues booting after a 30 s pause but sometimes it just hangs and I have to use SysRq-B. I have tried netconsole to see what goes on in this case. There are "ata1: SRST failed" errors before I get "end_request: I/O error".

Tormod Volden (tormodvolden) wrote :

smartctl "short" and "long" tests show no errors.

qwerty (escalantea) wrote :

I had the same problem (SATA disk freezing), and I solved it by tuning the pdflush parameters ... https://bugs.launchpad.net/ubuntu/+bug/270794/comments/12

Tormod Volden (tormodvolden) wrote :

qwerty, thanks for the hint. However I have a PATA disk, and it gets stuck during boot which should not be write-heavy...

I have now inserted "sleep 7" before and after launching gdm, and it helps. If it gets stuck now, it is a bit earlier, during module loading; typically the last message I see is about the synaptics touchpad. I think I never experienced this issue after booting has finished. With the boot sequence being reordered a bit in Karmic, this is gonna be exciting^Wnerve-wracking.

qwerty (escalantea) wrote :

The original report didn't mention it, but my disk freezing also started during boot time, the disk got frozen one or more times (30 seconds each time).

I had this problem since Ubuntu 8.04 (previous releases were Ok).

Tormod Volden (tormodvolden) wrote :

I have tried the pdflush parameters but it does not help. I verified that the sysctl settings are applied before it hangs.

This drives me mad. Now I wonder why the sub-second part of the times is always the same. Are these exceptions probed for only every full second? The different full-seconds part is because I have tried blacklisting some modules and removed the "sleep 7" again.

Aug 21 23:35:34 acer-tormod kernel: [ 51.816054] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Aug 24 23:55:50 acer-tormod kernel: [ 53.816050] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Aug 25 19:14:45 acer-tormod kernel: [ 44.816064] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Sep 4 20:26:07 acer-tormod kernel: [ 46.816045] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Sep 5 11:42:30 acer-tormod kernel: [ 52.816068] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Sep 5 11:48:56 acer-tormod kernel: [ 52.816056] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Sep 5 12:10:56 acer-tormod kernel: [ 52.816056] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Sep 5 12:16:47 acer-tormod kernel: [ 52.816056] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
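
The constant sub-second part can be checked mechanically instead of by eye. The following is only a minimal sketch, assuming syslog or dmesg lines in the format quoted above; the function name frozen_fractions is made up for illustration.

```shell
# Hypothetical helper: print the sub-second part of the kernel timestamp
# for every "frozen" line read from standard input.
frozen_fractions() {
    grep 'frozen' | sed -n 's/.*\[ *[0-9]*\.\([0-9]*\)\].*/\1/p'
}

# Sample lines copied from the log above; keep only the millisecond digits
# and collapse identical values.
frozen_fractions <<'EOF' | cut -c1-3 | sort -u
Aug 21 23:35:34 acer-tormod kernel: [ 51.816054] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Aug 25 19:14:45 acer-tormod kernel: [ 44.816064] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Sep 5 11:42:30 acer-tormod kernel: [ 52.816068] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
EOF
```

For these sample lines the only value printed is 816. On a real system the same function could be fed from /var/log/syslog or from dmesg.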

Tormod Volden (tormodvolden) wrote :

Many reboots later and I am not wiser. There must be something polling every second (using the kernel timer as reference) which interferes and causes the hang. Attaching another dmesg log, with many modules blacklisted.

summary: - ata exception and hang when booting
+ ata exception and hang when booting, at T=n.81606 s
Changed in linux:
status: Unknown → Confirmed
Mikael Bergqvist (mikaelb) wrote :

I have recently, with Karmic Koala Beta, experienced strange behaviour with my acer travelmate 800, with a [0.986327] ata1.00: ATA-8: SAMSUNG HM160HC, LQ100-10, max UDMA/100

Error ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
shows up in the log.

After having worked for a while, errors show up indicating disk problems, and partitions are remounted ro. But after rebooting, things are good for a while.

I thought the disk was defective somehow, but after having read this and bug 285892 I am not so sure anymore.

Tormod Volden (tormodvolden) wrote :

I have *way* fewer of my issues after cleaning up all the hdparm calls during boot. The mishmash of acpi-support, gnome-power-manager, devicekit, laptop-mode-tools, pm-utils etc bombards the hard drive with hdparm calls. I have filed some bugs and patches for some of these packages (bug 437796, bug 438355). Especially consistent treatment of the "nohdparm" boot option would be nice (bug 443992).

Mikael, you can try:
 sudo mv /sbin/hdparm /sbin/hdparm.orig
 sudo cp /bin/true /sbin/hdparm
After testing this, you can revert it with:
 sudo mv /sbin/hdparm.orig /sbin/hdparm

Tormod Volden (tormodvolden) wrote :

Just attaching my hdparm wrapper script here for myself and others.
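
The attached script itself is not reproduced on this page. For readers who only want to see the idea, here is a rough sketch of what a logging helper could look like. It is NOT the attached wrapper: the function name, the log format and the log path are assumptions made for illustration, and it only records the call without running hdparm.

```shell
# Hypothetical logging helper; NOT the wrapper script attached to this bug.
# Appends one record per call: uptime, wall-clock time and the arguments.
log_hdparm_call() {
    logfile=$1
    shift
    {
        # /proc/uptime is Linux-specific; fall back to a marker elsewhere.
        cat /proc/uptime 2>/dev/null || echo "uptime unavailable"
        # %N (nanoseconds) is a GNU date extension.
        date '+%H:%M:%S.%N'
        echo "hdparm $*"
        echo
    } >> "$logfile"
    # A real wrapper would now hand over to the renamed binary,
    # for example: exec /sbin/hdparm.orig "$@"
}

# Demonstration with a temporary log file instead of a system path.
log=$(mktemp)
log_hdparm_call "$log" -B 254 /dev/sda
cat "$log"
```

Keeping the uptime next to the wall-clock time makes it possible to line the records up with the kernel timestamps in dmesg.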

Mikael Bergqvist (mikaelb) wrote :

Tormod,
I'll try your tip and see what happens.
In the meantime I can add some observations: alongside the freeze entry in the log there are always these entries for me:

[ 159.816518] ata1: drained 134 bytes to clear DRQ.
[ 159.816557] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 159.816586] ata1.00: cmd c8/00:30:67:5c:d4/00:00:00:00:00/e6 tag 0 dma 24576 in
[ 159.816591] res 40/00:ff:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
[ 159.816601] ata1.00: status: { DRDY }
[ 159.872074] ata1: soft resetting link
[ 160.054175] ata1.00: configured for UDMA/100
[ 160.054189] ata1.00: device reported invalid CHS sector 0
[ 160.054210] ata1: EH complete

These entries are always associated with a lit HD LED, and the system is frozen for a while. Sometimes the file system gets remounted ro after having been corrupted.

Sometimes the system decides that the hard drive is only capable of UDMA/66 and changes it to that after the soft reset.

The line "device reported invalid CHS sector 0" seems to be an off-by-one error reported in http://patchwork.kernel.org/patch/41773/ and fixed upstream. It is not seen often; as Tejun Heo says: "The bug isn't very visible because ata_tf_read_block() is used only when generating sense data for a failed RW command and CHS addressing isn't used too often these days."

But why does the hard drive need to be "soft reset"?

I never had these problems with Jaunty, or with the earlier releases that I have upgraded from.

Mikael Bergqvist (mikaelb) wrote :

Tormod,
Running without hdparm. No freeze for 1 hour and 46 minutes. Will try a little logging with your wrapper.

Tormod Volden (tormodvolden) wrote :

Mikael, hdparm is mainly run during booting, suspend/resume, (un)plugging of AC adapter. If you have crashes at other times, it is probably not caused by hdparm.

Mikael Bergqvist (mikaelb) wrote :

Tormod, in any case, without hdparm, no symptoms for over 18 hours. I left the laptop on overnight just to see. Normally the locking and freezing would have occurred regularly for me, with a period of something like 10 minutes, maybe less.

Suspend/resume shouldn't have been involved, since I was working with the laptop (trying to, at least) all the time.
Unplugging of AC did not occur either.
Might it be that the system is restarting something else that triggers the same behaviour?

I will reboot with the logging wrapper and see what happens. Will put the periodicity of freezes here if it is of any interest, and will be glad to run any other tests that might shed light on the phenomenon.

Mikael Bergqvist (mikaelb) wrote :

Running with wrapped hdparm alters the behaviour slightly; it no longer freezes all the time.
Here is one log with three "frozen" entries, generated with dmesg | grep ata, together with the accompanying hdparm.log.

Tormod Volden (tormodvolden) wrote :

Mikael, what is happening there at 67s? Like 5 acpid power events. And 3 at 129s. Definitely something wrong. hdparm -B 254 is being set all the time, but never the -B 128 which would happen if AC was disconnected. Can you please file a new bug, since that is a separate issue? Attach your /var/log/daemon there, unless "ubuntu-bug acpid" does it for you.

The "frozen" messages typically arrive timestamped 30s after the freeze was triggered, so at 21s, 130s, 190s which always match a hdparm call.
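
Since the "frozen" message is logged roughly 30 s after the command that timed out, subtracting 30 from its kernel timestamp gives the approximate uptime to look for in hdparm.log. A small sketch of that arithmetic follows; the 30 s offset is taken from the observation above, and the function name is hypothetical.

```shell
# Hypothetical helper: for each "frozen" line on standard input, print the
# kernel timestamp minus 30 s, rounded to whole seconds.
frozen_trigger_times() {
    awk -v offset=30 '/frozen/ {
        # Find the "[ seconds.micros]" field and strip the brackets.
        if (match($0, /\[ *[0-9]+\.[0-9]+\]/)) {
            ts = substr($0, RSTART + 1, RLENGTH - 2)
            printf "%.0f\n", ts - offset
        }
    }'
}

# Sample line taken from an earlier comment in this report.
echo '[ 159.816557] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen' | frozen_trigger_times
```

For the sample line this prints 130, so the matching hdparm call would be expected at about 130 s of uptime.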

Mikael Bergqvist (mikaelb) wrote :

Tormod, OK I will file a new bug. Thanks.

Marco :-) (marco-carolli) wrote :

This problem also affects me. It happened with Ubuntu 9.10 beta, RC and final, and even Xubuntu 9.10.
The LiveCD works but takes 15 min to load; it can complete the installation, but on reboot there is only a blank screen and the HDD red light flashing.
Recovery mode shows errors as in the attached image.
The HD works fine and has no errors.

Tormod Volden (tormodvolden) wrote :

Marco and Mikael, does the issue go away if you disable hdparm? Do you also have a radeon GPU?

Tormod Volden (tormodvolden) wrote :

I have seen this when resuming from hibernation as well. Unfortunately the timestamps in the hdparm.log are then difficult to compare to the timestamps in dmesg. I therefore added a sysrq to the wrapper, so you will find "SysRq : Show Regs" in dmesg when hdparm is run. So the coincidences should be easier to verify, using for instance: egrep 'Emask|SysRq' /var/log/syslog

Mikael Bergqvist (mikaelb) wrote :

Tormod, there was a change in behaviour for the better recently, after either kernel or start script updates, so I never filed a new bug. I thought all was well. Suddenly I got the freeze again the other day. Will check with and without hdparm again and with your wrapper.
I have a Mobility Radeon 9000 GPU.

I don't have a Radeon GPU for sure, it's a VIA all-integrated KT400 (if I remember correctly) with an Athlon XP 2200+ processor, 512 MB of RAM and a 20 GB hard drive.

Ubuntu 9.04 worked until the previous 30 GB HD failed. As far as I can tell the new one works fine.
I don't know how to disable hdparm; if you tell me (I'm a newbie) I'll try as soon as I can.
Thanks

Tormod Volden (tormodvolden) wrote :

Marco, you can boot with the "nohdparm" boot option. And please use the web interface to post comments instead of mail.

xamul (luigi-zanderighi) wrote :

Hi, got the same problem; setting 'nohdparam' doesn't bring any benefit.

The GPU is a Radeon. With fglrx the HDD seems to freeze more often; without fglrx there is no CPU freq and the fan is always on.

With the first installation of Ubuntu 9.04 I didn't see the issue, so if it was present it happened very rarely. It got worse after various updates, and with Karmic the system is almost unusable.

Tormod Volden (tormodvolden) wrote :

xamul, since you are spelling nohdparm wrong, your testing can not be conclusive. The best would be if you use the hdparm wrapper so we can see that the option takes effect.

xamul (luigi-zanderighi) wrote :

Tormod, sorry, you are right there is a spelling problem.
This is the grub (/boot/grub/grub.cfg) line:
linux /boot/vmlinuz-2.6.31-14-generic root=UUID=e00ced67-7065-4fbf-8ec5-5485650ca4a5 ro nohdparm

Is it the correct option to set? I'm trying the hdparm wrapper and will soon post the log.
Next I will try nodma. (I really don't know what to do to make my system usable.)
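
Whether an option such as nohdparm actually reached the kernel can be checked against the command line of the running system. A minimal sketch, with a made-up function name; it matches whole words only, so the misspelled 'nohdparam' is not mistaken for 'nohdparm'.

```shell
# Hypothetical helper: succeed if the exact word $1 appears in the
# kernel command line string $2.
has_boot_option() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Options copied from the grub line quoted above.
cmdline='root=UUID=e00ced67-7065-4fbf-8ec5-5485650ca4a5 ro nohdparm'
has_boot_option nohdparm "$cmdline" && echo "nohdparm is set"
has_boot_option nohdparam "$cmdline" || echo "nohdparam (misspelled) is not set"
```

On a running system the second argument would be "$(cat /proc/cmdline)". This only shows that the kernel received the option; whether each boot script honours it is a separate question (see bug 443992).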

xamul (luigi-zanderighi) wrote :

Just rebooted with wrapper and got the same issue:
[ 141.804095] ata1.00: exception Emask 0x0 SAct 0x10 SErr 0x0 action 0x6 frozen
[ 141.804109] ata1.00: cmd 60/08:20:60:8d:22/00:00:28:00:00/40 tag 4 ncq 4096 in
[ 141.804111] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 141.804116] ata1.00: status: { DRDY }
[ 141.804123] ata1: hard resetting link
[ 147.164219] ata1: link is slow to respond, please be patient (ready=0)
[ 151.812225] ata1: COMRESET failed (errno=-16)
[ 151.812233] ata1: hard resetting link
[ 152.132087] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 152.139906] ata1.00: _GTF unexpected object type 0x1
[ 152.200999] ata1.00: _GTF unexpected object type 0x1
[ 152.201389] ata1.00: configured for UDMA/133
[ 152.201394] ata1.00: device reported invalid CHS sector 0
[ 152.201403] ata1: EH complete

This is the wrapper log:
luigi@acer:~$ sudo hdparm -B /dev/sda

/dev/sda:
 APM_level = 128
luigi@acer:~$ cat /dev/.initramfs/hdparm.log

285.02 378.01 15:33:52.431771631
konsole,2291
  |-bash,2306
  | `-hdparm,2593 /sbin/hdparm -B /dev/sda
  | `-pstree,2599 -a -p 2291
  `-{konsole},2305

The only entry is mine.
No changes in behaviour.

Marco :-) (marco-carolli) wrote :

I tried nohdparm and I got the same issue too.

Any idea?

Tormod Volden (tormodvolden) wrote :

xamul, it is interesting that you see this without hdparm and without the radeon.ko module. I think you should file a separate bug report, because I am not sure it is the exact same issue. You have a SATA disk and some additional error messages. Please just mention your bug number (like "bug 397096") here so we get the link.

xamul (luigi-zanderighi) wrote :

From my laptop BIOS I can set only a few parameters; one of these is the SATA mode, either 'ahci' or 'sata'.

In both cases I get the error, but the dmesg log contains more with 'ahci': I get the log you see above. With 'sata' mode I get fewer messages, just the frozen message and the reconfiguration message.

The last thing I did yesterday was an upgrade to the latest kernel, 2.6.31-15-generic. For about 45 minutes now I haven't had a freeze, which has never happened in the last 2 weeks.

At the next coffee break I will launch some compile scripts which I'm sure should freeze my HDD. I'll let you know, and if they do I'll file a new bug and post the id.

xamul (luigi-zanderighi) wrote :

The upgrade didn't solve the issue. Before opening a bug I'm searching for something related to my Acer laptop; why not a HW bug? (Not an HDD issue, because I already changed the HDD.)

I tried with nohdparm: it is as catastrophic as without,
no changes, the bug is always there.
see also https://bugs.launchpad.net/ubuntu/+source/linux/+bug/279693

With today's latest update the laptop is no longer experiencing HD freezes,
maybe a temporarily good combination of kernel and other modules.
Anyway, NO kernel update has occurred since the last bugs, only updates to other modules.
Hope it stays stable...
linux 2.6.31-9-rt

The freezes continue...
A very stupid possible solution is to leave a CD in the CD reader: it seems to work, no freezes occur afterwards.
I saw this idea in another post on the same issue.
But why? Doesn't this information give any clues to the developers?

Upgraded to Linux version 2.6.31-20-generic.
ATA freezes continue.
The "inserted CD" workaround continues.

Jeremy Foshee (jeremyfoshee) wrote :

Hi Tormod,

If you could also please test the latest upstream kernel available that would be great. It will allow additional upstream developers to examine the issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text. Please let us know your results.

Thanks in advance.

[This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: needs-upstream-testing
tags: added: kj-triage
Changed in linux (Ubuntu):
status: New → Confirmed
Danny (danny1982) wrote :

Hi all,
this is my first time reporting bugs, so sorry if i do something wrong

it seems that i am affected by this bug:

Ubuntu 9.10, ext4

Mar 18 01:15:46 htpc kernel: [ 6796.065007] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Mar 18 01:15:46 htpc kernel: [ 6796.065019] ata4.00: cmd c8/00:80:bf:26:3c/00:00:00:00:00/e6 tag 0 dma 65536 in
Mar 18 01:15:46 htpc kernel: [ 6796.065020] res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Mar 18 01:15:46 htpc kernel: [ 6796.065024] ata4.00: status: { DRDY }
Mar 18 01:15:51 htpc kernel: [ 6801.112985] ata4: link is slow to respond, please be patient (ready=0)
Mar 18 01:15:56 htpc kernel: [ 6806.100984] ata4: device not ready (errno=-16), forcing hardreset
Mar 18 01:15:56 htpc kernel: [ 6806.100994] ata4: soft resetting link
Mar 18 01:15:57 htpc kernel: [ 6807.537346] ata4.00: configured for UDMA/133
Mar 18 01:15:57 htpc kernel: [ 6807.537346] ata4.00: device reported invalid CHS sector 0
Mar 18 01:15:57 htpc kernel: [ 6807.537346] ata4: EH complete

i tried the following workarounds:

tuning with pdflush
booting with acpi=off pci=nomsi
"inserted cd" workaround <- what kind of CD-ROM do you have? ide or sata?

none worked

the system boots without problems (sometimes i have to fsck because of the freeze)
after a while it freezes showing the errors above

i still need to try the nohdparm option and the hdparm-wrapper-script

any other ideas?

DjznBR (djzn-br) wrote :

I want to join the team and say that this bug is ANNOYING the hell out of me.

It's like the original poster said: the computer halts for 30 seconds and the HD LED stays lit; after that it resumes normal operation. I tried the CD-ROM workaround but it doesn't work. When I run dmesg right afterwards, similar SATA errors show up on the screen.

I have a SAMSUNG 160GB HD161HJ, in Normal IDE Mode in BIOS.
Ubuntu Karmic 9.10
Linux orion 2.6.31-20-generic #58-Ubuntu SMP Fri Mar 12 05:23:09 UTC 2010 i686 GNU/Linux
My filesystems are all ext3, so I don't think this is ext4-related at all.

Please fix this!

Tormod Volden (tormodvolden) wrote :

Jeremy, I haven't seen this hang for quite a while, but I think this is due to the reordering of startup scripts, and not because the real problem has been solved. So it will be difficult for me to verify if another kernel is better.

Danny (danny1982) wrote :

update:

i've made a Bios-Update, changed the Harddrive, and set up a new ubuntu 9.10 with ext3. Haven't seen the problem since then...
i will post again, if the freezes occur again...

DjznBR (djzn-br) wrote :

Just so you know, optical media in my drive DOES NOT stop this bug. Occasional freezes still occur... same 30 second HD LED lit... I am going to try replacing hdparm with the true wrapper. Hope it works. Can't wait for Lucid Lynx now...

DjznBR (djzn-br) wrote :

Freezes still continue... occasionally... every 4 hours...

The freeze is not related to the ext3 or ext4 filesystem:
the filesystem on my laptop is ext3, and it still freezes.
Hopefully, with a CD in the CD reader the freezes stop. But does anyone know why?

DjznBR (djzn-br) wrote :

Problem persists in Lucid. Random 30 second lock ups and resuming...

DjznBR (djzn-br) wrote :

Ok, I am doing an experiment. So far it worked... I went to BIOS and changed from SATA mode, to AHCI mode. My BIOS is an ASUS Bios, M3A78-EM. There is SATA Configuration entry right in the first BIOS screen. Then it gives you three modes:

SATA
RAID
AHCI

It was on SATA all this time, and I have changed to AHCI.
So far problem is gone.

People having this problem, please do the same to see if that narrows down the cause of this issue (particularly if it is caused by the kernel ATA drivers).

DjznBR (djzn-br) wrote :

Problem vanished for a day, but back again.
Strangely, when in SATA mode, it gives errors reported in this post.
When in AHCI mode, it gives errors reported in this bug:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/285892

[ 1550.000084] ata3.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
[ 1550.000097] ata3.00: failed command: WRITE FPDMA QUEUED
[ 1550.000114] ata3.00: cmd 61/10:00:23:82:2f/00:00:08:00:00/40 tag 0 ncq 8192 out
[ 1550.000117] res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1550.000125] ata3.00: status: { DRDY }
[ 1550.000135] ata3: hard resetting link
[ 1550.484467] ata3: softreset failed (device not ready)
[ 1550.484478] ata3: applying SB600 PMP SRST workaround and retrying
[ 1550.652070] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 1550.663374] ata3.00: configured for UDMA/133
[ 1550.663388] ata3.00: device reported invalid CHS sector 0
[ 1550.663410] ata3: EH complete

Zrin Ziborski (zrin+launchpad) wrote :

I also had the "ata ... frozen" symptom on a system with an AMD SB700/SB800 controller
with 2.6.26, 2.6.30 and 2.6.32. Comparing similar systems, it seems (could be)
that it occurs more often with particular hard drives or under particular circumstances (?).

You may want to try some of the following (step by step) to narrow down possible causes:

- limit SATA link speed to 1.5 Gbps
- (some of these) kernel command line parameters: hpet=disable highres=off nohz=off nohdparm
- disable services accessing drives: smartd, hddtemp, ...?
- change scheduler to deadline (echo 'deadline' >/sys/block/sdX/queue/scheduler)
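
After changing the scheduler it is worth confirming that the change took effect. The kernel lists the available schedulers in the scheduler file and marks the active one with square brackets. A minimal sketch follows; the function name and the sample string are illustrative assumptions, and the list on a given kernel may differ.

```shell
# Hypothetical helper: print the scheduler name shown in [brackets]
# in the text read from standard input.
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Assumed sample content of a scheduler file, for illustration only.
echo 'noop anticipatory deadline [cfq]' | active_scheduler
```

For the sample string this prints cfq. On a real system the input would come from the file itself, for example active_scheduler < /sys/block/sda/queue/scheduler, where sda stands for whichever disk is affected.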

then test:

- test with RAID0 array over the drives
- test with RAID1 array over the drives
- test parallel access to single drives
- test if vibrations have an influence (?)

and of course

- test the hardware extensively (memtest86+ version 4.x, winxp with prime95 and bonnie++ in parallel (http://www.ziborski.net/wartung/bonnie++_win32.zip) (give lower priority to prime95), etc.)

then

- try new kernel
- read libata FAQ for further ideas

good luck ;)

Vikram Ambrose (noel-ambrose) wrote :

I have the same problem on 10.04. Happens on two drives and only happens frequently when syncing a RAID array. It can happen almost once a minute. Otherwise during normal operation it only happens a couple of times a day.

I'm going to replace both drives tomorrow.

DjznBR (djzn-br) wrote :

I really don't think this is hardware related.
I just ran ESTOOL utility on my drive and did extensive tests. The drive is pristine.
I am sure some die hard bug in the kernel is causing this.
This whole thing started 3 months after Karmic Koala release...
I may try a stock kernel since we may be suffering some kind of ubuntu "clever" kernel tweaks...

DjznBR (djzn-br) wrote :

One more thing, is there a way to disable NCQ on the system?

Zrin Ziborski (zrin+launchpad) wrote :

It is for sure some sort of kernel bug and/or deficiency, because

- a process should not get blocked for several minutes or more because of a SCSI/ATA controller or HDD I/O error; no matter what the controller or HDD does (or doesn't do), it should get an I/O error / timeout

- md RAID other than level 0 should not hang for several minutes if one component hangs - it should read from the other components and try to reset the one with the problem

I've also experienced frozen tasks in D+ state from several seconds up to 1-2 minutes without any messages
logged - it might be well related to the same problem.

Fortunato Ventre (voria) wrote :

If this can be of any help, many users have reported the problem is fixed by applying this patch to the kernel:

http://marc.info/?l=linux-ide&m=122724081603679&w=2

If anyone wants to try it, a kernel is available on my PPA:

https://launchpad.net/~voria/+archive/ppa

Tormod Volden (tormodvolden) wrote :

Fortunato, interesting. Do you have any references to these user reports?

I don't think I saw so much of this issue with the 2.6.32 kernels, so testing your 2.6.32 kernel might not be so conclusive. Now with the 2.6.34 and especially 2.6.35-rc (drm-next) I see it a lot. So it would be great if you could patch a recent kernel. (I have to use newest kernels to have working suspend and to not overheat my GPU.)

DjznBR (djzn-br) wrote :

Installed the last kernel via update, and problem seems to be gone... didn't show up for 4 days now...
Used to show up at least once every 2 days.

Linux my-desktop 2.6.32-22-generic #36-Ubuntu SMP Thu Jun 3 22:02:19 UTC 2010 i686 GNU/Linux

I'll keep testing this...

DjznBR (djzn-br) wrote :

I have an ASUS motherboard and there is the SATA configuration in the BIOS. I have changed the SATA controller mode to RAID mode (there is IDE and SATA mode). I have done this, and it worked, even with one drive. The BIOS does not give many details about this...

[ 0.942521] ahci 0000:00:11.0: AHCI 0001.0100 32 slots 6 ports 3 Gbps 0x3f impl RAID mode

I get a few lines in dmesg, seems that the AHCI controller is now in action.

I did this because I was fed up last night, the computer started to halt every 2 minutes, literally.
After I set this to RAID, it stopped. Let's see if that works.
I am confused now, isn't RAID supposed to work with only 2 drives?

Changed in linux:
importance: Unknown → Medium

please read my comment here, maybe this is related?
https://bugs.launchpad.net/ubuntu/+bug/550559/comments/41

Alex Eftimie (alexeftimie) wrote :

Hi, I can confirm this bug when running 2.6.38-10 from natty-proposed; with 2.6.38-9 it does not occur.

This bug was filed against a series that is no longer supported and so is being marked as Won't Fix. If this issue still exists in a supported series, please file a new bug.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: Confirmed → Won't Fix
Changed in linux:
status: Confirmed → Expired