Ubuntu

ata1.00: exception Emask 0x0 SAct 0x807f SErr 0x0 action 0x6 frozen

Reported by Marcel on 2008-10-19
This bug affects 31 people
Affects          Status        Importance  Assigned to  Milestone
Linux            Fix Released  Unknown
linux (Ubuntu)                 Medium      Unassigned
udev (Debian)    Confirmed     Unknown
udev (Fedora)    Unknown       Unknown

Bug Description

Binary package hint: linux-headers-2.6.27-7-generic

Since I started running 8.10 alpha6 64-bit, my machine now and then freezes for 1 or more minutes.
Although I cannot pinpoint the reason, it seems to happen soon after booting and around the top of every hour.
I guess some background process is responsible, so I am including the system log entries from the freezes at 12:00 and 14:00.

I'm not sure if I filed this issue under the right package. Sorry for that.

Marcel (marcel-vd-berg) wrote :
Marcos (deflagmator) wrote :

This seems to be similar to my problem.

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/279693

kernel-janitor (kernel-janitor) wrote :

Hi Marcel,

This bug was reported a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue? Can you try with the latest development release of Ubuntu? ISO CD images are available from http://cdimage.ubuntu.com/releases/karmic .

If it remains an issue, could you run the following command from a Terminal (Applications->Accessories->Terminal). It will automatically gather and attach updated debug information to this report.

apport-collect -p linux 285892

Also, if you could test the latest upstream kernel available that would be great. It will allow additional upstream developers to examine the issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text. Please let us know your results.

Thanks in advance.

[This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: needs-kernel-logs
tags: added: needs-upstream-testing
tags: added: kj-triage
Changed in linux (Ubuntu):
status: New → Incomplete
Mikael Bergqvist (mikaelb) wrote :

I just experienced this after an upgrade from Jaunty to Karmic Beta with kernel: 2.6.31-11-generic #38-Ubuntu SMP Fri Oct 2 11:55:55 UTC 2009 i686 GNU/Linux

[ 221.816249] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 221.816279] ata1.00: cmd c8/00:08:87:95:81/00:00:00:00:00/e4 tag 0 dma 4096 in
[ 221.816285] res 40/00:fe:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
[ 221.816296] ata1.00: status: { DRDY }
[ 226.856074] ata1: link is slow to respond, please be patient (ready=0)
[ 231.840063] ata1: device not ready (errno=-16), forcing hardreset
[ 231.840080] ata1: soft resetting link
[ 232.022185] ata1.00: configured for UDMA/100
[ 232.022199] ata1.00: device reported invalid CHS sector 0
[ 232.022218] ata1: EH complete
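The header of these exception lines packs four hex fields. As far as I can tell from libata's sources, Emask uses the AC_ERR_* completion-error bits (the "(timeout)" next to "Emask 0x4" in the log agrees with 0x4 meaning timeout), SAct mirrors the SATA SActive register (one bit per outstanding NCQ tag), SErr is the SATA SError register, and action holds ATA_EH_* error-handler actions. The bit tables below are assumptions based on that reading, not a version-checked list. Decoded this way, the bug title suggests eight NCQ commands (tags 0-6 and 15) were in flight on the original reporter's machine, while the PATA ata_piix reports show SAct 0x0; action 0x6 would be soft reset plus hard reset, which fits the "forcing hardreset" and "soft resetting link" lines that follow.

```python
import re

# Error bits printed as "Emask", assumed to follow libata's AC_ERR_* values
# (include/linux/libata.h); not verified against every kernel version.
EMASK_BITS = {
    0x001: "DEV",       # device reported an error
    0x002: "HSM",       # host state machine violation
    0x004: "TIMEOUT",   # command timed out
    0x008: "MEDIA",
    0x010: "ATA_BUS",
    0x020: "HOST_BUS",
    0x040: "SYSTEM",
    0x080: "INVALID",
    0x100: "OTHER",
}

# Error-handler actions printed as "action", assumed from libata's ATA_EH_*.
ACTION_BITS = {0x1: "REVALIDATE", 0x2: "SOFTRESET", 0x4: "HARDRESET"}

EXC_RE = re.compile(
    r"exception Emask (0x[0-9a-f]+) SAct (0x[0-9a-f]+) "
    r"SErr (0x[0-9a-f]+) action (0x[0-9a-f]+)( frozen)?"
)


def bits_to_names(value, table):
    """Names of the bits set in value, lowest bit first."""
    return [name for bit, name in sorted(table.items()) if value & bit]


def decode_exception(line):
    """Split the header of a libata 'exception' line into readable fields."""
    m = EXC_RE.search(line)
    if m is None:
        raise ValueError("not a libata exception line")
    emask, sact, serr, action = (int(g, 16) for g in m.group(1, 2, 3, 4))
    return {
        "emask": bits_to_names(emask, EMASK_BITS),
        # SAct mirrors SActive: one bit per NCQ tag still outstanding.
        "ncq_tags": [tag for tag in range(32) if (sact >> tag) & 1],
        "serr": serr,
        "action": bits_to_names(action, ACTION_BITS),
        "frozen": m.group(5) is not None,
    }


if __name__ == "__main__":
    print(decode_exception(
        "ata1.00: exception Emask 0x0 SAct 0x807f SErr 0x0 action 0x6 frozen"))
```

Run on the title line, this prints `{'emask': [], 'ncq_tags': [0, 1, 2, 3, 4, 5, 6, 15], 'serr': 0, 'action': ['SOFTRESET', 'HARDRESET'], 'frozen': True}`.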

nahtgesicht (nahtgesicht) wrote :

I also have this issue with Jaunty (2.6.28-15-generic #52-Ubuntu SMP Wed Sep 9 10:49:34 UTC 2009 i686 GNU/Linux) on an IBM ThinkPad X31 with a 160GB Samsung disk:

from dmesg startup:

[ 4.087964] ata_piix 0000:00:1f.1: PCI INT A -> Link[LNKC] -> GSI 11 (level, low) -> IRQ 11
[ 4.088073] ata_piix 0000:00:1f.1: setting latency timer to 64
[ 4.088255] scsi0 : ata_piix
[ 4.088697] scsi1 : ata_piix
[ 4.091191] ata1: PATA max UDMA/100 cmd 0x1f0 ctl 0x3f6 bmdma 0x1860 irq 14
[ 4.091200] ata2: PATA max UDMA/100 cmd 0x170 ctl 0x376 bmdma 0x1868 irq 15
[ 4.254198] ata1.00: ATA-8: SAMSUNG HM160HC, LQ100-10, max UDMA/100
[ 4.254207] ata1.00: 312581808 sectors, multi 16: LBA48
[ 4.270188] ata1.00: configured for UDMA/100
[ 4.424334] scsi 0:0:0:0: Direct-Access ATA SAMSUNG HM160HC LQ10 PQ: 0 ANSI: 5
[ 4.424600] sd 0:0:0:0: [sda] 312581808 512-byte hardware sectors: (160 GB/149 GiB)
[ 4.424647] sd 0:0:0:0: [sda] Write Protect is off
[ 4.424655] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 4.424727] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 4.424887] sd 0:0:0:0: [sda] 312581808 512-byte hardware sectors: (160 GB/149 GiB)
[ 4.424929] sd 0:0:0:0: [sda] Write Protect is off
[ 4.424936] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 4.425006] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 4.425017] sda: sda1 sda2 sda3 < sda5 sda6 >
[ 4.502612] sd 0:0:0:0: [sda] Attached SCSI disk
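The startup log has enough numbers to cross-check the disk geometry: 312581808 sectors of 512 bytes is 160041885696 bytes, which gives the "160 GB/149 GiB" printed above, and the count exceeds 2**28 (the most sectors LBA28 can address), consistent with "LBA48" in the log. A minimal sketch of that arithmetic; the function names are made up for illustration:

```python
SECTOR_BYTES = 512  # "512-byte hardware sectors" in the log above


def capacity(sectors, sector_bytes=SECTOR_BYTES):
    """Return (GB, GiB) as whole numbers, in decimal and binary units."""
    total = sectors * sector_bytes
    return total // 10**9, total // 2**30


def needs_lba48(sectors):
    """LBA28 addresses at most 2**28 sectors; anything larger needs LBA48."""
    return sectors > 2**28


if __name__ == "__main__":
    print(capacity(312581808), needs_lba48(312581808))
```

For this drive it prints `(160, 149) True`.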

and now the problem:

[ 54.816078] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 54.816090] ata1.00: cmd c8/00:38:cf:88:16/00:00:00:00:00/e1 tag 0 dma 28672 in
[ 54.816092] res 40/00:80:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
[ 54.816096] ata1.00: status: { DRDY }
[ 59.856041] ata1: link is slow to respond, please be patient (ready=0)
[ 64.840127] ata1: device not ready (errno=-16), forcing hardreset
[ 64.840137] ata1: soft resetting link
[ 65.022284] ata1.00: configured for UDMA/100
[ 65.022301] ata1: EH complete
[ 65.031178] sd 0:0:0:0: [sda] 312581808 512-byte hardware sectors: (160 GB/149 GiB)
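The "cmd" and "res" lines are register dumps. As I read libata's error report, the layout is opcode/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device, and on the "res" line the first byte is the status register and the second the error register. The logs themselves support this: nsect x 512 gives the "dma 4096", "dma 28672" and "dma 16384" byte counts, and "res 40" matches "status: { DRDY }". The sketch below is hypothetical; its opcode table only covers commands seen in this thread, and the status bits listed are the ones libata's status line appears to print.

```python
import re

# Opcodes seen in this thread; deliberately incomplete.
ATA_COMMANDS = {0xA0: "PACKET", 0xC8: "READ DMA", 0xCA: "WRITE DMA",
                0xEA: "FLUSH CACHE EXT"}

# Status register bits, most significant first.
STATUS_BITS = [(0x80, "BSY"), (0x40, "DRDY"), (0x20, "DF"),
               (0x08, "DRQ"), (0x01, "ERR")]

TF_RE = re.compile(
    r"\b(cmd|res) ([0-9a-f]{2})/"
    r"([0-9a-f]{2}(?::[0-9a-f]{2}){4})/"
    r"([0-9a-f]{2}(?::[0-9a-f]{2}){4})/"
    r"([0-9a-f]{2})"
)


def parse_taskfile(line):
    """Split a libata 'cmd' or 'res' dump into named taskfile registers."""
    m = TF_RE.search(line)
    if m is None:
        raise ValueError("no cmd/res taskfile in line")
    low = [int(x, 16) for x in m.group(3).split(":")]
    return {
        "kind": m.group(1),
        "first": int(m.group(2), 16),   # opcode (cmd) or status (res)
        "feature": low[0],              # feature (cmd) or error register (res)
        "nsect": low[1],
        "lba_bytes": low[2:],           # lbal, lbam, lbah
        "hob": [int(x, 16) for x in m.group(4).split(":")],
        "device": int(m.group(5), 16),
    }


def describe_cmd(line):
    """Opcode name, transfer size and, for 28-bit DMA commands, the LBA."""
    tf = parse_taskfile(line)
    lbal, lbam, lbah = tf["lba_bytes"]
    lba28 = None
    # READ DMA / WRITE DMA are 28-bit commands: LBA bits 24-27 sit in the
    # low nibble of the device register and bit 6 (0x40) selects LBA mode.
    # (A sector count of 0 would mean 256 for them; not handled here.)
    if tf["first"] in (0xC8, 0xCA) and tf["device"] & 0x40:
        lba28 = ((tf["device"] & 0x0F) << 24) | (lbah << 16) | (lbam << 8) | lbal
    return {
        "command": ATA_COMMANDS.get(tf["first"], hex(tf["first"])),
        "sectors": tf["nsect"],
        "bytes": tf["nsect"] * 512,
        "lba28": lba28,
    }


def describe_status(line):
    """Names of the status bits set in a 'res' dump."""
    tf = parse_taskfile(line)
    return [name for bit, name in STATUS_BITS if tf["first"] & bit]


if __name__ == "__main__":
    print(describe_cmd(
        "ata1.00: cmd c8/00:08:87:95:81/00:00:00:00:00/e4 tag 0 dma 4096 in"))
```

For Mikael's first timeout this gives a READ DMA of 8 sectors (4096 bytes) at LBA 75601287.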

gab0r (gab0r) wrote :

I have the same problem. I've just upgraded to Karmic (Linux asus-lapi 2.6.31-14-generic #48-Ubuntu SMP Fri Oct 16 14:04:26 UTC 2009 i686 GNU/Linux) on my Asus M6VA laptop, also with a 160GB Samsung HDD. I have this problem only when I put the laptop into standby or resume from it, and not every time.
---
[48113.000528] ata1: drained 151 bytes to clear DRQ.
[48113.000546] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[48113.000570] ata1.00: cmd c8/00:20:a0:a9:33/00:00:00:00:00/e3 tag 0 dma 16384 in
[48113.000574] res 40/00:fe:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
[48113.000582] ata1.00: status: { DRDY }
[48113.056080] ata1: soft resetting link
[48113.246158] ata1.00: configured for UDMA/100
[48113.260560] ata1.01: configured for UDMA/33
[48113.260903] ata1.00: device reported invalid CHS sector 0
[48113.260919] ata1: EH complete

Henning Mersch (ubuntu-hmersch) wrote :

Same here on a Samsung N140, running latest Karmic kernel 2.6.31-14-generic

Anton (avelo) wrote :

Same in a Macbook2,1 using a recently installed karmic x86_64 on ext4. Kernel 2.6.31-15-generic

xamul (luigi-zanderighi) wrote :

Thanks gab0r,
I didn't notice the issue happens after standby; I use standby very often and didn't relate the issue to it.
Now I always shut down and the freeze doesn't happen any more. The system is now usable, but without standby :(((((

Graham (graham-g-lambert) wrote :

I have had the same problem on Ubuntu Karmic and SUSE 11.1. Both installations went well (apart from the fact that GRUB overwrites the disk area used by the RAID on my system in both cases; this is solved by removing stage1_5 from the GRUB installation directory: rename/move or delete the file).

After a successful install both systems started without error and fairly fast. However, after downloading the 'recent updates' (could be irrelevant - see later) that are applied after installation, the system(s) started with the above error "device reported invalid CHS sector 0". Initially this is attempted at UDMA/100 and then the bus is gradually degraded through UDMA/66 and UDMA/33, until finally the disk connection is run at the slowest speed. This takes just under 10 minutes to complete on my system, and I guess would explain the slow startup behaviour experienced by users of other systems as described above. After this the system runs very raggedly - not as smooth as I am used to with various Linux installations. I assume that the bus connection is kept at the lowest speed and the swap partition does not allow fast paging.

One might think that this is a hardware fault, but with so many people reporting the same error, here and on other forums, something tells me this is a software fault... and as it happens on more than one Linux release, it is not system specific, but likely to be linked to the GRUB bootloader itself.

The 'standby' issue raised above and not the updates might give a clue. From what I can perceive, GRUB attempts to 'resume' the system from the data stored on the swap partition when the system shuts down. If the swap partition cannot be read as expected during startup then I expect that we would see an error. Removing or commenting out the option 'resume=/dev/swap' from the grub installation file in /boot/grub/menu.lst should solve this.

I am not in a position to try this immediately but would be interested in any comments. I intend to check this myself in a couple of days.

Architecture: i386
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: gio 1944 F.... pulseaudio
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info:
 Card hw:0 'Intel'/'HDA Intel at 0xfebfc000 irq 30'
   Mixer name : 'Analog Devices AD1986A'
   Components : 'HDA:11d41986,10431153,00100500 HDA:10573055,104310c6,00100700'
   Controls : 22
   Simple ctrls : 13
DistroRelease: Ubuntu 9.10
HibernationDevice: RESUME=UUID=abf538c5-4738-48db-a72f-11d6e25cbe88
MachineType: ASUSTeK Computer Inc. A8J
NonfreeKernelModules: nvidia
Package: linux (not installed)
ProcCmdLine: root=UUID=80ae286d-b392-4ba0-b57b-6ded4fd0d5e8 ro quiet splash irqpoll
ProcEnviron:
 SHELL=/bin/bash
 PATH=(custom, no user)
 LANG=en_US.UTF-8
ProcVersionSignature: Ubuntu 2.6.31-17.54-generic
RelatedPackageVersions: linux-firmware 1.26
RfKill:
 0: phy0: Wireless LAN
  Soft blocked: no
  Hard blocked: no
Uname: Linux 2.6.31-17-generic i686
UserGroups: adm admin cdrom dialout lpadmin plugdev sambashare
WpaSupplicantLog:

dmi.bios.date: 03/27/2006
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: A8JAS.207
dmi.board.asset.tag: To Be Filled By O.E.M.
dmi.board.name: A8J
dmi.board.vendor: ASUSTeK Computer Inc.
dmi.board.version: 1.0
dmi.chassis.type: 10
dmi.chassis.vendor: ASUSTeK Computer Inc.
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvrA8JAS.207:bd03/27/2006:svnASUSTeKComputerInc.:pnA8J:pvr1.0:rvnASUSTeKComputerInc.:rnA8J:rvr1.0:cvnASUSTeKComputerInc.:ct10:cvr:
dmi.product.name: A8J
dmi.product.version: 1.0
dmi.sys.vendor: ASUSTeK Computer Inc.

Changed in linux (Ubuntu):
status: Incomplete → New
tags: added: apport-collected

I don't have this option
...
Removing or commenting out the option 'resume=/dev/swap' from the grub installation file in /boot/grub/menu.lst should solve this.
...

in my GRUB; anyway, the bug is still there, see the reports above
and the syslog below.

Dec 21 19:48:57 my2912071352 kernel: [ 3102.000243] ata1.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Dec 21 19:48:57 my2912071352 kernel: [ 3102.000266] ata1.01: cmd a0/00:00:00:00:00/00:00:00:00:00/b0 tag 0
Dec 21 19:48:57 my2912071352 kernel: [ 3102.000268] cdb 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Dec 21 19:48:57 my2912071352 kernel: [ 3102.000271] res 40/00:03:00:00:00/00:00:00:00:00/b0 Emask 0x4 (timeout)
Dec 21 19:48:57 my2912071352 kernel: [ 3102.000279] ata1.01: status: { DRDY }
Dec 21 19:49:02 my2912071352 kernel: [ 3107.040123] ata1: link is slow to respond, please be patient (ready=0)
Dec 21 19:49:07 my2912071352 kernel: [ 3112.024124] ata1: device not ready (errno=-16), forcing hardreset
Dec 21 19:49:07 my2912071352 kernel: [ 3112.024139] ata1: soft resetting link
Dec 21 19:49:08 my2912071352 kernel: [ 3112.228653] ata1.00: configured for UDMA/100
Dec 21 19:49:08 my2912071352 kernel: [ 3112.268527] ata1.01: configured for PIO0
Dec 21 19:49:08 my2912071352 kernel: [ 3112.276533] ata1: EH complete
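Graham describes the link being stepped down from UDMA/100 through UDMA/66 and UDMA/33. libata logs a "configured for MODE" line whenever it sets a device up, at boot and (judging by the logs in this thread) after every reset, so collecting those lines per device over a whole kern.log shows whether a device's mode has drifted. In the syslog above, ata1.01 (an ATAPI device, going by the PACKET opcode a0 and the cdb line) comes back in PIO0 while ata1.00 stays at UDMA/100. A small hypothetical helper:

```python
import re

# "ataN.DD: configured for MODE", e.g. "ata1.00: configured for UDMA/100".
CONF_RE = re.compile(r"(ata\d+\.\d+): configured for (\S+)")


def configured_modes(log_text):
    """Map each ataN.DD device to the modes it was configured for, in order."""
    modes = {}
    for dev, mode in CONF_RE.findall(log_text):
        modes.setdefault(dev, []).append(mode)
    return modes


def changed_devices(log_text):
    """Devices whose configured mode changed at least once in log_text."""
    return sorted(dev for dev, seq in configured_modes(log_text).items()
                  if len(set(seq)) > 1)


if __name__ == "__main__":
    import sys
    # Usage (hypothetical): python modes.py < /var/log/kern.log
    text = sys.stdin.read()
    print(configured_modes(text))
    print("changed:", changed_devices(text))
```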

Marcel (marcel-vd-berg) wrote :

In reply to kernel-janitor on 2009-08-25

I did not notice your request before, but this bug is not an issue for me anymore.
On 2008-11-03, I reinstalled 8.10 without AHCI enabled in the bios and I did not encounter any freezes.

At this moment I'm running Karmic with 2.6.31-16-generic, still with AHCI disabled in the bios.
(The only possibly related problem since Karmic is that not all partitions auto-mount after a hard reset.)

As supplementary info, the ASUS A8J BIOS doesn't have an AHCI option.
This could be a solution for someone... but not yet a fix for the bug.
Has anyone found other solutions?

javi (javuchi) wrote :

Same problem on a Samsung N140.
Windows and other Linux do not suffer this issue.

javi (javuchi) wrote :

# tail /var/log/kern.log
Dec 25 16:55:18 baddha-laptop kernel: [ 250.816351] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Dec 25 16:55:18 baddha-laptop kernel: [ 250.816395] ata1.00: cmd ca/00:08:da:4c:e6/00:00:00:00:00/ea tag 0 dma 4096 out
Dec 25 16:55:18 baddha-laptop kernel: [ 250.816402] res 40/00:0c:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
Dec 25 16:55:18 baddha-laptop kernel: [ 250.816417] ata1.00: status: { DRDY }
Dec 25 16:55:23 baddha-laptop kernel: [ 255.856306] ata1: link is slow to respond, please be patient (ready=0)
Dec 25 16:55:28 baddha-laptop kernel: [ 260.840305] ata1: device not ready (errno=-16), forcing hardreset
Dec 25 16:55:28 baddha-laptop kernel: [ 260.840330] ata1: soft resetting link
Dec 25 16:55:28 baddha-laptop kernel: [ 261.020666] ata1.00: configured for UDMA/100
Dec 25 16:55:28 baddha-laptop kernel: [ 261.020687] ata1.00: device reported invalid CHS sector 0
Dec 25 16:55:28 baddha-laptop kernel: [ 261.020720] ata1: EH complete

Please have a look at bugs #297058, #397096 and #279693,
which link together different hypotheses (hdparm, linux-rt, AHCI, hardware failures and so on) without a real way out of the bug.
I'm currently running Karmic -rt on an ASUS A8J and the same issue still remains.
....seriously, after 2 years of Ubuntu with this issue persecuting this laptop in different ways since the beginning, I'm thinking of going back to Windows...
Like many users, I use this PC for many different home and personal tasks, and bugs like this are unacceptable, given how much time and how many releases have passed without this issue ever being definitively resolved!
PS: I'm sure the HW of this machine is in very good state.

javi (javuchi) wrote :

Attention: a non-trivial workaround has been found for some of us (especially Samsung hard disk/BIOS users); look at this link:

http://wiki.archlinux.org/index.php/Samsung_N140

Please, kernel developers of Ubuntu, insert these workarounds in the next kernel version:

Here I copy the relevant parts:

---
Possible BIOS problem causes a SATA hardreset shortly after boot. This is unresolved up to Samsung N140 BIOS 04CU, and Samsung N130 BIOS 05CM, although a kernel patch is being investigated. See http://bugzilla.kernel.org/show_bug.cgi?id=14314, http://bugzilla.kernel.org/show_bug.cgi?id=13416, http://lkml.indiana.edu/hypermail/linux/kernel/0908.2/02809.html and http://lkml.indiana.edu/hypermail/linux/kernel/0911.3/01604.html .
A summary of the status as currently understood:
About 5 minutes after boot or resume, the BIOS switches on some power saving features which were not enabled at boot. It enables additional (sleepier) processor C-states, and sends power management instructions to the HDD. It does these behind the operating system's back -- not using ACPI, which would be handled correctly by Linux. Instead the sudden change results in a SATA exception at the first disk access following the switch. At that point the SATA driver resets the disk to resolve the problem. The result: the user sees a complete system freeze for about 30 seconds, after which operation of the machine continues normally. This can occur during the periodic fsck at boot if it is running at switch time. Either Samsung needs to be convinced to fix the BIOS, or the Linux kernel needs to be modified to behave more gracefully (Windows doesn't freeze noticeably if at all).
It has been reported that some OpenSUSE kernels [1] do not freeze, and testing is in progress in the Arch Forums. The patch libata-ata_piix-clear-spurious-IRQ has been reported to resolve the freezing problem. (Hint: to look at the rpm use rpmextract, and then untar config.tar.bz2 and patches.*.tar.bz2).
There is a kernel patch available which changes the backlight brightness using SMI instead of poking PCI config space. It provides a kernel module called "samsung-laptop". Interestingly a special (as yet unreleased?) BIOS for the N130 can be informed that the OS is Linux by a version of this patch which is included in OpenSUSE 11.1. The effect of this hasn't been published.
The N140 and N130 BIOSes have Phoenix FailSafe (you have been warned). It's not clear if the SATA problem has any relation to this.
Version 01CM of the N130 BIOS has been reported to not cause freezes, unlike all later ones which do.
This problem is hazardous for your filesystem so take precautions. For example use ext3 (not ext4) with option data=journal and install backup software.

The hard disk in the A8J is an ST9100824A Ultra ATA/100 100GB (Seagate).

Sandeep Wadhwa (wadhwa100) wrote :

ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen.

I am facing the same problem on the Samsung N-128 netbook with Ubuntu 9.10 UNR. The freeze lasts about 20 seconds and happens about 5 minutes after switching on. After that it doesn't repeat as long as the netbook is on. On the next start, the same problem again. Output from my dmesg:

                        Monitor-Mwait will be used to enter C-2 state
[ 245.665234] Marking TSC unstable due to TSC halts in idle
[ 284.816184] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 284.816225] ata1.00: cmd ca/00:08:f6:1e:85/00:00:00:00:00/ed tag 0 dma 4096 out
[ 284.816232] res 40/00:ff:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
[ 284.816246] ata1.00: status: { DRDY }
[ 289.856117] ata1: link is slow to respond, please be patient (ready=0)
[ 294.840109] ata1: device not ready (errno=-16), forcing hardreset
[ 294.840133] ata1: soft resetting link
[ 295.022466] ata1.00: configured for UDMA/133
[ 295.022487] ata1.00: device reported invalid CHS sector 0
[ 295.022516] ata1: EH complete
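The bracketed dmesg timestamps are seconds since boot, which makes two numbers in this thread easy to check. The exception above at 284.8 s is about 4.7 minutes after boot, in line with the "about 5 minutes" in the Arch wiki notes, and the span from the exception to "EH complete" is about 10.2 s. That span only covers error handling after the command had already timed out, so the freeze a user notices is presumably longer, as in the 20 to 30 second estimates here. A sketch that pairs each exception with the next "EH complete"; it assumes one error episode at a time in the log:

```python
import re

# dmesg prefixes each line with "[seconds.micro]" since boot.
TS_RE = re.compile(r"\[\s*(\d+\.\d+)\]")


def eh_windows(lines):
    """Yield (start, end) timestamps from each 'exception' to the next 'EH complete'."""
    start = None
    for line in lines:
        m = TS_RE.search(line)
        if m is None:
            continue
        t = float(m.group(1))
        if "exception Emask" in line and start is None:
            start = t
        elif "EH complete" in line and start is not None:
            yield start, t
            start = None


if __name__ == "__main__":
    sample = [
        "[ 284.816184] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen",
        "[ 295.022516] ata1: EH complete",
    ]
    for start, end in eh_windows(sample):
        print(f"{start / 60:.1f} min after boot, EH took {end - start:.1f} s")
```

On the two lines from Sandeep's log it prints `4.7 min after boot, EH took 10.2 s`; the same regex also matches the syslog-prefixed lines in kern.log.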

With today's latest update the laptop is no longer experiencing HD freezes,
maybe thanks to a temporarily good combination of kernel and other modules.
Anyway, NO kernel update has occurred since the last bugs, only updates to other modules.
Hope it stays stable...
linux 2.6.31-9-rt
using ext3 filesystem

tags: added: kernel-core kernel-needs-review
removed: needs-upstream-testing
Changed in linux (Ubuntu):
status: New → Triaged
importance: Undecided → Medium
tags: added: kernel-candidate kernel-reviewed
removed: kernel-needs-review
tags: removed: kernel-candidate
tulskiy (tulskiy) wrote :

Is it 64-bit related? Will it stop if I install 32-bit system?

I experienced this on a machine with a 32-bit CPU, so it's not 64-bit related. More importantly, as far as I understand, this error message has more than one source and is related to several kernel vs. hardware issues. As for me, I have it fixed, but other people with other hardware are still suffering.

On Thu, 22 Jul 2010 12:09:15 -0000, tulskiy <email address hidden> wrote:

>Is it 64-bit related? Will it stop if I install 32-bit system?

DjznBR (djzn-br) wrote :

OK, after biting the bullet, I called it a day again... everything I wrote before about fglrx, google-chrome and Flash Player is crap, meaning it was a huge coincidence... Also forget BIOS settings, RAID, AHCI and stuff like that, because this is only in the kernel...

I have dumped Ubuntu 10.04 and moved on to Archlinux. You know, with this one you need to dig deep... very deep... I'm not saying that Ubuntu is not good; in fact, it's the best distro around... but I had been missing the do-it-yourself approach for a long time... I still have an Ubuntu live pendrive for other tasks.

Turns out that this bug is also present in Archlinux current kernel. And it manifests this way:

ata3: softreset failed (device not ready)
ata3: applying SB600 PMP SRST workaround and retrying
ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

This bug is surely related to the AMD SB600/700 controller mixed with certain hard disks (Samsung, mine).

Like I said a few posts back, it has something to do with the kernel configuration option CONFIG_SATA_PMP; some say that if you turn it off, this bug stops. I have yet to test this in Arch. You know, it's hard to compile a kernel in Ubuntu the traditional way while keeping everything together and not breaking the package manager or something. Similarly, you need to be careful in Arch too: even though you can easily compile the traditional way, you need to properly create a kernel headers package if, like me, you use the fglrx beast, because it wants to compile a new module for every newly installed kernel.

But if there is a way to avoid all this kernel mambo-jambo, I am gonna try it first:
I have turned off NCQ in my system by adding this option to the GRUB kernel parameters:

libata.force=noncq

I am currently testing this with no errors so far, next time I come here, it will be with another parameter.
In the meanwhile, CONFIG_SATA_PMP may be your next adventure...
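Several kernel parameters get tried in this thread (libata.force=noncq, libata.force=1.5Gbps, libata.force=noncq,norst, libata.noacpi=1). They only matter if they actually reached the running kernel, and /proc/cmdline shows the command line it was booted with. The kernel-parameters documentation describes libata.force as a comma-separated list of [ID:]VAL entries; the hypothetical helper below just splits the values and does not interpret the optional port/device ID prefix. It also assumes there are no quoted values on the command line.

```python
import os


def libata_options(cmdline):
    """Collect libata.* parameters from a kernel command line.

    Returns {name: [comma-separated values]}; assumes whitespace-separated
    tokens with no quoting.
    """
    opts = {}
    for token in cmdline.split():
        if token.startswith("libata."):
            name, _, value = token[len("libata."):].partition("=")
            opts[name] = value.split(",") if value else []
    return opts


if __name__ == "__main__":
    # On Linux, show what the running kernel was actually booted with.
    if os.path.exists("/proc/cmdline"):
        with open("/proc/cmdline") as f:
            print(libata_options(f.read()))
```

An empty result on a system where one of these options was added to menu.lst would point at the GRUB entry not being the one that booted.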

DjznBR (djzn-br) wrote :

libata.force=noncq <--- no go. Problem persists even with ncq being turned off.
Next step now is to configure a new kernel through the ABS method, and without PMP option enabled.
I'll let you guys know.

Brian Neu (brianwneu) wrote :

Hey I wanted to mention that I just got this on a Fedora box (2.6.30.10-105.2.23.fc11.i586) today. Yesterday I had to swap out the motherboard, AND I HAD TO CHANGE AN IDE CABLE. This never happened with the old motherboard and cable.

I will try to put in a new cable next week and will report back. I don't know what motherboard settings would be appropriate to change.

DjznBR, try to put in a new cable and re-route it away from where-ever it's currently routed. I think you have a SATA cable, but the concept is the same.

Jul 23 09:44:04 cl1 kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jul 23 09:44:04 cl1 kernel: ata1.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Jul 23 09:44:04 cl1 kernel: res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 23 09:44:04 cl1 kernel: ata1.00: status: { DRDY }
Jul 23 09:44:09 cl1 kernel: ata1: link is slow to respond, please be patient (ready=0)
Jul 23 09:44:14 cl1 kernel: ata1: device not ready (errno=-16), forcing hardreset
Jul 23 09:44:14 cl1 kernel: ata1: soft resetting link
Jul 23 09:44:20 cl1 kernel: ata1: link is slow to respond, please be patient (ready=0)
Jul 23 09:44:25 cl1 kernel: ata1: SRST failed (errno=-16)
Jul 23 09:44:25 cl1 kernel: ata1: soft resetting link
Jul 23 09:44:30 cl1 kernel: ata1: link is slow to respond, please be patient (ready=0)
Jul 23 09:44:35 cl1 kernel: ata1: SRST failed (errno=-16)
Jul 23 09:44:35 cl1 kernel: ata1: soft resetting link
Jul 23 09:44:37 cl1 kernel: ata1.00: configured for UDMA/100
Jul 23 09:44:37 cl1 kernel: ata1.00: device reported invalid CHS sector 0
Jul 23 09:44:37 cl1 kernel: ata1: EH complete
Jul 23 09:44:37 cl1 kernel: end_request: I/O error, dev sda, sector 385366997
Jul 23 09:44:37 cl1 kernel: md: super_written gets error=-5, uptodate=0
Jul 23 09:44:37 cl1 kernel: raid1: Disk failure on sda2, disabling device.
Jul 23 09:44:37 cl1 kernel: raid1: Operation continuing on 1 devices.
Jul 23 09:44:37 cl1 kernel: RAID1 conf printout:
Jul 23 09:44:37 cl1 kernel: --- wd:1 rd:2
Jul 23 09:44:37 cl1 kernel: disk 0, wo:1, o:0, dev:sda2
Jul 23 09:44:37 cl1 kernel: disk 1, wo:0, o:1, dev:sdc2
Jul 23 09:44:37 cl1 kernel: RAID1 conf printout:
Jul 23 09:44:37 cl1 kernel: --- wd:1 rd:2
Jul 23 09:44:37 cl1 kernel: disk 1, wo:0, o:1, dev:sdc2

tulskiy (tulskiy) wrote :

@Grigory Rechistov: I should've read your message first... At least I have a 32-bit system now and java doesn't eat twice as much memory. If only I knew about pae-enabled kernels before...

Anyways, I've had only one hang-up in these two days, as opposed to one every 5 minutes. On the other hand, having a several-second break from time to time is a good thing, huh?

DjznBR (djzn-br) wrote :

I have changed cables, but I was stubborn and stuck it in the same SATA port. I am going to change this. I don't think it's going to work, but I will try, after I test kernel parameter "libata.force=1.5Gbps" - UNFORTUNATELY, CONFIG_SATA_PMP="n" *DOES NOT WORK* to fix this issue... it is only a rumour in some Fedora forum I read. But I did manage to compile a brand new kernel with that option turned off. Minutes later, there was the system hanging up again!

Here is a list of what I did:

[X] TURNED HDPARM OFF
[X] CHANGED CABLE
[X] EXPERIMENTED AHCI & RAID MODES
[X] DISABLED NCQ
[X] COMPILED KERNEL WITH CONFIG_SATA_PMP DISABLED
[X] TRYING NOW LIBATA.FORCE=1.5GBPS
[ ] - to be done - try different route.

DjznBR (djzn-br) wrote :

libata.force=1.5Gbps DIDN'T WORK as well...
Changed the cables to different routes... SATA1 -> SATA2, SATA2 -> SATA3

DjznBR (djzn-br) wrote :

Back. Changing the cables did not work. I may consider a few kernel options, such as turning off some ACPI options.

Here is a list of what I did WITHOUT SUCCESS:

[X] TURNED HDPARM OFF
[X] CHANGED CABLE
[X] EXPERIMENTED AHCI & RAID MODES
[X] DISABLED NCQ
[X] COMPILED KERNEL WITH CONFIG_SATA_PMP DISABLED
[X] TRYING NOW LIBATA.FORCE=1.5GBPS
[X] CHANGED CABLE ROUTES

DjznBR (djzn-br) wrote :

Added option in kernel,
In my current GRUB (Arch):

kernel /boot/vmlinuz26 root=/dev/sda1 ro nomodeset libata.noacpi=1

The option you might want to test:
libata.noacpi=1

So far no hang ups.

I still get these messages in dmesg (without any symptom):

ata3: softreset failed (device not ready)
ata3: applying SB600 PMP SRST workaround and retrying
ata5: softreset failed (device not ready)
ata5: applying SB600 PMP SRST workaround and retrying
ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300)

And this ones after dmesg is finished:

ata3.00: configured for UDMA/133
ata3: EH complete

We'll see if this keeps this bug quiet.

DjznBR (djzn-br) wrote :

OK, I tried different and combined things, like libata.noacpi=1 and libata.force=noncq,norst; the last of these was meant to block the soft and hard resets, which eventually caused the whole system to crash with a kernel panic the first time this bug came up.

Now I am hopeless... I can tell that nothing in the realm of what a user can do will solve this. At least I got to a conclusion.

So what do you guys advise...

Do I trash my SAMSUNG drive, along with my trust in this company...

or

Do I trash my ASUS M3A78-EM equipped with SB700 chipset...

DjznBR (djzn-br) wrote :

I may try to go back to kernel 2.6.24.7, by the time Hardy Heron was released. This bug got introduced right after Intrepid Ibex, so I am gonna try Hardy Heron-time kernel. Stock 2.6.24.7 in ArchLinux.

Do you still think it may be a hardware issue? I have an ASUS X83Vm notebook
with a Seagate hard drive. I have no idea how to get more details about my
motherboard.

I don't get that many hang-ups after reinstall so I can't tell yet
if libata.noacpi=1 is working. BTW, what side effects does libata.noacpi=1
have?

Best,
Denis Tulskiy

DjznBR (djzn-br) wrote :

Hi there, libata.noacpi=1 seems to have no side-effects... the only kernel switch that had a side effect was libata.force=norst which prevents soft and hard link resettings. If you have that switch on, when this bug comes up, there is a system lock down (because obviously the kernel prevented the soft & hard resetting.) Other switches had no effect, and I gave up trying.

I am furious about AMD SB700... but M3A78-EM is a good board. I think your notebook uses Intel chipset, and it's funny that it's also occurring with Seagate.

I may replace the SAMSUNG drive first; we'll see how it goes (I need a larger one anyway).

DjznBR (djzn-br) wrote :

I just performed a full test on this hard drive and ESTOOL reported a bad sector at LBA 287034602. Now I wonder whether the hardware influenced the test itself, or whether it is a real bad sector. Guess I will have to do one more test.

DjznBR (djzn-br) wrote :

Concluded the second test, BAD BLOCK confirmed, at the same spot.
Looks to me this is the trouble maker, and this made me look ridiculous...
I was believing that fsck would "grasp" any inconsistencies or bad blocks upon boot up, silly me, it just does partial check-ups.
Samsung's ESTOOL utility ended my rage quest against AMD SB700 and Samsung themselves.

Guess it's time for a backup and a badblocks -svw /dev/sda3
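ESTOOL reports an absolute LBA on the whole disk, whereas badblocks, run on /dev/sda3, numbers blocks from the start of that partition in units of its block size (1024 bytes unless -b says otherwise). The partition's start and size, both in 512-byte sectors, can be read from /sys/block/sda/sda3/start and /sys/block/sda/sda3/size. The geometry in the example call below is hypothetical, not DjznBR's real layout. Note also that -w is badblocks' destructive write-mode test, which is why the backup has to come first.

```python
SECTOR = 512  # LBAs count 512-byte sectors


def badblocks_block(abs_lba, part_start, part_sectors, block_size=1024):
    """Convert an absolute LBA into a block number relative to a partition.

    part_start and part_sectors are in 512-byte sectors (as in
    /sys/block/sda/sda3/start and .../size); block_size matches badblocks -b.
    """
    offset = abs_lba - part_start
    if not 0 <= offset < part_sectors:
        raise ValueError("LBA is not inside this partition")
    return offset * SECTOR // block_size


if __name__ == "__main__":
    # Hypothetical partition geometry, for illustration only.
    print(badblocks_block(287034602, part_start=200000000, part_sectors=112581808))
```

With that made-up geometry, the bad sector would show up as block 43517301 in the badblocks output.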

Brian Neu (brianwneu) wrote :

@DjznBR My cable swap was equally ineffective. I'm assuming that I have the same bad block problems.

DjznBR (djzn-br) wrote :

I think I have had this bad block for a year now... the thing is that I would never write much data on this disk. The problem started when the heads actually started to cross over this spot, since I almost filled the disk a couple of times.

I never cared to make a full surface scan using the factory tool from SAMSUNG. Neither cared to do this with fsck. I've always relied on fsck partial checks, and I think when the thing starts going really bad you get warned. Otherwise, you don't.

So, download the factory surface scan app from your HDD manufacturer's website, and do a full scan. It may take an hour, depending on how large the disk is (40 minutes for this 160GB one).

One thing for sure is that I will never be buying SAMSUNG hard drives again. I remember someone said to me that they were a "so-so" hard disk brand, and my previous disk was a Seagate Spinpoint in 2004 which I believe it is still kicking ass for someone I sold to. Western Digital made the RMA record for me, counting 3 RMA'ed drives in 2 years. So I stopped buying from them too.

I am gonna buy a Seagate 500 GB. However, I am gonna low-level format this problematic drive and see if the bad block can be remapped. At least I can still use it in a spare machine.

Zrin Ziborski (zrin+launchpad) wrote :

In the end, it seems that this bug is related to
- HDD problems like unreadable blocks / sectors
- possibly changes in how HDDs act in such situations
- some controllers like the one in AMD SB700
- kernel / libata changes after 2.6.24 (?)

I worry that maybe
- HDD manufacturers are changing the way drives act on hardware problems
- relocating sectors takes more time than intended and/or is producing unexpected states in controller and/or kernel driver
- kernel driver does not handle these situations properly

So the appropriate suggestion would be to
- check HDD thoroughly, check SMART state, check seek times, check reading and writing speed / throughput (!)
- replace the HDD with a "RAID-ready" HDD - these devices limit the time spent relocating sectors or doing whatever "self-healing" they do
- when you get a new HDD, write to all sectors before using it, e.g. dd if=/dev/zero of=/dev/sdf bs=256K
- press kernel / libata developers to investigate the problem
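Checking the SMART state, as suggested in the list above, mostly comes down to reading a few counters from smartctl -A. A small sketch that parses that table and flags non-zero sector-health counters; the choice of watched attributes is an assumption, the sample rows are made up for illustration, and some vendors encode raw values differently, so treat the result as a hint rather than a verdict.

```python
# Sector-health attributes whose raw value is a plain count on most drives
# (an assumption; some vendors encode raw values differently).
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector",
           "Offline_Uncorrectable")


def parse_smart_attributes(text: str) -> dict[str, int]:
    """Return {attribute_name: raw_value} for rows of a `smartctl -A` table."""
    attrs: dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID; RAW_VALUE is the 10th column.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            attrs[fields[1]] = int(fields[9])
    return attrs


def suspicious(attrs: dict[str, int]) -> dict[str, int]:
    """Keep only the watched attributes that have a non-zero raw value."""
    return {name: attrs[name] for name in WATCHED if attrs.get(name, 0) > 0}


# Made-up sample in smartctl's table layout, for illustration only.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       3
194 Temperature_Celsius     0x0022   115   095   000    Old_age   Always       -       37 (Min/Max 22/45)
"""

print(suspicious(parse_smart_attributes(SAMPLE)))
```

On a live system you would feed it subprocess.run(["smartctl", "-A", "/dev/sda"], capture_output=True, text=True).stdout instead of the sample (root is usually required).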

I had the problem with SB700 and SATA WD VelociRaptor 150GB and WD Green 2 TB drives.

best regards + best luck

Mesa (astewart) wrote :

Confirming this is still an issue in Maverick beta - it's worse than ever. In Lucid I had this error occasionally; since moving to Maverick it takes two or three failed boots (dropping to the BusyBox shell because the disk couldn't be found) plus lots of keypresses during boot to get it going. It's as if the interrupts from the keypresses kick it back into life.

Once it has booted it's generally OK, with the errors and a 30-second system hang occurring only every hour or so.

uname -a
Linux ion-laptop 2.6.35-19-generic #28-Ubuntu SMP Sun Aug 29 06:36:51 UTC 2010 i686 GNU/Linux

Sep 3 20:35:04 ion-laptop kernel: [ 1.924433] ata1.00: ATA-8: SAMSUNG HM160HI, HH100-06, max UDMA7
Sep 3 20:35:04 ion-laptop kernel: [ 1.924440] ata1.00: 312581808 sectors, multi 16: LBA48 NCQ (depth 31/32)
Sep 3 20:35:04 ion-laptop kernel: [ 1.930502] ata1.00: configured for UDMA/133
Sep 3 20:35:04 ion-laptop kernel: [ 2.677862] EXT3-fs (sda6): mounted filesystem with ordered data mode
Sep 3 20:36:37 ion-laptop kernel: [ 155.872072] ata1.00: exception Emask 0x0 SAct 0x2 SErr 0x0 action 0x6 frozen
Sep 3 20:36:37 ion-laptop kernel: [ 155.872087] ata1.00: failed command: READ FPDMA QUEUED
Sep 3 20:36:37 ion-laptop kernel: [ 155.872102] ata1.00: cmd 60/08:08:7b:1d:55/00:00:12:00:00/40 tag 1 ncq 4096 in
Sep 3 20:36:37 ion-laptop kernel: [ 155.872112] ata1.00: status: { DRDY }
Sep 3 20:36:37 ion-laptop kernel: [ 155.872123] ata1: hard resetting link
Sep 3 20:36:38 ion-laptop kernel: [ 156.644058] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Sep 3 20:36:38 ion-laptop kernel: [ 156.654458] ata1.00: configured for UDMA/133
Sep 3 20:36:38 ion-laptop kernel: [ 156.654471] ata1.00: device reported invalid CHS sector 0
Sep 3 20:36:38 ion-laptop kernel: [ 156.654488] ata1: EH complete
Sep 3 20:36:38 ion-laptop kernel: [ 156.692079] ata1.00: configured for UDMA/133
Sep 3 20:36:38 ion-laptop kernel: [ 156.692094] ata1: EH complete
Sep 3 20:36:51 ion-laptop kernel: [ 169.465314] ata1.00: configured for UDMA/133
Sep 3 20:36:51 ion-laptop kernel: [ 169.465333] ata1: EH complete
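The cmd line in these reports is libata's dump of the ATA taskfile registers, and it tells you exactly which sectors the drive choked on. A sketch of a decoder, assuming the field order command/feature:count:lbal:lbam:lbah/hob_feature:hob_count:hob_lbal:hob_lbam:hob_lbah/device that libata's error handler prints. For the NCQ commands (0x60/0x61) the sector count sits in the feature registers and the tag in bits 7:3 of the count register, so the line above decodes to tag 1, 8 sectors (the 4096 bytes shown), starting at LBA 307567995.

```python
import re

NCQ_COMMANDS = {0x60: "READ FPDMA QUEUED", 0x61: "WRITE FPDMA QUEUED"}
_BYTES5 = r"((?:[0-9a-f]{2}:){4}[0-9a-f]{2})"
_TASKFILE = re.compile(r"([0-9a-f]{2})/" + _BYTES5 + "/" + _BYTES5 + r"/([0-9a-f]{2})")


def decode_taskfile(cmd: str) -> dict:
    """Decode libata's 'cmd CC/FF:NN:LL:MM:HH/hf:hn:hl:hm:hh/DD' register dump."""
    m = _TASKFILE.fullmatch(cmd.strip())
    if m is None:
        raise ValueError(f"unrecognised taskfile: {cmd!r}")
    command = int(m.group(1), 16)
    feature, nsect, lbal, lbam, lbah = (int(x, 16) for x in m.group(2).split(":"))
    hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(x, 16) for x in m.group(3).split(":"))
    # 48-bit LBA: the "hob" (high-order byte) registers carry bits 24-47.
    lba = (hob_lbah << 40 | hob_lbam << 32 | hob_lbal << 24
           | lbah << 16 | lbam << 8 | lbal)
    if command in NCQ_COMMANDS:
        # NCQ: sector count lives in FEATURE, the queue tag in COUNT bits 7:3.
        sectors = hob_feature << 8 | feature
        tag = nsect >> 3
    else:
        sectors = hob_nsect << 8 | nsect
        tag = None
    # For 48-bit data commands a count of 0 would mean 65536 sectors.
    return {"command": command, "name": NCQ_COMMANDS.get(command),
            "lba": lba, "sectors": sectors, "tag": tag}


print(decode_taskfile("60/08:08:7b:1d:55/00:00:12:00:00/40"))
```

The same function covers the non-NCQ READ DMA EXT (0x25) and WRITE DMA EXT (0x35) lines elsewhere in the thread, where the count comes from the count registers: 35/00:30:... is 48 sectors, i.e. the 24576 bytes printed next to it.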

João Pinto (joaopinto) wrote :

I am also experiencing this problem with Maverick.

The disk is:
Western Digital Caviar Black: WD1002FAEX-00Z3A0, 1 TB

The error:
[ 1870.860322] ata1: lost interrupt (Status 0x50)
[ 1870.860343] ata1.00: exception Emask 0x10 SAct 0x0 SErr 0x5850002 action 0xe frozen
[ 1870.860348] ata1.00: SError: { RecovComm PHYRdyChg CommWake LinkSeq TrStaTrns DevExch }
[ 1870.860351] ata1.00: failed command: WRITE DMA EXT
[ 1870.860357] ata1.00: cmd 35/00:30:f8:5f:f6/00:00:56:00:00/e0 tag 0 dma 24576 out
[ 1870.860358] res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x14 (ATA bus error)
[ 1870.860361] ata1.00: status: { DRDY }
[ 1870.860369] ata1.00: hard resetting link
[ 1871.609326] ata1.01: hard resetting link
[ 1872.118761] ata1.00: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 1872.118774] ata1.01: SATA link down (SStatus 0 SControl 300)
[ 1872.178939] ata1.00: configured for UDMA/133
[ 1872.178947] ata1.00: device reported invalid CHS sector 0
[ 1872.178954] ata1: EH complete
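SErr 0x5850002 is a bitmask, and the SError line right below it is libata spelling out the set bits. A small decoder, with bit positions and names following the SERR_* definitions in the kernel's include/linux/ata.h as I recall them; for 0x5850002 it reproduces exactly the six names printed above. A combination like PHYRdyChg, CommWake and DevExch usually points at the link dropping and coming back (cabling, power or controller) rather than at the media.

```python
# SError bit positions and the names libata prints for them.
SERROR_BITS = {
    0: "RecovData", 1: "RecovComm", 8: "UnrecovData", 9: "Persist",
    10: "Proto", 11: "HostInt", 16: "PHYRdyChg", 17: "PHYInt",
    18: "CommWake", 19: "10B8B", 20: "Dispar", 21: "BadCRC",
    22: "Handshk", 23: "LinkSeq", 24: "TrStaTrns", 25: "UnrecFIS",
    26: "DevExch",
}


def decode_serror(serr: int) -> list[str]:
    """Name each set SError bit, lowest first; unnamed bits become 'bitN'."""
    return [SERROR_BITS.get(b, f"bit{b}") for b in range(32) if (serr >> b) & 1]


print(decode_serror(0x5850002))
```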

My hard drive's SMART check shows about 2 million Command Timeouts and a high Seek Error Rate. There are two wires, which seem to be the power cables, connected to the plate the hard drive is mounted on, and they are a bit loose. Check your power cables and SMART results.

Best,
Denis Tulskiy

João Pinto (joaopinto) wrote :

I have switched the HD, and I am sure the cables are properly plugged in now. Also, I don't experience this issue with a different OS, so it's very unlikely to be a cabling issue.

Wow, guys, I have the fullest respect for your patience and persistence after the long time this bug has been around.

Well, what can I say? Two months ago, I built myself a brand new machine with the following components:

Asus P7H57D-V Evo
Intel Core i7 875K
4 x 4 GB Corsair PC3-16000 (XMS3)
2 x Corsair Force F120 @ RAID 0 mounted on /
2 x Seagate Constellation ES 2 TB @ RAID 0 mounted on /home
BeQuiet Dark Power Pro 750 W

Initially I had the SSDs connected to the mainboard's SATA 3 ports (provided by a Marvell 88SE9125 chip), and with that setup I couldn't even finish the installation (Ubuntu Maverick AMD64). So I moved the SSDs to the Intel controller's ports 1 and 2 and hooked the Seagate HDDs up to the Marvell. Installation went fine, but ever since I have been getting the freezes described on this page. They sometimes last 20 seconds or so and vanish without doing any harm, but I seem to notice a higher frequency since last week's kernel package update. Sometimes, however, the drives won't recover and the whole machine gets stuck, unable even to shut down properly.

Up to now I've never lost any data, but it's a really annoying issue and I'd like to get rid of it. Desperate as I was, a week ago I connected all six drives (2 x SSD, 2 x HDD, 2 x BDD) to the internal Intel chipset controller and, what can I say? No freeze since then, and I'm currently running the machine several hours per day on average.

Needless to say, all drives are fine. I don't have any other OS installed, but I've tested the drives one by one more than once. I even ran read/write tests, which found nothing (other than confirming there is no hardware issue). So... has anybody from the development team (kernel?) ever looked at this issue and tried to investigate further? I'm more than willing to help, but at the moment I don't really see what else I could do.

Thanks for reading! :)

K1300S

description: updated
Raj B (bigwoof) wrote :

fwiw, I am having the same problems with the latest Natty packages.

The kernel is
Linux mythtv 2.6.37-11-generic #25-Ubuntu SMP Tue Dec 21 23:42:56 UTC 2010 x86_64 GNU/Linux

My setup is
Asus P7P55D-E LX Motherboard with the Marvell 88SE9125 SATA 3 controller
Intel i5 760 CPU
PC1333 8G RAM (4GB x 2 Corsair)
5 x SATA 2 2 TB disks

1 x SATA 3 Seagate 2 TB disk (used as the boot disk and as the disk everything below was being written to), plugged in as SATA 3 using AHCI

1 PVR 150 Video Capture card
Asus ENGT430 Graphics Card

I'm getting a ton of
[ 5884.881538] ata9.00: failed command: WRITE FPDMA QUEUED
[ 5884.909807] ata9.00: cmd 61/08:20:f0:71:c5/00:00:73:00:00/40 tag 4 ncq 4096 out
[ 5884.909810] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 5885.022592] ata9.00: status: { DRDY }
[ 5885.051123] ata9: hard resetting link
[ 5885.591937] ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 370)
[ 5885.593771] ata9.00: configured for UDMA/133
[ 5885.593996] ata9.00: device reported invalid CHS sector 0
[ 5885.594002] ata9.00: device reported invalid CHS sector 0
[ 5885.594013] ata9: EH complete

errors in the logs. I noticed this because ivtv was outputting a ton of "Unable to Save MPG stream" errors. I thought it was a bug in ivtv but now I realize that it was a SATA 3 error and that the drive had become read-only.
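The SStatus value in the "SATA link up" lines is printed in hex and packs three 4-bit fields defined by the SATA spec: DET (bits 3:0), SPD (bits 7:4) and IPM (bits 11:8). A minimal decoder, as a sketch; it shows that SStatus 133 here is a 6.0 Gbps link, while the 113 and 123 values elsewhere in the thread correspond to 1.5 and 3.0 Gbps, and SStatus 0 means no device was detected on that link.

```python
DET = {0: "no device detected", 1: "device present, no PHY communication",
       3: "device present, PHY communication established", 4: "PHY offline"}
SPD = {0: "no speed negotiated", 1: "1.5 Gbps", 2: "3.0 Gbps", 3: "6.0 Gbps"}
IPM = {0: "not present / no communication", 1: "active",
       2: "partial", 6: "slumber"}


def decode_sstatus(sstatus: int) -> dict[str, str]:
    """Split a SATA SStatus value into its DET, SPD and IPM fields."""
    det, spd, ipm = sstatus & 0xF, (sstatus >> 4) & 0xF, (sstatus >> 8) & 0xF
    return {
        "det": DET.get(det, f"reserved ({det})"),
        "speed": SPD.get(spd, f"reserved ({spd})"),
        "power": IPM.get(ipm, f"reserved ({ipm})"),
    }


# libata prints SStatus in hex, so "SStatus 133" is 0x133.
print(decode_sstatus(int("133", 16)))
```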

This has happened a few times now, and the system locks up hard each time. It still answers ping, but nothing is running. I had ssh access during one of these events and nothing worked (reboot did nothing, all processes were in zombie state, etc.), which makes sense as the root drive was now read-only and inoperable.

I've lost data because of this as well. My entire /var/lib/mysql directory was blown away and recovered into lost+found, and other directories ended up there as well.

I'm going to a) move the SATA 3 drive to the SATA 2 controller, and b) reinstall Ubuntu (as I'm not sure what went missing in the latest crash). I'm a little surprised that this bug has survived multiple kernel revisions.

Raj

Amaeth (dfoxpro) wrote :

I get this error, and after that it seems as if the disk gets locked up (or damaged), because the error then persists in Windows until it throws a blue screen:

Jan 18 14:08:03 familia-K7S41GX kernel: [14769.056043] ata2: lost interrupt (Status 0x50)
Jan 18 14:08:03 familia-K7S41GX kernel: [14769.056157] ata2: soft resetting link
Jan 18 14:08:03 familia-K7S41GX kernel: [14769.233285] ata2.00: configured for UDMA/33
Jan 18 14:08:03 familia-K7S41GX kernel: [14769.233302] ata2.00: device reported invalid CHS sector 0
Jan 18 14:08:03 familia-K7S41GX kernel: [14769.233327] ata2: EH complete
Jan 18 14:09:17 familia-K7S41GX kernel: [14843.008081] ata2: lost interrupt (Status 0x50)
Jan 18 14:09:17 familia-K7S41GX kernel: [14843.008121] ata2.00: limiting speed to UDMA/25:PIO4
Jan 18 14:09:17 familia-K7S41GX kernel: [14843.008203] ata2: soft resetting link
Jan 18 14:09:17 familia-K7S41GX kernel: [14843.184811] ata2.00: configured for UDMA/25
Jan 18 14:09:17 familia-K7S41GX kernel: [14843.184828] ata2.00: device reported invalid CHS sector 0
Jan 18 14:09:17 familia-K7S41GX kernel: [14843.184852] ata2: EH complete
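The "lost interrupt (Status 0x50)" lines print the raw ATA Status register. Decoding it shows BSY and DRQ clear with DRDY set, so as far as the register goes the device looks finished and ready, but the completion interrupt never reached the driver. A sketch; bit 4 was Device Seek Complete (DSC) in older ATA revisions and is obsolete or command-specific in newer ones, so its name here is only a label.

```python
# ATA Status register bits, most significant first.
STATUS_BITS = (
    (0x80, "BSY"),   # device busy
    (0x40, "DRDY"),  # device ready
    (0x20, "DF"),    # device fault
    (0x10, "DSC"),   # legacy Device Seek Complete (obsolete in later specs)
    (0x08, "DRQ"),   # data request
    (0x01, "ERR"),   # error; details are in the Error register
)


def decode_status(status: int) -> list[str]:
    """Return the names of the set Status bits."""
    return [name for mask, name in STATUS_BITS if status & mask]


print(decode_status(0x50))
```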

Changed in linux:
status: Unknown → Fix Released
PsYcHoK9 (psychok9) wrote :

I've worked around it temporarily by adding this kernel parameter:
libata.force=noncq

When will the fix be released?

Please read my comment here; maybe it's related:
https://bugs.launchpad.net/ubuntu/+bug/550559/comments/41
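To check whether libata.force=noncq actually took effect, the per-disk queue depth in sysfs is a quick indicator: as far as I know it is greater than 1 (typically 31) with NCQ in use and 1 without. A sketch that collects it for all sd* disks, assuming the usual /sys/block/<dev>/device/queue_depth layout; the directory is a parameter only so the function can be pointed elsewhere.

```python
from pathlib import Path


def queue_depths(sys_block: str = "/sys/block") -> dict[str, int]:
    """Map each sd* disk to its SCSI queue depth (1 means NCQ is not in use)."""
    depths: dict[str, int] = {}
    for dev in sorted(Path(sys_block).glob("sd*")):
        qd_file = dev / "device" / "queue_depth"
        if qd_file.is_file():
            depths[dev.name] = int(qd_file.read_text().strip())
    return depths


if __name__ == "__main__":
    for disk, depth in queue_depths().items():
        state = "NCQ off" if depth == 1 else f"NCQ on (depth {depth})"
        print(f"{disk}: {state}")
```

As I understand the libata documentation, writing 1 to that file as root (echo 1 > /sys/block/sda/device/queue_depth) turns NCQ off for a single disk at runtime, which is a less blunt experiment than the global boot parameter.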

Mesa (astewart) wrote :

Since upgrading to the Natty alpha about a month ago, the issue has more or less disappeared for me (see my earlier comment about the problem in Lucid and Maverick). Previously I got it on every single boot (i.e. daily), plus intermittently on top of that; since upgrading I've had it only once.

Note that upgrading to an alpha release is a bad idea for most people; wait for the proper release unless you can live with the breakage.

It would be good to know whether it's also fixed now for Raj B, as he was running an earlier version of Natty than I am.

Jarek T. (ulvhedin) wrote :

Hi, it looks like this problem still exists in the 2.6.38 kernel.
Is anyone working on this?

Regards
Jarek

meWho (mewho) wrote :

Hi, I can also confirm the problem: I am experiencing it with Ubuntu 10.10 (2.6.35-28-generic) on a Dell Latitude E6500. Dell's diagnostic tool reports no errors (I have checked several times).

Travis Ogdon (togdon) wrote :

I'm still seeing the problem in the most recent build of Natty as well:

2.6.38-8-generic #42-Ubuntu SMP Mon Apr 11 03:31:24 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
ext4 / partition for the whole drive (minus a bit of swap)

This is on a new (to me) box that was happily running Windows 7 prior to me installing Natty Beta2.

Hopefully relevant hardware information:

SAMSUNG Spinpoint F1 HD753LJ 750GB hard drive.
Intel Core i7-920 processor
JMicron Technology Corp. JMB362/JMB363 Serial ATA Controller (rev 03)
nVidia Corporation G92 [GeForce 9800 GT] (rev a2) (with the proprietary driver installed)

It happens whether the SATA controller is set to IDE or AHCI mode in the BIOS.

I've also tried the vanilla mainline kernel (2.6.38-02063803-generic #201104150912 SMP Fri Apr 15 09:15:15 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux from http://kernel.ubuntu.com/~kernel-ppa/mainline/v2.6.38.3-natty/) and had the same failures.

NCQ seems to be supported everywhere that matters:

[ 3.181549] ata1.00: ATA-7: SAMSUNG HD753LJ, 1AA01114, max UDMA7
[ 3.181551] ata1.00: 1465149168 sectors, multi 16: LBA48 NCQ (depth 0/32)
...
[ 5.427976] ahci 0000:02:00.0: version 3.0
[ 5.427984] ahci 0000:02:00.0: PCI INT A -> GSI 19 (level, low) -> IRQ 19
[ 5.455655] ahci 0000:02:00.0: AHCI 0001.0000 32 slots 2 ports 3 Gbps 0x3 impl SATA mode
[ 5.455661] ahci 0000:02:00.0: flags: 64bit ncq pm led clo pmp pio slum part
[ 5.455668] ahci 0000:02:00.0: setting latency timer to 64
[ 5.456043] scsi6 : ahci
[ 5.456267] scsi7 : ahci
[ 5.456348] ata7: SATA max UDMA/133 abar m8192@0xf7dfe000 port 0xf7dfe100 irq 19
[ 5.456353] ata8: SATA max UDMA/133 abar m8192@0xf7dfe000 port 0xf7dfe180 irq 19

Here's the error that I see:

[ 190.934060] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 190.934065] ata1.00: failed command: FLUSH CACHE EXT
[ 190.934073] ata1.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
[ 190.934074] res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 190.934077] ata1.00: status: { DRDY }
[ 190.934087] ata1.00: hard resetting link
[ 191.283287] ata1.01: hard resetting link
[ 196.822167] ata1.00: link is slow to respond, please be patient (ready=0)
[ 200.953882] ata1.00: SRST failed (errno=-16)
[ 200.953889] ata1.00: hard resetting link
[ 201.303204] ata1.01: hard resetting link
[ 206.842098] ata1.00: link is slow to respond, please be patient (ready=0)

What's odd is that the box seemed OK until I started copying my data over to it; now it's essentially unusable.

I can consistently lock up the drive by running a SMART scan on it (smartctl --test=short /dev/sda), so if someone needs a way to consistently repeat the error, there you go.
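Every report in this thread starts with an "exception Emask ... SAct ... SErr ... action ... frozen" line. SAct is the bitmask of NCQ tags still outstanding when error handling started (0x0 in the log above, since FLUSH CACHE EXT is not a queued command; 0x807f in this bug's title means tags 0-6 and 15), and action is the recovery libata chose. A sketch of a parser, with the action bit values taken from the ATA_EH_* flags in include/linux/libata.h as I recall them; 0x6 is soft reset plus hard reset, which matches the "hard resetting link" lines that follow.

```python
import re

# Recovery action bits (ATA_EH_* in include/linux/libata.h, as recalled).
EH_ACTIONS = {0x1: "revalidate", 0x2: "softreset", 0x4: "hardreset",
              0x8: "enable_link"}
_EXCEPTION = re.compile(
    r"exception Emask (0x[0-9a-f]+) SAct (0x[0-9a-f]+) "
    r"SErr (0x[0-9a-f]+) action (0x[0-9a-f]+)")


def parse_exception(line: str) -> dict:
    """Decode the header line libata prints when error handling starts."""
    m = _EXCEPTION.search(line)
    if m is None:
        raise ValueError("not a libata exception line")
    emask, sact, serr, action = (int(g, 16) for g in m.groups())
    return {
        "emask": emask,
        # Each set bit in SAct is an NCQ tag that had not completed yet.
        "outstanding_tags": [t for t in range(32) if (sact >> t) & 1],
        "serr": serr,
        "actions": [name for bit, name in EH_ACTIONS.items() if action & bit],
        "frozen": line.rstrip().endswith("frozen"),
    }


print(parse_exception("ata1.00: exception Emask 0x0 SAct 0x807f SErr 0x0 action 0x6 frozen"))
```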

Same issue here.

OS: Ubuntu Server lucid
Kernel: 2.6.38-7-generic (kernel-ppa)
HD: WDC WD10EARS-00Y

Error:

[ 222.848056] ata2: lost interrupt (Status 0x50)
[ 222.848094] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 222.848166] ata2.00: failed command: READ DMA EXT
[ 222.848234] ata2.00: cmd 25/00:60:e0:a6:a7/00:00:4d:00:00/e0 tag 0 dma 49152 in
[ 222.848238] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 222.848353] ata2.00: status: { DRDY }
[ 222.848414] ata2: soft resetting link
[ 223.056419] ata2.00: configured for UDMA/133
[ 223.056436] ata2.00: device reported invalid CHS sector 0
[ 223.056465] ata2: EH complete
[ 973.008042] ata2: lost interrupt (Status 0x50)
[ 973.008080] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 973.008111] ata2.00: failed command: READ DMA EXT
[ 973.008140] ata2.00: cmd 25/00:08:78:c3:b1/00:00:6a:00:00/e0 tag 0 dma 4096 in
[ 973.008144] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 973.008179] ata2.00: status: { DRDY }
[ 973.008201] ata2: soft resetting link
[ 973.697072] ata2.00: configured for UDMA/133
[ 973.697090] ata2.00: device reported invalid CHS sector 0
[ 973.697116] ata2: EH complete
[ 3694.048044] ata2: lost interrupt (Status 0x50)
[ 3694.048081] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 3694.048113] ata2.00: failed command: WRITE DMA EXT
[ 3694.048142] ata2.00: cmd 35/00:20:e8:2b:a3/00:00:64:00:00/e0 tag 0 dma 16384 out
[ 3694.048146] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 3694.048182] ata2.00: status: { DRDY }
[ 3694.048211] ata2: soft resetting link
[ 3694.244372] ata2.00: configured for UDMA/133
[ 3694.244389] ata2.00: device reported invalid CHS sector 0
[ 3694.244415] ata2: EH complete

Liunx (liunx163) wrote :

I have had the same problem recently.
Ubuntu 11.04 Natty
Linux enet 2.6.38-8-server #42-Ubuntu SMP Mon Apr 11 03:49:04 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
Model: WDC WD3200AAJS-00L7A0
Capacity: 320 GB
Spindle speed: 7200 rpm
Cache: 8 MB
Interface: SATA Rev 2.5
Transfer rate: 300 MB/s
Features: S.M.A.R.T., 48-bit LBA, NCQ

[ 5216.002643] ata1: soft resetting link
[ 5216.180056] ata1.01: NODEV after polling detection
[ 5216.180062] ata1.01: revalidation failed (errno=-2)
[ 5221.160032] ata1: soft resetting link
[ 5221.380820] ata1.00: configured for UDMA/133
[ 5221.420145] ata1.01: configured for UDMA/100
[ 5221.420940] ata1: EH complete
[ 5276.002197] ata1.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 5276.002205] ata1.01: ST_FIRST: !(DRQ|ERR|DF)
[ 5276.002212] sr 0:0:1:0: CDB: Get event status notification: 4a 01 00 00 10 00 00 00 08 00
[ 5276.002233] ata1.01: cmd a0/00:00:00:08:00/00:00:00:00:00/b0 tag 0 pio 16392 in
[ 5276.002236] res 00/00:00:00:08:00/00:00:00:00:00/b0 Emask 0x2 (HSM violation)
[ 5276.002249] ata1: soft resetting link
[ 5276.170183] ata1.01: NODEV after polling detection
[ 5276.170189] ata1.01: revalidation failed (errno=-2)
[ 5281.170031] ata1: soft resetting link
[ 5281.410467] ata1.00: configured for UDMA/133
[ 5281.450172] ata1.01: configured for UDMA/100
[ 5281.450991] ata1: EH complete
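The Emask value on the res and exception lines is libata's completion error mask, and the word in parentheses is a single label picked from it by priority. A sketch, assuming the AC_ERR_* bit values from include/linux/libata.h and the check order of the kernel's ata_err_string() as I remember it; it explains why 0x14 (timeout plus ATA bus error) is printed as "ATA bus error". In the log above, the sr 0:0:1:0 line and the a0 (PACKET) command suggest that the HSM violation came from the optical drive on ata1.01 rather than from the hard disk.

```python
# libata completion error bits (AC_ERR_* in include/linux/libata.h).
AC_ERR = {
    0x001: "device error", 0x002: "HSM violation", 0x004: "timeout",
    0x008: "media error", 0x010: "ATA bus error", 0x020: "host bus error",
    0x040: "internal error", 0x080: "invalid argument", 0x100: "unknown error",
}
# Order in which the kernel picks the one label shown in parentheses (assumed).
LABEL_PRIORITY = (0x020, 0x010, 0x004, 0x002, 0x040, 0x008, 0x080, 0x001)


def emask_flags(emask: int) -> list[str]:
    """Names of every error bit set in an Emask value, lowest bit first."""
    return [name for bit, name in AC_ERR.items() if emask & bit]


def emask_label(emask: int) -> str:
    """The single label libata would print next to this Emask."""
    for bit in LABEL_PRIORITY:
        if emask & bit:
            return AC_ERR[bit]
    return "unknown error"


print(emask_flags(0x14), emask_label(0x14))
```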

Changed in udev (Debian):
status: Unknown → Confirmed
John Doe (b2109455) wrote :

I have the same problem on an Asus E35M1-I DELUXE with two Samsung Spinpoint F1s and one F4 EcoGreen, running Ubuntu 11.04 (Linux 2.6.38-8-server):

[68623.060362] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[68623.060521] ata2.00: failed command: WRITE DMA EXT
[68623.060626] ata2.00: cmd 35/00:00:9f:a2:05/00:04:40:00:00/e0 tag 0 dma 524288 out
[68623.060630] res 40/00:ff:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
[68623.060896] ata2.00: status: { DRDY }
[68623.060976] ata2: hard resetting link
[68633.090302] ata2: softreset failed (device not ready)
[68633.090423] ata2: hard resetting link
[68643.120265] ata2: softreset failed (device not ready)
[68643.120387] ata2: hard resetting link
[68653.750324] ata2: link is slow to respond, please be patient (ready=0)
[68678.170307] ata2: softreset failed (device not ready)
[68678.170429] ata2: limiting SATA link speed to 1.5 Gbps
[68678.170439] ata2: hard resetting link
[68683.380272] ata2: softreset failed (device not ready)
[68683.380391] ata2: reset failed, giving up
[68683.380471] ata2.00: disabled
[68683.380491] ata2.00: device reported invalid CHS sector 0
[68683.380524] ata2: EH complete

Turning off NCQ didn't help, and smartctl and fsck didn't reveal any problems.
Pretty annoying bug, which has been lingering for a long time.

This bug might be the same as bug #640525, which I encounter on a Dell Inspiron 9300 with a Samsung HM160HC HD.

It's still present in Oneiric with all kernels up to and including 3.0.0-14-generic.

Symptom: the system very often hangs, either while starting a KDE or GNOME session (right after bootup) or when waking up from resume. The hang coincides with a steadily lit HD LED and log entries very much like those in the previous comment.

Sergei Andreev (seajey) wrote :

Same error here:

Linux Bellerophon-117 3.0.0-15-generic #24-Ubuntu SMP Mon Dec 12 15:23:55 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux

Model Family: Seagate Barracuda 7200.11
Device Model: ST31500341AS
Serial Number: 9VS0931W
LU WWN Device Id: 5 000c50 01051e2ab
Firmware Version: SD17
User Capacity: 1 500 301 910 016 bytes [1,50 TB]
Sector Size: 512 bytes logical/physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 4
Local Time is: Sun Dec 18 00:34:09 2011 MSK

17.12.11 23:29:42 Bellerophon-117 kernel [33659.872047] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
17.12.11 23:29:42 Bellerophon-117 kernel [33659.872055] ata3.00: failed command: FLUSH CACHE EXT
17.12.11 23:29:42 Bellerophon-117 kernel [33659.872065] ata3.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
17.12.11 23:29:42 Bellerophon-117 kernel [33659.872067] res 40/00:00:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout)
17.12.11 23:29:42 Bellerophon-117 kernel [33659.872072] ata3.00: status: { DRDY }
17.12.11 23:29:42 Bellerophon-117 kernel [33659.872079] ata3: hard resetting link
17.12.11 23:29:42 Bellerophon-117 kernel [33660.364022] ata3: softreset failed (device not ready)
17.12.11 23:29:42 Bellerophon-117 kernel [33660.364029] ata3: applying SB600 PMP SRST workaround and retrying
17.12.11 23:29:42 Bellerophon-117 kernel [33660.536032] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
17.12.11 23:29:42 Bellerophon-117 kernel [33660.539209] ata3.00: configured for UDMA/133
17.12.11 23:29:42 Bellerophon-117 kernel [33660.539213] ata3.00: retrying FLUSH 0xea Emask 0x4
17.12.11 23:29:42 Bellerophon-117 kernel [33660.552015] ata3.00: device reported invalid CHS sector 0
17.12.11 23:29:42 Bellerophon-117 kernel [33660.552026] ata3: EH complete

Marcel, this bug report is being closed due to your last comment https://bugs.launchpad.net/ubuntu/+source/linux/+bug/285892/comments/32 regarding this being fixed with a BIOS configuration change. For future reference you can manage the status of your own bugs by clicking on the current status in the yellow line and then choosing a new status in the revealed drop down box. You can learn more about bug statuses at https://wiki.ubuntu.com/Bugs/Status. Thank you again for taking the time to report this bug and helping to make Ubuntu better. Please submit any future bugs you may find.

Changed in linux (Ubuntu):
status: Triaged → Invalid