hdd problems, failed command: READ FPDMA QUEUED

Bug #550559 reported by Crashbit on 2010-03-28
This bug affects 102 people
Affects: linux (Ubuntu)
Importance: Undecided
Assigned to: Unassigned

Bug Description

Hello!

I have a brand-new computer with an SSD and a SATA hard drive, specifically a 2 TB Seagate Barracuda XT (6 Gb/s). The latter is connected to a Marvell 9123 controller, which I set to AHCI mode in the BIOS.

The OS is installed on the SSD, but reading from the 2 TB disk produces several errors.

I tried moving the disk to another controller and got the same problem; I even removed the disk's partition table, with the same result.

I checked the disk for flaws from Windows with HD Tune and with the manufacturer's official verification tool, and neither reports any errors.

I have tested kernel version 2.6.34-rc2, and it works properly with this disk.

The errors are the following:

[ 9.115544] ata9: exception Emask 0x0 SAct 0xf SErr 0x0 action 0x10 frozen
[ 9.115550] ata9.00: failed command: READ FPDMA QUEUED
[ 9.115556] ata9.00: cmd 60/04:00:d4:82:85/00:00:1f:00:00/40 tag 0 ncq 2048 in
[ 9.115557] res 40/00:18:d3:82:85/00:00:1f:00:00/40 Emask 0x4 (timeout)
[ 9.115560] ata9.00: status: { DRDY }
[ 9.115562] ata9.00: failed command: READ FPDMA QUEUED
[ 9.115568] ata9.00: cmd 60/01:08:d1:82:85/00:00:1f:00:00/40 tag 1 ncq 512 in
[ 9.115569] res 40/00:18:d3:82:85/00:00:1f:00:00/40 Emask 0x4 (timeout)
[ 9.115572] ata9.00: status: { DRDY }
[ 9.115574] ata9.00: failed command: READ FPDMA QUEUED
[ 9.115579] ata9.00: cmd 60/01:10:d2:82:85/00:00:1f:00:00/40 tag 2 ncq 512 in
[ 9.115581] res 40/00:18:d3:82:85/00:00:1f:00:00/40 Emask 0x4 (timeout)
[ 9.115583] ata9.00: status: { DRDY }
[ 9.115586] ata9.00: failed command: READ FPDMA QUEUED
[ 9.115591] ata9.00: cmd 60/01:18:d3:82:85/00:00:1f:00:00/40 tag 3 ncq 512 in
[ 9.115592] res 40/00:18:d3:82:85/00:00:1f:00:00/40 Emask 0x4 (timeout)
[ 9.115595] ata9.00: status: { DRDY }
[ 9.115609] sd 8:0:0:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 9.115612] sd 8:0:0:0: [sdb] Sense Key : Aborted Command [current] [descriptor]
[ 9.115616] Descriptor sense data with sense descriptors (in hex):
[ 9.115618] 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
[ 9.115626] 1f 85 82 d3
[ 9.115629] sd 8:0:0:0: [sdb] Add. Sense: No additional sense information
[ 9.115633] sd 8:0:0:0: [sdb] CDB: Read(10): 28 00 1f 85 82 d4 00 00 04 00
[ 9.115640] end_request: I/O error, dev sdb, sector 528843476
[ 9.115643] __ratelimit: 18 callbacks suppressed
[ 9.115646] Buffer I/O error on device sdb2, logical block 317299556
[ 9.115649] Buffer I/O error on device sdb2, logical block 317299557
[ 9.115652] Buffer I/O error on device sdb2, logical block 317299558
[ 9.115655] Buffer I/O error on device sdb2, logical block 317299559
[ 9.115671] sd 8:0:0:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 9.115674] sd 8:0:0:0: [sdb] Sense Key : Aborted Command [current] [descriptor]
[ 9.115678] Descriptor sense data with sense descriptors (in hex):
[ 9.115679] 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
[ 9.115687] 1f 85 82 d3
[ 9.115690] sd 8:0:0:0: [sdb] Add. Sense: No additional sense information
[ 9.115693] sd 8:0:0:0: [sdb] CDB: Read(10): 28 00 1f 85 82 d1 00 00 01 00
[ 9.115700] end_request: I/O error, dev sdb, sector 528843473
[ 9.115702] Buffer I/O error on device sdb2, logical block 317299553
[ 9.115707] sd 8:0:0:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 9.115710] sd 8:0:0:0: [sdb] Sense Key : Aborted Command [current] [descriptor]
[ 9.115714] Descriptor sense data with sense descriptors (in hex):
[ 9.115716] 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
[ 9.115723] 1f 85 82 d3
[ 9.115726] sd 8:0:0:0: [sdb] Add. Sense: No additional sense information
[ 9.115729] sd 8:0:0:0: [sdb] CDB: Read(10): 28 00 1f 85 82 d2 00 00 01 00
[ 9.115736] end_request: I/O error, dev sdb, sector 528843474
[ 9.115738] Buffer I/O error on device sdb2, logical block 317299554
[ 9.115743] sd 8:0:0:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 9.115746] sd 8:0:0:0: [sdb] Sense Key : Aborted Command [current] [descriptor]
[ 9.115749] Descriptor sense data with sense descriptors (in hex):
[ 9.115751] 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
[ 9.115759] 1f 85 82 d3
[ 9.115762] sd 8:0:0:0: [sdb] Add. Sense: No additional sense information
[ 9.115765] sd 8:0:0:0: [sdb] CDB: Read(10): 28 00 1f 85 82 d3 00 00 01 00
[ 9.115771] end_request: I/O error, dev sdb, sector 528843475
[ 9.115774] Buffer I/O error on device sdb2, logical block 317299555
[ 16.243531] sd 8:0:0:0: timing out command, waited 7s
[ 23.241557] sd 8:0:0:0: timing out command, waited 7s
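The numbers in this log can be cross-checked against each other. Below is a minimal Python sketch of a decoder, assuming the field order libata's error handler uses when printing a taskfile (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device) and the ATA convention that READ FPDMA QUEUED (0x60) carries its sector count in the feature fields and its NCQ tag in bits 7:3 of nsect. The function names are illustrative only, not part of any existing tool.

```python
# Cross-check libata taskfiles, SCSI READ(10) CDBs and descriptor sense data.
# Assumed taskfile field order (as printed by libata's error handler):
# command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device

NCQ_COMMANDS = {0x60, 0x61}  # READ FPDMA QUEUED, WRITE FPDMA QUEUED


def decode_taskfile(tf: str) -> dict:
    """Decode a libata 'cmd' taskfile string into command, LBA, sector count and NCQ tag."""
    cmd, low, high, device = tf.split("/")
    feature, nsect, lbal, lbam, lbah = (int(x, 16) for x in low.split(":"))
    hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah = (int(x, 16) for x in high.split(":"))
    lba = (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24) | (lbah << 16) | (lbam << 8) | lbal
    command = int(cmd, 16)
    if command in NCQ_COMMANDS:
        # NCQ: the sector count lives in the feature registers, the tag in nsect bits 7:3.
        sectors = (hob_feature << 8) | feature
        tag = nsect >> 3
    else:
        # Non-queued LBA48 commands carry the count in nsect (0 would mean 65536; not handled).
        sectors = (hob_nsect << 8) | nsect
        tag = None
    return {"command": command, "lba": lba, "sectors": sectors, "tag": tag, "device": int(device, 16)}


def decode_read10(cdb_hex: str) -> tuple:
    """Return (LBA, transfer length in blocks) from a SCSI READ(10) CDB."""
    cdb = bytes.fromhex(cdb_hex)
    if cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    return int.from_bytes(cdb[2:6], "big"), int.from_bytes(cdb[7:9], "big")


def decode_descriptor_sense(sense_hex: str) -> tuple:
    """Return (sense key, ASC, ASCQ, Information field or None) from descriptor-format sense data."""
    sense = bytes.fromhex(sense_hex)
    if (sense[0] & 0x7F) not in (0x72, 0x73):
        raise ValueError("not descriptor-format sense data")
    info = None
    pos, end = 8, min(8 + sense[7], len(sense))
    while pos + 2 <= end:
        dtype, dlen = sense[pos], sense[pos + 1]
        if dtype == 0x00 and dlen >= 0x0A:  # Information descriptor: 8-byte value at offset 4
            info = int.from_bytes(sense[pos + 4:pos + 12], "big")
        pos += 2 + dlen
    return sense[1] & 0x0F, sense[2], sense[3], info


if __name__ == "__main__":
    print(decode_taskfile("60/04:00:d4:82:85/00:00:1f:00:00/40"))
    print(decode_read10("28 00 1f 85 82 d4 00 00 04 00"))
    print(decode_descriptor_sense("72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00 1f 85 82 d3"))
```

Decoding cmd 60/04:00:d4:82:85/00:00:1f:00:00/40 gives LBA 528843476, 4 sectors, tag 0: the same sector end_request reports, and the same LBA and length as the Read(10) CDB. The sense Information field (1f 85 82 d3 = 528843475) matches the LBA in the res lines, and the four failed commands together cover the adjacent sectors 528843473 to 528843479.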

lsb_release -rd
Description: Ubuntu lucid (development branch)
Release: 10.04
ignasi@ignasi-desktop:~$

ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: yelp 2.29.5-0ubuntu3
ProcVersionSignature: Ubuntu 2.6.32-17.26-generic 2.6.32.10+drm33.1
Uname: Linux 2.6.32-17-generic x86_64
NonfreeKernelModules: nvidia
Architecture: amd64
Date: Mon Mar 29 01:06:27 2010
ExecutablePath: /usr/bin/yelp
InstallationMedia: Ubuntu 10.04 "Lucid Lynx" - Beta amd64 (20100318)
ProcEnviron:
 LANG=ca_ES.utf8
 SHELL=/bin/bash
SourcePackage: yelp

Crashbit (crashbit-gmail) wrote :

Sorry!

Adding my dmesg.

Crashbit (crashbit-gmail) wrote :

Oops!

I connected the Seagate Barracuda XT 6Gb/s to the JMicron (JMB361) controller instead of the Marvell 9123, and no errors appear under Linux.
I think the problem is the Marvell 9123 controller.

Crashbit (crashbit-gmail) wrote :

The problem is still here!

If I connect the Seagate disk to the JMicron controller it works fine, but if I copy the /home directory to this disk and modify fstab and the UUIDs to mount /home from it, Ubuntu doesn't start.
The problem seems similar.

Pho Dyssey (phodyssey) wrote :

Did you solve your problem? It looks like I have a similar issue with the same disk and the same controller (Fedora 13 beta, 2.6.33.3).

Pho

Tony T (tonytovar) wrote :

Also suffering from this, but only at boot-up and only with Lucid 10.04 'Final'. I have a new Dell Latitude E5500 laptop with a traditional SATA drive (not SSD). I initially installed the Lucid Beta-2 and have steadily updated from there. I'm not sure when these errors first appeared, but now my boot-up is delayed by 30 s, then a screenful of these errors pops up before the GUI finally loads.

I haven't tested any other kernels (e.g. 2.6.34); I'm just running the current 2.6.32-22.

Alex Watson (alexfromapex) wrote :

Same here except my GUI never loads. So basically this error has bricked my Ubuntu installation. I have gone into recovery mode and tried all of the options:

dpkg - fix broken packages
netroot - terminal with networking
grub - update grub
failsafeX - supposed to boot into a failsafe version of X but just goes to a blank screen....

I tried booting other kernels, but it's the same story....

Lucas Hope (lucas-r-hope) wrote :

I am struggling with this problem. I have tried a few kernels from http://kernel.ubuntu.com/~kernel-ppa/mainline/, specifically http://kernel.ubuntu.com/~kernel-ppa/mainline/v2.6.34-lucid/ and 2.6.35-lucid-rc1.

The same issue seems to appear here: http://ubuntuforums.org/showthread.php?t=1396465

and here:

http://vip.asus.com/forum/view.aspx?board_id=1&model=P6X58D%20Premium&id=20100702050055531&page=1&SLanguage=en-us

The second post in that link implies that switching to a non-Marvell hard drive port is a workaround which may fix the problem. That is what I am trying now.

Crashbit (crashbit-gmail) wrote :

Hey!

Greetings again!
I have the same problem with Maverick.
The kernel is 2.6.35-12; if I connect the disk to the JMicron controller, I get no errors.
The output of lspci -k for the Marvell 9123 controller is:

05:00.0 SATA controller: Device 1b4b:9123 (rev 10)
 Kernel driver in use: ahci
 Kernel modules: ahci
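To check whether a machine has the same controller as in this report (PCI ID 1b4b:9123, driven by ahci), sysfs can be read directly instead of parsing lspci output. A minimal Python sketch, assuming the standard /sys/bus/pci/devices layout with class, vendor and device files and a driver symlink per PCI function; PCI class 0x0106 is SATA (0x010601 for AHCI). The helper names are hypothetical.

```python
# Hypothetical helper: list SATA-class PCI functions from sysfs and flag the Marvell 9123.
import os
from pathlib import Path

MARVELL_9123 = ("0x1b4b", "0x9123")


def sata_controllers(root: str = "/sys/bus/pci/devices") -> list:
    """Return (address, vendor, device, driver) for every PCI function with class 0x0106xx (SATA)."""
    base = Path(root)
    if not base.is_dir():
        return []
    found = []
    for dev in sorted(base.iterdir()):
        try:
            pci_class = (dev / "class").read_text().strip()
            vendor = (dev / "vendor").read_text().strip()
            device = (dev / "device").read_text().strip()
        except OSError:
            continue  # not a readable PCI function directory
        if not pci_class.startswith("0x0106"):
            continue
        link = dev / "driver"
        # The driver symlink points at .../drivers/<name>; no symlink means no driver bound.
        driver = os.path.basename(os.readlink(link)) if link.is_symlink() else None
        found.append((dev.name, vendor, device, driver))
    return found


if __name__ == "__main__":
    for addr, vendor, device, driver in sata_controllers():
        note = "  <-- Marvell 9123" if (vendor, device) == MARVELL_9123 else ""
        print(f"{addr} {vendor[2:]}:{device[2:]} driver={driver}{note}")
```

Run as a script, it prints every SATA controller with its bound driver and marks the Marvell part; the root argument exists only so the function can be pointed at a test directory.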

tags: added: maverick
Lucas Hope (lucas-r-hope) wrote :

Crashbit's post reminded me: switching to the non-Marvell port fixed the problem for me, but it should be considered a WORKAROUND. You can't get SATA3 speeds from it; for me, the drive was SATA2 anyway. You might also have to set your BIOS to AHCI or SATA mode.

It is probably a good idea to re-install linux once you change the ports, too, as disk corruption caused by the original problem can cause ongoing crashes.

I changed the port, re-installed, and have had zero problems for the last two weeks.

moojix (moojix) wrote :

I had the same ugly ata errors with my Asus P7P55D-E Premium and a Crucial C300 SATA drive.
lspci: SATA controller: Device 1b4b:9123 (rev 10)

My workaround: disable NCQ. Now I can use my SATA3 drive through the Marvell 9123 controller on my motherboard.
(see: http://ubuntuforums.org/showpost.php?p=9684933&postcount=12)
I tested this workaround with iozone3 without any errors.

before this workaround:
Aug 6 09:58:08 st-002 kernel: [ 3.249455] ata5.00: ATA-9: C300-CTFDDAC256MAG, 0002, max UDMA/100
Aug 6 09:58:08 st-002 kernel: [ 3.249461] ata5.00: 500118192 sectors, multi 1: LBA48 NCQ (depth 31/32), AA

after this workaround:
Aug 6 10:01:36 st-002 kernel: [ 3.369991] ata5.00: ATA-9: C300-CTFDDAC256MAG, 0002, max UDMA/100
Aug 6 10:01:36 st-002 kernel: [ 3.369996] ata5.00: 500118192 sectors, multi 1: LBA48 NCQ (not used)

I haven't found out whether this bug has been patched in any Linux kernel yet (I'm using 2.6.32-24 64-bit).
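Whether NCQ is active can be read from the libata probe lines quoted above: "NCQ (depth 31/32)" when it is in use (as I read libata's probe code, the host-side queue depth versus the depth the drive advertises) and "NCQ (not used)" when it has been disabled for that device, for example with libata.force=noncq. A small Python parser, assuming exactly that message wording:

```python
import re

# Matches libata's probe line, e.g. "ata5.00: ... LBA48 NCQ (depth 31/32), AA" or "... NCQ (not used)".
NCQ_RE = re.compile(r"\b(ata\d+\.\d+): .*\bNCQ \((?:depth (\d+)/(\d+)|(not used))\)")


def ncq_status(line: str):
    """Return (device, (host_depth, drive_depth)) if NCQ is active, (device, None) if not used, else None."""
    m = NCQ_RE.search(line)
    if m is None:
        return None  # not an NCQ probe line
    dev, host_depth, drive_depth, not_used = m.groups()
    if not_used:
        return dev, None
    return dev, (int(host_depth), int(drive_depth))
```

Applied to the before/after lines above, it reports ("ata5.00", (31, 32)) and ("ata5.00", None) respectively.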

moojix (moojix) wrote :

marvell 9123 sata ahci initialization errors: https://bugzilla.kernel.org/show_bug.cgi?id=15573

Colan Schwartz (colan) wrote :

Confirming this in Lucid.

Changed in ubuntu:
status: New → Confirmed
Colan Schwartz (colan) wrote :

Oct 3 02:40:55 tiger kernel: [447432.011325] ata5.00: exception Emask 0x0 SAct 0x3ffff SErr 0x0 action 0x6 frozen
Oct 3 02:40:55 tiger kernel: [447432.011334] ata5.00: failed command: READ FPDMA QUEUED
Oct 3 02:40:55 tiger kernel: [447432.011344] ata5.00: cmd 60/80:00:bf:1f:3c/00:00:27:00:00/40 tag 0 ncq 65536 in
Oct 3 02:40:55 tiger kernel: [447432.011346] res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 3 02:40:55 tiger kernel: [447432.011350] ata5.00: status: { DRDY }
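The SAct value in the "exception" line is a bitmask of the NCQ tags that were still outstanding when error handling started: bit n set means tag n was in flight. Here 0x3ffff means 18 queued commands (tags 0 to 17) were pending when the timeout hit, and the 0xf in the original report corresponds to the four failed tags 0 to 3. A minimal Python decoder; the function names are illustrative:

```python
import re

SACT_RE = re.compile(r"\bSAct (0x[0-9a-fA-F]+)")


def outstanding_tags(sact: int) -> list:
    """Each set bit n means NCQ tag n was still outstanding on the link."""
    return [tag for tag in range(32) if (sact >> tag) & 1]


def sact_from_line(line: str):
    """Extract the SAct value from a libata 'exception Emask ...' line, or None if absent."""
    m = SACT_RE.search(line)
    return int(m.group(1), 16) if m else None
```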

João Pinto (joaopinto) wrote :

I am having the same issue in Maverick with a WD Caviar Black 1TB SATA III disk in AHCI mode.

Vangelis Tasoulas (cyberang3l) wrote :

I am affected by the same bug too :(

pepre (ea1256) wrote :

Same here :-(

After adding "libata.force=noncq" to the kernel boot parameters, the error changed to

exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
irq_stat 0x40000001
failed command: READ DMA EXT
cmd 25/00:e0:df:d7:f8/00:00:88:00:00/e0 tag 0 dma 114688 in
         res 51/40:00:f5:d7:f8/00:00:88:00:00/e0 Emask 0x9 (media error)
status: { DRDY ERR }
error: { UNC }
configured for UDMA/133
EH complete

when reading big files fast. I think the disaster began suddenly about two months ago.

Running Lucid, fully up to date. HDs: RAID5/LVM.
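With NCQ disabled, the res line of the READ DMA EXT failure carries real information. Decoding it in Python, assuming the same taskfile field order as in the cmd lines (with status and error in place of command and feature) and the standard ATA Status/Error register bit positions: status 0x51 includes DRDY and ERR, error 0x40 is UNC (uncorrectable data), and LBA 0x88f8d7f5 lies 22 sectors into the failed request, which starts at 0x88f8d7df and covers 0xe0 = 224 sectors (224 × 512 = 114688 bytes, matching "dma 114688 in"). libata itself labels this Emask 0x9 (media error), unlike the earlier timeouts, so it may be worth cross-checking with SMART.

```python
# ATA Status / Error register bits, named as in libata's "status: { ... }" / "error: { ... }" lines.
STATUS_BITS = [(0x80, "BSY"), (0x40, "DRDY"), (0x20, "DF"), (0x08, "DRQ"), (0x01, "ERR")]
ERROR_BITS = [(0x80, "ICRC"), (0x40, "UNC"), (0x10, "IDNF"), (0x04, "ABRT")]


def decode_result(res: str) -> dict:
    """Decode a libata 'res' taskfile: status/error:nsect:lbal:lbam:lbah/hob fields/device."""
    status_s, low, high, _device = res.split("/")
    status = int(status_s, 16)
    error, _nsect, lbal, lbam, lbah = (int(x, 16) for x in low.split(":"))
    _hob_feature, _hob_nsect, hob_lbal, hob_lbam, hob_lbah = (int(x, 16) for x in high.split(":"))
    lba = (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24) | (lbah << 16) | (lbam << 8) | lbal
    return {
        "status": [name for bit, name in STATUS_BITS if status & bit],
        "error": [name for bit, name in ERROR_BITS if error & bit],
        "lba": lba,
    }
```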

Gerry Reno (greno-verizon) wrote :

I'm seeing this same error with Lucid x86_64 and kernel 2.6.32-21.

My drive is a Hitachi 500GB SATA.

Once this error starts, I get hung task timeouts and the system starts freezing.

Gerry Reno (greno-verizon) wrote :

It also generates filesystem errors such that I have to run fsck on the next boot.

Gerry Reno (greno-verizon) wrote :

And I just checked and my south bridge is an AMD SB750 with 6 SATA. I'm using AHCI mode.

Gerry Reno (greno-verizon) wrote :

I just upgraded to kernel 2.6.37-12-server from kernel-ppa,
and after rebooting into it I still see the same READ FPDMA QUEUED errors as before.
Both drives in the machine check out fine according to the drive tests.

I've noticed a little unexplained freezing up and releasing the past couple days and then today it started with these errors almost constantly. And I'm trying to remember what packages I might have updated recently that could have contributed to this issue.

I'm also going to check all the cabling in the server to make sure nothing has worked loose.

Gerry Reno (greno-verizon) wrote :

Cabling is fine.

I just tried to run 'sudo apt-get update' and it is telling me that I need to run 'dpkg --configure -a'. When I do that it tries to reconfigure the new kernel package again. So the read errors must have prevented the configure from completing.

But the read errors will not let this kernel package configure run to completion even now so I'm stuck. Cannot run apt-get till I figure out how to get the configure to finish without triggering all these errors.

Gerry Reno (greno-verizon) wrote :

Tried adding libata.force=noncq to 2.6.37-12-server kernel boot line and the errors are changed to READ DMA EXT.

This particular machine has a Gigabyte M/B.

Some details:
# lspci
00:00.0 Host bridge: Advanced Micro Devices [AMD] RS780 Host Bridge
00:01.0 PCI bridge: Advanced Micro Devices [AMD] RS780 PCI to PCI bridge (int gfx)
00:0a.0 PCI bridge: Advanced Micro Devices [AMD] RS780 PCI to PCI bridge (PCIE port 5)
00:11.0 SATA controller: ATI Technologies Inc SB700/SB800 SATA Controller [AHCI mode]
00:12.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
00:12.1 USB Controller: ATI Technologies Inc SB700 USB OHCI1 Controller
00:12.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
00:13.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
00:13.1 USB Controller: ATI Technologies Inc SB700 USB OHCI1 Controller
00:13.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
00:14.0 SMBus: ATI Technologies Inc SBx00 SMBus Controller (rev 3a)
00:14.1 IDE interface: ATI Technologies Inc SB700/SB800 IDE Controller
00:14.2 Audio device: ATI Technologies Inc SBx00 Azalia (Intel HDA)
00:14.3 ISA bridge: ATI Technologies Inc SB700/SB800 LPC host controller
00:14.4 PCI bridge: ATI Technologies Inc SBx00 PCI to PCI Bridge
00:14.5 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI2 Controller
00:18.0 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] HyperTransport Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Miscellaneous Control
00:18.4 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Link Control
01:05.0 VGA compatible controller: ATI Technologies Inc Radeon HD 3300 Graphics
01:05.1 Audio device: ATI Technologies Inc RS780 Azalia controller
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 02)
03:0e.0 FireWire (IEEE 1394): Texas Instruments TSB43AB23 IEEE-1394a-2000 Controller (PHY/Link)
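libata.force, used here and in the earlier comment, takes a comma-separated list of [ID:]VAL entries, where ID is PORT or PORT.DEVICE (matching the ataN / ataN.DD numbers in the logs) and VAL is a setting such as noncq or 1.5Gbps. Per the kernel's parameter documentation, an entry without an ID reuses the previous ID, or applies to everything if no ID has appeared yet, so a per-device entry such as 5.00:noncq would limit the change to one disk. The Python sketch below parses a kernel command line (for example the contents of /proc/cmdline) into those entries; the matching in ncq_forced_off is deliberately simplified and both names are hypothetical.

```python
def parse_libata_force(cmdline: str) -> list:
    """Return [(id or None, value), ...] for every libata.force= entry on a kernel command line."""
    entries = []
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key != "libata.force" or not sep:
            continue
        last_id = None  # None = applies to all ports and devices
        for item in value.split(","):
            port, colon, val = item.rpartition(":")
            if colon:
                last_id = port  # an explicit ID also applies to following ID-less entries
            entries.append((last_id, val))
    return entries


def ncq_forced_off(entries: list, dev_id: str) -> bool:
    """Simplified: True if a 'noncq' entry targets all devices, dev_id (e.g. "5.00") or its port ("5")."""
    port = dev_id.split(".")[0]
    return any(val == "noncq" and pid in (None, dev_id, port) for pid, val in entries)
```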

Adam Ziegler (mrbond) wrote :

I can confirm the same on Maverick x64 Server, kernel 2.6.35-24-server, all latest software updates. ASUS P6X58D Premium, latest BIOS.

I am seeing ata soft resets, hard resets, and then crashes on both devices connected to my Marvell 9123 ports (SATA-6Gbps), a SATA-3Gbps Corsair F120 SSD (w/latest firmware, as the OS/boot drive) and a SATA-1.5Gbps LG Blu-ray reader. Preceded by "READ FPDMA QUEUED" command errors in dmesg. All other SATA devices connected to the SATA-3Gbps controller are fine.

The system is hardlocked, except... I can still log in via SSH from a networked machine, and I can browse the machine through SFTP. Trying to run basic system commands (like ls -l, fdisk, sudo, and the like) results in "Bus error" or "Input/output error". "shutdown" is impossible, so I have to hard-reset the machine. This happens randomly, without any apparent cause, and so far without data loss or corruption (though it scares me a little to hard-reset at all).

I recall this happening with prior kernels, as well. I will try switching the drive to a SATA-3Gbps port and see what happens. I don't care so much about losing access to the Blu-ray drive periodically, but I'd rather not lose the OS drive.

Adam Ziegler (mrbond) wrote :

Also, I'd propose that this bug be marked as high priority: if your OS drive is connected to the affected ports, the system hard-crashes, an absolute failure for systems that need stable, consistent uptime.

Nicolas Krzywinski (nsk7even) wrote :

For me it's even worse: the Crucial RealSSD C300 is only sometimes recognised, even at the GRUB stage. When I am lucky and it is found, I can rarely boot all the way up; mostly it gets stuck at those "exception Emask ... frozen" messages.

Things got worse after upgrading from Lucid. On Lucid, booting and working with the system succeeded most of the time, although the messages above were omnipresent in syslog, and I hoped to solve this with a dist-upgrade (which failed). The upgrade was hard work: I had to do it from a minimal system without a GUI so it wouldn't get stuck at some installation steps. And now I cannot use the system anymore.

Note that Windows 7 works without a problem, though I notice some kind of delay at startup before the first boot screen appears.

The system is an Asus P7P55D-E LX with the C300 connected to the 6 Gb/s Marvell controller.

Lucas Hope (lucas-r-hope) wrote :

For people having problems with this, I would like to re-iterate two things:

1. The problem was fixed when I went through the 3 Gb/s channel.

2. I had to re-install the OS completely due to data corruption caused by the READ FPDMA QUEUED errors. Don't expect your system to work properly until you've re-installed.

The workaround of using the 3 Gb/s SATA channel has worked perfectly for me for five months.

Good luck.

Gerry Reno (greno-verizon) wrote :

>>> 1. The problem was fixed when I went through the 3 Gb/s channel.

That's probably because it was a different controller, or the hardware at least differed sufficiently so as not to exhibit the problem.

>>> 2. I had to re-install the OS completely due to data corruption caused by the READ FPDMA QUEUED errors. Don't expect your system to work properly until you've re-installed.

I've had no problem getting 'fsck.ext4' to repair the few problems that were caused so far. My system remounts the filesystem read-only immediately upon detection of errors, so nothing else can be written, and maybe that helped reduce any corruption.

Nevertheless, this is certainly a very serious problem and this bug warrants a high priority.

Gerry Reno (greno-verizon) wrote :

Opened a kernel bug about this problem: https://bugzilla.kernel.org/show_bug.cgi?id=26702


Nicolas Krzywinski (nsk7even) wrote :

I am pretty sure that my system would work if I connected the C300 to the other SATA ports, controlled by the Intel ICH, but I selected this hardware combination _especially_ because the C300 and the Marvell controller can communicate beyond SATA 3G performance (a benchmark proved that they really use that bandwidth).
Since the other operating system is able to work at that speed (though I admit I never measured it, because at work I have to work and cannot play around with stuff like that for a long time...), downgrading to an older interface specification is not an option for me.

pepre (ea1256) wrote :

It is not caused only by Marvell chips:

$ lspci
00:00.0 Host bridge: ATI Technologies Inc RX780/RX790 Chipset Host Bridge
00:02.0 PCI bridge: ATI Technologies Inc RD790 PCI to PCI bridge (external gfx0 port A)
00:04.0 PCI bridge: ATI Technologies Inc RD790 PCI to PCI bridge (PCI express gpp port A)
00:0a.0 PCI bridge: ATI Technologies Inc RD790 PCI to PCI bridge (PCI express gpp port F)
00:11.0 SATA controller: ATI Technologies Inc SB700/SB800 SATA Controller [AHCI mode]
00:12.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
00:12.1 USB Controller: ATI Technologies Inc SB700 USB OHCI1 Controller
00:12.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
00:13.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
00:13.1 USB Controller: ATI Technologies Inc SB700 USB OHCI1 Controller
00:13.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
00:14.0 SMBus: ATI Technologies Inc SBx00 SMBus Controller (rev 3c)
00:14.1 IDE interface: ATI Technologies Inc SB700/SB800 IDE Controller
00:14.3 ISA bridge: ATI Technologies Inc SB700/SB800 LPC host controller
00:14.4 PCI bridge: ATI Technologies Inc SBx00 PCI to PCI Bridge
00:14.5 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI2 Controller
00:18.0 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] HyperTransport Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Miscellaneous Control

Edwin Chiu (edwin-chiu) wrote :

Try a different HD. I've had similar issues with my Seagate ST32000542AS: 3 out of 5 drives died within 4 weeks already.... Prior to their "official" death, I saw errors similar to those above...

pepre (ea1256) wrote :

> Try a different HD

This doesn't help. My RAID5 with 4 HDs works perfectly with Arch Linux. SMART and various stress tests didn't show any HD errors.

It's a real bug, not a hardware failure.

I'm having this problem in 2.6.35-25-generic #44-Ubuntu SMP Fri Jan 21 17:40:44 UTC 2011 x86_64 GNU/Linux too.

Crashing at least daily. Only able to recover with great effort.
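The "bad drive or real bug" question above is partly answerable from SMART data. A rough rule of thumb, not a diagnosis: non-zero reallocated, pending or offline-uncorrectable counts (IDs 5, 197, 198, plus 187 where present) point at the drive's media, while a growing UDMA_CRC_Error_Count (199) points at transfer errors between drive and host (cable, port, controller). The Python sketch parses the attribute table printed by smartctl -A, assuming smartctl's usual ten-column layout with RAW_VALUE last; the helper names are hypothetical.

```python
import re

# Attribute IDs commonly watched when deciding "drive vs. link" (a rule of thumb, not a verdict).
MEDIA_IDS = (5, 187, 197, 198)  # reallocated, reported uncorrectable, pending, offline uncorrectable
LINK_IDS = (199,)               # UDMA CRC errors: transfer errors between drive and host


def parse_smart_attributes(text: str) -> dict:
    """Map attribute ID -> (name, leading integer of RAW_VALUE) from `smartctl -A` style output."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # header, banner or blank line
        m = re.match(r"\d+", fields[9])
        attrs[int(fields[0])] = (fields[1], int(m.group()) if m else None)
    return attrs


def triage(attrs: dict) -> tuple:
    """Return (names of non-zero media counters, names of non-zero link counters)."""
    media = [attrs[i][0] for i in MEDIA_IDS if i in attrs and attrs[i][1]]
    link = [attrs[i][0] for i in LINK_IDS if i in attrs and attrs[i][1]]
    return media, link
```

For example, feed it subprocess.run(["smartctl", "-A", "/dev/sdb"], capture_output=True, text=True).stdout (the device name is only an example, and reading SMART data usually needs root).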

jnygaard (jens-olav-nygaard) wrote :

Same here. One way to trigger the problem is to do a "du" on a 500GB partition with a lot of files, both small and large. After a while:

Feb 3 23:10:19 xx kernel: [199664.670378] ata9: hard resetting link
Feb 3 23:10:20 xx kernel: [199665.032664] ata9: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Feb 3 23:10:20 xx kernel: [199665.045688] ata9.00: configured for UDMA/133
Feb 3 23:10:20 xx kernel: [199665.045695] ata9.00: device reported invalid CHS sector 0
Feb 3 23:10:20 xx kernel: [199665.045702] ata9: EH complete

I just noticed this after moving my 3 SATA disks from the Intel SATA 3 Gbps ports on my P8P67 mainboard (the Intel bug thing) to the 6 Gbps ports on the same mainboard. The error messages stem from the Marvell ports.
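The SStatus and SControl values in the "SATA link up" lines are the SATA status and control registers, printed in hex. Decoding them in Python, assuming the standard field layout (DET in bits 3:0, SPD in 7:4, IPM in 11:8): SStatus 123 means a device is present with PHY communication established (DET 3), at Gen2 3.0 Gbps (SPD 2), in the active power state (IPM 1); SControl 300 means no speed limit, with the partial and slumber power states disabled. The 1.5 Gbps fallback later in this thread shows up as SStatus 113 and SControl 310, i.e. a Gen1 speed limit. The function names are illustrative.

```python
import re

SPEED = {1: "1.5 Gbps", 2: "3.0 Gbps", 3: "6.0 Gbps"}  # SATA Gen1/Gen2/Gen3; 0 means none/no limit
LINK_RE = re.compile(r"SStatus ([0-9A-Fa-f]+) SControl ([0-9A-Fa-f]+)")


def decode_sstatus(value: int) -> dict:
    """SStatus: DET (3:0) device/PHY state, SPD (7:4) negotiated speed, IPM (11:8) power state."""
    return {"det": value & 0xF, "speed": SPEED.get((value >> 4) & 0xF), "ipm": (value >> 8) & 0xF}


def decode_scontrol(value: int) -> dict:
    """SControl: DET (3:0) reset request, SPD (7:4) speed limit, IPM (11:8) disallowed power states."""
    return {"det": value & 0xF, "speed_limit": SPEED.get((value >> 4) & 0xF),
            "ipm_disabled": (value >> 8) & 0xF}


def parse_link_up(line: str):
    """Return (SStatus, SControl) as ints from a 'SATA link up' line, or None."""
    m = LINK_RE.search(line)
    return (int(m.group(1), 16), int(m.group(2), 16)) if m else None
```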

andornaut (andornaut) wrote :

I'm experiencing a similar issue. The system hangs periodically for about a minute while the HD resets.

Environment:
Ubuntu running in a VirtualBox VM on Windows 7 64-bit.
Asus G53JW Laptop
Intel X25 SSD

Log excerpts:
Jan 17 14:47:58 vm rsyslogd: [origin software="rsyslogd" swVersion="4.2.0" x-pid="578" x-info="http://www.rsyslog.com"] rsyslogd was HUPed, type 'lightweight'.
Jan 17 14:48:32 vm kernel: [ 1513.120252] ata3: hard resetting link
Jan 17 14:48:33 vm kernel: [ 1513.470345] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jan 17 14:48:33 vm kernel: [ 1513.471232] ata3.00: configured for UDMA/133
Jan 17 14:48:33 vm kernel: [ 1513.471241] ata3.00: device reported invalid CHS sector 0
Jan 17 14:48:33 vm kernel: [ 1513.471255] ata3: EH complete

[ 1513.120200] ata3.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
[ 1513.120210] ata3.00: failed command: READ FPDMA QUEUED
[ 1513.120222] ata3.00: cmd 60/08:00:30:66:4c/00:00:00:00:00/40 tag 0 ncq 4096 in
[ 1513.120224] res 40/00:00:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
[ 1513.120230] ata3.00: status: { DRDY }
[ 1513.120252] ata3: hard resetting link
[ 1513.470345] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 1513.471232] ata3.00: configured for UDMA/133
[ 1513.471241] ata3.00: device reported invalid CHS sector 0
[ 1513.471255] ata3: EH complete

Edwin Chiu (edwin-chiu) wrote :

So this appears to be happening on Marvell, JMicron and SB700/800 chips, not good!

What kernel version of Arch Linux are you running? Could this be a regression from around the 2.6.33 timeframe?

Another solution I've seen (it didn't work for me) is to try pcie_aspm=off in your boot options.

Edwin Chiu (edwin-chiu) wrote :

Tried booting 2.6.31-22-server (from Karmic) on a Maverick install and got the same error. I'm not entirely convinced this is a software bug; it seems to target the same drive. I have 5 identical drives, and switching them around so they are on different ports/cables, etc. doesn't seem to make the problem shift. It seems to be the drive...

Taken individually, I'd say I had some bad drives, but taking other reports into account, it seems to be more than just a bad drive; yet on a single-system basis, it doesn't add up. If this were a software or (non-HD) hardware bug, why does the problem follow the bad drive around? Why don't I get the problem on other drives?

Below is the output from 2.6.31-22. It looks like the ATA code there isn't as robust, since it fails the drive and kicks it out. Maverick seems better at recovering the drive so that it's usable.

My "reliable" way of triggering this is to launch a kvm process (tried virtio and IDE emulation, same trigger). On the LV that hosts the kvm guest, I was able to dd the entire volume to /dev/null without any read issues...

[ 286.010222] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 286.010242] ata5.00: cmd 25/00:08:20:b4:4d/00:00:78:00:00/e0 tag 0 dma 4096 in
[ 286.010246] res 40/00:00:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout)
[ 286.010253] ata5.00: status: { DRDY }
[ 286.010262] ata5: hard resetting link
[ 291.580170] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 291.580180] ata5.00: link online but device misclassifed
[ 296.580129] ata5.00: qc timeout (cmd 0xec)
[ 296.580166] ata5.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 296.580172] ata5.00: revalidation failed (errno=-5)
[ 296.580181] ata5: hard resetting link
[ 302.150170] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 302.150179] ata5.00: link online but device misclassifed
[ 312.150066] ata5.00: qc timeout (cmd 0xec)
[ 312.150103] ata5.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 312.150109] ata5.00: revalidation failed (errno=-5)
[ 312.150116] ata5: limiting SATA link speed to 1.5 Gbps
[ 312.150124] ata5: hard resetting link
[ 317.720136] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 317.720145] ata5.00: link online but device misclassifed
[ 347.720098] ata5.00: qc timeout (cmd 0xec)
[ 347.720135] ata5.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 347.720142] ata5.00: revalidation failed (errno=-5)
[ 347.720148] ata5.00: disabled
[ 347.720162] ata5.00: device reported invalid CHS sector 0
[ 347.720176] ata5: hard resetting link
[ 353.290169] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 353.290178] ata5.00: link online but device misclassifed
[ 353.290198] ata5: EH complete
[ 353.290224] sd 4:0:0:0: [sdd] Unhandled error code
[ 353.290229] sd 4:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[ 353.290237] end_request: I/O error, dev sdd, sector 2018358304
[ 353.290245] raid10: sdd4: rescheduling sector 1542176
[ 353.290271] sd 4:0:0:0: [sdd] Unhandled error code
[ 353.290275] sd 4:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[ 353.290282] end_request: I/O error, dev sdd, sector 22636850...
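The whole-volume dd check described in this comment can be reproduced in Python in a way that records which offsets fail, instead of stopping at the first error as dd does without conv=noerror. A minimal sketch; the device path is a hypothetical command-line argument, reading a block device needs root, and reads go through the page cache, so recently read data may be served without touching the disk.

```python
import os
import sys


def scan_for_read_errors(path: str, chunk: int = 1 << 20):
    """Read `path` front to back; return (bytes read, offsets of chunks whose read raised OSError)."""
    bad = []
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)  # works for regular files and Linux block devices
        offset = 0
        while offset < size:
            try:
                data = os.pread(fd, min(chunk, size - offset), offset)
            except OSError:
                bad.append(offset)  # unreadable chunk: note it and skip ahead
                offset += chunk
                continue
            if not data:
                break
            total += len(data)
            offset += len(data)
    finally:
        os.close(fd)
    return total, bad


if __name__ == "__main__" and len(sys.argv) > 1:
    read, failed = scan_for_read_errors(sys.argv[1])
    print(f"read {read} bytes; failing chunk offsets: {failed}")
```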


Matt Cargo (mcargo) wrote :

Affected by the same bug. Running Ubuntu 10.04.2 LTS on an HP laptop. See
attached lspci output.

After running fine for a while, the root file system develops errors and
is remounted read-only. Upon reboot, I get only a simple ash shell.
I can fix the long list of file system errors with e2fsck, but I'm
worried about whether this will continue to work, and doing these fixes
wastes time. Any other info supplied on request.

Some history: The first time I upgraded to 10.04, I started having
similar file system problems. Bought a new hard drive, and downgraded
to the previous Ubuntu. No problems for a long time. I decided I should
upgrade, and the problem resurfaced.


My Hardware:
Mainboard: GA-879A-UD3

The relevant output from lspci (disc controllers):

00:11.0 SATA controller: ATI Technologies Inc SB700/SB800 SATA Controller [AHCI mode] (rev 40)
00:14.1 IDE interface: ATI Technologies Inc SB700/SB800 IDE Controller (rev 40)
04:00.0 SATA controller: JMicron Technology Corp. JMB362/JMB363 Serial ATA Controller (rev 03)
04:00.1 IDE interface: JMicron Technology Corp. JMB362/JMB363 Serial ATA Controller (rev 03)
06:00.0 SATA controller: JMicron Technology Corp. JMB362/JMB363 Serial ATA Controller (rev 02)
06:00.1 IDE interface: JMicron Technology Corp. JMB362/JMB363 Serial ATA Controller (rev 02)

Used discs:
Seagate ST31000524NS Firmware SN11 (according to smartctl --all), currently no FW upgrade available
Western Digital WD1003FBYX

Both disk types are 24/7 disks according to their manufacturers.

Whenever I mention kernel compilation below, I used the .config from the original Ubuntu 10.10 server kernel and created .deb files using kernel-package.

All of the issues above are the same ones that happened to me with the Seagate disks I originally bought together with the mentioned mainboard.

I already had a working system on those 4 disks, connected through a PCI-E SAS controller, working like a charm.
As the controller was only for testing purposes, I connected my 4 disks directly to the mainboard. That's where all the problems you describe above began.
I couldn't boot my system anymore, but each time I booted RIP-Linux or Knoppix, the installed Linux RAID + LVM2 signatures were found and usable.
No errors were reported in dmesg.
Simply booting the system was not possible.
As the data on the disks was not too important, I decided to install from scratch, so I booted my Ubuntu 10.10 server CD. When the installer reached the partition manager, the errors you're describing all appeared on the ALT+F4 console.
When I hit the reset button, the mainboard's BIOS no longer recognized the disks and hung while trying to enumerate the connected disks.
After a hard reboot (pulled the cable), the BIOS at least worked again.
So I upgraded the BIOS.
Same problem as before.
At that point I deliberately destroyed my filesystem, as I thought this might have something to do with a signature the original SAS controller had written to the disks.
I tried Ubuntu 10.04 Server: exactly the same...
So every time I booted a recent Ubuntu version, 64 or 32 bit, the partition manager could not read the disks and the mainboard was not able to enumerate them after a soft reset.
Every time I booted a non-Ubuntu distro I had no problems, so I concluded that this might be an Ubuntu issue and decided to set up the new Debian Squeeze.
It worked like a charm for me.
But as usual, Debian is outdated by the time it's released, so I installed the Ubuntu 10.10 server kernel on my Debian system and guess what: yes, the same problem as above....
Next step was to take the Ubuntu kernel sources, compile them in Debian, and boot them -> worked
Then I took the vanilla kernel 2.6.37 (current stable), compiled and booted it -> worked

Now I knew it was an Ubuntu kernel issue.

2 weeks later, and having returned 3 of 4 ...


Edwin Chiu (edwin-chiu) wrote :

Strange, probably a different issue than mine. Mine seems to have gone away with a new PSU. It seems my particular drives, Barracuda 2TB LPs (ST32000542AS), go well above their rated power draw when flushing their cache. Normally that's not an issue, but in a RAID configuration the simultaneous flush seems to cause one or more of the drives to lose power and cause issues. I had a 12V rail with a 17A rating, but 2 drives (with a max operating current of 2A) seem to have been more than enough to cause issues. I moved to a single 12V rail with 52A and the problems went away.

Annoying as hell. WD20EARS drives seem to operate within their defined margins much better. In my case, I don't believe it's an Ubuntu-related issue; I looked at the differences between the stock and Ubuntu kernels and nothing really pops out in the libata area (with regard to cache flushes, at least).

Still, based on all my reading and such, I can't help but feel the SB700/800 + Seagate drives just don't play well for whatever reason.
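Edwin's point that two drives rated at 2A should not strain a 17A rail is easy to check with arithmetic. The following is a minimal sketch using only the nameplate figures quoted in this thread; the helper name `rail_utilisation` is hypothetical, and the calculation deliberately ignores other loads on the same rail as well as short transient spikes (such as the cache-flush surges suspected here), which nameplate maximums do not capture.

```python
def rail_utilisation(rail_amps: float, drive_amps: list[float]) -> float:
    """Return the fraction of a 12V rail's rated current drawn by the listed drives.

    Hypothetical helper: steady-state nameplate figures only, no other loads,
    no transient spikes.
    """
    if rail_amps <= 0:
        raise ValueError("rail rating must be positive")
    return sum(drive_amps) / rail_amps


if __name__ == "__main__":
    drives = [2.0, 2.0]  # two drives at the quoted 2A maximum operating draw
    for rail in (17.0, 52.0):  # the old and new 12V rail ratings from this thread
        print(f"{rail:.0f}A rail: {rail_utilisation(rail, drives):.0%} used by the drives")
```

With these figures the drives account for roughly 24% of the 17A rail and 8% of the 52A rail, so steady-state numbers alone do not explain why the PSU swap helped.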

Weardo (athlon74rus) wrote :

[ 32.816080] ata2: lost interrupt (Status 0x58)
[ 32.820012] ata2: drained 32768 bytes to clear DRQ.
[ 32.871768] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 32.871830] ata2.00: failed command: READ DMA
[ 32.871885] ata2.00: cmd c8/00:08:00:00:00/00:00:00:00:00/e0 tag 0 dma 4096 in
[ 32.871888] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 32.871985] ata2.00: status: { DRDY }
[ 32.872094] ata2: soft resetting link
[ 33.116959] ata2.00: configured for UDMA/100
[ 33.116973] ata2.00: device reported invalid CHS sector 0
[ 33.117005] ata2: EH complete
[ 63.816077] ata2: lost interrupt (Status 0x58)
[ 63.820019] ata2: drained 32768 bytes to clear DRQ.
[ 63.871764] ata2.00: limiting speed to UDMA/66:PIO4
[ 63.871773] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 63.871835] ata2.00: failed command: READ DMA
[ 63.871890] ata2.00: cmd c8/00:08:00:00:00/00:00:00:00:00/e0 tag 0 dma 4096 in
[ 63.871893] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 63.871990] ata2.00: status: { DRDY }
[ 63.872102] ata2: soft resetting link
[ 64.092468] ata2.00: configured for UDMA/66
[ 64.092481] ata2.00: device reported invalid CHS sector 0
[ 64.092511] ata2: EH complete
[ 94.816075] ata2: lost interrupt (Status 0x58)
[ 94.820018] ata2: drained 32768 bytes to clear DRQ.
[ 94.871763] ata2.00: limiting speed to UDMA/33:PIO4
[ 94.871771] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 94.871832] ata2.00: failed command: READ DMA
[ 94.871887] ata2.00: cmd c8/00:08:00:00:00/00:00:00:00:00/e0 tag 0 dma 4096 in
[ 94.871890] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 94.871987] ata2.00: status: { DRDY }
[ 94.872100] ata2: soft resetting link
[ 95.044462] ata2.00: configured for UDMA/33
[ 95.044474] ata2.00: device reported invalid CHS sector 0
[ 95.044503] ata2: EH complete
[ 125.816079] ata2: lost interrupt (Status 0x58)
[ 125.820019] ata2: drained 32768 bytes to clear DRQ.
[ 125.871767] ata2.00: limiting speed to PIO4
[ 125.871775] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 125.871836] ata2.00: failed command: READ DMA
[ 125.871891] ata2.00: cmd c8/00:08:00:00:00/00:00:00:00:00/e0 tag 0 dma 4096 in
[ 125.871894] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 125.872056] ata2.00: status: { DRDY }
[ 125.872162] ata2: soft resetting link
[ 126.044453] ata2.00: configured for PIO4
[ 126.044465] ata2.00: device reported invalid CHS sector 0
[ 126.044495] ata2: EH complete

Debian 6, need help :(
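The log above shows libata's error handler stepping the device down after each timeout: UDMA/100, then UDMA/66, UDMA/33 and finally PIO4. In a long dmesg, a few lines of Python can pull that ladder out per device. This is a minimal sketch that assumes the `ataN.DD: configured for MODE` wording exactly as it appears in this log (other kernel versions may phrase it differently); `transfer_mode_history` is a hypothetical helper name.

```python
import re

# libata logs "ataN.DD: configured for MODE" each time the error handler
# (re)configures a device, so the sequence reveals any step-down in transfer mode.
_CONFIGURED = re.compile(r"\b(ata\d+\.\d+): configured for (\S+)")


def transfer_mode_history(log_text: str) -> dict[str, list[str]]:
    """Map each ATA device (e.g. 'ata2.00') to the modes it was configured for, in order."""
    history: dict[str, list[str]] = {}
    for line in log_text.splitlines():
        match = _CONFIGURED.search(line)
        if match:
            device, mode = match.groups()
            history.setdefault(device, []).append(mode)
    return history


if __name__ == "__main__":
    sample = """\
[   33.116959] ata2.00: configured for UDMA/100
[   64.092468] ata2.00: configured for UDMA/66
[   95.044462] ata2.00: configured for UDMA/33
[  126.044453] ata2.00: configured for PIO4
"""
    for device, modes in transfer_mode_history(sample).items():
        print(device + ": " + " -> ".join(modes))  # ata2.00: UDMA/100 -> UDMA/66 -> UDMA/33 -> PIO4
```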

Weardo (athlon74rus) wrote :

Problem solved by replacing the cables :)

This is affecting me as well. I have the same SB700/800 chipset but with 2 1TB Hitachi drives (I don't have the model to hand). They don't cause the issue all the time. Running an apt update/upgrade caused it momentarily, but logging into KDE causes it to almost block the drive for a good minute.

Vasco (vasco-visser) wrote :

I can confirm this bug as well. I also have a Gigabyte mainboard with the SB700/800 chipset. I have no option to disable NCQ.

System is running Ubuntu kernel 2.6.32-30-generic


Add to your boot params: libata.force=noncq

It's not a guarantee to work, just helps quite a bit.

Also are you running in any sort of RAID configuration? I found a new PSU
helped a little as well, had very few incidents up until yesterday... I
still blame Seagate drives as being part of the problem.
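After adding the parameter to the boot loader, it is worth confirming that it actually reached the running kernel, which exposes its boot parameters in `/proc/cmdline`. Per the kernel's parameter documentation, `libata.force` takes a comma-separated list of `[ID:]VALUE` entries, where ID is `PORT` or `PORT.DEVICE`, so NCQ can be disabled for a single drive rather than all of them. The helper below is a hypothetical sketch that only splits those entries; it does not check whether libata accepted them.

```python
from typing import List, Optional, Tuple

PREFIX = "libata.force="


def libata_force_values(cmdline: str) -> List[Tuple[Optional[str], str]]:
    """Split every libata.force=... token into (port/device ID or None, value) pairs."""
    results: List[Tuple[Optional[str], str]] = []
    for token in cmdline.split():
        if not token.startswith(PREFIX):
            continue
        for entry in token[len(PREFIX):].split(","):
            if not entry:
                continue  # tolerate stray commas
            ident, sep, value = entry.partition(":")
            # An entry without ":" applies to every port/device.
            results.append((ident, value) if sep else (None, ident))
    return results


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:  # Linux only
            print(libata_force_values(f.read()))
    except OSError:
        print("no /proc/cmdline on this system")
```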


Vasco (vasco-visser) wrote :

> Add to your boot params: libata.force=noncq
> It's not a guarantee to work, just helps quite a bit.
Reading the comments above, I don't see this helping much.

>
> Also are you running in any sort of RAID configuration?
Yes, I am. I have four disks attached to the controller; on each pair
of disks two partitions are in a RAID 1.

> I found a new PSU
> helped a little as well, had very few incidents up until yesterday...
I don't see a reason to assume this is related to the PSU. This very
thread more or less proves this is either a bug in the Linux kernel or
in firmware/hardware.

> I still blame Seagate drives as being part of the problem.
I just checked and it turns out I do actually have two Seagate
Barracuda 7200.10 disks (I thought I had only Samsung). And it is
indeed one of those that is causing problems. This is interesting.


Edwin Chiu (edwin-chiu) wrote :

I suspect the Seagates are operating outside of their defined maximum
current draws. I don't really have the equipment to measure this
properly though.

The problem with RAID is that when syncing up writes, a simultaneous
cache flush is done, and this is suspected to cause a spike in power
draw; presumably a drive isn't getting what it needs and "funny"
errors crop up.

Switching to a single 12V rail (max 50A) seems to have helped.
It doesn't really make sense, since the drive has a max 2A draw, but it's hard to
ignore the fact that it appears to make a difference...

Of course, I've only had these issues with Seagate Barracuda LP 2TB drives
so far (the only Seagate drives I have); the SATA2 variant of the drives.

If you can rig it, try running the drives off a different PSU and see if
that works. Definitely less than ideal...


I am running two Hitachi drives, so I don't agree (perhaps Seagate drives also have a problem, but obviously not in my case). I have also tried a 2nd 700W power supply in case my current one was the issue, and it was exactly the same problem.

Vasco (vasco-visser) wrote :

If this is indeed a power issue, then why does it manifest now? These
two drives have been spinning in the same configuration for years,
never had any problems.

> Switching to a single 12V raile (max 50A) seems to have helped.
> Doesn't really make sense, drive has max 2A draw, but its hard to
> ignore the fact that it appears to make a difference...
What does "make a difference" mean: is it gone, or does it appear less often? It
could be just coincidence, as for me the problem seems quite
erratic.

> Of course, only had these issues with seagate barracuda lp 2tb drives
> so far (only seagate drives I have); sata2 variant of the drives.
Other people here who don't have Seagates also have the problem, so
we should be careful not to make the premature assumption that this
problem has a causal relationship with Seagate drives. It could be
that Seagates are somehow more susceptible to the phenomenon, but it
looks like it is not limited to Seagates.

> If you can rig it, try to run the drives off a diff psu and see if
> that works??? Definitely less than ideal....

I don't have a spare PSU lying around.


Edwin Chiu (edwin-chiu) wrote :

I'm just speaking from my own experience. I have WD drives in play and they
don't seem impacted at all, only the Seagates. As for others with issues, I
don't recall whether those were related or not. With the new PSU, the problem
pretty much went away for almost 2 months, but has since resurfaced. I
rechecked all the cabling and everything looked fine. I've tried swapping
cabling, no difference. It's a bit of a mystery... For now, I will switch to
WD drives. That's a solution that works for me. Whether or not it works for
someone else, I have no idea. Why is this the case? Again, I don't really
have any logical reason why, just evidence from my own experience that the
problem doesn't manifest itself when using WD drives.


IKT (ikt) wrote :

Awesome bug.

Same situation as OP, 120GB SSD drive w/ 1.5TB files drive.

For reference the 1.5TB drive is a western digital.

Edwin Chiu (edwin-chiu) wrote :

Which one is "failing"? Same MB? Which WD model?


IKT (ikt) wrote :

Same MB, I have an Asrock 890FX Deluxe 3, WD Model: WD15EARS

Heh @ top google result:

http://community.wdc.com/t5/Desktop/WDC-WD15EARS-00Z5B1-awful-performance/td-p/5242

:/

Edwin Chiu (edwin-chiu) wrote :

Just the one drive? Strange... I have a WD20EARS drive in the mix, no
problems. Gonna swap out 2 Seagates later with WDs and hopefully my problems
will go away.

Not sure if the MB is entirely at fault. Using MHDD, I found several very
"slow" sectors on the Seagates on another controller (old P4 Intel chipset).
2 out of 3 drives passed SeaTools and 1 out of 3 failed; one of the drives that
passed nevertheless failed to complete a short test a couple of times.

Never had so many issues before. I was a long-time user of Seagate 7200
Barracuda drives (in the 200-500GB range): no issues, 3-5 years without problems
in a > 30C environment.


IKT (ikt) wrote :

Only have 1 drive:

1 x OCZ 120GB SSD
1 x 1.5TB WD HDD

Edwin Chiu (edwin-chiu) wrote :

You could try adding the kernel option: pcie_aspm=off

Didn't work for me though...


menthurae (menthurae) wrote :

I am also being affected by this bug... Ubuntu 10.04 64-bit.

1x 64GB G.Skill Falcon on Intel ICH9R
4x 2TB WD Black WD2001FASS on Marvell 88SX7042 (Adaptec 1430SA PCI-E SATA Card)

uname -a

--------------------------------------------------------------------------------------------------------------------------------------------
Linux MyPC 2.6.32-30-generic #59-Ubuntu SMP Tue Mar 1 21:30:46 UTC 2011 x86_64 GNU/Linux
--------------------------------------------------------------------------------------------------------------------------------------------

Kernel Log

--------------------------------------------------------------------------------------------------------------------------------------------
Apr 13 20:09:19 MyPC kernel: [ 3877.651433] ata10: SATA link down (SStatus 0 SControl 300)
Apr 13 20:09:19 MyPC kernel: [ 3877.690014] ata9: SATA link down (SStatus 0 SControl 300)
Apr 13 20:09:19 MyPC kernel: [ 3877.812508] ata8: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Apr 13 20:09:19 MyPC kernel: [ 3877.820024] ata13: SATA link down (SStatus 0 SControl 300)
Apr 13 20:09:19 MyPC kernel: [ 3877.830022] ata14: SATA link down (SStatus 0 SControl 300)
Apr 13 20:09:19 MyPC kernel: [ 3877.882542] ata8.00: configured for UDMA/100
Apr 13 20:09:19 MyPC kernel: [ 3877.890009] ata11: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr 13 20:09:19 MyPC kernel: [ 3877.890820] ata11.00: configured for UDMA/133
Apr 13 20:09:19 MyPC kernel: [ 3877.912511] ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Apr 13 20:09:19 MyPC kernel: [ 3877.915500] ata7.00: configured for UDMA/66
Apr 13 20:09:19 MyPC kernel: [ 3878.030015] PM: resume of drv:usb dev:usb5 complete after 268.733 msecs
Apr 13 20:09:19 MyPC kernel: [ 3878.300005] PM: resume of drv:usb dev:usb6 complete after 269.985 msecs
Apr 13 20:09:19 MyPC kernel: [ 3878.570007] PM: resume of drv:usb dev:usb8 complete after 269.994 msecs
Apr 13 20:09:19 MyPC kernel: [ 3878.570438] sd 2:0:0:0: [sda] Starting disk
Apr 13 20:09:19 MyPC kernel: [ 3878.614245] ata3.00: exception Emask 0x10 SAct 0x0 SErr 0x4000000 action 0xe frozen
Apr 13 20:09:19 MyPC kernel: [ 3878.614247] ata3: SError: { DevExch }
Apr 13 20:09:19 MyPC kernel: [ 3878.614248] ata3.00: failed command: READ VERIFY SECTOR(S)
Apr 13 20:09:19 MyPC kernel: [ 3878.614252] ata3.00: cmd 40/00:01:00:00:00/00:00:00:00:00/e0 tag 0
Apr 13 20:09:19 MyPC kernel: [ 3878.614252] res 7f/00:01:00:00:00/00:00:00:00:00/e0 Emask 0x12 (ATA bus error)
Apr 13 20:09:19 MyPC kernel: [ 3878.614254] ata3.00: status: { DRDY DF DRQ ERR }
Apr 13 20:09:19 MyPC kernel: [ 3878.614257] ata3: hard resetting link
Apr 13 20:09:19 MyPC kernel: [ 3881.150011] ata12: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Apr 13 20:09:19 MyPC kernel: [ 3881.155777] ata12.00: configured for UDMA/133
Apr 13 20:09:19 MyPC kernel: [ 3884.560007] ata3: link is slow to respond, please be patient (ready=0)
Apr 13 20:09:19 MyPC kernel: [ 3888.640007] ata3: SRST failed (errno=-16)
Apr 13 20:09:19 MyPC kernel: [ 3888.640008] ata3: hard resetting link
Apr 13 20:09:19 MyPC kernel: [ 3894.590006] ata3: link is slow to respond, please be patien...
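
The SStatus and SControl values in these link messages are raw SATA register contents printed in hexadecimal. A minimal sketch for decoding them, assuming the standard SATA field layout (DET in bits 3:0, SPD in bits 7:4, IPM in bits 11:8); the helper names are illustrative only:

```python
# Sketch (assumed SATA register layout): DET = bits 3:0, SPD = bits 7:4,
# IPM = bits 11:8. libata prints SStatus/SControl in hexadecimal.

SPD_NAMES = {1: "1.5 Gbps", 2: "3.0 Gbps", 3: "6.0 Gbps"}
IPM_NAMES = {1: "active", 2: "partial", 6: "slumber"}


def split_fields(value):
    """Return the (DET, SPD, IPM) fields of a raw register value."""
    return value & 0xF, (value >> 4) & 0xF, (value >> 8) & 0xF


def describe_sstatus(hex_text):
    det, spd, ipm = split_fields(int(hex_text, 16))
    if det != 3:  # DET=3: device present and Phy communication established
        return "link down"
    speed = SPD_NAMES.get(spd, "unknown speed")
    return f"link up at {speed}, {IPM_NAMES.get(ipm, 'unknown power state')}"


def describe_scontrol_limit(hex_text):
    _, spd, _ = split_fields(int(hex_text, 16))
    # In SControl, SPD=0 means "no speed restriction"; 1..3 cap the link speed.
    return "no limit" if spd == 0 else f"limited to {SPD_NAMES.get(spd, 'unknown')}"


for value in ("113", "123", "0"):
    print(f"SStatus {value}: {describe_sstatus(value)}")
print(f"SControl 300: {describe_scontrol_limit('300')}")
```

On this reading, SStatus 113 is a link established at 1.5 Gbps and 123 one at 3.0 Gbps, matching the messages, while SControl 300 places no cap on the speed (its IPM field of 3 only disables the Partial and Slumber power states).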

With an upgrade to Natty (2.6.38-9-generic #43-Ubuntu SMP) this is now working. I didn't think this was power-related (in my case, anyway), as I had tried multiple power supplies, and the system has only 2 HDDs, the processor and memory, with everything else onboard. I now have a full RAID1 running on both disks with no issues.

Motherboard: ASUS M4A78-HTPC

HDD: 2x Hitachi HD DESK 1TB 3.5" 7200 SATA 32MB

Lars (lars-taeuber) wrote :

Hi,

I also experience this problem.

my situation:
Supermicro H8SCM-F (AMD SR5650+SP5100)
PSU: redundant: 2x48A@12V (this is definitely not the problem)
Ubuntu 10.04.2 x86_64 server
SW-RAID6 over 6 devices
(multiple) SW-RAID1 over 2 devices
onboard AHCI + sata_mv (+ DVD @ sata_sil)
problems occur only under high I/O load
reproducible with _all_ schedulers (noop, anticipatory, cfq, deadline)
reproducible with kernel options: noapic + acpi=off
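
Because the problem was reproduced with every scheduler, it is worth confirming which one is actually active on each member disk during a test run. A minimal sketch, assuming the usual sysfs file /sys/block/<dev>/queue/scheduler, which lists the available schedulers and brackets the active one; the device name passed in is a placeholder:

```python
# Sketch: parse the sysfs "scheduler" file, e.g. "noop anticipatory deadline [cfq]".
from pathlib import Path


def parse_scheduler(text):
    """Return (available, active); the active scheduler is shown in brackets."""
    available, active = [], None
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        available.append(token)
    return available, active


def read_scheduler(dev, sysfs_root="/sys"):
    # "dev" is a block device name such as "sdb" (placeholder for illustration).
    path = Path(sysfs_root) / "block" / dev / "queue" / "scheduler"
    return parse_scheduler(path.read_text())


print(parse_scheduler("noop anticipatory deadline [cfq]\n"))
```

As root, writing one of the listed names to the same file switches the scheduler at runtime, which is one way to repeat the test per scheduler without rebooting.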

I suspect this is a problem in the libata layer.
We have several similar Linux boxes with many more drives and SW-RAID6, but those are driven by SAS controllers.

This problem is really important as our box is a storage server.
I have the very same problem with openSUSE 11.2 (tested once).
Any idea?

My box is ready for tests for the next days/weeks. Any suggestions?

Thanks

Lars (lars-taeuber) wrote :

Addition:

When not using noapic acpi=off, the errors look like this:

[ 1433.950104] ata10: limiting SATA link speed to 1.5 Gbps
[ 1433.950116] ata10.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 1433.972372] ata10.00: failed command: READ DMA EXT
[ 1433.983833] ata10.00: cmd 25/00:08:30:ac:16/00:00:1b:00:00/e0 tag 0 dma 4096 in
[ 1433.983836] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1434.029995] ata10.00: status: { DRDY }
[ 1434.041694] ata10: hard resetting link

Thanks

Lars (lars-taeuber) wrote :

The following errors occur with the live session of the 11.04 (Natty) desktop install CD:
kernel: 2.6.38-8-generic x86_64

[ 876.391515] md/raid:md3: device sde operational as raid disk 2
[ 876.391520] md/raid:md3: device sdb operational as raid disk 1
[ 876.391525] md/raid:md3: device sda operational as raid disk 0
[ 876.393557] md/raid:md3: allocated 8490kB
[ 876.393650] md/raid:md3: raid level 6 active with 6 out of 8 devices, algorithm 2
[ 876.393789] RAID conf printout:
[ 876.393794] --- level:6 rd:8 wd:6
[ 876.393800] disk 0, o:1, dev:sda
[ 876.393805] disk 1, o:1, dev:sdb
[ 876.393809] disk 2, o:1, dev:sde
[ 876.393813] disk 3, o:1, dev:sdf
[ 876.393817] disk 4, o:1, dev:sdg
[ 876.393821] disk 5, o:1, dev:sdh
[ 876.393925] md3: detected capacity change from 0 to 12002383626240
[ 876.394872] md3: unknown partition table
[ 1082.960084] ata2.00: exception Emask 0x0 SAct 0x3f SErr 0x0 action 0x6 frozen
[ 1082.960095] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1082.960111] ata2.00: cmd 61/a0:00:b0:08:0a/02:00:09:00:00/40 tag 0 ncq 344064 out
[ 1082.960114] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1082.960122] ata2.00: status: { DRDY }
[ 1082.960128] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1082.960140] ata2.00: cmd 61/08:08:f0:0b:0a/00:00:09:00:00/40 tag 1 ncq 4096 out
[ 1082.960143] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1082.960150] ata2.00: status: { DRDY }
[ 1082.960155] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1082.960168] ata2.00: cmd 61/40:10:00:5c:09/00:00:09:00:00/40 tag 2 ncq 32768 out
[ 1082.960171] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1082.960177] ata2.00: status: { DRDY }
[ 1082.960182] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1082.960194] ata2.00: cmd 61/10:18:40:5c:09/00:00:09:00:00/40 tag 3 ncq 8192 out
[ 1082.960197] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1082.960203] ata2.00: status: { DRDY }
[ 1082.960209] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1082.960221] ata2.00: cmd 61/a8:20:50:5c:09/03:00:09:00:00/40 tag 4 ncq 479232 out
[ 1082.960223] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1082.960230] ata2.00: status: { DRDY }
[ 1082.960235] ata2.00: failed command: WRITE FPDMA QUEUED
[ 1082.960247] ata2.00: cmd 61/08:28:f8:5f:09/00:00:09:00:00/40 tag 5 ncq 4096 out
[ 1082.960250] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1082.960256] ata2.00: status: { DRDY }
[ 1082.960265] ata2: hard resetting link
[ 1083.470120] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 1083.550175] ata2.00: configured for UDMA/133
[ 1083.550190] ata2.00: device reported invalid CHS sector 0
[ 1083.550197] ata2.00: device reported invalid CHS sector 0
[ 1083.550203] ata2.00: device reported invalid CHS sector 0
[ 1083.550209] ata2.00: device reported invalid CHS sector 0
[ 1083.550217] ata2.00: device reported invalid CHS sector 0
[ 1083.550222] ata2.00: device reported invalid CHS sector 0
[ 1083.550246] ata2: EH complete
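
Each failed command in these dumps is a raw ATA taskfile. A minimal sketch for decoding them, assuming the register order libata uses in this message (given in the comment) and the NCQ convention that READ/WRITE FPDMA QUEUED carry the sector count in the feature registers and the tag in bits 7:3 of the sector-count register:

```python
import re

# Assumed field order of libata's "cmd" dump:
#   command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
TASKFILE = re.compile(
    r"cmd ([0-9a-f]{2})/"
    r"([0-9a-f]{2}):([0-9a-f]{2}):([0-9a-f]{2}):([0-9a-f]{2}):([0-9a-f]{2})/"
    r"([0-9a-f]{2}):([0-9a-f]{2}):([0-9a-f]{2}):([0-9a-f]{2}):([0-9a-f]{2})/"
    r"([0-9a-f]{2})"
)
COMMAND_NAMES = {0x25: "READ DMA EXT", 0x60: "READ FPDMA QUEUED", 0x61: "WRITE FPDMA QUEUED"}
NCQ_COMMANDS = {0x60, 0x61}


def decode_cmd(line):
    m = TASKFILE.search(line)
    if m is None:
        raise ValueError("no taskfile found")
    (cmd, feat, nsect, lbal, lbam, lbah,
     hfeat, hnsect, hlbal, hlbam, hlbah, _device) = (int(g, 16) for g in m.groups())
    # Assumes LBA48 addressing: the hob (high-order byte) registers hold bits 47:24.
    # 28-bit commands keep LBA bits 27:24 in the device register instead.
    lba = (hlbah << 40) | (hlbam << 32) | (hlbal << 24) | (lbah << 16) | (lbam << 8) | lbal
    if cmd in NCQ_COMMANDS:
        sectors = (hfeat << 8) | feat  # NCQ: count lives in the feature registers
        tag = nsect >> 3               # NCQ: tag lives in nsect bits 7:3
    else:
        sectors = (hnsect << 8) | nsect
        tag = None
    # A count of 0 (meaning 65536 sectors) is not special-cased here.
    return {"name": COMMAND_NAMES.get(cmd, hex(cmd)), "lba": lba, "sectors": sectors, "tag": tag}


print(decode_cmd("cmd 61/a0:00:b0:08:0a/02:00:09:00:00/40 tag 0 ncq 344064 out"))
```

For the tag 0 line above it yields 0x2a0 = 672 sectors, i.e. 672 × 512 = 344064 bytes, matching "ncq 344064". The six failed tags (0 to 5) also correspond to the bits set in SAct 0x3f.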

In the end the machine locked hard.

Next week I'll try the same with the SATA disks attached t...

Lars (lars-taeuber) wrote :

Hi,

here is the report about my test with the SAS-HBA.
The problem seems to be the same, but the symptoms change.
The result is as bad as before.

For me it is very important to get this tracked down.

SAS-HBA: areca ARC-1300ix-16
module: mvsas

test:
mdadm -C /dev/md3 -l6 -n8 /dev/sd[c-h] missing missing
(the two missing HDDs prevent this RAID from doing an initial sync)
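
With 8 slots in RAID6, md reserves two members' worth of space for the P and Q parity, so the usable size is (n - 2) times the space used on each member, whether or not two slots are "missing". As a worked check against the capacity md reports in the log below (12002393063424 bytes); the function names are hypothetical:

```python
# Sketch: RAID6 usable capacity = (n - 2) x space used per member.

def raid6_capacity(n_devices, member_bytes):
    if n_devices < 4:
        raise ValueError("md RAID6 needs at least 4 devices")
    return (n_devices - 2) * member_bytes


def member_size_from_capacity(n_devices, capacity_bytes):
    """Invert the formula to get the space md uses on each member."""
    data_disks = n_devices - 2
    if capacity_bytes % data_disks:
        raise ValueError("capacity is not a multiple of the data-disk count")
    return capacity_bytes // data_disks


per_member = member_size_from_capacity(8, 12002393063424)
print(per_member, raid6_capacity(8, per_member))
```

This gives 2000398843904 bytes per member. The same formula applied to the natty run earlier in the thread (12002383626240 bytes) gives 2000397271040 bytes per member.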

[ 112.537311] md: bind<sdc>
[ 112.565624] md: bind<sdd>
[ 112.592516] md: bind<sde>
[ 112.623219] md: bind<sdf>
[ 112.649941] md: bind<sdg>
[ 112.678637] md: bind<sdh>
[ 112.743525] raid5: device sdh operational as raid disk 5
[ 112.743532] raid5: device sdg operational as raid disk 4
[ 112.743538] raid5: device sdf operational as raid disk 3
[ 112.743543] raid5: device sde operational as raid disk 2
[ 112.743547] raid5: device sdd operational as raid disk 1
[ 112.743552] raid5: device sdc operational as raid disk 0
[ 112.744821] raid5: allocated 8490kB for md3
[ 112.744939] 5: w=1 pa=0 pr=8 m=2 a=2 r=8 op1=0 op2=0
[ 112.744946] 4: w=2 pa=0 pr=8 m=2 a=2 r=8 op1=0 op2=0
[ 112.744953] 3: w=3 pa=0 pr=8 m=2 a=2 r=8 op1=0 op2=0
[ 112.744958] 2: w=4 pa=0 pr=8 m=2 a=2 r=8 op1=0 op2=0
[ 112.744964] 1: w=5 pa=0 pr=8 m=2 a=2 r=8 op1=0 op2=0
[ 112.744969] 0: w=6 pa=0 pr=8 m=2 a=2 r=8 op1=0 op2=0
[ 112.744975] raid5: raid level 6 set md3 active with 6 out of 8 devices, algorithm 2
[ 112.766420] RAID5 conf printout:
[ 112.766427] --- rd:8 wd:6
[ 112.766434] disk 0, o:1, dev:sdc
[ 112.766441] disk 1, o:1, dev:sdd
[ 112.766448] disk 2, o:1, dev:sde
[ 112.766454] disk 3, o:1, dev:sdf
[ 112.766461] disk 4, o:1, dev:sdg
[ 112.766468] disk 5, o:1, dev:sdh
[ 112.766554] md3: detected capacity change from 0 to 12002393063424
[ 112.766852] md3: unknown partition table

Everything is fine so far.

Now produce a high I/O load:
mke2fs -j /dev/md3

[ 190.981812] /build/buildd/linux-2.6.32/drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
[ 190.981821] /build/buildd/linux-2.6.32/drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
[ 190.981848] /build/buildd/linux-2.6.32/drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
[ 190.981855] /build/buildd/linux-2.6.32/drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
[ 190.981885] /build/buildd/linux-2.6.32/drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
[ 190.981890] /build/buildd/linux-2.6.32/drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
[ 190.981916] /build/buildd/linux-2.6.32/drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
[ 190.981922] /build/buildd/linux-2.6.32/drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
[ 190.981946] /build/buildd/linux-2.6.32/drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
[ 190.981952] /build/buildd/linux-2.6.32/drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
[ 190.981974] /build/buildd/linux-2.6.32/drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
[ 190.981980] /build/buildd/linux-2.6.32/drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
[ 221.980143] /build/buildd/linux-2.6.32/drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
[ 221.980152] /build/buildd/linux-2.6.32/drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
[ 221.980...

Lars (lars-taeuber) wrote :

- same with only one cpu core (nosmp as kernel option) and with the lowest cpu clock (800MHz)
- same with HDDs directly attached to HBA without disk drive bay in between (Chenbro SK33502)

_Please_ make a suggestion what I could test next.

Thanks
Lars

Vasco (vasco-visser) wrote :

Lars,

The only thing that seems to prevent this error for me is running an
older kernel. I'm running 2.6.32-22 now without any trouble. I haven't
checked at exactly what kernel version things start to go wrong, and I
don't think I will because I don't want to risk corrupting my data.
Maybe now that you are experimenting you can find out at what kernel
version things go wrong. We could use that info to post a bug on the
kernel bug tracker (as the Launchpad bug tracker seems to be pretty
much useless for this kind of stuff).

If it turns out that you also experience problems with older kernel
versions we might not be suffering from the same problem.

--
Vasco

Lars (lars-taeuber) wrote :

- Changing acoustic management from 254 to 128 does not change anything.

BTW: the HDDs I use are:
Western Digital
- WDC WD2002FYPS
- WDC WD2003FYYS

I'll swap the HDDs for Seagates (only 320GB, though).

Lars

Lars (lars-taeuber) wrote :

Different HDDs, same problem:

Seagate ST3320620NS

tested kernels so far: 2.6.32-31-server

I'll try different kernels.
Thanks Vasco for the hint.

Lars

Hi to all of you who are having this problem:

Please review my report:
https://bugs.launchpad.net/ubuntu/+bug/550559/comments/41
That's how I was able to resolve the problem.

In short:
- get the current kernel source from www.kernel.org
- install kernel-package
- run make oldconfig using the Ubuntu config found in /boot/config-2.6.XX
- build the new kernel, optionally create a deb package, and install it
- boot the new kernel and give it a try

I've been running a server on it for 3 months now without any problems.

Felix

Lars (lars-taeuber) wrote :

Older kernel 2.6.32-22-server: same problem.
Newer kernel 2.6.35-25-server: same problem, but it occurs later.

I'll try Felix' hint and compile a vanilla kernel.

regards
Lars

Casey Greene (casey-s-greene) wrote :

This still occurs in natty:
http://ubuntuforums.org/showthread.php?t=1731070

I am also observing it, and the SMART report is healthy.

Lars (lars-taeuber) wrote :

Hi!

I tested with
2.6.32-22-server: same problem
2.6.35-25-server: problem occurs later
2.6.35.13: same as 2.6.35-25-server

Now I'll try to get 2.6.38.6 compiled and running.

I'm away for a week, so expect the result next Wednesday at the earliest.

Lars

Edwin Chiu (edwin-chiu) wrote :

I suspect this isn't just one bug, but more than one. Everyone has a
different "fix" for the "same" problem. For me, ditching Seagate
ST32000542AS for WD20EARS drives fixes my problem.

Lars (lars-taeuber) wrote :

Hi!

I tested with
2.6.39: same problem, but it occurs much later

dmesg attached

Where else should I report this? A kernel mailing list (linux-scsi, linux-raid)?

Best regards
Lars

Hello

It looks like I have the same problem.

It also occurs with the Ubuntu 10.10 live CD,
with this controller: Marvell Technology Group Ltd. 88SX7042 PCI-e 4-port SATA-II (rev 02),
with this controller: Silicon Image, Inc. SiI 3132 Serial ATA Raid II Controller (rev 01),
and with the mainboard controller.

Deactivating NCQ doesn't change anything.

I'm really out of options right now :-(
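
For reference, one common way to turn NCQ off for a single disk at runtime is to set its queue depth to 1 through sysfs. A minimal sketch, assuming the sysfs path /sys/block/<dev>/device/queue_depth and root privileges; the device name is a placeholder:

```python
# Sketch: disable NCQ for one disk by limiting it to one outstanding command.
from pathlib import Path


def queue_depth_path(dev, sysfs_root="/sys"):
    # "dev" is a block device name such as "sdb" (placeholder for illustration).
    return Path(sysfs_root) / "block" / dev / "device" / "queue_depth"


def set_queue_depth(dev, depth, sysfs_root="/sys"):
    path = queue_depth_path(dev, sysfs_root)
    path.write_text(f"{depth}\n")
    return int(path.read_text())  # read back the value the kernel accepted


def disable_ncq(dev, sysfs_root="/sys"):
    # A depth of 1 means no queued (FPDMA) commands are issued to the disk.
    return set_queue_depth(dev, 1, sysfs_root)
```

To disable NCQ for libata devices from boot instead, the kernel parameter libata.force=noncq can be used.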

affects: ubuntu → linux (Ubuntu)
Mike Doherty (doherty) wrote :

Some more dmesg for your perusal: http://sprunge.us/MbDd

Lars (lars-taeuber) wrote :

According to this thread: http://lkml.indiana.edu/hypermail/linux/kernel/1106.0/02136.html
my problem is related to the HDDs, which are not RAID-capable.

This seems to be true for these disks:
* Western Digital WD RE4-GP
* Seagate Barracuda ES

I'll post again when I receive the new disks.

Lars

I've been meaning to update this. I thought mine was fixed, but that quickly turned out to be wrong. It now builds the RAID1 array and appears to complete, but then fails at the very end and marks the array as stale. This is with Hitachi drives. I'm trying to find out whether this is an issue like the one you may have found with the WD and Seagate drives, but no luck so far; as far as I can see, Hitachi doesn't publish much useful data.

pepre (me-pepre) wrote :

OS: Ubuntu 10.04.2 LTS up to date

Never ending story... :-(

For a few months the problem appeared only once in a blue moon. Over the last two days I extended my RAID and LVM and grew the ext4 filesystem. Everything ran fine, even though the HDDs were under heavy stress. I rebooted this morning and installed a new kernel (1) as suggested. And now I get this again from time to time:

ata3.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
ata3.00: irq_stat 0x40000008
ata3.00: failed command: READ FPDMA QUEUED
ata3.00: cmd 60/00:00:3f:36:12/01:00:6c:00:00/40 tag 0 ncq 131072 in
                res 41/40:58:e7:36:12/00:00:6c:00:00/40 Emask 0x409 (media error) <F>
ata3.00: status: { DRDY ERR }
ata3.00: error: { UNC }
ata3.00: configured for UDMA/133
ata3: EH complete

Boring! :-( Tomorrow I will try the old kernel. A report will follow in the next few days.

--
(1) 2.6.32-33-generic #70-Ubuntu SMP Thu Jul 7 21:13:52 UTC 2011 x86_64 GNU/Linux
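
Unlike the timeouts elsewhere in this thread, this log carries a status and error returned by the drive itself. A minimal sketch that decodes the first two bytes of a libata "res" line (status register, then error register), assuming the standard ATA bit assignments; only the bits libata names in these messages are listed:

```python
import re

STATUS_BITS = [(0x80, "BSY"), (0x40, "DRDY"), (0x20, "DF"), (0x08, "DRQ"), (0x01, "ERR")]
ERROR_BITS = [(0x80, "ICRC"), (0x40, "UNC"), (0x01, "AMNF"), (0x10, "IDNF"), (0x04, "ABRT")]


def flags(value, table):
    return [name for mask, name in table if value & mask]


def decode_res(line):
    # e.g. "res 41/40:58:..." -> status 0x41, error 0x40
    m = re.search(r"res ([0-9a-f]{2})/([0-9a-f]{2})", line)
    if m is None:
        raise ValueError("no res field found")
    status, error = int(m.group(1), 16), int(m.group(2), 16)
    return flags(status, STATUS_BITS), flags(error, ERROR_BITS)


print(decode_res("res 41/40:58:e7:36:12/00:00:6c:00:00/40 Emask 0x409 (media error)"))
```

For the line above this gives status DRDY ERR and error UNC (uncorrectable data), matching the kernel's own summary; UNC together with "media error" usually points at an unreadable sector on the disk rather than at the link or controller. Applied to the "res 7f/00..." line from the DevExch log earlier in the thread, it gives DRDY DF DRQ ERR.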

Lars (lars-taeuber) wrote :

Hi there!

here is an update.

I think I found the source of my problems. It's the board or the CPU.
board: Supermicro H8SCM-F
cpu: Opteron 4170HE

I think it's the chipset or the like.

http://thread.gmane.org/gmane.linux.kernel/1150608

Greetings
Lars

IKT (ikt) wrote :

It would be nice if it were just that motherboard, but given how varied our setups are (including mine, which isn't in a RAID at all), I think this is more than just a chipset or CPU issue.

darkofdayl (darkofday) wrote :

Hi, everyone

I get the same error after upgrading my HP CQ45-307TX with a new hard disk (Hitachi HTS725050A9A364). After installing Debian 6.0.1 i386 and rebooting, the same error as in #18 occurred (without the noncq parameter set). Windows 7 works fine on the same machine.

noof (kalas) wrote :

I'm actually running Debian Lenny (2.6.32-5), but I ended up here when googling for that "READ FPDMA QUEUED" error. I have an MSI K9AG Neo2 (MS-7368) motherboard with the following SATA controller:

00:12.0 SATA controller: ATI Technologies Inc SB600 Non-Raid-5 SATA

I have three disks in my computer:
* SAMSUNG HD204UI (SATA)
* SAMSUNG HD154UI (SATA)
* WDC WD800BB (IDE)

I started getting hangs during large file operations on the HD204UI and error spam in my syslog. I then moved both SATA drives to another controller card:

03:05.0 Mass storage controller: Promise Technology, Inc. PDC40775 (SATA 300 TX2plus) (rev 02)

After this it seems to work much better. I have a RAID 1 partition on the samsungs that failed hard to resync after rebooting with the drives connected to the mobo SATA controller. After moving them to the promise card it looks much better, I'm at 35% without any error at all. Before it bailed out after less than 10% because of errors.

alricsca (alricsca) wrote :

I have this problem, but I have learned something new about it: the problem seems to be controller-related, arising in the controller's interaction with ext4 file systems when NCQ is used. When I first had this problem, I assumed it was hardware, so I bought a new disk and added it to my system, and on a lark I reformatted my old disk with a btrfs file system to try it out. After these changes the problem seemed to stop on the original disk, an ATA ST3400830AS. Then one day, while copying files en masse from the new disk to the old one, I saw the error start happening again, this time on my brand-new disk, an ATA ST910021AS. To be clear, the error seems to have jumped from my original disk to the new one. Here is what I see: on my original disk I used btrfs on the second partition, and either this or adding the new disk made the problem stop on that disk. On the new disk I used ext4 on the second partition, which itself sits inside an extended partition, and that is where I am now getting this error. The error clearly seems to have some connection to ext4, the SATA driver (sata_nv in my case), and most likely NCQ. One thing that makes the problem go away, or at least happen very rarely, is using these sata_nv options: msi=0, adma=0, swncq=0. Hope this helps.

Here are the defaults for sata_nv:
parm: adma:Enable use of ADMA (Default: false) (bool)
parm: swncq:Enable use of SWNCQ (Default: true) (bool)
parm: msi:Enable use of MSI (Default: false) (bool)

2.6.38-11-generic #48-Ubuntu SMP Fri Jul 29 19:02:55 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
larak@Linux-Robert:~$

[ 6825.673649] ata5: EH in SWNCQ mode,QC:qc_active 0x3C sactive 0x3C
[ 6825.673654] ata5: SWNCQ:qc_active 0x3C defer_bits 0x0 last_issue_tag 0x5
[ 6825.673656] dhfis 0x3C dmafis 0x1C sdbfis 0x3
[ 6825.673660] ata5: ATA_REG 0x41 ERR_REG 0x0
[ 6825.673663] ata5: tag : dhfis dmafis sdbfis sacitve
[ 6825.673666] ata5: tag 0x2: 1 1 0 1
[ 6825.673669] ata5: tag 0x3: 1 1 0 1
[ 6825.673672] ata5: tag 0x4: 1 1 0 1
[ 6825.673675] ata5: tag 0x5: 1 0 0 1
[ 6825.673685] ata5.00: exception Emask 0x1 SAct 0x3c SErr 0x0 action 0x6 frozen
[ 6825.673688] ata5.00: Ata error. fis:0x21
[ 6825.673692] ata5.00: failed command: READ FPDMA QUEUED
[ 6825.673700] ata5.00: cmd 60/10:10:3f:9b:e6/00:00:05:00:00/40 tag 2 ncq 8192 in
[ 6825.673702] res 41/00:28:af:71:e6/00:00:05:00:00/40 Emask 0x1 (device error)
[ 6825.673706] ata5.00: status: { DRDY ERR }
[ 6825.673709] ata5.00: failed command: READ FPDMA QUEUED
[ 6825.673716] ata5.00: cmd 60/08:18:57:9b:e6/00:00:05:00:00/40 tag 3 ncq 4096 in

00:0a.1 SMBus: nVidia Corporation MCP55 SMBus (rev a3)
00:0b.0 USB Controller: nVidia Corporation MCP55 USB Controller (rev a1)
00:0b.1 USB Controller: nVidia Corporation MCP55 USB Controller (rev a2)
00:0d.0 IDE interface: nVidia Corporation MCP55 IDE (rev a1)
00:0e.0 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3)
00:0e.1 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3)
00:0e.2 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3)
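
The SWNCQ dump above prints its per-command state as tag bitmasks. A minimal sketch that expands those masks into tag numbers, which reproduces the per-tag table (the values are copied from the log; the helper name is illustrative):

```python
# Sketch: expand NCQ tag bitmasks (bit N set = tag N) into lists of tags.

def tags(mask, width=32):
    """Return the tag numbers whose bits are set in mask."""
    return [t for t in range(width) if (mask >> t) & 1]


dump = {"qc_active": 0x3C, "dhfis": 0x3C, "dmafis": 0x1C, "sdbfis": 0x3}
for name, mask in dump.items():
    print(f"{name:9} 0x{mask:X} -> tags {tags(mask)}")
```

This gives tags 2 to 5 for qc_active and dhfis, but only tags 2 to 4 for dmafis, consistent with the table row for tag 0x5 showing dmafis 0. The same expansion applied to SAct 0x3c in the exception line gives tags 2 to 5.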

Ricard Bou (ricard-bou) wrote :

I have been getting this kind of error for a month now, and it's getting worse. At first it affected MP3 playback, then video playback. Now Synaptic is affected too, and it looks as if my HD is rotting...
I own an HP Pavilion tx2000 with Ubuntu 10.04 installed, and I use it intensively for engineering: Python programming, cross-compiling kernels for ARM platforms, Arduino programming, PCB design...
I have Windows 7 as the main (and seldom used) OS.

I ran fsck, but it says my partitions are in PERFECT shape.

My fstab:

# /etc/fstab: static file system information.
#
# Use 'blkid -o value -s UUID' to print the universally unique identifier
# for a device; this may be used with UUID= as a more robust way to name
# devices that works even if disks are added and removed. See fstab(5).
#
# <file system> <mount point> <type> <options> <dump> <pass>
proc /proc proc nodev,noexec,nosuid 0 0
# / was on /dev/sda5 during installation
UUID=6e3ce0ed-5bf2-4d40-8901-4948064fa010 / ext4 errors=remount-ro 0 1
# swap was on /dev/sda6 during installation
UUID=692433c3-c222-436a-afb1-76ff99b7500f none swap sw 0 0
# Add windows partition and binds
UUID=361EE6161EE5CF45 /mnt/windows ntfs rw,auto,users,exec,nls=utf8,umask=003,gid=46,uid=1000 0 2
/mnt/windows/Documents\040and\040Settings/Ricard/Documents /home/ricard/windows/Documentos none bind
/mnt/windows/Documents\040and\040Settings/Ricard/Desktop /home/ricard/windows/Escritorio none bind
/mnt/windows/Documents\040and\040Settings/Ricard/Pictures /home/ricard/windows/Imagenes none bind

blkid says...

ricard@ricard-HPtx2500:/var/log$ sudo blkid
/dev/sda1: LABEL="Reservado para el sistema" UUID="3418CD8B18CD4C94" TYPE="ntfs"
/dev/sda2: UUID="361EE6161EE5CF45" TYPE="ntfs"
/dev/sda5: UUID="6e3ce0ed-5bf2-4d40-8901-4948064fa010" TYPE="ext4"
/dev/sda6: UUID="692433c3-c222-436a-afb1-76ff99b7500f" TYPE="swap"
/dev/sdb1: SEC_TYPE="msdos" LABEL="FAT" UUID="F9F4-0660" TYPE="vfat"
/dev/sdb2: LABEL="ext3" UUID="4b696001-2cc1-4ba9-921d-8ad6d43d5397" TYPE="ext3"
/dev/sdc1: LABEL="USB DISK" UUID="44E3-F405" TYPE="vfat"

Last 50 lines of my /var/log/kern.log:

ricard@ricard-HPtx2500:/var/log$ tail -n50 kern.log
Sep 16 11:21:02 ricard-HPtx2500 kernel: [15743.241297] res 41/40:00:ed:a9:72/00:00:1e:00:00/40 Emask 0x409 (media error) <F>
Sep 16 11:21:02 ricard-HPtx2500 kernel: [15743.241311] ata3.00: status: { DRDY ERR }
Sep 16 11:21:02 ricard-HPtx2500 kernel: [15743.241321] ata3.00: error: { UNC }
Sep 16 11:21:02 ricard-HPtx2500 kernel: [15743.251528] ata3.00: configured for UDMA/100
Sep 16 11:21:02 ricard-HPtx2500 kernel: [15743.251564] ata3: EH complete
Sep 16 11:21:04 ricard-HPtx2500 kernel: [15745.733540] ata3.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
Sep 16 11:21:04 ricard-HPtx2500 kernel: [15745.733558] ata3.00: irq_stat 0x40000008
Sep 16 11:21:04 ricard-HPtx2500 kernel: [15745.733574] ata3.00: failed command: READ FPDMA QUEUED
Sep 16 11:21:04 ricard-HPtx2500 kernel: [15745.733603] ata3.00: cmd 60/08:00:e8:a9:72/00:00:1e:00:00/40 tag 0 ncq 4096 in
Sep 16 11:21:04 ricard-HPtx2500 kernel: [15745....

Lars (lars-taeuber) wrote :

Hi there!

my problem is solved. It was the mainboard.
I got it exchanged and the new one runs just fine.

Good luck!
Lars

Redsandro (redsandro) wrote :

This bug is still present in Oneiric Ocelot:

Kernel: 3.0.0-11
Chipset: AMD A50M Fusion
Mainboard: Zotac Fusion
Hard drive: Samsung Spinpoint F4EG 2TB drive.

OS is unaffected because it's running from a separate SSD.

I don't know how to test the suggested options that help in 2% of cases, though, because I only get this error (a thousand times over) after badblocks -wsv -t random /dev/sdb reaches about 50%, which takes a good 6 hours.

Syslog:

Oct 9 16:34:33 mcRed kernel: [71485.413301] ata2.00: exception Emask 0x10 SAct 0x1 SErr 0x280100 action 0x6 frozen
Oct 9 16:34:33 mcRed kernel: [71485.413310] ata2.00: irq_stat 0x08000000, interface fatal error
Oct 9 16:34:33 mcRed kernel: [71485.413316] ata2: SError: { UnrecovData 10B8B BadCRC }
Oct 9 16:34:33 mcRed kernel: [71485.413322] ata2.00: failed command: READ FPDMA QUEUED
Oct 9 16:34:33 mcRed kernel: [71485.413332] ata2.00: cmd 60/80:00:80:ec:05/00:00:93:00:00/40 tag 0 ncq 65536 in
Oct 9 16:34:33 mcRed kernel: [71485.413335] res 40/00:04:80:ec:05/00:00:93:00:00/40 Emask 0x10 (ATA bus error)
Oct 9 16:34:33 mcRed kernel: [71485.413339] ata2.00: status: { DRDY }
Oct 9 16:34:33 mcRed kernel: [71485.413349] ata2: hard resetting link
Oct 9 16:34:33 mcRed kernel: [71485.960075] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Oct 9 16:34:33 mcRed kernel: [71485.972453] ata2.00: configured for UDMA/33
Oct 9 16:34:33 mcRed kernel: [71485.990105] ata2: EH complete
(x1000)
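
SError is a bitmask of link-level error conditions, and libata names each set bit in the "SError: { ... }" line. A minimal sketch of that decoding, assuming the bit-to-name table used by libata (bits not listed are ignored):

```python
# Sketch: decode an SError value into libata's short names, lowest bit first.
SERROR_BITS = {
    0: "RecovData", 1: "RecovComm", 8: "UnrecovData", 9: "Persist",
    10: "Proto", 11: "HostInt", 16: "PHYRdyChg", 17: "PHYInt",
    18: "CommWake", 19: "10B8B", 20: "Dispar", 21: "BadCRC",
    22: "Handshk", 23: "LinkSeq", 24: "TrStaTrns", 25: "UnrecFIS",
    26: "DevExch",
}


def decode_serror(value):
    return [name for bit, name in sorted(SERROR_BITS.items()) if (value >> bit) & 1]


print(decode_serror(0x280100))
```

0x280100 decodes to UnrecovData 10B8B BadCRC, exactly as logged. 10B8B and BadCRC are encoding and CRC errors on the wire, so together with the link dropping to 1.5 Gbps (SControl 310) they would be consistent with a signalling problem (cable, connector or port) rather than a media defect, though the log alone cannot prove that.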

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu oneiric (development branch)
Release: 11.10
Codename: oneiric

$ uname -a
Linux Redsandro 3.0.0-11-generic #18-Ubuntu SMP Tue Sep 13 23:38:01 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux

$ lspci
00:00.0 Host bridge: Advanced Micro Devices [AMD] Family 14h Processor Root Complex
00:01.0 VGA compatible controller: ATI Technologies Inc AMD Radeon HD 6310 GraphicsATI
00:01.1 Audio device: ATI Technologies Inc Wrestler HDMI Audio [Radeon HD 6250/6310]
00:04.0 PCI bridge: Advanced Micro Devices [AMD] Family 14h Processor Root Port
00:11.0 SATA controller: ATI Technologies Inc SB7x0/SB8x0/SB9x0 SATA Controller [IDE mode] (rev 40)
00:12.0 USB Controller: ATI Technologies Inc SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
00:12.2 USB Controller: ATI Technologies Inc SB7x0/SB8x0/SB9x0 USB EHCI Controller
00:13.0 USB Controller: ATI Technologies Inc SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
00:13.2 USB Controller: ATI Technologies Inc SB7x0/SB8x0/SB9x0 USB EHCI Controller
00:14.0 SMBus: ATI Technologies Inc SBx00 SMBus Controller (rev 42)
00:14.1 IDE interface: ATI Technologies Inc SB7x0/SB8x0/SB9x0 IDE Controller (rev 40)
00:14.2 Audio device: ATI Technologies Inc SBx00 Azalia (Intel HDA) (rev 40)
00:14.3 ISA bridge: ATI Technologies Inc SB7x0/SB8x0/SB9x0 LPC host controller (rev 40)
00:14.4 PCI bridge: ATI Technologies Inc SBx00 PCI to PCI Bridge (rev 40)
00:14.5 USB Controller: ATI Technologies Inc SB7x0/SB8x0/SB9x0 USB OHCI2 Controller
00:15.0 PCI bridge: ATI Technologies Inc SB700/SB800/SB900 PCI to PCI bridge (PCIE port 0)
00:15.2 PCI bridge: ATI Technologies Inc SB900 PCI to PCI bridge ...

Ian! D. Allen (idallen) wrote :

Ubuntu 11.04 natty
Fresh boot; the errors occur during the boot sequence.

Linux linux 2.6.38-11-generic #50-Ubuntu SMP Mon Sep 12 21:17:25 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux

# hdparm -i /dev/sdd
/dev/sdd:
 Model=ST32000542AS, FwRev=CC37, SerialNo=5XW0S99F
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs RotSpdTol>.5% }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
 BuffType=unknown, BuffSize=unknown, MaxMultSect=16, MultSect=off
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=3907029168
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes: pio0 pio1 pio2 pio3 pio4
 DMA modes: mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6
 AdvancedPM=yes: unknown setting WriteCache=enabled
 Drive conforms to: unknown: ATA/ATAPI-4,5,6,7

10-12 08:54:20 ata8: SATA max UDMA/133 abar m1024@0xfbafe400 port 0xfbafe780 irq 22

10-12 08:54:20 ata8: softreset failed (1st FIS failed)
10-12 08:54:20 ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
10-12 08:54:20 ata8.15: Port Multiplier 1.1, 0x1095:0x5744 r33, 3 ports, feat 0x1/0x9
10-12 08:54:20 ata8.00: hard resetting link
10-12 08:54:20 ata8.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
10-12 08:54:20 ata8.01: hard resetting link
10-12 08:54:20 ata8.01: SATA link down (SStatus 0 SControl 320)
10-12 08:54:20 ata8.02: hard resetting link
10-12 08:54:20 ata8.02: SATA link down (SStatus 0 SControl 320)
10-12 08:54:20 ata8.00: ATA-8: ST32000542AS, CC37, max UDMA/133
10-12 08:54:20 ata8.00: 3907029168 sectors, multi 0: LBA48 NCQ (depth 31/32)
10-12 08:54:20 ata8.00: configured for UDMA/133
10-12 08:54:20 ata8: EH complete
10-12 08:54:20 scsi 7:0:0:0: Direct-Access ATA ST32000542AS CC37 PQ: 0 ANSI: 5
10-12 08:54:20 sd 7:0:0:0: Attached scsi generic sg5 type 0
10-12 08:54:20 sd 7:0:0:0: [sdd] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
10-12 08:54:20 sd 7:0:0:0: [sdd] Write Protect is off
10-12 08:54:20 sd 7:0:0:0: [sdd] Mode Sense: 00 3a 00 00
10-12 08:54:20 sd 7:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
10-12 08:54:20 sdd: sdd1
10-12 08:54:20 sd 7:0:0:0: [sdd] Attached SCSI disk

10-12 08:56:49 ata8.01: failed to read SCR 1 (Emask=0x40)
10-12 08:56:49 ata8.02: failed to read SCR 1 (Emask=0x40)
10-12 08:56:49 ata8.15: exception Emask 0x4 SAct 0x0 SErr 0x400001 action 0x6 frozen
10-12 08:56:49 ata8.15: SError: { RecovData Handshk }
10-12 08:56:49 ata8.00: exception Emask 0x100 SAct 0x1f SErr 0x0 action 0x6 frozen
10-12 08:56:49 ata8.00: failed command: WRITE FPDMA QUEUED
10-12 08:56:49 ata8.00: cmd 61/00:00:00:b0:fc/04:00:01:00:00/40 tag 0 ncq 524288 out
10-12 08:56:49 res 40/00:10:00:b8:fc/00:00:01:00:00/40 Emask 0x4 (timeout)
10-12 08:56:49 ata8.00: status: { DRDY }
10-12 08:56:49 ata8.00: failed command: WRITE FPDMA QUEUED
10-12 08:56:49 ata8.00: cmd 61/00:08:00:b4:fc/04:00:01:00:00/40 tag 1 ncq 524288 out
10-12 08:56:49 res 40/00:10:00:b8:fc/00:00:01:00:00/40 Emask 0x4 (timeout)
10-12 08:56:49 ata8.00: status: { DRDY }
10-12 08:56:49 ata8.00: failed command: WRITE FPDMA QUEUED
10-12 08:56:49 ata8.00: cmd 61/00:10:00:b8:fc/04:00:01:00:00/40 tag 2 ncq 524288 out
10-12 08:56:49 res 4...

Zrin Ziborski (zrin+launchpad) wrote :

I was just thinking that these problems might be due to misconfiguration or suboptimal configuration of the controller, or even suboptimal configuration of and/or conflicts with other devices in the system.

So it might be a driver problem, though not necessarily in this controller's driver: it could also be the driver of some other device (another controller) in the system.

Try disabling the other devices/controllers one at a time, and then try to reproduce the problem.

the best of luck,
Zrin

Erik1984 (erik1984) wrote :

I get similar error messages. I'm now on Lucid with the following kernel:
2.6.32-35-generic-pae #78-Ubuntu SMP Tue Oct 11 17:01:12 UTC 2011 i686 GNU/Linux

I've also experienced similar errors on Natty, but I don't have the logs from those freezes anymore.

Anyway, here are the latest relevant lines from kern.log:
Dec 7 23:10:09 erik-desktop kernel: [14681.976049] ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
Dec 7 23:10:09 erik-desktop kernel: [14681.976063] ata1.00: failed command: READ FPDMA QUEUED
Dec 7 23:10:09 erik-desktop kernel: [14681.976076] ata1.00: cmd 60/80:00:00:39:50/00:00:2b:00:00/40 tag 0 ncq 65536 in
Dec 7 23:10:09 erik-desktop kernel: [14681.976079] res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Dec 7 23:10:09 erik-desktop kernel: [14681.976085] ata1.00: status: { DRDY }
Dec 7 23:10:09 erik-desktop kernel: [14681.976094] ata1: hard resetting link
Dec 7 23:10:14 erik-desktop kernel: [14687.500031] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Dec 7 23:10:14 erik-desktop kernel: [14687.500041] ata1.00: link online but device misclassifed
Dec 7 23:10:19 erik-desktop kernel: [14692.500036] ata1.00: qc timeout (cmd 0xec)
Dec 7 23:10:19 erik-desktop kernel: [14692.500054] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Dec 7 23:10:19 erik-desktop kernel: [14692.500060] ata1.00: revalidation failed (errno=-5)
Dec 7 23:10:19 erik-desktop kernel: [14692.500072] ata1: hard resetting link
Dec 7 23:10:20 erik-desktop kernel: [14693.152025] ata1: softreset failed (device not ready)
Dec 7 23:10:20 erik-desktop kernel: [14693.152034] ata1: applying SB600 PMP SRST workaround and retrying
Dec 7 23:10:20 erik-desktop kernel: [14693.316033] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Dec 7 23:10:20 erik-desktop kernel: [14693.345733] ata1.00: configured for UDMA/133
Dec 7 23:10:20 erik-desktop kernel: [14693.345745] ata1.00: device reported invalid CHS sector 0
Dec 7 23:10:20 erik-desktop kernel: [14693.345760] ata1: EH complete
Dec 7 23:10:51 erik-desktop kernel: [14723.976059] ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
Dec 7 23:10:51 erik-desktop kernel: [14723.976072] ata1.00: failed command: READ FPDMA QUEUED
Dec 7 23:10:51 erik-desktop kernel: [14723.976085] ata1.00: cmd 60/80:00:00:39:50/00:00:2b:00:00/40 tag 0 ncq 65536 in
Dec 7 23:10:51 erik-desktop kernel: [14723.976087] res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Dec 7 23:10:51 erik-desktop kernel: [14723.976093] ata1.00: status: { DRDY }
Dec 7 23:10:51 erik-desktop kernel: [14723.976103] ata1: hard resetting link
Dec 7 23:10:51 erik-desktop kernel: [14724.461023] ata1: softreset failed (device not ready)
Dec 7 23:10:51 erik-desktop kernel: [14724.461034] ata1: applying SB600 PMP SRST workaround and retrying
Dec 7 23:10:52 erik-desktop kernel: [14724.625288] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Dec 7 23:10:52 erik-desktop kernel: [14724.649901] ata1.00: configured for UDMA/133
Dec 7 23:10:52 erik-desktop kernel: [14724.649911] ata1.00: device reported invalid CHS sector 0
Dec 7 23:10:52 erik-desktop kernel: [14724.64...

Redsandro (redsandro) wrote :

I never believed in anything other than the cheapest SATA cables, but for me the problems went away after switching to a thicker, more expensive SATA cable with firm, braced connectors.

I still consider this a bug, though. If there are communication problems with your death-ray hardware, you expect it to chill accordingly, not to shoot lasers through the Milky Way, making random planets 'unreadable', and keep doing so.

Erik1984 (erik1984) wrote :

Another one of those crashes/freezes. This time it was not only READ but also WRITE FPDMA QUEUED. Otherwise the symptoms are the same: the system freezes, a hard reset is required, and after the reset everything seems fine.

Distributor ID: Ubuntu
Description: Ubuntu 10.04.3 LTS
Release: 10.04
Codename: lucid
Kernel: 2.6.32-36-generic-pae #79-Ubuntu SMP Tue Nov 8 23:25:26 UTC 2011 i686 GNU/Linux

Attached the relevant lines.

I'm having the exact same issues on 8 machines. :-(
I tested various kernels and put new disks in the servers; 12 hours later the first errors appeared.
Quite annoying. I have now switched to the ext3 filesystem because people here wrote that the errors only appear on ext4.
So now I'll keep my fingers crossed.

Switching back to ext3 made the situation worse overall.
So it's most probably a problem with the mainboard chipset.

Here is the syslog output:

[102980.640120] ata1.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[102980.640211] ata1.01: failed command: READ DMA EXT
[102980.640272] ata1.01: cmd 25/00:08:48:d9:5f/00:00:4c:00:00/f0 tag 0 dma 4096 in
[102980.640275] res 40/00:01:00:00:00/00:00:00:00:00/50 Emask 0x4 (timeout)
[102980.640437] ata1.01: status: { DRDY }
[102985.690287] ata1: link is slow to respond, please be patient (ready=0)
[102990.699161] ata1: device not ready (errno=-16), forcing hardreset
[102990.699177] ata1: soft resetting link
[102992.730910] ata1.00: configured for UDMA/100
[102992.780911] ata1.01: configured for UDMA/100
[102992.780925] ata1.01: device reported invalid CHS sector 0
[102992.780946] ata1: EH complete
[103481.841796] ata1.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[103481.841862] ata1.01: failed command: READ DMA EXT
[103481.841922] ata1.01: cmd 25/00:10:60:d1:61/00:00:4c:00:00/f0 tag 0 dma 8192 in
[103481.841926] res 40/00:01:00:00:00/00:00:00:00:00/50 Emask 0x4 (timeout)
[103481.842101] ata1.01: status: { DRDY }
[103486.882573] ata1: link is slow to respond, please be patient (ready=0)
[103491.861492] ata1: device not ready (errno=-16), forcing hardreset
[103491.861510] ata1: soft resetting link
[103497.182613] ata1: link is slow to respond, please be patient (ready=0)
[103499.900924] ata1.00: configured for UDMA/100
[103499.940932] ata1.01: configured for UDMA/100
[103499.940948] ata1.01: device reported invalid CHS sector 0
[103499.940968] ata1: EH complete
[103780.080121] ata1.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[103780.080187] ata1.01: failed command: READ DMA EXT
[103780.080247] ata1.01: cmd 25/00:08:d0:a3:b8/00:00:4c:00:00/f0 tag 0 dma 4096 in
[103780.080251] res 40/00:01:00:00:00/00:00:00:00:00/50 Emask 0x4 (timeout)
[103780.080413] ata1.01: status: { DRDY }
[103785.121943] ata1: link is slow to respond, please be patient (ready=0)
[103790.100931] ata1: device not ready (errno=-16), forcing hardreset
[103790.100949] ata1: soft resetting link
[103795.422766] ata1: link is slow to respond, please be patient (ready=0)
[103799.760752] ata1.00: configured for UDMA/100
[103799.803461] ata1.01: configured for UDMA/100
[103799.803478] ata1.01: device reported invalid CHS sector 0
[103799.803497] ata1: EH complete
[104378.040117] ata1.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[104378.040183] ata1.01: failed command: READ DMA EXT
[104378.040265] ata1.01: cmd 25/00:40:80:3b:d6/00:00:46:00:00/f0 tag 0 dma 32768 in
[104378.040270] res 40/00:01:00:00:00/00:00:00:00:00/50 Emask 0x4 (timeout)
[104378.040445] ata1.01: status: { DRDY }
[104383.080075] ata1: link is slow to respond, please be patient (ready=0)
[104388.060071] ata1: device not ready (errno=-16), forcing hardreset
[104388.060086] ata1: soft resetting link
[104390.171132] ata1.00: configured for UDMA/100
[104390.210829] ata1.01: configured for UDMA/100
[104390.210847] ata1.01: device reported invalid CHS sector 0
[104390.210868] ata1: EH complete...

Agesp Anonymous (agespam) wrote :

Same here:

- The previous, aged Ubuntu 9.10 (Karmic Koala) worked without trouble.
- Fresh install from CD: Ubuntu 11.10 (Oneiric Ocelot)

After restoring data from backups (from several different media, of course), the same situation can be seen:
See http://skalaria.japo.fi/HDD-errors.txt

In my case the machine hung on logout. (OK, I should not repeat everything we already know... let's search for the real culprit.)

What do the error messages tell me?

failed command: READ FPDMA QUEUED
- As I understand it, this is the command that was sent to the HD.
- DRDY is the "device ready" bit of the ATA status register, so it only says the drive was ready. Nothing to it...
- "cmd" is the command sent to the disk (its registers shown in hex), and "res" is the result, i.e. the registers read back afterwards. The error class libata assigned follows it, here "Emask 0x4 (timeout)".

The "timeout", combined with others' experience of changing hard drives and power supplies, sounds mostly like a CONTROLLER TIMING PROBLEM (or an accompanying POWER problem) to me. It would also explain why some other HDDs work: they have slightly different timing.

And no, it is not failing hardware, because it affects many people with different hardware. How is the drive actually controlled? I don't know. But since it started with a kernel update/upgrade, not a hardware change, I'm roughly 100% sure it must be the kernel code that has changed. Nothing else. It is possibly setting up or driving the controller wrongly, or overlapping data while the HDD is already responding. What happens then? Bits fall.

What we can do is get the kernel developers to check the code from when the problem first appeared. I can confirm that SATA drives are affected, and PATA drives are not. This is critical because it corrupts data.

Anyway, I would roughly say this problem has nothing to do with:
- the power supply, PATA, SATA or other HDDs, or replacing them (several setups)
- the act of copying data or accessing disk drives in general (e.g. 2 simultaneous cp commands shouldn't matter)
- the type of filesystem (ext3/ext4)
- static electricity, asteroids or cosmic rays (a cut connection or open pins would, but right after an Ubuntu upgrade?)

Rather something like:
- HDD drivers (failing kernel code, modules), overlapping HDD handling code?
- the way the disk is accessed
- bus or code lock-ups/hangs caused by wrong settings or power control(?)
- intermittent extra access to the HDD during (large) transfers
- timing of signals, the kernel trying to use the disk simultaneously, or delays added to or removed from the code

Please note this is my assumption and only a ROUGH GUESS at what's going on. But we shouldn't be changing hardware because of it.

In short: one bit has changed in the kernel. Some code needs to be examined and fixed.

Willam Preston (dogbert77) wrote :

This particular issue has been plaguing my setup for quite some time. Many different hard drives were tried (different mainboard, too), but I eventually just learned to tolerate it and kept replacing hard drives after they were inevitably 'rotted' by the machine's behavior. I'm not entirely sure _what_ the system is doing to the drives, but it seems to cause serious mechanical wear over time. I think that it is causing excessive resets of the drive, and this eventually leads to degradation of the drive's surface itself.

However, at least in my case I think I found the culprit. I was attempting to copy data from a 'bad' drive to a new 2TB WD drive, and the new drive was throwing these errors as well. Stuck the new drive in an external USB enclosure and the drive copies sped up, and no errors were reported.

I think the source of my problem was some old slide-in SATA enclosures I was using on my server. I can't find the manufacturer anymore, but something about the circuit board in these units was causing the drive problems. I think it was ONLY causing problems with newer SATA-spec drives; older 1.5 Gbps SATA drives did not seem to throw errors like this.

I realize I'm probably in the minority here (I doubt anyone else was using these things), but by directly connecting my drives to the SATA controller I'm not seeing the errors anymore. This may shed light on what the real issue is, but I don't know the kernel innards well enough to venture a guess as to what the dysfunction is.

Ryan Finnin Day (rday) wrote :

This problem recently appeared on my Acer Aspire laptop running Ubuntu 11.10 oneiric.
kern.log and syslog are filling up fast with the errors, and some commands (grepping through syslog) return I/O errors. Booting and sleeping take longer than usual, but the machine is still usable for network operations.

fsck and smartctl report that the disk is healthy, but I still believe that this is a hardware problem, so I have ordered a replacement disk (Western Digital WD3200BEVT-22ZCT0, 320GB, 5400 rpm).

Here is the lspci:

00:1f.2 SATA controller: Intel Corporation 82801HBM/HEM (ICH8M/ICH8M-E) SATA AHCI Controller (rev 03) (prog-if 01 [AHCI 1.0])
 Subsystem: Acer Incorporated [ALI] Device 0146
 Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 46
 I/O ports at 18d8 [size=8]
 I/O ports at 18cc [size=4]
 I/O ports at 18d0 [size=8]
 I/O ports at 18c8 [size=4]
 I/O ports at 18e0 [size=32]
 Memory at fc304000 (32-bit, non-prefetchable) [size=2K]
 Capabilities: [80] MSI: Enable+ Count=1/4 Maskable- 64bit-
 Capabilities: [70] Power Management version 3
 Capabilities: [a8] SATA HBA v1.0
 Kernel driver in use: ahci
 Kernel modules: ahci

durilka (durilka) wrote :

Acer 5750z same issue

00:1f.2 SATA controller: Intel Corporation 6 Series/C200 Series Chipset Family 6 port SATA AHCI Controller (rev 04) (prog-if 01 [AHCI 1.0])
 Subsystem: Acer Incorporated [ALI] Device 0504
 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
 Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
 Latency: 0
 Interrupt: pin B routed to IRQ 40
 Region 0: I/O ports at 2088 [size=8]
 Region 1: I/O ports at 209c [size=4]
 Region 2: I/O ports at 2080 [size=8]
 Region 3: I/O ports at 2098 [size=4]
 Region 4: I/O ports at 2060 [size=32]
 Region 5: Memory at c0608000 (32-bit, non-prefetchable) [size=2K]
 Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
  Address: fee0100c Data: 4159
 Capabilities: [70] Power Management version 3
  Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold-)
  Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
 Capabilities: [a8] SATA HBA v1.0 BAR4 Offset=00000004
 Capabilities: [b0] PCI Advanced Features
  AFCap: TP+ FLR+
  AFCtrl: FLR-
  AFStatus: TP-
 Kernel driver in use: ahci
 Kernel modules: ahci

00:1f.3 SMBus: Intel Corporation 6 Series/C200 Series Chipset Family SMBus Controller (rev 04)
 Subsystem: Acer Incorporated [ALI] Device 0504
 Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
 Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
 Interrupt: pin C routed to IRQ 7
 Region 0: Memory at c0606000 (64-bit, non-prefetchable) [size=256]
 Region 4: I/O ports at 2040 [size=32]
 Kernel modules: i2c-i801

[ 97.695518] ata1.00: exception Emask 0x0 SAct 0x59 SErr 0x0 action 0x0
[ 97.695571] ata1.00: irq_stat 0x40000008
[ 97.695609] ata1.00: failed command: READ FPDMA QUEUED
[ 97.695645] ata1.00: cmd 60/08:30:38:41:2f/00:00:20:00:00/40 tag 6 ncq 4096 in
[ 97.695646] res 41/40:08:38:41:2f/00:00:20:00:00/60 Emask 0x409 (media error) <F>
[ 97.695723] ata1.00: status: { DRDY ERR }
[ 97.695744] ata1.00: error: { UNC }
[ 97.697629] ata1.00: configured for UDMA/100
[ 97.697657] ata1: EH complete
[ 101.612365] ata1.00: exception Emask 0x0 SAct 0x1f SErr 0x0 action 0x0
[ 101.612418] ata1.00: irq_stat 0x40000008
[ 101.612456] ata1.00: failed command: READ FPDMA QUEUED
[ 101.612493] ata1.00: cmd 60/08:00:38:41:2f/00:00:20:00:00/40 tag 0 ncq 4096 in
[ 101.612494] res 41/40:08:38:41:2f/00:00:20:00:00/60 Emask 0x409 (media error) <F>
[ 101.612570] ata1.00: status: { DRDY ERR }
[ 101.612591] ata1.00: error: { UNC }
[ 101.614473] ata1.00: configured for UDMA/100
[ 101.614502] ata1: EH complete
... and so on

And I don't think this should stay in "undecided". This looks scary.
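The logs in this thread mix two different failures. One is "Emask 0x4 (timeout)", where the drive never answered. The other is "Emask 0x409 (media error)" with "error: { UNC }", as in the log above, where the drive answered that it could not read a sector. Below is a minimal Python sketch that splits an Emask into its bits. It assumes the AC_ERR_* values as I understand them from the kernel's include/linux/libata.h, so check them against your kernel's header. The triage function is only a rough heuristic of my own, not a diagnosis.

```python
# AC_ERR_* bits as I understand them from include/linux/libata.h;
# verify against the header of the kernel you are running.
AC_ERR_BITS = [
    (1 << 0, "dev"),         # device reported an error
    (1 << 1, "hsm"),         # host state machine violation
    (1 << 2, "timeout"),     # command timed out
    (1 << 3, "media"),       # media error
    (1 << 4, "ata_bus"),     # ATA bus error
    (1 << 5, "host_bus"),    # host bus error
    (1 << 6, "system"),      # system error
    (1 << 7, "invalid"),     # invalid argument
    (1 << 8, "other"),       # unknown
    (1 << 9, "nodev_hint"),  # polling device detection hint
    (1 << 10, "ncq"),        # offending NCQ command, printed as <F>
]


def decode_emask(emask):
    """Return the names of the AC_ERR_* bits set in an Emask value."""
    return [name for bit, name in AC_ERR_BITS if emask & bit]


def rough_triage(emask):
    """A heuristic first guess, not a diagnosis."""
    flags = decode_emask(emask)
    if "media" in flags:
        return "drive reports unreadable sectors: check SMART and the failing LBA"
    if "timeout" in flags:
        return "no answer from the drive: suspect link, cable, controller or driver"
    return "other: " + (", ".join(flags) or "none")
```

For example, 0x409 decodes to dev, media and ncq, which matches the "(media error) <F>" text above. The 0x4 seen in most other reports decodes to timeout alone.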

Erik1984 (erik1984) wrote :

The problem occurred to me again, this time on a different setup: dual-boot Windows 7 and Kubuntu 11.10 Oneiric Ocelot, so a completely different kernel from Lucid's. I ran Windows 7 exclusively for a week: no apparent problems with the disk. My 2nd day with Kubuntu, after some 5 hours of uptime... boom.

I agree with #96: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/550559/comments/96 It seems to have nothing to do with the physical state of the disk or cables. Well, maybe in some cases that was the cause, but certainly not always. It's also not distro-related and seems to happen on many kernel versions, so I guess it's a kernel issue that persists under the radar: the combination of the Linux kernel and some types of (SATA) HDD controllers. In 2 years of Vista I never had problems with the HDD, but in different versions of *buntu it has been bugging me. I ran the SMART test several times and each time it reported that the HDD was healthy.

Just had this show up on an ASUS 1001P. In this case it may be failing hardware, for all I know. Getting a lot of:

Mar 2 08:31:29 boot2 kernel: [ 2958.107227] ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
Mar 2 08:31:29 boot2 kernel: [ 2958.107242] ata1.00: irq_stat 0x40000008
Mar 2 08:31:29 boot2 kernel: [ 2958.107254] ata1.00: failed command: READ FPDMA QUEUED
Mar 2 08:31:29 boot2 kernel: [ 2958.107278] ata1.00: cmd 60/08:00:a8:9e:a1/00:00:12:00:00/40 tag 0 ncq 4096 in
Mar 2 08:31:29 boot2 kernel: [ 2958.107283] res 41/40:08:af:9e:a1/00:00:12:00:00/00 Emask 0x409 (media error) <F>
Mar 2 08:31:29 boot2 kernel: [ 2958.107294] ata1.00: status: { DRDY ERR }
Mar 2 08:31:29 boot2 kernel: [ 2958.107303] ata1.00: error: { UNC }
Mar 2 08:31:29 boot2 kernel: [ 2958.111521] ata1.00: configured for UDMA/133
Mar 2 08:31:29 boot2 kernel: [ 2958.111554] ata1: EH complete

both delaying bootup and while running. This is with Ubuntu 10.04.2, and it now happens with both Ubuntu kernels and a vanilla 2.6.34 that I've been using on this machine for a long time (tweaked to support the Eee better). It suddenly started yesterday (no such errors before) and is now persistent. However, there's no such problem if I boot Lucid Puppy from USB and then mount the partition and read/write files on it. (That I was stupid enough to let Ubuntu encrypt my home directory was a problem - that should not be the default option on install - too dangerous on hardware failure!)

Running fsck was necessary to even get the partition to mount, but even after a pass with -f -cc -k, Ubuntu is unhappy booting from the partition - too busy throwing the FPDMA errors. The Windows 7 partition still seems to boot at normal speed (i.e., it's always been sluggish to boot).

So on the one hand I'm allowing that this is marginal hardware (the build quality on the 1001P is nowhere near as good as on the older ASUS 700 series, so this doesn't surprise me). On the other hand, with all the suggestions here from others on this issue being somewhat hardware-agnostic, it looks like the kernel drivers are too highly strung, with hopefully the prospect that tuning them to be more relaxed might result in more dependable performance. Some of the reports here suggest that, even if they're pointing to different parameters. I wonder if there's a single underlying cause in one or more of the ATA drivers.

Found and fixed the problem. In my case the kernel was trying to check the secondary GUID partition table (GPT), where there was a bad sector - in this case in the second-to-last sector on the disk (and outside of all partitions). Except on my system there also was no GUID partition table - no primary GPT, no secondary GPT, as it's using a standard old msdos partition table. Parted has no problem seeing that as the case. But the kernel - some bright programmer thought the kernel should not only check for a primary GPT, but even after finding none there, and even after booting using the msdos partition table, that the kernel should obsess if it finds a bad sector where the secondary GPT would be - if the system even had one - and try to read that sector again and again, thus crippling the system for no good reason at all.

Anyway, "hdparm --repair-sector ######## --yes-i-know-what-i-am-doing /dev/sda" totally fixed the problem.

In summary, GPT support has only shown up in more recent kernels. So if other READ FPDMA QUEUED bug reports are right that it's more recent kernels implicated, there's some chance that this brain-dead GPT code has bitten more people than me. The question would be: is the sector you're having trouble with after or before those allocated to your partitions? If so, that would explain why running e2fsck -c isn't going to fix them, and why the problem will persist even if your disk looks clean. You can as it turns out confirm the bad sector with smartctl, and it's folks on the list for that who pointed me towards the problem with GPT - which of course I wasn't looking for because the system that had the problem has never had a GPT.
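To answer the "is the bad sector before or after your partitions?" question, you need two things. One is the disk's total sector count, which the kernel prints as, e.g., "3907029168 512-byte logical blocks" for the 2 TB Seagate earlier in this thread. The other is the partition boundaries, which you can get from fdisk -l or /sys/block/sdX/sdXN/start and size. Below is a minimal Python sketch that does the comparison. It assumes 512-byte sectors and the common GPT layout, where a 128-entry x 128-byte partition array sits directly before the backup header in the last LBA. A real GPT header can specify other sizes, and the function names and example partition bounds are hypothetical.

```python
# Assumptions: 512-byte logical sectors and the common GPT layout with a
# 128-entry x 128-byte partition array placed directly before the backup
# header in the last LBA. A real GPT header can describe other sizes.
SECTOR_SIZE = 512
ENTRY_ARRAY_BYTES = 128 * 128


def backup_gpt_range(total_sectors):
    """Inclusive LBA range a typical backup GPT (entries + header) occupies."""
    last_lba = total_sectors - 1
    entry_sectors = ENTRY_ARRAY_BYTES // SECTOR_SIZE  # 32 with the defaults
    return last_lba - entry_sectors, last_lba


def locate_bad_lba(bad_lba, total_sectors, partitions):
    """Say where bad_lba falls; partitions is a list of (first, last) LBAs."""
    for first, last in partitions:
        if first <= bad_lba <= last:
            return "inside a partition"
    start, end = backup_gpt_range(total_sectors)
    if start <= bad_lba <= end:
        return "outside all partitions, in the backup GPT area"
    return "outside all partitions"
```

On the 3907029168-sector disk, this places a typical backup GPT at LBAs 3907029135 to 3907029167. A bad second-to-last sector would therefore land in that area. A filesystem check such as e2fsck -c never touches this area, because it lies outside every partition.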

It turns out not to be the kernel in my case, but udisks-daemon, which is doing the persistent polling of the non-existent GPT on the bad sector and thus severely slowing down boot times and performance.

I've filed a bug on that:

https://bugs.launchpad.net/ubuntu/+source/udisks/+bug/946565

dayzman (dayzman) wrote :

This bug should be marked as important. I'm affected by this bug, and because of it my GRUB is broken and unrepairable.

dayzman (dayzman) wrote :

OK. I've just fixed mine after struggling for the last few days: it was just a bad SATA cable.

Pinky (m-algoe) wrote :

Two years later and still nobody assigned?

I did not have this problem with 10.04 installed around February 2011. I never upgraded the kernel and it ran fine. Yesterday I decided to go for 12.04 and the problem appeared. I tried going back to 10.04, but the only version I can get my hands on now, 10.04.4, also seems to have the problem.
A lot of googling later, it seems that lots of people have this problem, not just in Ubuntu. No simple solution in sight, it seems. I have checked my drives and cables.

Pinky (m-algoe) wrote :

I found a workaround that works for me, at least for the moment.
Disable ACPI by editing /etc/default/grub and adding "acpi=off pci=noacpi" to GRUB_CMDLINE_LINUX_DEFAULT. I'll report back if the errors return.
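On Ubuntu, an edit to /etc/default/grub only takes effect after running sudo update-grub and rebooting. Below is a minimal Python sketch of the edit itself, as a pure string transformation so it can be checked before anything is written back as root. The function name is hypothetical. The sketch assumes the variable is on a single line in double quotes, as in Ubuntu's default file.

```python
import re


def add_kernel_params(grub_text, params):
    """Append params to GRUB_CMDLINE_LINUX_DEFAULT, skipping ones already present.

    Assumes the variable is defined on one line with a double-quoted value.
    """
    pattern = re.compile(r'^(GRUB_CMDLINE_LINUX_DEFAULT=")([^"]*)(")', re.M)

    def repl(m):
        current = m.group(2).split()
        merged = current + [p for p in params if p not in current]
        return m.group(1) + " ".join(merged) + m.group(3)

    new_text, count = pattern.subn(repl, grub_text, count=1)
    if count == 0:
        raise ValueError("GRUB_CMDLINE_LINUX_DEFAULT not found")
    return new_text
```

The function is idempotent. Running it twice with the same parameters leaves the file unchanged, so it is safe to call from a script.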

P. S. (sadowsky46) wrote :

Indeed, we need to put more "fire" under this bug. It appeared out of the blue today on my Aspire 1810TZ (laptop): for the first time in a year of flawless operation with Ubuntu 10.04, I wrote a 2.6 GB file. The error occurs persistently when I try to read this particular huge file again. This hardware is completely different from the OP's. And please - this is no "cable" bug ;-)

Redsandro (redsandro) wrote :

Quote: "And please - this is no "cable" bug ;-)"

As you can read in the comments, for me and others the problems went away after using a thicker more expensive SATA cable with firm braced connectors.

The logical conclusion would be that the same effect can be observed with two different causes (software and hardware) and I think it's important to acknowledge that. It might be easier to find out the cause (bug in newer kernels) when we know physical I/O problems trigger the same errors and messages.
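SMART can help separate the two causes. On most drives, attribute 199 (UDMA_CRC_Error_Count) counts transfer errors on the SATA link, which typically points at cables or connectors. Attributes 197/198 (Current_Pending_Sector, Offline_Uncorrectable) count unreadable sectors on the media. Below is a minimal Python sketch that pulls these raw values out of smartctl -A output. It assumes the usual column layout with RAW_VALUE as the 10th column; attribute meanings can vary by vendor, and the function names are hypothetical.

```python
# Attribute IDs commonly used by drive vendors; meanings can vary by model.
WATCHED = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
    199: "UDMA_CRC_Error_Count",
}


def watched_raw_values(smartctl_output):
    """Pull raw values of WATCHED attributes from `smartctl -A` text."""
    values = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Data rows start with a numeric ID and have at least 10 columns;
        # RAW_VALUE is the 10th column (extra text may follow it).
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id = int(fields[0])
        if attr_id in WATCHED and fields[9].isdigit():
            values[WATCHED[attr_id]] = int(fields[9])
    return values


def hint(values):
    """Heuristic: media counters vs. the interface (link/cable) counter."""
    media = sum(values.get(k, 0) for k in
                ("Current_Pending_Sector", "Offline_Uncorrectable"))
    crc = values.get("UDMA_CRC_Error_Count", 0)
    if media and not crc:
        return "media"
    if crc and not media:
        return "link"
    return "both" if media else "neither"
```

The CRC counter never resets. What matters is whether it keeps rising while the FPDMA errors occur, not whether it is nonzero.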

pepre (me-pepre) wrote :

I switched around the cables in my RAID5 with no effect (sdc gets the error).

I switched around the HDs with no effect (sdc gets the error).

Since adding

for i in a b c d e ; do echo 1 > /sys/block/sd$i/device/queue_depth ; done

to rc.local, the error appears only rarely. But under stress (reading large files fast) it's reproducible.

Everything works fine with archlinux.

crashbit, thank you for reporting this and helping make Ubuntu better. This bug was reported a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue? Can you try with the latest development release of Ubuntu? ISO CD images are available from http://cdimage.ubuntu.com/releases/ .

If it remains an issue, could you run the following command from a Terminal (Applications->Accessories->Terminal). It will automatically gather and attach updated debug information to this report.

apport-collect -p linux <replace-with-bug-number>

Also, if you could test the latest upstream kernel available that would be great. It will allow additional upstream developers to examine the issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text. Please let us know your results.

Thanks in advance.

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
P. S. (sadowsky46) wrote :

Shame on me... on my machine this issue was in fact a problem with the HDD: some unreadable sectors. Strangely, the "long" SMART self-test reported no errors. Then I ran the WD proprietary HD checker. It also reported "no error". But after the tool ran, the issue was gone. Evidently it silently remapped the bad blocks to the spare area.
So in the end, not a SW bug on my side either.

Steve Franks (bahamasfranks) wrote :

@penalvch :

It's happening to me right now on 3.1.8, on a month-old Lenovo E420 and a brand-new OCZ Vertex 2 60GB SSD. Is that upstream enough for ya?

Steve

Apr 9 12:01:07 fyre kernel: [ 104.803190] ata1: hard resetting link
Apr 9 12:01:07 fyre kernel: [ 105.129298] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Apr 9 12:01:07 fyre kernel: [ 105.200236] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
Apr 9 12:01:07 fyre kernel: [ 105.200246] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
Apr 9 12:01:07 fyre kernel: [ 105.220215] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
Apr 9 12:01:07 fyre kernel: [ 105.220223] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
Apr 9 12:01:07 fyre kernel: [ 105.230597] ata1.00: configured for UDMA/133
Apr 9 12:01:07 fyre kernel: [ 105.230965] ata1.00: device reported invalid CHS sector 0
Apr 9 12:01:07 fyre kernel: [ 105.230971] ata1.00: device reported invalid CHS sector 0
Apr 9 12:01:07 fyre kernel: [ 105.230976] ata1.00: device reported invalid CHS sector 0
Apr 9 12:01:07 fyre kernel: [ 105.230982] ata1.00: device reported invalid CHS sector 0
Apr 9 12:01:07 fyre kernel: [ 105.230987] ata1.00: device reported invalid CHS sector 0
Apr 9 12:01:07 fyre kernel: [ 105.231006] ata1: EH complete
steve@fyre:~$ uname -a
Linux fyre 3.1.8-030108-generic-pae #201201061759 SMP Fri Jan 6 23:15:58 UTC 2012 i686 GNU/Linux
steve@fyre:~$

Steve Franks (bahamasfranks) wrote :

@penalvch :

> apport-collect -p linux 550559

"You are not the reporter or subscriber of this problem report, or the report is a duplicate or already closed.
Please create a new report using "apport-bug"."

Then tried apport-bug and it complains that I'm not running a ubuntu kernel, which kinda makes sense since it's upstream like you asked. Go figure. If you want your insider info, you're gonna have to help me fool my system into getting it for you...

Steve

Steve Frank, please execute the following at the Terminal and feel free to subscribe me to it:
ubuntu-bug linux

Thanks!

Steve Frank = Steve Franks

PieroCampa (piero-campa) wrote :

Same problem here.
Sometimes I get very long boots.
Very. Very. Long.
I've attached my whole dmesg; here is a significant extract:
...
[ 14.297122] EXT4-fs (sda5): INFO: recovery required on readonly filesystem
[ 14.297130] EXT4-fs (sda5): write access will be enabled during recovery
[ 186.461499] EXT4-fs (sda5): recovery complete
[ 186.934519] EXT4-fs (sda5): mounted filesystem with ordered data mode. Opts: (null)
[ 270.832089] ata1.00: exception Emask 0x0 SAct 0x6003fffe SErr 0x0 action 0x6 frozen
[ 270.832103] ata1.00: failed command: READ FPDMA QUEUED
[ 270.832113] ata1.00: cmd 60/00:08:00:10:4d/04:00:0e:00:00/40 tag 1 ncq 524288 in
[ 270.832114] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
...

Running Ubuntu Oneiric + LXDE on Dell XPS Studio 1640 Laptop, with following HDD:

    ATA device, with non-removable media
 Model Number: WDC WD3200BJKT-75F4T0
 Serial Number: WD-WXM209AS1908
 Firmware Revision: 11.01A11
 Transport: Serial, SATA 1.0a, SATA II Extensions, SATA Rev 2.5

Thanks for all your work to help us.
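The "SAct" value in each exception line is, as I read it, a bitmask of the NCQ tags still outstanding when libata gave up. Each set bit is one queued command the drive had not completed. Below is a minimal Python sketch that lists those tags. The function names are hypothetical.

```python
import re


def outstanding_tags(sact):
    """List the NCQ tags whose bits are set in an SAct value."""
    return [tag for tag in range(32) if sact >> tag & 1]


def sact_from_line(line):
    """Extract the SAct value from a libata 'exception' line."""
    m = re.search(r"SAct 0x([0-9a-f]+)", line)
    if m is None:
        raise ValueError("no SAct field in line")
    return int(m.group(1), 16)
```

For the extract above, SAct 0x6003fffe gives 19 outstanding commands (tags 1 to 17, 29 and 30). The original report's SAct 0xf gives tags 0 to 3, which matches the four failed commands listed there.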

PieroCampa, please execute the following at the Terminal and feel free to subscribe me to it:
ubuntu-bug linux

Thanks!

I'm also seeing "failed command: WRITE FPDMA QUEUED" on Oneiric, using a GA 990FXA-UD7 board with Crucial m4 SSDs connected to the southbridge SATA controller (ATI SB950). Sometimes this error would crash the machine (the RAID5 module, actually); sometimes it just resets the SATA link and keeps going.

A workaround that solves the issue for me is to go into the BIOS and disable "SATA3.0 Mode" for the southbridge SATA. This reduces the SATA link speed to 3 Gbps, although Linux still reports 6 Gbps in the kernel log.

I'm using a SATA drive bay from Thermaltake. Maybe the additional connectors in there degrade the SATA link in a way that makes SATA3 unreliable. This is just speculation; I didn't test with the drives connected directly to the mainboard.
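One way to check which speed the link really negotiated is to decode the SStatus value in the "SATA link up ... (SStatus 123 SControl 300)" lines. libata prints it in hex without a "0x" prefix. Below is a minimal Python sketch, assuming the register layout from the SATA/AHCI specifications: DET in bits 3:0, SPD in bits 7:4, and IPM in bits 11:8. The function names are hypothetical. On the kernel side, libata also accepts a boot parameter such as libata.force=3.0Gbps to cap the link speed.

```python
SPEEDS = {1: "1.5 Gbps", 2: "3.0 Gbps", 3: "6.0 Gbps"}


def decode_sstatus(value):
    """Decode a SATA SStatus value as printed by libata (hex, no 0x)."""
    det = value & 0xF          # 3 = device present and phy communication up
    spd = (value >> 4) & 0xF   # negotiated interface speed generation
    ipm = (value >> 8) & 0xF   # 1 = interface in active power state
    return {
        "link_up": det == 3,
        "speed": SPEEDS.get(spd, "none/reserved"),
        "active": ipm == 1,
    }


def decode_scontrol(value):
    """Decode the speed-limit and power-state bits of SControl."""
    spd = (value >> 4) & 0xF   # 0 = no speed restriction
    ipm = (value >> 8) & 0xF   # bit 0: partial disabled, bit 1: slumber disabled
    return {
        "speed_limit": SPEEDS.get(spd, "none" if spd == 0 else "reserved"),
        "partial_disabled": bool(ipm & 1),
        "slumber_disabled": bool(ipm & 2),
    }
```

For example, decode_sstatus(int("123", 16)) reports a 3.0 Gbps link, and SStatus 133 (as in a later comment here) reports 6.0 Gbps. SControl 320 means the link is capped at 3.0 Gbps. After changing the BIOS setting, the SStatus in the next "link up" line shows whether the link really dropped to 3.0 Gbps.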

Christoph Dwertmann, please execute the following at the Terminal and feel free to subscribe me to it:
ubuntu-bug linux

Thanks!

yemu (yemu) wrote :

Same thing happens here on an OCZ Vertex 3 with an ASRock 770 Extreme3 board. The system randomly freezes for a couple of seconds, and then I see errors in the log.
...
ata6.00: exception Emask 0x0 SAct 0x1fffff SErr 0x0 action 0x6 frozen
ata6.00: failed command: READ FPDMA QUEUED
ata6.00: cmd 60/08:00:d0:d4:e6/00:00:00:00:00/40 tag 0 ncq 4096 in
res 40/00:01:00:00:00/00:00:00:00:00/e0 Emask 0x4 (timeout)
...
and so on
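The `SAct` value in the exception line is, as I understand libata's output, the SActive bitmask of NCQ commands still outstanding when the error handler ran: bit n set means tag n was in flight. A small Python sketch to expand it (the helper name `outstanding_tags` is hypothetical):

```python
def outstanding_tags(sact):
    """Return the NCQ tags (0-31) whose bits are set in an SAct bitmask."""
    return [tag for tag in range(32) if sact & (1 << tag)]


# Value from the exception line above: "SAct 0x1fffff"
tags = outstanding_tags(0x1FFFFF)
print(len(tags), tags[0], tags[-1])  # 21 0 20
```

In the log above, that works out to 21 queued commands (tags 0 to 20) pending at the moment the link was frozen.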

I'm using Precise (upgraded on 26.04.2012), kernel 3.2.0-23-generic-pae.

yemu, please execute the following at the Terminal and feel free to subscribe me to it:
ubuntu-bug linux

Thanks!

yemu (yemu) wrote :

@Christopher: thanks, I've just created a new bug report as you said and subscribed you to it.

I have just been hit by this issue, with exactly the same messages as the OP. This is on a two-year-old Dell E6410 running a fully updated Ubuntu 12.04.

The system boots, but it is so slow that it is useless. I'm trying to report a bug from the affected system, but it is so slow that this is becoming almost impossible. I started with ubuntu-bug, but I will try to do it without the GUI and attach the files.

The new bug is bug 1002670; after 45 minutes I managed to send it from the affected system. If this is a bug, it is a really serious one in my opinion.

If anybody knows of a workaround to stop the kernel checking the disk and make the system usable...

ads (garboge) wrote :

I've had freeze-ups for a while now, and I seem to remember them starting after a kernel upgrade a while back. I would generally have to reboot, and sometimes boot from a live install CD to fix disk errors. The recent upgrade to Lucid kernel 3.0.0-19 seems to have remedied my situation.

yemu (yemu) wrote :

After searching forums for a solution to my problem over the last couple of days, I think I may have found what's causing the freezes (at least for SSD drives). I'm still testing to confirm it 100%, but for now I believe the freezes are caused by the driver using NCQ to communicate with the SSD (which is of course pointless, since SSD drives don't need to optimize the order of disk operations; apparently it causes some errors).

The point is that after disabling NCQ for the SSD, the errors stopped occurring (I haven't encountered any for about 48h; earlier, the freezes happened every couple of hours or even more often).

To disable NCQ, I added "libata.force=5:noncq" to the default kernel options in /etc/default/grub like this:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash libata.force=5:noncq"

and then updated GRUB (sudo update-grub).

I used "5:" because the errors I saw were reported for ata5 in dmesg. In my original report to this bug it was ata6, as you can see above, but it changed to ata5, probably because I was switching ports on my motherboard.
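Since the port number in `libata.force` is the N from the `ataN` prefix in dmesg, it can be read straight off the error lines. A minimal Python sketch, assuming you feed it saved `dmesg` output as text: it collects the ports that logged a failed READ/WRITE FPDMA QUEUED command and builds a matching `noncq` option. `noncq_option` is a hypothetical helper; check its result against your own log before editing /etc/default/grub.

```python
import re

# "ata5.00: failed command: READ FPDMA QUEUED" -> port 5
FAILED_RE = re.compile(
    r"\bata(\d+)(?:\.\d+)?: failed command: (?:READ|WRITE) FPDMA QUEUED")


def noncq_option(dmesg_text):
    """Build a libata.force=... value disabling NCQ on every port that
    logged a failed FPDMA QUEUED command; None if there were none."""
    ports = sorted({int(m.group(1)) for m in FAILED_RE.finditer(dmesg_text)})
    if not ports:
        return None
    return "libata.force=" + ",".join(f"{port}:noncq" for port in ports)


log = """\
ata5.00: exception Emask 0x0 SAct 0x1fffff SErr 0x0 action 0x6 frozen
ata5.00: failed command: READ FPDMA QUEUED
ata5.00: failed command: WRITE FPDMA QUEUED
"""
print(noncq_option(log))  # libata.force=5:noncq
```

For a saved log this would be called as, e.g., `noncq_option(open("dmesg.txt").read())`, where `dmesg.txt` is just an example file name.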

If anyone experiences a similar bug, please try this solution and report back here.

Credit for the solution goes to: http://ubuntuforums.org/showpost.php?p=10480137&postcount=8

Daniel Day (danielday) wrote :

I had this problem with 10.04 some years ago. I just updated to 12.04 (and the 3.2.0-26-generic kernel) and it's back. I am using an older OCZ Agility SSD. Booting with NCQ disabled for that drive fixed my problem then and now, in terms of boot time. I still get an error message:

[ 2.895665] EXT4-fs (sda1): re-mounted. Opts: errors=remount-ro
[ 2.906325] shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
[ 2.908085] ACPI: resource piix4_smbus [io 0x0b00-0x0b07] conflicts with ACPI region SOR1 [io 0xb00-0xb0f]
[ 2.908087] ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver

I have no idea how to handle that one.
Cheers.

Tulasi (murtulasi) wrote :

I have Oneiric on my IBM Lenovo, and today I hit this "failed command: READ FPDMA QUEUED" problem when trying to boot. How can I solve this issue? Any pointers would be helpful.

Thanks.

Tulasi, could you please file a new report by executing the following in a terminal:
ubuntu-bug linux

For more on this, please see https://help.ubuntu.com/community/ReportingBugs#Bug_Reporting_Etiquette . If you do file a new report, please feel free to subscribe me to it. Thank you for your understanding.

Helpful Bug Reporting Links:
https://help.ubuntu.com/community/ReportingBugs#A3._Make_sure_the_bug_hasn.27t_already_been_reported
https://help.ubuntu.com/community/ReportingBugs#Adding_Apport_Debug_Information_to_an_Existing_Launchpad_Bug
https://help.ubuntu.com/community/ReportingBugs#Adding_Additional_Attachments_to_an_Existing_Launchpad_Bug

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Laurent Dinclaux (dreadlox) wrote :

What is wrong with this bug report? There are 130 comments and about 80 affected people, the issue has been narrowed down to NCQ (disabling it is a workaround), and the bug even prevents affected people from installing Ubuntu properly.

Isn't that "complete" enough??

pepre (me-pepre) wrote :

> Isn't that "complete" enough?

I don't know why Christopher keeps setting it to Incomplete; perhaps:

"We don't need to think about things that are not possible."

;-)

Just for completeness:

Since installing 12.04.1 Server (plus fluxbox), the bug no longer appears.

SW RAID5 with

SATA controller: Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 SATA Controller [AHCI mode]
SATA controller: Marvell Technology Group Ltd. 88SE9123 PCIe SATA 6.0 Gb/s controller (rev 11)

Pepre, if you have a bug in Ubuntu, could you please file a new report by executing the following in a terminal:
ubuntu-bug linux

For more on this, please see the Ubuntu Bug Control and Ubuntu Bug Squad article:
https://wiki.ubuntu.com/Bugs/BestPractices#X.2BAC8-Reporting.Focus_on_One_Issue

and Ubuntu Community article:
https://help.ubuntu.com/community/ReportingBugs#Bug_reporting_etiquette

When opening up the new report, please feel free to subscribe me to it.

Please note, not filing a new report may delay your problem being addressed as quickly as possible.

Thank you for your understanding.

Bernhard (baumber) wrote :

Hello,

I have the same problem on an ASUS P7P55D-E PRO mainboard with the onboard Marvell 88SE9123 PCIe SATA 6.0 controller and two WDC WD1002FAEX-00Z3A0 SATA 6.0 disks.

=> READ and WRITE freezes with SATA 6.0Gbps and NCQ; limiting to SATA 3.0Gbps without NCQ gives me a stable system.
(kernel parameter in GRUB: libata.force=7:noncq,3.0Gbps,8:noncq,3.0Gbps)
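As I read the kernel's documentation for `libata.force`, the value is a comma-separated list of `[ID:]VAL` entries, and an entry without an ID reuses the last ID given, so each `3.0Gbps` in the line above applies to the port named just before it; entries before any ID apply to all ports. The Python sketch below expands such a string into per-port settings to make that rule explicit. It only handles the `PORT` and `PORT.DEVICE` forms, and `parse_libata_force` is a hypothetical illustration, not the kernel's parser.

```python
def parse_libata_force(value):
    """Group libata.force entries by ID, carrying the last ID forward.

    Entries that appear before any ID are stored under the key "all",
    since they apply to every port.
    """
    settings = {}
    current = "all"
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        if ":" in entry:
            # "7:noncq" -> ID "7", value "noncq"; later bare values reuse "7"
            current, entry = entry.split(":", 1)
        settings.setdefault(current, []).append(entry)
    return settings


print(parse_libata_force("7:noncq,3.0Gbps,8:noncq,3.0Gbps"))
# {'7': ['noncq', '3.0Gbps'], '8': ['noncq', '3.0Gbps']}
```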

There is a kernel bug report for this problem, and I added my description to it:

https://bugzilla.kernel.org/show_bug.cgi?id=43160

Best regards, Bernhard

pepre (me-pepre) wrote :

> Christopher M. Penalver (penalvch) wrote on 2012-11-01:

> > Pepre, if you have a bug in Ubuntu, could you please file a
> > new report by executing the following in a terminal:
> > ubuntu-bug linux

Ok, done. See:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1077718

:-)

I have the same problem, but running Debian 6.0.6 with a 2.6 kernel. I have tried two different motherboards, different cables, etc., but the problem is always there with the SATA3 SSD (which works perfectly in Windows 7 on the same hardware). I have SATA3 HDDs in the same machine that work without problems. So either there is a misunderstanding between the hardware designers and the kernel developers, or many SSD drives are somehow faulty. I wouldn't be surprised if they all use some common component that causes these problems.

Oh, I forgot to mention in comment #135 that the SSD drive is connected to an Intel SATA3 port.

00:1f.2 SATA controller: Intel Corporation Device 1e02 (rev 04)

igor (icicimov-gmail) wrote :

Same problem here, after upgrading to 12.04 I think. It's on a 1TB Seagate drive that has been running for four and a half years in a Mythbuntu setup. Unfortunately it's the OS drive; I have another WD 2TB drive which is fine. I started experiencing MythTV freezes and crashes lately, so I decided to run a SMART test and found the errors. AHCI mode was enabled in the BIOS for the drive before installation.

00:11.0 SATA controller: Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 SATA Controller [AHCI mode] (prog-if 01 [AHCI 1.0])
 Subsystem: Giga-byte Technology Device b002
 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
 Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
 Latency: 32
 Interrupt: pin A routed to IRQ 42
 Region 0: I/O ports at ff00 [size=8]
 Region 1: I/O ports at fe00 [size=4]
 Region 2: I/O ports at fd00 [size=8]
 Region 3: I/O ports at fc00 [size=4]
 Region 4: I/O ports at fb00 [size=16]
 Region 5: Memory at fe02f000 (32-bit, non-prefetchable) [size=1K]
 Capabilities: <access denied>
 Kernel driver in use: ahci

[ 9397.726911] ata1.00: exception Emask 0x0 SAct 0x1ff SErr 0x0 action 0x0
[ 9397.726918] ata1.00: irq_stat 0x40000008
[ 9397.726926] ata1.00: failed command: READ FPDMA QUEUED
[ 9397.726941] ata1.00: cmd 60/38:20:6e:29:7a/00:00:1b:00:00/40 tag 4 ncq 28672 in
[ 9397.726950] ata1.00: status: { DRDY ERR }
[ 9397.726955] ata1.00: error: { UNC }
[ 9397.749426] ata1.00: configured for UDMA/133
[ 9397.749455] ata1: EH complete
[ 9532.787964] ata1.00: exception Emask 0x0 SAct 0xfc SErr 0x0 action 0x0
[ 9532.787972] ata1.00: irq_stat 0x40000008
[ 9532.787981] ata1.00: failed command: READ FPDMA QUEUED
[ 9532.787996] ata1.00: cmd 60/e0:10:f6:2b:64/00:00:1b:00:00/40 tag 2 ncq 114688 in
[ 9532.788025] ata1.00: status: { DRDY ERR }
[ 9532.788031] ata1.00: error: { UNC }
[ 9532.867447] ata1.00: configured for UDMA/133
[ 9532.867475] ata1: EH complete
[ 9536.952430] ata1.00: exception Emask 0x0 SAct 0x3e SErr 0x0 action 0x0
[ 9536.952438] ata1.00: irq_stat 0x40000008
[ 9536.952446] ata1.00: failed command: READ FPDMA QUEUED
[ 9536.952460] ata1.00: cmd 60/e0:28:f6:2b:64/00:00:1b:00:00/40 tag 5 ncq 114688 in
[ 9536.952469] ata1.00: status: { DRDY ERR }
[ 9536.952474] ata1.00: error: { UNC }
[ 9536.974938] ata1.00: configured for UDMA/133
[ 9536.974964] ata1: EH complete
[ 9541.683446] ata1.00: exception Emask 0x0 SAct 0x1fff SErr 0x0 action 0x0
[ 9541.683454] ata1.00: irq_stat 0x40000008
[ 9541.683463] ata1.00: failed command: READ FPDMA QUEUED
[ 9541.683477] ata1.00: cmd 60/e0:00:f6:2b:64/00:00:1b:00:00/40 tag 0 ncq 114688 in
[ 9541.683487] ata1.00: status: { DRDY ERR }
[ 9541.683492] ata1.00: error: { UNC }
[ 9541.705986] ata1.00: configured for UDMA/133
[ 9541.706020] ata1: EH complete
[48304.968186] ata1.00: exception Emask 0x0 SAct 0x3ff SErr 0x0 action 0x0
[48304.968195] ata1.00: irq_stat 0x40000001
[48304.968203] ata1.00: failed command: READ FPDMA QUEUED
[48304.968218] ata1.00: cmd 60/20:00:9e:73:a2/00:00:1b:00:00/40 tag 0 ncq 16384 in
[48304.968228] ata1.00: status: { DRDY ERR }
[48304.968232] ata1.00: error: { ABRT }
[48304.968238] ata1.00: faile...

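This log differs from the original report in a way that may matter: here the drive answered with `status: { DRDY ERR }` and `error: { UNC }` (an uncorrectable read, which usually points at unreadable sectors), whereas the original report shows `Emask 0x4 (timeout)` with only DRDY set, meaning the command never completed. The Python sketch below decodes the status and error register bits under the names libata prints and sorts a failure into rough buckets. The bit assignments follow the ATA command set as I understand it, the function names are hypothetical, and the result is a triage hint, not a diagnosis.

```python
# ATA status register bits, named the way libata prints them in "status: { ... }"
STATUS_BITS = {0x80: "BUSY", 0x40: "DRDY", 0x20: "DF", 0x10: "DSC",
               0x08: "DRQ", 0x04: "CORR", 0x01: "ERR"}
# ATA error register bits, named the way libata prints them in "error: { ... }"
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x10: "IDNF", 0x04: "ABRT", 0x01: "AMNF"}
# Error-mask bit that libata reports as "Emask 0x4 (timeout)"
AC_ERR_TIMEOUT = 0x4


def names(value, table):
    """Return the names of the bits set in value, highest bit first."""
    return [name for bit, name in table.items() if value & bit]


def classify(status, error, emask=0):
    """Rough triage of one failed command from its status, error and Emask."""
    if emask & AC_ERR_TIMEOUT:
        return "timeout: the command never completed"
    if "ERR" not in names(status, STATUS_BITS):
        return "no error reported by the drive"
    err = names(error, ERROR_BITS)
    if "UNC" in err:
        return "media error: uncorrectable data"
    if "ABRT" in err:
        return "command aborted by the drive"
    return "device error: " + (" ".join(err) or "unknown")


# Original report: "res 40/00:..." together with "Emask 0x4 (timeout)"
print(classify(0x40, 0x00, 0x4))
# The log above: "status: { DRDY ERR }" and "error: { UNC }"
# (the per-command Emask is not visible in the excerpt, so it is left at 0)
print(classify(0x41, 0x40))
```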

igor, if you have a bug in Ubuntu, could you please file a new report by executing the following in a terminal:
ubuntu-bug linux

For more on this, please see the Ubuntu Kernel team article:
https://wiki.ubuntu.com/KernelTeam/KernelTeamBugPolicies#Filing_Kernel_Bug_reports

the Ubuntu Bug Control team and Ubuntu Bug Squad team article:
https://wiki.ubuntu.com/Bugs/BestPractices#X.2BAC8-Reporting.Focus_on_One_Issue

and Ubuntu Community article:
https://help.ubuntu.com/community/ReportingBugs#Bug_reporting_etiquette

When opening up the new report, please feel free to subscribe me to it.

Please note, not filing a new report may delay your problem being addressed as quickly as possible.

Thank you for your understanding.

Phillip Susi (psusi) wrote :

This bug should have expired a while ago due to lack of response from the original reporter. It appears that several other people have piled on different and unrelated issues. If they are still having issues, they should file separate bug reports.

Changed in linux (Ubuntu):
status: Incomplete → Invalid
Vincent (vincent-voyer) wrote :

I would like to say that comment #126 was right, in my case at least.

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/550559/comments/126

Try disabling ncq for your SSD drive before anything else.
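One way to confirm whether NCQ is actually in use after changing the boot options is the SCSI queue depth that libata exposes in sysfs at `/sys/block/sdX/device/queue_depth`: as far as I know, it reads 1 when NCQ is disabled. The Python sketch below lists that value for every `sd*` disk. The sysfs root is a parameter only so it can be tried against a copy, and `queue_depths` is a hypothetical helper. (Writing 1 to that file as root is also commonly used to turn NCQ off at runtime without rebooting, but verify that on your own kernel.)

```python
import glob
import os


def queue_depths(sysfs_root="/sys"):
    """Map each sdX disk to its current SCSI queue depth (1 means no NCQ)."""
    depths = {}
    pattern = os.path.join(sysfs_root, "block", "sd*", "device", "queue_depth")
    for path in sorted(glob.glob(pattern)):
        disk = path.split(os.sep)[-3]  # .../block/<disk>/device/queue_depth
        with open(path) as f:
            depths[disk] = int(f.read().strip())
    return depths


if __name__ == "__main__":
    for disk, depth in queue_depths().items():
        state = "NCQ off" if depth == 1 else f"NCQ on (depth {depth})"
        print(disk, state)
```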

I also tried the latest kernels (3.7), with no change.

Vincent, if you have a bug in Ubuntu, could you please file a new report by executing the following in a terminal:
ubuntu-bug linux

For more on this, please see the Ubuntu Kernel team article:
https://wiki.ubuntu.com/KernelTeam/KernelTeamBugPolicies#Filing_Kernel_Bug_reports

the Ubuntu Bug Control team and Ubuntu Bug Squad team article:
https://wiki.ubuntu.com/Bugs/BestPractices#X.2BAC8-Reporting.Focus_on_One_Issue

and Ubuntu Community article:
https://help.ubuntu.com/community/ReportingBugs#Bug_reporting_etiquette

When opening up the new report, please feel free to subscribe me to it.

Please note, not filing a new report may delay your problem being addressed as quickly as possible.

Thank you for your understanding.

tigrez (davide-bernardo) wrote :

I also have this problem; I solved it by using ext2 and ext3 partitions. Maybe an ext4 bug?

tigrez, if you have a bug in Ubuntu, the Ubuntu Kernel team, Ubuntu Bug Control team, and Ubuntu Bug Squad would like you to please file a new report by executing the following in a terminal:
ubuntu-bug linux

For more on this, please see the Ubuntu Kernel team article:
https://wiki.ubuntu.com/KernelTeam/KernelTeamBugPolicies#Filing_Kernel_Bug_reports

the Ubuntu Bug Control team and Ubuntu Bug Squad team article:
https://wiki.ubuntu.com/Bugs/BestPractices#X.2BAC8-Reporting.Focus_on_One_Issue

and Ubuntu Community article:
https://help.ubuntu.com/community/ReportingBugs#Bug_reporting_etiquette

When opening up the new report, please feel free to subscribe me to it.

Please note, not filing a new report would delay your problem being addressed as quickly as possible.

Thank you for your understanding.

pepre (me-pepre) wrote :

NB: since switching to the lowlatency kernel and disabling NCQ, the problem has disappeared too :-)

Adrian Jones (adriqn) wrote :

I have had this problem, with both read and write errors, but I only noticed it recently after doing a new install on a headless server and leaving the screen plugged in. I have not had any crashes, data loss or data corruption.
Interestingly, I have 3 HP ML150 servers, but only 2 report this error. The spec of each server is as follows:
server 1
Opteron quad core 2.2
8GB RAM
2 additional PCI-e Ethernet cards
4 x 250GB SATA HDD in software RAID10
Ubuntu server 12.04 64bit

server 2
Opteron quad core 2.2
4GB RAM
1 additional PCI-e Ethernet card
4 x 1.5TB SATA HDD in software RAID10
Ubuntu server 10.04 64bit

server 3
Opteron quad core 2.2
2GB RAM
0 additional PCI/PCI-e cards
4 x 1.5TB SATA HDD in software RAID10
Ubuntu server 10.04 64bit

The BIOS settings are all identical. Servers 1 and 2 both show this error; the 250GB drives are getting on a bit, but the 1.5TB drives are brand new. I have tried changing the cables with no effect. The error has been reported on all 4 drives in each machine.

Server 3 has not had any reports. The only other difference is that the PSU in server 3 has been replaced with a cheap one from Amazon!

This has led me to think that it is a power-related issue, but since it does not seem to have any impact on performance or data integrity, I am not sure whether I need to be concerned.

My next plan is to move the drives from server 1 or 2 into server 3 to see whether I get any errors.

Adrian Jones, if you have a bug in Ubuntu, the Ubuntu Kernel team, Ubuntu Bug Control team, and Ubuntu Bug Squad would like you to please file a new report by executing the following in a terminal:
ubuntu-bug linux

For more on this, please see the Ubuntu Kernel team article:
https://wiki.ubuntu.com/KernelTeam/KernelTeamBugPolicies#Filing_Kernel_Bug_reports

the Ubuntu Bug Control team and Ubuntu Bug Squad team article:
https://wiki.ubuntu.com/Bugs/BestPractices#X.2BAC8-Reporting.Focus_on_One_Issue

and Ubuntu Community article:
https://help.ubuntu.com/community/ReportingBugs#Bug_reporting_etiquette

When opening up the new report, please feel free to subscribe me to it.

Please note, not filing a new report would delay your problem being addressed as quickly as possible.

Thank you for your understanding.

Dmitriy Altuhov (altuhov.su) wrote :

Same here on an HP ProLiant DL320e Gen8 v2 server (BIOS P80 08/28/2013) with two hard drives: ST500NM0011 and ST2000NC001-1DY164.

Problems only with the ST2000NC001! The ST500NM0011 is working fine.

Dmitriy Altuhov, as this report is Status Invalid, please file a new report via a terminal:
ubuntu-bug linux

pepre (me-pepre) wrote :

After removing the Marvell card (now replaced by a SiI 3132, sata_sil24 module), this error and various other problems (e.g. "GPU has fallen off the bus", early freezes, NIC sometimes not available) are gone. It looks like an incompatibility between the Marvell card and the board (ASRock A770DE+).
