western digital WD800ADFS ncq problems

Reported by SimonR on 2007-10-01
Affects                        Importance   Assigned to
Ubuntu                         Undecided    Unassigned
linux (Ubuntu)                 Low          Unassigned
linux-source-2.6.22 (Ubuntu)   Medium       Unassigned

Bug Description

Binary package hint: linux-source-2.6.22

hi,

we are using Gutsy Gibbon (up to date) and are experiencing NCQ problems
with Western Digital WD800ADFS (Raptor, OEM version) drives. Since NCQ is
enabled by default and the WD740ADFD is already blacklisted, I kindly ask
you to also add the WDC WD800ADFS-07SLR4 to the libata NCQ blacklist.

see also this bug report: #131633 in linux-source-2.6.22 ("spurious completions during NCQ")

fyi, i also posted this to lkml: http://marc.info/?l=linux-kernel&m=119127129831497&w=2

(please cc me, as I'm not subscribed to linux-kernel/ide - thanks)

hi,

we have some Fujitsu Siemens Celsius M450 workstations (Intel 975X Express chipset)
running Linux kernel 2.6.22 (Debian, Ubuntu) on an md RAID 1 made of two Western Digital
Raptor drives. The model is the WD800ADFS, an OEM version (you will find these drives in
workstations from HP, IBM, Dell, Fujitsu-Siemens, ..., under different model
numbers). Using AHCI and NCQ, we are experiencing low performance, many
HSM violations, and sometimes timeouts, so I think this drive should be added
to the libata NCQ blacklist, as the WD740ADFD is already blacklisted and the
Raptor series generally seems to have some NCQ issues...

kind regards,
simon.

drive information:

/dev/sda:

ATA device, with non-removable media
        Model Number: WDC WD800ADFS-07SLR4
        Serial Number: WD-WMANS1******
        Firmware Revision: 21.07QR4
Standards:
        Used: ATA/ATAPI-7 published, ANSI INCITS 397-2005
        Supported: 7 6 5 4
Configuration:
        Logical max current
        cylinders 16383 16383
        heads 16 16
        sectors/track 63 63
        --
        CHS current addressable sectors: 16514064
        LBA user addressable sectors: 156301488
        LBA48 user addressable sectors: 156301488
        device size with M = 1024*1024: 76319 MBytes
        device size with M = 1000*1000: 80026 MBytes (80 GB)
Capabilities:
        LBA, IORDY(can be disabled)
        Queue depth: 32
        Standby timer values: spec'd by Standard, with device specific minimum
        R/W multiple sector transfer: Max = 16 Current = 16
        Recommended acoustic management value: 128, current value: 254
        DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
             Cycle time: min=120ns recommended=120ns
        PIO: pio0 pio1 pio2 pio3 pio4
             Cycle time: no flow control=120ns IORDY flow control=120ns
Commands/features:
        Enabled Supported:
           * SMART feature set
                Security Mode feature set
           * Power Management feature set
           * Write cache
           * Look-ahead
           * Host Protected Area feature set
           * WRITE_BUFFER command
           * READ_BUFFER command
           * NOP cmd
           * DOWNLOAD_MICROCODE
                Power-Up In Standby feature set
           * SET_FEATURES required to spinup after power up
                SET_MAX security extension
           * Automatic Acoustic Management feature set
           * 48-bit Address feature set
           * Device Configuration Overlay feature set
           * Mandatory FLUSH_CACHE
           * FLUSH_CACHE_EXT
           * SMART error logging
           * SMART self-test
           * General Purpose Logging feature set
           * 64-bit World wide name
           * SATA-I signaling speed (1.5Gb/s)
           * SATA-II signaling speed (3.0Gb/s)
           * Native Command Queueing (NCQ)
           * Host-initiated interface power management
           * Phy event counters
                DMA Setup Auto-Activate optimization
                Device-initiated interface power management
           * Software settings preservation
           * SMART Command Transport (SCT) feature set
           * SCT Long Sector Access (AC1)
           * SCT LBA Segment Access (AC2)
           * SCT Error Recovery Control (AC3)
           * SCT Features Control (AC4)
           * SCT Data Tables (AC5)
                unknown 206[12]
Security:
        Master password revision code = 65534
                supported
        not enabled
        not locked
                frozen
        not expired: security count
        not supported: enhanced erase
Checksum: correct

some of the kernel messages:

[70747.193717] ata3.00: exception Emask 0x2 SAct 0xfe00 SErr 0x0 action 0x2 frozen
[70747.193725] ata3.00: (spurious completions during NCQ issue=0x0 SAct=0xfe00 FIS=004040a1:00000100)
[70747.193734] ata3.00: cmd 61/20:48:d2:9d:24/00:00:08:00:00/40 tag 9 cdb 0x0 data 16384 out
[70747.193736] res 40/00:78:42:ae:24/00:00:08:00:00/40 Emask 0x2 (HSM violation)
...
[70747.193789] ata3.00: cmd 61/08:78:42:ae:24/00:00:08:00:00/40 tag 15 cdb 0x0 data 4096 out
[70747.193791] res 40/00:78:42:ae:24/00:00:08:00:00/40 Emask 0x2 (HSM violation)
[70747.502219] ata3: soft resetting port
[70747.673864] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[70747.678834] ata3.00: configured for UDMA/133
[70747.678853] ata3: EH complete

[143055.609784] ata4.00: exception Emask 0x2 SAct 0xe811 SErr 0x0 action 0x2 frozen
[143055.609800] ata4.00: (spurious completions during NCQ issue=0x0 SAct=0xe811 FIS=004040a1:00000080)
[143055.609809] ata4.00: cmd 61/10:00:02:68:56/00:00:00:00:00/40 tag 0 cdb 0x0 data 8192 out
[143055.609811] res 40/00:78:6a:d7:59/00:00:02:00:00/40 Emask 0x2 (HSM violation)
...
[143055.609870] ata4.00: cmd 61/08:78:6a:d7:59/00:00:02:00:00/40 tag 15 cdb 0x0 data 4096 out
[143055.609872] res 40/00:78:6a:d7:59/00:00:02:00:00/40 Emask 0x2 (HSM violation)
[143055.920558] ata4: soft resetting port
[143056.092224] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[143056.097130] ata4.00: configured for UDMA/133
[143056.097146] ata4: EH complete

[234300.421373] ata4.00: exception Emask 0x0 SAct 0x7fff SErr 0x0 action 0x2 frozen
[234300.421386] ata4.00: cmd 60/08:00:f2:fc:71/00:00:00:00:00/40 tag 0 cdb 0x0 data 4096 in
[234300.421388] res 40/00:48:92:e9:9d/00:00:04:00:00/40 Emask 0x4 (timeout)
...
[234300.421511] ata4.00: cmd 60/08:70:5a:b0:16/00:00:08:00:00/40 tag 14 cdb 0x0 data 4096 in
[234300.421513] res 40/00:88:7a:68:56/00:00:00:00:00/40 Emask 0x4 (timeout)
[234300.732730] ata4: soft resetting port
[234300.904388] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[234300.909120] ata4.00: configured for UDMA/133
[234300.909149] ata4: EH complete
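
A note for reading these traces: the SAct value in each exception line is a bitmask of the NCQ tags that were still outstanding when the error was raised. A minimal Python sketch for decoding it is shown here; the helper name active_tags is made up for this illustration and is not part of any kernel tool.

```python
from typing import List


def active_tags(sact: int) -> List[int]:
    """Return the NCQ tag numbers (0-31) whose bits are set in SAct."""
    return [tag for tag in range(32) if (sact >> tag) & 1]


# SAct values copied from the three exception lines above.
for sact in (0xfe00, 0xe811, 0x7fff):
    print(f"SAct={sact:#06x} -> tags {active_tags(sact)}")
```

For 0xfe00 this gives tags 9 to 15, for 0xe811 tags 0, 4, 11, 13, 14 and 15, and for 0x7fff tags 0 to 14, which matches the first and last "tag" numbers shown in each trace.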

Changed in linux-source-2.6.22:
assignee: nobody → ubuntu-kernel-team
importance: Undecided → Medium
status: New → Triaged
Changed in linux-source-2.6.22:
assignee: ubuntu-kernel-team → zulcss
Tim Gardner (timg-tpi) wrote :

Gutsy commit 26fbd0074457fe6ee016de14206cf51705ecc6b0

Changed in linux-source-2.6.22:
assignee: zulcss → timg-tpi
status: Triaged → Fix Committed
Enrico Sardi (enricoss) wrote :

Hi all!

Same problem with a Hitachi HD.

I posted the problem on LKML and the disk was blacklisted:

http://groups.google.it/group/linux.kernel/browse_thread/thread/a4bd3c19565a2009/389817602f0cd551?hl=it&lnk=st&q=hitachi+hsm+violation&rnum=3#389817602f0cd551

Can you add the patch to Gutsy too?

Many thanks

Enrico

Tim Gardner (timg-tpi) wrote :

Somehow this one missed the boat, and it is not upstream either. If it is still causing problems in Hardy, please reopen the bug report.

Changed in linux-source-2.6.22:
assignee: timg-tpi → nobody
status: Fix Committed → Won't Fix

Hi, I have the same problem in Hardy with a WDC WD1500ADFD.

I found this guide: http://inferno.slug.org/cgi-bin/wiki?action=browse&id=Western_Digital_NCQ
and it works.

Can you add that patch to Hardy?

Many thanks, Olli

Sorry.

The row has to be:

{ "WDC WD1500ADFD-00NLR5", NULL, ATA_HORKAGE_NONCQ },

I just built a kernel with that entry and it really does disable NCQ. Now my hard disk is very fast.

So can you add that patch to the Hardy kernel?
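
To illustrate why the exact model string in such a row matters, here is a simplified Python model of a blacklist lookup. It is only a sketch: the table contents are the two model strings mentioned in this report, NONCQ is a stand-in label for ATA_HORKAGE_NONCQ, the function name quirk_for is hypothetical, and whole-string matching is an assumption rather than a statement about how any particular kernel version compares entries.

```python
from typing import List, Optional, Tuple

NONCQ = "NONCQ"  # stand-in label for the kernel's ATA_HORKAGE_NONCQ flag

# Illustrative table: (model string, firmware revision or None for any, quirk).
BLACKLIST: List[Tuple[str, Optional[str], str]] = [
    ("WDC WD800ADFS-07SLR4", None, NONCQ),
    ("WDC WD1500ADFD-00NLR5", None, NONCQ),
]


def quirk_for(model: str, firmware: str) -> Optional[str]:
    """Return the quirk of the first entry matching model and firmware."""
    for entry_model, entry_fw, quirk in BLACKLIST:
        if entry_model != model.strip():
            continue  # assumed: whole-string match, no prefix matching
        if entry_fw is None or entry_fw == firmware.strip():
            return quirk
    return None


# Model and firmware strings taken from the drive information above.
print(quirk_for("WDC WD800ADFS-07SLR4", "21.07QR4"))  # full model string
print(quirk_for("WDC WD800ADFS", "21.07QR4"))         # shortened model string
```

Under these assumptions the full model string is matched and the shortened one is not, so an entry that does not spell out the whole model number would leave NCQ enabled.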

Just adding a note that I'm reassigning the Ubuntu Hardy kernel source package from 'linux-source-2.6.24' to just 'linux'. Beginning with the Hardy release the package naming convention changed from linux-source-2.6.x to just linux. Sorry for any confusion.

Tim Gardner (timg-tpi) on 2008-02-12
Changed in linux:
assignee: nobody → timg-tpi
importance: Undecided → Low
milestone: none → hardy-alpha-5
status: New → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 2.6.24-8.13

---------------
linux (2.6.24-8.13) hardy; urgency=low

  [Soren Hansen]

  * Add missing iscsi modules to kernel udebs

  [Stefan Bader]

  * Lower message level for PCI memory and I/O allocation.

  [Tim Gardner]

  * Enabled IP_ADVANCED_ROUTER and IP_MULTIPLE_TABLES in sparc, hppa
    - LP: #189560
  * Compile RealTek 8139 using PIO method.
    - LP: #90271
  * Add WD WD800ADFS NCQ horkage quirk support.
    - LP: #147858

  [Upstream Kernel Changes]

  * Introduce WEXT scan capabilities
  * DVB: cx23885: add missing subsystem ID for Hauppauge HVR1800 Retail
  * slab: fix bootstrap on memoryless node
  * vm audit: add VM_DONTEXPAND to mmap for drivers that need it
    (CVE-2008-0007)
  * USB: keyspan: Fix oops
  * usb gadget: fix fsl_usb2_udc potential OOPS
  * USB: CP2101 New Device IDs
  * USB: add support for 4348:5523 WinChipHead USB->RS 232 adapter
  * USB: Sierra - Add support for Aircard 881U
  * USB: Adding YC Cable USB Serial device to pl2303
  * USB: sierra driver - add devices
  * USB: ftdi_sio - enabling multiple ELV devices, adding EM1010PC
  * USB: ftdi-sio: Patch to add vendor/device id for ATK_16IC CCD
  * USB: sierra: add support for Onda H600/Zte MF330 datacard to USB Driver
    for Sierra Wireless
  * USB: remove duplicate entry in Option driver and Pl2303 driver for
    Huawei modem
  * USB: pl2303: add support for RATOC REX-USB60F
  * USB: ftdi driver - add support for optical probe device
  * USB: use GFP_NOIO in reset path
  * USB: Variant of the Dell Wireless 5520 driver
  * USB: storage: Add unusual_dev for HP r707
  * USB: fix usbtest halt check on big endian systems
  * USB: handle idVendor of 0x0000
  * forcedeth: mac address mcp77/79
  * lockdep: annotate epoll
  * sys_remap_file_pages: fix ->vm_file accounting
  * PCI: Fix fakephp deadlock
  * ACPI: update ACPI blacklist
  * x86: restore correct module name for apm
  * sky2: restore multicast addresses after recovery
  * sky2: fix for WOL on some devices
  * b43: Fix suspend/resume
  * b43: Drop packets we are not able to encrypt
  * b43: Fix dma-slot resource leakage
  * b43legacy: fix PIO crash
  * b43legacy: fix suspend/resume
  * b43legacy: drop packets we are not able to encrypt
  * b43legacy: fix DMA slot resource leakage
  * selinux: fix labeling of /proc/net inodes
  * b43: Reject new firmware early
  * sched: let +nice tasks have smaller impact
  * sched: fix high wake up latencies with FAIR_USER_SCHED
  * fix writev regression: pan hanging unkillable and un-straceable
  * Driver core: Revert "Fix Firmware class name collision"
  * drm: the drm really should call pci_set_master..
  * splice: missing user pointer access verification (CVE-2008-0009/10)
  * Linux 2.6.24.1
  * splice: fix user pointer access in get_iovec_page_array()
  * Linux 2.6.24.2

 -- Tim Gardner <email address hidden> Thu, 07 Feb 2008 06:50:13 -0700

Changed in linux:
status: Fix Committed → Fix Released

It doesn't work yet because my hard disk model is WDC WD1500ADFD-00NLR5, not WDC WD1500ADFD-0.

You need to use this patch.

Changed in linux:
status: Fix Released → Fix Committed
Tim Gardner (timg-tpi) wrote :

Hardy commit e577ce1c0e4f5d01cde2e6a8fd5bbca1b064242f

Changed in linux:
milestone: hardy-alpha-5 → hardy-alpha-6
Changed in linux-source-2.6.24:
status: New → Invalid
Pkapsc (andre-pietsch) wrote :

As this seems more active, I crosspost from here: https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.22/+bug/137470/comments/26

Hi all,

I need to disappoint some of you:
Dell Vostro 1700 with a Seagate ST980813ASG

Suggested fix in https://wiki.ubuntu.com/InstallingUbuntuOnADellVostro1700 was to turn off NCQ by piping "1" into /sys/block/sda/device/queue_depth

I did this.

Also I upgraded to 2.6.24 as explained here: http://axebase.net/blog/?p=178 (uses a script "hardy.py")
> uname -a
> Linux vostroxx 2.6.24-8-generic #1 SMP Thu Feb 14 20:40:45 UTC 2008 i686 GNU/Linux

At first the problem appeared to be gone, but after about half an hour of work it showed up again (see below).

What should I do now? I remember seeing a patch that turned off NCQ for the ST980813AS in kernel 2.6.24, but I have an "ST980813ASG" (note the appended "G"). Then again, I have turned off NCQ anyway by piping "1" into /sys/block/sda/device/queue_depth, haven't I?

Do I (and others) experience a wholly different problem here?
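
One way to double-check the workaround is to read the value back, since a write to sysfs does not survive a reboot. The following is a minimal Python sketch, assuming the sysfs path quoted above; the function names are made up for illustration, and treating a depth of 1 as "NCQ off" is an assumption based on the workaround described here.

```python
from pathlib import Path


def read_queue_depth(device: str, sys_block: str = "/sys/block") -> int:
    """Read the current queue depth for a block device such as 'sda'."""
    path = Path(sys_block) / device / "device" / "queue_depth"
    return int(path.read_text().strip())


def ncq_effectively_off(device: str, sys_block: str = "/sys/block") -> bool:
    """Assumption: a queue depth of 1 means commands are no longer queued."""
    return read_queue_depth(device, sys_block) <= 1


if __name__ == "__main__":
    try:
        depth = read_queue_depth("sda")
    except (OSError, ValueError) as exc:
        print(f"could not read queue depth: {exc}")
    else:
        state = "off" if depth <= 1 else "on"
        print(f"sda queue_depth={depth}, NCQ looks {state}")
```

If the value reads back as 1 and the resets continue, that would suggest the remaining errors are not caused by command queueing.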

<--- snip from /var/log/dmesg --->
[ 1866.716836] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x44 (timeout)
[ 1866.716857] ata3: hard resetting link
[ 1867.352238] ata3: port is slow to respond, please be patient (Status 0x80)
[ 1867.867169] ata3: hard resetting link
[ 1867.917645] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 1867.919637] ata3.00: configured for UDMA/133
[ 1867.919665] ata3: EH complete
[ 1868.035436] sd 2:0:0:0: [sda] 156301488 512-byte hardware sectors (80026 MB)
[ 1868.035580] sd 2:0:0:0: [sda] Write Protect is off
[ 1868.038246] sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 2371.992310] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x44 (timeout)
[ 2371.992338] ata3: hard resetting link
[ 2372.630043] ata3: port is slow to respond, please be patient (Status 0x80)
[ 2373.098527] ata3: hard resetting link
[ 2373.142150] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 2373.144818] ata3.00: configured for UDMA/133
[ 2373.144845] ata3: EH complete
[ 2373.145047] sd 2:0:0:0: [sda] 156301488 512-byte hardware sectors (80026 MB)
[ 2373.145089] sd 2:0:0:0: [sda] Write Protect is off
[ 2373.145248] sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 2398.484827] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x44 (timeout)
[ 2398.484859] ata3: hard resetting link
[ 2401.190281] ata3: port is slow to respond, please be patient (Status 0x80)
[ 2403.768282] ata3: hard resetting link
[ 2404.092050] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 2404.094618] ata3.00: configured for UDMA/133
[ 2404.094636] ata3: EH complete
[ 2404.094818] sd 2:0:0:0: [sda] 156301488 512-byte hardware sectors (80026 MB)
[ 2404.094859] sd 2:0:0:0: [sda] Write Protect is off
[ 2404.095068] sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 2558.474374] res 50/00:00:5e:32:04/00:00:00:00:00/e0 Emask 0x50 (ATA bus error)
[ 2558.474405] ata3: hard resetting link
[ 2559.612860] ata3: port is slow to respond, please be patient (Status 0x80)
[ 2560.043102] ata3: hard resetting link
[ 2560.086261] ata3...

Pkapsc (andre-pietsch) wrote :

Hi,

some more info: a really reliable way to reproduce this behaviour, at least on my machine, is to run a VMware 1.0.4 instance (WinXP in my case).

HTH
Andre

Pkapsc (andre-pietsch) wrote :

Some more logging info:
[ 1396.483092] res 50/00:00:d6:7d:89/00:00:00:00:00/e1 Emask 0x50 (ATA bus error)
[ 1396.483112] ata1: hard resetting link
[ 1399.421650] ata1: port is slow to respond, please be patient (Status 0x80)
[ 1403.021202] ata1: hard resetting link
[ 1403.326865] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 1403.329406] ata1.00: configured for UDMA/133
[ 1403.329429] ata1: EH complete
[ 1403.402362] sd 0:0:0:0: [sda] 156301488 512-byte hardware sectors (80026 MB)
[ 1403.402508] sd 0:0:0:0: [sda] Write Protect is off
[ 1403.402694] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 1404.285469] res 50/00:00:be:3a:8a/00:00:00:00:00/e1 Emask 0x10 (ATA bus error)
[ 1404.427853] ata1: soft resetting link
[ 1406.924126] ata1: port is slow to respond, please be patient (Status 0xd0)
[ 1409.360792] ata1: hard resetting link
[ 1412.483990] ata1: port is slow to respond, please be patient (Status 0x80)
[ 1415.097181] ata1: hard resetting link
[ 1415.243497] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 1415.246061] ata1.00: configured for UDMA/133
[ 1415.246086] ata1: EH complete
[ 1415.300895] sd 0:0:0:0: [sda] 156301488 512-byte hardware sectors (80026 MB)
[ 1415.310901] sd 0:0:0:0: [sda] Write Protect is off
[ 1415.311407] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 1881.827526] res 40/00:00:be:3a:8a/00:00:00:00:00/e1 Emask 0x44 (timeout)
[ 1881.827559] ata1: hard resetting link
[ 1884.557122] ata1: port is slow to respond, please be patient (Status 0x80)
[ 1886.676490] ata1: hard resetting link
[ 1886.839824] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 1886.841754] ata1.00: configured for UDMA/133
[ 1886.841784] ata1: EH complete
[ 1886.844734] sd 0:0:0:0: [sda] 156301488 512-byte hardware sectors (80026 MB)
[ 1886.846095] sd 0:0:0:0: [sda] Write Protect is off
[ 1886.846742] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 2160.086015] res 40/00:00:be:3a:8a/00:00:00:00:00/e1 Emask 0x44 (timeout)
[ 2160.086046] ata1: hard resetting link
[ 2162.142880] ata1: port is slow to respond, please be patient (Status 0x80)
[ 2163.788488] ata1: hard resetting link
[ 2163.924745] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 2163.927385] ata1.00: configured for UDMA/133
[ 2163.927408] ata1: EH complete
[ 2163.927978] sd 0:0:0:0: [sda] 156301488 512-byte hardware sectors (80026 MB)
[ 2163.928337] sd 0:0:0:0: [sda] Write Protect is off
[ 2163.928628] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
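
To see how failures like these split between bus errors and timeouts, the "res" lines can be tallied by the reason printed in parentheses. This is a small Python sketch under that assumption about the line format; the names RES_REASON and tally_reasons are made up for illustration, and the sample lines are copied from the log above.

```python
import re
from collections import Counter
from typing import Iterable

# Matches result lines such as "... res 40/00:... Emask 0x44 (timeout)".
RES_REASON = re.compile(r"\bres\b.*Emask 0x[0-9a-fA-F]+ \(([^)]+)\)")


def tally_reasons(lines: Iterable[str]) -> Counter:
    """Count result lines by the error reason shown in parentheses."""
    counts: Counter = Counter()
    for line in lines:
        match = RES_REASON.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


SAMPLE = [
    "[ 1396.483092] res 50/00:00:d6:7d:89/00:00:00:00:00/e1 Emask 0x50 (ATA bus error)",
    "[ 1396.483112] ata1: hard resetting link",
    "[ 1404.285469] res 50/00:00:be:3a:8a/00:00:00:00:00/e1 Emask 0x10 (ATA bus error)",
    "[ 1881.827526] res 40/00:00:be:3a:8a/00:00:00:00:00/e1 Emask 0x44 (timeout)",
    "[ 2160.086015] res 40/00:00:be:3a:8a/00:00:00:00:00/e1 Emask 0x44 (timeout)",
]

print(dict(tally_reasons(SAMPLE)))
```

On the sample this counts two bus errors and two timeouts; feeding it a full dmesg capture would show which kind dominates over a longer period.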

Tim Gardner (timg-tpi) wrote :

2.6.24-11.17

Changed in linux:
assignee: timg-tpi → nobody
status: Fix Committed → Fix Released
Pkapsc (andre-pietsch) wrote :

Hi Tim,

I take this as a "Relax, bug has been fixed in '2.6.24-11.17'!"

OK, I relax :)

If I misunderstood, please say so.

Thx
Andre

Pkapsc (andre-pietsch) wrote :

Hi,

Sorry, but 2.6.24-11 did not solve it either.

More information is here, since this bug is still marked as fixed:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/196076

Regards
Andre

Pkapsc (andre-pietsch) wrote :

The relevant comments regarding 2.6.24-11 start here:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/196076/comments/6

gsiliceo (nombre-falso) wrote :

I'm not sure, but I have a WD740ADFD drive on Ubuntu Hardy Heron with the 2.6.24-16 kernel, and I can't enable 32-bit transfers; it's stuck on the low-performance 16-bit mode.
$ hdparm /dev/sda
/dev/sda:
 IO_support = 0 (default 16-bit)
 HDIO_GET_UNMASKINTR failed: Inappropriate ioctl for device
 HDIO_GET_DMA failed: Inappropriate ioctl for device
 HDIO_GET_KEEPSETTINGS failed: Inappropriate ioctl for device

Is this the same bug? Should I file another one?

Alexander Rødseth (alexanro) wrote :

gsiliceo, from what I've read elsewhere on the net, it's supposed to look like that with hdparm. Since it's a SATA disk, "sdparm" should be the tool to use.
