Ubuntu

devkit-disks-probe-ata-smart causes HSM Violations on SSD, and potential hardware death

Reported by theluketaylor on 2009-10-07
This bug affects 84 people
Affects                    Status      Importance   Assigned to
EasyPeasy Overview         -           High         Unassigned
Linux                      Won't Fix   Medium       -
libatasmart                Confirmed   Low          -
devicekit-disks (Fedora)   New         Undecided    Unassigned
devicekit-disks (Ubuntu)   -           Undecided    Unassigned
  Karmic                   -           Critical     Martin Pitt
  Lucid                    -           Undecided    Unassigned
libatasmart (Ubuntu)       -           High         Martin Pitt
  Karmic                   -           Medium       Unassigned
  Lucid                    -           High         Martin Pitt

Linux: nominated for 2.6.31 by Юрий Аполлов.
devicekit-disks (Ubuntu) and libatasmart (Ubuntu): declined for Dapper, Hardy, Intrepid and Jaunty by Martin Pitt; nominated for Maverick by rogmorri.

Bug Description

TEMPORARY WORKAROUND FOR THIS PROBLEM IN KARMIC (this is now also in karmic-proposed and needs testing feedback):

1. sudo gedit /lib/udev/rules.d/95-devkit-disks.rules

2. locate the following lines (about 1/3 the way into the file; search for "smart")

# ATA disks driven by libata
KERNEL=="sd*[!0-9]", ATTR{removable}=="0", ENV{ID_BUS}=="ata", ENV{DEVTYPE}=="disk", IMPORT{program}="devkit-disks-probe-ata-smart $tempnode"

3. comment out the second line by adding a # in front, so you should have

# ATA disks driven by libata
#KERNEL=="sd*[!0-9]", ATTR{removable}=="0", ENV{ID_BUS}=="ata", ENV{DEVTYPE}=="disk", IMPORT{program}="devkit-disks-probe-ata-smart $tempnode"

4. save the file and reboot
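
The manual edit in steps 1 to 4 can also be scripted. The following is a minimal sketch in Python, assuming the rule is identified by the devkit-disks-probe-ata-smart program name. The function name comment_out_smart_probe is made up for illustration. The snippet only transforms text; writing the result back to /lib/udev/rules.d/95-devkit-disks.rules (as root, after making a backup) is left to you.

```python
PROBE = "devkit-disks-probe-ata-smart"  # program named in the udev rule above


def comment_out_smart_probe(rules_text):
    """Return rules_text with every active rule that runs PROBE commented out.

    Lines that are already comments are left alone, so the function can
    be applied more than once without stacking '#' characters.
    """
    out = []
    for line in rules_text.splitlines(keepends=True):
        is_comment = line.lstrip().startswith("#")
        if PROBE in line and not is_comment:
            line = "#" + line
        out.append(line)
    return "".join(out)


# Example input: the two lines quoted in step 2.
sample = (
    "# ATA disks driven by libata\n"
    'KERNEL=="sd*[!0-9]", ATTR{removable}=="0", ENV{ID_BUS}=="ata", '
    'ENV{DEVTYPE}=="disk", IMPORT{program}="devkit-disks-probe-ata-smart $tempnode"\n'
)
patched = comment_out_smart_probe(sample)
print(patched, end="")
```

Running it on the two sample lines should print the same result as step 3. A reboot is still needed afterwards, as in step 4.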

TECHNICAL ANALYSIS: https://bugs.launchpad.net/ubuntu/karmic/+source/libatasmart/+bug/445852/comments/202
LUCID STATUS: https://bugs.launchpad.net/ubuntu/karmic/+source/libatasmart/+bug/445852/comments/203
KARMIC SOLUTION: https://bugs.launchpad.net/ubuntu/karmic/+source/libatasmart/+bug/445852/comments/204

BUG DESCRIPTION FOLLOWS:

In the Karmic beta I experience SSD stalls during the boot process. It happens almost every time before xsplash loads, and happens again frequently between logging into gdm and the desktop loading. When it happens during login I think it makes GNOME time out while loading panel items, as I get errors about lots of panel items failing to load. If I log out and back in again when the SSD isn't stalled, the panel items load fine.

When it happens, the following messages appear before xsplash (or in dmesg when it happens after gdm):

ata2: lost interrupt (Status 0x58)
ata2: drained 16384 bytes to clear DRQ.
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata2.00: BMDMA stat 0x4
ata2.00: cmd c8/00:40:cb:60:32/00:00:00:00:00/e0 tag 0 dma 32768 in
res 58/00:40:cb:60:32/00:00:00:00:00/e0 Emask 0x2 (HSM violation)
ata2.00: status: { DRDY DRQ }
ata2: soft resetting link
ata2.00: configured for UDMA/66
ata2: EH complete
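
The cmd line in this log can be decoded by hand. As a rough sketch in Python, assuming the taskfile is printed in the order command/feature:nsect:lbal:lbam:lbah/(high-order bytes)/device and that sectors are 512 bytes: opcode 0xc8 is READ DMA, the sector count 0x40 is 64 sectors, and 64 × 512 = 32768 bytes, which matches the "dma 32768 in" at the end of the line. The helper names below are illustrative only.

```python
SECTOR_SIZE = 512  # assumption: 512-byte logical sectors

# Only the two opcodes that show up in this report.
OPCODES = {0xC8: "READ DMA", 0xCA: "WRITE DMA"}

# Subset of the ATA status register bits. 0x58 also has bit 0x10 set,
# which the kernel's "status: { ... }" line above does not name.
STATUS_BITS = [(0x80, "BSY"), (0x40, "DRDY"), (0x20, "DF"), (0x08, "DRQ"), (0x01, "ERR")]


def decode_cmd(taskfile):
    """Decode a taskfile string such as 'c8/00:40:cb:60:32/00:00:00:00:00/e0'."""
    command, low, _hob, device = taskfile.split("/")
    _feature, nsect, lbal, lbam, lbah = (int(x, 16) for x in low.split(":"))
    opcode = int(command, 16)
    dev = int(device, 16)
    # LBA28: the low 4 bits of the device register hold LBA bits 27..24.
    lba = ((dev & 0x0F) << 24) | (lbah << 16) | (lbam << 8) | lbal
    # For 28-bit READ/WRITE DMA a sector count of 0 means 256 sectors.
    sectors = nsect or 256
    return {
        "command": OPCODES.get(opcode, hex(opcode)),
        "lba": lba,
        "sectors": sectors,
        "bytes": sectors * SECTOR_SIZE,
    }


def decode_status(status):
    """Return the names of the status bits (from STATUS_BITS) set in status."""
    return [name for bit, name in STATUS_BITS if status & bit]


print(decode_cmd("c8/00:40:cb:60:32/00:00:00:00:00/e0"))
print(decode_status(0x58))
```

The same arithmetic can be used to sanity-check the other logs in this report: the byte count after "dma" should equal the second field of the cmd line times 512.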

I did not have this issue in Jaunty with this hardware, and I don't think it has happened once the system is fully loaded. I am running Karmic UNR on an Acer Aspire One netbook.

ProblemType: Bug
AplayDevices:
 **** List of PLAYBACK Hardware Devices ****
 card 0: Intel [HDA Intel], device 0: ALC268 Analog [ALC268 Analog]
   Subdevices: 1/1
   Subdevice #0: subdevice #0
Architecture: i386
ArecordDevices:
 **** List of CAPTURE Hardware Devices ****
 card 0: Intel [HDA Intel], device 0: ALC268 Analog [ALC268 Analog]
   Subdevices: 1/1
   Subdevice #0: subdevice #0
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: luke 1990 F.... pulseaudio
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info:
 Card hw:0 'Intel'/'HDA Intel at 0x58540000 irq 16'
   Mixer name : 'Realtek ALC268'
   Components : 'HDA:10ec0268,1025015b,00100101'
   Controls : 9
   Simple ctrls : 6
CheckboxSubmission: 12ef539f3788bfbc46bc56b5c28128a6
CheckboxSystem: c69722ecac764861be52925fa50b4dcc
Date: Wed Oct 7 17:54:56 2009
DistroRelease: Ubuntu 9.10
HibernationDevice: RESUME=UUID=8d44b89b-2edb-4c02-a4be-94bd25b65081
MachineType: Acer AOA110
Package: linux-image-2.6.31-12-generic 2.6.31-12.40
ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.31-12-generic root=UUID=039a096e-3486-4898-9eeb-44a705f8b7fd ro quiet splash elevator=noop usbcore.autosuspend=1
ProcEnviron:
 LANG=en_CA.UTF-8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 2.6.31-12.40-generic
RelatedPackageVersions: linux-firmware 1.21
RfKill:
 0: phy0: Wireless LAN
  Soft blocked: no
  Hard blocked: no
SourcePackage: linux
Tags: ubuntu-unr
Uname: Linux 2.6.31-12-generic i686
XsessionErrors:
 (gnome-settings-daemon:2006): GLib-CRITICAL **: g_propagate_error: assertion `src != NULL' failed
 (gnome-settings-daemon:2006): GLib-CRITICAL **: g_propagate_error: assertion `src != NULL' failed
 (nautilus:2092): Eel-CRITICAL **: eel_preferences_get_boolean: assertion `preferences_is_initialized ()' failed
 (polkit-gnome-authentication-agent-1:2118): GLib-CRITICAL **: g_once_init_leave: assertion `initialization_value != 0' failed
 (gnome-panel:2048): Gdk-WARNING **: /build/buildd/gtk+2.0-2.18.2/gdk/x11/gdkdrawable-x11.c:952 drawable is not a pixmap or window
dmi.bios.date: 10/06/2008
dmi.bios.vendor: Acer
dmi.bios.version: v0.3309
dmi.board.asset.tag: Base Board Asset Tag
dmi.board.vendor: Acer
dmi.board.version: Base Board Version
dmi.chassis.type: 1
dmi.chassis.vendor: Chassis Manufacturer
dmi.chassis.version: Chassis Version
dmi.modalias: dmi:bvnAcer:bvrv0.3309:bd10/06/2008:svnAcer:pnAOA110:pvr1:rvnAcer:rn:rvrBaseBoardVersion:cvnChassisManufacturer:ct1:cvrChassisVersion:
dmi.product.name: AOA110
dmi.product.version: 1
dmi.sys.vendor: Acer

theluketaylor (ekul-taylor) wrote :

I updated to kernel 2.6.31-12 today and the problem seems to have gotten worse. Under the older Karmic kernels it would almost always happen before xsplash and very rarely after gdm. With -12 it seems to happen both before xsplash and after gdm on every boot.

av8r (av8r) wrote :

Nothing changes with 2.6.31-13-generic #44 on an EeePC 900A with upgraded RAM and SSD.
For me, it usually freezes between fsck and setting up the resolver, and once again while launching the first session. It doesn't matter whether it's UNR or not (I have both set up).
I added "clocksource=hpet notsc" to the grub boot line. It removed the warning message about the tsc clock being unstable but didn't change anything about the stalling SSD. I also removed "quiet splash" from the grub boot line.

$ dmesg | grep ata2
[ 1.253633] ata2: PATA max UDMA/100 cmd 0x170 ctl 0x376 bmdma 0xffa8 irq 15
[ 1.427461] ata2.00: CFA: Patriot Memory 64GB PATA Storage Drive, Ver2.M0G, max UDMA/66
[ 1.427461] ata2.00: 126090720 sectors, multi 1: LBA
[ 1.440008] ata2.00: configured for UDMA/66
[ 40.809047] ata2: lost interrupt (Status 0x58)
[ 40.809047] ata2: drained 2048 bytes to clear DRQ.
[ 40.811862] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 40.815862] ata2.00: BMDMA stat 0x24
[ 40.818548] ata2.00: cmd c8/00:08:32:51:22/00:00:00:00:00/e0 tag 0 dma 4096 in
[ 40.822958] ata2.00: status: { DRDY DRQ }
[ 40.826958] ata2: soft resetting link
[ 41.000008] ata2.00: configured for UDMA/66
[ 41.000008] ata2: EH complete
[ 232.820015] ata2: lost interrupt (Status 0x58)
[ 232.820081] ata2: drained 8192 bytes to clear DRQ.
[ 232.848018] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 232.848018] ata2.00: BMDMA stat 0x24
[ 232.848018] ata2.00: cmd c8/00:20:ba:72:27/00:00:00:00:00/e0 tag 0 dma 16384 in
[ 232.848018] ata2.00: status: { DRDY DRQ }
[ 232.848018] ata2: soft resetting link
[ 233.020008] ata2.00: configured for UDMA/66
[ 233.020008] ata2: EH complete
[ 273.820016] ata2: lost interrupt (Status 0x58)
[ 273.820089] ata2: drained 6144 bytes to clear DRQ.
[ 273.837834] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 273.843269] ata2.00: BMDMA stat 0x24
[ 273.849571] ata2.00: cmd c8/00:18:da:72:27/00:00:00:00:00/e0 tag 0 dma 12288 in
[ 273.860739] ata2.00: status: { DRDY DRQ }
[ 273.866515] ata2: soft resetting link
[ 274.041007] ata2.00: configured for UDMA/66
[ 274.041007] ata2: EH complete
[ 407.436313] ata2.00: ACPI cmd ef/03:44:00:00:00:a0 filtered out
[ 407.436313] ata2.00: ACPI cmd ef/03:0c:00:00:00:a0 filtered out
[ 407.436313] ata2.00: ACPI cmd c6/00:01:00:00:00:a0 succeeded
[ 407.452005] ata2.00: configured for UDMA/66
[ 407.468005] ata2.00: configured for UDMA/66
[ 407.468005] ata2: EH complete

av8r (av8r) on 2009-10-11
Changed in linux (Ubuntu):
status: New → Confirmed
redDEADresolve (reddeadresolve) wrote :

I am also getting the same error on my Dell Mini 9.

[38.825065] ata1.00 exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[38.825227] BMDMA Stat 0x24
[38.825318] ata1.00:cmd c8/00:18:8f:89:20/00:00:00:00:/e0 tag 0 dma 1228in
[38.825321] res 58/00:18:8f:89:20/00:00:00:00:/e0 Emask 0x2 (HSM violation)
[38.825598] ata1.00: status {DRDY DRQ}

Occasionally I get sent to the root prompt to manually run an fsck.

Gav Mack (gavinmac) wrote :

I have issues identical to those of the OP of this bug on my Aspire One A110 with a SuperTalent 32GB SSD upgrade. Karmic takes almost 3 minutes to start the first time, and only after at least 3 restarts (each taking less time than the last) do I get a relatively stable desktop.

Johan Van den Neste (jvdneste) wrote :

I have the same configuration as Gav Mack (Aspire One A110 with a SuperTalent SSD 32Gb upgrade). Same problem here.

Johan Van den Neste (jvdneste) wrote :

It is easy to reproduce simply by starting gparted. The error is then produced twice just as before while 'searching /dev/sda partitions'. As a result, the 'searching /dev/sda partitions' activity in gparted takes a long time.

professordes (d-a-johnston-hw) wrote :

A "me too" on an eeePC 901 with an upgraded crucial SSD and an upgrade install of karmic RC

The relevant bit in dmesg is:

[ 35.816124] ata2: lost interrupt (Status 0x58)
[ 35.820096] ata2: drained 2048 bytes to clear DRQ.
[ 35.823180] ata2.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 35.826268] ata2.01: BMDMA stat 0x64
[ 35.829292] ata2.01: cmd c8/00:08:f7:41:0d/00:00:00:00:00/f2 tag 0 dma 4096 in
[ 35.829295] res 58/00:08:f7:41:0d/00:00:00:00:00/f2 Emask 0x2 (HSM violation)
[ 35.835971] ata2.01: status: { DRDY DRQ }

The machine is also (mostly) failing to pick up an SDHC card in the reader, which wasn't the case in 9.04

Adam Gianola (adam-gianola) wrote :

Same story here. Dell Mini 9 upgraded with a Super Talent FEM16GFDL 16 GB SSD. I experience this both after the upgrade from 9.04 to 9.10 as well as on a clean install of 9.10.

I can also confirm starting gparted reproduces the dmesg output normally seen after (long) boot up.

Another 'me too'.

Just upgraded an Acer Aspire One A110 (ZG5) from existing (factory installed) 8 GB SSD to Super Talent 16 GB (FEM16GF13M).

Running the LiveCD (on a USB stick) with 9.10 RC and then opening gParted shows essentially the same messages in dmesg as other reports (and it takes a long time).

Everything else seems fine.

danq989 (danq989) wrote :

Me too!

I have the same configuration as Gav Mack (including upgraded 32GB SSD) with the same results.

Verified on both a 9.04 to 9.10 upgrade and a fresh 9.10 install on a freshly erased and partitioned drive. Same problem on boot and in gparted. Seems to only occur during mounting of the drive (possibly during initial mount and then remount as -rw)

---danq989

I have linked this bug report to (what looks to be) the same problem at the kernel bug tracker. Not sure I've done the linking correctly ;-)

http://bugzilla.kernel.org/show_bug.cgi?id=14515

Changed in linux:
status: Unknown → Confirmed
Kory (postmako) wrote :

Me too! AAO ZG5 running stock 8GB drive. Jaunty was booting in about 20 seconds and Karmic is taking about 90 seconds. I am attaching parts from dmesg and the most recent copy of bootchart.

...
[ 7.242403] input: SynPS/2 Synaptics TouchPad as /devices/platform/i8042/serio2/input/input8
[ 36.820096] ata2: lost interrupt (Status 0x58)
[ 36.824029] ata2: drained 2048 bytes to clear DRQ.
[ 36.827217] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 36.827227] ata2.00: BMDMA stat 0x4
[ 36.827248] ata2.00: cmd c8/00:08:ef:cc:ce/00:00:00:00:00/e0 tag 0 dma 4096 in
[ 36.827251] res 58/00:08:ef:cc:ce/00:00:00:00:00/e0 Emask 0x2 (HSM violation)
[ 36.827258] ata2.00: status: { DRDY DRQ }
[ 36.827302] ata2: soft resetting link
[ 36.996463] ata2.00: configured for UDMA/66
[ 36.996497] ata2: EH complete
[ 37.001030] Clocksource tsc unstable (delta = -133907975 ns)
[ 37.042527] ath5k 0000:03:00.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
[ 37.042586] ath5k 0000:03:00.0: setting latency timer to 64
[ 37.042686] ath5k 0000:03:00.0: registered as 'phy0'
...
[ 56.778794] groups: 1 0
[ 335.989086] ata2: lost interrupt (Status 0x58)
[ 335.993064] ata2: drained 2048 bytes to clear DRQ.
[ 335.996576] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 335.996588] ata2.00: BMDMA stat 0x4
[ 335.996609] ata2.00: cmd c8/00:08:4f:1c:0c/00:00:00:00:00/e0 tag 0 dma 4096 in
[ 335.996613] res 58/00:08:4f:1c:0c/00:00:00:00:00/e0 Emask 0x2 (HSM violation)
[ 335.996623] ata2.00: status: { DRDY DRQ }
[ 335.996675] ata2: soft resetting link
[ 336.168420] ata2.00: configured for UDMA/66
[ 336.168453] ata2: EH complete
[ 372.004111] ata2: lost interrupt (Status 0x58)
[ 372.008033] ata2: drained 2048 bytes to clear DRQ.
[ 372.011577] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 372.011588] ata2.00: BMDMA stat 0x4
[ 372.011609] ata2.00: cmd c8/00:08:bf:a6:49/00:00:00:00:00/e0 tag 0 dma 4096 in
[ 372.011613] res 58/00:08:bf:a6:49/00:00:00:00:00/e0 Emask 0x2 (HSM violation)
[ 372.011623] ata2.00: status: { DRDY DRQ }
[ 372.011674] ata2: soft resetting link
[ 372.184426] ata2.00: configured for UDMA/66
[ 372.184458] ata2: EH complete
[ 372.586276] gdu-notificatio[1693]: segfault at c ip 00efd50e sp bfbf6d50 error 4 in libgdu.so.0.0.0[ef4000+1c000]
[ 377.073516] wlan0: authenticate with AP 00:0f:66:b9:59:0f
[ 377.079461] wlan0: authenticated
[ 377.079473] wlan0: associate with AP 00:0f:66:b9:59:0f
[ 377.085704] wlan0: RX AssocResp from 00:0f:66:b9:59:0f (capab=0x11 status=0 aid=5)
[ 377.085716] wlan0: associated
...
And it happens from time to time after login...

Johan Van den Neste (jvdneste) wrote :

I'd like to point out that even though the delays in the boot process are annoying, what is worse is the series of applets crashing when logging in to gnome. Usually I cannot log out again because that applet has crashed. So I switch to tty1 and do a 'sudo service gdm restart'. The next and subsequent logins are fine until the next reboot.

Yeah I started to notice that kind of stuff as well. That is why I'm
leaving Jaunty on my wife's machine.

tags: added: ubuntu

My Asus Eee PC 900 is also affected by this bug. (Fresh install of final Ubuntu 9.10)

Saif Ahmed (saif) wrote :

A me too here

eeepc 900, fresh install of final 9.10.

Moreover, if I have any kind of USB flash drive attached, the machine doesn't complete boot at all.

Moreover, I got this error on a non-SSD drive. See https://bugs.launchpad.net/ubuntu/+source/linux/+bug/473765 for details.

@andrey i. mavlyanov

Andrey,

I don't think that this is the same bug.

On this line:

Nov 4 08:18:45 aim-laptop kernel: [35132.010175] res 40/00:00:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout)

You are getting a 'timeout', whereas this bug is causing 'HSM Violations'.
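
The distinction drawn here can be checked mechanically, because the kernel prints the failure reason in parentheses after Emask on the res line. A small sketch in Python, assuming that log layout (the regular expression and function names are illustrative, not part of any existing tool):

```python
import re

# Matches e.g. "res ... Emask 0x2 (HSM violation)" or "res ... Emask 0x4 (timeout)".
RES_REASON = re.compile(r"\bres\b.*Emask (0x[0-9a-fA-F]+) \(([^)]+)\)")


def failure_reasons(log_text):
    """Return a list of (emask, reason) for every 'res' line in log_text."""
    found = []
    for line in log_text.splitlines():
        m = RES_REASON.search(line)
        if m:
            found.append((int(m.group(1), 16), m.group(2)))
    return found


def is_hsm_violation(log_text):
    """True if at least one failed command was reported as an HSM violation."""
    return any(reason == "HSM violation" for _, reason in failure_reasons(log_text))


# The line quoted above, and a res line from the original bug description.
timeout_log = (
    "Nov 4 08:18:45 aim-laptop kernel: [35132.010175] "
    "res 40/00:00:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout)"
)
hsm_log = "res 58/00:40:cb:60:32/00:00:00:00:00/e0 Emask 0x2 (HSM violation)"

print(failure_reasons(timeout_log), is_hsm_violation(timeout_log))
print(failure_reasons(hsm_log), is_hsm_violation(hsm_log))
```

Feeding it the output of dmesg would show at a glance whether a machine is hitting the failure discussed in this bug or a different one.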

Adam Gianola (adam-gianola) wrote :

Interestingly, if I use the Dell Mini 9 Factory SSD with 9.10 this problem goes away.

Playing with LiveCD (on a USB stick) with an Aspire One with Super Talent 16Gb SSD:

- Normal LiveCD boot shows the problem in dmesg.

- Booting with 'libata.dma=0' in kernel line fixes the problem (by disabling DMA) in dmesg.

- Booting with 'libata.ignore_hpa=0' had no effect.

Since the problem looked to be DMA related, I tried slowing down the transfer with 'libata.force=udma/33'. No effect, though there were plenty of log messages about UDMA being forced to 33.

The same machine is working fine with Jaunty. Another Aspire One with the standard (factory) 8 Gb SSD is running Karmic without any problem.

mdyn (tamerlaha-gmail) wrote :

Acer AOA-110, same problem....

danq989 (danq989) wrote :

I just verified that this bug is still present for me in the just-released kernel 2.6.31-15.

Alan Pope ㋛ (popey) wrote :

Linux kernel bug 14515 has nothing to do with this

Changed in linux:
importance: Unknown → Undecided
status: Confirmed → New
Dave V (mindkeep) wrote :

Affects my asus eeepc 900. Please raise to critical before I have to learn to hassle with Gentoo again.

Changed in linux:
importance: Undecided → Unknown
status: New → Unknown
Changed in linux:
status: Unknown → Confirmed
Horácio (horacioh) wrote :

I had exactly the same problem (boot stall) on an Asus Eee 900. After 2 weeks of use, I got a grub error, "error: biosdisk read error", and the system became completely unusable, pending a disk wipe and full reinstall.
Similar situations are reported in https://bugs.launchpad.net/bugs/387272, which is considered a duplicate of this bug.
But I do not see the grub error problem reported here. Might this be a different bug?

Alan Pope ㋛ (popey) wrote :

OK, so after running 9.10 and discovering this issue, I booted off a 9.04 USB stick and dd'ed /dev/zero over both sda and sdb. I then installed 9.04 and have no issues.

So the hardware is not faulty.

A possible workaround from the upstream bug report is to boot with 'irqpoll' in the kernel boot parameters. It's not a good fix, since the logs are still full of error messages, but at least the 'stall' is reduced.

Regrettably, it's probably best to avoid using Karmic on SSD equipped netbooks. Use Jaunty instead, since this bug probably won't be fixed in the near future.

I have the same issue as reported above with an eeepc 701 (with 16GB SuperTalent SSD, and also with the original 8GB SSD).

An alternate method of fixing a Karmic-corrupted SSD - at least on the 701 - is to boot with ASUS's rescue DVD and allow it to reinstall the default Xandros installation.

With Karmic installed, I can confirm that kernel option "irqpoll" stops the stall during a Karmic boot, but does anyone know if that stops Karmic from messing up the SSD?

Johan Van den Neste (jvdneste) wrote :

I'm a bit disappointed that the previously mentioned kernel bug was discarded so quickly. Could it not still be related? There are indeed no *optical* drives to be polled; however, on the Acer One there are 2 card readers (= pollable removable media drives). Since the kernel bug report claims that the problem is caused by a removable media drive choking on the polling commands, could one of the card readers not be the cause?

So I tried 'hal-disable-polling' on the card readers...

One reader, marked as 'storage extension', is /dev/mmcblk0 and is apparently not seen as a removable device (according to the message from hal-disable-polling). The other... doesn't seem to work. I get no response whatsoever when inserting or removing an SD card (which only raises my suspicion). Hence, I don't know its /dev/ name, and don't know what to give hal-disable-polling as the --device argument. Maybe the device is not even detected at boot?

Anyone else care to investigate on his/her laptop? (I'd really hate to switch back to 9.04)

Johan Van den Neste (jvdneste) wrote :

I should add that boot-time, gnome login and gparted startup are typically moments where I'd expect polling for media to take place.

Johan Van den Neste (jvdneste) wrote :

I notice changed behaviour when there are sd cards inserted: The stall is longer with 2 cards inserted than with 1 card, and 1 card is worse than no cards, though it does not disappear. How would I disable the card readers entirely? (I see sdhci-pci mentioned, and when there are cards inserted, the drives are detected as mmc0 and mmc1)

The eeepc 701 has one SD reader that can be disabled in the BIOS. Disabling it doesn't seem to affect the SSD issues at all on my system.

@Johan
Interesting comment. I have private doubts that this bug is totally due to hardware 'quality' problems (see the current kernel bug report).

If the hardware was at fault then: firstly, the bug would not be spread over such a range of differing hardware, and secondly, Ubuntu 9.04 should also be failing in a similar manner?

Well after having this issue for a couple of weeks now my netbook will no
longer boot. I ran a live USB stick and gparted can't even read the
partition. As soon as I have time, I'm going back to 9.04 and I suggest
everyone else do the same before their drive craps out like mine did. It
seems to write out bad sectors or destroy your data as well because my
wireless stopped working and I rebooted hoping it would fix the problem. So
all of my netbooks will wait until this issue is resolved before upgrading.
Enjoy!

@Andrew
Couldn't agree more that this is a code regression and not simply a hardware quality issue.

I haven't checked the various changelogs, but I wouldn't be surprised if something in the IDE IRQ handler or the hardware initialization was subjected to optimization (in libdma?). Possibly the SuperTalent drives do react in a non-standard way that was never exposed before.

Hopefully Mr. Heo will take the time to look through the code and see. I'm sure it would help if someone could supply the developers hardware that reliably produces the error. I need my netbook too much to send it away, but maybe someone has an extra SuperTalent drive?

---danq989

Gav Mack (gavinmac) on 2009-12-09
Changed in linux (Ubuntu):
assignee: nobody → Gav Mack (gavinmac)
assignee: Gav Mack (gavinmac) → nobody
assignee: nobody → Upstart Developers (upstart-devel)
Changed in linux (Ubuntu):
assignee: Upstart Developers (upstart-devel) → nobody
affects: linux (Ubuntu) → libatasmart (Ubuntu)
summary: - SSD stall during boot
+ devkit-disks-probe-ata-smart causes HSM Violations on SSD, and potential
+ hardware death
Changed in devicekit-disks (Ubuntu):
status: New → Triaged
Changed in libatasmart (Ubuntu):
status: Confirmed → Triaged
importance: Undecided → High
importance: High → Critical
Changed in devicekit-disks (Ubuntu):
importance: Undecided → High
importance: High → Critical
description: updated
Changed in libatasmart:
status: Unknown → Confirmed
Changed in linux:
status: Confirmed → Invalid
Martin Pitt (pitti) on 2010-02-15
Changed in devicekit-disks (Ubuntu):
status: Triaged → Invalid
importance: Critical → Undecided
Martin Pitt (pitti) on 2010-03-24
Changed in libatasmart (Ubuntu Lucid):
assignee: nobody → Martin Pitt (pitti)
status: Triaged → In Progress
Martin Pitt (pitti) on 2010-03-25
Changed in devicekit-disks (Ubuntu Karmic):
status: New → Invalid
Martin Pitt (pitti) on 2010-03-25
Changed in devicekit-disks (Ubuntu Karmic):
assignee: nobody → Martin Pitt (pitti)
importance: Undecided → Critical
status: Invalid → In Progress
Martin Pitt (pitti) on 2010-03-25
Changed in libatasmart (Ubuntu Karmic):
status: New → Triaged
Steve Langasek (vorlon) on 2010-03-25
Changed in devicekit-disks (Ubuntu Karmic):
status: In Progress → Fix Committed
tags: added: verification-needed
Martin Pitt (pitti) on 2010-03-26
Changed in libatasmart (Ubuntu Lucid):
status: In Progress → Fix Released
Changed in libatasmart (Ubuntu Karmic):
assignee: nobody → Martin Pitt (pitti)
importance: Undecided → Medium
Martin Pitt (pitti) on 2010-03-26
description: updated
Martin Pitt (pitti) on 2010-03-30
tags: added: verification-done
removed: verification-needed
Changed in devicekit-disks (Ubuntu Karmic):
status: Fix Committed → Fix Released
217 comments hidden view all 297 comments
Martin Pitt (pitti) wrote :

Reopening for lucid then, since some machines still seem to be affected. For those who get it on Lucid beta-2, can you please confirm that applying the workaround in /lib/udev/rules.d/80-udisks.rules works? Also, please do

  sudo strace -vvfs1024 -o /tmp/probe-smart.txt /lib/udev/udisks-probe-ata-smart /dev/sda

and attach /tmp/probe-smart.txt here. It'd be best if someone could give me ssh access to an affected machine, since it works on those I can put my hands on now.

Changed in libatasmart (Ubuntu Lucid):
importance: Critical → High
status: Fix Released → Confirmed
Martin Pitt (pitti) on 2010-04-12
Changed in libatasmart (Ubuntu Lucid):
status: Confirmed → Incomplete
ubuntu-crypto (davexthc) wrote :

Just to be sure: it is safe to remove this patch if you *don't* use SSDs, correct?

Raf (4263004-noduck) wrote :

I have not been able to reproduce the HSM violations. I rebooted 20 times, cold booted, booted with battery, replaced battery and booted, tried 2.6.32-19 and 2.6.32-20, all of these seem to work without problem.

Previously (after the fix went in) I sometimes got the HSM violation, but only on boot, I was not able to trigger it by manually running udisks-probe-ata-smart.

I have replaced /lib/udev/udisks-probe-ata-smart with a script that generates a trace, so if it should cause an HSM violation again, it should generate a trace output.
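
Such a tracing wrapper might look roughly like the sketch below, written in Python. It assumes the original binary was first moved aside to a hypothetical path such as /lib/udev/udisks-probe-ata-smart.real, and it reuses the strace options requested earlier in this thread. The output directory, the file naming and the function names are illustrative guesses, not what was actually installed on my machine.

```python
import os
import sys
import time

REAL_PROBE = "/lib/udev/udisks-probe-ata-smart.real"  # hypothetical: renamed original
TRACE_DIR = "/var/tmp"                                # hypothetical output directory


def build_trace_command(args, stamp, real_probe=REAL_PROBE, trace_dir=TRACE_DIR):
    """Build the strace argv that runs the real probe with the given args."""
    out_file = os.path.join(trace_dir, "probe-smart-%s.txt" % stamp)
    return ["strace", "-vvfs1024", "-o", out_file, real_probe] + list(args)


def main():
    # One trace file per invocation, so a later HSM violation can be
    # matched to the probe run that happened around the same time.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    argv = build_trace_command(sys.argv[1:], stamp)
    os.execvp(argv[0], argv)  # replace this process with strace


# Show the command that would run for /dev/sda, without executing it.
print(" ".join(build_trace_command(["/dev/sda"], "20100413-000000")))
```

To act as the replacement for /lib/udev/udisks-probe-ata-smart, the file would need to call main() at the end and be made executable. As written it only prints the command line, so it is safe to try on any machine.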

My understanding of this bug is that it is only related to running of udisks-probe-ata-smart, which should only run at boot (unless repartitioning the drive). But some reports seem to indicate failures after boot (e.g. #234).

The workaround did seem to work for me.

ubuntu-crypto [2010-04-12 21:52 -0000]:
> just to be sure it is safe to remove this patch if you *don't* use
> SSDs correct?

The SMART probing is not inherently tied to SSDs. It just seems that
many of today's SSDs use a kind of controller which acts up when it's
asked for SMART status.

So, nobody can guarantee that this problem does not affect normal HDDs
as well, but it seems we haven't heard about those yet.

If you re-enable the SMART probing and it works for you (easy to
notice if your startup speed suddenly increases by 30 seconds or so),
then it's safe, yes.

Martin

--
Martin Pitt | http://www.piware.de
Ubuntu Developer (www.ubuntu.com) | Debian Developer (www.debian.org)

Gav Mack (gavinmac) wrote :

Even though I have the divert set, I've just had an HSM violation shortly after Lucid fully started. A portion of the log from boot is attached, with the error logged at the end.

Martin Pitt (pitti) wrote :

Gav Mack [2010-04-13 23:18 -0000]:
> Even though I have the divert set I've just had a HSM Violation shortly
> after Lucid has fully started - portion of the log attached from boot
> with the error logged at the end.
>
> ** Attachment added: "dmesg"
> http://launchpadlibrarian.net/44091487/dmesg

This looks rather different, though (no HSM violation). If you have
the divert set, then it's not due to devkit-disks-probe-ata-smart.
Another potential culprit could be hdparm, please see bug 515023.

Raf (4263004-noduck) wrote :

I have not had anymore HSM violations. And nobody else has reported any problems. It looks like this problem is fixed.

I still have two bricked SSD's on an EEE 900

So, the problem has reappeared with Lucid. Last night I did an upgrade via upgrade-manager to Lucid, and after updating all packages and using the system, I received this:

==
[ 89.816125] ata2: lost interrupt (Status 0x58)
[ 89.820090] ata2: drained 2048 bytes to clear DRQ.
[ 89.823689] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 89.823708] ata2.00: BMDMA stat 0x4
[ 89.823726] ata2.00: failed command: READ DMA
[ 89.823765] ata2.00: cmd c8/00:08:00:55:e1/00:00:00:00:00/e0 tag 0 dma 4096 in
[ 89.823774] res 58/00:08:00:55:e1/00:00:00:00:00/e0 Emask 0x2 (HSM violation)
[ 89.823794] ata2.00: status: { DRDY DRQ }
[ 89.823870] ata2: soft resetting link
[ 89.992540] ata2.00: configured for UDMA/66
[ 89.992580] ata2: EH complete
[ 120.816124] ata2: lost interrupt (Status 0x58)
[ 120.820091] ata2: drained 32768 bytes to clear DRQ.
[ 120.934970] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 120.938005] ata2.00: BMDMA stat 0x4
[ 120.940833] ata2.00: failed command: READ DMA
[ 120.943786] ata2.00: cmd c8/00:f8:a8:da:15/00:00:00:00:00/e0 tag 0 dma 126976 in
[ 120.943795] res 58/00:f8:a8:da:15/00:00:00:00:00/e0 Emask 0x2 (HSM violation)
[ 120.950725] ata2.00: status: { DRDY DRQ }
[ 120.954028] ata2: soft resetting link
[ 121.124583] ata2.00: configured for UDMA/66
[ 121.124625] ata2: EH complete
[ 209.787794] __ratelimit: 9 callbacks suppressed
[ 209.787814] apt-get[1574]: segfault at 0 ip 00327d10 sp bfa0e0ec error 4 in libc-2.11.1.so[247000+153000]
==

Kernel: 2.6.32-21-generic (i686)

Horácio (horacioh) wrote :

I confirm the bug in Lucid beta2. I originally had this problem on an Asus Eee 900; it was apparently solved by the patch, but after upgrading from Karmic to Lucid beta2 I am seeing HSM violations again. The difference is that they no longer appear during boot but randomly during normal use of the computer (in my case, web browsing).

[ 82.816046] ata2: lost interrupt (Status 0x58)
[ 82.816103] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 82.816111] ata2.00: BMDMA stat 0x64
[ 82.816119] ata2.00: failed command: WRITE DMA
[ 82.816133] ata2.00: cmd ca/00:10:1f:16:00/00:00:00:00:00/e0 tag 0 dma 8192 out
[ 82.816137] res 58/00:10:1f:16:00/00:00:00:00:00/e0 Emask 0x2 (HSM violation)
[ 82.816144] ata2.00: status: { DRDY DRQ }
[ 82.816180] ata2: soft resetting link
[ 83.014771] ata2.00: configured for UDMA/66
[ 83.020331] ata2.01: configured for UDMA/66
[ 83.044327] ata2.00: configured for UDMA/66
[ 83.052277] ata2.01: configured for UDMA/66
[ 83.052295] ata2: EH complete
[ 113.816056] ata2: lost interrupt (Status 0x58)
[ 113.816113] ata2.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 113.816121] ata2.01: BMDMA stat 0x64
[ 113.816129] ata2.01: failed command: WRITE DMA
[ 113.816144] ata2.01: cmd ca/00:88:39:08:e2/00:00:00:00:00/f0 tag 0 dma 69632 out
[ 113.816147] res 58/00:88:39:08:e2/00:00:00:00:00/f0 Emask 0x2 (HSM violation)
[ 113.816154] ata2.01: status: { DRDY DRQ }
[ 113.816191] ata2: soft resetting link
[ 114.056305] ata2.00: configured for UDMA/66
[ 114.064298] ata2.01: configured for UDMA/66
[ 114.088283] ata2.00: configured for UDMA/66
[ 114.096279] ata2.01: configured for UDMA/66
[ 114.096296] ata2: EH complete
[ 147.816044] ata2: lost interrupt (Status 0x58)
[ 147.816102] ata2.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 147.816109] ata2.01: BMDMA stat 0x64
[ 147.816117] ata2.01: failed command: WRITE DMA
[ 147.816131] ata2.01: cmd ca/00:08:d1:ed:61/00:00:00:00:00/f0 tag 0 dma 4096 out
[ 147.816135] res 58/00:08:d1:ed:61/00:00:00:00:00/f0 Emask 0x2 (HSM violation)
[ 147.816142] ata2.01: status: { DRDY DRQ }
[ 147.816178] ata2: soft resetting link
[ 148.056297] ata2.00: configured for UDMA/66
[ 148.064307] ata2.01: configured for UDMA/66
[ 148.088278] ata2.00: configured for UDMA/66
[ 148.096288] ata2.01: configured for UDMA/66
[ 148.096303] ata2: EH complete
[ 180.816048] ata2: lost interrupt (Status 0x58)
[ 180.816105] ata2.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 180.816233] ata2.01: BMDMA stat 0x64
[ 180.816297] ata2.01: failed command: WRITE DMA
[ 180.816380] ata2.01: cmd ca/00:08:b9:01:6a/00:00:00:00:00/f0 tag 0 dma 4096 out
[ 180.816384] res 58/00:08:b9:01:6a/00:00:00:00:00/f0 Emask 0x2 (HSM violation)
[ 180.816624] ata2.01: status: { DRDY DRQ }
[ 180.816727] ata2: soft resetting link
[ 181.056279] ata2.00: configured for UDMA/66
[ 181.064276] ata2.01: configured for UDMA/66
[ 181.088275] ata2.00: configured for UDMA/66
[ 181.096274] ata2.01: configured for UDMA/66
[ 181.096289] ata2: EH complete
[ 244.738599] [drm:drm_mode_getfb] *ERROR* invalid framebuffer id

Jarige (jarikvh) wrote :

I can confirm that I still have this bug having all updates installed in Lucid.

I was told (by Martin Pitt) to execute the following command:
sudo strace -vvfs1024 -o /tmp/probe-smart.txt /lib/udev/udisks-probe-ata-smart /dev/sda

And add /tmp/probe-smart.txt as an attachment. So I did that, hoping it would help... I did this without applying any workaround (except for the one that was auto-released with Karmic, but the problems reappeared in Lucid)

Basically, I don't understand anything about this bug. I don't know how to apply the workaround on an already installed UNR, since the explanation only covers booting from a LiveCD, and I don't know whether the workaround has any bad side effects.

How do I apply the workaround on my machine, on an already installed UNR?

I'm not convinced this recent problem is related to devkit-disks-probe-ata-smart (the original bug). I have experienced the recent problem once - during an 'apt-get update'.

From what I see it's typified by the kernel giving a READ DMA or WRITE DMA command, to which the drive responds in an unexpected manner (HSM Violation). After a suitable timeout the drive is reset and things continue.

Also, the bug does not occur at boot, but randomly during use. And triggering a probe from 'Disk Utility' has no effect for me.
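
The pattern described above (a READ DMA or WRITE DMA command answered with an HSM violation) can be picked out of a kernel log mechanically. What follows is only a minimal Python sketch, assuming dmesg lines formatted like the excerpts in this report; the function name `find_hsm_violations` and both regular expressions are illustrative and are not part of any tool mentioned in this thread.

```python
import re

# Matches e.g. "[ 89.823726] ata2.00: failed command: READ DMA"
FAILED_RE = re.compile(r"(ata\d+\.\d+): failed command: (.+?)\s*$")
# Matches the result line that names the error class.
HSM_RE = re.compile(r"Emask 0x[0-9a-f]+ \(HSM violation\)")


def find_hsm_violations(log_text):
    """Return a list of (device, command) pairs, one per HSM violation.

    A "failed command" line is paired with the next HSM violation result
    line that follows it. A violation with no preceding "failed command"
    line is reported as (None, None).
    """
    events = []
    pending = None  # last (device, command) seen and not yet paired
    for line in log_text.splitlines():
        match = FAILED_RE.search(line)
        if match:
            pending = (match.group(1), match.group(2))
            continue
        if HSM_RE.search(line):
            events.append(pending if pending else (None, None))
            pending = None
    return events


sample = """\
[ 89.823726] ata2.00: failed command: READ DMA
[ 89.823765] ata2.00: cmd c8/00:08:00:55:e1/00:00:00:00:00/e0 tag 0 dma 4096 in
[ 89.823774] res 58/00:08:00:55:e1/00:00:00:00:00/e0 Emask 0x2 (HSM violation)
[ 82.816119] ata2.00: failed command: WRITE DMA
[ 82.816137] res 58/00:10:1f:16:00/00:00:00:00:00/e0 Emask 0x2 (HSM violation)
"""
print(find_hsm_violations(sample))
```

Feeding it the output of dmesg gives a quick count of how many stalls involved reads versus writes, which may help when comparing reports.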

Martin Pitt (pitti) wrote :

Given how much trouble this still causes on Lucid, I won't reenable the SMART prober for karmic very soon.

Changed in libatasmart (Ubuntu Karmic):
assignee: Martin Pitt (pitti) → nobody
Jarige (jarikvh) wrote :

If there's anything I can do to produce more data for debugging, contact me.
I didn't apply any workaround, since I do not know how to do that.

Jon Ramvi (ramvi) on 2010-04-26
Changed in easypeasy-project:
status: New → Confirmed
assignee: nobody → Jon Ramvi (ramvi)
importance: Undecided → High
ipig (infopiggy) wrote :

My (Post #198) install had a lockup & then would only boot to a grub read error - not sure what happened.

I decided to give 10.04 Beta2/RC a whirl, repeating the steps in #198. I couldn't dd the drive without it resulting in a loop of HSM violations.

I then tried dd'ing the drive in (live) 9.04, 8.10, and then 8.04.

Only in 8.04 was I able to dd the drive without a loop of HSM violations.

Side note: I've noticed that when dd'ing the drive under normal circumstances the HD light stays on solid; when it starts pulsing at regular intervals, an HSM loop is going on in the background.

Side note: this seems to affect 8.10, 9.04, 9.10, and 10.04 (beta1 and beta2/RC), but not 8.04.

The bug seems to have been in effect in earlier versions and installations than I thought.

Anyways, i know the full release is tomorrow so hopefully everything is g2g (re: #250)

ipig [2010-04-28 19:53 -0000]:
> Side Note: seems to effect 8.10/9.04/9.10/10.04 beta1 & beta2/rc) - (but
> not 8.04)

This is definitively unrelated then, since libatasmart was only
introduced in 9.10. Perhaps your problem is more like bug 515023 or
bug 548513?

Jon Ramvi (ramvi) on 2010-04-29
Changed in easypeasy-project:
status: Confirmed → Triaged
status: Triaged → Fix Released
ipig (infopiggy) wrote :

When i boot up a live (nightly) build from Tuesday/27th i get an error saying a hard disk has health problems (ATA ASUS-PHISON SSD/TST2.0L4) (Port 2 of PATA Host Adapter) (8.1GB) (/dev/sdb)

SMART Status: Disk Failure is Imminent

ID: 235 / Good Block Rate (Number of available reserved blocks as a percentage of the total number of reserved blocks) Assessment: Failing / Normalized: 1 / Worst: 1 / Threshold: 3 / Value: N/A

I don't really know what the deal is. Maybe the disk really is failing. It's been failing for a while+ then.

Maybe libatasmart just brings out the worst of it. I'm a little split still.

theluketaylor (ekul-taylor) wrote :

I am the original reporter of this issue. I installed Lucid today and I can confirm this issue has NOT been corrected. I had to comment out the SMART portions of /lib/udev/rules.d/80-udisks.rules to avoid 5-10 second I/O stalls and numerous HSM errors (exactly the same symptoms as originally reported)

Jim Connor (jim-canada) wrote :

6 months, 22 days ago?

theluketaylor (ekul-taylor) wrote :

@Jim Connor

Yes, I reported this 6 months ago. I'm more than a little frustrated it hasn't been fixed yet, especially since in comment 203 I read this:

"So in summary, the problem is fixed in the lucid version of libatasmart. While the code could be a little more robust for future extensions (which I'll discuss in the upstream bug), there are currently no code paths which can lead to the situation that triggers HSM violations."

Based on that, I assumed that when I upgraded from 9.10 to 10.04 I would have no issues. I did a fresh install and it went fine until I rebooted into the new system. Then I got the same errors I reported oh so long ago. The workaround from 9.10 worked, though the SMART udev rules are now located in a different file (80-udisks.rules).

Martin Pitt (pitti) wrote :

theluketaylor [2010-04-29 21:24 -0000]:
> Yes, I reported this 6 months ago. I'm more than a little frustrated it
> hasn't been fixed yet, especially since in comment 203 I read this:
>
> "So in summary, the problem is fixed in the lucid version of
> libatasmart. While the code could be a little more robust for future
> extensions (which I'll discuss in the upstream bug), there are currently
> no code paths which can lead to the situation that triggers HSM
> violations."

So far I just got access to one machine where this happened. I could
reproduce the bug and found the cause (see upstream report). As of
today, nobody offered me SSH access to a machine which is still
affected, so I'm afraid there's nothing else I can do..

Martin
--
Martin Pitt | http://www.piware.de
Ubuntu Developer (www.ubuntu.com) | Debian Developer (www.debian.org)

Jarige (jarikvh) wrote :

@Martin Pitt
I'm willing to give you SSH access to my netbook, but you have got to tell me how to do so. I do not have any experience with that. I must tell you that I did apply the workaround yesterday. I can uncomment the lines if necessary. I'm probably going to be online 5 hours from now, and maybe even longer. I'll receive an e-mail notification if you reply here.

And, of course, if you've got SSH access (which I guess is some kind of terminal access) don't screw it :P This is a production machine, not a test machine. I work on this netbook every day. My important files are backed up with Dropbox though, so no need to worry about that.

So just tell me what to do :)

Martin Pitt (pitti) wrote :

These are the steps for allowing me SSH access:

 * Install openssh-server
 * Create a new user for me (e. g. "pitti"), with admin privileges
 * Log in as that user, write the password in a file "password.txt" in the home directory (so that you do not need to pass it around by mail, but I can get access to it once I'm logged in and need sudo)
 * mkdir ~/.ssh
 * wget -O ~/.ssh/authorized_keys https://launchpad.net/~pitti/+sshkeys
 * Configure your router to allow access to your machine's Port "22" (for ssh from outside)
 * Tell me (via private mail, IRC, or bug followup) your IP address (visible in the router configuration web page usually).
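
The `mkdir ~/.ssh` and `authorized_keys` steps above can be sketched in Python as follows. This is a rough illustration only: the function name `install_authorized_keys` is hypothetical, the key text is passed in instead of being downloaded, and the 0700/0600 permissions are a conventional restrictive choice assumed here, not something the steps above specify.

```python
import os
import stat
import tempfile


def install_authorized_keys(home_dir, keys_text):
    """Create <home_dir>/.ssh and write keys_text to authorized_keys.

    Modes 0700 (directory) and 0600 (file) are an assumption made for
    this sketch; the steps in the thread do not mention permissions.
    """
    ssh_dir = os.path.join(home_dir, ".ssh")
    os.makedirs(ssh_dir, exist_ok=True)
    os.chmod(ssh_dir, 0o700)
    key_file = os.path.join(ssh_dir, "authorized_keys")
    with open(key_file, "w") as handle:
        handle.write(keys_text)
    os.chmod(key_file, 0o600)
    return key_file


# Demonstration in a throwaway directory with a placeholder key line.
demo_home = tempfile.mkdtemp()
path = install_authorized_keys(demo_home, "ssh-rsa PLACEHOLDER example\n")
print(oct(stat.S_IMODE(os.stat(path).st_mode)))
```

In real use the function would be called with the new user's home directory and the key text fetched from the URL given in the list.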

Martin Pitt (pitti) wrote :

Oh, for the record: I will track my changes and revert them, but I'll need to install a few additional packages (thus I need a few MB of download quota), build some test code, and run it as root. Thus I _will_ access your hard drive with those SMART probing commands to reproduce the problem a few times.

I will not need to see anything in other home directories, and the like.

Martin Pitt (pitti) wrote :

I debugged this issue on Jarige's machine, and it has a pretty different root cause. Because of that, because this bug has become way too long, and because the existing fix resolves the issue for most people here, we opened a new report in bug 574462. If you still have the problem, please subscribe to that one instead.

Thank you!

Changed in libatasmart (Ubuntu Lucid):
status: Incomplete → Fix Released
Changed in libatasmart (Ubuntu):
status: Incomplete → Fix Released
Changed in libatasmart (Ubuntu Karmic):
status: Triaged → Won't Fix
Changed in libatasmart:
importance: Unknown → Critical
loewe_78 (bergloewe) wrote :

I've got an HP 510 notebook PC. When it was new, it shipped with open-dos. It has been running Linux since Gutsy Gibbon and, up to now, has had only some difficulties with the southbridge, which were resolved out of the box in Hardy or Jaunty Jackalope.
It has an Intel Celeron M 360 1.4 GHz processor with a 400 MHz front-side bus and an Intel 910GML chipset with an Intel ICH6-M southbridge.
The HD is an IBM/Hitachi 40GB 4200RPM 2MB cache Travelstar HTS421240H9AT00.

When I upgraded to Karmic about a year ago, I had the problem described above. So I reinstalled Jaunty Jackalope, where the hardware worked without problems.
Now I want to pass on my laptop, as I bought a new one. A test with the desktop CD showed no obvious problems (it's an HD failure that might occur more often when the system is installed on the HD and not run from CD, ha-ha). This, and the fact that it's easier to do so, is why I tried to install Lucid.

Although I implemented the workaround for Lucid given above, the device produces the following output when running dmesg:

[ 0.271537] ata_piix 0000:00:1f.1: version 2.13
[ 0.271555] alloc irq_desc for 16 on node -1
[ 0.271559] alloc kstat_irqs on node -1
[ 0.271567] ata_piix 0000:00:1f.1: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[ 0.271628] ata_piix 0000:00:1f.1: setting latency timer to 64
[ 0.277201] isapnp: Scanning for PnP cards...
[ 0.282766] scsi0 : ata_piix
[ 0.282911] scsi1 : ata_piix
[ 0.283654] ata1: PATA max UDMA/100 cmd 0x1f0 ctl 0x3f6 bmdma 0x3580 irq 14
[ 0.283659] ata2: PATA max UDMA/100 cmd 0x170 ctl 0x376 bmdma 0x3588 irq 15
[ 0.284177] Fixed MDIO Bus: probed

-
.
.
.
-

[ 0.489196] ata1.00: ATA-7: HTS421240H9AT00, HACOA70S, max UDMA/100
[ 0.489204] ata1.00: 78140160 sectors, multi 16: LBA48
[ 0.489256] ata1.01: ATAPI: TSSTcorpCDW/DVD TS-L462D, HS02, max MWDMA2
[ 0.548617] ata1.00: configured for UDMA/100
[ 0.556926] ACPI: Battery Slot [C15E] (battery present)
[ 0.580432] ata1.01: configured for MWDMA2

-
.
.
.
-
[ 158.816041] ata1: lost interrupt (Status 0x58)
[ 158.820017] ata1: drained 32768 bytes to clear DRQ.
[ 158.909721] ata1.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 158.909728] sr 0:0:1:0: CDB: Test Unit Ready: 00 00 00 00 00 00
[ 158.909744] ata1.01: cmd a0/00:00:00:00:00/00:00:00:00:00/b0 tag 0
[ 158.909746] res 58/00:01:00:00:00/00:00:00:00:00/b0 Emask 0x2 (HSM violation)
[ 158.909750] ata1.01: status: { DRDY DRQ }
[ 158.909787] ata1: soft resetting link
[ 159.128657] ata1.00: configured for UDMA/100
[ 159.160312] ata1.01: configured for MWDMA2
[ 159.179141] ata1: EH complete
[ 1113.785054] atkbd.c: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).
[ 1113.785060] atkbd.c: Use 'setkeycodes e060 <keycode>' to make it known.
[ 2920.000474] ata1: lost interrupt (Status 0x58)
[ 2920.004015] ata1: drained 32768 bytes to clear DRQ.
[ 2920.093417] ata1.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 2920.093423] ata1.01: ATAPI check failed (ireason=0x1 bytes=8)
[ 2920.093429] sr 0:0:1:0: CDB: Get event status notification: 4a 01 00 00 10 00 00 00 08 00
[ 2920.093448] ata1.01: cmd a0/00:00:00:08:...

rogmorri (frontporsche) wrote :

On the same acer aspire one laptop where I saw this issue last year, I am perhaps seeing it again with ubuntu-10.10-rc-desktop-i386...

Oct 4 00:59:43 ubuntu kernel: [ 602.639266] res 00/00:00:00:00:00/00:00:00:00:00/00 Emask 0x2 (HSM violation)
Oct 4 00:59:44 ubuntu kernel: [ 604.149075] ata2: soft resetting link
Oct 4 00:59:45 ubuntu kernel: [ 604.321486] ata2.00: configured for UDMA/100
Oct 4 00:59:45 ubuntu kernel: [ 604.321529] ata2: EH complete
Oct 4 00:59:49 ubuntu kernel: [ 609.173903] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
Oct 4 00:59:49 ubuntu kernel: [ 609.173915] ata2.00: BMDMA stat 0x5
Oct 4 00:59:49 ubuntu kernel: [ 609.173926] ata2.00: failed command: WRITE DMA
Oct 4 00:59:49 ubuntu kernel: [ 609.173945] ata2.00: cmd ca/00:00:40:69:c9/00:00:00:00:00/e0 tag 0 dma 131072 out
Oct 4 00:59:49 ubuntu kernel: [ 609.173949] res 00/00:00:00:00:00/00:00:00:00:00/00 Emask 0x2 (HSM violation)

@rogmorri
I don't think this is the same bug. You are getting HSM Violations with WRITE DMA, whereas this bug occurred with READ DMA.

(This is being written on an AA1 with 10.10 also!)
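
To compare reports like these, the first byte of the `cmd` and `res` fields in the log can be decoded: in `cmd` it is the ATA command opcode, and in `res` it is the status register. Below is a small Python sketch under stated assumptions: the function name `decode_taskfile` is hypothetical, the opcode table is deliberately limited to commands named in this thread, and only the commonly named status bits are listed.

```python
import re

# Subset of ATA command opcodes, limited to commands mentioned in this thread.
OPCODES = {
    0xC8: "READ DMA",
    0xCA: "WRITE DMA",
    0xA0: "PACKET",
    0xEC: "IDENTIFY DEVICE",
    0xE7: "FLUSH CACHE",
}
# Commonly named ATA status register bits (bit 0x10 is left out on purpose).
STATUS_BITS = [(0x80, "BSY"), (0x40, "DRDY"), (0x20, "DF"),
               (0x08, "DRQ"), (0x01, "ERR")]

TASKFILE_RE = re.compile(r"\b(cmd|res) ([0-9a-f]{2})/")


def decode_taskfile(line):
    """Decode the first byte of a 'cmd xx/...' or 'res xx/...' log line.

    Returns ("cmd", opcode_name) or ("res", [status_bit_names]),
    or None if the line contains neither field.
    """
    match = TASKFILE_RE.search(line)
    if not match:
        return None
    kind, value = match.group(1), int(match.group(2), 16)
    if kind == "cmd":
        return ("cmd", OPCODES.get(value, "unknown"))
    return ("res", [name for bit, name in STATUS_BITS if value & bit])


print(decode_taskfile("ata2.00: cmd ca/00:00:40:69:c9/00:00:00:00:00/e0 tag 0 dma 131072 out"))
print(decode_taskfile("res 58/00:08:00:55:e1/00:00:00:00:00/e0 Emask 0x2 (HSM violation)"))
print(decode_taskfile("res 00/00:00:00:00:00/00:00:00:00:00/00 Emask 0x2 (HSM violation)"))
```

With this decoding, the earlier logs in this thread show a result status of 0x58 (DRDY and DRQ set), while the log just above shows 0x00, which is one concrete way the two cases differ.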

rogmorri (frontporsche) wrote :

Thanks, Andrew. Maybe then I just have bad hardware.

Trey (trey333) wrote :

Just to throw it out there: this bug was never "fixed." I had to sell a
bricked Eee 900 for scrap because it killed both SSDs.

lotus49 (lotus-49) wrote :

Trey's right, it never was fixed.

Although I have fortunately not suffered from any permanent hardware
problems, the bug resurfaces every now and then. I have worked around it by
editing /lib/udev/rules.d/80-udisks.rules and commenting out this line:

# ATA disks driven by libata
#KERNEL=="sd*[!0-9]", ATTR{removable}=="0", ENV{ID_BUS}=="ata",
ENV{DEVTYPE}=="disk", IMPORT{program}="udisks-probe-ata-smart $tempnode"

This workaround has done the trick but unfortunately, this is overwritten
every now and then (as it warns it will be at the beginning of the file).
At least it is easy to spot when this has happened as my boot times go from
about 15 secs to a couple of minutes.

Simon
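
Because the rules file is replaced on upgrade, the edit described above has to be repeated from time to time. The following Python sketch shows one way to make that edit repeatable. It assumes the probe rule sits on a single line containing `probe-ata-smart` (the rule is wrapped above only by the mail client); the function name `disable_smart_probe` is hypothetical, and the sketch works on text so it can be tried without touching /lib/udev.

```python
def disable_smart_probe(rules_text, marker="probe-ata-smart"):
    """Comment out every active rule line that mentions `marker`.

    Returns (new_text, changed_count). Lines that are already commented
    out are left alone, so running this repeatedly is harmless.
    """
    out = []
    changed = 0
    for line in rules_text.splitlines(keepends=True):
        if marker in line and not line.lstrip().startswith("#"):
            out.append("#" + line)
            changed += 1
        else:
            out.append(line)
    return "".join(out), changed


rules = (
    "# ATA disks driven by libata\n"
    'KERNEL=="sd*[!0-9]", ATTR{removable}=="0", ENV{ID_BUS}=="ata", '
    'ENV{DEVTYPE}=="disk", IMPORT{program}="udisks-probe-ata-smart $tempnode"\n'
)
patched, count = disable_smart_probe(rules)
print(count)
print(patched)
```

A non-zero count after a package upgrade would indicate that the file had been overwritten and the probe re-enabled. Writing the result back to the rules file requires root.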

Changed in linux:
status: Invalid → Won't Fix
Changed in libatasmart:
importance: Critical → Unknown
Neil Hooey (nhooey) wrote :

Is anyone going to fix this?

I've disabled s.m.a.r.t. on all of my drives in /lib/udev/rules.d/80-udisks.rules, and I've disabled hdparm for them in /lib/udev/rules.d/85-hdparm.rules, but I still get IDENTIFY DEVICE errors on boot, shutdown and reboot.

Does anyone even know what software package is actually responsible for the bug?

I get the same problem with an NVIDIA MCP61 chipset, an OCZ Vertex2 60GB, and an ext4 filesystem.
It's a real pain: sometimes the whole system freezes.
The SSD has suffered data loss.

The problem occurs less frequently with ext2...

And not even once with Windows XP, 7 ...

SATA HDD and optical drives work fine.

Jailing hdparm and SMART doesn't help for me when running the latest rc kernels.
I think the problem lies between the SSD firmware and the libata kernel stack.
(See the difference from Windows; other drives work fine with libata, though.)

Changed in linux:
importance: Unknown → Medium
Changed in libatasmart:
importance: Unknown → Critical
Neil Hooey (nhooey) wrote :

I just installed Fedora 14, which uses kernel 2.6.35.6-45.fc14.i686, and the "failed command: IDENTIFY DEVICE" and "failed command: FLUSH CACHE" problems went away.

Here's the Fedora Bug:
https://bugzilla.redhat.com/show_bug.cgi?id=549981

More details at my StackExchange question:
http://askubuntu.com/questions/16608/how-do-you-fix-failed-command-identify-device-showing-up-in-dmesg

Jon Ramvi (ramvi) on 2011-02-06
Changed in easypeasy-project:
assignee: Jon Ramvi (ramvi) → nobody

Martin, can you prep a patch for your requested changes?

I'm sorry, you have the wrong email. My name is Greg.
On Oct 31, 2013 9:28 AM, "Bugs-freedesktop-org-n" <email address hidden>
wrote:

> you there, martin?

Something goes wrong, guys :)

I didn't actually see Lennart's comment 20 two years ago, sorry. Downgrading priority as the actual bug was fixed two years ago. What's left is some robustification which I outlined in the last paragraph of comment 19.

Changed in libatasmart:
importance: Critical → Low