Bug #445852 “devkit-disks-probe-ata-smart causes HSM Violations ... : Bugs : libatasmart package : Ubuntu

theluketaylor (ekul-taylor) wrote on 2009-10-07:

#1

dmesg.log (51.2 KiB, text/plain)
AlsaDevices.txt (403 bytes, text/plain; charset="utf-8")
BootDmesg.txt (48.2 KiB, text/plain; charset="utf-8")
Card0.Amixer.values.txt (1.2 KiB, text/plain; charset="utf-8")
Card0.Codecs.codec.0.txt (7.4 KiB, text/plain; charset="utf-8")
CurrentDmesg.txt (3.1 KiB, text/plain; charset="utf-8")
Dependencies.txt (1.4 KiB, text/plain; charset="utf-8")
IwConfig.txt (571 bytes, text/plain; charset="utf-8")
Lspci.txt (13.7 KiB, text/plain; charset="utf-8")
Lsusb.txt (382 bytes, text/plain; charset="utf-8")
PciMultimedia.txt (591 bytes, text/plain; charset="utf-8")
ProcCpuinfo.txt (1.3 KiB, text/plain; charset="utf-8")
ProcInterrupts.txt (1.2 KiB, text/plain; charset="utf-8")
ProcModules.txt (2.4 KiB, text/plain; charset="utf-8")
UdevDb.txt (88.6 KiB, text/plain; charset="utf-8")
UdevLog.txt (187.4 KiB, text/plain; charset="utf-8")
WifiSyslog.txt (458.8 KiB, text/plain; charset="utf-8")

theluketaylor (ekul-taylor) wrote on 2009-10-07:

#2

lspci-vnvn.log (28.3 KiB, text/plain)

theluketaylor (ekul-taylor) wrote on 2009-10-07:

#3

I updated to kernel 2.6.31-12 today and the problem seems to have gotten worse. Under the older Karmic kernels it would almost always happen before xsplash and very rarely after gdm. With -12 it seems to be happening both before xsplash and after gdm on every boot.

av8r (av8r) wrote on 2009-10-11:

#4

Nothing changes with 2.6.31-13-generic #44 on an EeePC 900A with upgraded RAM and SSD.
For me, it usually freezes between fsck and setting up the resolver, and once again while launching the first session. It doesn't matter whether it's UNR or not (I have both set up).
I added "clocksource=hpet notsc" to the grub boot line. It removed the warning message about the unstable tsc clock but didn't change anything about the SSD stall. I also removed "quiet splash" from the grub boot line.

$ dmesg | grep ata2
[ 1.253633] ata2: PATA max UDMA/100 cmd 0x170 ctl 0x376 bmdma 0xffa8 irq 15
[ 1.427461] ata2.00: CFA: Patriot Memory 64GB PATA Storage Drive, Ver2.M0G, max UDMA/66
[ 1.427461] ata2.00: 126090720 sectors, multi 1: LBA
[ 1.440008] ata2.00: configured for UDMA/66
[ 40.809047] ata2: lost interrupt (Status 0x58)
[ 40.809047] ata2: drained 2048 bytes to clear DRQ.
[ 40.811862] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 40.815862] ata2.00: BMDMA stat 0x24
[ 40.818548] ata2.00: cmd c8/00:08:32:51:22/00:00:00:00:00/e0 tag 0 dma 4096 in
[ 40.822958] ata2.00: status: { DRDY DRQ }
[ 40.826958] ata2: soft resetting link
[ 41.000008] ata2.00: configured for UDMA/66
[ 41.000008] ata2: EH complete
[ 232.820015] ata2: lost interrupt (Status 0x58)
[ 232.820081] ata2: drained 8192 bytes to clear DRQ.
[ 232.848018] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 232.848018] ata2.00: BMDMA stat 0x24
[ 232.848018] ata2.00: cmd c8/00:20:ba:72:27/00:00:00:00:00/e0 tag 0 dma 16384 in
[ 232.848018] ata2.00: status: { DRDY DRQ }
[ 232.848018] ata2: soft resetting link
[ 233.020008] ata2.00: configured for UDMA/66
[ 233.020008] ata2: EH complete
[ 273.820016] ata2: lost interrupt (Status 0x58)
[ 273.820089] ata2: drained 6144 bytes to clear DRQ.
[ 273.837834] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 273.843269] ata2.00: BMDMA stat 0x24
[ 273.849571] ata2.00: cmd c8/00:18:da:72:27/00:00:00:00:00/e0 tag 0 dma 12288 in
[ 273.860739] ata2.00: status: { DRDY DRQ }
[ 273.866515] ata2: soft resetting link
[ 274.041007] ata2.00: configured for UDMA/66
[ 274.041007] ata2: EH complete
[ 407.436313] ata2.00: ACPI cmd ef/03:44:00:00:00:a0 filtered out
[ 407.436313] ata2.00: ACPI cmd ef/03:0c:00:00:00:a0 filtered out
[ 407.436313] ata2.00: ACPI cmd c6/00:01:00:00:00:a0 succeeded
[ 407.452005] ata2.00: configured for UDMA/66
[ 407.468005] ata2.00: configured for UDMA/66
[ 407.468005] ata2: EH complete
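
For readers unfamiliar with the "Status 0x58" value in the log above, here is a minimal sketch that decodes an ATA status byte into its flag names. The bit names follow the classic ATA status register layout, which is an assumption about which revision's names to use (newer specifications rename some of the low bits); the function name is made up for this illustration.

```python
# Classic ATA status register bits (names are an assumption; later
# revisions of the specification rename some of the low-order bits).
ATA_STATUS_BITS = [
    (0x80, "BSY"),   # device busy
    (0x40, "DRDY"),  # device ready
    (0x20, "DF"),    # device fault
    (0x10, "DSC"),   # device seek complete
    (0x08, "DRQ"),   # data request: device wants to transfer data
    (0x04, "CORR"),  # corrected data (obsolete)
    (0x02, "IDX"),   # index (obsolete)
    (0x01, "ERR"),   # error
]


def decode_ata_status(status):
    """Return the names of the bits set in an ATA status byte."""
    return [name for mask, name in ATA_STATUS_BITS if status & mask]


print(decode_ata_status(0x58))  # → ['DRDY', 'DSC', 'DRQ']
```

The "status: { DRDY DRQ }" line in the log names only two of these flags; bit 0x10 is set in 0x58 as well. DRQ still being set fits the "drained N bytes to clear DRQ" messages: the drive was still offering data when the kernel gave up waiting for the interrupt.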

av8r (av8r) on 2009-10-11

Changed in linux (Ubuntu):
status:	New → Confirmed

redDEADresolve (reddeadresolve) wrote on 2009-10-15:

#5

dsmeg output (45.2 KiB, text/plain)

I am also getting the same error on my Dell Mini 9.

[38.825065] ata1.00 exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[38.825227] BMDMA Stat 0x24
[38.825318] ata1.00:cmd c8/00:18:8f:89:20/00:00:00:00:/e0 tag 0 dma 1228in
[38.825321] res 58/00:18:8f:89:20/00:00:00:00:/e0 Emask 0x2 (HSM violation)
[38.825598] ata1.00: status {DRDY DRQ}

Occasionally I get sent to the root prompt to manually run an fsck.

redDEADresolve (reddeadresolve) wrote on 2009-10-15:

#6

hdparm -I /dev/sda output (1.5 KiB, text/plain)

Gav Mack (gavinmac) wrote on 2009-10-17:

#7

I have the same issues as the OP of this bug on my Aspire One A110 with a SuperTalent 32 GB SSD upgrade. Karmic takes almost 3 minutes to start the first time, and only after at least 3 restarts (each taking less time than the last) do I get a relatively stable desktop.

Gav Mack (gavinmac) wrote on 2009-10-17:

#8

hdparmoutput.txt (1.5 KiB, text/plain)

Gav Mack (gavinmac) wrote on 2009-10-17:

#9

dmesg.log (51.7 KiB, text/plain)

Johan Van den Neste (jvdneste) wrote on 2009-10-24:

#10

I have the same configuration as Gav Mack (Aspire One A110 with a SuperTalent SSD 32Gb upgrade). Same problem here.

Johan Van den Neste (jvdneste) wrote on 2009-10-24:

#11

It is easy to reproduce simply by starting gparted. The error is then produced twice just as before while 'searching /dev/sda partitions'. As a result, the 'searching /dev/sda partitions' activity in gparted takes a long time.

professordes (d-a-johnston-hw) wrote on 2009-10-27:

#12

A "me too" on an Eee PC 901 with an upgraded Crucial SSD and an upgrade install of Karmic RC.

The relevant bit in dmesg is:

[ 35.816124] ata2: lost interrupt (Status 0x58)
[ 35.820096] ata2: drained 2048 bytes to clear DRQ.
[ 35.823180] ata2.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 35.826268] ata2.01: BMDMA stat 0x64
[ 35.829292] ata2.01: cmd c8/00:08:f7:41:0d/00:00:00:00:00/f2 tag 0 dma 4096 in
[ 35.829295] res 58/00:08:f7:41:0d/00:00:00:00:00/f2 Emask 0x2 (HSM violation)
[ 35.835971] ata2.01: status: { DRDY DRQ }
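
The "cmd" lines in these logs can be read field by field. The sketch below assumes the libata layout command/feature:nsect:lbal:lbam:lbah/(five high-order bytes)/device and only handles 28-bit LBA commands such as 0xc8 (READ DMA); the function name and the returned dictionary keys are made up for this illustration.

```python
import re

# Assumed libata layout: cmd CC/FF:NS:L0:L1:L2/h0:h1:h2:h3:h4/DD
TASKFILE_RE = re.compile(
    r"cmd (?P<cmd>[0-9a-f]{2})/(?P<feature>[0-9a-f]{2}):(?P<nsect>[0-9a-f]{2}):"
    r"(?P<lbal>[0-9a-f]{2}):(?P<lbam>[0-9a-f]{2}):(?P<lbah>[0-9a-f]{2})/"
    r"(?:[0-9a-f]{2}:){4}[0-9a-f]{2}/(?P<device>[0-9a-f]{2})"
)


def decode_lba28_taskfile(line):
    """Decode a libata 'cmd' log line for a 28-bit LBA command.

    Returns None if the line does not contain a taskfile dump.
    """
    match = TASKFILE_RE.search(line)
    if match is None:
        return None
    f = {key: int(value, 16) for key, value in match.groupdict().items()}
    # In LBA28 commands a sector count of 0 means 256 sectors.
    sectors = f["nsect"] or 256
    # The low nibble of the device register holds LBA bits 24..27.
    lba = ((f["device"] & 0x0F) << 24) | (f["lbah"] << 16) | (f["lbam"] << 8) | f["lbal"]
    return {
        "command": f["cmd"],
        "sectors": sectors,
        "bytes": sectors * 512,          # assumes 512-byte sectors
        "lba": lba,
        "drive": (f["device"] >> 4) & 1,  # 0 = master, 1 = slave
    }


line = "ata2.01: cmd c8/00:08:f7:41:0d/00:00:00:00:00/f2 tag 0 dma 4096 in"
print(decode_lba28_taskfile(line))
```

For the line above this gives an 8-sector (4096-byte) read on drive 1, which matches the "dma 4096 in" suffix and the "ata2.01" device name in the log. In other words, the failing commands are ordinary small DMA reads.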

The machine is also (mostly) failing to pick up an SDHC card in the reader, which wasn't the case in 9.04

Adam Gianola (adam-gianola) wrote on 2009-10-30:

#13

Same story here. Dell Mini 9 upgraded with a Super Talent FEM16GFDL 16 GB SSD. I experience this both after the upgrade from 9.04 to 9.10 as well as on a clean install of 9.10.

I can also confirm starting gparted reproduces the dmesg output normally seen after (long) boot up.

Andrew Simpson (andrew-simpson) wrote on 2009-10-30:

#14

Another 'me too'.

Just upgraded an Acer Aspire One A110 (ZG5) from existing (factory installed) 8 GB SSD to Super Talent 16 GB (FEM16GF13M).

Running the LiveCD (on USB stick) with 9.10 RC, then opening gParted shows essentially the same messages in dmesg as other reports (and it takes a long time).

Everything else seems fine.

danq989 (danq989) wrote on 2009-10-30:

#15

Me too!

I have the same configuration as Gav Mack (including upgraded 32GB SSD) with the same results.

Verified on both a 9.04 to 9.10 upgrade and a fresh 9.10 install on a freshly erased and partitioned drive. Same problem on boot and in gparted. Seems to only occur during mounting of the drive (possibly during initial mount and then remount as -rw)

---danq989

Andrew Simpson (andrew-simpson) wrote on 2009-10-31:

#16

I have linked this bug report to (what looks to be) the same problem at the kernel bug tracker. Not sure I've done the linking correctly ;-)

http://bugzilla.kernel.org/show_bug.cgi?id=14515

Bug Watch Updater (bug-watch-updater) on 2009-10-31

Changed in linux:
status:	Unknown → Confirmed

Kory (postmako) wrote on 2009-10-31:

#17

bootchart (77.4 KiB, image/svg+xml)

Me too! AAO ZG5 running stock 8GB drive. Jaunty was booting in about 20 seconds and Karmic is taking about 90 seconds. I am attaching parts from dmesg and the most recent copy of bootchart.

...
[ 7.242403] input: SynPS/2 Synaptics TouchPad as /devices/platform/i8042/serio2/input/input8
[ 36.820096] ata2: lost interrupt (Status 0x58)
[ 36.824029] ata2: drained 2048 bytes to clear DRQ.
[ 36.827217] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 36.827227] ata2.00: BMDMA stat 0x4
[ 36.827248] ata2.00: cmd c8/00:08:ef:cc:ce/00:00:00:00:00/e0 tag 0 dma 4096 in
[ 36.827251] res 58/00:08:ef:cc:ce/00:00:00:00:00/e0 Emask 0x2 (HSM violation)
[ 36.827258] ata2.00: status: { DRDY DRQ }
[ 36.827302] ata2: soft resetting link
[ 36.996463] ata2.00: configured for UDMA/66
[ 36.996497] ata2: EH complete
[ 37.001030] Clocksource tsc unstable (delta = -133907975 ns)
[ 37.042527] ath5k 0000:03:00.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
[ 37.042586] ath5k 0000:03:00.0: setting latency timer to 64
[ 37.042686] ath5k 0000:03:00.0: registered as 'phy0'
...
[ 56.778794] groups: 1 0
[ 335.989086] ata2: lost interrupt (Status 0x58)
[ 335.993064] ata2: drained 2048 bytes to clear DRQ.
[ 335.996576] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 335.996588] ata2.00: BMDMA stat 0x4
[ 335.996609] ata2.00: cmd c8/00:08:4f:1c:0c/00:00:00:00:00/e0 tag 0 dma 4096 in
[ 335.996613] res 58/00:08:4f:1c:0c/00:00:00:00:00/e0 Emask 0x2 (HSM violation)
[ 335.996623] ata2.00: status: { DRDY DRQ }
[ 335.996675] ata2: soft resetting link
[ 336.168420] ata2.00: configured for UDMA/66
[ 336.168453] ata2: EH complete
[ 372.004111] ata2: lost interrupt (Status 0x58)
[ 372.008033] ata2: drained 2048 bytes to clear DRQ.
[ 372.011577] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 372.011588] ata2.00: BMDMA stat 0x4
[ 372.011609] ata2.00: cmd c8/00:08:bf:a6:49/00:00:00:00:00/e0 tag 0 dma 4096 in
[ 372.011613] res 58/00:08:bf:a6:49/00:00:00:00:00/e0 Emask 0x2 (HSM violation)
[ 372.011623] ata2.00: status: { DRDY DRQ }
[ 372.011674] ata2: soft resetting link
[ 372.184426] ata2.00: configured for UDMA/66
[ 372.184458] ata2: EH complete
[ 372.586276] gdu-notificatio[1693]: segfault at c ip 00efd50e sp bfbf6d50 error 4 in libgdu.so.0.0.0[ef4000+1c000]
[ 377.073516] wlan0: authenticate with AP 00:0f:66:b9:59:0f
[ 377.079461] wlan0: authenticated
[ 377.079473] wlan0: associate with AP 00:0f:66:b9:59:0f
[ 377.085704] wlan0: RX AssocResp from 00:0f:66:b9:59:0f (capab=0x11 status=0 aid=5)
[ 377.085716] wlan0: associated
...
And it happens from time to time after login...
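
To put numbers on the stalls in excerpts like the one above, a rough sketch is to measure the silence in dmesg before each "lost interrupt" message and the time until "EH complete". This assumes the pasted text keeps the bracketed timestamps and that no lines were elided just before the lost interrupt; the function name is made up for this illustration.

```python
import re

# Matches "[   36.820096] message" with flexible spacing inside the brackets.
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")


def find_stalls(dmesg_text):
    """Return a list of (start, gap_before, recovery) tuples in seconds.

    start       timestamp of the "lost interrupt" message
    gap_before  time since the previous log line (approximate stall length)
    recovery    time from "lost interrupt" until "EH complete"
    """
    stalls = []
    prev_ts = None
    start = None
    gap = None
    for raw in dmesg_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if match is None:
            continue  # skip "..." markers and other non-log lines
        ts = float(match.group(1))
        msg = match.group(2)
        if "lost interrupt" in msg:
            start = ts
            gap = ts - prev_ts if prev_ts is not None else None
        elif "EH complete" in msg and start is not None:
            stalls.append((start, gap, ts - start))
            start = None
        prev_ts = ts
    return stalls


sample = """\
[ 7.242403] input: SynPS/2 Synaptics TouchPad as /devices/platform/i8042/serio2/input/input8
[ 36.820096] ata2: lost interrupt (Status 0x58)
[ 36.827302] ata2: soft resetting link
[ 36.996497] ata2: EH complete
"""
for start, gap, recovery in find_stalls(sample):
    print(f"stall at {start:.1f}s: ~{gap:.1f}s of silence, {recovery:.2f}s to recover")
```

On the sample lines, nearly all of the lost time is the silence before the "lost interrupt" message; the reset and reconfiguration afterwards take well under a second.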

Johan Van den Neste (jvdneste) wrote on 2009-10-31:

#18

I'd like to point out that even though the delays in the boot process are annoying, what is worse is the series of applets crashing when logging in to gnome. Usually I cannot log out again because that applet has crashed. So I switch to tty1 and do a 'sudo service gdm restart'. The next and subsequent logins are fine until the next reboot.

Kory (postmako) wrote on 2009-10-31: Re: [Bug 445852] Re: SSD stall during boot

#19

Yeah I started to notice that kind of stuff as well.  That is why I'm
leaving Jaunty on my wife's machine.

On Sat, Oct 31, 2009 at 8:51 AM, Johan Van den Neste <email address hidden> wrote:

> I'd like to point out that even though the delays in the boot process
> are annoying, what is worse is the series of applets crashing when
> logging in to gnome. Usually I cannot log out again because that applet
> has crashed. So I switch to tty1 and do a 'sudo service gdm restart'.
> The next and subsequent logins are fine until the next reboot.
>
> --
> SSD stall during boot
> https://bugs.launchpad.net/bugs/445852
> You received this bug notification because you are a direct subscriber
> of the bug.
>
> Status in The Linux Kernel: Confirmed
> Status in “linux” package in Ubuntu: Confirmed
>
> Bug description:
> In the Karmic beta I experience ssd stalls during the boot process.  It
> happens almost everytime before xsplash loads and happens again frequently
> between logging into gdm and the desktop loading.  When it happens during
> login I think it is making gnome time out on loading panel items as I get
> errors related to lots of panel items failing to load.  If I log out and
> back in again when the ssd isn't stalled the panel items load fine.
>
> When it happens the following messages appear before xplash (or in dmesg
> when it happens after gdm):
>
> ata2: lost interrupt (Status 0x58)
> ata2: drained 16384 bytes to clear DRQ.
> ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
> ata2.00: BMDMA stat 0x4
> ata2.00: cmd c8/00:40:cb:60:32/00:00:00:00:00/e0 tag 0 dma 32768 in
> res 58/00:40:cb:60:32/00:00:00:00:00/e0 Emask 0x2 (HSM violation)
> ata2.00: status: { DRDY DRQ }
> ata2: soft resetting link
> ata2.00: configured for UDMA/66
> ata2: EH complete
>
> I did not have this issue in jaunty with this hardware and I don't think it
> has happened once the system is fully loaded.  I am running karmic unr on an
> Acer Aspire One netbook.
>
> ProblemType: Bug
> AplayDevices:
>  **** List of PLAYBACK Hardware Devices ****
>  card 0: Intel [HDA Intel], device 0: ALC268 Analog [ALC268 Analog]
>   Subdevices: 1/1
>   Subdevice #0: subdevice #0
> Architecture: i386
> ArecordDevices:
>  **** List of CAPTURE Hardware Devices ****
>  card 0: Intel [HDA Intel], device 0: ALC268 Analog [ALC268 Analog]
>   Subdevices: 1/1
>   Subdevice #0: subdevice #0
> AudioDevicesInUse:
>  USER        PID ACCESS COMMAND
>  /dev/snd/controlC0:  luke       1990 F.... pulseaudio
> CRDA: Error: [Errno 2] No such file or directory
> Card0.Amixer.info:
>  Card hw:0 'Intel'/'HDA Intel at 0x58540000 irq 16'
>   Mixer name   : 'Realtek ALC268'
>   Components   : 'HDA:10ec0268,1025015b,00100101'
>   Controls      : 9
>   Simple ctrls  : 6
> CheckboxSubmission: 12ef539f3788bfbc46bc56b5c28128a6
> CheckboxSystem: c69722ecac764861be52925fa50b4dcc
> Date: Wed Oct  7 17:54:56 2009
> DistroRelease: Ubuntu 9.10
> HibernationDevice: RESUME=UUID=8d44b89b-2edb-4c02-a4be-94bd25b65081
> MachineType: Acer AOA110
> Package: linux-image-2.6.31-12-generic 2.6.31-12.40
> ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.31-12-generic
> root=UUID=039a096e-3486-4898-9eeb-44a705f8b7fd ro quiet splash elevator=noop
> usbcore.autosuspend=1
> ProcEnviron:
>  LANG=en_CA.UTF-8
>  SHELL=/bin/bash
> ProcVersionSignature: Ubuntu 2.6.31-12.40-generic
> RelatedPackageVersions: linux-firmware 1.21
> RfKill:
>  0: phy0: Wireless LAN
>        Soft blocked: no
>        Hard blocked: no
> SourcePackage: linux
> Tags:  ubuntu-unr
> Uname: Linux 2.6.31-12-generic i686
> XsessionErrors:
>  (gnome-settings-daemon:2006): GLib-CRITICAL **: g_propagate_error:
> assertion `src != NULL' failed
>  (gnome-settings-daemon:2006): GLib-CRITICAL **: g_propagate_error:
> assertion `src != NULL' failed
>  (nautilus:2092): Eel-CRITICAL **: eel_preferences_get_boolean: assertion
> `preferences_is_initialized ()' failed
>  (polkit-gnome-authentication-agent-1:2118): GLib-CRITICAL **:
> g_once_init_leave: assertion `initialization_value != 0' failed
>  (gnome-panel:2048): Gdk-WARNING **:
> /build/buildd/gtk+2.0-2.18.2/gdk/x11/gdkdrawable-x11.c:952 drawable is not a
> pixmap or window
> dmi.bios.date: 10/06/2008
> dmi.bios.vendor: Acer
> dmi.bios.version: v0.3309
> dmi.board.asset.tag: Base Board Asset Tag
> dmi.board.vendor: Acer
> dmi.board.version: Base Board Version
> dmi.chassis.type: 1
> dmi.chassis.vendor: Chassis Manufacturer
> dmi.chassis.version: Chassis Version
> dmi.modalias:
> dmi:bvnAcer:bvrv0.3309:bd10/06/2008:svnAcer:pnAOA110:pvr1:rvnAcer:rn:rvrBaseBoardVersion:cvnChassisManufacturer:ct1:cvrChassisVersion:
> dmi.product.name: AOA110
> dmi.product.version: 1
> dmi.sys.vendor: Acer
>

Andrew Simpson (andrew-simpson) on 2009-11-03

tags:

added: ubuntu

ownyourown (ownyourown) wrote on 2009-11-05: Re: SSD stall during boot

#20

My Asus eee pc 900 also affected by this bug. (Fresh install of final Ubuntu 9.10)

Saif Ahmed (saif) wrote on 2009-11-06:

#21

A "me too" here.

Eee PC 900, fresh install of final 9.10.

Moreover if I have any kind of usb flash drives attached, machine doesn't complete boot at all.

andrey i. mavlyanov (andrey-mavlyanov) wrote on 2009-11-06:

#22

Moreover, I got this error on a non-SSD drive. See https://bugs.launchpad.net/ubuntu/+source/linux/+bug/473765 for details.

Andrew Simpson (andrew-simpson) wrote on 2009-11-06:

#23

@andrey i. mavlyanov

Andrey,

I don't think that this is the same bug.

On this line:

Nov 4 08:18:45 aim-laptop kernel: [35132.010175] res 40/00:00:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout)

You are getting a 'timeout', whereas this bug is causing 'HSM Violations'.
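
The distinction drawn above can be checked mechanically by pulling the Emask value and its label out of the "res" line. This is only a sketch of that comparison: it relies on the label the kernel prints in parentheses rather than on any table of error-mask bits, and the function name is made up for this illustration.

```python
import re

# Only "res" lines carry a parenthesised label after the Emask value.
RES_RE = re.compile(r"Emask (0x[0-9a-f]+) \(([^)]+)\)")


def classify_result(line):
    """Return (emask, label) from a libata 'res' line, or None if absent."""
    match = RES_RE.search(line)
    if match is None:
        return None
    return int(match.group(1), 16), match.group(2)


other_bug = "res 40/00:00:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout)"
this_bug = "res 58/00:08:ef:cc:ce/00:00:00:00:00/e0 Emask 0x2 (HSM violation)"

print(classify_result(other_bug))  # → (4, 'timeout')
print(classify_result(this_bug))   # → (2, 'HSM violation')
```

Lines such as "exception Emask 0x0 SAct 0x0 ..." have no parenthesised label, so they return None and are not mistaken for a result line.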

Adam Gianola (adam-gianola) wrote on 2009-11-08:

#24

hdparm -I /dev/sda (1.8 KiB, text/plain)

Interestingly, if I use the Dell Mini 9 Factory SSD with 9.10 this problem goes away.

Andrew Simpson (andrew-simpson) wrote on 2009-11-09:

#25

Playing with LiveCD (on a USB stick) with an Aspire One with Super Talent 16Gb SSD:

- Normal LiveCD boot shows the problem in dmesg.

- Booting with 'libata.dma=0' in kernel line fixes the problem (by disabling DMA) in dmesg.

- Booting with 'libata.ignore_hpa=0' had no effect.

Since the problem looked to be DMA related, I tried slowing down the transfer with 'libata.force=udma/33'. No effect, though there were plenty of log messages about UDMA being forced to 33.

The same machine is working fine with Jaunty. Another Aspire One with the standard (factory) 8 Gb SSD is running Karmic without any problem.
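
When comparing results between machines it helps to confirm which of these parameters a given boot actually used. A minimal sketch, assuming the command line is a plain space-separated string (as in /proc/cmdline) with no quoted values; the function name and the example line are illustrative only.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into a {name: value} dictionary.

    Flags without '=' (for example "quiet") map to True. Quoted values
    containing spaces are not handled in this sketch.
    """
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else True
    return params


# Parameters tried in the comment above.
TESTED = ("libata.dma", "libata.ignore_hpa", "libata.force")

# Hypothetical example line; on a running system read /proc/cmdline instead.
example = "ro quiet splash elevator=noop libata.force=udma/33"
params = parse_cmdline(example)
for name in TESTED:
    print(name, "=", params.get(name, "(not set)"))
```

On a live system the input would come from `open("/proc/cmdline").read()`, so the check reflects what the kernel really booted with rather than what was typed at the boot menu.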

mdyn (tamerlaha-gmail) wrote on 2009-11-09:

#26

Acer AOA110, same problem.

danq989 (danq989) wrote on 2009-11-10:

#27

I just verified that this bug is still present for me in the just-released kernel 2.6.31-15.

Alan Pope 🍺🐧🐱 🦄 (popey) wrote on 2009-11-10:

#28

Linux kernel bug 14515 has nothing to do with this

Changed in linux:
importance:	Unknown → Undecided
status:	Confirmed → New

Dave V (mindkeep) wrote on 2009-11-10:

#29

Affects my asus eeepc 900. Please raise to critical before I have to learn to hassle with Gentoo again.

Andrew Simpson (andrew-simpson) on 2009-11-11

Changed in linux:
importance:	Undecided → Unknown
status:	New → Unknown

Bug Watch Updater (bug-watch-updater) on 2009-11-11

Changed in linux:
status:	Unknown → Confirmed

Horácio (horacioh) wrote on 2009-11-11:

#30

I had exactly the same problem (boot stall) on an Asus Eee 900. After 2 weeks of use, I got a grub error, "error: biosdisk read error", and the system became completely useless, pending a disk wipe and full reinstall.
Similar situations are reported in https://bugs.launchpad.net/bugs/387272, which is considered a duplicate of this bug.
But I do not see the grub error reported here. Could this be a different bug?

Alan Pope 🍺🐧🐱 🦄 (popey) wrote on 2009-11-11:

#31

OK, so after running 9.10 and discovering this issue, I booted off a 9.04 USB stick and dd'ed /dev/zero over both sda and sdb. I then installed 9.04 and have no issues.

So the hardware is not faulty.

Andrew Simpson (andrew-simpson) wrote on 2009-11-13:

#32

A possible workaround from the upstream bug report is to boot with 'irqpoll' in the kernel boot parameters. It's not a good fix, since the logs are still full of error messages, but at least the 'stall' is reduced.

Regrettably, it's probably best to avoid using Karmic on SSD equipped netbooks. Use Jaunty instead, since this bug probably won't be fixed in the near future.

Rick @ rickandpatty.com (rick-rickandpatty) wrote on 2009-11-16:

#33

I have the same issue as reported above with an eeepc 701 (with 16GB SuperTalent SSD, and also with the original 8GB SSD).

An alternate method of fixing a Karmic-corrupted SSD - at least on the 701 - is to boot with ASUS's rescue DVD and allow it to reinstall the default Xandros installation.

With Karmic installed, I can confirm that kernel option "irqpoll" stops the stall during a Karmic boot, but does anyone know if that stops Karmic from messing up the SSD?

Johan Van den Neste (jvdneste) wrote on 2009-11-16:

#34

I'm a bit disappointed that the previously mentioned kernel bug was discarded so quickly. Could it not still be related? There are indeed no *optical* drives to be polled; however, on the Acer One there are 2 card readers (= pollable removable media drives). Since the kernel bug report claims that the problem is caused by a removable media drive choking on the polling commands, could one of the card readers not be the cause?

So I tried 'hal-disable-polling' on the card readers...

One reader, marked as 'storage extension', is /dev/mmcblk0 and is apparently not seen as a removable device (according to the message from hal-disable-polling). The other... doesn't seem to work. I get no response whatsoever when inserting or removing an SD card (which only raises my suspicion). Hence, I don't know its /dev/ name, and don't know what to give hal-disable-polling as the --device argument. Maybe the device is not even detected at boot?

Anyone else care to investigate on his/her laptop? (I'd really hate to switch back to 9.04)
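
For anyone trying the same investigation, one way to find the device names is to list what the kernel exposes under /sys/block together with each device's "removable" flag. This is a sketch under the assumption that sysfs is mounted at /sys; the function name is made up, and HAL's own idea of "removable" may not match the kernel flag.

```python
from pathlib import Path


def list_block_devices(sys_block="/sys/block"):
    """Return {device_name: is_removable} from the kernel's sysfs tree.

    The sys_block argument exists so the function can be pointed at a
    different directory; on a normal system the default is what you want.
    """
    devices = {}
    root = Path(sys_block)
    if not root.is_dir():
        return devices
    for entry in sorted(root.iterdir()):
        try:
            flag = (entry / "removable").read_text().strip()
        except OSError:
            continue  # entry without a readable "removable" attribute
        devices[entry.name] = flag == "1"
    return devices


for name, removable in list_block_devices().items():
    print(f"/dev/{name}: {'removable' if removable else 'fixed'}")
```

A card reader that never shows up in this list, with or without a card inserted, was probably not registered as a block device at all, which would explain having no /dev name to pass as the --device argument.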

Johan Van den Neste (jvdneste) wrote on 2009-11-16:

#35

I should add that boot-time, gnome login and gparted startup are typically moments where I'd expect polling for media to take place.

Johan Van den Neste (jvdneste) wrote on 2009-11-16:

#36

I notice changed behaviour when there are sd cards inserted: The stall is longer with 2 cards inserted than with 1 card, and 1 card is worse than no cards, though it does not disappear. How would I disable the card readers entirely? (I see sdhci-pci mentioned, and when there are cards inserted, the drives are detected as mmc0 and mmc1)

Revision history for this message

Rick @ rickandpatty.com (rick-rickandpatty) wrote on 2009-11-17:

#37

The eeepc 701 has one SD reader that can be disabled in the BIOS. Disabling it doesn't seem to affect the SSD issues at all on my system.

Revision history for this message

Andrew Simpson (andrew-simpson) wrote on 2009-11-17:

#38

@Johan
Interesting comment. I have private doubts that this bug is totally due to hardware 'quality' problems (see the current kernel bug report).

If the hardware was at fault then: firstly, the bug would not be spread over such a range of differing hardware, and secondly, Ubuntu 9.04 should also be failing in a similar manner?

Revision history for this message

Kory (postmako) wrote on 2009-11-17: Re: [Bug 445852] Re: SSD stall during boot

#39

Well after having this issue for a couple of weeks now my netbook will no
longer boot. I ran a live USB stick and gparted can't even read the
partition. As soon as I have time, I'm going back to 9.04 and I suggest
everyone else do the same before their drive craps out like mine did. It
seems to write out bad sectors or destroy your data as well because my
wireless stopped working and I rebooted hoping it would fix the problem. So
all of my netbooks will wait until this issue is resolved before upgrading.
Enjoy!

Revision history for this message

danq989 (danq989) wrote on 2009-11-17: Re: SSD stall during boot

#40

@Andrew
Couldn't agree more that this is a code regression and not simply a hardware quality issue.

I haven't checked the various changelogs, but I wouldn't be surprised if something in the IDE IRQ handler or the hardware initialization was subjected to optimization (in libdma?). Possibly the SuperTalent drives do react in a non-standard way that was never exposed before.

Hopefully Mr. Heo will take the time to look through the code and see. I'm sure it would help if someone could supply the developers hardware that reliably produces the error. I need my netbook too much to send it away, but maybe someone has an extra SuperTalent drive?

---danq989

Revision history for this message

Johan Van den Neste (jvdneste) wrote on 2009-11-17:

#41

@Andrew
Don't get me wrong, I'm also convinced it's a software issue. I know little about the Linux kernel, but I know a lot about concurrent programming, and I know certain bugs only manifest themselves under very specific conditions, usually related to very subtle differences in timing (which may also only happen on different processors, different numbers of cores and so on).

Not that I'm saying it's a concurrency bug.

All I'm saying is that different devices influence the timing of all sorts of events and that specific combinations of hardware may trigger specific bugs, and I was guessing that maybe - just maybe - the combination of the ssd with the card readers triggered this one. (since the timing characteristics of ssd's are after all quite different from regular hard drives)

So I would still like to try and disable the card readers entirely (which I cannot do in the bios). But yes, it's a long shot.

Revision history for this message

Gav Mack (gavinmac) wrote on 2009-11-17:

#42

I have noticed that if I power up the AAO without mains power, more often than not it only freezes during the first part of the boot process and not during GDM (where the freeze makes all the applets fail). That is the most stable I've got Karmic working so far. Fsck seems to want to run on every boot now, just before the freeze. It's enough for me to stick with Karmic unless the SSD gets trashed like Kory's, but I'm running ext4 with journaling still enabled; perhaps that's why I've not suffered the corruption problem.

I've posted this issue on the Supertalent forum to see if they can possibly help with maybe a firmware update but I'm not holding my breath! http://www.supertalent.com/home/forum/viewtopic.php?f=17&t=2083 I agree with you both Johan and Dan - this does seem to be a Linux bug - one they seemingly can't be arsed to fix at the moment which is very frustrating. Maybe when they get enough pissed off users like us they'll do something about it :o(

I use my card readers a lot, the left hand in particular for data because I'm dual booting the SSD with Windows 7 so that rules out that option in my case.

Revision history for this message

David Staples (dcstaples) wrote on 2009-11-18:

#43

I've been having this problem too on my Acer aoa110 (using UNR 9.10)
After a few days of using karmic and getting frustrated with the ~1:45 boot times (from grub to desktop) along with the panel configuration problems, I tried to use a PPA kernel (comment #20 in http://bugs.launchpad.net/ubuntu/karmic/+source/sreadahead/+bug/432089), which bricked my netbook. After trying unsuccessfully several times to reinstall karmic by using sysresccd for gparted (wasn't trusting karmic at this point), and doing a testdisk I noticed there was a read error at cylinder 257, along with an intermittent read-error at sector 1.
I gave up and did a wipe of the drive (not using dd. wipe. took about six hours).
Read-error went away, partitioning worked fine, karmic finally installed, but no change to boot times.
Gah!
I'm going back to jaunty until 10.04 comes out. Which is a shame, because I like the new UI in karmic, but I've decided not to risk my ssd for eye candy.

Revision history for this message

David Staples (dcstaples) wrote on 2009-11-18:

#44

Oh yeah, here's the output of one of the logs.
Btw, sorry about the long meandering story above ;)

ata2: lost interrupt (Status 0x58)
ata2: drained 2048 bytes to clear DRQ.
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata2.00: BMDMA stat 0x4
ata2.00: cmd c8/00:08:87:31:01/00:00:00:00:00/e0 tag 0 dma 4096 in
res 58/00:08:87:31:01/00:00:00:00:00/e0 Emask 0x2 (HSM violation)
ata2.00: status: { DRDY DRQ }
ata2: soft resetting link
ata2.00: configured for UDMA/66
ata2: EH complete
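For anyone trying to read the "Status 0x58" value in these logs, here is a minimal Python sketch that splits an ATA status byte into its individual bits. The bit names follow the conventional ATA status register layout; the table and the `decode_status` helper are illustrative and not taken from any tool mentioned in this thread.

```python
# Conventional ATA status register bits, from bit 7 down to bit 0.
# The names are the usual ATA mnemonics; this table is an illustrative aid.
ATA_STATUS_BITS = [
    (0x80, "BSY"),   # device busy
    (0x40, "DRDY"),  # device ready
    (0x20, "DF"),    # device fault
    (0x10, "DSC"),   # seek complete (older ATA revisions)
    (0x08, "DRQ"),   # data request: device wants to transfer data
    (0x04, "CORR"),  # corrected data (obsolete)
    (0x02, "IDX"),   # index (obsolete)
    (0x01, "ERR"),   # error
]


def decode_status(value):
    """Return the names of all bits set in an ATA status byte."""
    return [name for mask, name in ATA_STATUS_BITS if value & mask]


print(decode_status(0x58))
```

For 0x58 this reports DRDY, DSC and DRQ. The kernel's own "status: { DRDY DRQ }" line appears to list only a subset of the bits, which would explain why DSC is not shown there.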

Revision history for this message

Jim (wilsja) wrote on 2009-11-19:

#45

Reproduced on a stock eeepc 900 linux version, installing karmic to the 4G ssd. (I did a dist-upgrade from jaunty, which works fine) Also, I have gotten many cases where I needed to overwrite the disk with zeros to make it usable again. (this happens after one or two boots with karmic, I then reinstall to try to fix it)

kernel is 2.6.31-14-generic

dmesg |grep ata2 gives me

[ 1.103371] ata2: PATA max UDMA/100 cmd 0x170 ctl 0x376 bmdma 0xffa8 irq 15
[ 1.344348] ata2.00: ATA-0: ASUS-PHISON OB SSD, TST2.04L, max UDMA/66
[ 1.344356] ata2.00: 7880544 sectors, multi 0: LBA
[ 1.344436] ata2.01: ATA-0: ASUS-PHISON SSD, TST2.04L, max UDMA/66
[ 1.344442] ata2.01: 31522176 sectors, multi 0: LBA
[ 1.356270] ata2.00: configured for UDMA/66
[ 1.368273] ata2.01: configured for UDMA/66
[ 1.401892] ata2.00: configured for UDMA/66
[ 1.408271] ata2.01: configured for UDMA/66
[ 1.408279] ata2: EH complete
[ 36.816048] ata2: lost interrupt (Status 0x58)
[ 36.820016] ata2: drained 8192 bytes to clear DRQ.
[ 36.834372] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 36.834380] ata2.00: BMDMA stat 0x64
[ 36.834397] ata2.00: cmd c8/00:20:87:1a:35/00:00:00:00:00/e0 tag 0 dma 16384 in
[ 36.834408] ata2.00: status: { DRDY DRQ }
[ 36.834443] ata2: soft resetting link
[ 37.028315] ata2.00: configured for UDMA/66
[ 37.036314] ata2.01: configured for UDMA/66
[ 37.060315] ata2.00: configured for UDMA/66
[ 37.068312] ata2.01: configured for UDMA/66
[ 37.068331] ata2: EH complete
[ 67.816056] ata2: lost interrupt (Status 0x58)
[ 67.820016] ata2: drained 6144 bytes to clear DRQ.
[ 67.829818] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 67.829945] ata2.00: BMDMA stat 0x64
[ 67.830019] ata2.00: cmd c8/00:18:cf:4b:3c/00:00:00:00:00/e0 tag 0 dma 12288 in
[ 67.830275] ata2.00: status: { DRDY DRQ }
[ 67.830379] ata2: soft resetting link
[ 68.024317] ata2.00: configured for UDMA/66
[ 68.032313] ata2.01: configured for UDMA/66
[ 68.056315] ata2.00: configured for UDMA/66
[ 68.064310] ata2.01: configured for UDMA/66
[ 68.064329] ata2: EH complete
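To compare logs from different machines, it may help to pull out just the "lost interrupt" events and look at the time between them. The following is a small Python sketch under the assumption that the input is plain `dmesg` text with bracketed timestamps as pasted above; the function names are hypothetical.

```python
import re

# Matches lines such as "[ 36.816048] ata2: lost interrupt (Status 0x58)".
LOST_IRQ = re.compile(r"\[\s*(\d+\.\d+)\]\s+(ata\d+): lost interrupt")


def lost_interrupt_times(log_text):
    """Return (port, timestamp) pairs for every lost-interrupt line."""
    return [(m.group(2), float(m.group(1))) for m in LOST_IRQ.finditer(log_text)]


def gaps(times):
    """Return the gaps in seconds between consecutive timestamps."""
    return [round(b - a, 3) for a, b in zip(times, times[1:])]


# Four lines copied from the dmesg output above.
sample = """\
[ 36.816048] ata2: lost interrupt (Status 0x58)
[ 36.820016] ata2: drained 8192 bytes to clear DRQ.
[ 67.816056] ata2: lost interrupt (Status 0x58)
[ 67.820016] ata2: drained 6144 bytes to clear DRQ.
"""

events = lost_interrupt_times(sample)
stamps = [t for _, t in events]
print(events)
print(gaps(stamps))
```

On this sample the two events are about 31 seconds apart. That spacing would be consistent with each stall being a command that times out rather than one that fails immediately, but that reading is an inference from the timestamps, not something the log states.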

Revision history for this message

Andrew Simpson (andrew-simpson) wrote on 2009-11-19:

#46

@Ubuntu Bugs

Can you please have a look at the status of this bug (currently 'undecided')? This bug could do with some input from the Ubuntu devs. Here's why:

1. The bug is occurring on a wide range of net books with SSD units. This is a growing target audience for Ubuntu.

2. In the simplest case the bug makes the machine unresponsive and impractical to use.

3. If the user continues with the above state, the machine often gets 'bricked'. Total data loss occurs, and the SSD can only be recovered with low level formatting (Normal rescue tools don't work).

Total data loss with bricked machines, confirmed several times over and with no apparent workaround, surely has to rate as more than an 'undecided' bug?

I have opened a bug report on the kernel bug list which is getting some high level attention. It would be good if Ubuntu was able to give some support on this.

Revision history for this message

Alan Pope 🍺🐧🐱 🦄 (popey) wrote on 2009-11-19:

#47

I contacted Leann on the kernel team and asked for some advice, and this is her response. I'm not at home but at UDS, so I can't try this right now. It would be good if someone else could:

"Thanks for the heads up Alan. I'll get this bug on our list for review.
It seems this has been forwarded upstream as well:

http://bugzilla.kernel.org/show_bug.cgi?id=14583

Care to give the latest upstream mainline kernel builds a test:

https://wiki.ubuntu.com/KernelTeam/MainlineBuilds

2.6.31.6 is the latest upstream stable kernel (which we're going to
release as an SRU for karmic):
http://kernel.ubuntu.com/~kernel-ppa/mainline/v2.6.31.6/

The Karmic kernel for SRU with the 2.6.31.6 patches is currently baking
in Stefan's PPA (2.6.31-16-51~pre2):
https://edge.launchpad.net/~stefan-bader-canonical/+archive/karmic/+packages

2.6.32-rc6 is the latest 2.6.32 release candidate
http://kernel.ubuntu.com/~kernel-ppa/mainline/v2.6.32-rc6/

Might not hurt to give both the 2.6.31.6 and 2.6.32-rc6 kernels a test
and confirm the issue remains. Then relay the info to both the upstream
bug and lp bug."

Revision history for this message

Rick @ rickandpatty.com (rick-rickandpatty) wrote on 2009-11-19:

#48

Here's one test of 2.6.32-rc6 on an ASUS eeepc 701 with 16GB SuperTalent SSD. The original Karmic kernel also shows this bug with the stock ASUS 8GB SSD.

$ uname -a
Linux eeepc 2.6.32-020632rc6-generic #020632rc6 SMP Wed Nov 4 10:54:30 UTC 2009 i686 GNU/Linux

Looks like the same thing is happening in 2.6.32-rc6.

[ 8.420522] ata2: drained 2048 bytes to clear DRQ.
[ 8.421826] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 8.421839] ata2.00: BMDMA stat 0x24
[ 8.421850] ata2.00: failed command: READ DMA
[ 8.421872] ata2.00: cmd c8/00:08:86:39:04/00:00:00:00:00/e0 tag 0 dma 4096 in
[ 8.421876] res 58/00:08:86:39:04/00:00:00:00:00/e0 Emask 0x2 (HSM violation)
[ 8.421887] ata2.00: status: { DRDY DRQ }
[ 8.421948] ata2: soft resetting link
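The "cmd" line in these reports encodes which sectors the failed read was after. Below is a minimal Python sketch that decodes it, assuming the field order is command/feature:count:LBA-low:LBA-mid:LBA-high/(five extended bytes)/device and that the command uses 28-bit LBA addressing, which fits the READ DMA (c8) commands shown here. The `decode_lba28` helper is hypothetical and not part of any kernel tool.

```python
import re

# Assumed layout of the kernel "cmd" line:
#   cmd CC/FF:NN:L0:L1:L2/xx:xx:xx:xx:xx/DD
# CC = command, FF = feature, NN = sector count, L0..L2 = LBA low/mid/high,
# DD = device register (its low nibble holds LBA bits 24-27 in LBA28 mode).
TASKFILE = re.compile(
    r"cmd (?P<cmd>[0-9a-f]{2})/(?P<feat>[0-9a-f]{2}):(?P<nsect>[0-9a-f]{2}):"
    r"(?P<lbal>[0-9a-f]{2}):(?P<lbam>[0-9a-f]{2}):(?P<lbah>[0-9a-f]{2})/"
    r"(?:[0-9a-f]{2}:){4}[0-9a-f]{2}/(?P<dev>[0-9a-f]{2})"
)


def decode_lba28(line):
    """Decode a 28-bit LBA command line; return None if it does not match."""
    m = TASKFILE.search(line)
    if m is None:
        return None
    f = {key: int(val, 16) for key, val in m.groupdict().items()}
    lba = ((f["dev"] & 0x0F) << 24) | (f["lbah"] << 16) | (f["lbam"] << 8) | f["lbal"]
    sectors = f["nsect"] or 256  # a count of 0 means 256 sectors in LBA28
    return {"opcode": f["cmd"], "lba": lba, "sectors": sectors, "bytes": sectors * 512}


line = "ata2.00: cmd c8/00:08:86:39:04/00:00:00:00:00/e0 tag 0 dma 4096 in"
print(decode_lba28(line))
```

For the line above, the decoded size is 8 sectors of 512 bytes, i.e. 4096 bytes, which agrees with the "dma 4096 in" suffix the kernel prints. That agreement is a useful sanity check on the assumed field order.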

Revision history for this message

Trey (trey333) wrote on 2009-11-19:

#49

I want people, and especially the dev team, to know how serious this problem is. It killed my SSDs, as I reported in a duplicate bug of "unknown" importance. I mean this is a hardware issue. Nothing could get either of my chips working again - not zeroing out, not a gparted live USB, not Windows / Partition Magic on a USB. Not even taking out the 16gb chip and putting it in a different computer. This is physical. This is real and expensive to replace. Get Karmic off your SSD drives, and dev team, please change this damn status! Having to find a new solid state drive because of your release is not of "unknown" importance.

Revision history for this message

danq989 (danq989) wrote on 2009-11-19:

#50

@Trey
Just a point of info for you. After I performed an in-place install of Karmic over my existing Jaunty, I found my AAO-110's SuperTalent-upgrade SSD bricked.

I managed to recover it using HDDERASE version 3.3.

This DOS program uses the "secure erase" function of the drive to completely erase the drive and set all blocks to unused. Unlike a dd of zeros, it only takes a minute or so to complete. Apparently this is a trick that folks have been using on Intel gen1 SSDs to recover performance after block fragmentation has caused a performance drop. Note that the more widely available HDDERASE versions 3.1 and 4.0 did NOT work for me (both threw different exceptions and bombed), so be sure to find and use v3.3. Guide and link here: http://www.iishacks.com/index.php/2009/06/30/how-to-secure-erase-reset-an-intel-solid-state-drive-ssd/

Now this won't fix things if there's enough real damage to the flash due to too many write/erase cycles, but it's worth a shot.

Good luck Trey!
---danq989

Revision history for this message

Rick @ rickandpatty.com (rick-rickandpatty) wrote on 2009-11-19:

#51

One more piece of information that may or may not be useful to the developers:

On my other laptop, I have Jaunty installed with kernel 2.6.30.9 (from http://kernel.ubuntu.com/~kernel-ppa/mainline/v2.6.30.9/ ) and the other Intel-related patches from the Jaunty/Intel graphics performance howto at http://ubuntuforums.org/showthread.php?t=1130582 .

If I install that kernel on my eeepc 701 running Karmic, I get the same stall / HSM violation errors that I do under 2.6.31 or 2.6.32-rc6. (There are other problems with that kernel under karmic, but I mainly wanted to see if it would stop the SSD problems, and it doesn't...)

Revision history for this message

Jim (wilsja) wrote on 2009-11-19:

#52

Trey, I'm not sure about your specific situation, but for me a regular dd didn't work, though a dd with of=/dev/sda bs=1M did work.

Just to back up what Rick is saying, I installed 2.6.28-12-netbook-eeepc from array.org, and I still have the errors. Furthermore, I used the jaunty kernel that was still on there: 2.6.28-11-generic, and I still have the errors. If I install jaunty, that same kernel does not give me errors.

I'm not sure what the effects of using an old kernel on a new distribution are (for instance, the touchpad driver doesn't work), so I don't know how to isolate the problem

Revision history for this message

Skylord (me-skylord) wrote on 2009-11-19:

#53

Just want to confirm the whole bug on my EeePC 901 4G+16G. A fresh Karmic installation doesn't work at all: the errors mentioned above appear in the logs and the partitioning setup page freezes. And of course everything is fine on Jaunty.

Revision history for this message

Raf (4283534-noduck) wrote on 2009-11-19:

#54

I also see these log entries on my Acer Aspire One with SUPER TALENT FEM32GF13M. However, I have not yet seen any corruption as a result of this. In fact, sometimes this does not result in a stall during boot. If it does stall the boot, it hangs for about 12 seconds.

Revision history for this message

Jim (wilsja) wrote on 2009-11-19:

#55

dmesg.jaunty (39.5 KiB, text/plain)

It actually seems that it gives a slightly different error message with the old kernel, but has a similar effect. This error with the old kernel only shows up after a dist-upgrade to karmic -- essentially a karmic install with the old 2.6.28 kernel

I am attaching three dmesg outputs.

dmesg.jaunty is before upgrading to karmic

dmesg.oldkern is directly after upgrading to karmic, but using the same kernel as before

dmesg.karmic is using the karmic kernel

Revision history for this message

Jim (wilsja) wrote on 2009-11-19:

#56

dmesg.oldkern (44.5 KiB, text/plain)

This has a timeout error rather than an HSM violation

Revision history for this message

Jim (wilsja) wrote on 2009-11-19:

#57

dmesg.karmic (43.6 KiB, text/plain)

Revision history for this message

lotus49 (lotus-49) wrote on 2009-11-22:

#58

I have the same computer and SSD (Acer Aspire One and SUPER TALENT FEM32GF13M) as well as the same symptoms as Raf above.

Revision history for this message

sles (slesru) wrote on 2009-11-25:

#59

This bug is not SSD-specific.
Here is dmesg output from my colleague's Dell 120L notebook, which has an HDD:

[ 6702.000206] ata1: lost interrupt (Status 0x58)
[ 6702.004018] ata1: drained 32768 bytes to clear DRQ.
[ 6702.093725] ata1.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 6702.093739] ata1.01: cmd a0/00:00:00:00:00/00:00:00:00:00/b0 tag 0
[ 6702.093740] cdb 1e 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 6702.093742] res 58/00:01:00:00:00/00:00:00:00:00/b0 Emask 0x2 (HSM violation)
[ 6702.093746] ata1.01: status: { DRDY DRQ }
[ 6702.093859] ata1: soft resetting link
[ 6702.340796] ata1.00: configured for UDMA/100
[ 6702.372409] ata1.01: configured for UDMA/33
[ 6702.391202] ata1: EH complete

Revision history for this message

professordes (d-a-johnston-hw) wrote on 2009-12-01:

#60

This bug is still present with the 31-15 kernel and all other updates applied on my eeePC 901 (1 Dec). There doesn't seem to be a lot going on at linux-kernel-bugs 14583?

Revision history for this message

Sal Mazzola (salmaz) wrote on 2009-12-05:

#61

I am having a similar problem with an Acer Aspire 1410. The Ubuntu LiveCD booted perfectly off the USB drive, but booting off the hard drive gives me either HSM violations or timeouts.

Setting libata.force=noncq allowed the machine to finally boot up without any errors.
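When testing boot parameters such as libata.force=noncq or irqpoll, it is worth confirming that the running kernel actually received them before reporting a result. Here is a minimal Python sketch that parses a kernel command line as found in /proc/cmdline. The sample string is made up for illustration, and the helper names are hypothetical.

```python
def kernel_params(cmdline):
    """Split a kernel command line into a dict; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def read_cmdline(path="/proc/cmdline"):
    """Read the command line of the running kernel (Linux only)."""
    with open(path) as fh:
        return fh.read().strip()


# Hypothetical command line, used here so the example runs anywhere.
sample = (
    "BOOT_IMAGE=/boot/vmlinuz-2.6.31-14-generic root=/dev/sda1 "
    "ro quiet splash libata.force=noncq"
)
params = kernel_params(sample)
print(params.get("libata.force"))
print("irqpoll" in params)
```

On a real system, `kernel_params(read_cmdline())` would be used instead of the sample string.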

Revision history for this message

Rick @ rickandpatty.com (rick-rickandpatty) wrote on 2009-12-06:

#62

The kernel parameter mentioned in #61 has no effect on the stall/HSM violation errors on my ASUS EeePC 701 (booted with the 9.10 Live CD and added the parameter to the kernel options, as I have had to reinstall Jaunty to use the machine daily).

Back to the drawing board, I guess.

Revision history for this message

Alan Pope 🍺🐧🐱 🦄 (popey) wrote on 2009-12-06:

#63

dmesg2632.txt (42.5 KiB, text/plain)

Ok, there's no way this is a hardware issue. I have on my desk two identical Eee 900's. Both have exhibited this issue under Jaunty and both exhibit the issue under Karmic. If I roll them back to Jaunty and wipe the SSD along the way, the error goes away.

I have now installed 9.10 UNR on one, and have added the 2.6.32 kernel from the links in comment #47.

[ 113.816054] ata2: lost interrupt (Status 0x58)
[ 113.820008] ata2: drained 8192 bytes to clear DRQ.
[ 113.835302] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 113.835310] ata2.00: BMDMA stat 0x64
[ 113.835317] ata2.00: failed command: READ DMA
[ 113.835332] ata2.00: cmd c8/00:20:a7:8f:09/00:00:00:00:00/e0 tag 0 dma 16384 in
[ 113.835335] res 58/00:20:a7:8f:09/00:00:00:00:00/e0 Emask 0x2 (HSM violation)
[ 113.835343] ata2.00: status: { DRDY DRQ }
[ 113.835393] ata2: soft resetting link

Revision history for this message

Andrew Simpson (andrew-simpson) wrote on 2009-12-07:

#64

The upstream bug report has asked whether anyone has tested kernels 2.6.29 or 2.6.30 as this would help narrow down when the bug was (re)introduced to the kernel.

For my own interest, I have noted that the bug generally occurs with newer/faster SSD units. For instance, the Intel SSD originally fitted to the early Acer Aspire One is, well, rather slow, but doesn't show the bug (I have one). However, another identical machine (they were bought as a pair) upgraded with the Super Talent unit does have the problem. Has anyone noticed a similar pattern?

Alan Cox has suggested that the SSD is responding to a command so fast that the kernel misses seeing the interrupt.
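One way to look at the "missed interrupt" idea is to decode the "BMDMA stat" values posted in this thread (0x4, 0x24 and 0x64). The Python sketch below assumes the controllers follow the conventional bus-master IDE status register layout; the bit labels are illustrative names chosen here, not identifiers from the kernel source.

```python
# Conventional bus-master IDE status register bits (assumed layout).
BMDMA_BITS = [
    (0x01, "ACTIVE"),            # DMA transfer still in progress
    (0x02, "ERROR"),             # DMA error
    (0x04, "INTR"),              # controller has latched a device interrupt
    (0x20, "DRV0_DMA_CAPABLE"),  # drive 0 marked DMA capable
    (0x40, "DRV1_DMA_CAPABLE"),  # drive 1 marked DMA capable
    (0x80, "SIMPLEX"),           # only one channel may do DMA at a time
]


def decode_bmdma(value):
    """Return the names of all known bits set in a BMDMA status byte."""
    return [name for mask, name in BMDMA_BITS if value & mask]


# Values copied from the logs posted in this thread.
for value in (0x4, 0x24, 0x64):
    print(hex(value), decode_bmdma(value))
```

Under that assumed layout, all three values have the interrupt bit set and the error bit clear. That would fit a picture in which the controller did register an interrupt that the driver never handled, but it is only one possible reading and does not confirm the explanation.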

Revision history for this message

lotus49 (lotus-49) wrote on 2009-12-07: Re: [Bug 445852] Re: SSD stall during boot

#65

I can confirm that upgrading from the stock SSD with Karmic already
installed to a Super Talent SSD produced this error which had not
previously been present.

Simon

Sent from My iPhone

On 7 Dec 2009, at 06:31, Andrew Simpson
<email address hidden> wrote:

> The upstream bug report has asked whether anyone has tested kernels
> 2.6.29 or 2.6.30 as this would help narrow down when the bug was
> (re)introduced to the kernel.
>
> For my own interest, I have noted that the bug generally occurs with
> newer/faster SSD units. For instance the Intel SSD originally
> fitted to
> the early Acer Aspire One is, well, rather slow, but doesn't show the
> bug (I have one). However another same machine (they were brought
> as a
> pair) upgraded with the Super Talent unit does have the problem.
> Anyone
> notice any similarity?
>
> Alan Cox has suggested that the SSD is responding to a command so fast
> that the kernel misses seeing the interrupt.
>
> --
> SSD stall during boot
> https://bugs.launchpad.net/bugs/445852
> You received this bug notification because you are a direct subscriber
> of the bug.
>
> Status in The Linux Kernel: Confirmed
> Status in “linux” package in Ubuntu: Confirmed
>
> Bug description:
> In the Karmic beta I experience ssd stalls during the boot process.
> It happens almost everytime before xsplash loads and happens again
> frequently between logging into gdm and the desktop loading. When
> it happens during login I think it is making gnome time out on
> loading panel items as I get errors related to lots of panel items
> failing to load. If I log out and back in again when the ssd isn't
> stalled the panel items load fine.
>
> When it happens the following messages appear before xplash (or in
> dmesg when it happens after gdm):
>
> ata2: lost interrupt (Status 0x58)
> ata2: drained 16384 bytes to clear DRQ.
> ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
> ata2.00: BMDMA stat 0x4
> ata2.00: cmd c8/00:40:cb:60:32/00:00:00:00:00/e0 tag 0 dma 32768 in
> res 58/00:40:cb:60:32/00:00:00:00:00/e0 Emask 0x2 (HSM violation)
> ata2.00: status: { DRDY DRQ }
> ata2: soft resetting link
> ata2.00: configured for UDMA/66
> ata2: EH complete
>
> I did not have this issue in jaunty with this hardware and I don't
> think it has happened once the system is fully loaded. I am running
> karmic unr on an Acer Aspire One netbook.
>
> ProblemType: Bug
> AplayDevices:
> **** List of PLAYBACK Hardware Devices ****
> card 0: Intel [HDA Intel], device 0: ALC268 Analog [ALC268 Analog]
> Subdevices: 1/1
> Subdevice #0: subdevice #0
> Architecture: i386
> ArecordDevices:
> **** List of CAPTURE Hardware Devices ****
> card 0: Intel [HDA Intel], device 0: ALC268 Analog [ALC268 Analog]
> Subdevices: 1/1
> Subdevice #0: subdevice #0
> AudioDevicesInUse:
> USER PID ACCESS COMMAND
> /dev/snd/controlC0: luke 1990 F.... pulseaudio
> CRDA: Error: [Errno 2] No such file or directory
> Card0.Amixer.info:
> Card hw:0 'Intel'/'HDA Intel at 0x58540000 irq 16'
> Mixer name : 'Realtek ALC268'
> Components : 'HDA:10ec0268,1025015b,00100101'
> Controls : 9
> Simple ctrl...

Revision history for this message

theluketaylor (ekul-taylor) wrote on 2009-12-07: Re: SSD stall during boot

#66

I can confirm kernel 2.6.30 works fine with my netbook (Acer Aspire One with 8 GB SSD) using a Jaunty userland. I am running 2.6.30-02063009-generic.
I am somewhat reluctant to try a Karmic kernel: once I started getting the errors after installing Karmic, I had to write all zeros to the SSD to get rid of them in any distro/kernel combination. If it's truly necessary I will try.

Revision history for this message

Юрий Аполлов (apollovy) wrote on 2009-12-07: Re: [Bug 445852] Re: SSD stall during boot

#67

Passing libata.force=noncq to the kernel at boot time from a live USB had no
effect for me - I still got hangs at the same place. Interestingly, this
happens only when X and gdm are starting. I don't really know why.
Booting into single-user mode gave an interesting result: about a minute
after getting a root shell, I got a message about exception Emask, DRQ etc.
Right after it came another message that caught my interest:
"Starting init crypto disks... [OK]"
So my guess is that the problem is really there - in crypto disks.

2009/12/7 theluketaylor <email address hidden>

> I can confirm kernel 2.6.30 works fine with my netbook (Acer Aspire One
> with 8 GB SSD) using a Jaunty userland. I am running
> 2.6.30-02063009-generic.
> I am somewhat reluctant to try a karmic kernel since once I started getting
> the errors after installing karmic to get rid of them in any distro/kernel
> combination I had to write all zeros to the SSD. If it's truly necessary I
> will try.
>
> --
> SSD stall during boot
> https://bugs.launchpad.net/bugs/445852
> You received this bug notification because you are a direct subscriber
> of the bug.
>
> Status in The Linux Kernel: Confirmed
> Status in “linux” package in Ubuntu: Confirmed
>
> Bug description:
> In the Karmic beta I experience ssd stalls during the boot process. It
> happens almost everytime before xsplash loads and happens again frequently
> between logging into gdm and the desktop loading. When it happens during
> login I think it is making gnome time out on loading panel items as I get
> errors related to lots of panel items failing to load. If I log out and
> back in again when the ssd isn't stalled the panel items load fine.
>
> When it happens the following messages appear before xplash (or in dmesg
> when it happens after gdm):
>
> ata2: lost interrupt (Status 0x58)
> ata2: drained 16384 bytes to clear DRQ.
> ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
> ata2.00: BMDMA stat 0x4
> ata2.00: cmd c8/00:40:cb:60:32/00:00:00:00:00/e0 tag 0 dma 32768 in
> res 58/00:40:cb:60:32/00:00:00:00:00/e0 Emask 0x2 (HSM violation)
> ata2.00: status: { DRDY DRQ }
> ata2: soft resetting link
> ata2.00: configured for UDMA/66
> ata2: EH complete
>
> I did not have this issue in jaunty with this hardware and I don't think it
> has happened once the system is fully loaded. I am running karmic unr on an
> Acer Aspire One netbook.
>
> ProblemType: Bug
> AplayDevices:
> **** List of PLAYBACK Hardware Devices ****
> card 0: Intel [HDA Intel], device 0: ALC268 Analog [ALC268 Analog]
> Subdevices: 1/1
> Subdevice #0: subdevice #0
> Architecture: i386
> ArecordDevices:
> **** List of CAPTURE Hardware Devices ****
> card 0: Intel [HDA Intel], device 0: ALC268 Analog [ALC268 Analog]
> Subdevices: 1/1
> Subdevice #0: subdevice #0
> AudioDevicesInUse:
> USER PID ACCESS COMMAND
> /dev/snd/controlC0: luke 1990 F.... pulseaudio
> CRDA: Error: [Errno 2] No such file or directory
> Card0.Amixer.info:
> Card hw:0 'Intel'/'HDA Intel at 0x58540000 irq 16'
> Mixer name : 'Realtek ALC268'
> Components : 'HDA:10ec0268,1025015b,00100101'
> Controls ...

Passing libata.force=noncq to the kernel at boot time from a live USB had no
effect for me - I still got hangups in the same place. Interestingly, this
happens only after X and gdm start. I don't really know why.
Booting into single-user mode gave an interesting result: about a minute
after getting to the root shell, I got a message about exception Emask, DRQ
etc. Right after it came another message that caught my interest:
"Starting init crypto disks... [OK]"
So my guess is that the problem is really there, in crypto disks.
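
For anyone trying to read the "exception Emask, DRQ" messages mentioned above, here is a minimal Python sketch that decodes the status byte and the `cmd` line from the log in the bug description. The field order of the `cmd` line (command/feature:nsect:lbal:lbam:lbah/HOB fields/device) and the 512-byte sector size are assumptions based on how libata normally prints a taskfile; the function names are made up for illustration. Note that the kernel's own summary lists only DRDY and DRQ, while this sketch also names bit 0x10 using the classic ATA name DSC.

```python
import re

# ATA status register bits, using the classic ATA names.
STATUS_BITS = {
    0x80: "BSY",
    0x40: "DRDY",
    0x20: "DF",
    0x10: "DSC",
    0x08: "DRQ",
    0x04: "CORR",
    0x02: "IDX",
    0x01: "ERR",
}


def decode_status(status):
    """Return the names of the bits set in an ATA status byte."""
    return [name for bit, name in STATUS_BITS.items() if status & bit]


def decode_cmd_line(line):
    """Decode a libata 'cmd' log line (hypothetical helper).

    Assumed layout: command/feature:nsect:lbal:lbam:lbah/
    hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
    """
    match = re.search(r"cmd ([0-9a-f/:]+)", line)
    if match is None:
        raise ValueError("no 'cmd' taskfile found in line")
    groups = [[int(b, 16) for b in g.split(":")] for g in match.group(1).split("/")]
    (command,), (feature, nsect, lbal, lbam, lbah), hob, (device,) = groups
    # 28-bit LBA: the low 4 bits of the device register hold LBA bits 24-27.
    lba = ((device & 0x0F) << 24) | (lbah << 16) | (lbam << 8) | lbal
    # Assumes 512-byte sectors; nsect == 0 (meaning 256) is not handled here.
    return {"command": command, "sectors": nsect, "bytes": nsect * 512, "lba": lba}


if __name__ == "__main__":
    print(decode_status(0x58))
    print(decode_cmd_line(
        "ata2.00: cmd c8/00:40:cb:60:32/00:00:00:00:00/e0 tag 0 dma 32768 in"
    ))
```

Under these assumptions the failing command is 0xc8 (READ DMA) for 64 sectors, which agrees with the "dma 32768 in" shown in the same log line.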

2009/12/7 theluketaylor <ekul.taylor@gmail.com>

> I can confirm kernel 2.6.30 works fine with my netbook (Acer Aspire One
> with 8 GB SSD) using a Jaunty userland.  I am running
> 2.6.30-02063009-generic.
> I am somewhat reluctant to try a karmic kernel since once I started getting
> the errors after installing karmic to get rid of them in any distro/kernel
> combination I had to write all zeros to the SSD.  If it's truly necessary I
> will try.
>
> --
> SSD stall during boot
> https://bugs.launchpad.net/bugs/445852
> You received this bug notification because you are a direct subscriber
> of the bug.
>
> Status in The Linux Kernel: Confirmed
> Status in “linux” package in Ubuntu: Confirmed
>
> Bug description:
> In the Karmic beta I experience ssd stalls during the boot process.  It
> happens almost everytime before xsplash loads and happens again frequently
> between logging into gdm and the desktop loading.  When it happens during
> login I think it is making gnome time out on loading panel items as I get
> errors related to lots of panel items failing to load.  If I log out and
> back in again when the ssd isn't stalled the panel items load fine.
>
> When it happens the following messages appear before xplash (or in dmesg
> when it happens after gdm):
>
> ata2: lost interrupt (Status 0x58)
> ata2: drained 16384 bytes to clear DRQ.
> ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
> ata2.00: BMDMA stat 0x4
> ata2.00: cmd c8/00:40:cb:60:32/00:00:00:00:00/e0 tag 0 dma 32768 in
> res 58/00:40:cb:60:32/00:00:00:00:00/e0 Emask 0x2 (HSM violation)
> ata2.00: status: { DRDY DRQ }
> ata2: soft resetting link
> ata2.00: configured for UDMA/66
> ata2: EH complete
>
> I did not have this issue in jaunty with this hardware and I don't think it
> has happened once the system is fully loaded.  I am running karmic unr on an
> Acer Aspire One netbook.
>
> ProblemType: Bug
> AplayDevices:
>  **** List of PLAYBACK Hardware Devices ****
>  card 0: Intel [HDA Intel], device 0: ALC268 Analog [ALC268 Analog]
>   Subdevices: 1/1
>   Subdevice #0: subdevice #0
> Architecture: i386
> ArecordDevices:
>  **** List of CAPTURE Hardware Devices ****
>  card 0: Intel [HDA Intel], device 0: ALC268 Analog [ALC268 Analog]
>   Subdevices: 1/1
>   Subdevice #0: subdevice #0
> AudioDevicesInUse:
>  USER        PID ACCESS COMMAND
>  /dev/snd/controlC0:  luke       1990 F.... pulseaudio
> CRDA: Error: [Errno 2] No such file or directory
> Card0.Amixer.info:
>  Card hw:0 'Intel'/'HDA Intel at 0x58540000 irq 16'
>   Mixer name   : 'Realtek ALC268'
>   Components   : 'HDA:10ec0268,1025015b,00100101'
>   Controls      : 9
>   Simple ctrls  : 6
> CheckboxSubmission: 12ef539f3788bfbc46bc56b5c28128a6
> CheckboxSystem: c69722ecac764861be52925fa50b4dcc
> Date: Wed Oct  7 17:54:56 2009
> DistroRelease: Ubuntu 9.10
> HibernationDevice: RESUME=UUID=8d44b89b-2edb-4c02-a4be-94bd25b65081
> MachineType: Acer AOA110
> Package: linux-image-2.6.31-12-generic 2.6.31-12.40
> ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.31-12-generic
> root=UUID=039a096e-3486-4898-9eeb-44a705f8b7fd ro quiet splash elevator=noop
> usbcore.autosuspend=1
> ProcEnviron:
>  LANG=en_CA.UTF-8
>  SHELL=/bin/bash
> ProcVersionSignature: Ubuntu 2.6.31-12.40-generic
> RelatedPackageVersions: linux-firmware 1.21
> RfKill:
>  0: phy0: Wireless LAN
>        Soft blocked: no
>        Hard blocked: no
> SourcePackage: linux
> Tags:  ubuntu-unr
> Uname: Linux 2.6.31-12-generic i686
> XsessionErrors:
>  (gnome-settings-daemon:2006): GLib-CRITICAL **: g_propagate_error:
> assertion `src != NULL' failed
>  (gnome-settings-daemon:2006): GLib-CRITICAL **: g_propagate_error:
> assertion `src != NULL' failed
>  (nautilus:2092): Eel-CRITICAL **: eel_preferences_get_boolean: assertion
> `preferences_is_initialized ()' failed
>  (polkit-gnome-authentication-agent-1:2118): GLib-CRITICAL **:
> g_once_init_leave: assertion `initialization_value != 0' failed
>  (gnome-panel:2048): Gdk-WARNING **:
> /build/buildd/gtk+2.0-2.18.2/gdk/x11/gdkdrawable-x11.c:952 drawable is not a
> pixmap or window
> dmi.bios.date: 10/06/2008
> dmi.bios.vendor: Acer
> dmi.bios.version: v0.3309
> dmi.board.asset.tag: Base Board Asset Tag
> dmi.board.vendor: Acer
> dmi.board.version: Base Board Version
> dmi.chassis.type: 1
> dmi.chassis.vendor: Chassis Manufacturer
> dmi.chassis.version: Chassis Version
> dmi.modalias:
> dmi:bvnAcer:bvrv0.3309:bd10/06/2008:svnAcer:pnAOA110:pvr1:rvnAcer:rn:rvrBaseBoardVersion:cvnChassisManufacturer:ct1:cvrChassisVersion:
> dmi.product.name: AOA110
> dmi.product.version: 1
> dmi.sys.vendor: Acer
>
> To unsubscribe from this bug, go to:
> https://bugs.launchpad.net/linux/+bug/445852/+subscribe
>

Andrew Squire (andrewsquire) wrote on 2009-12-08: Re: SSD stall during boot

#68

I had this issue on my EeePC 901. I took the following actions and have not *yet* seen the issue reappear:

1. Boot from 9.10 NBR Live CD
2. dd if=/dev/zero of=/dev/sda bs=1M
3. dd if=/dev/zero of=/dev/sdb bs=1M
4. install 9.10 NBR
5. Add the kernel parameter "libata.dma=0" to the GRUB2 config and rebuild the GRUB2 menu

NOTE: When I first got the issue I tried steps 4 and 5 without doing steps 1-3, and that did not fix it.
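
Step 5 is not spelled out above, so here is a minimal Python sketch of one way to do the edit. It assumes the usual Ubuntu layout, where the default kernel parameters live in the `GRUB_CMDLINE_LINUX_DEFAULT` line of `/etc/default/grub` and the menu is rebuilt with `sudo update-grub`; the helper name `add_kernel_param` is made up for illustration. The function only transforms text, so it can be tried on a copy of the file first.

```python
import re


def add_kernel_param(grub_default_text, param):
    """Append `param` to GRUB_CMDLINE_LINUX_DEFAULT unless it is already present."""
    pattern = re.compile(r'^(GRUB_CMDLINE_LINUX_DEFAULT=")([^"]*)(")', re.MULTILINE)

    def _patch(match):
        params = match.group(2).split()
        if param not in params:
            params.append(param)
        return match.group(1) + " ".join(params) + match.group(3)

    new_text, count = pattern.subn(_patch, grub_default_text)
    if count == 0:
        raise ValueError("GRUB_CMDLINE_LINUX_DEFAULT line not found")
    return new_text


if __name__ == "__main__":
    sample = 'GRUB_DEFAULT=0\nGRUB_CMDLINE_LINUX_DEFAULT="quiet splash"\n'
    print(add_kernel_param(sample, "libata.dma=0"))
```

After writing the result back to the config file, the menu still has to be regenerated before the parameter takes effect on the next boot.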

Юрий Аполлов (apollovy) wrote on 2009-12-08: Re: [Bug 445852] Re: SSD stall during boot

#69

And yes - it's in libata!!
The SSD only misbehaves when libata.dma>=4

http://www.kernel.org/doc/Documentation/kernel-parameters.txt

> libata.dma=	[LIBATA] DMA control
> 			libata.dma=0	  Disable all PATA and SATA DMA
> 			libata.dma=1	  PATA and SATA Disk DMA only
> 			libata.dma=2	  ATAPI (CDROM) DMA only
> 			libata.dma=4	  Compact Flash DMA only
> 			Combinations also work, so libata.dma=3 enables DMA
> 			for disks and CDROMs, but not CFs.

So the SSD is treated as a CF device. No surprise, of course. But with DMA
off - haha, the speed of the device suffers badly!
During bootup the kernel says (with libata.dma<=3) that my SSD
(AAO-110L, 8 GB SSD-PAMM, Samsung) is in (!) PIO4 transfer mode. It is
REALLY slow.

hdparm -tT /dev/sda
Cached reads: 370 MB/sec
Buffered reads: 2.3 MB/sec

And that is reading - do I even have to talk about writing?
Where should we dig now that the cause of the problem seems to be localized?
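
Since libata.dma is a bitmask, "libata.dma>=4" really means "any value with the Compact Flash bit set". A minimal Python sketch of the decoding, using only the bit values from the kernel-parameters.txt excerpt quoted above (the names `DMA_CLASSES` and `dma_enabled_for` are made up for illustration):

```python
# Bit values as listed in the kernel-parameters.txt excerpt quoted above.
DMA_CLASSES = {
    1: "PATA/SATA disk",
    2: "ATAPI (CDROM)",
    4: "Compact Flash",
}


def dma_enabled_for(value):
    """Return the device classes for which a libata.dma value enables DMA."""
    return [name for bit, name in DMA_CLASSES.items() if value & bit]


if __name__ == "__main__":
    for value in range(8):
        print(value, dma_enabled_for(value))
```

Values 0 to 3 leave the CF bit clear, which matches the observation that the SSD falls back to PIO with those settings; 4 to 7 all enable DMA for CF-class devices.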

2009/12/8 Andrew Squire <andrewsquire@gmail.com>

> I had this issue on my EeePC 901. I took the following actions and have
> not *yet* seen the issue reappear:
>
> 1. Boot from 9.10 NBR Live CD
> 2. dd if=/dev/zero of=/dev/sda bs=1M
> 3. dd if=/dev/zero of=/dev/sdb bs=1M
> 4. install 9.10 NBR
> 5. Add kernal command "libata.dma=0" in GRUB2 config and rebuild GRUB2 menu
>
> NOTE: When I first got the issue I first tried steps 4. & 5. without
> doing 1. 2. 3. and it did not fix the issue.

Gav Mack (gavinmac) wrote on 2009-12-08: Re: SSD stall during boot

#70

Adding libata.dma=3 to the grub kernel line seems to have stopped the hang on my AAO so far; the entries in dmesg are gone. It still takes over a minute to boot, but that's something I'll live with for now.

It's early days but premature thanks go out to Andrew Squire in advance!

Юрий Аполлов (apollovy) wrote on 2009-12-09: Re: [Bug 445852] Re: SSD stall during boot

#71

Btw, libata.dma=3 does not make the SSD work in DMA mode though!
There's something to tell the developers, because as long as I stay on
Jaunty this stall goes away!

2009/12/8 Gav Mack <gavinmac@hotmail.com>

> Setting libata.dma=3 to grub seems to have stopped the hang on my AAO so
> far, the entries in dmesg are gone.  Still taking over a minute to boot
> though but that's something I'll live with for now.
>
> It's early days but premature thanks go out to Andrew Squire in advance!

Gav Mack (gavinmac) wrote on 2009-12-09: Re: SSD stall during boot

#72

You're correct - it was early days and premature! Though I could put up with the slow boot time, for anything other than basic web browsing I couldn't handle the drop in disk read performance from 67 MB/sec to 2 MB/sec, never mind write. It was worse than my old stock AAO SSD was with Jaunty playing back video.

Back to the situation in post 67 methinks - devs stop dragging your heels and sort this issue out!
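
To put the read figures above in perspective, here is a small Python sketch of the arithmetic. The two rates are the ones reported in this comment; the 700 MB file size is a hypothetical value chosen only for illustration, and the function names are made up.

```python
def slowdown_factor(fast_mb_s, slow_mb_s):
    """How many times slower the slow rate is than the fast one."""
    return fast_mb_s / slow_mb_s


def read_time_seconds(size_mb, rate_mb_s):
    """Seconds needed to read size_mb megabytes at a sustained rate."""
    return size_mb / rate_mb_s


if __name__ == "__main__":
    FILE_MB = 700  # hypothetical file size, not taken from the thread
    print("slowdown:", slowdown_factor(67, 2))
    print("with DMA, seconds:", round(read_time_seconds(FILE_MB, 67), 1))
    print("with PIO, seconds:", read_time_seconds(FILE_MB, 2))
```

A roughly 33x drop means a read that took about ten seconds with DMA takes several minutes in PIO mode.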

Andrew Squire (andrewsquire) wrote on 2009-12-09:

#73

Agree completely, Gav Mack. Seems likely to me to be a regression in the kernel that should be fixed. Unfortunately, it looks like the associated kernel bug (#14583) is being put down to dodgy hardware.

Setting libata.dma to a value below 4 is just a workaround to keep us up and running in the interim... if you can put up with the negative impact on performance :)

Gav Mack (gavinmac) wrote on 2009-12-09:

#74

Couldn't agree more, Andrew. It's the easy way out to blame this on hardware. They may have had a valid point, but clearly none of us ever had a problem with this until Karmic and the newer kernels. To paraphrase that old saying, "If it looks like a kernel bug and acts like a kernel bug, then it's a kernel bug", and I can't help but think "waffle" and "b*llsh*t" when the blame is laid on the storage devices.

Unless there are more of us, or we get lucky and a netbook OEM runs into major trouble with the same problem, or some IT hacks get involved to force them to do something about it, I'll be booting into Windows 7 far more than I want to. A shame :(

Now where's my logins for theregister.co.uk and theinquirer.net?

Юрий Аполлов (apollovy) wrote on 2009-12-09: Re: [Bug 445852] Re: SSD stall during boot

#75

A small thing to keep in mind:
1. I boot Karmic - and get this bug. Kernel 2.6.31, as far as I recall
2. I boot Karmic with 2.6.32 - still get this bug
3. I boot Jaunty - and do not get this bug. Kernel 2.6.28
4. I boot Jaunty with 2.6.32 - and do NOT get this bug
So I guess it's NOT a kernel bug. Otherwise anyone could easily reproduce it
on Jaunty.
So, I repeat, it's NOT a direct kernel bug - there's something we're missing.
Something makes the SSD hang when in DMA mode. What could it be?? If it's not
the kernel - then what? A userspace process that stalls the SSD? Which one
specifically? Maybe if that one is killed there would be no hangups? Or at
least we could file a bug report in the correct place - neither on Launchpad
nor in the kernel bug tracker.
Those are my thoughts. IMHO.
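
The reasoning behind the four observations above can be checked mechanically. A minimal Python sketch, assuming the four boots are recorded exactly as listed (the helper names are made up for illustration): a factor "explains" the stall if each of its values always goes with the same outcome.

```python
# The four observations from the list above.
observations = [
    {"userland": "Karmic", "kernel": "2.6.31", "stall": True},
    {"userland": "Karmic", "kernel": "2.6.32", "stall": True},
    {"userland": "Jaunty", "kernel": "2.6.28", "stall": False},
    {"userland": "Jaunty", "kernel": "2.6.32", "stall": False},
]


def outcomes_by(factor, rows):
    """Group the observed outcomes by the value of one factor."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row[factor], set()).add(row["stall"])
    return grouped


def explains(factor, rows):
    """True if every value of `factor` maps to exactly one outcome."""
    return all(len(outcomes) == 1 for outcomes in outcomes_by(factor, rows).values())


if __name__ == "__main__":
    for factor in ("userland", "kernel"):
        print(factor, explains(factor, observations))
```

Kernel 2.6.32 appears with both outcomes, so the kernel version alone cannot account for the stall in this data, while the userland lines up with it in all four cases. Four data points from one machine are of course only a hint, not proof.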

2009/12/9 Gav Mack <gavinmac@hotmail.com>

> Couldn't agree more Andrew.  It's the easy way out to blame this on
> hardware, they may have had a valid point but clearly none of us ever
> had a problem with this until Karmic and the newer kernels.  To
> paraphrase that old saying "If it looks like a Kernel Bug, acts like a
> Kerrnel Bug then it's a Kernel bug" and I can't help but think "waffle"
> and "b*llsh*t" by laying the blame on the storage devices.
>
> Unless there's more of us or if we get lucky with and a netbook OEM runs
> into major trouble with the same problem or some IT hacks get involved
> to force them to do something about it I'll be booting into Windows 7
> far more than I want to. A shame :(
>
> Now where's my logins for theregister.co.uk and theinquirer.net?

Raf (4283534-noduck) wrote on 2009-12-09: Re: SSD stall during boot

#76

#75 is correct. My karmic install always gives the error (even with 2.6.29-rc3*). But my jaunty never gives the error, even with 2.6.31-16. However, there is one more difference between my karmic and jaunty installs: karmic uses ext4, while jaunty uses ext2 (and for the record all of those are inside LVM).

Could it be something related to ext4? I notice that the bug triggers during or right after filesystem check/mount (mountall). But I have not been able to reproduce this bug by doing filesystem checks/mounting of the karmic-ext4 partition under jaunty.

I have not been able to debug it further. Debugging upstart to see exactly when the bug triggers generates way too much output.

I tried disabling ureadahead, but that didn't make any difference.

*Note that the error message changed between 2.6.29-6 and 2.6.30-rc1, but it still triggers an error message and the same delay.

Gav Mack (gavinmac) wrote on 2009-12-09:

#77

@Raf:

Don't think it's an ext4 issue - I'm very sure that when I first ran Karmic Beta on the USB boot stick it hung on the detection of the SSD before I even got to the partition menu. I tried ext2 first and got corruption pretty quickly, and then set it up with ext4 before I spotted this bug report on Launchpad.

Don't forget also that Andrew Simpson said in post 15 of the Kernel bug list that Mandriva 2010.0 with the same kernel doesn't show this error but Fedora 12 Beta does.

theluketaylor (ekul-taylor) wrote on 2009-12-09:

#78

I can confirm this isn't an ext4 issue; I am using ext4 for my / partition in Jaunty with a 2.6.30 kernel without any trouble

Rick @ rickandpatty.com (rick-rickandpatty) wrote on 2009-12-09:

#79

I've got to agree with #77 - It isn't an ext4 issue. I installed Karmic on my Eee with ext4 and again with ext2 and got similar errors both times.

I can also confirm that even kernels that cause no problems under Jaunty cause problems under Karmic. So what's different about the boot process of karmic that's trashing our SSDs? Maybe it's something that can be fixed outside the kernel?

Stephen O (soglesby1) wrote on 2009-12-09:

#80

It's not just Karmic, as Gav pointed out. I installed Fedora 12 two days ago in part because I was curious if this problem would follow me from Karmic. Sure enough, a panel failed to load and my dmesg output had the same ata errors as it did in karmic, even though my partitioning scheme was different since Fedora defaults to LVM. I can also join the chorus in stating that I used ext4 with this drive on Jaunty and had no issues.

Raf (4283534-noduck) wrote on 2009-12-09:

#81

What about the increased parallelism in the new upstart? Could this be the cause of the problems: increased simultaneous disk access.

Doesn't Fedora also use upstart?

Unfortunately it looks like upstart cannot be serialized. (I am talking about the jobs in /etc/init, not /etc/init.d)

Юрий Аполлов (apollovy) wrote on 2009-12-09: Re: [Bug 445852] Re: SSD stall during boot

#82

I'd like to ask somebody to look specifically at upstart: can it be
removed? (I have no information on this myself, as I'm still on 9.04.) Can
somebody try removing it?

2009/12/9 Raf <email address hidden>

> What about the increased parallelism in the new upstart? Could this be
> the cause of the problems: increased simultaneous disk access.
>
> Doesn't Fedora also use upstart?
>
> Unfortunately it looks like upstart cannot be serialized. (I am talking
> about the jobs in /etc/init, not /etc/init.d)

Gav Mack (gavinmac) on 2009-12-09

Changed in linux (Ubuntu):
assignee:	nobody → Gav Mack (gavinmac)
assignee:	Gav Mack (gavinmac) → nobody
assignee:	nobody → Upstart Developers (upstart-devel)

Gav Mack (gavinmac) wrote on 2009-12-09: Re: SSD stall during boot

#83

Notified the upstart devs - do any distros other than Fedora 12 and Karmic use upstart?

Scott James Remnant (Canonical) (canonical-scott) on 2009-12-09

Changed in linux (Ubuntu):
assignee:	Upstart Developers (upstart-devel) → nobody

Scott James Remnant (Canonical) (canonical-scott) wrote on 2009-12-09:

#84

Please do not subscribe that team, the team's own description explicitly asks you not to.

This bug clearly is nothing to do with userspace, it should not be possible for userspace to cause problems in this way (that's what the kernel is there for).

It smells like a kernel driver bug to me, especially given the reports of fiddling with DMA. The high number of similar SSDs means it could be a hardware company being creative with the spec, but that's still a kernel driver bug for failing to quirk them properly.

I'm not a kernel developer, but I would recommend the following debugging technique:

- those affected should supply detailed information about not only their SSD, but the I/O controller in their laptop (dmesg, lspci -vvnn, etc.)

- if one release of Ubuntu is affected more than the other, that suggests a regression

- first try a mainline kernel build from http://kernel.ubuntu.com/~kernel-ppa/mainline/ of the equivalent release; if that fixes the problem (unlikely, but still possible), then it is with an Ubuntu patch

- if that does not fix the problem, start working backwards through the kernel releases until you find one that does fix the problem

- if the first kernel is still affected, try kernels from previous Ubuntu releases

(one assumes that the kernel from the release where things work fine, installed on karmic, will work)

- Given a loose idea, narrow it down using the kernel packages you can download from https://launchpad.net/ubuntu/+source/linux/+publishinghistory

Basically what would be ideal would be to find one kernel that works, and then the *immediate next kernel* that doesn't work. This would give a limited number of changes that broke it, and start to reveal what the bug might be.
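
The search for "one kernel that works and the immediate next kernel that doesn't" can be sketched as a bisection over an ordered list of kernel builds. This is only an illustration under stated assumptions: the oldest build works, the newest is affected, and once the bug appears it stays in every later build. The function name, the placeholder build labels and the stand-in check are all hypothetical; a real check means booting each build and looking for the ata errors in dmesg.

```python
def find_regression(versions, is_affected):
    """Return (last_good, first_bad) from an ordered list of kernel builds.

    Assumes versions[0] works, versions[-1] is affected, and the bug,
    once introduced, is present in every later build.
    """
    low, high = 0, len(versions) - 1  # low: known good, high: known bad
    while high - low > 1:
        mid = (low + high) // 2
        if is_affected(versions[mid]):
            high = mid
        else:
            low = mid
    return versions[low], versions[high]


# Placeholder labels, not real test results: pretend "build-C" is the
# first affected build.
builds = ["build-A", "build-B", "build-C", "build-D", "build-E"]
tested = []


def boot_and_check(build):
    """Stand-in for booting `build` and checking dmesg by hand."""
    tested.append(build)
    return builds.index(build) >= builds.index("build-C")


pair = find_regression(builds, boot_and_check)
print(pair, "after testing", tested)
```

Bisection needs roughly log2(N) boots instead of N, which matters when every failed test means zeroing the SSD and reinstalling.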

Alan Pope 🍺🐧🐱 🦄 (popey) wrote on 2009-12-09:

#85

Thanks for the debugging advice Scott.

The only issue for me is that when I install Karmic and it goes _very_ bad I end up with a bricked machine (IO errors on SSD causing me to drop to initramfs prompt). The only way to test other kernels is to dd zeros over the SSD and reinstall the OS again, then add in whatever kernel to test. It's a maddeningly time-consuming task.

Юрий Аполлов (apollovy) wrote on 2009-12-10: Re: [Bug 445852] Re: SSD stall during boot

#86

To Scott:
How can it be that the same program (I mean the same kernel version) on the
same hardware behaves differently on different Ubuntu releases? If two
things work in one place and not in another, doesn't that suggest that the
things themselves haven't gone bad and that the cause is somewhere in the
environment?
Nevertheless, I think your idea is interesting. Can anyone here who is
running 9.10 on an SSD test this approach?

2009/12/10 Alan Pope <email address hidden>

> Thanks for the debugging advice Scott.
>
> The only issue for me is that when I install Karmic and it goes _very_
> bad I end up with a bricked machine (IO errors on SSD causing me to drop
> to initramfs prompt). The only way to test other kernels is to dd zeros
> over the SSD and reinstall the OS again, then add in whatever kernel to
> test. It's a maddeningly time-consuming task.

Raf (4283534-noduck) wrote on 2009-12-10: Re: SSD stall during boot

#87

I don't think that there is a bug in upstart. More likely the increased parallelism of upstart triggers a bug in the kernel, firmware, or hardware.

If it is a kernel bug, it has been in the kernel for several releases. I have tested with 2.6.29-rc3 (and most versions in between); that is the oldest kernel on http://kernel.ubuntu.com/~kernel-ppa/mainline/ that can still boot my ext4-karmic partition.

I am wondering if #61 is not on to something. However, my SSD is ATA, not SATA (so no NCQ anyway). I will research if the SSD can do TCQ, and if yes how to disable it. That would suggest a firmware/hardware bug.

Rick @ rickandpatty.com (rick-rickandpatty) wrote on 2009-12-10:

#88

@Raf #87

I've tested Karmic (ext2 filesystem) with the latest 2.6.28 kernel for Jaunty, with similar results as for the Karmic kernels. (Of course, the same kernel works just fine with Jaunty.)

Another issue with upstart being the cause of the problem - as #11 notes, this bug can be triggered in Karmic by running gparted at any time - even when booting up via the LiveCD instead of the SSD.

Andrew Simpson (andrew-simpson) wrote on 2009-12-10:

#89

O.K., Reviewing what we do know:

Scott has suggested providing system information, however this already seems to be listed in this bug report and in the upstream bug report.

He has also suggested trying different kernels to isolate when the problem started. This is what we have been doing, HOWEVER comments #75, #76 & #87 have now all shown that something in 'Karmic' is the problem - and not directly the kernel version. Have we been chasing the wrong problem?

We can rule out ext4 and NCQ from comments #79, #87 and #88.

While I would agree with Scott that 'userspace applications' shouldn't affect the kernel, it does appear a 'userspace application' is affecting the kernel.

I can confirm that Mandriva 2010.0 (after several weeks of use & lots of checking) does not have this problem.

My own experience and comment #80 confirm that Fedora 12 does have the problem. I have searched the Fedora Bugzilla and can't see a bug report there. It would be good to file a bug there. One of the Fedora kernel devs has been commenting on the upstream bug report.

What is different about Mandriva 2010.0 compared to Karmic and Fedora 12? Anything of note other than upstart?

theluketaylor (ekul-taylor) wrote on 2009-12-10:

#90

If a userspace application is able to trigger this issue I would say this is a kernel bug, since that is exactly the sort of thing the kernel is supposed to be managing. No userspace application, regardless of how poorly written, should be able to trigger a drop from UDMA to PIO.

I have tried jaunty with a 2.6.32 kernel and it works fine. I was going to try installing karmic and upgrading it to 2.6.32 but I wasn't even able to complete the installer. It failed to mount the ext4 / partition due to numerous HSM faults.
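
For anyone reading the HSM violation messages from the bug description, here is a minimal sketch of how the `cmd`/`res` fields and the status byte can be decoded. It assumes the usual libata layout `command/feature:nsect:lbal:lbam:lbah/<high-order bytes>/device`, a 28-bit LBA command and 512-byte sectors; the helper names are made up for illustration.

```python
# Classic ATA status register bit names, highest bit first.
STATUS_BITS = {
    0x80: "BSY",
    0x40: "DRDY",
    0x20: "DF",
    0x10: "DSC",
    0x08: "DRQ",
    0x04: "CORR",
    0x02: "IDX",
    0x01: "ERR",
}


def decode_status(status):
    """Return the names of all bits set in an ATA status byte."""
    return [name for bit, name in STATUS_BITS.items() if status & bit]


def decode_taskfile(field):
    """Decode a 'c8/00:40:cb:60:32/00:00:00:00:00/e0' style field.

    Assumes a 28-bit LBA command, so the high-order bytes are ignored.
    """
    first, low, _high, device = field.split("/")
    _feature, nsect, lbal, lbam, lbah = (int(x, 16) for x in low.split(":"))
    lba = ((int(device, 16) & 0x0F) << 24) | (lbah << 16) | (lbam << 8) | lbal
    return {
        "command": int(first, 16),
        "sectors": nsect,
        "lba": lba,
        "bytes": nsect * 512,  # assumes 512-byte sectors
    }


status_names = decode_status(0x58)
taskfile = decode_taskfile("c8/00:40:cb:60:32/00:00:00:00:00/e0")
print(status_names)
print(taskfile)
```

Command 0xc8 is READ DMA, and 0x40 sectors of 512 bytes is 32768 bytes, which matches the "dma 32768 in" part of the log line. Status 0x58 has DRDY and DRQ set, as the kernel prints, plus bit 0x10.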

Andrew Squire (andrewsquire) wrote on 2009-12-10:

#91

Just a consideration that might be worth keeping in mind when we're testing the different distros / kernels. Once I had experienced the problem, I could not reliably get rid of it without doing a dd if=/dev/zero to both disks. In #89 (Andrew Simpson) and #80 (Stephen O), when the problem was exhibited in Fedora 12, had the disks been zeroed in between?

theluketaylor (ekul-taylor) wrote on 2009-12-10:

#92

I tried bootstrapping karmic from a jaunty livecd on a zeroed ssd. I installed the 2.6.32 kernel from http://kernel.ubuntu.com/~kernel-ppa/mainline/ that I have run from jaunty without any trouble.
I wanted to see if I could trigger the bug when karmic wasn't involved in the creation or population of the filesystem. Since the disk has to be zeroed once the bug happens, I started small, just booting into single user mode. Even this triggered the bug. The only processes that had been spawned were init, upstart, dhcp and the shell I was using.

Johan Van den Neste (jvdneste) wrote on 2009-12-10:

#93

I can't try this right now, but would it be possible to run gparted with lots of debugging output to see exactly what it is doing when the bug is triggered? If this is not possible, maybe we could ask someone who knows the gparted codebase to help us out?

Юрий Аполлов (apollovy) wrote on 2009-12-11: Re: [Bug 445852] Re: SSD stall during boot

#94

An interesting thing: as soon as I launch parted from a live USB, I
immediately get our errors. I get the same using cfdisk. And, which may be
interesting and important, it happens both when I start parted and when I
exit it, and likewise when I start or exit cfdisk.
BUT!
Once I removed libparted and all packages that depend on it, the noise
went away! No more blinking of the SSD indicator. WOW! I still have DMA,
though. Interesting, I guess. Can anyone confirm my finding?

2009/12/10 Johan Van den Neste <email address hidden>

> I can't try this right now, but would it be possible to run gparted with
> lots of debugging output to see exactly what it is doing when the bug is
> triggered? If this is not possible, maybe we could ask someone who knows
> the gparted codebase to help us out?
>
> --
> SSD stall during boot
> https://bugs.launchpad.net/bugs/445852
> You received this bug notification because you are a direct subscriber
> of the bug.
>
> Status in The Linux Kernel: Confirmed
> Status in “linux” package in Ubuntu: Confirmed
>
> Bug description:
> In the Karmic beta I experience ssd stalls during the boot process. It
> happens almost everytime before xsplash loads and happens again frequently
> between logging into gdm and the desktop loading. When it happens during
> login I think it is making gnome time out on loading panel items as I get
> errors related to lots of panel items failing to load. If I log out and
> back in again when the ssd isn't stalled the panel items load fine.
>
> When it happens the following messages appear before xplash (or in dmesg
> when it happens after gdm):
>
> ata2: lost interrupt (Status 0x58)
> ata2: drained 16384 bytes to clear DRQ.
> ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
> ata2.00: BMDMA stat 0x4
> ata2.00: cmd c8/00:40:cb:60:32/00:00:00:00:00/e0 tag 0 dma 32768 in
> res 58/00:40:cb:60:32/00:00:00:00:00/e0 Emask 0x2 (HSM violation)
> ata2.00: status: { DRDY DRQ }
> ata2: soft resetting link
> ata2.00: configured for UDMA/66
> ata2: EH complete
>
> I did not have this issue in jaunty with this hardware and I don't think it
> has happened once the system is fully loaded. I am running karmic unr on an
> Acer Aspire One netbook.
>
> ProblemType: Bug
> AplayDevices:
> **** List of PLAYBACK Hardware Devices ****
> card 0: Intel [HDA Intel], device 0: ALC268 Analog [ALC268 Analog]
> Subdevices: 1/1
> Subdevice #0: subdevice #0
> Architecture: i386
> ArecordDevices:
> **** List of CAPTURE Hardware Devices ****
> card 0: Intel [HDA Intel], device 0: ALC268 Analog [ALC268 Analog]
> Subdevices: 1/1
> Subdevice #0: subdevice #0
> AudioDevicesInUse:
> USER PID ACCESS COMMAND
> /dev/snd/controlC0: luke 1990 F.... pulseaudio
> CRDA: Error: [Errno 2] No such file or directory
> Card0.Amixer.info:
> Card hw:0 'Intel'/'HDA Intel at 0x58540000 irq 16'
> Mixer name : 'Realtek ALC268'
> Components : 'HDA:10ec0268,1025015b,00100101'
> Controls : 9
> Simple ctrls : 6
> CheckboxSubmission: 12ef539f3788bfbc46bc56b5c28128a6
> CheckboxSystem: c69722ecac764861be52925fa50b4dcc
> Date: Wed Oct 7 17:54:56 2009
> DistroRelease: Ubuntu 9.10
> HibernationDevice: RESUME=UUI...

An interesting thing: as soon as I launch parted from a live USB, I
immediately get our errors. I get the same using cfdisk. And, which may be
interesting and important, it happens both when I start parted and when I
exit it, and likewise when I start or exit cfdisk.
BUT!
Once I removed libparted and all packages that depend on it, the
noise went away! No more blinking of the SSD indicator. WOW! I still got DMA
though. Interesting, I guess. Can anyone confirm my finding?

2009/12/10 Johan Van den Neste <jvdneste@gmail.com>

> I can't try this right now, but would it be possible to run gparted with
> lots of debugging output to see exactly what it is doing when the bug is
> triggered? If this is not possible, maybe we could ask someone who knows
> the gparted codebase to help us out?
>
> --
> SSD stall during boot
> https://bugs.launchpad.net/bugs/445852
> You received this bug notification because you are a direct subscriber
> of the bug.
>
> Status in The Linux Kernel: Confirmed
> Status in “linux” package in Ubuntu: Confirmed
>
> Bug description:
> In the Karmic beta I experience ssd stalls during the boot process.  It
> happens almost everytime before xsplash loads and happens again frequently
> between logging into gdm and the desktop loading.  When it happens during
> login I think it is making gnome time out on loading panel items as I get
> errors related to lots of panel items failing to load.  If I log out and
> back in again when the ssd isn't stalled the panel items load fine.
>
> When it happens the following messages appear before xplash (or in dmesg
> when it happens after gdm):
>
> ata2: lost interrupt (Status 0x58)
> ata2: drained 16384 bytes to clear DRQ.
> ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
> ata2.00: BMDMA stat 0x4
> ata2.00: cmd c8/00:40:cb:60:32/00:00:00:00:00/e0 tag 0 dma 32768 in
> res 58/00:40:cb:60:32/00:00:00:00:00/e0 Emask 0x2 (HSM violation)
> ata2.00: status: { DRDY DRQ }
> ata2: soft resetting link
> ata2.00: configured for UDMA/66
> ata2: EH complete
>
> I did not have this issue in jaunty with this hardware and I don't think it
> has happened once the system is fully loaded.  I am running karmic unr on an
> Acer Aspire One netbook.
>
> ProblemType: Bug
> AplayDevices:
>  **** List of PLAYBACK Hardware Devices ****
>  card 0: Intel [HDA Intel], device 0: ALC268 Analog [ALC268 Analog]
>   Subdevices: 1/1
>   Subdevice #0: subdevice #0
> Architecture: i386
> ArecordDevices:
>  **** List of CAPTURE Hardware Devices ****
>  card 0: Intel [HDA Intel], device 0: ALC268 Analog [ALC268 Analog]
>   Subdevices: 1/1
>   Subdevice #0: subdevice #0
> AudioDevicesInUse:
>  USER        PID ACCESS COMMAND
>  /dev/snd/controlC0:  luke       1990 F.... pulseaudio
> CRDA: Error: [Errno 2] No such file or directory
> Card0.Amixer.info:
>  Card hw:0 'Intel'/'HDA Intel at 0x58540000 irq 16'
>   Mixer name   : 'Realtek ALC268'
>   Components   : 'HDA:10ec0268,1025015b,00100101'
>   Controls      : 9
>   Simple ctrls  : 6
> CheckboxSubmission: 12ef539f3788bfbc46bc56b5c28128a6
> CheckboxSystem: c69722ecac764861be52925fa50b4dcc
> Date: Wed Oct  7 17:54:56 2009
> DistroRelease: Ubuntu 9.10
> HibernationDevice: RESUME=UUID=8d44b89b-2edb-4c02-a4be-94bd25b65081
> MachineType: Acer AOA110
> Package: linux-image-2.6.31-12-generic 2.6.31-12.40
> ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.31-12-generic
> root=UUID=039a096e-3486-4898-9eeb-44a705f8b7fd ro quiet splash elevator=noop
> usbcore.autosuspend=1
> ProcEnviron:
>  LANG=en_CA.UTF-8
>  SHELL=/bin/bash
> ProcVersionSignature: Ubuntu 2.6.31-12.40-generic
> RelatedPackageVersions: linux-firmware 1.21
> RfKill:
>  0: phy0: Wireless LAN
>        Soft blocked: no
>        Hard blocked: no
> SourcePackage: linux
> Tags:  ubuntu-unr
> Uname: Linux 2.6.31-12-generic i686
> XsessionErrors:
>  (gnome-settings-daemon:2006): GLib-CRITICAL **: g_propagate_error:
> assertion `src != NULL' failed
>  (gnome-settings-daemon:2006): GLib-CRITICAL **: g_propagate_error:
> assertion `src != NULL' failed
>  (nautilus:2092): Eel-CRITICAL **: eel_preferences_get_boolean: assertion
> `preferences_is_initialized ()' failed
>  (polkit-gnome-authentication-agent-1:2118): GLib-CRITICAL **:
> g_once_init_leave: assertion `initialization_value != 0' failed
>  (gnome-panel:2048): Gdk-WARNING **:
> /build/buildd/gtk+2.0-2.18.2/gdk/x11/gdkdrawable-x11.c:952 drawable is not a
> pixmap or window
> dmi.bios.date: 10/06/2008
> dmi.bios.vendor: Acer
> dmi.bios.version: v0.3309
> dmi.board.asset.tag: Base Board Asset Tag
> dmi.board.vendor: Acer
> dmi.board.version: Base Board Version
> dmi.chassis.type: 1
> dmi.chassis.vendor: Chassis Manufacturer
> dmi.chassis.version: Chassis Version
> dmi.modalias:
> dmi:bvnAcer:bvrv0.3309:bd10/06/2008:svnAcer:pnAOA110:pvr1:rvnAcer:rn:rvrBaseBoardVersion:cvnChassisManufacturer:ct1:cvrChassisVersion:
> dmi.product.name: AOA110
> dmi.product.version: 1
> dmi.sys.vendor: Acer
>
> To unsubscribe from this bug, go to:
> https://bugs.launchpad.net/linux/+bug/445852/+subscribe
>

Revision history for this message

rogmorri (frontporsche) wrote on 2009-12-11: Re: SSD stall during boot

#95

It seems that if I do this as root....

  int k = open("/dev/sda", O_WRONLY|O_LARGEFILE);
  ioctl(k,BLKFLSBUF,0);
  close(k);

...then 15~30 seconds later I see this on the console ...

[33458.988232] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[33458.994551] ata2.00: BMDMA stat 0x4
[33459.001646] ata2.00: cmd ca/00:08:ef:00:40/00:00:00:00:00/e0 tag 0 dma 4096 out
[33459.001657] res 58/00:08:ef:00:40/00:00:00:00:00/e0 Emask 0x2 (HSM violation)
[33459.014157] ata2.00: status: { DRDY DRQ }

but dmesg shows a bit more....

[33458.988145] ata2: lost interrupt (Status 0x58)
[33458.988232] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[33458.994551] ata2.00: BMDMA stat 0x4
[33459.001646] ata2.00: cmd ca/00:08:ef:00:40/00:00:00:00:00/e0 tag 0 dma 4096 out
[33459.001657] res 58/00:08:ef:00:40/00:00:00:00:00/e0 Emask 0x2 (HSM violation)
[33459.014157] ata2.00: status: { DRDY DRQ }
[33459.021056] ata2: soft resetting link
[33459.228490] ata2.00: configured for UDMA/66
[33459.228528] ata2: EH complete

I gleaned those 3 lines of code from doing "strace parted"

Revision history for this message

rogmorri (frontporsche) wrote on 2009-12-11:

#96

I should have mentioned that this could be related to O_LARGEFILE, which I had to define manually...

#define O_LARGEFILE 0100000

Revision history for this message

rogmorri (frontporsche) wrote on 2009-12-11:

#97

Oops, ignore that.
It seems that all I need to cause the error is just this...

int k = open("/dev/sda", O_WRONLY);
close(k);

BTW, my root file system is mounted on sda1 while I'm doing this.

(If I open with O_RDONLY instead, then there is no error)

Revision history for this message

Raf (4283534-noduck) wrote on 2009-12-11:

#98

#97 also repeatably triggers the HSM violation for me. But only under Karmic (not Jaunty).

Revision history for this message

Raf (4283534-noduck) wrote on 2009-12-11:

#99

But be warned, this has led to disk corruption for me! I never had disk corruption before...

Revision history for this message

Scott James Remnant (Canonical) (canonical-scott) wrote on 2009-12-11: Re: [Bug 445852] Re: SSD stall during boot

#100

On Fri, 2009-12-11 at 01:07 +0000, rogmorri wrote:

> Oops, ignore that.
> It seems that all I need to cause the error is just this...
>
> int k = open("/dev/sda", O_WRONLY);
> close(k);
>
> BTW, my root file system is mounted on sda1 while I'm doing this.
>
That actually has quite a few side-effects that you might not
realise ;-)

Try this command (as root)

echo change > /sys/block/sda/uevent

Does that cause the same errors?

Scott
--
Scott James Remnant
<email address hidden>

Revision history for this message

Stephen O (soglesby1) wrote on 2009-12-11: Re: SSD stall during boot

#101

I had not zeroed my drive prior to installing Fedora 12 so I attempted to follow Andrew's suggestion (#91). Unfortunately every time I try dd if=/dev/zero of=/dev/sda (which is definitely my 32GB SSD) I get the following error:
dd: writing to '/dev/sda': Input/output error
676065+0 records in
676064+0 records out
346144768 byte (346 MB) copied, 82.0451 s, 4.2 MB/s

It's always 346MB, never more nor less. Has my drive fallen victim to this bug and been damaged in some way?

Revision history for this message

Юрий Аполлов (apollovy) wrote on 2009-12-11: Re: [Bug 445852] Re: SSD stall during boot

#102

Stephen, try badblocks -wvs /dev/sda to check your drive in read-write
mode

2009/12/11 Stephen O <soglesby1@yahoo.com>

> I had not zeroed my drive prior to installing Fedora 12 so I attempted to
> follow Andrew's suggestion (#91). Unfortunately every time I try dd
> if=/dev/zero of=/dev/sda (which is definitely my 32GB SSD) I get the
> following error:
> dd: writing to '/dev/sda': Input/output error
> 676065+0 records in
> 676064+0 records out
> 346144768 byte (346 MB) copied, 82.0451 s, 4.2 MB/s
>
> It's always 346MB, never more nor less. Has my drive fallen victim to
> this bug and been damaged in some way?

Revision history for this message

Andrew Squire (andrewsquire) wrote on 2009-12-11: Re: SSD stall during boot

#103

Stephen O - I also had this error unless I set the block size manually. Try bs=1M.
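
For what it's worth, the figures in the dd output from #101 are consistent with dd's default 512-byte block size, which suggests the write fails at the same offset each time rather than at a random point. Below is a small sketch of that arithmetic. The 512-byte default is standard dd behaviour; treating the record count as a sector number is an assumption, since nobody in the thread states the drive's sector size.

```python
# Figures copied from the dd output quoted in #101.
records_out = 676064        # "676064+0 records out"
bytes_copied = 346144768    # "346144768 byte (346 MB) copied"
elapsed_s = 82.0451         # "82.0451 s"

DD_DEFAULT_BS = 512         # dd's block size when bs= is not given

# Every completed record is one full block, so the totals should agree.
consistent = records_out * DD_DEFAULT_BS == bytes_copied

# Assumption: 512-byte logical sectors, so record N maps to sector N.
first_failed_sector = records_out

# dd reports decimal megabytes (10**6 bytes) per second.
throughput_mb_s = round(bytes_copied / elapsed_s / 1e6, 1)

print(consistent, first_failed_sector, throughput_mb_s)
```

With bs=1M, as suggested above, the same offset would fall inside the 331st 1 MiB block, so a failure there would show up as "330+0 records out" if the error were tied to the position on the disk rather than to the request size.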

Revision history for this message

rogmorri (frontporsche) wrote on 2009-12-11:

#104

@Scott,

>> int k = open("/dev/sda", O_WRONLY);
>> close(k);
>That actually has quite a few side-effects that you might not
>realise ;-)

It might be good then to warn people not to run "parted --list" to view partition tables. That seems to open all your devices with O_WRONLY.

Revision history for this message

Raf (4283534-noduck) wrote on 2009-12-11:

#105

Scott,

Yes, "echo change > /sys/block/sda/uevent" results in the same error. After entering that command, it takes about 20 to 30 seconds until the error shows up.

Any idea why this works fine under Jaunty and not Karmic?

Raf.

Revision history for this message

Scott James Remnant (Canonical) (canonical-scott) wrote on 2009-12-14: Re: [Bug 445852] Re: SSD stall during boot

#106

On Fri, 2009-12-11 at 17:10 +0000, Raf wrote:

> Yes, "echo change > /sys/block/sda/uevent" results in the same error.
> After entering that command, it takes about 20 to 30 seconds until the
> error shows up.
>
Ok, excellent.

So what's happening is that one of the commands being run to probe the
disk is causing the error. Let's figure out which one!

Run the following command:

sudo udevadm test /block/sda 2>&1 | grep "^util_run_program:.*started"

This will output a bunch of program names. First wait the 20-30s, to
see whether you get the error. It's possible that you will not with
this (which is interesting in and of itself, so please let me know if
that happens).

If you do get the error, note down the commands and then we'll want to
run each one in turn to see which one gives the error. (You'll need to
run them all with sudo or as root).

There's probably about 6-8 of them.

Let me know which one(s) cause the error (if any).

Scott
--
Scott James Remnant
<email address hidden>

Revision history for this message

Raf (4283534-noduck) wrote on 2009-12-14: Re: SSD stall during boot

#107

Scott,

# udevadm test /block/sda 2>&1 | grep "^util_run_program:.*started"
util_run_program: 'ata_id --export /dev/sda' started
util_run_program: 'scsi_id --whitelisted --replace-whitespace -p0x80 -d/dev/sda' started
util_run_program: 'path_id /devices/pci0000:00/0000:00:1f.2/host1/target1:0:0/1:0:0:0/block/sda' started
util_run_program: '/sbin/blkid -o udev -p /dev/sda' started
util_run_program: 'edd_id --export /dev/sda' started
util_run_program: 'devkit-disks-part-id /dev/sda' started
util_run_program: 'devkit-disks-probe-ata-smart /dev/sda' started

This did trigger the HSM violation.

Testing more, I think that devkit-disks-probe-ata-smart is the one triggering the violation (devkit is new in Karmic?). However, it doesn't always show.

I think it might be related to the delay between the different commands. If the delay is too big, the violation doesn't always show. But if the delay is too small, we cannot be sure which command triggered it.

The delay between running devkit-disks-probe-ata-smart and the actual violation being logged is not constant. Maybe devkit-disks-probe-ata-smart sets things up, but the actual violation only triggers after some other activity occurs (e.g. simple disk access).

I will try again to isolate it.

Raf.
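
For anyone repeating this test, the list of helpers can be pulled out of the `udevadm test` output mechanically, so that each one can then be run by hand with a pause in between. The sketch below only parses text and runs nothing. The sample lines are copied from the output above; the function name `started_commands` is made up for illustration.

```python
import re

# Sample copied from the `udevadm test /block/sda` output in #107.
SAMPLE = """\
util_run_program: 'ata_id --export /dev/sda' started
util_run_program: 'scsi_id --whitelisted --replace-whitespace -p0x80 -d/dev/sda' started
util_run_program: 'path_id /devices/pci0000:00/0000:00:1f.2/host1/target1:0:0/1:0:0:0/block/sda' started
util_run_program: '/sbin/blkid -o udev -p /dev/sda' started
util_run_program: 'edd_id --export /dev/sda' started
util_run_program: 'devkit-disks-part-id /dev/sda' started
util_run_program: 'devkit-disks-probe-ata-smart /dev/sda' started
"""

# Matches lines of the form: util_run_program: '<command line>' started
STARTED = re.compile(r"^util_run_program: '(?P<cmd>.+)' started$")


def started_commands(text):
    """Return the command lines udev reported as started, in order."""
    commands = []
    for line in text.splitlines():
        match = STARTED.match(line.strip())
        if match:
            commands.append(match.group("cmd"))
    return commands


commands = started_commands(SAMPLE)
for number, cmd in enumerate(commands, start=1):
    print(number, cmd)
```

Most of these helpers are named without a path; #108 runs the SMART probe as /lib/udev/devkit-disks-probe-ata-smart, and the other path-less helpers probably live in the same directory, though the thread does not say so.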

Revision history for this message

Raf (4283534-noduck) wrote on 2009-12-14:

#108

I am doing this (on an otherwise idle UNR):

# sleep 120; logger devkit-disks-probe-ata-smart; /lib/udev/devkit-disks-probe-ata-smart /dev/sda; sleep 120; logger done

And I get this in syslog (repeatably):

Dec 14 11:12:01 unus logger: devkit-disks-probe-ata-smart
Dec 14 11:12:35 unus kernel: [ 7734.000130] ata2: lost interrupt (Status 0x58)
Dec 14 11:12:35 unus kernel: [ 7734.000217] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Dec 14 11:12:35 unus kernel: [ 7734.000232] ata2.00: BMDMA stat 0x4
Dec 14 11:12:35 unus kernel: [ 7734.000264] ata2.00: cmd ca/00:08:08:8b:54/00:00:00:00:00/e3 tag 0 dma 4096 out
Dec 14 11:12:35 unus kernel: [ 7734.000270] res 58/00:08:08:8b:54/00:00:00:00:00/e3 Emask 0x2 (HSM violation)
Dec 14 11:12:35 unus kernel: [ 7734.000284] ata2.00: status: { DRDY DRQ }
Dec 14 11:12:35 unus kernel: [ 7734.000343] ata2: soft resetting link
Dec 14 11:12:35 unus kernel: [ 7734.208576] ata2.00: configured for UDMA/66
Dec 14 11:12:35 unus kernel: [ 7734.208618] ata2: EH complete
Dec 14 11:14:01 unus logger: done

Also note that I stay in UDMA (not PIO like some other posters). Except for the delays, the disk is quite usable.
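
To make the log lines above easier to compare between reports, here is a rough sketch that decodes the `cmd` line and the status byte. It assumes the field order libata uses in these messages (command/feature:count:LBA low:LBA mid:LBA high/.../device) and the standard ATA status register bits. Only the two opcodes seen in this thread are named, and the helper names are made up for illustration.

```python
import re

# ATA status register bits, highest bit first.
STATUS_BITS = [
    (0x80, "BSY"),
    (0x40, "DRDY"),
    (0x20, "DF"),
    (0x10, "DSC"),
    (0x08, "DRQ"),
    (0x04, "CORR"),
    (0x02, "IDX"),
    (0x01, "ERR"),
]

# Only the opcodes that appear in this thread's logs.
COMMANDS = {0xC8: "READ DMA", 0xCA: "WRITE DMA"}

CMD_LINE = re.compile(
    r"cmd (?P<cmd>[0-9a-f]{2})/(?P<lo>[0-9a-f:]+)/(?P<hi>[0-9a-f:]+)/(?P<dev>[0-9a-f]{2})"
)


def decode_status(status):
    """Return the names of the bits set in an ATA status byte."""
    return [name for bit, name in STATUS_BITS if status & bit]


def decode_cmd(line):
    """Decode opcode, sector count and 28-bit LBA from a libata 'cmd' line."""
    match = CMD_LINE.search(line)
    if match is None:
        return None
    feature, nsect, lbal, lbam, lbah = (int(x, 16) for x in match.group("lo").split(":"))
    device = int(match.group("dev"), 16)
    opcode = int(match.group("cmd"), 16)
    # LBA28: the low nibble of the device register holds address bits 24-27.
    lba = ((device & 0x0F) << 24) | (lbah << 16) | (lbam << 8) | lbal
    # In LBA28 commands a sector count of 0 means 256 sectors.
    sectors = nsect if nsect else 256
    return {"name": COMMANDS.get(opcode, hex(opcode)), "sectors": sectors, "lba": lba}


info = decode_cmd("ata2.00: cmd ca/00:08:08:8b:54/00:00:00:00:00/e3 tag 0 dma 4096 out")
print(info["name"], info["sectors"], hex(info["lba"]), decode_status(0x58))
```

Read this way, the failing command in #108 is an 8-sector (4096-byte) WRITE DMA, and the one in the original bug description (c8, count 0x40) is a 64-sector READ DMA, which agrees with the "dma 4096 out" and "dma 32768 in" annotations in the logs. Status 0x58 also has the 0x10 bit set; the kernel's `status: { DRDY DRQ }` line appears to print only a subset of the bits.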

Revision history for this message

Raf (4283534-noduck) wrote on 2009-12-14:

#109

I disabled /lib/udev/rules.d/95-devkit-disks.rules (using dpkg-divert). And now I can boot without HSM violations. But I believe this is only a workaround.

Note that the devkit-disks-probe-ata-smart tests did again result in filesystem corruption.

Raf.

Revision history for this message

Scott James Remnant (Canonical) (canonical-scott) wrote on 2009-12-14: Re: [Bug 445852] Re: SSD stall during boot

#110

On Mon, 2009-12-14 at 16:27 +0000, Raf wrote:

> I disabled /lib/udev/rules.d/95-devkit-disks.rules (using dpkg-divert).
> And now I can boot without HSM violations. But I believe this is only a
> workaround.
>
> Note that the devkit-disks-probe-ata-smart tests did again result in
> filesystem corruption.
>
From information received here, and information on the kernel bug, I
really think that the cause *is* the SMART commands.

Scott
--
Scott James Remnant
<email address hidden>

affects:	linux (Ubuntu) → libatasmart (Ubuntu)
summary:	- SSD stall during boot
	+ devkit-disks-probe-ata-smart causes HSM Violations on SSD, and potential hardware death
Changed in devicekit-disks (Ubuntu):
status:	New → Triaged
Changed in libatasmart (Ubuntu):
status:	Confirmed → Triaged
importance:	Undecided → High
importance:	High → Critical
Changed in devicekit-disks (Ubuntu):
importance:	Undecided → High
importance:	High → Critical

Revision history for this message

Raf (4283534-noduck) wrote on 2009-12-14:

#111

I don't know if it is helpful to anybody, but I have attached the strace for /lib/udev/devkit-disks-probe-ata-smart /dev/sda. It does two SG_IO ioctls against the device. smartctl -a /dev/sda does not trigger the HSM violation.

Revision history for this message

Raf (4283534-noduck) wrote on 2009-12-14:

#112

strace -fotrace /lib/udev/devkit-disks-probe-ata-smart /dev/sda (8.4 KiB, text/plain)

Revision history for this message

Gav Mack (gavinmac) wrote on 2009-12-14:

#113

@Raf: Could you explain how to disable /lib/udev/rules.d/95-devkit-disks.rules (using dpkg-divert) so I can apply the workaround - the corruption is finally starting to hit my setup over the past couple of days.

One good thing has come out of a good Windows tech but linux n00b fumbling about not really knowing what he was doing - I got Scott involved :-)

Revision history for this message

Andrew Simpson (andrew-simpson) wrote on 2009-12-15:

#114

@Gav Mack
This is probably a fairly crude workaround, but it works for me. I just disabled the ata-smart disk probe in the udev rules:

In the file /lib/udev/rules.d/95-devkit-disks.rules look for these lines (Lines 73 & 74):

# ATA disks driven by libata
KERNEL=="sd*[!0-9]", ATTR{removable}=="0", ENV{ID_BUS}=="ata", ENV{DEVTYPE}=="disk", IMPORT{program}="devkit-disks-probe-ata-smart $tempnode"

Add a '#' in front to make the rule line a comment, like this:

# ATA disks driven by libata
#KERNEL=="sd*[!0-9]", ATTR{removable}=="0", ENV{ID_BUS}=="ata", ENV{DEVTYPE}=="disk", IMPORT{program}="devkit-disks-probe-ata-smart $tempnode"

Save the file.

To make sure it's reloaded do these commands:

sudo service udev stop
sudo service udev start

Test with gparted... and notice the difference.
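
If you would rather not edit the file by hand, the same change can be expressed as a small text transformation. This is only a sketch: it works on the rule text passed to it and does not touch any file, the function name `disable_smart_probe` is made up for illustration, and it assumes the rule is on a single line as quoted above.

```python
RULES = (
    "# ATA disks driven by libata\n"
    'KERNEL=="sd*[!0-9]", ATTR{removable}=="0", ENV{ID_BUS}=="ata", '
    'ENV{DEVTYPE}=="disk", IMPORT{program}="devkit-disks-probe-ata-smart $tempnode"\n'
)


def disable_smart_probe(rules_text, needle="devkit-disks-probe-ata-smart"):
    """Comment out every active rule line that mentions the SMART probe."""
    out = []
    for line in rules_text.splitlines(keepends=True):
        # Leave existing comments alone so the function can be run twice safely.
        if needle in line and not line.lstrip().startswith("#"):
            line = "#" + line
        out.append(line)
    return "".join(out)


patched = disable_smart_probe(RULES)
print(patched, end="")
```

Keep in mind that a file under /lib/udev/rules.d belongs to the package, so a hand edit is likely to be overwritten by the next package upgrade; that is presumably why Raf used dpkg-divert in #109.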

Revision history for this message

Scott James Remnant (Canonical) (canonical-scott) wrote on 2009-12-15: Re: [Bug 445852] Re: devkit-disks-probe-ata-smart causes HSM Violations on SSD, and potential hardware death

#115

On Mon, 2009-12-14 at 20:26 +0000, Raf wrote:

> I don't know if it is helpful to anybody, but I have attached the strace
> for /lib/udev/devkit-disks-probe-ata-smart /dev/sda. It does two SG_IO
> ioctls against the device. smartctl -a /dev/sda does not trigger the HSM
> violation.
>
Could you provide the equivalent strace for smartctl as well, for
comparison?

Scott
--
Scott James Remnant
<email address hidden>

Revision history for this message

rogmorri (frontporsche) wrote on 2009-12-15:

#116

> #KERNEL=="sd*[!0-9]", ATTR{removable}=="0", ENV{ID_BUS}=="ata", ENV{DEVTYPE}=="disk", IMPORT{program}="devkit-disks-probe-ata-smart $tempnode"

Nice workaround. This brought my poweron-to-desktop time from 2:35 down to 0:52. :)

Revision history for this message

Tommy Trussell (tommy-trussell) wrote on 2009-12-15:

#117

Confirming this behavior on an ASUS EeePC 900 with an upgraded SSD: Patriot Lite SSD, 32GB model PL32GPEPCSSDR

I just ran a command like the one @Raf described in comment #108. When /dev/sda is unmounted, there is no response except "DKD_ATA_SMART_IS_AVAILABLE=1" on the console, but the next mount takes a long time to complete, and any activity on /dev/sda generates the same syslog errors as described there. The drive then seems to function normally until the next /lib/udev/devkit-disks-probe-ata-smart command, when it hangs and generates syslog events again.

I have been commenting on my reported Bug 430333 but I think I will declare it to be a duplicate of this one, and send the other Patriot Lite SSD user(s) over here.

Revision history for this message

shadowblast101 (shadowblast101) wrote on 2009-12-15:

#118

I followed Tommy over, and can confirm his confirmation. I have pretty much the same setup, EEE900, PL32GPEPCSSDR, but with Arch instead of Ubuntu, and the behavior is still there. Mine had a few other quirks too, such as hiding my mouse until I switch to a tty and back.

After applying the workaround, everything seems to be good.

Revision history for this message

Raf (4283534-noduck) wrote on 2009-12-16:

#119

I did some more tests with devkit-disks-probe-ata-smart. If I boot from USB flash, running devkit-disks-probe-ata-smart on the SSD does not trigger any log entry until I try to write to the disk. If I then write to the disk with dd (to the swap partition, so as not to corrupt any fs), dd hangs until the error is generated, which occurs 30 seconds later (probably a timeout in the kernel).

Revision history for this message

Raf (4283534-noduck) wrote on 2009-12-16:

#120

smartctl-output (2.5 KiB, text/plain)

Output from smartctl -a.

Revision history for this message

Raf (4283534-noduck) wrote on 2009-12-16:

#121

smartctl-trace (12.6 KiB, text/plain)

strace from smartctl -a.

Revision history for this message

In freedesktop.org Bugzilla #25673, Scott James Remnant (Canonical) (canonical-scott) wrote on 2009-12-16:

#122

We have many reports of the libatasmart code causing stalls, HSM Violations and even death of SSDs. Particularly SuperTalent ones, but also those found in my netbooks.

# sleep 120; logger devkit-disks-probe-ata-smart; /lib/udev/devkit-disks-probe-ata-smart /dev/sda; sleep 120; logger done

And I get this in syslog (repeatably):

Dec 14 11:12:01 unus logger: devkit-disks-probe-ata-smart
Dec 14 11:12:35 unus kernel: [ 7734.000130] ata2: lost interrupt (Status 0x58)
Dec 14 11:12:35 unus kernel: [ 7734.000217] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Dec 14 11:12:35 unus kernel: [ 7734.000232] ata2.00: BMDMA stat 0x4
Dec 14 11:12:35 unus kernel: [ 7734.000264] ata2.00: cmd ca/00:08:08:8b:54/00:00:00:00:00/e3 tag 0 dma 4096 out
Dec 14 11:12:35 unus kernel: [ 7734.000270] res 58/00:08:08:8b:54/00:00:00:00:00/e3 Emask 0x2 (HSM violation)
Dec 14 11:12:35 unus kernel: [ 7734.000284] ata2.00: status: { DRDY DRQ }
Dec 14 11:12:35 unus kernel: [ 7734.000343] ata2: soft resetting link
Dec 14 11:12:35 unus kernel: [ 7734.208576] ata2.00: configured for UDMA/66
Dec 14 11:12:35 unus kernel: [ 7734.208618] ata2: EH complete
Dec 14 11:14:01 unus logger: done

The problem has also been confirmed in Fedora 12.

Revision history for this message

In freedesktop.org Bugzilla #25673, Scott James Remnant (Canonical) (canonical-scott) wrote on 2009-12-16:

#123

Kernel bugzilla bug for the same issue (URL above is the Launchpad bug)

http://bugzilla.kernel.org/show_bug.cgi?id=14583

Revision history for this message

In freedesktop.org Bugzilla #25673, Lennart-poettering (lennart-poettering) wrote on 2009-12-16:

#124

Hmm, lacking access to the hw in question I am not sure what I can do about this.

What surprises me a bit is that this only appeared so very recently. Is this triggered by some interplay with some specific kernel version?

Scott James Remnant (Canonical) (canonical-scott) on 2009-12-16

description:

updated

Revision history for this message

In freedesktop.org Bugzilla #25673, Scott James Remnant (Canonical) (canonical-scott) wrote on 2009-12-16:

#125

Most distros only switched to using your code recently; previously we've all been using smartmontools and the like which don't cause this problem.

The LP bug has the differing straces from the two, if that's helpful?

Revision history for this message

Gav Mack (gavinmac) wrote on 2009-12-16:

#126

@Andrew Simpson: Many thanks for the instructions. The workaround has dropped my boot time from always over 2 minutes to 30 seconds, what I was expecting back in late September when I installed the Beta with the Super Talent SSD! Almost 3 months of woe now at an end thank goodness. Recreated my user account from scratch because after further investigation my other half thought it was a good idea to delete the timed out applets including window-picker so I couldn't put them back again!

Revision history for this message

Tommy Trussell (tommy-trussell) wrote on 2009-12-16:

#127

We have seen the corruption survive a basic filesystem initialization, so once your drive has been corrupted you may need to write zeroes to it to eliminate the bad blocks before you can create a clean filesystem again. You can verify the drive using badblocks when it is not mounted. This procedure works for the flash SSDs such as the Patriot Lite.

For example, to write zeroes to /dev/sda:

# dd if=/dev/zero of=/dev/sda bs=1M

To do a read-only test of /dev/sda:

# badblocks -s /dev/sda
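
(Editorial sketch, not part of the original comment.) Both commands above should only be run against a drive that is not mounted, and dd will destroy everything on it. A small guard like the following can be run first. The function name device_is_mounted is hypothetical, and the check is a simple prefix match on the first column of /proc/mounts, so treat it as a rough safety net rather than a guarantee.

```shell
#!/bin/sh
# device_is_mounted DEVICE [MOUNTS_FILE]
# Succeeds (exit 0) if DEVICE, or anything whose name starts with DEVICE
# (for example /dev/sda1 for /dev/sda), is listed as a mount source in
# MOUNTS_FILE. MOUNTS_FILE defaults to /proc/mounts.
device_is_mounted() {
    dev=$1
    mounts=${2:-/proc/mounts}
    # Field 1 of every line in the mounts table is the source device.
    awk -v dev="$dev" '
        index($1, dev) == 1 { found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$mounts"
}
```

A cautious invocation would then be something like: device_is_mounted /dev/sda || badblocks -s /dev/sda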

Revision history for this message

lotus49 (lotus-49) wrote on 2009-12-16: Re: [Bug 445852] Re: devkit-disks-probe-ata-smart causes HSM Violations on SSD, and potential hardware death

#128

The workaround did the trick for me too. I hadn't suffered any
corruption but I was considering going back to Jaunty and I am pleased
not to have to.

Simon

Sent from My iPhone

On 16 Dec 2009, at 14:28, Gav Mack <email address hidden> wrote:

> @Andrew Simpson:  Many thanks for the instructions. The workaround has
> dropped my boot time from always over 2 minutes to 30 seconds, what I
> was expecting back in late September when I installed the Beta with  
> the
> Super Talent SSD!  Almost 3 months of woe now at an end thank  
> goodness.
> Recreated my user account from scratch because after further
> investigation my other half thought it was a good idea to delete the
> timed out applets including window-picker so I couldn't put them back
> again!
>
> -- 
> devkit-disks-probe-ata-smart causes HSM Violations on SSD, and  
> potential hardware death
> https://bugs.launchpad.net/bugs/445852
> You received this bug notification because you are a direct subscriber
> of the bug.
>
> Status in The Linux Kernel: Confirmed
> Status in “devicekit-disks” package in Ubuntu: Triaged
> Status in “libatasmart” package in Ubuntu: Triaged
>
> Bug description:
> TEMPORARY WORK AROUND FOR THIS PROBLEM:
>
> 1. sudo gedit /lib/udev/rules.d/95-devkit-disks.rules
>
> 2. locate the following lines (about 1/3 the way into the file;  
> search for "smart")
>
> # ATA disks driven by libata
> KERNEL=="sd*[!0-9]", ATTR{removable}=="0", ENV{ID_BUS}=="ata", ENV{DEVTYPE}=="disk", IMPORT{program}="devkit-disks-probe-ata-smart $tempnode"
>
> 3. comment out the second line by adding a # in front, so you should  
> have
>
> # ATA disks driven by libata
> #KERNEL=="sd*[!0-9]", ATTR{removable}=="0", ENV{ID_BUS}=="ata", ENV{DEVTYPE}=="disk", IMPORT{program}="devkit-disks-probe-ata-smart $tempnode"
>
> 4. save the file and reboot
>
>
> BUG DESCRIPTION FOLLOWS:
>
> In the Karmic beta I experience ssd stalls during the boot process.   
> It happens almost everytime before xsplash loads and happens again  
> frequently between logging into gdm and the desktop loading.  When  
> it happens during login I think it is making gnome time out on  
> loading panel items as I get errors related to lots of panel items  
> failing to load.  If I log out and back in again when the ssd isn't  
> stalled the panel items load fine.
>
> When it happens the following messages appear before xplash (or in  
> dmesg when it happens after gdm):
>
> ata2: lost interrupt (Status 0x58)
> ata2: drained 16384 bytes to clear DRQ.
> ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
> ata2.00: BMDMA stat 0x4
> ata2.00: cmd c8/00:40:cb:60:32/00:00:00:00:00/e0 tag 0 dma 32768 in
> res 58/00:40:cb:60:32/00:00:00:00:00/e0 Emask 0x2 (HSM violation)
> ata2.00: status: { DRDY DRQ }
> ata2: soft resetting link
> ata2.00: configured for UDMA/66
> ata2: EH complete
>
> I did not have this issue in jaunty with this hardware and I don't  
> think it has happened once the system is fully loaded.  I am running  
> karmic unr on an Acer Aspire One netbook.
>
> ProblemType: Bug
> AplayDevices:
>  **** List of PLAYBACK Hardware Devices ****
>  card 0: Intel [HDA Intel], device 0: ALC268 Analog [ALC268 Analog]
>    Subdevices: 1/1
>    Subdevice #0: subdevice #0
> Architecture: i386
> ArecordDevices:
>  **** List of CAPTURE Hardware Devices ****
>  card 0: Intel [HDA Intel], device 0: ALC268 Analog [ALC268 Analog]
>    Subdevices: 1/1
>    Subdevice #0: subdevice #0
> AudioDevicesInUse:
>  USER        PID ACCESS COMMAND
>  /dev/snd/controlC0:  luke       1990 F.... pulseaudio
> CRDA: Error: [Errno 2] No such file or directory
> Card0.Amixer.info:
>  Card hw:0 'Intel'/'HDA Intel at 0x58540000 irq 16'
>    Mixer name    : 'Realtek ALC268'
>    Components    : 'HDA:10ec0268,1025015b,00100101'
>    Controls      : 9
>    Simple ctrls  : 6
> CheckboxSubmission: 12ef539f3788bfbc46bc56b5c28128a6
> CheckboxSystem: c69722ecac764861be52925fa50b4dcc
> Date: Wed Oct  7 17:54:56 2009
> DistroRelease: Ubuntu 9.10
> HibernationDevice: RESUME=UUID=8d44b89b-2edb-4c02-a4be-94bd25b65081
> MachineType: Acer AOA110
> Package: linux-image-2.6.31-12-generic 2.6.31-12.40
> ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.31-12-generic  
> root=UUID=039a096e-3486-4898-9eeb-44a705f8b7fd ro quiet splash  
> elevator=noop usbcore.autosuspend=1
> ProcEnviron:
>  LANG=en_CA.UTF-8
>  SHELL=/bin/bash
> ProcVersionSignature: Ubuntu 2.6.31-12.40-generic
> RelatedPackageVersions: linux-firmware 1.21
> RfKill:
>  0: phy0: Wireless LAN
>   Soft blocked: no
>   Hard blocked: no
> SourcePackage: linux
> Tags:  ubuntu-unr
> Uname: Linux 2.6.31-12-generic i686
> XsessionErrors:
>  (gnome-settings-daemon:2006): GLib-CRITICAL **: g_propagate_error:  
> assertion `src != NULL' failed
>  (gnome-settings-daemon:2006): GLib-CRITICAL **: g_propagate_error:  
> assertion `src != NULL' failed
>  (nautilus:2092): Eel-CRITICAL **: eel_preferences_get_boolean:  
> assertion `preferences_is_initialized ()' failed
>  (polkit-gnome-authentication-agent-1:2118): GLib-CRITICAL **:  
> g_once_init_leave: assertion `initialization_value != 0' failed
>  (gnome-panel:2048): Gdk-WARNING **: /build/buildd/gtk+2.0-2.18.2/ 
> gdk/x11/gdkdrawable-x11.c:952 drawable is not a pixmap or window
> dmi.bios.date: 10/06/2008
> dmi.bios.vendor: Acer
> dmi.bios.version: v0.3309
> dmi.board.asset.tag: Base Board Asset Tag
> dmi.board.vendor: Acer
> dmi.board.version: Base Board Version
> dmi.chassis.type: 1
> dmi.chassis.vendor: Chassis Manufacturer
> dmi.chassis.version: Chassis Version
> dmi.modalias: dmi:bvnAcer:bvrv0.3309:bd10/06/2008:svnAcer:pnAOA110:pvr1:rvnAcer:rn:rvrBaseBoardVersion:cvnChassisManufacturer:ct1:cvrChassisVersion:
> dmi.product.name: AOA110
> dmi.product.version: 1
> dmi.sys.vendor: Acer
>
>
>
> To unsubscribe from this bug, go to:
> https://bugs.launchpad.net/linux/+bug/445852/+subscribe

Revision history for this message

In freedesktop.org Bugzilla #25673, Lennart-poettering (lennart-poettering) wrote on 2009-12-17:

#129

(In reply to comment #3)
> Most distros only switched to using your code recently; previously we've all
> been using smartmontools and the like which don't cause this problem.

Is it actually verified that this doesn't happen with smartmontools? I mean, smartmontools in contrast to libatasmart does not issue commands that early after initialization/hotplug? So, is it verified that the problem is with the way libatasmart issues its commands and not simply due to the context those commands are executed in?

> The LP bug has the differencing straces between the two if that's helpful?

I only see a lot of noise in that bug report, could you point me to the two straces?

Revision history for this message

In freedesktop.org Bugzilla #25673, Lennart-poettering (lennart-poettering) wrote on 2009-12-17:

#130

(In reply to comment #3)
> Most distros only switched to using your code recently;

The simple fact is that rawhide (and the ubuntu betas) had this code for months already, and we got quite a few bug reports, but never something about this issue. This issue only appeared a couple of weeks back, and hence I am wondering if something else changed in that time, because libatasmart didn't.

Revision history for this message

Tommy Trussell (tommy-trussell) wrote on 2009-12-17:

#131

UNFORTUNATELY the workaround is not a good idea for a NEW installation on my netbook. I was able to edit /lib/udev/rules.d/95-devkit-disks.rules before the installer rebooted into the new system, but as soon as the system installed the first set of critical and recommended updates, the filesystem was thoroughly trashed before update-manager had even finished.

Is there a single package I can pin in apt, or can I just remove or somehow deactivate libatasmart itself instead of editing the udev rule?

Revision history for this message

Scott James Remnant (Canonical) (canonical-scott) wrote on 2009-12-17: Re: [Bug 445852] Re: devkit-disks-probe-ata-smart causes HSM Violations on SSD, and potential hardware death

#132

On Thu, 2009-12-17 at 15:44 +0000, Tommy Trussell wrote:

> UNFORTUNATELY the workaround is not a good idea for a NEW installation
> on my netbook. I was able to edit /lib/udev/rules.d/95-devkit-
> disks.rules before the installer rebooted into the new system, but as
> soon as the system installed the first set of critical and recommended
> updates, the filesystem was thoroughly trashed before update-manager had
> even finished.
>
Then re-apply the change before rebooting after those updates.

You can use dpkg-divert as described above in the bug comments to ensure
that updates do not affect this file - but then you won't get the proper
fix later and may indeed cause yourself future bugs down the line.

Scott
--
Scott James Remnant
<email address hidden>

Revision history for this message

Tommy Trussell (tommy-trussell) wrote on 2009-12-17:

#133

@scott Yes, I think dpkg-divert or some other technique is essential because as I said, I didn't even reboot -- the filesystem was already trashed BEFORE update-manager had FINISHED -- apparently it had already reverted the change and reloaded udev in one of its updates.

Revision history for this message

Cris (cristiano.p) wrote on 2009-12-17:

#134

While upgrading my eeepc 900 to Karmic (before I had to recover the SSD and fall back to Jaunty),
I noticed a very marked slowdown of disk operations: the upgrade process took nearly 5 hours.

Which makes me believe that the disk corruption happens already during the upgrade/install process.

Revision history for this message

In freedesktop.org Bugzilla #25673, Andrewnz-simpson (andrewnz-simpson) wrote on 2009-12-17:

#135

(In reply to comment #5)
> The simple fact is that rawhide (and the ubuntu betas) had this code for months
> already, and we got quite a few bug reports, but never something about this
> issue. This issue only appeared a couple of weeks back, and hence I am
> wondering if something else changed in that time, because libatasmart didn't.
>

The Ubuntu bug report goes back to early October, however the link with libatasmart was only made very recently.

There is no interplay with any specific kernel version: The bug has been confirmed on 2.6.28, 2.6.29, 2.6.30, 2.6.31 and 2.6.32. Both Ubuntu patched versions and mainline kernels are affected.

The bug has been confirmed as libatasmart only; testing has shown that smartmontools does not give the same problem. Early initialisation is a possible issue, though the problem can be readily reproduced at any time.

Comment #129 of the Ubuntu Bug Report is well worth reading, because it seems to be isolating the bug.

I will attach the straces from the Ubuntu Bug Report to this report

Revision history for this message

In freedesktop.org Bugzilla #25673, Andrewnz-simpson (andrewnz-simpson) wrote on 2009-12-17:

#136

Created an attachment (id=32167)
Trace from libatasmart

Revision history for this message

In freedesktop.org Bugzilla #25673, Andrewnz-simpson (andrewnz-simpson) wrote on 2009-12-17:

#137

Created an attachment (id=32168)
Trace from smartctl

Revision history for this message

Jean-Louis (jean-louis) wrote on 2009-12-17:

#138

Hi, sorry for my bad English.

I don't have an SSD, but I've looked at the libatasmart code for another bug and I think I can help a little with this one.

In the attachment smartctl-output I can see: "ATA Version is: 5"

According to this PDF (2.7MB) http://www.t10.org/t13/project/d1321r3-ATA-ATAPI-5.pdf the behaviour differs depending on whether or not the drive implements the "PACKET Command feature set".

In particular, if it is implemented, all the SMART commands used in libatasmart are prohibited.

If the "PACKET Command feature set" is implemented, the IDENTIFY DEVICE command shall return command aborted, but in libatasmart the return value is lost and "d->identify_valid = FALSE;" is never set.

Try adding "d->identify_valid = FALSE;" in the function disk_identify_device(),
after (line 741) "if ((ret = disk_command(d, SK_ATA_COMMAND_IDENTIFY_DEVICE, SK_DIRECTION_IN, cmd, d->identify, &len)) < 0)"
and before (line 742) "return ret;",

like this:

        if ((ret = disk_command(d, SK_ATA_COMMAND_IDENTIFY_DEVICE, SK_DIRECTION_IN, cmd, d->identify, &len)) < 0) {
                d->identify_valid = FALSE;
                return ret;
        }

Revision history for this message

In freedesktop.org Bugzilla #25673, Lennart-poettering (lennart-poettering) wrote on 2009-12-18:

#139

(In reply to comment #6)

>
> Comment #129 of the Ubuntu Bug Report is well worth reading, because it seems
> to be isolating the bug.

I don't think so. That proposed patch is bogus, identify_valid is FALSE unless set to TRUE anyway.

Also, supposedly SMART does work with smartmontools, just not with libatasmart, right? That comment suggests that SMART would not work at all with those SSDs.

Andrew, do you have one of the SSDs affected? Could you step through the code and figure out exactly which command triggers the problem?

Revision history for this message

In freedesktop.org Bugzilla #25673, Andrewnz-simpson (andrewnz-simpson) wrote on 2009-12-18:

#140

(In reply to comment #9)

>
> Andrew, do you have one of the SSDs affected? Could you step through the code
> and figure out exactly which command triggers the problem?
>

You would have to give me very detailed instructions as to how to do it. Programming in C is not one of my skills and I don't have a programming background.

More realistically, there are at least a couple of people on the Ubuntu Bug list that would know how to do this. Is it worth putting out a query?

Revision history for this message

In freedesktop.org Bugzilla #25673, Alan Pope 🍺🐧🐱 🦄 (popey) wrote on 2009-12-18:

#141

In terms of how long ago this bug has existed, I originally filed a bug on this issue back in June '09. So it's existed long before 9.10 (October) released.

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/387272

Revision history for this message

In freedesktop.org Bugzilla #25673, Jelot-freedesktop (jelot-freedesktop) wrote on 2009-12-18:

#142

(In reply to comment #9)
> (In reply to comment #6)
>
> >
> > Comment #129 of the Ubuntu Bug Report is well worth reading, because it seems
> > to be isolating the bug.
>
> I don't think so. That proposed patch is bogus, identify_valid is FALSE unless
> set to TRUE anyway.

I'm the author of comment #129 in launchpad <https://bugs.launchpad.net/ubuntu/+source/libatasmart/+bug/445852/comments/129>

I'm quite a beginner with C, but I know that if a variable is not initialized its value is garbage.

I can't find where d->identify_valid is zeroed or set to FALSE. Obviously, I could have missed it.

>
> Also, supposedly SMART does work with smartmontools, just not with libatasmart,
> right? That comment suggests that SMART would not work at all with those SSDs.

I don't know the internals of smartmontools... I have only read some pages of this PDF (2.7MB) http://www.t10.org/t13/project/d1321r3-ATA-ATAPI-5.pdf

On page 52 it says:

[quote]
Devices that implement the PACKET Command feature set shall not implement the SMART feature set as described in this subclause.
Devices that implement the PACKET Command feature set and SMART shall implement SMART as defined by the command packet set implemented by the device.
[/quote]

and on page 196 and the following pages it says:

[quote]
SMART feature set.
− Mandatory when the SMART feature set is implemented.
− Use prohibited when the PACKET Command feature set is implemented.
[/quote]

I don't know how SMART is implemented for the command packet set, and I don't know whether this SSD implements the command packet set, but this *could be* the problem (IMHO)

Revision history for this message

mint-one (d-zschokke) wrote on 2009-12-18:

#143

Hi folks

Applying the patch works... You should apply it first to your install USB stick, and you will notice how fast gparted detects your drives. Then apply the patch after installing Karmic. Never reboot the system without applying the patch! Otherwise you will be writing lots of zeroes to your SSD again. And now to the best part: every kernel or grub update (I'm not sure which) will reset this patch! Apply it after each major update or your SSD will be lost in space again.

I'm still up after having updated the system successfully. Let's see how long this is going to last. This bug is hell. Priority should be set to "hell".

good luck, dominic (on a eee pc 900)

Revision history for this message

Tommy Trussell (tommy-trussell) wrote on 2009-12-18:

#144

Would it be useful to create a "dummy" libatasmart4 package that responds to its calls with something innocuous but doesn't actually probe the SMART status? I see it's not easy to just yank it out because of other packages' dependencies upon it. I would prefer to disable the package in a way that survives ordinary software updates.

I'm not sure how @mint-one was able to avoid filesystem breakage... I wasn't able to reapply the patch in time, though maybe I was just especially un-lucky or un-careful.

P.S.: The Patriot Lite 32GB SSD upgrade on my ASUS 900 seems most susceptible to damage when the root partition or the root + swap partitions completely fill the drive. I don't know what that might mean, except that it's a "bigger target" for breakage. The beta 9.10 NBR installers (prior to October) could not even finish the job without completely trashing the filesystem before grub was installed.

Revision history for this message

Tommy Trussell (tommy-trussell) wrote on 2009-12-18:

#145

@jean-louis: do you have your patched libatasmart4 code in a PPA? I would be pleased to test it.

Revision history for this message

mint-one (d-zschokke) wrote on 2009-12-18:

#146

For Tommy Trussell and to whom it may concern:

1. Create a bootable USB medium.
2. Apply the patch on the USB medium! (Comment out the rule line, see top of this bug)
3. Boot into the live session (don't install directly)
4. Install karmic
5. Don't reboot
6. Apply the patch on the boot partition (navigate there with nautilus and copy the path (quite a strange one), paste it into a terminal, sudo gedit $path, save)
7. Reboot.
8. Update your system (language support, new kernel, grub, misc updates)
9. Don't reboot.
10. sudo gedit /lib/udev/rules.d/95-devkit-disks.rules
10.1 and comment the line out again... it was reset to the faulty default!
11. Reboot

here you go. This worked for me. Still working after several reboots.. and its fast on this old and small eee pc.

Good Luck!
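
(Editorial sketch, not part of the original comment.) Since an update can silently put the original rule back, it helps to test the rules file before every reboot instead of relying on memory. The function name smart_probe_rule_active is hypothetical; it only looks for an uncommented line that mentions the probe helper.

```shell
#!/bin/sh
# smart_probe_rule_active RULES_FILE
# Succeeds (exit 0) if RULES_FILE still contains an uncommented line that
# runs devkit-disks-probe-ata-smart, meaning the workaround is NOT in
# effect and the file needs to be patched again before rebooting.
smart_probe_rule_active() {
    # Optional leading whitespace, then a first character that is neither
    # '#' nor whitespace, then the helper name somewhere later on the line.
    grep -q '^[[:space:]]*[^#[:space:]].*devkit-disks-probe-ata-smart' "$1"
}
```

Typical use after steps 8 and 10 above: smart_probe_rule_active /lib/udev/rules.d/95-devkit-disks.rules && echo "rule is active again, patch before rebooting"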

Revision history for this message

Jean-Louis (jean-louis) wrote on 2009-12-18:

#147

> @jean-louis: do you have your patched libatasmart4 code in a PPA?
> I would be pleased to test it.

No, I don't have a ppa.
I'm new on launchpad and I have yet to understand its features.

My comment was reported upstream by Andrew Simpson, but Lennart Poettering says that the proposed patch is bogus (https://bugs.freedesktop.org/show_bug.cgi?id=25673#c9).

I'm quite a beginner with C, but I can't find where d->identify_valid is zeroed or set to FALSE. Obviously, I could have missed it.

Now I will create an account on freedesktop to ask.

Revision history for this message

Tommy Trussell (tommy-trussell) wrote on 2009-12-18:

#148

@mint-one -- when I did that, my system was corrupted BEFORE step #8 was finished.

Revision history for this message

Andrew Simpson (andrew-simpson) wrote on 2009-12-18:

#149

O.K., I think I have a better workaround for this bug.

The problem is that udev reads the udev rule files into memory and then uses inotify to watch for changes in the file. As soon as the rule file changes, udev is informed and re-reads the file. That means that when apt-get updates the rule file, damage can be done before you get a chance to patch it again.

What I have done is put a dummy file in for devkit-disks-probe-ata-smart and used dpkg-divert to make the system accept the dummy file.

Run the following command:

$ sudo dpkg-divert --add --rename --divert /lib/udev/devkit-disks-probe-ata-smart.bak /lib/udev/devkit-disks-probe-ata-smart

This renames the existing file to devkit-disks-probe-ata-smart.bak and tells dpkg / apt-get to install any new updates to the _changed_ file name.

To see your divert (and others in the system):

$ sudo dpkg-divert --list

Now we create a dummy file:

$ sudo /lib/udev/nano devkit-disks-probe-ata-smart (or some other editor of your choice)

#!/bin/bash
#
exit 0

Save the file.

This dummy file does precisely nothing, but it allows udev to run it...

Make the new dummy file executable:

$ sudo chmod 755 /lib/udev/devkit-disks-probe-ata-smart

That's it.

When the bug gets really fixed, we need to remove the dummy file and divert:

$ sudo rm /lib/udev/devkit-disks-probe-ata-smart
$ sudo dpkg-divert --rename --remove /lib/udev/devkit-disks-probe-ata-smart
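
(Editorial sketch, not part of the original comment.) The "create a dummy file" and chmod steps above can be done without an editor. The function name write_probe_stub is hypothetical, and the stub uses /bin/sh instead of /bin/bash, which should make no difference for a script that only exits. It refuses to overwrite an existing file so that it cannot clobber the real helper if the dpkg-divert step was skipped.

```shell
#!/bin/sh
# write_probe_stub PATH
# Create an executable script at PATH that does nothing and exits 0.
# Refuses to overwrite an existing file: the real helper has to be
# renamed out of the way (the dpkg-divert step) before this is used.
write_probe_stub() {
    stub=$1
    if [ -e "$stub" ]; then
        echo "refusing to overwrite $stub" >&2
        return 1
    fi
    printf '#!/bin/sh\nexit 0\n' > "$stub" || return 1
    chmod 755 "$stub"
}
```

Run as root after the divert is in place: write_probe_stub /lib/udev/devkit-disks-probe-ata-smart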

Revision history for this message

Andrew Simpson (andrew-simpson) wrote on 2009-12-18:

#150

Carrying on from above:

Here's how to patch and install a system safely from the LiveCD (or live USB) of Ubuntu 9.10.

I booted up the LiveCD and patched the live system as above. That made the live system safe to use. I then installed from the LiveCD (no errors - good).

However instead of immediately rebooting, I patched the SSD from the live system:

$ sudo mkdir /target

(In my case it already existed from the install)

$ sudo mount /dev/sda1 /target

$ sudo chroot /target

You are now in the new (SSD) system as root, but safely running on the patched LiveCD. Follow the steps above, but leave out 'sudo', because you are root. When finished you can leave chroot by:

# exit

------------------

Edit on previous comment:

$ sudo /lib/udev/nano devkit-disks-probe-ata-smart (or some other editor of your choice)

-- should read:

$ sudo nano /lib/udev/devkit-disks-probe-ata-smart (or some other editor of your choice)
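
(Editorial sketch, not part of the original comment.) Before rebooting out of the live session it is worth confirming that the dummy file really is in place on the installed system. The function name probe_is_stub is hypothetical. It assumes the dummy file is a script starting with "#!" and that the original helper is a compiled binary, and it relies on head -c, which is widely available but not required by POSIX.

```shell
#!/bin/sh
# probe_is_stub ROOT
# Succeeds (exit 0) if ROOT/lib/udev/devkit-disks-probe-ata-smart exists
# and begins with "#!", i.e. it is a script such as the dummy file and
# not a compiled helper. Pass "" as ROOT to check the running system.
probe_is_stub() {
    probe="$1/lib/udev/devkit-disks-probe-ata-smart"
    [ -f "$probe" ] || return 1
    # Only the first two bytes are read; an ELF binary starts with 0x7f 'E'.
    [ "$(head -c 2 "$probe")" = '#!' ]
}
```

With the new system mounted as described above: probe_is_stub /target && echo "dummy probe is installed on the target"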

Revision history for this message

Tommy Trussell (tommy-trussell) wrote on 2009-12-18:

#151

@Andrew Simpson: Thank you! but be sure to substitute the path to an editor that works ;-) I'm testing this now.

sudo gedit /lib/udev/devkit-disks-probe-ata-smart
or
sudo nano /lib/udev/devkit-disks-probe-ata-smart

Revision history for this message

Tommy Trussell (tommy-trussell) wrote on 2009-12-18:

#152

Finished testing, and the new workaround procedure works fine on my ASUS. (I did have some trouble on the first reboot after update-manager finishes, but it looks like a grub issue, probably not related to this bug.)

Revision history for this message

In freedesktop.org Bugzilla #25673, Lennart-poettering (lennart-poettering) wrote on 2009-12-19:

#153

(In reply to comment #12)
> (In reply to comment #9)
> > (In reply to comment #6)
> >
> > >
> > > Comment #129 of the Ubuntu Bug Report is well worth reading, because it seems
> > > to be isolating the bug.
> >
> > I don't think so. That proposed patch is bogus, identify_valid is FALSE unless
> > set to TRUE anyway.
>
> I'm the author of comment #129 in launchpad
> <https://bugs.launchpad.net/ubuntu/+source/libatasmart/+bug/445852/comments/129>
>
> I'm quite a beginner with c, but I know that if a variable is not initialized
> its value is garbage.
>
> I don't find how d->identify_valid is zeroed or setted FALSE. Obviously, I
> could be missed it.

The initial calloc() call for allocating the SkDisk structure does the zero initialization.

Revision history for this message

In freedesktop.org Bugzilla #25673, Jelot-freedesktop (jelot-freedesktop) wrote on 2009-12-19:

#154

(In reply to comment #13)
> (In reply to comment #12)
> > I don't find how d->identify_valid is zeroed or setted FALSE. Obviously, I
> > could be missed it.
>
> The initial calloc() call for allocating the SkDisk structure does the zero
> initialization.
>

Uh... thanks for clarification and sorry for wasting your time.

Revision history for this message

In freedesktop.org Bugzilla #25673, 4280829-noduck (4280829-noduck) wrote on 2009-12-22:

#155

(In reply to comment #9)
> (In reply to comment #6)
>
> >
> > Comment #129 of the Ubuntu Bug Report is well worth reading, because it seems
> > to be isolating the bug.
>
> I don't think so. That proposed patch is bogus, identify_valid is FALSE unless
> set to TRUE anyway.
>
> Also, supposedly SMART does work with smartmontools, just not with libatasmart,
> right? That comment suggests that SMART would not work at all with those SSDs.
>
> Andrew, do you have one of the SSDs affected? Could you step through the code
> and figure out exactly which command triggers the problem?
>

I ran gdb on devkit-disks-probe-ata-smart/libatasmart. I found that it is the ioctl issued on behalf of disk_smart_read_thresholds that triggers the HSM violation in the kernel log: disk_smart_read_thresholds calls disk_command with the argument SK_SMART_COMMAND_READ_THRESHOLDS, which calls disk_passthrough_16_command, which calls sg_io, which does an ioctl.

There is one earlier call to disk_command (with SK_ATA_COMMAND_IDENTIFY_DEVICE), but that does not trigger the HSM violation in the kernel log. So I believe it is the way that the SSD reacts to the READ_THRESHOLDS command that throws off the kernel.

Raf.
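
(Editorial sketch, not part of the original comment.) When comparing your own kernel log with the traces in this report, it can help to decode the status byte from lines such as "lost interrupt (Status 0x58)". The function name decode_ata_status is hypothetical. The bit values are the standard ATA status register bits, and only the five bits that appear in the kernel's "status: { ... }" line are decoded. Bit 4 (0x10) is also set in 0x58 but is not shown in the log lines quoted in this report, so it is skipped here.

```shell
#!/bin/sh
# decode_ata_status BYTE
# Print the names of the ATA status register bits set in BYTE.
# BYTE may be decimal or hexadecimal (for example 88 or 0x58).
decode_ata_status() {
    status=$(( $1 ))
    names=""
    [ $(( status & 0x80 )) -ne 0 ] && names="$names BSY"
    [ $(( status & 0x40 )) -ne 0 ] && names="$names DRDY"
    [ $(( status & 0x20 )) -ne 0 ] && names="$names DF"
    [ $(( status & 0x08 )) -ne 0 ] && names="$names DRQ"
    [ $(( status & 0x01 )) -ne 0 ] && names="$names ERR"
    # Drop the leading space left by the first appended name.
    echo "${names# }"
}
```

decode_ata_status 0x58 prints "DRDY DRQ", which matches the "status: { DRDY DRQ }" line in the logs above: the drive reports that it is ready and still wants a data transfer, while the host was waiting for an interrupt.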

Revision history for this message

MFV (mfv) wrote on 2009-12-29:

#156

I can confirm this occurs on a stock Asus EEE 900 (original Celeron Linux model with 4GB + 16GB SSDs - it occurs on both).

Bug Watch Updater (bug-watch-updater) on 2009-12-31

Changed in libatasmart:
status:	Unknown → Confirmed

Revision history for this message

Vishal Rao (vishalrao) wrote on 2010-01-02:

#157

I've filed https://bugs.launchpad.net/bugs/502219; I'm not sure whether it's the same bug or just related, and whether the same workaround applies.

Revision history for this message

jslater (jslater) wrote on 2010-01-06:

#158

Does this bug affect _everyone_ with an eeePC [900]? A "Tier 1" supported netbook platform.

It certainly destroyed the contents of my SSD.

The workaround works, but there is no mention on https://wiki.ubuntu.com/HardwareSupport/Machines/Netbooks or elsewhere of this very serious bug. At the very least warning potential users of this bug might save someone from losing all of their data.

This bug has existed for months and has seriously dented my impression of Ubuntu.

Revision history for this message

Tommy Trussell (tommy-trussell) wrote on 2010-01-06:

#159

@jslater: from earlier tests it seemed it did not affect everyone, or at least not equally. For example, I have an ASUS EeePC 900 (4GB SSD, no built-in webcam, purchased from Target) and I see the problem very clearly on my Patriot Lite upgraded SSD but not on the stock ASUS 4GB SSD. I haven't swapped the 4GB SSD back in since we have discovered the trigger -- it's possible the stock SSD was somewhat affected but didn't trash data as thoroughly or something. When I get back to my office later I might get out the screwdriver set and try it.

I agree that this bug should be noted somewhere on that netbooks wiki page. Please feel free to add it! (Though it would be hard to know exactly which models might be affected. Maybe ALL of them, depending on which SSD is installed.)

I believe some SSDs have been reported that are not installed in netbooks.

If you see a good place to include the warning, please add it.

Revision history for this message

rogmorri (frontporsche) wrote on 2010-01-06:

#160

The factory-installed SSD on my Aspire Acer One 110-1722, a tier-2 netbook, died a few months ago. In retrospect, I think the issue was this very bug.

(I've since replaced the SSD with a 16G after-market drive, which also suffered from this problem.)

Wouldn't a problem like this, where there's no easy workaround for avoiding the problem at install time, call for releasing a new 9.10.1 Ubuntu iso?

Revision history for this message

Alan Pope 🍺🐧🐱 🦄 (popey) wrote on 2010-01-07:

#161

Just for reference I have an EEE 900 which has been running Jaunty 9.04 for a while just fine. I upgraded to 9.10, and before rebooting implemented the change to /lib/udev/rules.d/95-devkit-disks.rules as indicated in the description. This seemed to work well.

Revision history for this message

In freedesktop.org Bugzilla #25673, Jelot-freedesktop (jelot-freedesktop) wrote on 2010-01-07:

#162

In Karmic there is a new stable kernel 2.6.31-17.54 <https://bugs.launchpad.net/ubuntu/+source/linux/+bug/480144> that refers to upstream kernel 2.6.31.6; the changelog <http://kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.31.6> mentions commit 9982364654c186acd48c3070dcf6a76c69e540cc with this description:

[quote]
commit 9982364654c186acd48c3070dcf6a76c69e540cc
Author: Tejun Heo <email address hidden>
Date: Fri Oct 16 13:00:51 2009 +0900

libata: fix internal command failure handling

commit f4b31db92d163df8a639f5a8c8633bdeb6e8432d upstream.

When an internal command fails, it should be failed directly without invoking EH. In the original implemetation, this was accomplished by letting internal command bypass failure handling in ata_qc_complete(). However, later changes added post-successful-completion handling to that code path and the success path is no longer adequate as internal command failure path. One of the visible problems is that internal command failure due to timeout or other freeze conditions would spuriously trigger WARN_ON_ONCE() in the success path.

This patch updates failure path such that internal command failure handling is contained there.
[/quote]

Could be related to this bug?

Revision history for this message

Jean-Louis (jean-louis) wrote on 2010-01-07:

#163

In Karmic there is a new stable kernel 2.6.31-17.54
<https://bugs.launchpad.net/ubuntu/+source/linux/+bug/480144> that refers to
upstream kernel 2.6.31.6; the changelog
<http://kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.31.6> mentions
commit 9982364654c186acd48c3070dcf6a76c69e540cc with this description:

[quote]
commit 9982364654c186acd48c3070dcf6a76c69e540cc
Author: Tejun Heo <email address hidden>
Date: Fri Oct 16 13:00:51 2009 +0900

libata: fix internal command failure handling

commit f4b31db92d163df8a639f5a8c8633bdeb6e8432d upstream.

When an internal command fails, it should be failed directly without invoking
EH. In the original implemetation, this was accomplished by letting internal
command bypass failure handling in ata_qc_complete(). However, later changes
added post-successful-completion handling to that code path and the success
path is no longer adequate as internal command failure path. One of the
visible problems is that internal command failure due to timeout or other
freeze conditions would spuriously trigger WARN_ON_ONCE() in the success path.

This patch updates failure path such that internal command failure handling is
contained there.
[/quote]

Could be related to this bug?

Revision history for this message

Tommy Trussell (tommy-trussell) wrote on 2010-01-15:

#164

Here is a tested procedure for getting an uncorrupted Karmic system installed onto a netbook (tested on my ASUS Eee PC 900).

Installing Ubuntu Karmic 9.10 UNR on a system with an affected SSD

1) Boot from a live Ubuntu "Karmic" 9.10 USB stick or SD card.

2) If the SSD has been trashed by previous encounters with the bug, it may need to be wiped to eliminate bad blocks. Open a terminal and issue the following command (this assumes the SSD mounts to "/dev/sda" -- you must be certain of the device name on your system because everything on it will be erased):

$ sudo dd if=/dev/zero of=/dev/sda bs=1M

3) After step 2 finishes (it can take a while), launch the "Install Ubuntu-Netbook-Remix 9.10" application (ubiquity) and install Ubuntu to your SSD. (If you have sufficient RAM, choose a custom partition and install a single "/" (root) partition without any swap space. I recommend using ext3 or the default ext4. Some recommend using ext2; however, in my experience it does not recover from crash problems gracefully.)

4) When the installer finishes, a dialog will come up suggesting you can restart now. Don't restart yet! While that dialog is open, the install partition should still be mounted at /target ... HOWEVER if you already closed the dialog, open a Terminal and mount the partition:

$ sudo mount /dev/sda1 /target

5) Now chroot into the target system

$ sudo chroot /target

6) The terminal is chrooted into the target system as root (no need for sudo). You can now divert the problematic file on the target system:

# dpkg-divert --add --rename --divert /lib/udev/devkit-disks-probe-ata-smart.bak /lib/udev/devkit-disks-probe-ata-smart

7) Now create a file. You will type three lines directly into the file, finishing with a control-D. (If you make a mistake that you can't fix using backspace, close the file with control-D and use nano or vim to edit the file.)

# cat > /lib/udev/devkit-disks-probe-ata-smart
#!/bin/bash
#
exit 0
[type control-D here]

8) Make the new file executable:

# chmod 755 /lib/udev/devkit-disks-probe-ata-smart

9) exit the chroot and terminal

# exit
$ exit

10) Shutdown, remove the USB stick or SD card, and boot into the new system. Install all software updates as needed.

-------------------------

After you know this bug has been fixed AND after the correct updated devicekit-disks package has been installed on your system, you can re-enable it using these commands:

$ sudo rm /lib/udev/devkit-disks-probe-ata-smart
$ sudo dpkg-divert --rename --remove /lib/udev/devkit-disks-probe-ata-smart
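
Before deleting the file with the first command, it may be worth confirming that it really is the do-nothing stub from step 7 and not a real program. A minimal sketch; the function name is_noop_stub is made up for illustration:

```shell
# Hypothetical helper: succeeds only if the file contains an "exit 0" line
# and every other line is blank or a comment (the "#!" line counts as one).
is_noop_stub() {
    [ -f "$1" ] || return 1
    grep -q -x 'exit 0' "$1" || return 1
    ! grep -v -E '^[[:space:]]*(#.*)?$' "$1" | grep -q -v -x 'exit 0'
}
```

For example: is_noop_stub /lib/udev/devkit-disks-probe-ata-smart && echo "stub in place"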

--------------------------

Revision history for this message

MFV (mfv) wrote on 2010-01-16:

#165

What's the update on this? The workaround seems good, and this issue exists in every current Linux distro? Is a fix actually on its way?

Revision history for this message

Jarige (jarikvh) wrote on 2010-01-19:

#166

I noticed symptoms similar to this bug after adding "elevator=noop" to this line in /etc/default/grub: GRUB_CMDLINE_LINUX_DEFAULT="quiet splash elevator=noop".
After removing it from the command line again, it worked 'normally' again. By normally I mean that all programs seem to stop responding when the SSD is in use (either write or read). Installing a program makes every other program 'crash', and the GUI (even the mouse) stops responding. This does not happen all the time, though. As I'm typing on my AAO (8GB SSD) I see the SSD LED blinking every now and then, but it affects other programs pretty often.

Revision history for this message

Vishal Rao (vishalrao) wrote on 2010-01-19:

#167

FYI, my (solved) problem is/was NCQ not SMART as you can see in this comment in another bug: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/502219/comments/7

If you see "failed command READ FPDMA QUEUED" kind of logs in dmesg then that might be for you...

Basically you need to pass "libata.force=noncq" to the Linux kernel as a boot parameter, which is what I am doing, but I'm not sure what to do if people have multiple drives, some of which support NCQ properly...
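
On the multiple-drive question: as far as I know, libata.force accepts an optional "PORT[.DEVICE]:" prefix before the value, so NCQ can be turned off for one drive only. A sketch of the /etc/default/grub line, assuming the affected drive shows up as ata1.00 in dmesg (that port number is a placeholder, not something taken from this report):

```shell
# /etc/default/grub (fragment). "1.00" is a placeholder for the ataN.MM
# name of the affected drive; drop the prefix to disable NCQ on all drives.
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash libata.force=1.00:noncq"
```

Run sudo update-grub afterwards so the change is picked up at the next boot.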

Revision history for this message

Юрий Аполлов (apollovy) wrote on 2010-01-29:

#168

Launchpad says that this bug has a patch, but I cannot find one in the usual place: there are instructions, but no patch. Can anyone write a patch script that works around this problem?

Bug Watch Updater (bug-watch-updater) on 2010-02-03

Changed in linux:
status:	Confirmed → Invalid

Revision history for this message

ectropionized (ectropionized-deactivatedaccount) wrote on 2010-02-10:

#169

After upgrading the kernel to 2.6.31-19, devicekit-disks (007-2ubuntu4), and enabling DMA again, I am receiving no errors after testing extensively. I was one of those plagued with this bug, causing havoc on my netbook SSD. Can anyone else confirm a resolution on their end? It's entirely possible the time period for testing became anomalous, although I would figure unlikely given the consistency of errors previous. Since I am not yet receiving errors I thought it was worth opening continued discussion.

Revision history for this message

ectropionized (ectropionized-deactivatedaccount) wrote on 2010-02-10:

#170

Scratch that on the resolution. Although I'm no longer receiving data corruption with DMA enabled, I just received this:

[ 2962.988208] ata2: lost interrupt (Status 0x58)
[ 2962.988297] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 2962.988306] ata2.00: BMDMA stat 0x4
[ 2962.988323] ata2.00: cmd ca/00:08:40:e9:7d/00:00:00:00:00/e0 tag 0 dma 4096 out
[ 2962.988326] res 58/00:08:40:e9:7d/00:00:00:00:00/e0 Emask 0x2 (HSM violation)
[ 2962.988333] ata2.00: status: { DRDY DRQ }
[ 2962.988376] ata2: soft resetting link
[ 2963.196543] ata2.00: configured for UDMA/66
[ 2963.196581] ata2: EH complete
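
For anyone wanting to check their own logs for messages like the ones above, here is a rough sketch of two filters that read kernel log text on stdin. The function names are made up for illustration:

```shell
# Count kernel log lines reporting an HSM violation (log text on stdin).
count_hsm_violations() {
    grep -c 'HSM violation' || true
}

# List the ataN.MM devices that logged an exception (log text on stdin).
list_exception_devices() {
    sed -n 's/.*\(ata[0-9][0-9]*\.[0-9][0-9]*\): exception.*/\1/p' | sort -u
}
```

Typical use would be "dmesg | count_hsm_violations" and "dmesg | list_exception_devices".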

Revision history for this message

Will (will-berriss) wrote on 2010-02-12:

#171

I have just put a Super Talent 32GB SSD into my AA1 netbook and have installed Ubuntu 9.10 and I have this bug.

I don't want to damage my SSD as it was expensive.

What are my options to avoid Ubuntu damaging my SSD? Do I need to stop using 9.10 and wait for 10.04 or will the workaround above treat the SSD nicely?

Revision history for this message

Gav Mack (gavinmac) wrote on 2010-02-13:

#172

@Will - Follow the instructions on post 147

Revision history for this message

Will (will-berriss) wrote on 2010-02-13:

#173

@Gav Mack - Thanks! That looks like quite a change, but I like the idea of not having swap so I may give it a go and reinstall.

Currently all I have done is the stuff in post 1, i.e. this:

# ATA disks driven by libata
#KERNEL=="sd*[!0-9]", ATTR{removable}=="0", ENV{ID_BUS}=="ata", ENV{DEVTYPE}=="disk", IMPORT{program}="devkit-disks-probe-ata-smart $tempnode"

Is this enough to avoid damage to the SSD or is it only a small step towards reducing SSD wear and tear?

Thanks.

Revision history for this message

Martin Pitt (pitti) wrote on 2010-02-15:

#174

The bug is in libatasmart; closing the devicekit-disks task.

Changed in devicekit-disks (Ubuntu):
status:	Triaged → Invalid
importance:	Critical → Undecided

Revision history for this message

Tommy Trussell (tommy-trussell) wrote on 2010-02-15:

#175

@will -- in my experience with my Patriot Lite SSD, the patch in post 1 works fine but your first software update will undo it. And it's not good enough to apply the patch after the software update is finished, because the buggy code starts running immediately and it trashes the drive before the update even finishes.

The more elaborate workaround in comment 147 (which Andrew Simpson developed and I reiterated) tells the package manager to move the buggy code to a different location and apply all future updates to it in the new location (step 6) and creates a dummy executable file that runs but does nothing (step 7 & 8) in the location of the old software.
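
The divert-and-stub steps described above can be sketched as a small script. This is an illustration only, not the script from any attachment in this report; the names PROBE, DRY_RUN, run, write_stub and apply_workaround are made up, and apply_workaround must be run as root unless DRY_RUN=1 is set:

```shell
#!/bin/sh
# Path of the probe helper; override PROBE to experiment on a copy.
PROBE="${PROBE:-/lib/udev/devkit-disks-probe-ata-smart}"

run() {
    # With DRY_RUN=1, print the command instead of executing it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

write_stub() {
    # Create an executable script that does nothing and reports success.
    printf '#!/bin/sh\nexit 0\n' > "$1" && chmod 755 "$1"
}

apply_workaround() {
    # Move the packaged helper aside so updates land on the .bak path,
    # then put the do-nothing stub where udev expects the helper.
    run dpkg-divert --add --rename --divert "$PROBE.bak" "$PROBE"
    run write_stub "$PROBE"
}
```

Running "DRY_RUN=1; apply_workaround" first shows what would be done without touching anything.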

Revision history for this message

Will (will-berriss) wrote on 2010-02-16:

#176

@Tommy Trussell - Thank you very much!

I reinstalled with no swap space and will apply #147 next. I just did #1 for the time being and I noticed an update overwrote it, so I reapplied it. Luckily my SSD survived that at least.

Next time I boot it up, I'll apply #147 right away. What a nightmare!

Thanks again! :)

Revision history for this message

Will (will-berriss) wrote on 2010-02-16:

#177

I had to read #147 a couple of times, as the wording of step 7) confused me. Anyway, in short I did this to my working system:

6) You can now divert the problematic file on the target system:

# dpkg-divert --add --rename --divert /lib/udev/devkit-disks-probe-ata-smart.bak /lib/udev/devkit-disks-probe-ata-smart

7) Now create a file.

vi /lib/udev/devkit-disks-probe-ata-smart

and put the following 3 lines in it:

#!/bin/bash
#
exit 0

8) Make the new file executable:

# chmod 755 /lib/udev/devkit-disks-probe-ata-smart

Revision history for this message

In freedesktop.org Bugzilla #25673, 4280829-noduck (4280829-noduck) wrote on 2010-02-18:

#178

(In reply to comment #15)
> I ran gdb on devkit-disks-probe-ata-smart/libatasmart.

Was this information helpful? If not, can you let me know how I can assist in fixing this bug?

Revision history for this message

Raf (4283534-noduck) wrote on 2010-02-18:

#179

This bug also affects Lucid via the call to udisks-probe-ata-smart in /lib/udev/rules.d/80-udisks.rules (from the package udisks). Commenting out the line

KERNEL=="sd*[!0-9]", ATTR{removable}=="0", ENV{ID_BUS}=="ata", ENV{DEVTYPE}=="disk", IMPORT{program}="udisks-probe-ata-smart $tempnode"

in /lib/udev/rules.d/80-udisks.rules works around the problem on my Acer.
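
Commenting out that rule can be done with one sed command. A sketch, assuming GNU sed (for -i) and that the rule line starts with KERNEL as quoted above; the function name is made up. As noted earlier in this report, a package update can restore the original rules file, so this may need to be repeated:

```shell
# Hypothetical helper: comment out the SMART probe rule in a udev rules file.
# Lines already starting with "#" are left alone, so it is safe to re-run.
comment_out_smart_probe() {
    sed -i '/probe-ata-smart/ s/^KERNEL/#KERNEL/' "$1"
}
```

For Lucid that would be: sudo sh -c '...' with the function defined, or simply the sed line itself run against /lib/udev/rules.d/80-udisks.rules.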

Is there any plan to include a real fix for this problem in Lucid? The upstream kernel bug was closed, the upstream libatasmart bug hasn't received much attention. How can I help to make sure that this bug is fixed in Lucid?

Revision history for this message

Andrew Simpson (andrew-simpson) wrote on 2010-02-19:

#180

@Raf
I closed the kernel bug report (it was my bug report) since it's not relevant to the kernel.

I've also nominated this bug for Lucid release - whatever that does.

More importantly the upstream maintainer seems to have lost interest in fixing this bug. How does one go about nominating packages for removal from Ubuntu due to lack of response from upstream maintainer?

Revision history for this message

Guy Taylor (thebiggerguy) wrote on 2010-02-19:

#181

@Andrew Simpson
I have had good response from upstream and think this is a good package to keep within Ubuntu.

@all
Has the particular hardware (SSD or controller) been identified yet? libatasmart has a "quirk" table to blacklist incompatible hardware.

Revision history for this message

Andrew Simpson (andrew-simpson) wrote on 2010-02-19:

#182

@Guy Taylor
The hardware has been generically identified - most Super Talent and Patriot devices and less commonly a few others. The problem seems to be at the SSD rather than the bridge. There are enough people following this bug to enable compilation of a reasonably complete list if asked.

What device information is required for a quirks table? Output from lspci -vv? Or something else?

Revision history for this message

Skylord (me-skylord) wrote on 2010-02-20:

#183

BTW, this problem is not limited to specific hardware. For example, I encountered it after updating the firmware of my standard EeePC 901 SSD to a newer version (better speed in exchange for reduced disk space). The same goes for the Acer AspireOne....

Revision history for this message

adamski (adam-hasselbalch) wrote on 2010-02-23:

#184

So.

I have /dev/zeroed both the 4G and the 16G SSDs in my Eee PC after running into this bug. When I discovered what was going on, I had used Karmic for about one hour.

The 4G drive seems to be OK: 'badblocks -s' finishes with no reports.

The 16G drive is dead. Stuffed to the brim with bad sectors, and dmesg shows I/O errors galore. This is both with 9.04 and 8.04 (which was my last known good installation) kernels. 8.04 allowed me to actually make a partition table after 9.04 failed with I/O errors all over the place. 8.04 also allowed me to create a file system. badblocks(1), however, shows that there are still tons of errors on the drive.

I am going to try another /dev/zeroing of the 16G drive with a 8.04 kernel, for good measure, but I am not optimistic, since the HSM violations are gone, and what I see is, as mentioned, what looks like hardware I/O errors.

Mind you, both these drives worked fine prior to installing 9.10, but now, one disk is dead.

I am NOT really happy with Ubuntu right now. Spare Asus SSDs (which of course don't use regular SATA connections) are not a common commodity here, so for all intents and purposes, Karmic has bricked my Eee.

Revision history for this message

Юрий Аполлов (apollovy) wrote on 2010-02-23: Re: [Bug 445852] Re: devkit-disks-probe-ata-smart causes HSM Violations on SSD, and potential hardware death

#185

adamski, if possible, try reflashing (actually changing) the firmware, a couple
of times. I had the same trouble with Karmic on my Acer Aspire One 8GB SSD.
I zeroed it many times and reflashed the firmware ten (or more) times in a
row, and now it's running fine on Jaunty.

Revision history for this message

shadowblast101 (shadowblast101) wrote on 2010-02-23:

#186

Adamski, just thought I'd clarify that this applies to a lot of Linux distributions, not just Ubuntu. I have this bug on my Arch install, and I think someone with Debian has confirmed the bug as well. Pretty much anything that calls libatasmart will suffer from this, so there's not really any reason to blame Ubuntu explicitly.

Revision history for this message

In freedesktop.org Bugzilla #25673, 4280829-noduck (4280829-noduck) wrote on 2010-02-24:

#187

(In reply to comment #17)
> Was this information helpful? If not, can you let me know how I can assist in
> fixing this bug?

For those still following: disabling smart support on the device prevents the second (dangerous) ioctl, and as a result no more HSM violations.

smartctl --smart off /dev/sda

Revision history for this message

MFV (mfv) wrote on 2010-02-24:

#188

I'd disagree: there's a perfectly good reason to blame Ubuntu, their NBR doesn't work on netbooks. In my book that's an epic FAIL, and it probably spells the end of Linux's chance on netbooks, given that it is the highest-profile distro and they just don't appear interested in fixing it.

To those having problems recovering devices, I recommend repartitioning the drive or writing to the raw devices. Milax does this well. Format, select device, analyze and purge.

Revision history for this message

GenericAnimeBoy (souletech) wrote on 2010-02-24:

#189

I'd rather not join in with hysterical finger-pointing, but it's been a month and a half, people. The workaround in #147 (modified in #160 for working systems) could easily have been implemented as a script, packaged up, and pushed out via the software updater by now, and the fix could have been rolled into the Karmic .iso's. You have to realize just how critical this is: the only reason I even became aware of this problem was that when the SSD hung up during boot (as a result of this bug) my gnome-applets failed to load. For every affected user who is on launchpad following this bug, I would guess there have been at least 3 others who have just ignored the occasional glitches, thinking that it's nothing major.

How many drives have been destroyed in the month and a half since the workaround was published? Replacing an SSD in a netbook is an expensive, time consuming, and (if you do it yourself) warranty voiding operation.

GenericAnimeBoy (souletech) wrote on 2010-02-24:

#190

Hotfix script for implementing fix #160 on working systems (629 bytes, text/plain)

Just a rough implementation of #160 as a script. Probably needs to be prettied up for public consumption.

Raf (4283534-noduck) wrote on 2010-02-24:

#191

I found a possibly easier workaround: after disabling SMART support on the device, I can safely run devkit-disks-probe-ata-smart (or udisks-probe-ata-smart in Lucid):

sudo smartctl --smart off /dev/sda

You can check whether SMART is disabled with 'sudo smartctl -i /dev/sda'; the output should include the following (note the last line):

SMART support is: Available - device has SMART capability.
SMART support is: Disabled

Note the following comment in the smartctl manual: "In principle the SMART feature settings are preserved over power-cycling, but it doesn't hurt to be sure." I have not yet rebooted.

Looking at the strace of devkit-disks/udisks-probe-ata-smart, I see that the second (dangerous) ioctl is not executed when smart is disabled.
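
In case anyone wants to script the check above, here is a minimal sketch. The function name smart_is_disabled is just something I made up; it only assumes the "SMART support is:" lines look like the ones quoted above, and it reads the smartctl output on stdin so it never touches the drive itself.

```shell
#!/bin/sh
# Hypothetical helper: reads `smartctl -i` output on stdin and succeeds
# (exit status 0) only if a line reports SMART support as Disabled.
smart_is_disabled() {
    grep -Eq '^SMART support is:[[:space:]]+Disabled'
}
```

On a real system (root and smartmontools required) it could be used as: sudo smartctl -i /dev/sda | smart_is_disabled && echo "SMART is off"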

Raf (4283534-noduck) wrote on 2010-02-24:

#192

I just rebooted, and SMART was enabled again. So this doesn't work through a reboot. Sorry.

Rick @ rickandpatty.com (rick-rickandpatty) wrote on 2010-02-24:

#193

@Raf

As you say, that won't survive a reboot, but your workaround in #172 might be a good thing to add between steps 2 (zeroing the drive) and 3 (starting the installer) of the Karmic installation procedure in #147. Simply starting up the partitioner during installation will trigger the bug and trash some SSDs, like the original factory SSD in my Asus Eee 701 8G.

Guy Taylor (thebiggerguy) wrote on 2010-02-24:

#194

Hi all
My research has found this only affects:

Company Model Name Model Number Firmware Version
--------------------------------------------------------------------------------------------------------------------------------------------------
Intel Z-P230 SSDPAMM0008G1 Unknown affected firmware. Inclusive of "Ver2.J0H" and "Ver2.I0K"
Seagate STEC PATA 8GB Unknown affected firmware. Inclusive of "D5221-10"
Unknown Flash Module Unknown affected firmware. Inclusive of "Ver3.P0B"

Could people with the problem please run "sudo hdparm -i /dev/sda" (replacing sda with the problematic drive) to confirm this, flag any other drives, and/or identify 'fixed' firmware.
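
If it helps with collecting reports, the two interesting fields can be pulled out of the hdparm output with something like the sketch below. The function name drive_identity is made up, and it assumes hdparm prints a line of the form "Model=..., FwRev=..., SerialNo=..."; if your hdparm formats it differently the function simply prints nothing.

```shell
#!/bin/sh
# Hypothetical helper: reads `hdparm -i` output on stdin and prints
# "MODEL|FWREV" taken from the "Model=..., FwRev=..., ..." line.
drive_identity() {
    sed -n 's/^[[:space:]]*Model=\([^,]*\), FwRev=\([^,]*\),.*/\1|\2/p'
}
```

Example: sudo hdparm -i /dev/sda | drive_identity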

Thank you

Guy Taylor (thebiggerguy) wrote on 2010-02-24:

#195

Current List of known drives (617 bytes, text/plain)

Sorry all for the formatting. I have attached a text file instead so you can actually read it.

GenericAnimeBoy (souletech) wrote on 2010-02-24:

#196

hdparm.txt (611 bytes, text/plain)

My hdparm is attached. The SSD in question is the aftermarket 16GB Supertalent SSD everyone's talking about, and it looks like the firmware version already appears on your list.

FWIW, I've only had minor issues with this one: it would occasionally hang up during boot and cause several gnome applets to fail to start. I had an Intel SSDPAMM0008G, which was the original SSD in this netbook [Acer Aspire One ZG5], and it cratered the first time I booted 9.10 from it. I guess I know now why that happened.

shadowblast101 (shadowblast101) wrote on 2010-02-24:

#197

PatriotHDParam.txt (614 bytes, text/plain)

Here's the Patriot 32 GB SSD that a couple of us have. I have had this one become completely corrupted on both Ubuntu and Arch before applying the patch.

MFV (mfv) wrote on 2010-02-24:

#198

hdparm.eee900orig.txt (1.0 KiB, text/plain)

Asus EEE 900 (orig Celeron model)

Phison devices.

LarryGrover (lgrover) wrote on 2010-02-25:

#199

supertalent32GB.txt (616 bytes, text/plain)

Acer Aspire One with replacement Super Talent 32 GB SSD. I experienced stalls during boot up and HSM error messages in logs, but no data corruption. Drive info attached.

Raf (4283534-noduck) wrote on 2010-02-25:

#200

supertalent32GB.txt (2.3 KiB, text/plain)

I have the same Super Talent 32 GB SSD, but it identifies itself simply as 'Flash' (see attached output). I think the easiest way to implement a blacklist will be in the udev rules, so I included the output of 'udevadm info --query=all --path /block/sda'.

@LarryGrover: I have the same device. While debugging this problem (repeated runs of devkit-disks-probe-ata-smart) I did get (non-permanent) disk corruption. fsck placed several files in /lost+found!

Samizdata (samizdata) wrote on 2010-02-27:

#201

MyAcerAspireOnehdparm.txt (1.6 KiB, text/plain)

I have the SSDPAMM0008G1, FwRev=Ver2.I0K SSD in my Acer Aspire One, but I am including the data for the sake of completeness. I have not had it brick, but I did have problems with timing out and the "lost+found" data corruption. I am currently running with the "quick", non-redirect workaround successfully. I have also attached both sets of data mentioned.

ipig (infopiggy) wrote on 2010-03-04:

#202

Stock Asus EEE 900 (Celeron) / White 12G?

SSDs: Phison / 4GB (Primary) - 8GB (2ndary)

- ~Same hardware as post #179 -

On remix 9.10/32 & regular 9.10/32 - HSM Violation

(Used) Post #147 (upon 3rd install!) fixed issues.

- Had disk utility crash on 3rd/fix install during format. Hoping it's not bad blocks?

MFV (mfv) wrote on 2010-03-06:

#203

The same fix in Lucid requires editing 80-udisks.rules. Search for 'smart' and look for the entry.

basily (basily) wrote on 2010-03-12:

#204

Me too... Acer Aspire One with replacement Super Talent 32 GB SSD, UNR 9.10. I had very slow boot times, but no data corruption. I have successfully implemented the workaround in post #147. Now boot times are around 27 seconds to full desktop and wifi connection.

Alain SAURAT (maisondouf) wrote on 2010-03-13:

#205

dmesg without and with PCIe SSD (5.6 KiB, text/plain)

I have a similar problem with an EeePC 701 upgraded to two SSDs.

It was originally equipped with a 4 GB SSD on the motherboard. Jaunty, Karmic and Lucid initialize it in UDMA66 mode without any problems, and the read speed is around 30 MB/s.

I upgraded this notebook by adding a 32 GB PCIe PATA SSD in the expansion connector.
The BIOS recognizes both correctly: the internal SSD as Secondary Master and the PCIe SSD as Secondary Slave.

Now the kernel goes through three 30-second timeouts before downgrading the ATA transfer mode from UDMA66 to PIO4 for the internal 4 GB SSD.
The PCIe SSD is used directly in UDMA66 mode.

After booting, the read speed is 3 MB/s on the internal SSD and 40 MB/s on the PCIe SSD.

As I read here, I tried adding "libata.dma=0" to the grub.cfg file. In that case the timeouts disappear, but read speeds are very low on all disks.

To avoid the timeouts and keep a good read speed on /dev/sdb, can I deactivate UDMA mode only on /dev/sda, and if so, how?

PS: I tried Windows XP booted from USB; there is no timeout during boot and the speeds are the same (internal 4 MB/s, PCIe 40 MB/s).

Alain SAURAT (maisondouf) wrote on 2010-03-13:

#206

Whouauuuhhh !

With the "libata.force=2.00:pio4" option, the kernel initializes the internal SSD directly in PIO mode, so there are no timeouts.

Option found in http://www.kernel.org/doc/Documentation/kernel-parameters.txt
Syntax found in http://docs.blackfin.uclinux.org/kernel/generated/libata/

Read speed stays very low on the internal SSD, but that doesn't matter to me.
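
To confirm which transfer mode each device actually ended up in after a change like this, the kernel log can be filtered. The sketch below is only an illustration: the name ata_modes is made up, and it assumes the kernel prints lines of the form "ataX.YY: configured for MODE", as in the dmesg excerpts posted elsewhere in this thread.

```shell
#!/bin/sh
# Hypothetical helper: reads kernel log text on stdin and prints
# "<device> <mode>" for every "ataX.YY: configured for MODE" line.
ata_modes() {
    sed -n 's/.*\(ata[0-9][0-9]*\.[0-9][0-9]*\): configured for \(.*\)$/\1 \2/p'
}
```

Example: dmesg | ata_modes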

Richard Ayotte (rich-ayotte) wrote on 2010-03-14:

#207

Setting libata.force=noncq fixed it for me. Disabling SMART as described in the workaround or forcing pio4 had no effect.

Hardware: Acer Aspire One

Here's what I did.

sudo gedit /etc/default/grub

Change the line that says:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
to
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash libata.force=noncq"

sudo update-grub

reboot.
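
For anyone who would rather script the edit than open gedit, here is a rough sketch of the same change. The function name add_kernel_param is made up, it assumes GNU sed (for -i) and a GRUB_CMDLINE_LINUX_DEFAULT="..." line already present in the file, and it is only meant for simple parameters without characters special to sed. Try it on a copy of the file first.

```shell
#!/bin/sh
# Hypothetical helper: add_kernel_param FILE PARAM
# Appends PARAM inside the quotes of GRUB_CMDLINE_LINUX_DEFAULT in FILE,
# unless that line already contains PARAM (so running it twice is harmless).
add_kernel_param() {
    file=$1
    param=$2
    # Already present on the line? Then leave the file alone.
    if grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$file" | grep -qF -- "$param"; then
        return 0
    fi
    # Assumes GNU sed; rewrites the quoted value with PARAM appended.
    sed -i "s|^GRUB_CMDLINE_LINUX_DEFAULT=\"\(.*\)\"|GRUB_CMDLINE_LINUX_DEFAULT=\"\1 ${param}\"|" "$file"
}
```

Example: sudo cp /etc/default/grub /etc/default/grub.bak, then run add_kernel_param /etc/default/grub libata.force=noncq as root, followed by sudo update-grub and a reboot as above.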

Vishal Rao (vishalrao) wrote on 2010-03-15: Re: [Bug 445852] Re: devkit-disks-probe-ata-smart causes HSM Violations on SSD, and potential hardware death

#208

Aha! So I found another user needing the same workaround!

I had mentioned this in this bug and in another one, sent a patch to LKML (which wasn't safe enough to go in), and also blogged about it: http://lahsiv.net/blog/?p=47

--
"Thou shalt not follow the null pointer for at its end madness and chaos lie."

Vishal Rao (vishalrao) wrote on 2010-03-15:

#209

See comment # 141 here https://bugs.launchpad.net/libatasmart/+bug/445852/comments/141 for the other bug https://bugs.launchpad.net/bugs/502219

Steve (sjc-carpanet) wrote on 2010-03-18:

#210

I do not have an SSD, but I do have the exact same errors:

[ 3252.000066] ata1: lost interrupt (Status 0x58)
[ 3252.004027] ata1: drained 32768 bytes to clear DRQ.
[ 3252.091644] ata1.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 3252.091669] ata1.01: cmd a0/00:00:00:00:00/00:00:00:00:00/b0 tag 0
[ 3252.091672] cdb 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 3252.091675] res 58/00:01:00:12:00/00:00:00:00:00/b0 Emask 0x2 (HSM v
[ 3252.091684] ata1.01: status: { DRDY DRQ }
[ 3252.091726] ata1: soft resetting link
[ 3252.332346] ata1.00: configured for UDMA/100
[ 3252.348516] ata1.01: configured for MWDMA2
[ 3252.348839] ata1: EH complete

It looks like the problem might be with the cdrom?

Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: ATA Model: ST96812A Rev: 3.05
  Type: Direct-Access ANSI SCSI revision: 05
Host: scsi0 Channel: 00 Id: 01 Lun: 00
  Vendor: MATSHITA Model: UJDA775 DVD/CDRW Rev: 1.00
  Type: CD-ROM ANSI SCSI revision: 05

In any case, I applied the workaround in question, and it definitely happens less often, but still happens to me pretty frequently.
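
To put a number on "less often", one option is to count the HSM violation lines in the kernel log before and after a change. This is only a rough sketch: the name count_hsm_violations is made up, and it assumes the kernel tags these events with "Emask 0x2 (HSM ...", as in the log excerpt above.

```shell
#!/bin/sh
# Hypothetical helper: reads kernel log text on stdin and prints how many
# lines carry the "Emask 0x2 (HSM ..." tag. Prints 0 when there are none.
count_hsm_violations() {
    # grep -c still prints 0 on no match but exits non-zero; mask that.
    grep -c 'Emask 0x2 (HSM' || true
}
```

Example: dmesg | count_hsm_violations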

Paede (patrick-steiner-gmx) wrote on 2010-03-19:

#211

@Steve

I also have the same problem on a normal IDE disk. What type of notebook do you have? I also have the same type of CD-ROM drive:

Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: ATA Model: WDC WD2500BEVE-0 Rev: 01.0
  Type: Direct-Access ANSI SCSI revision: 05
Host: scsi0 Channel: 00 Id: 01 Lun: 00
  Vendor: MATSHITA Model: UJ-822Da Rev: 1.02
  Type: CD-ROM ANSI SCSI revision: 05

This error happens to me when I move my laptop. I don't know if it is caused by the HP Mobile Data Protection System.

Raf (4263004-noduck) wrote on 2010-03-20:

#212

The first beta of Lucid was released and we have not yet found a solution for the HSM violations and corruption. I would really hope that we can find a way to fix this before the final release.

A proper solution would be a patch for libatasmart, but I have not seen any progress.

A first workaround would be to disable the use of udisks-probe-ata-smart in 80-udisks.rules. I haven't found anything using the result of the SMART test (ID_ATA_FEATURE_SET_SMART, ID_ATA_FEATURE_SET_SMART_ENABLED, UDISKS_ATA_SMART_IS_AVAILABLE). And I don't think they are documented.

An alternative workaround would be to create a blacklist in 80-udisks.rules, so that the SMART test is not run on the devices identified above (and possibly others).
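
To illustrate the matching logic such a blacklist would need (this is not a udev rule, just the idea in shell), here is a sketch based on the list in #194. The function name is_affected_drive is made up. The Intel entry is matched on model plus firmware; for the other two entries the model string is not clear from the list, so this sketch matches on the firmware version alone, which is an assumption on my part.

```shell
#!/bin/sh
# Hypothetical helper: is_affected_drive MODEL FWREV
# Succeeds (exit status 0) if the pair matches the list from comment #194.
is_affected_drive() {
    model=$1
    fwrev=$2
    # Intel Z-P230: model number and firmware both known.
    case "$model:$fwrev" in
        SSDPAMM0008G1:Ver2.J0H|SSDPAMM0008G1:Ver2.I0K) return 0 ;;
    esac
    # Remaining entries: matched on firmware version only (assumption).
    case "$fwrev" in
        D5221-10|Ver3.P0B) return 0 ;;
    esac
    return 1
}
```

Example: is_affected_drive SSDPAMM0008G1 Ver2.I0K && echo "skip the SMART probe on this drive"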

I would like to know if the developers are willing to accept either of these workarounds.

Raf (4263004-noduck) wrote on 2010-03-20:

#213

I should have written:

I would like to know if the *maintainers* are willing to accept either of these workarounds.

ipig (infopiggy) wrote on 2010-03-22:

#214

(Ref Post #183)

This bug gave me hell this weekend.

I had 9.10 running fine all month with remix.

I decided to nuke my netbook & later put 9.10 back on.

Upon installing 9.10 I forgot the timing of post #147. I finished the installation & was confronted with HSM Violations (again)

After the install I attempted the fix in post #188. That fix did not work. When I tried to undo #188, I found I could not write (save) to the disk, so I could not undo the changes.

I decided to do another install, this time not forgetting to apply #147 prior to reboot. I was then unable to finish any installation normally (i.e. I never got to that point).

Installations would hang at 38% (copying files, IIRC), and I noticed the HD LED would begin flashing in a regular rhythm (about every 1/2 second). I tried 3 installs with partitioning (even slightly different sizes) and formatting. Nothing made a difference.

I'd also tried the dd command (post #147) in between, but I don't think I did it correctly, as it only took about 7-10 minutes (on 8 GB). I believe I'd run it on a partition instead of the whole disk.
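
A quick sanity check before pointing dd at anything would have caught that mistake. The sketch below is only a name-based heuristic for /dev/sdX style names (the function name device_kind is made up, and it does not query the kernel), so treat its answer as a hint, not proof.

```shell
#!/bin/sh
# Hypothetical helper: device_kind PATH
# Classifies a /dev/sdX style name purely by its spelling:
#   /dev/sda  -> disk        /dev/sda1 -> partition
# Anything else is reported as unknown.
device_kind() {
    case "$1" in
        /dev/sd[a-z][0-9]*) echo partition ;;
        /dev/sd[a-z])       echo disk ;;
        *)                  echo unknown ;;
    esac
}
```

Example: [ "$(device_kind /dev/sda)" = disk ] || echo "not a whole disk, stopping"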

Later on i noticed hitting the power button (@ the 38% hang) dropped me to a screen that displayed **The machine was in an infinite loop of HSM violation errors** (over & over & over)- In sync with the flashing HD light.

It seemed that, regardless of post #147, the bug affects the machine earlier than that. Either that, or the bug had still been dragging along all this time.

I'd tried using a separate gparted livecd to format & partition & it made no difference on installations failing.

Literally a day later i ran the dd command again (post 147) this time correctly & it took about 1.5 hours. (i ran it from 8.04 live/cd)

I had a feeling things would then go differently & they did. I managed to get 9.10 installed - HOWEVER i did still receive a disk utility crash (i believe during formatting - it's difficult to tell when it occurs because all it does is put a small red icon in the task bar) - I believe i put in #147 correctly (hell i'd done it before)

So i am thinking 'finally' this has been dealt with.

Wrong.

After installation i realized for some reason i'm unable to install any packages or updates. Seemingly anything. It seemed i could write to the disk OK but reading/installing packages/updates resulted in input/output errors displayed in the terminal/details.

I battled with this for a while but then i just gave up. F'it.

I'm willing to put 4-6 hours in but once it starts pushing beyond that people just can't be expected to deal with this. I was quite angry by the time i gave up & i am still a bit disgusted with this. No doubt that i've spent 8 or more hours in some way dealing with this problem.

I am not pleased upon hearing there's no fix in Lucid.

If the difference between getting a patch worked on & not is me packing up my netbook and shipping it off then i might be willing. I am a fairly loyal eee pc fan/user - when i think of netbook i think 'eee pc'.

What's bothersome is the apparent netbook remix edition. What part of netbook didn't include EEE PCs?