devkit-disks-probe-ata-smart causes HSM Violations on SSD, and potential hardware death
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
EasyPeasy Overview | Fix Released | High | Unassigned |
Linux | Won't Fix | Medium | |
libatasmart | Confirmed | Low | |
devicekit-disks (Fedora) | New | Undecided | Unassigned |
devicekit-disks (Ubuntu) | Invalid | Undecided | Unassigned |
Karmic | Fix Released | Critical | Martin Pitt |
Lucid | Invalid | Undecided | Unassigned |
libatasmart (Ubuntu) | Fix Released | High | Martin Pitt |
Karmic | Won't Fix | Medium | Unassigned |
Lucid | Fix Released | High | Martin Pitt |
Bug Description
TEMPORARY WORKAROUND FOR THIS PROBLEM IN KARMIC (this is now also in karmic-proposed and needs testing feedback):
1. sudo gedit /lib/udev/
2. Locate the following lines (about 1/3 of the way into the file; search for "smart"):
# ATA disks driven by libata
KERNEL=
3. Comment out the second line by adding a # in front, so that you have:
# ATA disks driven by libata
#KERNEL=
4. Save the file and reboot.
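The manual edit in the steps above could also be scripted. The following is only a minimal sketch, not the fix that went into karmic-proposed: `comment_out_rule` is a hypothetical helper, and the sample rule text is invented for illustration because the real rule line is truncated in this report.

```python
def comment_out_rule(rules_text, marker="# ATA disks driven by libata"):
    """Return rules_text with the line that follows `marker` commented out.

    Only the first occurrence of the marker is handled. A line that is
    already commented out is left alone, so the function is idempotent.
    """
    lines = rules_text.splitlines()
    for i, line in enumerate(lines[:-1]):
        if line.strip() == marker:
            target = lines[i + 1]
            if not target.lstrip().startswith("#"):
                lines[i + 1] = "#" + target
            break
    trailing_newline = "\n" if rules_text.endswith("\n") else ""
    return "\n".join(lines) + trailing_newline


# Hypothetical rules content: the KERNEL== line is a placeholder,
# NOT the actual rule shipped by devicekit-disks.
sample = (
    "# some other rule\n"
    "# ATA disks driven by libata\n"
    'KERNEL=="sd*", ENV{EXAMPLE}="1"\n'
)
patched = comment_out_rule(sample)
print(patched)
```

Working on the text rather than on the file keeps the sketch safe to try; writing the result back to the real rules file would need root and a backup first.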
TECHNICAL ANALYSIS: https:/
LUCID STATUS: https:/
KARMIC SOLUTION: https:/
BUG DESCRIPTION FOLLOWS:
In the Karmic beta I experience SSD stalls during the boot process. It happens almost every time before xsplash loads and happens again frequently between logging into gdm and the desktop loading. When it happens during login I think it makes GNOME time out on loading panel items, as I get errors about lots of panel items failing to load. If I log out and back in again when the SSD isn't stalled, the panel items load fine.
When it happens, the following messages appear before xsplash (or in dmesg when it happens after gdm):
ata2: lost interrupt (Status 0x58)
ata2: drained 16384 bytes to clear DRQ.
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata2.00: BMDMA stat 0x4
ata2.00: cmd c8/00:40:
res 58/00:40:
ata2.00: status: { DRDY DRQ }
ata2: soft resetting link
ata2.00: configured for UDMA/66
ata2: EH complete
I did not have this issue in Jaunty with this hardware, and I don't think it has happened once the system is fully loaded. I am running Karmic UNR on an Acer Aspire One netbook.
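The `Status 0x58` value in the log above is the ATA status register, and it can be decoded bit by bit. This is a small illustrative sketch using the classic ATA status bit names; `decode_ata_status` is a hypothetical helper, not part of any tool mentioned in this report.

```python
# Classic ATA status register bits, highest bit first.
ATA_STATUS_BITS = [
    (0x80, "BSY"),   # device busy
    (0x40, "DRDY"),  # device ready
    (0x20, "DF"),    # device fault
    (0x10, "DSC"),   # device seek complete
    (0x08, "DRQ"),   # data request: device wants to transfer data
    (0x04, "CORR"),  # corrected data (obsolete in later specs)
    (0x02, "IDX"),   # index (obsolete in later specs)
    (0x01, "ERR"),   # error
]


def decode_ata_status(status):
    """Return the names of the bits set in an ATA status byte."""
    return [name for mask, name in ATA_STATUS_BITS if status & mask]


flags = decode_ata_status(0x58)
print(flags)
```

For 0x58 this yields DRDY, DSC and DRQ. The kernel's `status: { DRDY DRQ }` line appears to print only a subset of the bits, which would explain why DSC is not shown there. DRQ still being set means the device was waiting to transfer data when the host gave up on the command.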
ProblemType: Bug
AplayDevices:
**** List of PLAYBACK Hardware Devices ****
card 0: Intel [HDA Intel], device 0: ALC268 Analog [ALC268 Analog]
Subdevices: 1/1
Subdevice #0: subdevice #0
Architecture: i386
ArecordDevices:
**** List of CAPTURE Hardware Devices ****
card 0: Intel [HDA Intel], device 0: ALC268 Analog [ALC268 Analog]
Subdevices: 1/1
Subdevice #0: subdevice #0
AudioDevicesInUse:
USER PID ACCESS COMMAND
/dev/snd/
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info:
Card hw:0 'Intel'/'HDA Intel at 0x58540000 irq 16'
Mixer name : 'Realtek ALC268'
Components : 'HDA:10ec0268,
Controls : 9
Simple ctrls : 6
CheckboxSubmission: 12ef539f3788bfb
CheckboxSystem: c69722ecac76486
Date: Wed Oct 7 17:54:56 2009
DistroRelease: Ubuntu 9.10
HibernationDevice: RESUME=
MachineType: Acer AOA110
Package: linux-image-
ProcCmdLine: BOOT_IMAGE=
ProcEnviron:
LANG=en_CA.UTF-8
SHELL=/bin/bash
ProcVersionSign
RelatedPackageV
RfKill:
0: phy0: Wireless LAN
Soft blocked: no
Hard blocked: no
SourcePackage: linux
Tags: ubuntu-unr
Uname: Linux 2.6.31-12-generic i686
XsessionErrors:
(gnome-
(gnome-
(nautilus:2092): Eel-CRITICAL **: eel_preferences
(polkit-
(gnome-
dmi.bios.date: 10/06/2008
dmi.bios.vendor: Acer
dmi.bios.version: v0.3309
dmi.board.
dmi.board.vendor: Acer
dmi.board.version: Base Board Version
dmi.chassis.type: 1
dmi.chassis.vendor: Chassis Manufacturer
dmi.chassis.
dmi.modalias: dmi:bvnAcer:
dmi.product.name: AOA110
dmi.product.
dmi.sys.vendor: Acer
theluketaylor (ekul-taylor) wrote : | #1 |
- dmesg.log (51.2 KiB, text/plain)
- AlsaDevices.txt (403 bytes, text/plain; charset="utf-8")
- BootDmesg.txt (48.2 KiB, text/plain; charset="utf-8")
- Card0.Amixer.values.txt (1.2 KiB, text/plain; charset="utf-8")
- Card0.Codecs.codec.0.txt (7.4 KiB, text/plain; charset="utf-8")
- CurrentDmesg.txt (3.1 KiB, text/plain; charset="utf-8")
- Dependencies.txt (1.4 KiB, text/plain; charset="utf-8")
- IwConfig.txt (571 bytes, text/plain; charset="utf-8")
- Lspci.txt (13.7 KiB, text/plain; charset="utf-8")
- Lsusb.txt (382 bytes, text/plain; charset="utf-8")
- PciMultimedia.txt (591 bytes, text/plain; charset="utf-8")
- ProcCpuinfo.txt (1.3 KiB, text/plain; charset="utf-8")
- ProcInterrupts.txt (1.2 KiB, text/plain; charset="utf-8")
- ProcModules.txt (2.4 KiB, text/plain; charset="utf-8")
- UdevDb.txt (88.6 KiB, text/plain; charset="utf-8")
- UdevLog.txt (187.4 KiB, text/plain; charset="utf-8")
- WifiSyslog.txt (458.8 KiB, text/plain; charset="utf-8")
theluketaylor (ekul-taylor) wrote : | #2 |
theluketaylor (ekul-taylor) wrote : | #3 |
av8r (av8r) wrote : | #4 |
Nothing changes with 2.6.31-13-generic #44 on an EeePC 900A with upgraded RAM and SSD.
For me, it usually freezes between fsck and setting up the resolver, and once again while launching the first session. It doesn't matter whether it's UNR or not (I have both set up).
I added "clocksource=hpet notsc" to the GRUB boot line. It removed the warning message about the unstable TSC clock but didn't change anything about the SSD stall. I also removed "quiet splash" from the GRUB boot line.
$ dmesg | grep ata2
[ 1.253633] ata2: PATA max UDMA/100 cmd 0x170 ctl 0x376 bmdma 0xffa8 irq 15
[ 1.427461] ata2.00: CFA: Patriot Memory 64GB PATA Storage Drive, Ver2.M0G, max UDMA/66
[ 1.427461] ata2.00: 126090720 sectors, multi 1: LBA
[ 1.440008] ata2.00: configured for UDMA/66
[ 40.809047] ata2: lost interrupt (Status 0x58)
[ 40.809047] ata2: drained 2048 bytes to clear DRQ.
[ 40.811862] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 40.815862] ata2.00: BMDMA stat 0x24
[ 40.818548] ata2.00: cmd c8/00:08:
[ 40.822958] ata2.00: status: { DRDY DRQ }
[ 40.826958] ata2: soft resetting link
[ 41.000008] ata2.00: configured for UDMA/66
[ 41.000008] ata2: EH complete
[ 232.820015] ata2: lost interrupt (Status 0x58)
[ 232.820081] ata2: drained 8192 bytes to clear DRQ.
[ 232.848018] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 232.848018] ata2.00: BMDMA stat 0x24
[ 232.848018] ata2.00: cmd c8/00:20:
[ 232.848018] ata2.00: status: { DRDY DRQ }
[ 232.848018] ata2: soft resetting link
[ 233.020008] ata2.00: configured for UDMA/66
[ 233.020008] ata2: EH complete
[ 273.820016] ata2: lost interrupt (Status 0x58)
[ 273.820089] ata2: drained 6144 bytes to clear DRQ.
[ 273.837834] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 273.843269] ata2.00: BMDMA stat 0x24
[ 273.849571] ata2.00: cmd c8/00:18:
[ 273.860739] ata2.00: status: { DRDY DRQ }
[ 273.866515] ata2: soft resetting link
[ 274.041007] ata2.00: configured for UDMA/66
[ 274.041007] ata2: EH complete
[ 407.436313] ata2.00: ACPI cmd ef/03:44:
[ 407.436313] ata2.00: ACPI cmd ef/03:0c:
[ 407.436313] ata2.00: ACPI cmd c6/00:01:
[ 407.452005] ata2.00: configured for UDMA/66
[ 407.468005] ata2.00: configured for UDMA/66
[ 407.468005] ata2: EH complete
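Logs like the one above can be summarised mechanically. The sketch below is one possible way to pair each `lost interrupt` message with the following `EH complete` on the same port; `stall_recoveries` is a hypothetical helper and the sample is a shortened copy of the log lines above. Note the assumption that the timestamps only cover error-handler recovery: the visible stall itself happens before the kernel logs the lost interrupt.

```python
import re

# Match "[  40.809047] ata2: lost interrupt ..." and "[ 41.000008] ata2: EH complete".
LOST_IRQ = re.compile(r"\[\s*(\d+\.\d+)\]\s+(ata\d+): lost interrupt")
EH_DONE = re.compile(r"\[\s*(\d+\.\d+)\]\s+(ata\d+): EH complete")


def stall_recoveries(dmesg_text):
    """Return (port, t_lost_irq, t_eh_complete) tuples found in dmesg output."""
    events = []
    pending = {}  # port -> timestamp of the unresolved lost interrupt
    for line in dmesg_text.splitlines():
        m = LOST_IRQ.search(line)
        if m:
            pending[m.group(2)] = float(m.group(1))
            continue
        m = EH_DONE.search(line)
        if m and m.group(2) in pending:
            port = m.group(2)
            events.append((port, pending.pop(port), float(m.group(1))))
    return events


sample = """\
[   40.809047] ata2: lost interrupt (Status 0x58)
[   40.826958] ata2: soft resetting link
[   41.000008] ata2: EH complete
[  232.820015] ata2: lost interrupt (Status 0x58)
[  233.020008] ata2: EH complete
[  407.468005] ata2: EH complete
"""
events = stall_recoveries(sample)
for port, start, end in events:
    print(f"{port}: lost interrupt at {start:.3f}s, recovered after {end - start:.3f}s")
```

An `EH complete` with no preceding lost interrupt, such as the last sample line, is ignored.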
Changed in linux (Ubuntu): | |
status: | New → Confirmed |
redDEADresolve (reddeadresolve) wrote : | #5 |
- dsmeg output (45.2 KiB, text/plain)
I am also getting the same error on my Dell Mini 9.
[38.825065] ata1.00 exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[38.825227] BMDMA Stat 0x24
[38.825318] ata1.00:cmd c8/00:18:
[38.825321] res 58/00:18:
[38.825598] ata1.00: status {DRDY DRQ}
Occasionally I get sent to the root prompt to manually run an fsck.
redDEADresolve (reddeadresolve) wrote : | #6 |
Gav Mack (gavinmac) wrote : | #7 |
I have identical issues to the OP of this bug on my Aspire One A110 with a SuperTalent 32 GB SSD upgrade. Karmic takes almost 3 minutes to start the first time, and only after at least 3 restarts (each faster than the last) do I get a relatively stable desktop.
Gav Mack (gavinmac) wrote : | #8 |
Gav Mack (gavinmac) wrote : | #9 |
Johan Van den Neste (jvdneste) wrote : | #10 |
I have the same configuration as Gav Mack (Aspire One A110 with a SuperTalent 32 GB SSD upgrade). Same problem here.
Johan Van den Neste (jvdneste) wrote : | #11 |
It is easy to reproduce simply by starting gparted. The error is then produced twice just as before while 'searching /dev/sda partitions'. As a result, the 'searching /dev/sda partitions' activity in gparted takes a long time.
professordes (d-a-johnston-hw) wrote : | #12 |
A "me too" on an Eee PC 901 with an upgraded Crucial SSD and an upgrade install of Karmic RC.
The relevant bit in dmesg is:
[ 35.816124] ata2: lost interrupt (Status 0x58)
[ 35.820096] ata2: drained 2048 bytes to clear DRQ.
[ 35.823180] ata2.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 35.826268] ata2.01: BMDMA stat 0x64
[ 35.829292] ata2.01: cmd c8/00:08:
n
[ 35.829295] res 58/00:08:
iolation)
[ 35.835971] ata2.01: status: { DRDY DRQ }
The machine is also (mostly) failing to pick up an SDHC card in the reader, which wasn't the case in 9.04
Adam Gianola (adam-gianola) wrote : | #13 |
Same story here. Dell Mini 9 upgraded with a Super Talent FEM16GFDL 16 GB SSD. I experience this both after the upgrade from 9.04 to 9.10 as well as on a clean install of 9.10.
I can also confirm starting gparted reproduces the dmesg output normally seen after (long) boot up.
Andrew Simpson (andrew-simpson) wrote : | #14 |
Another 'me too'.
Just upgraded an Acer Aspire One A110 (ZG5) from existing (factory installed) 8 GB SSD to Super Talent 16 GB (FEM16GF13M).
Running the LiveCD (on a USB stick) with 9.10 RC and then opening GParted shows essentially the same messages in dmesg as the other reports (and it takes a long time).
Everything else seems fine.
danq989 (danq989) wrote : | #15 |
Me too!
I have the same configuration as Gav Mack (including upgraded 32GB SSD) with the same results.
Verified on both a 9.04 to 9.10 upgrade and a fresh 9.10 install on a freshly erased and partitioned drive. Same problem on boot and in gparted. Seems to only occur during mounting of the drive (possibly during initial mount and then remount as -rw)
---danq989
Andrew Simpson (andrew-simpson) wrote : | #16 |
I have linked this bug report to (what looks to be) the same problem at the kernel bug tracker. Not sure I've done the linking correctly ;-)
Changed in linux: | |
status: | Unknown → Confirmed |
Kory (postmako) wrote : | #17 |
Me too! AAO ZG5 running stock 8GB drive. Jaunty was booting in about 20 seconds and Karmic is taking about 90 seconds. I am attaching parts from dmesg and the most recent copy of bootchart.
...
[ 7.242403] input: SynPS/2 Synaptics TouchPad as /devices/
[ 36.820096] ata2: lost interrupt (Status 0x58)
[ 36.824029] ata2: drained 2048 bytes to clear DRQ.
[ 36.827217] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 36.827227] ata2.00: BMDMA stat 0x4
[ 36.827248] ata2.00: cmd c8/00:08:
[ 36.827251] res 58/00:08:
[ 36.827258] ata2.00: status: { DRDY DRQ }
[ 36.827302] ata2: soft resetting link
[ 36.996463] ata2.00: configured for UDMA/66
[ 36.996497] ata2: EH complete
[ 37.001030] Clocksource tsc unstable (delta = -133907975 ns)
[ 37.042527] ath5k 0000:03:00.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
[ 37.042586] ath5k 0000:03:00.0: setting latency timer to 64
[ 37.042686] ath5k 0000:03:00.0: registered as 'phy0'
...
[ 56.778794] groups: 1 0
[ 335.989086] ata2: lost interrupt (Status 0x58)
[ 335.993064] ata2: drained 2048 bytes to clear DRQ.
[ 335.996576] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 335.996588] ata2.00: BMDMA stat 0x4
[ 335.996609] ata2.00: cmd c8/00:08:
[ 335.996613] res 58/00:08:
[ 335.996623] ata2.00: status: { DRDY DRQ }
[ 335.996675] ata2: soft resetting link
[ 336.168420] ata2.00: configured for UDMA/66
[ 336.168453] ata2: EH complete
[ 372.004111] ata2: lost interrupt (Status 0x58)
[ 372.008033] ata2: drained 2048 bytes to clear DRQ.
[ 372.011577] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 372.011588] ata2.00: BMDMA stat 0x4
[ 372.011609] ata2.00: cmd c8/00:08:
[ 372.011613] res 58/00:08:
[ 372.011623] ata2.00: status: { DRDY DRQ }
[ 372.011674] ata2: soft resetting link
[ 372.184426] ata2.00: configured for UDMA/66
[ 372.184458] ata2: EH complete
[ 372.586276] gdu-notificatio
[ 377.073516] wlan0: authenticate with AP 00:0f:66:b9:59:0f
[ 377.079461] wlan0: authenticated
[ 377.079473] wlan0: associate with AP 00:0f:66:b9:59:0f
[ 377.085704] wlan0: RX AssocResp from 00:0f:66:b9:59:0f (capab=0x11 status=0 aid=5)
[ 377.085716] wlan0: associated
...
And it happens from time to time after login...
Johan Van den Neste (jvdneste) wrote : | #18 |
I'd like to point out that even though the delays in the boot process are annoying, what is worse is the series of applets crashing when logging in to gnome. Usually I cannot log out again because that applet has crashed. So I switch to tty1 and do a 'sudo service gdm restart'. The next and subsequent logins are fine until the next reboot.
Kory (postmako) wrote : Re: [Bug 445852] Re: SSD stall during boot | #19 |
Yeah I started to notice that kind of stuff as well. That is why I'm
leaving Jaunty on my wife's machine.
tags: | added: ubuntu |
ownyourown (ownyourown) wrote : Re: SSD stall during boot | #20 |
My Asus Eee PC 900 is also affected by this bug (fresh install of final Ubuntu 9.10).
Saif Ahmed (saif) wrote : | #21 |
A "me too" here.
Eee PC 900, fresh install of final 9.10.
Moreover, if I have any kind of USB flash drive attached, the machine doesn't complete boot at all.
andrey i. mavlyanov (andrey-mavlyanov) wrote : | #22 |
Moreover, I got this error on a non-SSD drive. Check https:/
Andrew Simpson (andrew-simpson) wrote : | #23 |
@andrey i. mavlyanov
Andrey,
I don't think that this is the same bug.
On this line:
Nov 4 08:18:45 aim-laptop kernel: [35132.010175] res 40/00:00:
You are getting a 'timeout', whereas this bug is causing 'HSM Violations'.
Adam Gianola (adam-gianola) wrote : | #24 |
- hdparm -I /dev/sda (1.8 KiB, text/plain)
Interestingly, if I use the Dell Mini 9 Factory SSD with 9.10 this problem goes away.
Andrew Simpson (andrew-simpson) wrote : | #25 |
Playing with the LiveCD (on a USB stick) on an Aspire One with a Super Talent 16 GB SSD:
- Normal LiveCD boot shows the problem in dmesg.
- Booting with 'libata.dma=0' in kernel line fixes the problem (by disabling DMA) in dmesg.
- Booting with 'libata.
Since the problem looked to be DMA related, I tried slowing down the transfer with 'libata.
The same machine works fine with Jaunty. Another Aspire One with the standard (factory) 8 GB SSD is running Karmic without any problem.
mdyn (tamerlaha-gmail) wrote : | #26 |
Acer AOA-110, same problem.
danq989 (danq989) wrote : | #27 |
I just verified that this bug is still present for me in the just-released kernel 2.6.31-15.
Alan Pope 🍺🐧🐱 🦄 (popey) wrote : | #28 |
Linux kernel bug 14515 has nothing to do with this
Changed in linux: | |
importance: | Unknown → Undecided |
status: | Confirmed → New |
Dave V (mindkeep) wrote : | #29 |
Affects my asus eeepc 900. Please raise to critical before I have to learn to hassle with Gentoo again.
Changed in linux: | |
importance: | Undecided → Unknown |
status: | New → Unknown |
Changed in linux: | |
status: | Unknown → Confirmed |
Horácio (horacioh) wrote : | #30 |
I had exactly the same problem (boot stall) on an Asus Eee PC 900. After 2 weeks of use, I got a GRUB error, "error: biosdisk read error", and the system became completely useless, pending a disk wipe and full reinstall.
Similar situations are reported on: https:/
considered a duplicate of this bug.
But I do not see the GRUB error reported here. Might this be a different bug?
Alan Pope 🍺🐧🐱 🦄 (popey) wrote : | #31 |
OK, so after running 9.10 and discovering this issue, I booted off a 9.04 USB stick and dd'ed /dev/zero over both sda and sdb. I then installed 9.04 and have no issues.
So the hardware is not faulty.
Andrew Simpson (andrew-simpson) wrote : | #32 |
A possible work around from the upstream bug report is to boot with 'irqpoll' in the kernel boot parameters. It's not a good fix, the logs are still full of error messages, but at least the 'stall' is reduced.
Regrettably, it's probably best to avoid using Karmic on SSD equipped netbooks. Use Jaunty instead, since this bug probably won't be fixed in the near future.
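Several comments in this thread test kernel boot parameters such as 'irqpoll' and 'libata.dma=0'. To confirm that a parameter actually reached the running kernel, the command line can be parsed as in the rough sketch below. `parse_cmdline` is a hypothetical helper, the sample string is invented, and quoted values containing spaces are not handled.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into a dict.

    Bare flags (e.g. 'irqpoll') map to True; 'key=value' tokens map to
    the value string. Only the first '=' splits, so 'root=UUID=abcd'
    keeps 'UUID=abcd' as its value.
    """
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else True
    return params


# Invented example command line, for illustration only.
sample = "root=UUID=abcd ro irqpoll libata.dma=0"
params = parse_cmdline(sample)
print("irqpoll enabled:", params.get("irqpoll", False))
print("libata.dma:", params.get("libata.dma", "(not set)"))
```

On a running system the same function could be fed the contents of /proc/cmdline instead of the sample string.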
Rick @ rickandpatty.com (rick-rickandpatty) wrote : | #33 |
I have the same issue as reported above with an eeepc 701 (with 16GB SuperTalent SSD, and also with the original 8GB SSD).
An alternate method of fixing a Karmic-corrupted SSD - at least on the 701 - is to boot with ASUS's rescue DVD and allow it to reinstall the default Xandros installation.
With Karmic installed, I can confirm that kernel option "irqpoll" stops the stall during a Karmic boot, but does anyone know if that stops Karmic from messing up the SSD?
Johan Van den Neste (jvdneste) wrote : | #34 |
I'm a bit disappointed that the previously mentioned kernel bug was discarded so quickly. Could it not still be related? There are indeed no *optical* drives to be polled; however, on the Acer One there are 2 card readers (= pollable removable media drives). Since the kernel bug report claims that the problem is caused by a removable media drive choking on the polling commands, could one of the card readers not be the cause?
So I tried 'hal-disable-
One reader marked as 'storage extension' is /dev/mmcblk0, and is apparently not seen as a removable device (message by hal-disable-
Anyone else care to investigate on his/her laptop? (I'd really hate to switch back to 9.04)
Johan Van den Neste (jvdneste) wrote : | #35 |
I should add that boot-time, gnome login and gparted startup are typically moments where I'd expect polling for media to take place.
Johan Van den Neste (jvdneste) wrote : | #36 |
I notice changed behaviour when there are sd cards inserted: The stall is longer with 2 cards inserted than with 1 card, and 1 card is worse than no cards, though it does not disappear. How would I disable the card readers entirely? (I see sdhci-pci mentioned, and when there are cards inserted, the drives are detected as mmc0 and mmc1)
Rick @ rickandpatty.com (rick-rickandpatty) wrote : | #37 |
The eeepc 701 has one SD reader that can be disabled in the BIOS. Disabling it doesn't seem to affect the SSD issues at all on my system.
Andrew Simpson (andrew-simpson) wrote : | #38 |
@Johan
Interesting comment. I have private doubts that this bug is totally due to hardware 'quality' problems (see the current kernel bug report).
If the hardware was at fault then: firstly, the bug would not be spread over such a range of differing hardware, and secondly, Ubuntu 9.04 should also be failing in a similar manner?
Kory (postmako) wrote : Re: [Bug 445852] Re: SSD stall during boot | #39 |
Well after having this issue for a couple of weeks now my netbook will no
longer boot. I ran a live USB stick and gparted can't even read the
partition. As soon as I have time, I'm going back to 9.04 and I suggest
everyone else do the same before their drive craps out like mine did. It
seems to write out bad sectors or destroy your data as well because my
wireless stopped working and I rebooted hoping it would fix the problem. So
all of my netbooks will wait until this issue is resolved before upgrading.
Enjoy!
danq989 (danq989) wrote : Re: SSD stall during boot | #40 |
@Andrew
Couldn't agree more that this is a code regression and not simply a hardware quality issue.
I haven't checked the various changelogs, but I wouldn't be surprised if something in the IDE IRQ handler or the hardware initialization was subjected to optimization (in libdma?). Possibly the SuperTalent drives do react in a non-standard way that was never exposed before.
Hopefully Mr. Heo will take the time to look through the code and see. I'm sure it would help if someone could supply the developers hardware that reliably produces the error. I need my netbook too much to send it away, but maybe someone has an extra SuperTalent drive?
---danq989
Johan Van den Neste (jvdneste) wrote : | #41 |
@Andrew
Don't get me wrong, I'm also convinced it's a software issue. I know little about the Linux kernel, but I know a lot about concurrent programming, and I know certain bugs only manifest themselves under very specific conditions, usually related to very subtle differences in timing (which may also only happen on different processors, different numbers of cores and so on).
Not that I'm saying it's a concurrency bug.
All I'm saying is that different devices influence the timing of all sorts of events and that specific combinations of hardware may trigger specific bugs, and I was guessing that maybe - just maybe - the combination of the ssd with the card readers triggered this one. (since the timing characteristics of ssd's are after all quite different from regular hard drives)
So I would still like to try and disable the card readers entirely (which I cannot do in the bios). But yes, it's a long shot.
Gav Mack (gavinmac) wrote : | #42 |
I have noticed that if I power up the AAO without mains power, more often than not it freezes only during the first part of the boot process and not during GDM, where the freeze makes all the applets fail. That is the most stable I've got Karmic so far. Fsck now seems to want to run on every boot, just before the freeze. It's enough for me to stick with Karmic unless the SSD gets trashed like Kory's, but I'm running ext4 with journaling still enabled; perhaps that's why I've not suffered the corruption problem.
I've posted this issue on the Supertalent forum to see if they can possibly help with maybe a firmware update but I'm not holding my breath! http://
I use my card readers a lot, the left hand in particular for data because I'm dual booting the SSD with Windows 7 so that rules out that option in my case.
David Staples (dcstaples) wrote : | #43 |
I've been having this problem too on my Acer aoa110 (using UNR 9.10)
After a few days of using karmic and getting frustrated with the ~1:45 boot times (from grub to desktop) along with the panel configuration problems, I tried to use a PPA kernel (comment #20 in http://
I gave up and did a wipe of the drive (using wipe, not dd; it took about six hours).
Read-error went away, partitioning worked fine, karmic finally installed, but no change to boot times.
Gah!
I'm going back to jaunty until 10.04 comes out. Which is a shame, because I like the new UI in karmic, but I've decided not to risk my ssd for eye candy.
David Staples (dcstaples) wrote : | #44 |
Oh yeah, here's the output of one of the logs.
Btw, sorry about the long meandering story above ;)
ata2: lost interrupt (Status 0x58)
ata2: drained 2048 bytes to clear DRQ.
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata2.00: BMDMA stat 0x4
ata2.00: cmd c8/00:08:
res 58/00:08:
ata2.00: status: { DRDY DRQ }
ata2: soft resetting link
ata2.00: configured for UDMA/66
ata2: EH complete
Jim (wilsja) wrote : | #45 |
Reproduced on a stock eeepc 900 linux version, installing karmic to the 4G ssd. (I did a dist-upgrade from jaunty, which works fine) Also, I have gotten many cases where I needed to overwrite the disk with zeros to make it usable again. (this happens after one or two boots with karmic, I then reinstall to try to fix it)
kernel is 2.6.31-14-generic
dmesg |grep ata2 gives me
[ 1.103371] ata2: PATA max UDMA/100 cmd 0x170 ctl 0x376 bmdma 0xffa8 irq 15
[ 1.344348] ata2.00: ATA-0: ASUS-PHISON OB SSD, TST2.04L, max UDMA/66
[ 1.344356] ata2.00: 7880544 sectors, multi 0: LBA
[ 1.344436] ata2.01: ATA-0: ASUS-PHISON SSD, TST2.04L, max UDMA/66
[ 1.344442] ata2.01: 31522176 sectors, multi 0: LBA
[ 1.356270] ata2.00: configured for UDMA/66
[ 1.368273] ata2.01: configured for UDMA/66
[ 1.401892] ata2.00: configured for UDMA/66
[ 1.408271] ata2.01: configured for UDMA/66
[ 1.408279] ata2: EH complete
[ 36.816048] ata2: lost interrupt (Status 0x58)
[ 36.820016] ata2: drained 8192 bytes to clear DRQ.
[ 36.834372] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 36.834380] ata2.00: BMDMA stat 0x64
[ 36.834397] ata2.00: cmd c8/00:20:
[ 36.834408] ata2.00: status: { DRDY DRQ }
[ 36.834443] ata2: soft resetting link
[ 37.028315] ata2.00: configured for UDMA/66
[ 37.036314] ata2.01: configured for UDMA/66
[ 37.060315] ata2.00: configured for UDMA/66
[ 37.068312] ata2.01: configured for UDMA/66
[ 37.068331] ata2: EH complete
[ 67.816056] ata2: lost interrupt (Status 0x58)
[ 67.820016] ata2: drained 6144 bytes to clear DRQ.
[ 67.829818] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 67.829945] ata2.00: BMDMA stat 0x64
[ 67.830019] ata2.00: cmd c8/00:18:
[ 67.830275] ata2.00: status: { DRDY DRQ }
[ 67.830379] ata2: soft resetting link
[ 68.024317] ata2.00: configured for UDMA/66
[ 68.032313] ata2.01: configured for UDMA/66
[ 68.056315] ata2.00: configured for UDMA/66
[ 68.064310] ata2.01: configured for UDMA/66
[ 68.064329] ata2: EH complete
Andrew Simpson (andrew-simpson) wrote : | #46 |
@Ubuntu Bugs
Can you please have a look at the status of this bug (currently 'undecided')? This bug could do with some input from the Ubuntu devs. Here's why:
1. The bug is occurring on a wide range of net books with SSD units. This is a growing target audience for Ubuntu.
2. In the simplest case the bug makes the machine unresponsive and impractical to use.
3. If the user continues with the above state, the machine often gets 'bricked'. Total data loss occurs, and the SSD can only be recovered with low level formatting (Normal rescue tools don't work).
Total data loss with bricked machines, confirmed several times over and with no apparent workaround, surely has to be more than an 'undecided' bug?
I have opened a bug report on the kernel bug list which is getting some high level attention. It would be good if Ubuntu was able to give some support on this.
Alan Pope 🍺🐧🐱 🦄 (popey) wrote : | #47 |
I contacted Leann on the kernel team and asked for some advice, and this is her response. I'm not at home, but at UDS so can't try this right now.. be good if someone else could:-
"Thanks for the heads up Alan. I'll get this bug on our list for review.
It seems this has been forwarded upstream as well:
http://
Care to give the latest upstream mainline kernel builds a test:
https:/
2.6.31.6 is the latest upstream stable kernel (which we're going to
release as an SRU for karmic):
http://
The Karmic kernel for SRU with the 2.6.31.6 patches is currently baking
in Stefan's PPA (2.6.31-
https:/
2.6.32-rc6 is the latest 2.6.32 release candidate
http://
Might not hurt to give both the 2.6.31.6 and 2.6.32-rc6 kernels a test
and confirm the issue remains. Then relay the info to both the upstream
bug and lp bug."
Rick @ rickandpatty.com (rick-rickandpatty) wrote : | #48 |
Here's one test of 2.6.32-rc6 on an ASUS Eee PC 701 with a 16 GB SuperTalent SSD. The original Karmic kernel also shows this bug with the stock ASUS 8 GB SSD.
$ uname -a
Linux eeepc 2.6.32-
Looks like the same thing is happening in 2.6.32-rc6.
[ 8.420522] ata2: drained 2048 bytes to clear DRQ.
[ 8.421826] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 8.421839] ata2.00: BMDMA stat 0x24
[ 8.421850] ata2.00: failed command: READ DMA
[ 8.421872] ata2.00: cmd c8/00:08:
[ 8.421876] res 58/00:08:
[ 8.421887] ata2.00: status: { DRDY DRQ }
[ 8.421948] ata2: soft resetting link
Trey (trey333) wrote : | #49 |
I want people, and especially the dev team, to know how serious this problem is. It killed my SSDs, as I reported in a duplicate bug of "unknown" importance. I mean, this is a hardware issue. Nothing could get either of my chips working again: not zeroing out, not a GParted live USB, not Windows / Partition Magic on a USB. Not even taking out the 16 GB chip and putting it in a different computer. This is physical. This is real and expensive to replace. Get Karmic off your SSD drives and, dev team, please change this damn status! Having to find a new solid state drive because of your release is not of "unknown" importance.
danq989 (danq989) wrote : | #50 |
@Trey
Just a point of info for you. After I performed an in-place install of Karmic over my existing Jaunty, I found my AAO-110's SuperTalent-upgrade SSD bricked.
I managed to recover it using HDDERASE version 3.3.
This DOS program uses the "secure erase" function of the drive to completely erase the drive and set all blocks to unused. Unlike a dd of zeros, it only takes a minute or so to complete. Apparently this is a trick that folks have been using on Intel gen1 SSDs to recover performance after block fragmentation has caused a performance drop. Note that the more widely available HDDERASE versions 3.1 and 4.0 did NOT work for me (both threw different exceptions and bombed), so be sure to find and use v3.3. Guide and link here: http://
Now this won't fix things if there's enough real damage to the flash due to too many write/erase cycles, but it's worth a shot.
Good luck Trey!
---danq989
Rick @ rickandpatty.com (rick-rickandpatty) wrote : | #51 |
One more piece of information that may or may not be useful to the developers:
On my other laptop, I have Jaunty installed with kernel 2.6.30.9 (from http://
If I install that kernel on my eeepc 701 running Karmic, I get the same stall / HSM violation errors that I do under 2.6.31 or 2.6.32-rc6. (There are other problems with that kernel under karmic, but I mainly wanted to see if it would stop the SSD problems, and it doesn't...)
Jim (wilsja) wrote : | #52 |
Trey I'm not sure about your specific situation, but for me a regular dd didn't work, though a dd with of=/dev/sda bs=1M did work.
Just to back up what Rick is saying, I installed 2.6.28-
I'm not sure what the effects of using an old kernel on a new distribution are (for instance, the touchpad driver doesn't work), so I don't know how to isolate the problem
Skylord (me-skylord) wrote : | #53 |
Just want to confirm the whole bug on my EeePC 901 4G+16G. A fresh Karmic installation does not work at all: the errors mentioned above appear in the logs and the partitioning setup page freezes. And of course everything is fine on Jaunty.
Raf (4283534-noduck) wrote : | #54 |
I also see these log entries on my Acer Aspire One with SUPER TALENT FEM32GF13M. However, I have not yet seen any corruption as a result of this. In fact, sometimes this does not result in a stall during boot. If it does stall the boot, it hangs for about 12 seconds.
Jim (wilsja) wrote : | #55 |
- dmesg.jaunty (39.5 KiB, text/plain)
It actually seems that it gives a slightly different error message with the old kernel, but has a similar effect. This error with the old kernel only shows up after a dist-upgrade to karmic -- essentially a karmic install with the old 2.6.28 kernel
I am attaching three dmesg outputs.
dmesg.jaunty is before upgrading to karmic
dmesg.oldkern is directly after upgrading to karmic, but using the same kernel as before
dmesg.karmic is using the karmic kernel
Jim (wilsja) wrote : | #56 |
Jim (wilsja) wrote : | #57 |
lotus49 (lotus-49) wrote : | #58 |
I have the same computer and SSD (Acer Aspire One and SUPER TALENT FEM32GF13M) as well as the same symptoms as Raf above.
sles (slesru) wrote : | #59 |
This bug is not SSD specific.
Here is dmesg output from my colleague's Dell 120L notebook, which has an HDD:
[ 6702.000206] ata1: lost interrupt (Status 0x58)
[ 6702.004018] ata1: drained 32768 bytes to clear DRQ.
[ 6702.093725] ata1.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 6702.093739] ata1.01: cmd a0/00:00:
[ 6702.093740] cdb 1e 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 6702.093742] res 58/00:01:
[ 6702.093746] ata1.01: status: { DRDY DRQ }
[ 6702.093859] ata1: soft resetting link
[ 6702.340796] ata1.00: configured for UDMA/100
[ 6702.372409] ata1.01: configured for UDMA/33
[ 6702.391202] ata1: EH complete
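For anyone comparing logs, here is a minimal Python sketch that picks the "lost interrupt" lines out of dmesg text and decodes the status byte. The bit names are the traditional ATA status register names and are an assumption on my part (the kernel's own `status: { ... }` line names only some of them); the function names are made up for illustration.

```python
import re

# Traditional ATA status register bit names (assumed here for illustration;
# libata's "status: { ... }" line prints only a subset of them).
STATUS_BITS = [
    (0x80, "BSY"),
    (0x40, "DRDY"),
    (0x20, "DF"),
    (0x10, "DSC"),
    (0x08, "DRQ"),
    (0x04, "CORR"),
    (0x02, "IDX"),
    (0x01, "ERR"),
]

# Matches lines such as "ata1: lost interrupt (Status 0x58)".
LOST_IRQ = re.compile(r"(ata\d+): lost interrupt \(Status (0x[0-9a-fA-F]+)\)")


def decode_status(value):
    """Return the names of all bits set in an ATA status byte."""
    return [name for mask, name in STATUS_BITS if value & mask]


def find_lost_interrupts(dmesg_text):
    """Return (port, status, bit names) for every lost-interrupt line."""
    events = []
    for match in LOST_IRQ.finditer(dmesg_text):
        port = match.group(1)
        status = int(match.group(2), 16)
        events.append((port, status, decode_status(status)))
    return events


sample = "[ 6702.000206] ata1: lost interrupt (Status 0x58)"
print(find_lost_interrupts(sample))
```

With the 0x58 value seen throughout this report, the DRDY and DRQ bits come out set, which matches the `status: { DRDY DRQ }` line in the logs.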
professordes (d-a-johnston-hw) wrote : | #60 |
This bug is still present with the 31-15 kernel and all other updates applied on my eeePC 901 (1 Dec). There doesn't seem to be a lot going on at linux-kernel-bugs 14583?
Sal Mazzola (salmaz) wrote : | #61 |
I am having a similar problem with an Acer Aspire 1410. The Ubuntu LiveCD booted perfectly off the USB drive, but booting off the hard drive gives me either HSM violations or timeouts.
Setting libata.force=noncq allowed the machine to finally boot up without any errors.
Rick @ rickandpatty.com (rick-rickandpatty) wrote : | #62 |
The kernel parameter mentioned in #61 has no effect on the stall/HSM violation errors on my ASUS EeePC 701 (booted with the 9.10 Live CD and added the parameter to the kernel options, as I have had to reinstall Jaunty to use the machine daily).
Back to the drawing board, I guess.
Alan Pope 🍺🐧🐱 🦄 (popey) wrote : | #63 |
- dmesg2632.txt (42.5 KiB, text/plain)
Ok, there's no way this is a hardware issue. I have on my desk two identical Eee 900's. Both have exhibited this issue under Jaunty and both exhibit the issue under Karmic. If I roll them back to Jaunty and wipe the SSD along the way, the error goes away.
I have now installed 9.10 UNR on one, and have added the 2.6.32 kernel from the links in comment #47.
[ 113.816054] ata2: lost interrupt (Status 0x58)
[ 113.820008] ata2: drained 8192 bytes to clear DRQ.
[ 113.835302] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 113.835310] ata2.00: BMDMA stat 0x64
[ 113.835317] ata2.00: failed command: READ DMA
[ 113.835332] ata2.00: cmd c8/00:20:
[ 113.835335] res 58/00:20:
[ 113.835343] ata2.00: status: { DRDY DRQ }
[ 113.835393] ata2: soft resetting link
Andrew Simpson (andrew-simpson) wrote : | #64 |
The upstream bug report has asked whether anyone has tested kernels 2.6.29 or 2.6.30 as this would help narrow down when the bug was (re)introduced to the kernel.
For my own interest, I have noted that the bug generally occurs with newer/faster SSD units. For instance, the Intel SSD originally fitted to the early Acer Aspire One is, well, rather slow, but doesn't show the bug (I have one). However, another identical machine (they were bought as a pair) upgraded with the Super Talent unit does have the problem. Has anyone noticed a similar pattern?
Alan Cox has suggested that the SSD is responding to a command so fast that the kernel misses seeing the interrupt.
lotus49 (lotus-49) wrote : Re: [Bug 445852] Re: SSD stall during boot | #65 |
I can confirm that upgrading from the stock SSD with Karmic already
installed to a Super Talent SSD produced this error which had not
previously been present.
Simon
theluketaylor (ekul-taylor) wrote : Re: SSD stall during boot | #66 |
I can confirm kernel 2.6.30 works fine with my netbook (Acer Aspire One with 8 GB SSD) using a Jaunty userland. I am running 2.6.30-
I am somewhat reluctant to try a Karmic kernel: once I started getting the errors after installing Karmic, the only way to get rid of them in any distro/kernel combination was to write all zeros to the SSD. If it's truly necessary I will try.
Юрий Аполлов (apollovy) wrote : Re: [Bug 445852] Re: SSD stall during boot | #67 |
Passing libata.force=noncq to the kernel at boot time from a live USB had no
effect for me; I still got hangs in the same place. It is interesting that
this happens only after X and gdm start. I don't really know why.
Booting into single-user mode gave an interesting result: about a minute
after getting a root shell, I got a message about exception Emask, DRQ, etc.
Right after it came another message that caught my interest:
"Starting init crypto disks... [OK]"
So my guess is that the problem is really there, in crypto disks.
Andrew Squire (andrewsquire) wrote : Re: SSD stall during boot | #68 |
I had this issue on my EeePC 901. I took the following actions and have not *yet* seen the issue reappear:
1. Boot from 9.10 NBR Live CD
2. dd if=/dev/zero of=/dev/sda bs=1M
3. dd if=/dev/zero of=/dev/sdb bs=1M
4. install 9.10 NBR
5. Add the kernel parameter "libata.dma=0" to the GRUB2 config and rebuild the GRUB2 menu
NOTE: When I first got the issue I first tried steps 4. & 5. without doing 1. 2. 3. and it did not fix the issue.
Юрий Аполлов (apollovy) wrote : Re: [Bug 445852] Re: SSD stall during boot | #69 |
And yes- it's in libata!!
SSD is eating brains only when libata.dma>=4
http://
> libata.dma= [LIBATA] DMA control
> libata.dma=0 Disable all PATA and SATA DMA
> libata.dma=1 PATA and SATA Disk DMA only
> libata.dma=2 ATAPI (CDROM) DMA only
> libata.dma=4 Compact Flash DMA only
> Combinations also work, so libata.dma=3 enables DMA
> for disks and CDROMs, but not CFs.
So the SSD is being treated as a CF device. No surprise, of course. But with
DMA off, the speed of the device suffers badly!
During bootup the kernel says (when libata.dma<=3) that my SSD
(AAO-110L, 8 GB SSD-PAMM, Samsung) is in (!) PIO4 data transfer mode. It is
REALLY slow.
hdparm -tT /dev/sda
Cached reads: 370 MB/sec
Buffered reads: 2.3 MB/sec
And that is reading. Do I have to talk about writing?
Where to dig now that the cause of the problem seems to be localized?
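The libata.dma values in the kernel documentation quoted above form a bitmask, which is why "combinations also work". A minimal Python sketch of how a given value breaks down; the constant and function names are made up for illustration, only the numeric values come from the quoted text.

```python
# Flag values as listed in the quoted kernel parameter documentation.
DMA_DISK = 1   # PATA and SATA disk DMA
DMA_ATAPI = 2  # ATAPI (CDROM) DMA
DMA_CF = 4     # Compact Flash DMA


def describe_libata_dma(value):
    """Return the device classes for which DMA is enabled by this value."""
    enabled = []
    if value & DMA_DISK:
        enabled.append("disk")
    if value & DMA_ATAPI:
        enabled.append("atapi")
    if value & DMA_CF:
        enabled.append("cf")
    return enabled


for value in (0, 1, 3, 4, 7):
    print(value, describe_libata_dma(value))
```

Read this way, every value from 4 upwards has the Compact Flash bit set, which would fit the observation that the stall only appears with libata.dma>=4 if the SSD is handled as a CF device.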
Gav Mack (gavinmac) wrote : Re: SSD stall during boot | #70 |
Adding libata.dma=3 to the kernel options in GRUB seems to have stopped the hang on my AAO so far; the entries in dmesg are gone. It still takes over a minute to boot, but that's something I'll live with for now.
It's early days but premature thanks go out to Andrew Squire in advance!
Юрий Аполлов (apollovy) wrote : Re: [Bug 445852] Re: SSD stall during boot | #71 |
Btw, libata.dma=3 does not make the SSD work in DMA mode, though!
This is something the developers should be told, because I am staying on
Jaunty until this stall goes away!
Gav Mack (gavinmac) wrote : Re: SSD stall during boot | #72 |
You're correct - it was early days and premature! Though I could put up with the slow boot time, with anything other than basic web browsing I couldn't handle the drop in disk read performance from 67 MB/s to 2 MB/s, never mind write. It was worse than my old stock AAO SSD was with Jaunty playing back video.
Back to the situation in post 67 methinks - devs, stop dragging your heels and sort this issue out!
Andrew Squire (andrewsquire) wrote : | #73 |
Agree completely, Gav Mack. Seems likely to me to be a regression in the kernel that should be fixed. Unfortunately, it looks like the associated kernel bug (#14583) is being put down to dodgy hardware.
Setting libata.dma to a value below 4 is just a workaround to keep us up and running in the interim... if you can put up with the negative impact on performance :)
Gav Mack (gavinmac) wrote : | #74 |
Couldn't agree more, Andrew. It's the easy way out to blame this on hardware. They may have a valid point, but clearly none of us ever had a problem with this until Karmic and the newer kernels. To paraphrase that old saying, "If it looks like a kernel bug and acts like a kernel bug, then it's a kernel bug", and I can't help but think "waffle" and "b*llsh*t" when the blame is laid on the storage devices.
Unless there are more of us, or we get lucky and a netbook OEM runs into major trouble with the same problem, or some IT hacks get involved to force them to do something about it, I'll be booting into Windows 7 far more than I want to. A shame :(
Now where's my logins for theregister.co.uk and theinquirer.net?
Юрий Аполлов (apollovy) wrote : Re: [Bug 445852] Re: SSD stall during boot | #75 |
A small thing to keep in mind:
1. I load Karmic and get this bug. Kernel 2.6.31, I believe.
2. I load Karmic with 2.6.32 and still get this bug.
3. I load Jaunty and do not get this bug. Kernel 2.6.28.
4. I load Jaunty with 2.6.32 and do NOT get this bug.
So I guess it's NOT a kernel bug. Otherwise anyone could easily reproduce it
on Jaunty.
So, I repeat, it's NOT a direct kernel bug; there's something that we are
missing. Something makes the SSD hang when in DMA mode. What could it be? If
it's not the kernel, then what? A userspace process that stalls the SSD?
Which one, specifically? Maybe if that one is killed there would be no
hangs? Or at least we could file a bug report in the correct place, which
may be neither Launchpad nor the kernel bugs section.
Those were my thoughts. IMHO.
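The reasoning behind the four observations above can be checked mechanically. A minimal Python sketch (the function name is made up for illustration) that asks whether a single factor, userland release or kernel version, is enough to account for whether the bug appears:

```python
# The four observations from this comment: userland release, kernel, bug seen?
observations = [
    {"userland": "Karmic", "kernel": "2.6.31", "bug": True},
    {"userland": "Karmic", "kernel": "2.6.32", "bug": True},
    {"userland": "Jaunty", "kernel": "2.6.28", "bug": False},
    {"userland": "Jaunty", "kernel": "2.6.32", "bug": False},
]


def factor_explains_outcome(rows, factor):
    """True if every value of `factor` always goes with the same outcome."""
    seen = {}
    for row in rows:
        previous = seen.setdefault(row[factor], row["bug"])
        if previous != row["bug"]:
            return False
    return True


print("userland:", factor_explains_outcome(observations, "userland"))
print("kernel:", factor_explains_outcome(observations, "kernel"))
```

Kernel 2.6.32 appears with both outcomes, so the kernel version alone cannot account for these four results, while the userland release does. Four data points from one machine are of course not proof.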
Raf (4283534-noduck) wrote : Re: SSD stall during boot | #76 |
#75 is correct. My karmic install always gives the error (even with 2.6.29-rc3*). But my jaunty never gives the error, even with 2.6.31-16. However, there is one more difference between my karmic and jaunty installs: karmic uses ext4, while jaunty uses ext2 (and for the record all of those are inside LVM).
Could it be something related to ext4? I notice that the bug triggers during or right after filesystem check/mount (mountall). But I have not been able to reproduce this bug by doing filesystem checks/mounting of the karmic-ext4 partition under jaunty.
I have not been able to debug it further. Debugging upstart to see exactly when the bug triggers generates way too much output.
I tried disabling ureadahead, but that didn't make any difference.
*Note that the error message changed between 2.6.29-6 and 2.6.30-rc1, but it still triggers an error message and the same delay.
Gav Mack (gavinmac) wrote : | #77 |
@Raf:
Don't think it's an ext4 issue - I'm very sure that when I first ran Karmic Beta on the USB boot stick it hung on the detection of the SSD before I even got to the partition menu. I tried ext2 first and got corruption pretty quickly and then setup in ext4 before I spotted this bug report on launchpad.
Don't forget also that Andrew Simpson said in post 15 of the Kernel bug list that Mandriva 2010.0 with the same kernel doesn't show this error but Fedora 12 Beta does.
theluketaylor (ekul-taylor) wrote : | #78 |
I can confirm this isn't an ext4 issue; I am using ext4 for my / partition in Jaunty with a 2.6.30 kernel without any trouble
Rick @ rickandpatty.com (rick-rickandpatty) wrote : | #79 |
I've got to agree with #77 - It isn't an ext4 issue. I installed Karmic on my Eee with ext4 and again with ext2 and got similar errors both times.
I can also confirm that even using kernels that cause no problems under Jaunty cause problems under Karmic. So what's different about the boot process of karmic that's trashing our SSDs? Maybe it's something that can be fixed outside the kernel?
Stephen O (soglesby1) wrote : | #80 |
It's not just Karmic, as Gav pointed out. I installed Fedora 12 two days ago, in part because I was curious whether this problem would follow me from Karmic. Sure enough, a panel failed to load and my dmesg output had the same ata errors as it did in Karmic, even though my partitioning scheme was different since Fedora defaults to LVM. I can also join the chorus in stating that I used ext4 with this drive on Jaunty and had no issues.
Raf (4283534-noduck) wrote : | #81 |
What about the increased parallelism in the new upstart? Could this be the cause of the problems: increased simultaneous disk access.
Doesn't Fedora also use upstart?
Unfortunately it looks like upstart cannot be serialized. (I am talking about the jobs in /etc/init, not /etc/init.d)
Юрий Аполлов (apollovy) wrote : Re: [Bug 445852] Re: SSD stall during boot | #82 |
I'd like to ask somebody to look specifically at upstart: can it be
removed? (I have no information on this right now, as I'm still on 9.04.)
Can somebody try removing it?
Changed in linux (Ubuntu): | |
assignee: | nobody → Gav Mack (gavinmac) |
assignee: | Gav Mack (gavinmac) → nobody |
assignee: | nobody → Upstart Developers (upstart-devel) |
Gav Mack (gavinmac) wrote : Re: SSD stall during boot | #83 |
Notified the upstart devs - do any other distros other than Fedora 12 and Karmic use upstart?
Changed in linux (Ubuntu): | |
assignee: | Upstart Developers (upstart-devel) → nobody |
Scott James Remnant (Canonical) (canonical-scott) wrote : | #84 |
Please do not subscribe that team, the team's own description explicitly asks you not to.
This bug clearly is nothing to do with userspace, it should not be possible for userspace to cause problems in this way (that's what the kernel is there for).
It smells like a kernel driver bug to me, especially given the reports of fiddling with DMA. The high number of similar SSDs mean it could be a hardware company being creative with the spec, but that's still a kernel driver bug for failing to quirk them properly.
I'm not a kernel developer, but I would recommend the following debugging technique:
- those affected should supply detailed information about not only their SSD, but the I/O controller in their laptop (dmesg, lspci -vvnn, etc.)
- if one release of Ubuntu is affected more than the other, that suggests a regression
- first try a mainline kernel build from http://
- if that does not fix the problem, start working backwards through the kernel releases until you find one that does fix the problem
- if the first kernel is still affected, try kernels from previous Ubuntu releases
(one assumes that the kernel from the release where things work fine, installed on karmic, will work)
- Given a loose idea, narrow it down using the kernel packages you can download from https:/
Basically what would be ideal would be to find one kernel that works, and then the *immediate next kernel* that doesn't work. This would give a limited number of changes that broke it, and start to reveal what the bug might be
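The search described above, finding one kernel that works and the immediate next one that doesn't, is a bisection. A minimal Python sketch under stated assumptions: the kernels are listed oldest to newest, the first is known good, the last is known bad, and every kernel after the first bad one is also bad. The function name, the placeholder kernel names and the stand-in test are all hypothetical; in practice `is_affected` means installing that kernel, booting, and checking dmesg by hand.

```python
def find_regression_window(versions, is_affected):
    """Return (last good, first bad) from a list ordered oldest to newest.

    Assumes versions[0] is good, versions[-1] is bad, and that the bug,
    once introduced, is present in every later version.
    """
    low, high = 0, len(versions) - 1
    while high - low > 1:
        mid = (low + high) // 2
        if is_affected(versions[mid]):
            high = mid  # bug already present here, look earlier
        else:
            low = mid   # still good here, look later
    return versions[low], versions[high]


# Placeholder names only; these are not real test results.
kernels = ["k1", "k2", "k3", "k4", "k5", "k6"]
pretend_bad = {"k5", "k6"}

print(find_regression_window(kernels, lambda k: k in pretend_bad))
```

Each step halves the number of kernels left to try, which matters when every test involves wiping the SSD and reinstalling. If the outcome does not depend on the kernel version alone, as several reports in this thread suggest, the assumption above fails and the result would be misleading.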
Alan Pope 🍺🐧🐱 🦄 (popey) wrote : | #85 |
Thanks for the debugging advice Scott.
The only issue for me is that when I install Karmic and it goes _very_ bad I end up with a bricked machine (IO errors on the SSD causing me to drop to the initramfs prompt). The only way to test other kernels is to dd zeros over the SSD and reinstall the OS again, then add in whatever kernel to test. It's a maddeningly time-consuming task.
Юрий Аполлов (apollovy) wrote : Re: [Bug 445852] Re: SSD stall during boot | #86 |
To Scott:
How can this be: the same program (here I mean the kernel version) and the
same hardware, yet different behaviour on different Ubuntu releases? If two
things work here and not there, don't you think the reason is not that those
things have gone bad, but something in the environment?
Nevertheless I think your idea is interesting. Can anyone here who is
running 9.10 on an SSD test this workaround?
2009/12/10 Alan Pope <email address hidden>
> Thanks for the debugging advice Scott.
>
> The only issue for me is that when I install Karmic and it goes _very_
> bad I end up with a bricked machine (IO errors on SSD causing me to drop
> to initramfs promt). The only way to test other kernels is to dd zeros
> over the SSD and reinstall the OS again, then add in whatever kernel to
> test. It's a maddeningly time consuming task.
Raf (4283534-noduck) wrote : Re: SSD stall during boot | #87 |
I don't think that there is a bug in upstart. More likely the increased parallelism of upstart triggers a bug in the kernel, firmware, or hardware.
If it is a kernel bug, it has been in the kernel for several releases. I have tested with 2.6.29-rc3 (and most versions in between), that is the oldest kernel on http://
I am wondering if #61 is not on to something. However, my SSD is ATA, not SATA (so no NCQ anyway). I will research whether the SSD can do TCQ and, if so, how to disable it. That would suggest a firmware/hardware bug.
Rick @ rickandpatty.com (rick-rickandpatty) wrote : | #88 |
@Raf #87
I've tested Karmic (ext2 filesystem) with the latest 2.6.28 kernel for Jaunty, with similar results as for the Karmic kernels. (Of course, the same kernel works just fine with Jaunty.)
Another issue with upstart being the cause of the problem - as #11 notes, this bug can be triggered in Karmic by running gparted at any time - even when booting up via the LiveCD instead of the SSD.
Andrew Simpson (andrew-simpson) wrote : | #89 |
O.K., Reviewing what we do know:
Scott has suggested providing system information, however this already seems to be listed in this bug report and in the upstream bug report.
He has also suggested trying different kernels to isolate when the problem started. This is what we have been doing, HOWEVER comments #75, #76 & #87 have now all shown that something in 'Karmic' is the problem - and not directly the kernel version. Have we been chasing the wrong problem?
We can rule out ext4 and NCQ from comments #79, #87 and #88.
While I would agree with Scott that 'userspace applications' shouldn't affect the kernel, it does appear a 'userspace application' is affecting the kernel.
I can confirm that Mandriva 2010.0 (after several weeks of use & lots of checking) does not have this problem.
My own experience and comment #80 confirm that Fedora 12 does have the problem. I have searched the Fedora Bugzilla and can't see a bug report there. It would be good to file a bug there. One of the Fedora kernel devs has been commenting on the upstream bug report.
What is different about Mandriva 2010.0 compared to Karmic and Fedora 12? Anything of note other than upstart?
theluketaylor (ekul-taylor) wrote : | #90 |
if a userspace application is able to trigger this issue I would say this is a kernel bug since that is exactly the sort of thing the kernel is supposed to be managing. No userspace application regardless of how poorly written should be able to trigger a drop from UDMA to PIO.
I have tried jaunty with a 2.6.32 kernel and it works fine. I was going to try installing karmic and upgrading it to 2.6.32 but I wasn't even able to complete the installer. It failed to mount the ext4 / partition due to numerous HSM faults
Andrew Squire (andrewsquire) wrote : | #91 |
One thing that might be worth considering when we're testing the different distros and kernels: once I had experienced the problem, I could not reliably get rid of it without doing a dd if=/dev/zero to both disks. In #89 (Andrew Simpson) and #80 (Stephen O), when the problem was exhibited in Fedora 12, had the disks been zeroed in between?
theluketaylor (ekul-taylor) wrote : | #92 |
I tried bootstrapping karmic from a jaunty livecd on a zeroed ssd. I installed the 2.6.32 kernel from http://
I wanted to see if I could trigger the bug when karmic wasn't involved in the creation or population of the filesystem. Since once the bug happens the disk has to be zeroed I started small, just booting into single user mode. Even this triggered the bug. The only processes that had been spawned were init, upstart, dhcp and the shell I was using
Johan Van den Neste (jvdneste) wrote : | #93 |
I can't try this right now, but would it be possible to run gparted with lots of debugging output to see exactly what it is doing when the bug is triggered? If this is not possible, maybe we could ask someone who knows the gparted codebase to help us out?
Юрий Аполлов (apollovy) wrote : Re: [Bug 445852] Re: SSD stall during boot | #94 |
An interesting thing: as soon as I launch parted from the live USB, I
immediately get our errors. I get the same using cfdisk. And, possibly
interesting and important, it happens both when I start parted and when I
exit it, and likewise when I start or exit cfdisk.
BUT!
Once I removed libparted and all packages that depend on it, the
noise went away! No more blinking of the SSD indicator. WOW! I still got DMA
though. Interesting, I guess. Can anyone confirm my finding?
2009/12/10 Johan Van den Neste <email address hidden>
> I can't try this right now, but would it be possible to run gparted with
> lots of debugging output to see exactly what it is doing when the bug is
> triggered? If this is not possible, maybe we could ask someone who knows
> the gparted codebase to help us out?
rogmorri (frontporsche) wrote : Re: SSD stall during boot | #95 |
It seems that if I do this as root....
int k = open("/dev/sda", O_WRONLY|
ioctl(
close(k);
...then 15~30 seconds later I see this on the console ...
[33458.988232] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[33458.994551] ata2.00: BMDMA stat 0x4
[33459.001646] ata2.00: cmd ca/00:08:
[33459.001657] res 58/00:08:
[33459.014157] ata2.00: status: { DRDY DRQ }
but dmesg shows a bit more....
[33458.988145] ata2: lost interrupt (Status 0x58)
[33458.988232] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[33458.994551] ata2.00: BMDMA stat 0x4
[33459.001646] ata2.00: cmd ca/00:08:
[33459.001657] res 58/00:08:
[33459.014157] ata2.00: status: { DRDY DRQ }
[33459.021056] ata2: soft resetting link
[33459.228490] ata2.00: configured for UDMA/66
[33459.228528] ata2: EH complete
I gleaned those 3 lines of code from doing "strace parted"
rogmorri (frontporsche) wrote : | #96 |
I should have mentioned that this could be related to O_LARGEFILE, which I had to define manually...
#define O_LARGEFILE 0100000
rogmorri (frontporsche) wrote : | #97 |
Oops, ignore that.
It seems that all I need to cause the error is just this...
int k = open("/dev/sda", O_WRONLY);
close(k);
BTW, my root file system is mounted on sda1 while I'm doing this.
(If I open with O_RDONLY instead, then there is no error)
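For anyone who wants to repeat the open/close test from a script rather than compiled C, a rough Python equivalent is sketched below. The function name is invented for illustration, and nothing runs until it is called. Try it on a scratch file first: other comments in this thread report disk corruption after triggering the error on a real SSD.

```python
import os


def open_close(path: str, write: bool = True) -> None:
    """Open path write-only (or read-only) and close it again, with no I/O.

    Mirrors the two-line C test: open(path, O_WRONLY) followed by close().
    """
    flags = os.O_WRONLY if write else os.O_RDONLY
    fd = os.open(path, flags)
    os.close(fd)
```

Calling `open_close("/dev/sda")` as root would correspond to the failing case, and `open_close("/dev/sda", write=False)` to the O_RDONLY case reported as harmless.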
Raf (4283534-noduck) wrote : | #98 |
#97 also repeatably triggers the HSM violation for me. But only under Karmic (not Jaunty).
Raf (4283534-noduck) wrote : | #99 |
But be warned, this has led to disk corruption for me! I never had disk corruptions before...
Scott James Remnant (Canonical) (canonical-scott) wrote : Re: [Bug 445852] Re: SSD stall during boot | #100 |
On Fri, 2009-12-11 at 01:07 +0000, rogmorri wrote:
> Oops, ignore that.
> It seem that this is all I need to cause the error is just this...
>
> int k = open("/dev/sda", O_WRONLY);
> close(k);
>
> BTW, my root file system is mounted on sda1 while I'm doing this.
>
That actually has quite a few side-effects that you might not
realise ;-)
Try this command (as root)
echo change > /sys/block/
Does that cause the same errors?
Scott
--
Scott James Remnant
<email address hidden>
Stephen O (soglesby1) wrote : Re: SSD stall during boot | #101 |
I had not zeroed my drive prior to installing Fedora 12 so I attempted to follow Andrew's suggestion (#91). Unfortunately every time I try dd if=/dev/zero of=/dev/sda (which is definitely my 32GB SSD) I get the following error:
dd: writing to '/dev/sda': Input/output error
676065+0 records in
676064+0 records out
346144768 byte (346 MB) copied, 82.0451 s, 4.2 MB/s
It's always 346MB, never more nor less. Has my drive fallen victim to this bug and been damaged in some way?
Юрий Аполлов (apollovy) wrote : Re: [Bug 445852] Re: SSD stall during boot | #102 |
Stephen, try badblocks -wvs /dev/sda to check your drive in read-write
mode
2009/12/11 Stephen O <email address hidden>
> I had not zeroed my drive prior to installing Fedora 12 so I attempted to
> follow Andrew's suggestion (#91). Unfortunately every time I try dd
> if=/dev/zero of=/dev/sda (which is definitely my 32GB SSD) I get the
> following error:
> dd: writing to '/dev/sda': Input/output error
> 676065+0 records in
> 676064+0 records out
> 346144768 byte (346 MB) copied, 82.0451 s, 4.2 MB/s
>
> It's always 346MB, never more nor less. Has my drive fallen victim to
> this bug and been damaged in some way?
Andrew Squire (andrewsquire) wrote : Re: SSD stall during boot | #103 |
Stephen O - I also had this error unless I set the block size manually. Try bs=1M.
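For context on the block-size suggestion: the figures in the dd output from #101 work out to exactly 512 bytes per record, which is dd's default when bs is not given. A quick check of the arithmetic, with the numbers copied from that output:

```python
records_out = 676_064        # "676064+0 records out"
bytes_copied = 346_144_768   # "346144768 byte (346 MB) copied"
seconds = 82.0451            # elapsed time reported by dd

# Bytes per record and average throughput implied by those figures.
block_size = bytes_copied / records_out
throughput_mb_s = bytes_copied / seconds / 1e6

print(f"{block_size:.0f} bytes per record, {throughput_mb_s:.1f} MB/s")
```

So the write stopped after 676064 default-sized blocks; this says nothing by itself about why the error occurs at that point.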
rogmorri (frontporsche) wrote : | #104 |
@Scott,
>> int k = open("/dev/sda", O_WRONLY);
>> close(k);
>That actually has quite a few side-effects that you might not
>realise ;-)
It might be good then to warn people not to run "parted --list" to view partition tables. That seems to open all your devices with O_WRONLY.
Raf (4283534-noduck) wrote : | #105 |
Scott,
Yes, "echo change > /sys/block/
Any idea why this works fine under Jaunty and not Karmic?
Raf.
Scott James Remnant (Canonical) (canonical-scott) wrote : Re: [Bug 445852] Re: SSD stall during boot | #106 |
On Fri, 2009-12-11 at 17:10 +0000, Raf wrote:
> Yes, "echo change > /sys/block/
> After entering that command, it takes about 20 to 30 seconds until the
> error shows up.
>
Ok, excellent.
So what's happening is that one of the commands being run to probe the
disk is causing the error. Let's figure out which one!
Run the following command:
sudo udevadm test /block/sda 2>&1 | grep "^util_
This will output a bunch of program names. First wait the 20-30s, to
see whether you get the error. It's possible that you will not with
this (which is interesting in of itself, so please let me know if that
happens).
If you do get the error, note down the commands and then we'll want to
run each one in turn to see which one gives the error. (You'll need to
run them all with sudo or as root).
There's probably about 6-8 of them.
Let me know which one(s) cause the error (if any).
Scott
--
Scott James Remnant
<email address hidden>
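The procedure above (collect the programs udev runs for the disk, then run each one separately) can be sketched as follows. This assumes the filtered output consists of lines of the form `util_run_program: '<command>' started`; the helper name is illustrative only.

```python
import re
import shlex
from typing import List

# One line per program that `udevadm test` reports starting (assumed format).
_RUN_LINE = re.compile(r"^util_run_program: '(.*)' started$")


def probe_commands(udevadm_output: str) -> List[List[str]]:
    """Extract the argv of each program listed in the udevadm test output."""
    commands = []
    for line in udevadm_output.splitlines():
        match = _RUN_LINE.match(line.strip())
        if match:
            commands.append(shlex.split(match.group(1)))
    return commands
```

Each returned argv could then be run by hand as root, one at a time, waiting the 20-30 seconds mentioned above and checking dmesg before moving on to the next.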
Raf (4283534-noduck) wrote : Re: SSD stall during boot | #107 |
Scott,
# udevadm test /block/sda 2>&1 | grep "^util_
util_run_program: 'ata_id --export /dev/sda' started
util_run_program: 'scsi_id --whitelisted --replace-
util_run_program: 'path_id /devices/
util_run_program: '/sbin/blkid -o udev -p /dev/sda' started
util_run_program: 'edd_id --export /dev/sda' started
util_run_program: 'devkit-
util_run_program: 'devkit-
This did trigger the HSM violation.
Testing more, I think that devkit-
I think it might be related to the delay between the different commands. If the delay is too big, the error doesn't always show. But if the delay is too small, we cannot be sure which command triggered it.
The delay between running devkit-
I will try again to isolate it.
Raf.
Raf (4283534-noduck) wrote : | #108 |
I am doing this (on an otherwise idle UNR):
# sleep 120; logger devkit-
And I get this in syslog (repeatably):
Dec 14 11:12:01 unus logger: devkit-
Dec 14 11:12:35 unus kernel: [ 7734.000130] ata2: lost interrupt (Status 0x58)
Dec 14 11:12:35 unus kernel: [ 7734.000217] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Dec 14 11:12:35 unus kernel: [ 7734.000232] ata2.00: BMDMA stat 0x4
Dec 14 11:12:35 unus kernel: [ 7734.000264] ata2.00: cmd ca/00:08:
Dec 14 11:12:35 unus kernel: [ 7734.000270] res 58/00:08:
Dec 14 11:12:35 unus kernel: [ 7734.000284] ata2.00: status: { DRDY DRQ }
Dec 14 11:12:35 unus kernel: [ 7734.000343] ata2: soft resetting link
Dec 14 11:12:35 unus kernel: [ 7734.208576] ata2.00: configured for UDMA/66
Dec 14 11:12:35 unus kernel: [ 7734.208618] ata2: EH complete
Dec 14 11:14:01 unus logger: done
Also note that I stay in UDMA (not PIO like some other posters). Except for the delays, the disk is quite usable.
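As an aid to reading these logs: the `Status 0x58` value in the "lost interrupt" line is the raw ATA status byte, and `status: { DRDY DRQ }` is the kernel's summary of it. A small sketch that decodes the byte, assuming the classic ATA status register bit names (the function name is made up for illustration):

```python
from typing import List

# Classic ATA status register layout, most significant bit first.
# Some of these names (DSC, CORR, IDX) are obsolete in newer ATA revisions.
ATA_STATUS_BITS = [
    (0x80, "BSY"),
    (0x40, "DRDY"),
    (0x20, "DF"),
    (0x10, "DSC"),
    (0x08, "DRQ"),
    (0x04, "CORR"),
    (0x02, "IDX"),
    (0x01, "ERR"),
]


def decode_ata_status(status: int) -> List[str]:
    """Return the names of the bits set in an 8-bit ATA status value."""
    return [name for mask, name in ATA_STATUS_BITS if status & mask]
```

For 0x58 this yields DRDY, DSC and DRQ; the kernel line appears to report only a subset of the bits, which would explain why DSC is absent there. DRQ being set means the device still considers a data transfer pending, consistent with the "drained 16384 bytes to clear DRQ" message seen earlier in the bug.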
Raf (4283534-noduck) wrote : | #109 |
I disabled /lib/udev/
Note that the devkit-
Raf.
Scott James Remnant (Canonical) (canonical-scott) wrote : Re: [Bug 445852] Re: SSD stall during boot | #110 |
On Mon, 2009-12-14 at 16:27 +0000, Raf wrote:
> I disabled /lib/udev/
> And now I can boot without HSM violations. But I believe this is only a
> workaround.
>
> Note that the devkit-
> filesystem corruption.
>
From information received here, and information on the kernel bug, I
really think that the cause *is* the SMART commands.
Scott
--
Scott James Remnant
<email address hidden>
affects: | linux (Ubuntu) → libatasmart (Ubuntu) |
summary: |
- SSD stall during boot + devkit-disks-probe-ata-smart causes HSM Violations on SSD, and potential + hardware death |
Changed in devicekit-disks (Ubuntu): | |
status: | New → Triaged |
Changed in libatasmart (Ubuntu): | |
status: | Confirmed → Triaged |
importance: | Undecided → High |
importance: | High → Critical |
Changed in devicekit-disks (Ubuntu): | |
importance: | Undecided → High |
importance: | High → Critical |
Raf (4283534-noduck) wrote : | #111 |
I don't know if it is helpful to anybody, but I have attached the strace for /lib/udev/
Raf (4283534-noduck) wrote : | #112 |
Gav Mack (gavinmac) wrote : | #113 |
@Raf: Could you explain how to disable /lib/udev/
One good thing has come out of a good Windows tech but Linux n00b fumbling about, not really knowing what he was doing: I got Scott involved :-)
Andrew Simpson (andrew-simpson) wrote : | #114 |
@Gav Mack
This is probably a fairly crude workaround, but it works for me. I just disabled the ata-smart disk probe in the udev rules:
In the file /lib/udev/
# ATA disks driven by libata
KERNEL=
Add a '#' in front to make the rule line a comment, like this:
# ATA disks driven by libata
#KERNEL=
Save the file.
To make sure it's reloaded do these commands:
sudo service udev stop
sudo service udev start
Test with gparted... and notice the difference.
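The manual edit above can also be scripted. A minimal sketch, assuming the rule to disable is the non-comment line directly after the "# ATA disks driven by libata" comment, as in the instructions; the function name is invented for illustration, and the rules file path is left to the caller because it is not reproduced in full here.

```python
from typing import List

MARKER = "# ATA disks driven by libata"


def disable_smart_probe(rules_text: str) -> str:
    """Comment out the rule line that directly follows the libata marker."""
    lines: List[str] = rules_text.splitlines(keepends=True)
    for i, line in enumerate(lines[:-1]):
        nxt = lines[i + 1]
        # Skip blank lines and lines that are already commented out.
        if line.strip() == MARKER and nxt.strip() and not nxt.lstrip().startswith("#"):
            lines[i + 1] = "#" + nxt
    return "".join(lines)
```

The function only transforms text; reading the rules file, writing it back as root and restarting udev would still be done as described above. Running it twice is harmless, since an already commented line is left alone.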
Scott James Remnant (Canonical) (canonical-scott) wrote : Re: [Bug 445852] Re: devkit-disks-probe-ata-smart causes HSM Violations on SSD, and potential hardware death | #115 |
On Mon, 2009-12-14 at 20:26 +0000, Raf wrote:
> I don't know if it is helpful to anybody, but I have attached the strace
> for /lib/udev/
> ioctls against the device. smartctl -a /dev/sda does not trigger the HSM
> violation.
>
Could you provide the equivalent strace for smartctl as well, for
comparison?
Scott
--
Scott James Remnant
<email address hidden>
rogmorri (frontporsche) wrote : | #116 |
> #KERNEL=
Nice workaround. This brought my poweron-to-desktop time from 2:35 down to 0:52. :)
Tommy Trussell (tommy-trussell) wrote : | #117 |
Confirming this behavior on an ASUS EeePC 900 with an upgraded SSD: Patriot Lite SSD, 32GB model PL32GPEPCSSDR
I just ran a command like @Raf described in comment #108. When /dev/sda is umounted, no response except "DKD_ATA_
I have been commenting on my reported Bug 430333 but I think I will declare it to be a duplicate of this one, and send the other Patriot Lite SSD user(s) over here.
shadowblast101 (shadowblast101) wrote : | #118 |
I followed Tommy over, and can confirm his confirmation. I have pretty much the same setup, EEE900, PL32GPEPCSSDR, but with Arch instead of Ubuntu, and the behavior is still there. Mine had a few other quirks too, such as hiding my mouse until I switch to a tty and back.
After applying the workaround, everything seems to be good.
Raf (4283534-noduck) wrote : | #119 |
I did some more tests with devkit-
Raf (4283534-noduck) wrote : | #120 |
Raf (4283534-noduck) wrote : | #121 |
In freedesktop.org Bugzilla #25673, Scott James Remnant (Canonical) (canonical-scott) wrote : | #122 |
We have many reports of the libatasmart code causing stalls, HSM Violations and even death of SSDs. Particularly SuperTalent ones, but also those found in my netbooks.
# sleep 120; logger devkit-
And I get this in syslog (repeatably):
Dec 14 11:12:01 unus logger: devkit-
Dec 14 11:12:35 unus kernel: [ 7734.000130] ata2: lost interrupt (Status 0x58)
Dec 14 11:12:35 unus kernel: [ 7734.000217] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Dec 14 11:12:35 unus kernel: [ 7734.000232] ata2.00: BMDMA stat 0x4
Dec 14 11:12:35 unus kernel: [ 7734.000264] ata2.00: cmd ca/00:08:
Dec 14 11:12:35 unus kernel: [ 7734.000270] res 58/00:08:
Dec 14 11:12:35 unus kernel: [ 7734.000284] ata2.00: status: { DRDY DRQ }
Dec 14 11:12:35 unus kernel: [ 7734.000343] ata2: soft resetting link
Dec 14 11:12:35 unus kernel: [ 7734.208576] ata2.00: configured for UDMA/66
Dec 14 11:12:35 unus kernel: [ 7734.208618] ata2: EH complete
Dec 14 11:14:01 unus logger: done
The problem has also been confirmed in Fedora 12.
In freedesktop.org Bugzilla #25673, Scott James Remnant (Canonical) (canonical-scott) wrote : | #123 |
Kernel bugzilla bug for the same issue (URL above is the Launchpad bug)
In freedesktop.org Bugzilla #25673, Lennart-poettering (lennart-poettering) wrote : | #124 |
Hmm, lacking access to the hw in question I am not sure what I can do about this.
What surprises me a bit is that this only appeared so very recently. Is this triggered by some interplay with some specific kernel version?
description: | updated |
In freedesktop.org Bugzilla #25673, Scott James Remnant (Canonical) (canonical-scott) wrote : | #125 |
Most distros only switched to using your code recently; previously we've all been using smartmontools and the like which don't cause this problem.
The LP bug has the differencing straces between the two if that's helpful?
Gav Mack (gavinmac) wrote : | #126 |
@Andrew Simpson: Many thanks for the instructions. The workaround has dropped my boot time from always over 2 minutes to 30 seconds, what I was expecting back in late September when I installed the Beta with the Super Talent SSD! Almost 3 months of woe now at an end thank goodness. Recreated my user account from scratch because after further investigation my other half thought it was a good idea to delete the timed out applets including window-picker so I couldn't put them back again!
Tommy Trussell (tommy-trussell) wrote : | #127 |
We have seen the corruption survive a basic filesystem initialization, so once your drive has been corrupted you may need to write zeroes to it to eliminate the bad blocks before you can create a clean filesystem again. You can verify the drive using badblocks when it is not mounted. This procedure works for the flash SSDs such as the Patriot Lite.
For example, to write zeroes to /dev/sda:
# dd if=/dev/zero of=/dev/sda bs=1M
To do a read-only test of /dev/sda:
# badblocks -s /dev/sda
lotus49 (lotus-49) wrote : Re: [Bug 445852] Re: devkit-disks-probe-ata-smart causes HSM Violations on SSD, and potential hardware death | #128 |
The workaround did the trick for me too. I hadn't suffered any
corruption but I was considering going back to Jaunty and I am pleased
not to have to.
Simon
On 16 Dec 2009, at 14:28, Gav Mack <email address hidden> wrote:
> @Andrew Simpson: Many thanks for the instructions. The workaround has
> dropped my boot time from always over 2 minutes to 30 seconds, what I
> was expecting back in late September when I installed the Beta with
> the
> Super Talent SSD! Almost 3 months of woe now at an end thank
> goodness.
> Recreated my user account from scratch because after further
> investigation my other half thought it was a good idea to delete the
> timed out applets including window-picker so I couldn't put them back
> again!
In freedesktop.org Bugzilla #25673, Lennart-poettering (lennart-poettering) wrote : | #129 |
(In reply to comment #3)
> Most distros only switched to using your code recently; previously we've all
> been using smartmontools and the like which don't cause this problem.
Is it actually verified that this doesn't happen with smartmontools? I mean, smartmontools in contrast to libatasmart does not issue commands that early after initialization/
> The LP bug has the differencing straces between the two if that's helpful?
I only see a lot of noise in that bug report, could you point me to the two straces?
In freedesktop.org Bugzilla #25673, Lennart-poettering (lennart-poettering) wrote : | #130 |
(In reply to comment #3)
> Most distros only switched to using your code recently;
The simple fact is that rawhide (and the ubuntu betas) had this code for months already, and we got quite a few bug reports, but never something about this issue. This issue only appeared a couple of weeks back, and hence I am wondering if something else changed in that time, because libatasmart didn't.
Tommy Trussell (tommy-trussell) wrote : | #131 |
UNFORTUNATELY the workaround is not a good idea for a NEW installation on my netbook. I was able to edit /lib/udev/
Is there a single package I can pin in apt, or can I just remove or somehow deactivate libatasmart itself instead of editing the udev rule?
Scott James Remnant (Canonical) (canonical-scott) wrote : Re: [Bug 445852] Re: devkit-disks-probe-ata-smart causes HSM Violations on SSD, and potential hardware death | #132 |
On Thu, 2009-12-17 at 15:44 +0000, Tommy Trussell wrote:
> UNFORTUNATELY the workaround is not a good idea for a NEW installation
> on my netbook. I was able to edit /lib/udev/
> disks.rules before the installer rebooted into the new system, but as
> soon as the system installed the first set of critical and recommended
> updates, the filesystem was thoroughly trashed before update-manager had
> even finished.
>
Then re-apply the change before rebooting after those updates.
You can use dpkg-divert as described above in the bug comments to ensure
that updates do not affect this file - but then you won't get the proper
fix later and may indeed cause yourself future bugs down the line.
Scott
--
Scott James Remnant
<email address hidden>
Tommy Trussell (tommy-trussell) wrote : | #133 |
@scott Yes, I think dpkg-divert or some other technique is essential because as I said, I didn't even reboot -- the filesystem was already trashed BEFORE update-manager had FINISHED -- apparently it had already reverted the change and reloaded udev in one of its updates.
Cris (cristiano.p) wrote : | #134 |
While upgrading my eeepc 900 to Karmic (before I had to recover the SSD and fall back to Jaunty),
I noticed a very marked slowdown of the disk operations: the upgrade process took nearly 5 hours.
This makes me believe that the disk corruption already happens during the upgrade/install process.
In freedesktop.org Bugzilla #25673, Andrewnz-simpson (andrewnz-simpson) wrote : | #135 |
(In reply to comment #5)
> The simple fact is that rawhide (and the ubuntu betas) had this code for months
> already, and we got quite a few bug reports, but never something about this
> issue. This issue only appeared a couple of weeks back, and hence I am
> wondering if something else changed in that time, because libatasmart didn't.
>
The Ubuntu bug report goes back to early October, however the link with libatasmart was only made very recently.
There is no interplay with any specific kernel version: The bug has been confirmed on 2.6.28, 2.6.29, 2.6.30, 2.6.31 and 2.6.32. Both Ubuntu patched versions and mainline kernels are affected.
The bug has been confirmed as libatasmart only; testing has shown that smartmontools does not give the same problem. Early initialisation is a possible issue, though the problem can be readily reproduced at any time.
Comment #129 of the Ubuntu Bug Report is well worth reading, because it seems to be isolating the bug.
I will attach the straces from the Ubuntu Bug Report to this report
In freedesktop.org Bugzilla #25673, Andrewnz-simpson (andrewnz-simpson) wrote : | #136 |
Created an attachment (id=32167)
Trace from libatasmart
In freedesktop.org Bugzilla #25673, Andrewnz-simpson (andrewnz-simpson) wrote : | #137 |
Created an attachment (id=32168)
Trace from smartctl
Jean-Louis (jean-louis) wrote : | #138 |
Hi, sorry for my bad English.
I don't have an SSD, but I've looked at the libatasmart code for another bug and I think I can help a little with this one.
In the attachment smartctl-output I can see: "ATA Version is: 5"
In this pdf (2.7MB) http://
In particular, if it is implemented, all the SMART commands used in libatasmart are prohibited.
If the "PACKET Command feature set" is implemented, the IDENTIFY DEVICE command shall return command aborted, but in libatasmart the return value is lost and "d->identify_valid = FALSE;" is never set.
Try to add in function disk_identify_
after (line 741) "if ((ret = disk_command(d, SK_ATA_
and before (line 742)"return ret;"
"d->identify_valid = FALSE;"
like this
if ((ret = disk_command(d, SK_ATA_
}
In freedesktop.org Bugzilla #25673, Lennart-poettering (lennart-poettering) wrote : | #139 |
(In reply to comment #6)
>
> Comment #129 of the Ubuntu Bug Report is well worth reading, because it seems
> to be isolating the bug.
I don't think so. That proposed patch is bogus, identify_valid is FALSE unless set to TRUE anyway.
Also, supposedly SMART does work with smartmontools, just not with libatasmart, right? That comment suggests that SMART would not work at all with those SSDs.
Andrew, do you have one of the SSDs affected? Could you step through the code and figure out exactly which command triggers the problem?
In freedesktop.org Bugzilla #25673, Andrewnz-simpson (andrewnz-simpson) wrote : | #140 |
(In reply to comment #9)
>
> Andrew, do you have one of the SSDs affected? Could you step through the code
> and figure out exactly which command triggers the problem?
>
You would have to give me very detailed instructions as to how to do it. Programming in C is not one of my skills and I don't have a programming background.
More realistically, there are at least a couple of people on the Ubuntu Bug list that would know how to do this. Is it worth putting out a query?
In freedesktop.org Bugzilla #25673, Alan Pope 🍺🐧🐱 🦄 (popey) wrote : | #141 |
In terms of how long this bug has existed, I originally filed a bug on this issue back in June '09. So it existed long before 9.10 (October) was released.
In freedesktop.org Bugzilla #25673, Jelot-freedesktop (jelot-freedesktop) wrote : | #142 |
(In reply to comment #9)
> (In reply to comment #6)
>
> >
> > Comment #129 of the Ubuntu Bug Report is well worth reading, because it seems
> > to be isolating the bug.
>
> I don't think so. That proposed patch is bogus, identify_valid is FALSE unless
> set to TRUE anyway.
I'm the author of comment #129 in launchpad <https:/
I'm quite a beginner with C, but I know that if a variable is not initialized, its value is garbage.
I can't find where d->identify_valid is zeroed or set to FALSE. Obviously, I could have missed it.
>
> Also, supposedly SMART does work with smartmontools, just not with libatasmart,
> right? That comment suggests that SMART would not work at all with those SSDs.
I don't know the internals of smartmontools... I have only read some pages of this pdf (2.7MB) http://
on page 52 it says:
[quote]
Devices that implement the PACKET Command feature set shall not implement the SMART feature set as described in this subclause.
Devices that implement the PACKET Command feature set and SMART shall implement SMART as defined by the command packet set implemented by the device.
[/quote]
and on page 196 and following it says:
[quote]
SMART feature set.
− Mandatory when the SMART feature set is implemented.
− Use prohibited when the PACKET Command feature set is implemented.
[/quote]
I don't know how SMART is implemented for the command packet set, and I don't know if this SSD implements the command packet set, but this *could be* the problem (IMHO)
mint-one (d-zschokke) wrote : | #143 |
Hi folks
Applying the patch works... You should apply it first to your install USB stick, and you will notice how fast gparted detects your drives. Then apply the patch after installing Karmic. Never reboot the system without applying the patch! Otherwise you will write lots of zeroes to your SSD again. And now to the best part: every kernel or grub update (undecided) will reset this patch! Apply it after each major update or your SSD will be lost in space again.
I'm still up after having updated the system successfully. Let's see how long this is going to last. This bug is hell. Priority should be set to "hell".
good luck, dominic (on a eee pc 900)
Tommy Trussell (tommy-trussell) wrote : | #144 |
Would it be useful to create a "dummy" libatasmart4 package that responds to its calls with something innocuous but doesn't actually probe the SMART status? I see it's not easy to just yank it out because of other packages' dependencies upon it. I would prefer to disable the package in a way that survives ordinary software updates.
I'm not sure how @mint-one was able to avoid filesystem breakage... I wasn't able to reapply the patch in time, though maybe I was just especially un-lucky or un-careful.
P.S.: The Patriot Lite 32GB SSD upgrade on my ASUS 900 seems most susceptible to damage when the root partition or the root + swap partitions completely fill the drive. I don't know what that might mean, except that it's a "bigger target" for breakage. The beta 9.10 NBR installers (prior to October) could not even finish the job without completely trashing the filesystem before grub was installed.
Tommy Trussell (tommy-trussell) wrote : | #145 |
@jean-louis: do you have your patched libatasmart4 code in a PPA? I would be pleased to test it.
mint-one (d-zschokke) wrote : | #146 |
For Tommy Trussell and to whom it may concern:
1. Create a bootable usb medium.
2. Apply the patch on the usb medium! (Comment out the line, see top of this bug)
3. Install (boot into live installation, don't install directly)
4. Install karmic
5. Don't reboot
6. Apply the patch on the boot partition (navigate there with nautilus and copy the path (quite a strange one), paste it into terminal, sudo gedit $path, save)
7. Reboot.
8. Update your system (language support, new kernel, grub, misc updates)
9. Don't reboot.
10. sudo gedit /lib/udev/
10.1 and comment the line out again... it was reset to the faulty default!
11. Reboot
Here you go. This worked for me. Still working after several reboots... and it's fast on this old and small Eee PC.
Good Luck!
Jean-Louis (jean-louis) wrote : | #147 |
> @jean-louis: do you have your patched libatasmart4 code in a PPA?
> I would be pleased to test it.
No, I don't have a ppa.
I'm new on launchpad and I have yet to understand its features.
My comment is reported to upstream by Andrew Simpson, but Lennart Poettering says that a proposed patch is bogus (https:/
I'm quite a beginner with C, but I can't find where d->identify_valid is zeroed or set to FALSE. Obviously, I could have missed it.
Now I will create an account on freedesktop to ask.
Tommy Trussell (tommy-trussell) wrote : | #148 |
@mint-one -- when I did that, my system was corrupted BEFORE step #8 was finished.
Andrew Simpson (andrew-simpson) wrote : | #149 |
O.K., I think I have a better workaround for this bug.
The problem is that udev reads the udev rule files into memory and then uses inotify to watch for changes in the file. As soon as the rule file changes, udev is informed and re-reads the file. That means that when apt-get updates the rule file, damage can be done before you get a chance to patch it again.
What I have done is put a dummy file in for devkit-
Run the following command:
$ sudo dpkg-divert --divert --add --rename --divert /lib/udev/
This renames the existing file to devkit-
To see your divert (and others in the system):
$ sudo dpkg-divert --list
Now we create a dummy file:
$ sudo /lib/udev/nano devkit-
#!/bin/bash
#
exit 0
Save the file.
This dummy file does precisely nothing, but it allows udev to run it...
Make the new dummy file executable:
$ sudo chmod 755 /lib/udev/
That's it.
When the bug gets really fixed, we need to remove the dummy file and divert:
$ sudo rm /lib/udev/
$ sudo dpkg-divert --rename --remove /lib/udev/
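The dummy-file step above can be sketched as a small shell function. This is only an illustration under stated assumptions: the function name `make_noop_probe` is hypothetical, the target path is supplied by the caller rather than hard-coded, and the dpkg-divert step is deliberately left out, so the real file must already have been diverted before this is run.

```shell
#!/bin/sh
# Hypothetical helper (not part of any package): write a do-nothing
# script to the path given as $1 and make it executable.
make_noop_probe() {
    target="$1"
    if [ -z "$target" ]; then
        echo "usage: make_noop_probe PATH" >&2
        return 2
    fi
    # Refuse to clobber an existing file; divert the real one first.
    if [ -e "$target" ]; then
        echo "refusing to overwrite $target" >&2
        return 1
    fi
    # Same three lines as the dummy file described above, using /bin/sh.
    printf '#!/bin/sh\n#\nexit 0\n' > "$target" || return 1
    chmod 755 "$target"
}
```

Called as root with the diverted file's original location as the argument, this would leave a placeholder that udev can run without any SMART probing taking place.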
Andrew Simpson (andrew-simpson) wrote : | #150 |
Carrying on from above:
Here's how to patch and install a system safely from the LiveCD (or live USB) of Ubuntu 9.10.
I booted up the LiveCD and patched the live system as above. That made the live system safe to use. I then installed from the LiveCD (no errors - good).
However instead of immediately rebooting, I patched the SSD from the live system:
$ sudo mkdir /target
(In my case it already existed from the install)
$ sudo mount /dev/sda1 /target
$ sudo chroot /target
You are now in the new (SSD) system as root, but safely running on the patched LiveCD. Follow the steps above, but leave out 'sudo', because you are root. When finished you can leave chroot by:
# exit
------------------
Edit on previous comment:
$ sudo /lib/udev/nano devkit-
-- should read:
$ sudo nano /lib/udev/
Tommy Trussell (tommy-trussell) wrote : | #151 |
@Andrew Simpson: Thank you! but be sure to substitute the path to an editor that works ;-) I'm testing this now.
sudo gedit /lib/udev/
or
sudo nano /lib/udev/
Tommy Trussell (tommy-trussell) wrote : | #152 |
Finished testing, and the new workaround procedure works fine on my ASUS. (I did have some trouble on the first reboot after update-manager finishes, but it looks like a grub issue, probably not related to this bug.)
In freedesktop.org Bugzilla #25673, Lennart-poettering (lennart-poettering) wrote : | #153 |
(In reply to comment #12)
> (In reply to comment #9)
> > (In reply to comment #6)
> >
> > >
> > > Comment #129 of the Ubuntu Bug Report is well worth reading, because it seems
> > > to be isolating the bug.
> >
> > I don't think so. That proposed patch is bogus, identify_valid is FALSE unless
> > set to TRUE anyway.
>
> I'm the author of comment #129 in launchpad
> <https:/
>
> I'm quite a beginner with c, but I know that if a variable is not initialized
> its value is garbage.
>
> I don't find how d->identify_valid is zeroed or setted FALSE. Obviously, I
> could be missed it.
The initial calloc() call for allocating the SkDisk structure does the zero initialization.
In freedesktop.org Bugzilla #25673, Jelot-freedesktop (jelot-freedesktop) wrote : | #154 |
(In reply to comment #13)
> (In reply to comment #12)
> > I don't find how d->identify_valid is zeroed or setted FALSE. Obviously, I
> > could be missed it.
>
> The initial calloc() call for allocating the SkDisk structure does the zero
> initialization.
>
Uh... thanks for clarification and sorry for wasting your time.
In freedesktop.org Bugzilla #25673, 4280829-noduck (4280829-noduck) wrote : | #155 |
(In reply to comment #9)
> (In reply to comment #6)
>
> >
> > Comment #129 of the Ubuntu Bug Report is well worth reading, because it seems
> > to be isolating the bug.
>
> I don't think so. That proposed patch is bogus, identify_valid is FALSE unless
> set to TRUE anyway.
>
> Also, supposedly SMART does work with smartmontools, just not with libatasmart,
> right? That comment suggests that SMART would not work at all with those SSDs.
>
> Andrew, do you have one of the SSDs affected? Could you step through the code
> and figure out exactly which command triggers the problem?
>
I ran gdb on devkit-
There is one earlier call to disk_command (with SK_ATA_
Raf.
I can confirm this occurs on a stock Asus EEE 900 (original celeron linux model with 4gb+16gb SSD's - it occurs on both).
Changed in libatasmart: | |
status: | Unknown → Confirmed |
Vishal Rao (vishalrao) wrote : | #157 |
I've filed https:/
jslater (jslater) wrote : | #158 |
Does this bug affect _everyone_ with an eeePC [900]? A "Tier 1" supported netbook platform.
It certainly destroyed the contents of my SSD.
The workaround works, but there is no mention on https:/
This bug has existed for months and has seriously dented my impression of Ubuntu.
Tommy Trussell (tommy-trussell) wrote : | #159 |
@jslater: from earlier tests it seemed it did not affect everyone, or at least not equally. For example, I have an ASUS EeePC 900 (4GB SSD, no built-in webcam, purchased from Target) and I see the problem very clearly on my Patriot Lite upgraded SSD but not on the stock ASUS 4GB SSD. I haven't swapped the 4GB SSD back in since we have discovered the trigger -- it's possible the stock SSD was somewhat affected but didn't trash data as thoroughly or something. When I get back to my office later I might get out the screwdriver set and try it.
I agree that this bug should be noted somewhere on that netbooks wiki page. Please feel free to add it! (Though it would be hard to know exactly which models might be affected. Maybe ALL of them, depending on which SSD is installed.)
I believe some SSDs have been reported that are not installed in netbooks.
If you see a good place to include the warning, please add it.
rogmorri (frontporsche) wrote : | #160 |
The factory-installed SSD on my Aspire Acer One 110-1722, a tier-2 netbook, died a few months ago. In retrospect, I think the issue was this very bug.
(I've since replaced the SSD with a 16G after-market drive, which also suffered from this problem.)
Wouldn't a problem like this, where there's no easy workaround for avoiding the problem at install time, call for releasing a new 9.10.1 Ubuntu iso?
Alan Pope 🍺🐧🐱 🦄 (popey) wrote : | #161 |
Just for reference I have an EEE 900 which has been running Jaunty 9.04 for a while just fine. I upgraded to 9.10, and before rebooting implemented the change to /lib/udev/
In freedesktop.org Bugzilla #25673, Jelot-freedesktop (jelot-freedesktop) wrote : | #162 |
In Karmic there is a new stable kernel 2.6.31-17.54 <https:/
[quote]
commit 9982364654c186a
Author: Tejun Heo <email address hidden>
Date: Fri Oct 16 13:00:51 2009 +0900
libata: fix internal command failure handling
commit f4b31db92d163df
When an internal command fails, it should be failed directly without invoking EH. In the original implemetation, this was accomplished by letting internal command bypass failure handling in ata_qc_complete(). However, later changes added post-successful
This patch updates failure path such that internal command failure handling is contained there.
[/quote]
Could be related to this bug?
Jean-Louis (jean-louis) wrote : | #163 |
In Karmic there is a new stable kernel 2.6.31-17.54
<https:/
upstream kernel 2.6.31.6; in the changelog
<http://
commit 9982364654c186a
[quote]
commit 9982364654c186a
Author: Tejun Heo <email address hidden>
Date: Fri Oct 16 13:00:51 2009 +0900
libata: fix internal command failure handling
commit f4b31db92d163df
When an internal command fails, it should be failed directly without invoking
EH. In the original implemetation, this was accomplished by letting internal
command bypass failure handling in ata_qc_complete(). However, later changes
added post-successful
path is no longer adequate as internal command failure path. One of the
visible problems is that internal command failure due to timeout or other
freeze conditions would spuriously trigger WARN_ON_ONCE() in the success path.
This patch updates failure path such that internal command failure handling is
contained there.
[/quote]
Could be related to this bug?
Tommy Trussell (tommy-trussell) wrote : | #164 |
Here is a tested procedure for getting an uncorrupted Karmic system installed onto a netbook (tested on my ASUS Eee PC 900).
Installing Ubuntu Karmic 9.10 UNR on a system with an affected SSD
1) Boot from a live Ubuntu "Karmic" 9.10 USB stick or SD card.
2) If the SSD has been trashed by previous encounters with the bug, it may need to be wiped to eliminate bad blocks. Open a terminal and issue the following command (this assumes the SSD mounts to "/dev/sda" -- you must be certain of the device name on your system because everything on it will be erased):
$ sudo dd if=/dev/zero of=/dev/sda bs=1M
3) After step 2 finishes (it can take awhile), launch the "Install Ubuntu-
4) When the installer finishes, a dialog will come up suggesting you can restart now. Don't restart yet! While that dialog is open, the install partition should still be mounted at /target ... HOWEVER if you already closed the dialog, open a Terminal and mount the partition:
$ sudo mount /dev/sda1 /target
5) Now chroot into the target system
$ sudo chroot /target
6) The terminal is chrooted into the target system as root (no need for sudo). You can now divert the problematic file on the target system:
# dpkg-divert --divert --add --rename --divert /lib/udev/
7) Now create a file. You will type three lines directly into the file, finishing with a control-D. (If you make a mistake that you can't fix using backspace, close the file with control-D and use nano or vim to edit the file.)
# cat > /lib/udev/
#!/bin/bash
#
exit 0
[type control-D here]
8) Make the new file executable:
# chmod 755 /lib/udev/
9) exit the chroot and terminal
# exit
$ exit
10) Shutdown, remove the USB stick or SD card, and boot into the new system. Install all software updates as needed.
-------
After you know this bug has been fixed AND after the correct updated devicekit-disks package has been installed on your system, you can re-enable it using these commands:
$ sudo rm /lib/udev/
$ sudo dpkg-divert --rename --remove /lib/udev/
-------
What's the update on this? The workaround seems good, and this issue exists in every current Linux distro? Is a fix actually on its way?
Jarige (jarikvh) wrote : | #166 |
I've noticed that I had similar symptoms of this bug when adding "elevator=noop" to /etc/default/grub on to this line: GRUB_CMDLINE_
After removing it again from the command line, it worked 'normally' again. By 'normally' I mean that all programs seem to stop responding when the SSD is in use (either write or read). Installing a program makes every other program 'crash', and the GUI (even the mouse) stops responding. This is not happening all the time, though. As I'm typing on my AAO (8GB SSD), I see the SSD LED blinking every now and then, but it affects other programs pretty often.
Vishal Rao (vishalrao) wrote : | #167 |
FYI, my (solved) problem is/was NCQ not SMART as you can see in this comment in another bug: https:/
If you see "failed command READ FPDMA QUEUED" kind of logs in dmesg then that might be for you...
Basically you need to pass " libata.force=noncq " as a Linux kernel boot parameter, which is what I am doing, but I'm not sure what to do if people have multiple drives, some of which properly support NCQ...
Юрий Аполлов (apollovy) wrote : | #168 |
Launchpad says that this bug has a patch, but I cannot find one in the usual place: there are instructions, but no patch. Can anyone write a patch script to work around this problem?
Changed in linux: | |
status: | Confirmed → Invalid |
ectropionized (ectropionized-deactivatedaccount) wrote : | #169 |
After upgrading the kernel to 2.6.31-19, devicekit-disks (007-2ubuntu4), and enabling DMA again, I am receiving no errors after testing extensively. I was one of those plagued with this bug, causing havoc on my netbook SSD. Can anyone else confirm a resolution on their end? It's entirely possible the time period for testing became anomalous, although I would figure unlikely given the consistency of errors previous. Since I am not yet receiving errors I thought it was worth opening continued discussion.
ectropionized (ectropionized-deactivatedaccount) wrote : | #170 |
Scratch that on the resolution. Although I'm no longer receiving data corruption with DMA enabled, I just received this:
[ 2962.988208] ata2: lost interrupt (Status 0x58)
[ 2962.988297] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 2962.988306] ata2.00: BMDMA stat 0x4
[ 2962.988323] ata2.00: cmd ca/00:08:
[ 2962.988326] res 58/00:08:
[ 2962.988333] ata2.00: status: { DRDY DRQ }
[ 2962.988376] ata2: soft resetting link
[ 2963.196543] ata2.00: configured for UDMA/66
[ 2963.196581] ata2: EH complete
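To check a log for this symptom, the kernel messages can be filtered for the "lost interrupt" line that opens each stall. A minimal sketch, assuming the log text is piped in on stdin; the function name `count_lost_interrupts` is hypothetical, and the pattern is taken only from the messages quoted in this report, so differently worded errors would be missed.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and print how many
# "ataN: lost interrupt" lines it contains.
count_lost_interrupts() {
    # grep -c exits non-zero when the count is 0; that is not an error here.
    grep -c 'ata[0-9][0-9]*: lost interrupt' || true
}
```

For example, `dmesg | count_lost_interrupts` would print the number of such lines seen since boot.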
Will (will-berriss) wrote : | #171 |
I have just put a Super Talent 32GB SSD into my AA1 netbook and have installed Ubuntu 9.10 and I have this bug.
I don't want to damage my SSD as it was expensive.
What are my options to avoid Ubuntu damaging my SSD? Do I need to stop using 9.10 and wait for 10.04 or will the workaround above treat the SSD nicely?
Gav Mack (gavinmac) wrote : | #172 |
@Will - Follow the instructions on post 147
Will (will-berriss) wrote : | #173 |
@Gav Mack - Thanks! That looks like quite a change, but I like the idea of not having swap so I may give it a go and reinstall.
Currently all I have done is the stuff in post 1, i.e. this:
# ATA disks driven by libata
#KERNEL=
Is this enough to avoid damage to the SSD or is it only a small step towards reducing SSD wear and tear?
Thanks.
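The edit quoted above (putting a # in front of the KERNEL= line that follows the libata marker) can be scripted. A minimal sketch, assuming the rules file path is passed in as an argument; the function name `disable_libata_smart_rule` is hypothetical. As described elsewhere in this thread, a package update can restore the original line, so this does not replace the dpkg-divert workaround.

```shell
#!/bin/sh
# Hypothetical helper: in the rules file given as $1, comment out the
# KERNEL= line that directly follows "# ATA disks driven by libata".
# Running it again on an already-patched file changes nothing.
disable_libata_smart_rule() {
    rules="$1"
    [ -f "$rules" ] || return 2
    tmp="$rules.tmp.$$"
    awk '
        prev_marker && /^KERNEL=/ { print "#" $0; prev_marker = 0; next }
        { prev_marker = ($0 == "# ATA disks driven by libata"); print }
    ' "$rules" > "$tmp" && cat "$tmp" > "$rules"
    status=$?
    rm -f "$tmp"
    return $status
}
```

The file is rewritten in place through `cat` so that its ownership and permissions stay as they were.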
Martin Pitt (pitti) wrote : | #174 |
Bug is in libatasmart; closing devicekit-disks task.
Changed in devicekit-disks (Ubuntu): | |
status: | Triaged → Invalid |
importance: | Critical → Undecided |
Tommy Trussell (tommy-trussell) wrote : | #175 |
@will -- in my experience with my Patriot Lite SSD, the patch in post 1 works fine but your first software update will undo it. And it's not good enough to apply the patch after the software update is finished, because the buggy code starts running immediately and it trashes the drive before the update even finishes.
The more elaborate workaround in comment 147 (which Andrew Simpson developed and I reiterated) tells the package manager to move the buggy code to a different location and apply all future updates to it in the new location (step 6) and creates a dummy executable file that runs but does nothing (step 7 & 8) in the location of the old software.
Will (will-berriss) wrote : | #176 |
@Tommy Trussell - Thank you very much!
I reinstalled with no swap space and will apply #147 next. I just did #1 for the time being and I noticed an update overwrote it, so I reapplied it. Luckily my SSD survived that at least.
Next time I boot it up, I'll apply #147 right away. What a nightmare!
Thanks again! :)
Will (will-berriss) wrote : | #177 |
I had to read #147 a couple of times, as the wording of step 7) confused me. Anyway, in short I did this to my working system:
6) You can now divert the problematic file on the target system:
# dpkg-divert --divert --add --rename --divert /lib/udev/
7) Now create a file.
vi /lib/udev/
and put the following 3 lines in it:
#!/bin/bash
#
exit 0
8) Make the new file executable:
# chmod 755 /lib/udev/
In freedesktop.org Bugzilla #25673, 4280829-noduck (4280829-noduck) wrote : | #178 |
(In reply to comment #15)
> I ran gdb on devkit-
Was this information helpful? If not, can you let me know how I can assist in fixing this bug?
Raf (4283534-noduck) wrote : | #179 |
This bug also affects Lucid via the call to udisks-
KERNEL=
in /lib/udev/
Is there any plan to include a real fix for this problem in Lucid? The upstream kernel bug was closed, the upstream libatasmart bug hasn't received much attention. How can I help to make sure that this bug is fixed in Lucid?
Andrew Simpson (andrew-simpson) wrote : | #180 |
@Raf
I closed the kernel bug report (it was my bug report) since it's not relevant to the kernel.
I've also nominated this bug for Lucid release - whatever that does.
More importantly the upstream maintainer seems to have lost interest in fixing this bug. How does one go about nominating packages for removal from Ubuntu due to lack of response from upstream maintainer?
Guy Taylor (thebiggerguy) wrote : | #181 |
@Andrew Simpson
I have had good response from upstream and think this is a good package to keep within Ubuntu.
@all
Has the particular hardware (SSD or controller) been identified yet? libatasmart has a "quirk" table to blacklist incompatible hardware.
Andrew Simpson (andrew-simpson) wrote : | #182 |
@Guy Taylor
The hardware has been generically identified - most Super Talent and Patriot devices and less commonly a few others. The problem seems to be at the SSD rather than the bridge. There are enough people following this bug to enable compilation of a reasonably complete list if asked.
What device information is required for a quirks table? Output from lspci -vv? Or something else?
Skylord (me-skylord) wrote : | #183 |
BTW, this problem is not limited to specific hardware. For example, I encountered it after updating my standard EeePC 901 SSD firmware to a newer version with better speed performance (in exchange for reduced disk space). The same goes for the Acer Aspire One....
adamski (adam-hasselbalch) wrote : | #184 |
So.
I have /dev/zeroed both the 4G and the 16G SSDs in my Eee PC after running into this bug. When I discovered what was going on, I had used Karmic for about one hour.
The 4G drive seems to be OK. 'badblocks -s' finishes with no reports.
The 16G drive is dead. Stuffed to the brim with bad sectors, and dmesg shows I/O errors galore. This is both with 9.04 and 8.04 (which was my last known good installation) kernels. 8.04 allowed me to actually make a partition table after 9.04 failed with I/O errors all over the place. 8.04 also allowed me to create a file system. badblocks(1), however, shows that there are still tons of errors on the drive.
I am going to try another /dev/zeroing of the 16G drive with a 8.04 kernel, for good measure, but I am not optimistic, since the HSM violations are gone, and what I see is, as mentioned, what looks like hardware I/O errors.
Mind you, both these drives worked fine prior to installing 9.10, but now, one disk is dead.
I am NOT really happy with Ubuntu right now. Spare Asus SSDs (which of course don't use regular SATA connections) are not a common commodity here, so for all intents and purposes, Karmic has bricked my Eee.
Юрий Аполлов (apollovy) wrote : Re: [Bug 445852] Re: devkit-disks-probe-ata-smart causes HSM Violations on SSD, and potential hardware death | #185 |
adamski, if possible, try to reflash (actually change) the firmware, a couple
of times. I had the same trouble with Karmic on my Acer Aspire One 8GB SSD.
I then zeroed it many times and flashed the firmware ten (or more) times in
a row, and now it's running fine with Jaunty.
2010/2/24 adamski <email address hidden>
shadowblast101 (shadowblast101) wrote : | #186 |
Adamski, just thought I'd clarify that this applies to a lot of Linux distributions, not just Ubuntu. I have this bug on my Arch install, and I think someone with Debian has confirmed the bug as well. Pretty much anything that calls libatasmart will suffer from this, so there's not really any reason to blame Ubuntu explicitly.
In freedesktop.org Bugzilla #25673, 4280829-noduck (4280829-noduck) wrote : | #187 |
(In reply to comment #17)
> Was this information helpful? If not, can you let me know how I can assist in
> fixing this bug?
For those still following: disabling smart support on the device prevents the second (dangerous) ioctl, and as a result no more HSM violations.
smartctl --smart off /dev/sda
I'd disagree, there's a perfectly good reason to blame Ubuntu: their NBR doesn't work on netbooks. In my book that's an epic FAIL and probably spells the end of Linux's chance on netbooks, given it is the highest profile distro and they just don't appear interested in fixing it.
To those having problems recovering devices, I recommend repartitioning the drive or writing to the raw devices. Milax does this well. Format, select device, analyze and purge.
GenericAnimeBoy (souletech) wrote : | #189 |
I'd rather not join in with hysterical finger-pointing, but it's been a month and a half, people. The workaround in #147 (modified in #160 for working systems) could easily have been implemented as a script, packaged up, and pushed out via the software updater by now, and the fix could have been rolled into the Karmic .iso's. You have to realize just how critical this is: the only reason I even became aware of this problem was that when the SSD hung up during boot (as a result of this bug) my gnome-applets failed to load. For every affected user who is on launchpad following this bug, I would guess there have been at least 3 others who have just ignored the occasional glitches, thinking that it's nothing major.
How many drives have been destroyed in the month and a half since the workaround was published? Replacing an SSD in a netbook is an expensive, time consuming, and (if you do it yourself) warranty voiding operation.
GenericAnimeBoy (souletech) wrote : | #190 |
- Hotfix script for implementing fix #160 on working systems Edit (629 bytes, text/plain)
Just a rough implementation of #160 as a script. Probably needs to be prettied up for public consumption.
Raf (4283534-noduck) wrote : | #191 |
I found a possibly easier work around: after I disabled SMART support on the device, I can safely run devkit-
sudo smartctl --smart off /dev/sda
You can check if smart is disabled, with 'sudo smartctl -i /dev/sda', the output should include (note the last line):
SMART support is: Available - device has SMART capability.
SMART support is: Disabled
Note the following comment in the smartctl manual: "In principle the SMART feature settings are preserved over power-cycling, but it doesn´t hurt to be sure." I have not yet rebooted.
Looking at the strace of devkit-
Raf (4283534-noduck) wrote : | #192 |
I just rebooted, and SMART was enabled again. So this doesn't work through a reboot. Sorry.
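Since the setting did not survive a reboot here, it may be worth checking the state before running anything that probes the drive. A minimal sketch, assuming the output of 'smartctl -i' is piped in on stdin and contains the "SMART support is:" lines shown above; the function name `smart_state` is hypothetical, and any other output format would simply report "unknown".

```shell
#!/bin/sh
# Hypothetical helper: read `smartctl -i` output on stdin and print
# "enabled", "disabled", or "unknown" based on the "SMART support is:" lines.
smart_state() {
    awk '
        /^SMART support is: *Enabled/  { state = "enabled" }
        /^SMART support is: *Disabled/ { state = "disabled" }
        END { print (state == "" ? "unknown" : state) }
    '
}
```

For example, `sudo smartctl -i /dev/sda | smart_state` would print one of the three words for the drive used in the comments above.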
Rick @ rickandpatty.com (rick-rickandpatty) wrote : | #193 |
@Raf
As you say, that won't survive a reboot - but your workaround in #172 might be a good thing to add between steps 2 (zeroing the drive) and 3 (starting the installer) of the Karmic installation procedure in #147. Simply starting up the partitioner during installation will trigger the bug and trash some SSDs - like the original factory SSD in my Asus Eee 701 8G
Guy Taylor (thebiggerguy) wrote : | #194 |
Hi all
My research has found this only affects:
Company Model Name Model Number Firmware Version
-------
Intel Z-P230 SSDPAMM0008G1 Unknown affected firmware. Inclusive of "Ver2.J0H" and "Ver2.I0K"
Seagate STEC PATA 8GB Unknown affected firmware. Inclusive of "D5221-10"
Unknown Flash Module Unknown affected firmware. Inclusive of "Ver3.P0B"
Could people with the problem please run "sudo hdparm -i /dev/sda" (replacing sda with the problematic drive) to confirm this, flag any other affected drives, and/or identify 'fixed' firmware.
Thank you
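For anyone collecting these reports, here is a rough sketch of how the hdparm output could be compared against the firmware revisions listed above. It assumes the usual "Model=..., FwRev=..., SerialNo=..." identity line printed by "hdparm -i"; the function names and the serial number in the sample are invented for illustration, and the firmware set is only as complete as the list in this comment.

```python
import re

# Firmware revisions reported as affected in this comment; by the
# reporter's own account the list is incomplete.
KNOWN_AFFECTED_FIRMWARE = {"Ver2.J0H", "Ver2.I0K", "D5221-10", "Ver3.P0B"}

_ID_LINE = re.compile(r"Model=(?P<model>[^,]*),\s*FwRev=(?P<fw>[^,]*),")


def parse_hdparm_identity(text):
    """Return (model, firmware) from `hdparm -i` output, or None if not found."""
    match = _ID_LINE.search(text)
    if match is None:
        return None
    return match.group("model").strip(), match.group("fw").strip()


def is_known_affected(text):
    """True if the reported firmware revision is on the list above."""
    identity = parse_hdparm_identity(text)
    return identity is not None and identity[1] in KNOWN_AFFECTED_FIRMWARE


# Illustrative sample; the serial number is a placeholder.
sample = """
/dev/sda:

 Model=SSDPAMM0008G1, FwRev=Ver2.I0K, SerialNo=HYPOTHETICAL123
"""
print(parse_hdparm_identity(sample))
print(is_known_affected(sample))
```

A firmware revision that is not in the set says nothing about safety; it only means nobody has reported it here yet.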
Guy Taylor (thebiggerguy) wrote : | #195 |
- Current List of known drives (617 bytes, text/plain)
Sorry all for the formatting. I have attached a text file instead so you can actually read it.
GenericAnimeBoy (souletech) wrote : | #196 |
- hdparm.txt (611 bytes, text/plain)
My hdparm is attached. The SSD in question is the aftermarket 16GB Supertalent SSD everyone's talking about, and it looks like the firmware version already appears on your list.
FWIW, I've only had minor issues with this one: it would occasionally hang up during boot and cause several gnome applets to fail to start. I had an Intel SSDPAMM0008G, which was the original SSD in this netbook [Acer Aspire One ZG5], and it cratered the first time I booted 9.10 from it. I guess I know now why that happened.
shadowblast101 (shadowblast101) wrote : | #197 |
- PatriotHDParam.txt (614 bytes, text/plain)
Here's the Patriot 32 GB SSD that a couple of us have. I have had this one completely corrupt on both Ubuntu and Arch before applying the patch.
LarryGrover (lgrover) wrote : | #199 |
- supertalent32GB.txt (616 bytes, text/plain)
Acer Aspire One with replacement Super Talent 32 GB SSD. I experienced stalls during boot up and HSM error messages in logs, but no data corruption. Drive info attached.
Raf (4283534-noduck) wrote : | #200 |
- supertalent32GB.txt (2.3 KiB, text/plain)
I have the same Super Talent 32 GB SSD, but it identifies itself simply as 'Flash' (see attached output). I think the easiest way to implement a blacklist will be in the udev rules, so I included the output of 'udevadm info --query=all --path /block/sda'.
@LarryGrover: I have the same device. While debugging this problem (repeated runs of devkit-
Samizdata (samizdata) wrote : | #201 |
- MyAcerAspireOnehdparm.txt (1.6 KiB, text/plain)
I have the SSDPAMM0008G1, FwRev=Ver2.I0K SSD in my Acer Aspire One, but I am including the data for the sake of completeness. I have not had it brick, but I did have problems with timing out and the "lost+found" data corruption. I am currently running with the "quick", non-redirect workaround successfully. I have also attached both sets of data mentioned.
ipig (infopiggy) wrote : | #202 |
Stock Asus EEE 900 (Celeron) / White 12G?
SSDs: Phison / 4GB (Primary) - 8GB (2ndary)
- ~Same hardware as post #179 -
On remix 9.10/32 & regular 9.10/32 - HSM Violation
(Used) Post #147 (upon 3rd install!) fixed issues.
- Had disk utility crash on 3rd/fix install during format. Hoping it's not bad blocks?
Same fix in Lucid requires editing 80-udisks.rules. Search for 'smart' and look for the entry.
basily (basily) wrote : | #204 |
Me too... Acer Aspire One with a replacement Super Talent 32 GB SSD, UNR 9.10. I had very slow boot times, but no data corruption. I have successfully implemented the workaround in posting #147. Now boot times are around 27 seconds to full desktop and wifi connection.
Alain SAURAT (maisondouf) wrote : | #205 |
- dmesg without and with PCIe SSD (5.6 KiB, text/plain)
I have a similar problem with an EeePC 701 upgraded with two SSDs.
It was originally equipped with a 4 GB SSD on the motherboard; Jaunty, Karmic and Lucid initialize it in UDMA66 mode without any problems, and the read speed is around 30 MB/s.
I upgraded this notebook by adding a 32 GB PCIe PATA SSD in the extension connector.
The BIOS correctly recognizes them as Secondary Master for the internal SSD and Secondary Slave for the PCIe SSD.
Now the kernel goes through 3 timeouts of 30 seconds each to downgrade the ATA protocol from UDMA66 to PIO4 for the internal 4 GB SSD.
The PCIe SSD is used directly in UDMA66 mode.
After booting, the read speed is 3 MB/s on the internal SSD and 40 MB/s on the PCIe SSD.
As I read here, I tried adding "libata.dma=0" to the grub.cfg file; in this case the timeouts disappear, but read speeds are very low on all disks.
To avoid the timeouts and keep a good read speed on /dev/sdb, can I deactivate UDMA mode only on /dev/sda, and how?
PS: I tried WinXP booted from USB; there is no timeout during boot and the speeds are the same (internal 4 MB/s, PCIe 40 MB/s).
Alain SAURAT (maisondouf) wrote : | #206 |
Whouauuuhhh !
with "libata.
Option found in http://
Syntax found in http://
Read speed stays very low on the internal SSD, but that doesn't matter for me.
Richard Ayotte (rich-ayotte) wrote : | #207 |
Setting libata.force=noncq fixed it for me. Disabling SMART as described in the workaround or forcing pio4 had no effect.
Hardware: Acer Aspire One
Here's what I did.
sudo gedit /etc/default/grub
Change the line that says:
GRUB_CMDLINE_
to
GRUB_CMDLINE_
sudo update-grub
reboot.
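After the reboot, one way to confirm that the option actually reached the kernel is to look at the kernel command line. This is only a sketch: the helper names are invented, and the sample command line is hypothetical (on a running system the string would be read from /proc/cmdline).

```python
def libata_force_values(cmdline):
    """Collect the comma-separated values of every libata.force= token."""
    values = []
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "libata.force" and sep:
            values.extend(v for v in value.split(",") if v)
    return values


def ncq_disabled(cmdline):
    """True if any libata.force entry is 'noncq', with or without a port prefix."""
    # An entry may carry a port/device prefix such as "2:noncq".
    return any(v.split(":")[-1] == "noncq" for v in libata_force_values(cmdline))


# Hypothetical command line, for illustration only.
sample = "BOOT_IMAGE=/vmlinuz ro quiet splash libata.force=noncq"
print(libata_force_values(sample))
print(ncq_disabled(sample))
```

On the affected machine the same functions could be fed open("/proc/cmdline").read() instead of the sample string.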
Vishal Rao (vishalrao) wrote : Re: [Bug 445852] Re: devkit-disks-probe-ata-smart causes HSM Violations on SSD, and potential hardware death | #208 |
aha! so i found another user needing the same workaround!
i had mentioned this in this bug and another and also sent a patch to
LKML which wasn't
safe enough to go in and also blogged about it: http://
--
"Thou shalt not follow the null pointer for at its end madness and chaos lie."
Vishal Rao (vishalrao) wrote : | #209 |
See comment # 141 here https:/
Steve (sjc-carpanet) wrote : | #210 |
I do not have an SSD disk, but I do have the exact same errors:
[ 3252.000066] ata1: lost interrupt (Status 0x58)
[ 3252.004027] ata1: drained 32768 bytes to clear DRQ.
[ 3252.091644] ata1.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 3252.091669] ata1.01: cmd a0/00:00:
[ 3252.091672] cdb 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 3252.091675] res 58/00:01:
[ 3252.091684] ata1.01: status: { DRDY DRQ }
[ 3252.091726] ata1: soft resetting link
[ 3252.332346] ata1.00: configured for UDMA/100
[ 3252.348516] ata1.01: configured for MWDMA2
[ 3252.348839] ata1: EH complete
It looks like the problem might be with the cdrom?
Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: ATA Model: ST96812A Rev: 3.05
Type: Direct-Access ANSI SCSI revision: 05
Host: scsi0 Channel: 00 Id: 01 Lun: 00
Vendor: MATSHITA Model: UJDA775 DVD/CDRW Rev: 1.00
Type: CD-ROM ANSI SCSI revision: 05
In any case, I applied the workaround in question, and it definitely happens less often, but still happens to me pretty frequently.
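To put a number on "pretty frequently", the libata messages can be counted per port. The sketch below assumes the dmesg wording shown in this comment ("lost interrupt", "exception", "soft resetting link"); the function name and event labels are made up for illustration.

```python
import re
from collections import Counter

# Matches "ata1: ..." and "ata1.01: ..." anywhere in a dmesg or syslog line.
_ATA_LINE = re.compile(r"\b(ata\d+)(?:\.\d+)?: (.*)")


def summarize_ata_events(log_text):
    """Count lost interrupts, exceptions and soft resets per ATA port."""
    counts = Counter()
    for line in log_text.splitlines():
        match = _ATA_LINE.search(line)
        if match is None:
            continue
        port, message = match.groups()
        if message.startswith("lost interrupt"):
            counts[(port, "lost_interrupt")] += 1
        elif message.startswith("exception"):
            counts[(port, "exception")] += 1
        elif message.startswith("soft resetting link"):
            counts[(port, "soft_reset")] += 1
    return counts


# Lines copied from the dmesg excerpt in this comment.
sample = """\
[ 3252.000066] ata1: lost interrupt (Status 0x58)
[ 3252.004027] ata1: drained 32768 bytes to clear DRQ.
[ 3252.091644] ata1.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 3252.091684] ata1.01: status: { DRDY DRQ }
[ 3252.091726] ata1: soft resetting link
[ 3252.348839] ata1: EH complete
"""
for key, count in sorted(summarize_ata_events(sample).items()):
    print(key, count)
```

Running it over the output of dmesg before and after applying the workaround would give a rough before/after comparison.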
Paede (patrick-steiner-gmx) wrote : | #211 |
@Steve
I also have the same problem on a normal IDE disk. What type of Notebook do you have? And also the same type of cdrom:
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: ATA Model: WDC WD2500BEVE-0 Rev: 01.0
Type: Direct-Access ANSI SCSI revision: 05
Host: scsi0 Channel: 00 Id: 01 Lun: 00
Vendor: MATSHITA Model: UJ-822Da Rev: 1.02
Type: CD-ROM ANSI SCSI revision: 05
This error happened to me when I moved my laptop. I don't know if it is caused by the HP Mobile Data Protection System.
Raf (4263004-noduck) wrote : | #212 |
The first beta of Lucid was released and we have not yet found a solution for the HSM violations and corruption. I would really hope that we can find a way to fix this before the final release.
A proper solution would be a patch for libatasmart, but I have not seen any progress.
A first workaround would be to disable the use of udisks-
An alternative workaround would be to create a blacklist in 80-udisks.rules, so that the SMART test is not run on the devices identified above (and possibly others).
I would like to know if the developers are willing to accept either of these workarounds.
Raf (4263004-noduck) wrote : | #213 |
I should have written:
I would like to know if the *maintainers* are willing to accept either of these workarounds.
ipig (infopiggy) wrote : | #214 |
(Ref Post #183)
This bug gave me hell this weekend.
I had 9.10 running fine all month with remix.
I decided to nuke my netbook & later put 9.10 back on.
Upon installing 9.10 I forgot the timing of post #147. I finished the installation & was confronted with HSM Violations (again)
After the install i attempted the fix in post #188. That fix did not work. Attempting to undo 188 i was confronted with the inability of writing to the disk (save) - i could not undo the changes.
I decided to do another install this time not forgetting to apply 147 prior to re-boot. I was then unable to finish any installation normally. (aka never got to that point)
Installations would hang @ 38% (copying files iirc) & i noticed the HD/LED light
would begin flashing in a timed (1/2 sec per) fashion. I'd tried 3 installs partitioning (even slightly diff sizes) & formatting - Nothing made a difference.
I'd also tried the dd command (post 147) in between - but i don't think i did it correctly as it only took about 7-10 minutes (on 8GB). - I believe i'd run it on a partition instead of the disk.
Later on i noticed hitting the power button (@ the 38% hang) dropped me to a screen that displayed **The machine was in an infinite loop of HSM violation errors** (over & over & over)- In sync with the flashing HD light.
It seemed regardless of post #147 - the bug affects the machine earlier than that. That or the bug had still been dragging along all this time.
I'd tried using a separate gparted livecd to format & partition & it made no difference on installations failing.
Literally a day later i ran the dd command again (post 147) this time correctly & it took about 1.5 hours. (i ran it from 8.04 live/cd)
I had a feeling things would then go differently & they did. I managed to get 9.10 installed - HOWEVER i did still receive a disk utility crash (i believe during formatting - it's difficult to tell when it occurs because all it does is put a small red icon in the task bar) - I believe i put in #147 correctly (hell i'd done it before)
So i am thinking 'finally' this has been dealt with.
Wrong.
After installation i realized for some reason i'm unable to install any packages or updates. Seemingly anything. It seemed i could write to the disk OK but reading/installing packages/updates resulted in input/output errors displayed in the terminal/details.
I battled with this for a while but then i just gave up. F'it.
I'm willing to put 4-6 hours in but once it starts pushing beyond that people just can't be expected to deal with this. I was quite angry by the time i gave up & i am still a bit disgusted with this. No doubt that i've spent 8 or more hours in some way dealing with this problem.
I am not pleased upon hearing there's no fix in Lucid.
If the difference between getting a patch worked on & not is me packing up my netbook and shipping it off then i might be willing. I am a fairly loyal eee pc fan/user - when i think of netbook i think 'eee pc'.
What's bothersome is the apparent netbook remix edition. What part of netbook didn't include EEE PCs?
ipig (infopiggy) wrote : | #215 |
Here's a surprise. I installed beta 10.04 LTS yesterday & it seems this problem doesn't exist. Things have generally been pretty good. Hopefully the full release keeps it up!
ipig (infopiggy) wrote : | #216 |
Spoke too soon. On a re-boot of the machine it got caught in a bunch of HSM violations. Ugh i'd just started enjoying 10.04. Same issue :( - Not sure how well it's going to start up at the moment.
ipig (infopiggy) wrote : | #217 |
Here's what i decided to do w/beta 10.04
- Deleted partitions via gparted/10.04 livecd / applied & did not re-create, quit
- Ran: sudo dd if=/dev/zero of=/dev/sda bs=1M (post #147)
- Started & finished 10.04 install within live cd / left re-boot prompt open /
- Applied steps #5 -> * (in post #147)
- Did NOT re-boot / Kept terminal open
- While still in /target -> deleted smart section out of 80-udisks.rules (re: #161) *
- Finished / will check in later
* While in 80-udisks.rules i don't think the line specified by 161 was commented out.
ideathproof (glenn-immortal) wrote : | #218 |
I have just installed 10.04 and applied ipig's suggestion #198, working fine so far.
Changed in libatasmart (Ubuntu Lucid): | |
assignee: | nobody → Martin Pitt (pitti) |
status: | Triaged → In Progress |
Changed in devicekit-disks (Ubuntu Karmic): | |
status: | New → Invalid |
Martin Pitt (pitti) wrote : | #219 |
I'll disable the probing in karmic for now; it's not really critical for the system to work, it will just disable the warnings that you'll get for potential disk failures from SMART. But that's much better than the current situation.
I will check the smartmontools code to see what it does differently. The problem does not happen on my two computers (one HDD, one SSD), so I'd really appreciate it if someone affected could give me ssh access to such a system. (second key on https:/
With that we at least have some more time to figure out a proper fix in libatasmart. I'll start with pursuing the IDENTIFY PACKET DEVICE path as suggested by Jean-Louis, thanks for that!
Changed in devicekit-disks (Ubuntu Karmic): | |
assignee: | nobody → Martin Pitt (pitti) |
importance: | Undecided → Critical |
status: | Invalid → In Progress |
Changed in libatasmart (Ubuntu Karmic): | |
status: | New → Triaged |
Changed in devicekit-disks (Ubuntu Karmic): | |
status: | In Progress → Fix Committed |
tags: | added: verification-needed |
Martin Pitt (pitti) wrote : | #220 |
Accepted devicekit-disks into karmic-proposed, the package will build now and be available in a few hours.
Please test and give feedback here. See https:/
In freedesktop.org Bugzilla #25673, Martin Pitt (pitti) wrote : | #221 |
I got ssh access to an affected machine and finally tracked this down. I also compared it to the ioctls that smartmontools do.
My raw notes with all the ioctl stracing, bisecting, etc. are in https:/
Summary: It seems READ_THRESHOLDS without READ_DATA (or a different ioctl like RETURN_STATUS) causes this problem: the drive "wants" to send more data, which is never flushed. Possible explanation: https:/
http://
Quite obviously from the commit, sk_disk_open() called sk_disk_
udisks-
So while a223a4 fixes this for the "common" use cases, there might still be situations where thresholds are read, but not the values. Let's look where init_smart() (the only place reading thresholds) is called:
* sk_disk_
* sk_disk_
* sk_disk_
So right now, all code paths work.
However, a potential robust solution might be to make init_smart() call sk_disk_
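The condition from the summary above (thresholds read, but no READ_DATA or RETURN_STATUS afterwards) can be expressed as a small check over a sequence of SMART feature codes. This is only an illustration of the pattern, not libatasmart code: the function name is invented, and the numeric values are the standard ATA SMART feature register codes.

```python
# ATA SMART feature register values.
READ_DATA = 0xD0
READ_THRESHOLDS = 0xD1
RETURN_STATUS = 0xDA


def thresholds_left_dangling(features):
    """True if the last READ_THRESHOLDS is not followed by READ_DATA or RETURN_STATUS."""
    pending = False
    for feature in features:
        if feature == READ_THRESHOLDS:
            pending = True
        elif feature in (READ_DATA, RETURN_STATUS):
            pending = False
    return pending


print(thresholds_left_dangling([READ_THRESHOLDS]))
print(thresholds_left_dangling([READ_THRESHOLDS, READ_DATA]))
```

Fed with the feature bytes extracted from an strace of the probe, such a check would flag the sequence described in the summary; whether a given drive actually misbehaves still has to be tested on the hardware.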
Martin Pitt (pitti) wrote : | #222 |
Alan Pope kindly provided ssh access to his affected machine, and I analyzed this in detail.
I put my raw notes here for having a permanent record. I'll follow up with a more human-readable status in the next comment, so unless you are interested in the technical details, you can safely ignore this long post.
Jean-Louis' theory: check PACKET Command feature
-------
* Both of my computers can do SMART just fine, but both also succeed with IDENTIFY_
* An affected machine responds to SMART commands just fine with current Ubuntu 10.04 beta-1 (and delivers sensible results), so it can do SMART
libatasmart 0.17+git2010021
-------
WORKS: # strace -e ioctl /lib/udev/
ioctl(3, BLKGETSIZE64, 0x9525014) = 0
ioctl(3, SG_IO, {'S', SG_DXFER_FROM_DEV, cmd[16]=[85, 08, 2e, 00, 00, 00, 01, 00, 00, 00, 00, 00, 00, 00, ec, 00], mx_sb_len=32, iovec_count=0, dxfer_len=512, timeout=2000, flags=0, data[512]
UDISKS_
WORKS: # skdump /dev/sda
WORKS: # udisks --ata-smart-wakeup --ata-smart-refresh /dev/sda
libatasmart 0.17+git2010021
-------
WORKS: strace -e ioctl ./devkit-
ioctl(3, BLKGETSIZE64, 0x9686014) = 0
ioctl(3, SG_IO, {'S', SG_DXFER_FROM_DEV, cmd[16]=[85, 08, 2e, 00, 00, 00, 01, 00, 00, 00, 00, 00, 00, 00, ec, 00], mx_sb_len=32, iovec_count=0, dxfer_len=512, timeout=2000, flags=0, data[512]
DKD_ATA_
libatasmart 0.16, udisks 1.0.0
-------
FAILS: LD_LIBRARY_
ioctl(3, BLKGETSIZE64, 0x9fef014) = 0
ioctl(3, SG_IO, {'S', SG_DXFER_FROM_DEV, cmd[16]=[85, 08, 2e, 00, 00, 00, 01, 00, 00, 00, 00, 00, 00, 00, ec, 00], mx_sb_len=32, iovec_count=0, dxfer_len=512, timeout=2000, flags=0, data[512]
ioctl(3, SG_IO, {'S', SG_DXFER_FROM_DEV, cmd[16]=[85, 08, 2e, 00, d1, 00, 01, 00, 00, 00, 4f, 00, c2, 00, b0, 00], mx_sb_len=32, iovec_count=0, dxfer_len=512, timeout=2000, flags=0, data[512]
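For readers who want to interpret the cmd[16] bytes in these traces, here is a small decoding sketch. It assumes the bytes are a 16-byte SCSI ATA PASS-THROUGH CDB (opcode 0x85) printed in hex by strace, with the feature register in byte 4, the sector count in byte 6, LBA mid/high in bytes 10 and 12, and the ATA command in byte 14. The function name and the lookup tables are made up for illustration and cover only the commands seen in this thread.

```python
ATA_PASS_THROUGH_16 = 0x85
ATA_COMMANDS = {0xEC: "IDENTIFY DEVICE", 0xB0: "SMART"}
SMART_FEATURES = {
    0xD0: "READ DATA",
    0xD1: "READ THRESHOLDS",
    0xDA: "RETURN STATUS",
}


def decode_ata16(cdb_text):
    """Decode a comma-separated hex dump of an ATA PASS-THROUGH (16) CDB."""
    cdb = [int(token, 16) for token in cdb_text.replace(",", " ").split()]
    if len(cdb) != 16 or cdb[0] != ATA_PASS_THROUGH_16:
        raise ValueError("not a 16-byte ATA PASS-THROUGH CDB")
    command, feature = cdb[14], cdb[4]
    name = ATA_COMMANDS.get(command, "unknown (0x%02x)" % command)
    if command == 0xB0:
        name += " " + SMART_FEATURES.get(feature, "feature 0x%02x" % feature)
    return {
        "protocol": (cdb[1] >> 1) & 0x0F,  # 4 means PIO Data-In
        "feature": feature,
        "sector_count": cdb[6],
        "lba_mid": cdb[10],
        "lba_high": cdb[12],
        "command": name,
    }


# The two CDBs from the failing libatasmart 0.16 trace above.
first = decode_ata16("85, 08, 2e, 00, 00, 00, 01, 00, 00, 00, 00, 00, 00, 00, ec, 00")
second = decode_ata16("85, 08, 2e, 00, d1, 00, 01, 00, 00, 00, 4f, 00, c2, 00, b0, 00")
print(first["command"])
print(second["command"], hex(second["lba_mid"]), hex(second["lba_high"]))
```

Decoded this way, the first ioctl in each trace is IDENTIFY DEVICE, and the extra ioctl in the failing 0.16 trace is a SMART command with feature 0xD1 (READ THRESHOLDS) and the usual 0x4F/0xC2 SMART signature in LBA mid/high.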
Martin Pitt (pitti) wrote : | #223 |
So in summary, the problem is fixed in the lucid version of libatasmart. While the code could be a little more robust for future extensions (which I'll discuss in the upstream bug), there are currently no code paths which can lead to the situation that triggers HSM violations.
Changed in libatasmart (Ubuntu Lucid): | |
status: | In Progress → Fix Released |
Martin Pitt (pitti) wrote : | #224 |
For Karmic we can backport http://
It just moves some initialization code into a new function and calls this lazily. It does not change any API/ABI. It has been tested a long time in lucid and should be fairly safe.
However, I'd like to keep the current workaround in devicekit-disks in karmic-proposed for now (please test that this properly disables SMART probing). I'd like to hear some more confirmations from affected people here that things indeed work fine with Lucid beta-1 on a variety of hardware platforms before re-enabling smart probing and this patch in karmic again.
Thank you, and sorry for the trouble that this caused!
Changed in libatasmart (Ubuntu Karmic): | |
assignee: | nobody → Martin Pitt (pitti) |
importance: | Undecided → Medium |
description: | updated |
Trey (trey333) wrote : Re: [Bug 445852] Re: devkit-disks-probe-ata-smart causes HSM Violations on SSD, and potential hardware death | #225 |
I have two bricked EEE 900 chips with this problem. I've been booting
her off an SD card. Can the onboard SSD's rise from the dead now?
On 3/26/10, Martin Pitt <email address hidden> wrote:
> For Karmic we can backport
> http://
> to fix this properly. If we do this, we should also apply
> http://
> on top, to avoid exporting this as a new symbol.
>
> It just moves some initialization code into a new function and calls
> this lazily. It does not change any API/ABI. It has been tested a long
> time in lucid and should be fairly safe.
>
> However, I'd like to keep the current workaround in devicekit-disks in
> karmic-proposed for now (please test that this properly disables SMART
> probing). I'd like to hear some more confirmations from affected people
> here that things indeed work fine with Lucid beta-1 on a variety of
> hardware platforms before re-enabling smart probing and this patch in
> karmic again.
>
> Thank you, and sorry for the trouble that this caused!
Alan Pope 🍺🐧🐱 🦄 (popey) wrote : | #226 |
@Trey, technically they're not 'bricked'. You can revive them fairly easily. I revived two (including the one Martin logged into) by using dd to copy zeroes over the entire SSD. Once done I did a Jaunty install (this was a few months ago) and upgraded to karmic, but before rebooting to the new upgraded karmic install I did what's recommended in the description (as per comment #145).
It's been running karmic fine for ages. According to Martin, Lucid does not suffer from this problem, so you could dd zeroes, then install the Lucid beta and be safe.
adamski (adam-hasselbalch) wrote : | #227 |
Alan: My Eee 900 is bricked. I have dd'ed zeroes over the SSD several times, and while I am no longer getting HSM-violations (since I am using an 8.04 rescue image), I now get Buffer I/O errors galore on the device. A full dd takes roughly 12 hours (that's the 16 gig drive) due to all these errors, and has no effect on them. I have dd'ed it probably three or four times now, and there's no apparent improvement.
I have given up and purchased a 1008HA with a regular disk on it instead. That works like a charm, though.
In other news: Eee 900 with 2G RAM for sale. :)
Alan Pope 🍺🐧🐱 🦄 (popey) wrote : | #228 |
@adamski - What did you boot from to do the dd?
adamski (adam-hasselbalch) wrote : | #229 |
@Alan: both 8.04 (which was what was on the thing until I foolishly reinstalled it) and 9.04. Also tried a mini-rescue-dist of some sort, although I don't remember which one.
Also tried a Solaris-thing, as someone mentioned above, but I didn't have the patience at that time (was late, and I'd been at it for a couple hours) to make the USB boot, which took non-trivial effort (i.e. it didn't "just work").
Martin Pitt (pitti) wrote : Re: [Bug 445852] Re: devkit-disks-probe-ata-smart causes HSM Violations on SSD, and potential hardware death | #230 |
Alan Pope [2010-03-26 10:17 -0000]:
> to copy zeroes over the entire SSD
For those less accustomed to the command line:
* Boot a Jaunty or Lucid Beta-1 desktop CD.
* Start gparted to find the right drive. It should usually be
/dev/sda, but it could also be /dev/sdb if you have more than one
hard disk
* Open a Terminal, and do
sudo dd if=/dev/zero of=/dev/sda
(Replace "sda" with the actual drive, if you have several).
Please note that this IRREVOCABLY ERASES ALL DATA. So please make
double and triple sure that you are not overwriting that other hard
disk, or the USB disk you just attached. To be on the safe side,
disconnect all USB storage before you do this.
Martin
--
Martin Pitt | http://
Ubuntu Developer (www.ubuntu.com) | Debian Developer (www.debian.org)
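As an extra safety net before running dd, the attached disks can be listed with their sizes so the target is unmistakable. The following is a minimal sketch that reads the standard Linux sysfs attributes /sys/block/<name>/size (in 512-byte sectors) and /sys/block/<name>/removable; the function name is invented, and it returns an empty list on systems without /sys/block.

```python
import os


def list_block_devices(sys_block="/sys/block"):
    """Return name, size in bytes and removable flag for each block device."""
    devices = []
    if not os.path.isdir(sys_block):
        return devices
    for name in sorted(os.listdir(sys_block)):
        base = os.path.join(sys_block, name)
        try:
            with open(os.path.join(base, "size")) as handle:
                sectors = int(handle.read().strip())
            with open(os.path.join(base, "removable")) as handle:
                removable = handle.read().strip() == "1"
        except (OSError, ValueError):
            continue  # skip entries without readable attributes
        # sysfs reports the size in 512-byte sectors.
        devices.append({"name": name, "bytes": sectors * 512, "removable": removable})
    return devices


for device in list_block_devices():
    flag = "removable" if device["removable"] else "fixed"
    print("/dev/%s  %.1f GB  %s" % (device["name"], device["bytes"] / 1e9, flag))
```

Comparing the reported size with the known SSD capacity is a quick sanity check that "sda" really is the drive you intend to erase.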
Guy Taylor (thebiggerguy) wrote : | #231 |
@Trey
Have you tried using the ATA 'Secure Erase' command (see: https:/
hope this works.
Steve Beattie (sbeattie) wrote : | #232 |
Martin, I can confirm that the version of devicekit-disks in karmic-proposed, 007-2ubuntu6, has the commented out line in /lib/udev/
tags: |
added: verification-done removed: verification-needed |
Launchpad Janitor (janitor) wrote : | #233 |
This bug was fixed in the package devicekit-disks - 007-2ubuntu6
---------------
devicekit-disks (007-2ubuntu6) karmic-proposed; urgency=low
* Add 11-disable-
disks. It causes hardware damage to a lot of SSD disks. This is a
workaround, until a real fix in libatasmart is found. (LP: #445852)
-- Martin Pitt <email address hidden> Thu, 25 Mar 2010 18:47:35 +0100
Changed in devicekit-disks (Ubuntu Karmic): | |
status: | Fix Committed → Fix Released |
Jarige (jarikvh) wrote : | #234 |
I received a workaround (not a fix!) for this bug today through update-manager, although I wasn't experiencing this bug that badly. It definitely improved boottime :D
Andrew Simpson (andrew-simpson) wrote : | #235 |
Confirming the fix in Karmic.
New file arrived through update-manager today.
I removed my existing dpkg-divert, rebooted and tested. No sign of error messages in dmesg. Previously with this machine I would have had error messages. That's good :-)
Samizdata (samizdata) wrote : | #236 |
Confirming fix in Karmic UNR. Received via Update Manager. No errors seen and performance seems good. Manually confirmed presence of the workaround.
Samizdata (samizdata) wrote : | #237 |
Oh, Acer Aspire One with the SSDPAM device.
ideathproof (glenn-immortal) wrote : | #238 |
Confirming on UNR 10.04 (Lucid): after running Update Manager, the workaround is still in place (no need to edit 80-udisks.rules) on an Asus EEE 900 XP 12G version. Was this workaround released for Lucid?
And boot times are the fastest I've seen: cold boot from the power LED coming on to the desktop takes 35 seconds, shutdown takes 11 seconds. How can it shut down so fast?
Tommy Trussell (tommy-trussell) wrote : | #239 |
Also confirming the software update to 9.10 Karmic NBR that came through today. No trouble; no foolin! ;-)
Tommy Trussell (tommy-trussell) wrote : | #240 |
OH and to be clear -- I DID "undo" the workaround as described in the last two lines of comment 147 https:/
ectropionized (ectropionized-deactivatedaccount) wrote : | #241 |
Applied fix (via Update-Manager), confirmed - no errors. (Intel 4GB SSD, UNR 9.10). All is well in the land of milk and hardware.
Trey (trey333) wrote : Re: [Bug 445852] Re: devkit-disks-probe-ata-smart causes HSM Violations on SSD, and potential hardware death | #242 |
I've got my old bricked Asus EEE 900 at my house for the weekend. Both
SSD's are ostensibly dead. I've loaded Ubuntu 10.04 Beta1 on a USB and
can't install it without an Input/Output Error during the format. So
much for the new kernel fixing the problem. Zeroing out and
"unlocking" is the same. I've even taken the 16GB out and tried to
have it read in a Windows machine. I've tried everything. You guys
keep saying it's not really dead. I assure you that it behaves like it
is every time I poke it with a digital stick. It came from trying to
install the early Karmic release on both SSD's after the first one
failed. It's not physically damaged, it came directly from my
persistence trying to make this damn thing work installing and
reinstalling.
I've got the 900 for another day or two. I sold it to a friend with a
Ubuntu running off an SD card. I'd like to get it working for him
before I give it back to him tomorrow.
My original post:
http://
Dan Halbert (dhalbert) wrote : | #243 |
Lucid Lynx Beta1 does not include this fix, because it was assembled before the fix was released. I successfully installed a recent daily netbook build of Lucid (http://
The final test for those of you with problem SSDs, it seems to me, is to install from one of the daily-live builds (or wait for Beta2). That will require no patching during the install, and should just work.
Before I installed, I booted from a USB stick and did a secure erase of the SSD (see #211 above), which only took a short time.
Trey (trey333) wrote : | #244 |
Couldn't get the Daily Build downloaded (5kb/s in China), but I've got the
latest stable .32 kernel running in Karmic. Still doesn't work. HDPARM gives
me input/output errors doing anything, like setting a password. Gparted at
least sees one of the drives and its partitions, but ultimately does the
same when I tried to format.
Any suggestions?
On Sun, Apr 4, 2010 at 2:04 AM, Dan Halbert <email address hidden> wrote:
> Lucid Lynx Beta1 does not include this fix, because it was assembled
> before the fix was released. I successfully installed a recent daily
> netbook build of Lucid (http://
> live/ <http://
> Mini 9 with a stock 4GB STEC SSD. The build I used was
> created after the fix was released. Though I did not have the reported
> problem, I did have peculiar, similar symptoms with karmic and after.
>
> The final test for those of you with problem SSD's, it seems to me, is
> to install from of the daily-live builds (or wait for Beta2). That will
> require no patching during the install, and should just work.
>
> Before I installed, I booted from a USB stick and did a secure erase of
> the SSD (see #211 above), which only took a short time.
J-Pierre Rouits (jprouits) wrote : | #245 |
Surprisingly, the fix from a few days ago also fixed the random freezing at boot time that I experienced from the time I installed Karmic. I have no SSD but an ATA disk. However, the CDROM drive continues to be probed randomly giving an HSM violation. See the following kernel log:
===============
Apr 5 11:31:03 jpport kernel: [ 1940.181158] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
Apr 5 11:31:03 jpport kernel: [ 1940.181187] ata2.00: cmd a0/00:00:
Apr 5 11:31:03 jpport kernel: [ 1940.181191] cdb 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Apr 5 11:31:03 jpport kernel: [ 1940.181194] res 00/01:01:
Apr 5 11:31:03 jpport kernel: [ 1940.181243] ata2: soft resetting link
Apr 5 11:31:03 jpport kernel: [ 1940.360647] ata2.00: configured for MWDMA2
Apr 5 11:31:03 jpport kernel: [ 1940.473234] ata2: EH complete
Apr 5 11:37:48 jpport kernel: [ 2345.180532] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
Apr 5 11:37:48 jpport kernel: [ 2345.180563] ata2.00: cmd a0/00:00:
Apr 5 11:37:48 jpport kernel: [ 2345.180567] cdb 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Apr 5 11:37:48 jpport kernel: [ 2345.180571] res 00/01:01:
Apr 5 11:37:48 jpport kernel: [ 2345.180646] ata2: soft resetting link
Apr 5 11:37:48 jpport kernel: [ 2345.360649] ata2.00: configured for MWDMA2
Apr 5 11:37:48 jpport kernel: [ 2345.366253] ata2: EH complete
Apr 5 11:39:31 jpport kernel: [ 2447.968101] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Apr 5 11:39:31 jpport kernel: [ 2447.968112] ata2.00: ST_FIRST: !(DRQ|ERR|DF)
Apr 5 11:39:31 jpport kernel: [ 2447.968139] ata2.00: cmd a0/00:00:
Apr 5 11:39:31 jpport kernel: [ 2447.968142] cdb 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Apr 5 11:39:31 jpport kernel: [ 2447.968146] res 00/01:01:
Apr 5 11:39:31 jpport kernel: [ 2447.968191] ata2: soft resetting link
Apr 5 11:39:31 jpport kernel: [ 2448.148629] ata2.00: configured for MWDMA2
Apr 5 11:39:31 jpport kernel: [ 2448.262396] ata2: EH complete
================
Configuration : HP Compaq nx6125, Ubuntu 9.10, last updated on April 3
J-Pierre Rouits (jprouits) wrote : | #246 |
Unfortunately, this morning the random boot lock-up reappeared, and it can only be recovered by hitting the power button, which erases any log! So my previous comment was too optimistic. Maybe this is not the proper place to discuss the random boot lock-up, but the random CDROM probing with HSM violations is still there.
Raf (4263004-noduck) wrote : | #247 |
I am using current Lucid (udisks 1.0.0+git201003
I am hesitant to test more since the corruption was quite bad: broken grub config (easily fixed) and "mount point /dev/shm does not exist" errors when booting (haven't found how to fix this yet).
ipig (infopiggy) wrote : | #248 |
I have not had any problems with 10.04 b1 since post #198.
Call me a wimp, but if it ain't broke I lack the energy to fix it again.
Not sure if I'll do b2 yet, but I'll definitely upgrade to the full version when it comes out.
It does *appear* it's fixed in the latest Lucid.
Can anyone confirm how effective any Karmic/Jaunty fixes were, as Lucid is a bag of i915 coredump at the moment?
Gav Mack (gavinmac) wrote : | #250 |
I installed the Lucid daily build a fortnight ago. It seemed to work fine for a week, but last weekend I started getting the HSM violations, which have finally trashed my partition: fsck ran for a whole day, stuck at 70%. I stuck the daily boot USB back in and am currently zeroing the ext4 partition. This time I think I'll set up Andrew Simpson's dpkg-divert from the very start and leave it in place until I'm sure the problem is not going to come back!
Raf (4263004-noduck) wrote : | #251 |
With the early versions of udisks/libatasmart4 on Lucid I would always get HSM violations. Now with the newer versions (udisks 1.0.0+git201003
Gav Mack (gavinmac) wrote : | #252 |
@Raf:
Despite months of HSM violations with Karmic I never had to zero the drive; I installed Lucid just by reformatting the partition. This time I only had the occasional HSM violation, not during startup but noticeably during apt-get update in a terminal, yet it was enough to trash the SuperTalent SSD after only 2 days, tops. I would rather set the divert up and leave it set until I know this bug is gone forever!
My reinstall of Lucid after the erase and divert set as per post 147 is now running nicely - now I just have to load the apps/repositories back in to get back to where I was!
Raf (4263004-noduck) wrote : | #253 |
@Gav: how are we going to find a fix for this problem if none of us once in a while is willing to try the new version?
Jarige (jarikvh) wrote : | #254 |
Still having the bug with a fully updated Lucid today, but it didn't appear at boot time. It has become less and less frequent over time, but it is not fully gone. I got it twice during this session, both times quite near the beginning (85 and 114 seconds in dmesg, if those are seconds).
udisks version: 1.0.1-1
libatasmart version: 0.17+git2010021
Spoke too soon, my system got hosed just now. See 561079, as I didn't have this reference to hand. Managed to get dumps etc. off.
The udev rules were edited to not run the libata stuff, and it still errored.
I really need to use this netbook for casual work on the move and can't mess around any longer, so going to have to try installing something else, Ubuntu is not working out on the EEEpc. Bye for now.
shadowblast101 (shadowblast101) wrote : | #256 |
MFV, I just thought I should point out that this bug is not just within Ubuntu, but affects any Linux distribution that utilizes the Libatasmart package.
I know this re #148. There are other OS's out there ;)
Martin Pitt (pitti) wrote : | #258 |
Reopening for lucid then, since some machines still seem to be affected. For those who get it on Lucid beta-2, can you please confirm that applying the workaround in /lib/udev/
sudo strace -vvfs1024 -o /tmp/probe-
and attach /tmp/probe-
Changed in libatasmart (Ubuntu Lucid): | |
importance: | Critical → High |
status: | Fix Released → Confirmed |
Changed in libatasmart (Ubuntu Lucid): | |
status: | Confirmed → Incomplete |
ubuntu-crypto (davexthc) wrote : | #259 |
Just to be sure: it is safe to remove this patch if you *don't* use SSDs, correct?
Raf (4263004-noduck) wrote : | #260 |
I have not been able to reproduce the HSM violations. I rebooted 20 times, cold booted, booted with battery, replaced battery and booted, tried 2.6.32-19 and 2.6.32-20, all of these seem to work without problem.
Previously (after the fix went in) I sometimes got the HSM violation, but only on boot, I was not able to trigger it by manually running udisks-
I have replaced /lib/udev/
My understanding of this bug is that it is only related to running of udisks-
The workaround did seem to work for me.
Martin Pitt (pitti) wrote : Re: [Bug 445852] Re: devkit-disks-probe-ata-smart causes HSM Violations on SSD, and potential hardware death | #261 |
ubuntu-crypto [2010-04-12 21:52 -0000]:
> just to be sure it is safe to remove this patch if you *don't* use
> SSDs correct?
The SMART probing is not inherently tied to SSDs. It just seems that
many of today's SSDs use a kind of controller which acts up when it's
asked for SMART status.
So, nobody can guarantee that this problem does not affect normal HDDs
as well, but it seems we haven't heard about those yet.
If you re-enable the SMART probing and it works for you (easy to
notice if your startup speed suddenly increases by 30 seconds or so),
then it's safe, yes.
Martin
--
Martin Pitt | http://
Ubuntu Developer (www.ubuntu.com) | Debian Developer (www.debian.org)
Gav Mack (gavinmac) wrote : | #262 |
Martin Pitt (pitti) wrote : | #263 |
Gav Mack [2010-04-13 23:18 -0000]:
> Even though I have the divert set I've just had a HSM Violation shortly
> after Lucid has fully started - portion of the log attached from boot
> with the error logged at the end.
>
> ** Attachment added: "dmesg"
> http://
This looks rather different, though (no HSM violation). If you have
the divert set, then it's not due to devkit-
Another potential culprit could be hdparm, please see bug 515023.
Raf (4263004-noduck) wrote : | #264 |
I have not had any more HSM violations, and nobody else has reported any problems. It looks like this problem is fixed.
Trey (trey333) wrote : Re: [Bug 445852] Re: devkit-disks-probe-ata-smart causes HSM Violations on SSD, and potential hardware death | #265 |
I still have two bricked SSDs on an EEE 900
On Apr 17, 2010 10:01 AM, "Raf" <email address hidden> wrote:
> I have not had anymore HSM violations. And nobody else has reported any
> problems. It looks like this problem is fixed.
ectropionized (ectropionized-deactivatedaccount) wrote : | #266 |
So, the problem has reappeared with Lucid. Last night I did an upgrade via upgrade-manager to Lucid, and after updating all packages and using the system, I received this:
==
[ 89.816125] ata2: lost interrupt (Status 0x58)
[ 89.820090] ata2: drained 2048 bytes to clear DRQ.
[ 89.823689] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 89.823708] ata2.00: BMDMA stat 0x4
[ 89.823726] ata2.00: failed command: READ DMA
[ 89.823765] ata2.00: cmd c8/00:08:
[ 89.823774] res 58/00:08:
[ 89.823794] ata2.00: status: { DRDY DRQ }
[ 89.823870] ata2: soft resetting link
[ 89.992540] ata2.00: configured for UDMA/66
[ 89.992580] ata2: EH complete
[ 120.816124] ata2: lost interrupt (Status 0x58)
[ 120.820091] ata2: drained 32768 bytes to clear DRQ.
[ 120.934970] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 120.938005] ata2.00: BMDMA stat 0x4
[ 120.940833] ata2.00: failed command: READ DMA
[ 120.943786] ata2.00: cmd c8/00:f8:
[ 120.943795] res 58/00:f8:
[ 120.950725] ata2.00: status: { DRDY DRQ }
[ 120.954028] ata2: soft resetting link
[ 121.124583] ata2.00: configured for UDMA/66
[ 121.124625] ata2: EH complete
[ 209.787794] __ratelimit: 9 callbacks suppressed
[ 209.787814] apt-get[1574]: segfault at 0 ip 00327d10 sp bfa0e0ec error 4 in libc-2.
==
Kernel: 2.6.32-21-generic (i686)
Horácio (horacioh) wrote : | #267 |
I confirm the bug in Lucid beta2. I originally had this problem on an Asus Eee 900; it was apparently solved after the patch, but after upgrading from Karmic to Lucid beta2 I detected HSM violations again. The difference is that they do not appear during boot but randomly during normal use of the computer (in my case, web browsing).
[ 82.816046] ata2: lost interrupt (Status 0x58)
[ 82.816103] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 82.816111] ata2.00: BMDMA stat 0x64
[ 82.816119] ata2.00: failed command: WRITE DMA
[ 82.816133] ata2.00: cmd ca/00:10:
[ 82.816137] res 58/00:10:
[ 82.816144] ata2.00: status: { DRDY DRQ }
[ 82.816180] ata2: soft resetting link
[ 83.014771] ata2.00: configured for UDMA/66
[ 83.020331] ata2.01: configured for UDMA/66
[ 83.044327] ata2.00: configured for UDMA/66
[ 83.052277] ata2.01: configured for UDMA/66
[ 83.052295] ata2: EH complete
[ 113.816056] ata2: lost interrupt (Status 0x58)
[ 113.816113] ata2.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 113.816121] ata2.01: BMDMA stat 0x64
[ 113.816129] ata2.01: failed command: WRITE DMA
[ 113.816144] ata2.01: cmd ca/00:88:
[ 113.816147] res 58/00:88:
[ 113.816154] ata2.01: status: { DRDY DRQ }
[ 113.816191] ata2: soft resetting link
[ 114.056305] ata2.00: configured for UDMA/66
[ 114.064298] ata2.01: configured for UDMA/66
[ 114.088283] ata2.00: configured for UDMA/66
[ 114.096279] ata2.01: configured for UDMA/66
[ 114.096296] ata2: EH complete
[ 147.816044] ata2: lost interrupt (Status 0x58)
[ 147.816102] ata2.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 147.816109] ata2.01: BMDMA stat 0x64
[ 147.816117] ata2.01: failed command: WRITE DMA
[ 147.816131] ata2.01: cmd ca/00:08:
[ 147.816135] res 58/00:08:
[ 147.816142] ata2.01: status: { DRDY DRQ }
[ 147.816178] ata2: soft resetting link
[ 148.056297] ata2.00: configured for UDMA/66
[ 148.064307] ata2.01: configured for UDMA/66
[ 148.088278] ata2.00: configured for UDMA/66
[ 148.096288] ata2.01: configured for UDMA/66
[ 148.096303] ata2: EH complete
[ 180.816048] ata2: lost interrupt (Status 0x58)
[ 180.816105] ata2.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 180.816233] ata2.01: BMDMA stat 0x64
[ 180.816297] ata2.01: failed command: WRITE DMA
[ 180.816380] ata2.01: cmd ca/00:08:
[ 180.816384] res 58/00:08:
[ 180.816624] ata2.01: status: { DRDY DRQ }
[ 180.816727] ata2: soft resetting link
[ 181.056279] ata2.00: configured for UDMA/66
[ 181.064276] ata2.01: configured for UDMA/66
[ 181.088275] ata2.00: configured for UDMA/66
[ 181.096274] ata2.01: configured for UDMA/66
[ 181.096289] ata2: EH complete
[ 244.738599] [drm:drm_
Jarige (jarikvh) wrote : | #268 |
- probe-smart.txt Edit (17.2 KiB, text/plain)
I can confirm that I still have this bug having all updates installed in Lucid.
I was told (by Martin Pitt) to execute the following command:
sudo strace -vvfs1024 -o /tmp/probe-
And add /tmp/probe-
Basically, I don't understand anything of this bug. I don't know how to apply the workaround on an already installed UNR, since the explanation only states booting from a LiveCD, and I don't know whether the workaround has any bad side effects.
How do I apply the workaround on my machine, on an already installed UNR?
Andrew Simpson (andrew-simpson) wrote : | #269 |
I'm not convinced this recent problem is related to devkit-
From what I see it's typified by the kernel giving a READ DMA or WRITE DMA command, to which the drive responds in an unexpected manner (HSM Violation). After a suitable timeout the drive is reset and things continue.
Also, the bug does not occur at boot, but randomly during use. And running the probe with 'Disk Utility' has no effect for me.
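For anyone comparing logs, here is a minimal Python sketch that tallies which commands failed and how often each port was reset in a saved dmesg dump. It assumes the line shapes seen in the excerpts posted in this thread ("ataN.NN: failed command: ..." and "ataN: soft resetting link"); the Karmic-era excerpts here do not show a "failed command" line at all, so an empty tally does not prove there were no violations. The function name is made up for illustration.

```python
import re
from collections import Counter

# Patterns modelled on the log excerpts in this thread (an assumption,
# not a complete description of what libata can print).
FAILED_CMD = re.compile(r"(ata\d+\.\d+): failed command: (.+?)\s*$")
SOFT_RESET = re.compile(r"(ata\d+): soft resetting link")


def summarize_ata_errors(log_text):
    """Count failed commands per (device, command) and soft resets per port."""
    failed = Counter()
    resets = Counter()
    for line in log_text.splitlines():
        match = FAILED_CMD.search(line)
        if match:
            failed[(match.group(1), match.group(2))] += 1
            continue
        match = SOFT_RESET.search(line)
        if match:
            resets[match.group(1)] += 1
    return failed, resets
```

Run over a saved dmesg output, the first counter shows at a glance whether the failures come with READ DMA or WRITE DMA, and the second shows how often each port had to be reset.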
Martin Pitt (pitti) wrote : | #270 |
Given how much trouble this still causes on Lucid, I won't reenable the SMART prober for karmic very soon.
Changed in libatasmart (Ubuntu Karmic): | |
assignee: | Martin Pitt (pitti) → nobody |
Jarige (jarikvh) wrote : | #271 |
If there's anything I can do to produce more data for debugging, contact me.
I didn't apply any workaround, since I do not know how to do that.
Changed in easypeasy-project: | |
status: | New → Confirmed |
assignee: | nobody → Jon Ramvi (ramvi) |
importance: | Undecided → High |
ipig (infopiggy) wrote : | #272 |
My (Post #198) install had a lockup & then would only boot to a grub read error - not sure what happened.
I decided to give 10.04 Beta2/RC a whirl, repeating the steps in #198, but I couldn't dd the drive without it resulting in a loop of HSM violations.
I then tried dd'ing the drive in (live) 9.04, 8.10 and then 8.04.
Only in 8.04 was I able to dd the drive without a loop of HSM violations.
Side Note: I've noticed that when dd'ing the drive under normal circumstances the HD light stays on solid; when it starts pulsing in a timed manner, that means an HSM loop is going on in the background.
Side Note: seems to effect 8.10/9.
The bug surely seems to be in effect earlier than I thought in re: versions & installation(s)
Anyways, i know the full release is tomorrow so hopefully everything is g2g (re: #250)
Martin Pitt (pitti) wrote : Re: [Bug 445852] Re: devkit-disks-probe-ata-smart causes HSM Violations on SSD, and potential hardware death | #273 |
ipig [2010-04-28 19:53 -0000]:
> Side Note: seems to effect 8.10/9.
> not 8.04)
This is definitely unrelated then, since libatasmart was only
introduced in 9.10. Perhaps your problem is more like bug 515023 or
bug 548513?
Changed in easypeasy-project: | |
status: | Confirmed → Triaged |
status: | Triaged → Fix Released |
ipig (infopiggy) wrote : | #274 |
When i boot up a live (nightly) build from Tuesday/27th i get an error saying a hard disk has health problems (ATA ASUS-PHISON SSD/TST2.0L4) (Port 2 of PATA Host Adapter) (8.1GB) (/dev/sdb)
SMART Status: Disk Failure is Imminent
ID: 235 / Good Block Rate (Number of available reserved blocks as a percentage of the total number of reserved blocks) Assessment: Failing / Normalized: 1 / Worst: 1 / Threshold: 3 / Value: N/A
I don't really know what the deal is. Maybe the disk really is failing. It's been failing for a while+ then.
Maybe libatasmart just brings out the worst of it. I'm a little split still.
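As a rough guide to reading the attribute line above: a SMART attribute is conventionally treated as failed when its normalized value is at or below its threshold, and a threshold of 0 is conventionally treated as "never fails". A minimal Python sketch of that rule follows, with a hypothetical function name; this is the general convention only, not necessarily the exact logic that libatasmart or Disk Utility applies.

```python
def attribute_is_failing(normalized, threshold):
    """Return True when a SMART attribute counts as failed.

    Assumed convention: failed if normalized <= threshold, except that
    a threshold of 0 marks an attribute that never reports failure.
    """
    return threshold > 0 and normalized <= threshold
```

With the values reported for attribute 235 (normalized 1, threshold 3) this rule gives a failing assessment, which matches what the disk tool displayed.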
theluketaylor (ekul-taylor) wrote : | #275 |
I am the original reporter of this issue. I installed Lucid today and I can confirm this issue has NOT been corrected. I had to comment out the SMART portions of /lib/udev/
Jim Connor (jim-canada) wrote : | #276 |
6 months, 22 days ago?
theluketaylor (ekul-taylor) wrote : | #277 |
@Jim Connor
Yes, I reported this 6 months ago. I'm more than a little frustrated it hasn't been fixed yet, especially since in comment 203 I read this:
"So in summary, the problem is fixed in the lucid version of libatasmart. While the code could be a little more robust for future extensions (which I'll discuss in the upstream bug), there are currently no code paths which can lead to the situation that triggers HSM violations."
I assumed, based on that, that when I upgraded from 9.10 to 10.04 I would have no issues. I did a fresh install and it went fine until I rebooted into the new system. Then I got the same errors I reported oh so long ago. The workaround from 9.10 worked, though the SMART udev rules are now located in a different file (80-udisks.rules).
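The manual edit can be scripted so it is easy to re-apply. Below is a minimal Python sketch, assuming the rule to disable sits on the single line directly after the "# ATA disks driven by libata" comment, as in the workaround in the bug description. It works on the file contents as a string; reading and writing the real rules file (as root, after making a backup) is deliberately left out. The function name and the placeholder rule in the docstring are hypothetical, not the real rule text.

```python
MARKER = "# ATA disks driven by libata"


def disable_rule_after_marker(rules_text, marker=MARKER):
    """Comment out the line that directly follows the marker comment.

    Hypothetical example: a line such as
        KERNEL=="example*", RUN+="example-probe"
    becomes
        #KERNEL=="example*", RUN+="example-probe"
    Lines that are blank or already commented are left untouched.
    """
    lines = rules_text.splitlines()
    for index, line in enumerate(lines[:-1]):
        if line.strip() == marker:
            following = lines[index + 1]
            if following.strip() and not following.lstrip().startswith("#"):
                lines[index + 1] = "#" + following
            break
    trailing_newline = "\n" if rules_text.endswith("\n") else ""
    return "\n".join(lines) + trailing_newline
```

Running it a second time leaves the text unchanged, so it can be repeated safely if a package update restores the original rule. If the rule is wrapped over several lines, only the first line is commented out, which mirrors the manual steps.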
Martin Pitt (pitti) wrote : | #278 |
theluketaylor [2010-04-29 21:24 -0000]:
> Yes, I reported this 6 months ago. I'm more than a little frustrated it
> hasn't been fixed yet, especially since in comment 203 I read this:
>
> "So in summary, the problem is fixed in the lucid version of
> libatasmart. While the code could be a little more robust for future
> extensions (which I'll discuss in the upstream bug), there are currently
> no code paths which can lead to the situation that triggers HSM
> violations."
So far I just got access to one machine where this happened. I could
reproduce the bug and found the cause (see upstream report). As of
today, nobody offered me SSH access to a machine which is still
affected, so I'm afraid there's nothing else I can do..
Martin
--
Martin Pitt | http://
Ubuntu Developer (www.ubuntu.com) | Debian Developer (www.debian.org)
Jarige (jarikvh) wrote : | #279 |
@Martin Pitt
I'm willing to give you SSH access to my netbook, but you will have to tell me how to do so, as I do not have any experience with that. I must tell you that I did apply the workaround yesterday. I can uncomment the lines if necessary. I'm probably going to be online 5 hours from now, and maybe even longer. I'll receive an e-mail notification if you reply here.
And, of course, if you've got SSH access (which I guess is some kind of terminal access) don't screw it :P This is a production machine, not a test machine. I work on this netbook every day. My important files are backed up with Dropbox though, so no need to worry about that.
So just tell me what to do :)
Martin Pitt (pitti) wrote : | #280 |
These are the steps for allowing me SSH access:
* Install openssh-server
* Create a new user for me (e. g. "pitti"), with admin privileges
* Log in as that user, write the password in a file "password.txt" in the home directory (so that you do not need to pass it around by mail, but I can get access to it once I'm logged in and need sudo)
* mkdir ~/.ssh
* wget -O ~/.ssh/
* Configure your router to allow access to your machine's Port "22" (for ssh from outside)
* Tell me (via private mail, IRC, or bug followup) your IP address (visible in the router configuration web page usually).
Martin Pitt (pitti) wrote : | #281 |
Oh, for the record: I will track my changes and revert them, but I'll need to install a few additional packages (thus I need a few MB of download quota), build some test code, and run it as root. Thus I _will_ access your hard drive with those SMART probing commands to reproduce the problem a few times.
I will not need to see anything in other home directories, and the like.
Martin Pitt (pitti) wrote : | #282 |
I debugged this issue on Jarige's machine, and it has a pretty different root cause. Due to that, and because this bug has become way too long, and because it fixes the issue for most people here, we opened a new report in bug 574462. If you still have the problem, please subscribe to that one instead.
Thank you!
Changed in libatasmart (Ubuntu Lucid): | |
status: | Incomplete → Fix Released |
Changed in libatasmart (Ubuntu): | |
status: | Incomplete → Fix Released |
Changed in libatasmart (Ubuntu Karmic): | |
status: | Triaged → Won't Fix |
Changed in libatasmart: | |
importance: | Unknown → Critical |
loewe_78 (bergloewe) wrote : | #283 |
I've got a hp 510 notebook pc. Back when it was new, it was shipped with open-dos. So, it was running with Linux since Gutsy Gibbon and, up to now, had only some difficulties to be solved with the southbridge that were working out of the box in Hardy or Jackalope.
It has an Intel Celeron M 360 1.4 GHz processor with a 400 MHz front-side bus and an Intel 910GML chipset with an Intel ICH6-M southbridge.
The HD is a IBM/Hitachi 40GB 4200RPM 2MB Cache Travelstar HTS421240H9AT00.
When I upgraded to Karmic about a year ago, I had the problem described above. So, I reinstalled Jackalope where the hardware worked without problems.
Now I want to pass on my laptop, as I bought a new one. A test with the desktop CD showed no obvious problems (it's an HD failure that might occur more often when the system is installed on the HD rather than run from CD, ha-ha). This, and the fact that it's easier to do so, is why I tried to install Lucid.
Although I implemented the workaround for Lucid given above, the device produces the following output when running dmesg:
[ 0.271537] ata_piix 0000:00:1f.1: version 2.13
[ 0.271555] alloc irq_desc for 16 on node -1
[ 0.271559] alloc kstat_irqs on node -1
[ 0.271567] ata_piix 0000:00:1f.1: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[ 0.271628] ata_piix 0000:00:1f.1: setting latency timer to 64
[ 0.277201] isapnp: Scanning for PnP cards...
[ 0.282766] scsi0 : ata_piix
[ 0.282911] scsi1 : ata_piix
[ 0.283654] ata1: PATA max UDMA/100 cmd 0x1f0 ctl 0x3f6 bmdma 0x3580 irq 14
[ 0.283659] ata2: PATA max UDMA/100 cmd 0x170 ctl 0x376 bmdma 0x3588 irq 15
[ 0.284177] Fixed MDIO Bus: probed
[...]
[ 0.489196] ata1.00: ATA-7: HTS421240H9AT00, HACOA70S, max UDMA/100
[ 0.489204] ata1.00: 78140160 sectors, multi 16: LBA48
[ 0.489256] ata1.01: ATAPI: TSSTcorpCDW/DVD TS-L462D, HS02, max MWDMA2
[ 0.548617] ata1.00: configured for UDMA/100
[ 0.556926] ACPI: Battery Slot [C15E] (battery present)
[ 0.580432] ata1.01: configured for MWDMA2
[...]
[ 158.816041] ata1: lost interrupt (Status 0x58)
[ 158.820017] ata1: drained 32768 bytes to clear DRQ.
[ 158.909721] ata1.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 158.909728] sr 0:0:1:0: CDB: Test Unit Ready: 00 00 00 00 00 00
[ 158.909744] ata1.01: cmd a0/00:00:
[ 158.909746] res 58/00:01:
[ 158.909750] ata1.01: status: { DRDY DRQ }
[ 158.909787] ata1: soft resetting link
[ 159.128657] ata1.00: configured for UDMA/100
[ 159.160312] ata1.01: configured for MWDMA2
[ 159.179141] ata1: EH complete
[ 1113.785054] atkbd.c: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).
[ 1113.785060] atkbd.c: Use 'setkeycodes e060 <keycode>' to make it known.
[ 2920.000474] ata1: lost interrupt (Status 0x58)
[ 2920.004015] ata1: drained 32768 bytes to clear DRQ.
[ 2920.093417] ata1.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 2920.093423] ata1.01: ATAPI check failed (ireason=0x1 bytes=8)
[ 2920.093429] sr 0:0:1:0: CDB: Get event status notification: 4a 01 00 00 10 00 00 00 08 00
[ 2920.093448] ata1.01: cmd a0/00:00:00:08:...
loewe_78 (bergloewe) wrote : | #284 |
rogmorri (frontporsche) wrote : | #285 |
On the same acer aspire one laptop where I saw this issue last year, I am perhaps seeing it again with ubuntu-
Oct 4 00:59:43 ubuntu kernel: [ 602.639266] res 00/00:00:
Oct 4 00:59:44 ubuntu kernel: [ 604.149075] ata2: soft resetting link
Oct 4 00:59:45 ubuntu kernel: [ 604.321486] ata2.00: configured for UDMA/100
Oct 4 00:59:45 ubuntu kernel: [ 604.321529] ata2: EH complete
Oct 4 00:59:49 ubuntu kernel: [ 609.173903] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
Oct 4 00:59:49 ubuntu kernel: [ 609.173915] ata2.00: BMDMA stat 0x5
Oct 4 00:59:49 ubuntu kernel: [ 609.173926] ata2.00: failed command: WRITE DMA
Oct 4 00:59:49 ubuntu kernel: [ 609.173945] ata2.00: cmd ca/00:00:
Oct 4 00:59:49 ubuntu kernel: [ 609.173949] res 00/00:00:
Andrew Simpson (andrew-simpson) wrote : | #286 |
@rogmorri
I don't think this is the same bug. You are getting HSM Violations with WRITE DMA, whereas this bug occurred with READ DMA.
(This is being written on an AA1 with 10.10 also!)
rogmorri (frontporsche) wrote : | #287 |
Thanks, Andrew. Maybe then I just have bad hardware.
Trey (trey333) wrote : | #288 |
Just to throw it out there: this bug was never "fixed." I had to sell a
bricked EEE 900 for scrap because it killed both SSDs.
On Mon, Oct 4, 2010 at 2:08 PM, rogmorri <email address hidden> wrote:
> Thank, Andrew. Maybe then I just have bad hardware.
lotus49 (lotus-49) wrote : | #289 |
Trey's right, it never was fixed.
Although I have fortunately not suffered from any permanent hardware
problems, the bug resurfaces every now and then. I have worked around it by
editing /lib/udev/
# ATA disks driven by libata
#KERNEL=
ENV{DEVTYPE}
This workaround has done the trick but unfortunately, this is overwritten
every now and then (as it warns it will be at the beginning of the file).
At least it is easy to spot when this has happened as my boot times go from
about 15 secs to a couple of minutes.
Simon
On 4 October 2010 07:41, Trey <email address hidden> wrote:
> just to throw it out there - this bug was never "fixed." I had to sell a
> bricked EEE 900 for scrap because it killed both SSD's.
Changed in linux: | |
status: | Invalid → Won't Fix |
Changed in libatasmart: | |
importance: | Critical → Unknown |
Neil Hooey (nhooey) wrote : | #290 |
Is anyone going to fix this?
I've disabled s.m.a.r.t. on all of my drives in /lib/udev/
Does anyone even know what software package is actually responsible for the bug?
Thomas Wagnwer@thowabu.de (thomas-thowabu) wrote : | #291 |
I get the same problem with an NVIDIA MCP61 chipset, an OCZ Vertex2 60GB and an ext4 filesystem.
It's a real pain -- sometimes the whole system freezes.
The SSD has had data loss.
The problem occurs less frequently with ext2...
And not even once with Windows XP, 7 ...
SATA HDD and optical drives work fine.
Jailing hdparm and SMART doesn't work for me, running the latest rc kernels.
I think the problem lies between the SSD firmware and the libata kernel stack
(see the difference from Windows; other drives work fine with libata).
Changed in linux: | |
importance: | Unknown → Medium |
Changed in libatasmart: | |
importance: | Unknown → Critical |
Neil Hooey (nhooey) wrote : | #292 |
I just installed Fedora 14, which uses kernel 2.6.35.
Here's the Fedora Bug:
https:/
More details at my StackExchange question:
http://
Changed in easypeasy-project: | |
assignee: | Jon Ramvi (ramvi) → nobody |
In freedesktop.org Bugzilla #25673, Lennart-poettering (lennart-poettering) wrote : | #293 |
Martin, can you prep a patch for your requested changes?
In freedesktop.org Bugzilla #25673, Bugs-freedesktop-org-n (bugs-freedesktop-org-n) wrote : | #294 |
you there, martin?
In freedesktop.org Bugzilla #25673, Greg Unger (groggyboy) wrote : Re: [Bug 445852] | #295 |
I'm sorry, you have the wrong email. My name is Greg.
On Oct 31, 2013 9:28 AM, "Bugs-freedeskt
wrote:
> you there, martin?
In freedesktop.org Bugzilla #25673, mdyn (tamerlaha-gmail) wrote : | #296 |
Something goes wrong, guys :)
In freedesktop.org Bugzilla #25673, Martin Pitt (pitti) wrote : | #297 |
I didn't actually see Lennart's comment 20 two years ago, sorry. Downgrading priority, as the actual bug was fixed two years ago. What's left is some robustification, which I outlined in the last paragraph of comment 19.
Changed in libatasmart: | |
importance: | Critical → Low |
Stephan Müller (megandy) wrote : | #298 |
Hi,
Since the update to 14.04.01 I have been encountering this error on my Thinkpad with an SSD. The system freezes, and after approx. 30 sec to 1 min the following messages can be found using dmesg:
ata1.00: exception Emask 0x0 SAct 0x7fffffff SErr 0x50000 action 0x6 frozen
ata1: SError: { PHYRdyChg CommWake }
ata1.00: failed command: WRITE FPDMA QUEUED
ata1.00: cmd 61/08:00:
res 40/00:00:
ata1.00: status: { DRDY }
So, the bug seems to have a revival. Any ideas?
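For anyone comparing the `status: { ... }` lines across the traces in this report, here is a minimal sketch of how the status byte (the first byte after `res`) maps to the flag names shown. The bit values are the standard ATA status register bits; the table and the function name `decode_status` are my own illustration, not code taken from libata, and bit 0x10 is deliberately left unnamed.

```python
# Standard ATA status register bits (illustrative table, not libata source).
STATUS_BITS = [
    (0x80, "BSY"),   # device busy
    (0x40, "DRDY"),  # device ready
    (0x20, "DF"),    # device fault
    (0x08, "DRQ"),   # data request: device wants to transfer data
    (0x01, "ERR"),   # error, details are in the error register
]


def decode_status(status):
    """Return the names of the known flags set in an ATA status byte."""
    return [name for bit, name in STATUS_BITS if status & bit]


# 0x40 is the "res 40/..." value from the trace above.
print(decode_status(0x40))  # ['DRDY']
# 0x58 is the value from the original bug description (bit 0x10 is ignored here).
print(decode_status(0x58))  # ['DRDY', 'DRQ']
```

A lingering DRQ, as in the original description, means the device still expects a data transfer that the host has given up on, which matches the "drained ... bytes to clear DRQ" messages.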
Matt (matthewj-coke) wrote : | #299 |
I too have experienced this revival on Ubuntu 16.04.1 LTS
Josir Cardoso Gomes (josircg) wrote : | #300 |
I too have experienced this error. The SSD is working fine, but I'm worried about this odd message.
Dmesg:
ata5: drained 512 bytes to clear DRQ
[ 7.955932] ata5.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 7.955998] ata5.01: failed command: SMART
[ 7.956059] ata5.01: cmd b0/d5:01:
[ 7.956162] ata5.01: status: { DRDY DRQ }
[ 7.956240] ata5: soft resetting link
Kernel: 4.4.0-58-generic #79-Ubuntu SMP Tue Dec 20 12:12:35 UTC 2016 x86_64
$ dpkg -l | grep libatasmart
ii libatasmart4:amd64 0.19-3
If you need other info, I will be glad to help.
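Since the later reports in this thread show different failing commands (`SMART` here, `WRITE FPDMA QUEUED` in comment #298), it may help to tally which commands actually fail before assuming they are the same bug. The following is only a sketch: the regular expression and the helper name `failed_commands` are assumptions of mine, based solely on the line format visible in the traces posted here.

```python
import re
from collections import Counter

# Matches lines such as "[    7.955998] ata5.01: failed command: SMART".
FAILED_CMD = re.compile(r"(ata\d+\.\d+): failed command: (.+?)\s*$")


def failed_commands(dmesg_text):
    """Count (device, command) pairs from libata 'failed command' lines."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = FAILED_CMD.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


# Sample lines copied from the traces in this report.
sample = """\
ata1.00: failed command: WRITE FPDMA QUEUED
[    7.955998] ata5.01: failed command: SMART
[    7.956240] ata5: soft resetting link
"""

for (device, command), count in failed_commands(sample).items():
    print(device, command, count)
```

On a live system the input would be the saved output of `dmesg`. A failing `SMART` command points at the SMART probing discussed in this report, while other commands may have an unrelated cause.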
I updated to kernel 2.6.31-12 today and the problem seems to have gotten worse. Under the older karmic kernels it would almost always happen before xsplash and very rarely after gdm. With -12 it seems to be happening before xsplash and after gdm on every boot.