Bug #438136 “palimpsest bad sectors false positive” : Bugs : libatasmart package : Ubuntu

Revision history for this message

Benjamin Drung (bdrung) wrote on 2009-09-28:

#1

smartctl.log (5.4 KiB, text/plain)
Dependencies.txt (3.0 KiB, text/plain; charset="utf-8")

Benjamin Drung (bdrung) wrote on 2009-09-28:

#2

palimpsest-screenshot.png (137.4 KiB, image/png)

Matthew Murphy (chthonical) wrote on 2009-10-01:

#3

I can confirm. I'm suffering from the same issue in 9.10, where Palimpsest says my Hitachi HTS541680J9SA00 has many bad sectors (Reallocated Sector Count). It pops up the warning every time I restart the computer.

J. J. Ramsey (jjramsey) wrote on 2009-10-03:

#4

Screenshot of Palimpsest (86.6 KiB, image/png)

Looks like I have a false positive as well. Palimpsest is now reporting the correct raw value for the "reallocated sector count," which in my case is 3268608. However, I have two reasons for thinking that Palimpsest is drawing the wrong conclusions from this value. First, it appears that a similarly high value for the reallocated sector count is reported by smartctl for a *new* drive in the MacBook Air. The discussion on the smartmontools mailing list can be found here:

http://thread.gmane.org/gmane.linux.utilities.smartmontools/5252

or here (same discussion):

http://marc.info/?l=smartmontools-support&m=120407420622544&w=2

While my SSD is in an X41 Thinkpad rather than a MacBook Air, it is the exact same model as the one described in the mailing list messages to which I linked above, namely a Samsung SSD with the model number MCCOE64GEMPP. For the owner of the MacBook Air, the reallocated sector count is 2617344, the same order of magnitude as mine. One of the participants in the discussion speculated, "maybe the author of the SMART code in this disk was (ab)using this attribute to track the number of times that blocks have been moved about by the wear levelling algorithm."

Second, as seen in the screenshot, the "Self Assessment" of the self-test is "Passed." Apparently, whoever was the Samsung firmware programmer who wrote the self-test wasn't bothered by the reported raw value of the reallocated sector count.

I get the same results from running smartctl from SystemRescueCD 1.3.0. The reallocated sector count is 3268608, but the self-test nonetheless reports the drive as healthy.

Przemek K. (azrael) wrote on 2009-10-07:

#5

These 2 bugs are probably related:
https://bugzilla.redhat.com/show_bug.cgi?id=506254
https://bugzilla.redhat.com/show_bug.cgi?id=498115

tags:

added: disk karmic palimpsest

Tim Lunn (darkxst) wrote on 2009-10-08:

#6

palimpsest seems to check the raw value against the threshold for reallocated sectors.

All other SMART utilities seem to check the normalised value against the threshold. This seems more logical; however, the developer of palimpsest seems to think the first behavior is correct, as he notes in this bug report:
https://bugzilla.redhat.com/show_bug.cgi?id=500079
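
The difference between the two checks can be sketched as follows. This is only an illustration of the behaviour as described in this comment, not libatasmart's actual code: the function names are hypothetical, and the raw-based rule in particular is an assumption based on the "seems to" wording above. The sample numbers (normalized value 100, threshold 5, raw value 327697) are figures reported elsewhere in this bug report.

```python
def normalized_check_fails(value: int, threshold: int) -> bool:
    """Conventional SMART rule: an attribute has failed when its
    normalized value has dropped to or below the vendor threshold."""
    return value <= threshold


def raw_check_fails(raw: int, threshold: int) -> bool:
    """Raw-based rule as described in the comment (an assumption, not
    the real implementation): flag the drive when the raw counter
    exceeds the threshold."""
    return raw > threshold


# Figures reported in this thread for a Hitachi Travelstar drive.
value, threshold, raw = 100, 5, 327697

print("normalized check fails:", normalized_check_fails(value, threshold))
print("raw check fails:       ", raw_check_fails(raw, threshold))
```

With these numbers the normalized check passes while the raw-based check raises an alarm, which matches the symptom reported here.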

Georg (georg-lippold) wrote on 2009-10-12:

#7

It would be a good option to let the user override certain reported values until the value goes up the next time. Then one wouldn't be bugged by Palimpsest on every boot, but only if the disk degrades further.

yareckon (yareckon) wrote on 2009-10-22:

#8

Hi guys, I have a 3-month-old Samsung SSD, which has 2179072 reallocated sectors (probably due to flash wear levelling on the drive, as I have 0 "uncorrectable sector count" and 0 "Reallocation count"). I get yelled at every time I log into Karmic that my drive is failing. The reallocated sector count has not changed in a month of heavy usage, so I don't think the drive is about to collapse. I would also like an error dialog that says something like

Caution! Your drive has a high number of reallocated sectors, which may be a result of failing hardware. Currently the drive reports it is *passing* SMART checks, which are designed to detect a failing drive, so this warning may be incorrect. Certain types of storage such as solid state drives (ssds) have large numbers of reallocated sectors to extend their life. It is recommended that you back up your data in case your drive is about to fail.
What would you like to do now:
> Display the error messages and stats
> Inform me if the drive health further deteriorates

Naturally the scary warnings wouldn't be tempered if the SMART status was actually failing.

What do you say? I love Palimpsest, but people will ignore the warnings if every netbook and SSD cries wolf.

Tapas Bose,India (tapas-23571113) wrote on 2009-10-24:

#9

Screenshot-1.png (174.0 KiB, image/png)

Hello everybody. I think this is a bug. Palimpsest and GSmartControl both report that I have bad sectors on my hard disk. Please see the attachments.

Tapas Bose,India (tapas-23571113) wrote on 2009-10-24:

#10

GSmartControl Report (10.1 KiB, text/plain)

Gerald Jansen (jansengb) wrote on 2009-10-26:

#11

Confirm. I get a reallocated sector count of 65551 on my Hitachi HTS541680J9SA00, very similar to the value 65543 reported by Tapas Bose. Note 2**16 = 65536.
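
The observation about 2**16 can be made concrete by splitting the raw value at bit 16. The sketch below is only arithmetic on the two values quoted above; whether the low 16 bits are the "real" sector count on these drives is speculation, and `split_raw` is a hypothetical helper name.

```python
def split_raw(raw: int) -> tuple[int, int]:
    """Split a raw attribute value into (bits above 16, low 16 bits)."""
    return raw >> 16, raw & 0xFFFF


# The two raw values quoted in the comment above.
for raw in (65551, 65543):
    high, low = split_raw(raw)
    print(f"{raw} = {hex(raw)} -> high part {high}, low 16 bits {low}")
```

Both values have a high part of 1, with 15 and 7 in the low 16 bits respectively.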

Lonnie Lee Best (launchpad-startport) wrote on 2009-10-27:

#12

Screenshot-Sysinfo.png (33.1 KiB, image/png)

I have no clue if there are bad sectors or not; All I know is the machine is running fine, and previous versions of Ubuntu don't notify me of bad sectors, for what that is worth. See attached screen shot for hard drive model, etc.

Lonnie Lee Best (launchpad-startport) wrote on 2009-10-27:

#13

Screenshot-SMART Data.png (56.4 KiB, image/png)

see additional attachment

Mārtiņš Barinskis (martins-barinskis) wrote on 2009-10-27:

#14

I can confirm the "false-positive" issue on my Dell XPS M1530 with Samsung HM250JI hard drive.
The Disk Utility warns me about 'bad sectors' all the time; however, testing the drive with the HDD manufacturer's diagnostic tools (http://www.samsung.com/global/business/hdd/support/utilities/Support_HUTIL.html) gives no error. I did a full disk surface scan.

Jose Mico (jose-mico) wrote on 2009-10-29:

#15

smartctl.txt (4.6 KiB, text/plain)

I think that I also have a false positive with a Hitachi Travelstar disk on an HP530 notebook, fresh Ubuntu 9.10. Palimpsest warns about imminent failure, even though the current value of Reallocated_Sector_Ct (100) is far from the threshold (005):
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 327697
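
For readers unfamiliar with this output, the line above can be pulled apart as in the sketch below. It assumes the usual ten-column `smartctl -A` layout (ID, name, flag, value, worst, threshold, type, updated, when-failed, raw value); the `SmartAttribute` type and `parse_attribute_line` function are hypothetical names made up for this illustration, not part of smartmontools.

```python
from typing import NamedTuple


class SmartAttribute(NamedTuple):
    attr_id: int
    name: str
    flag: int
    value: int       # current normalized value
    worst: int       # worst normalized value seen so far
    threshold: int   # vendor failure threshold
    attr_type: str
    updated: str
    when_failed: str
    raw: int


def parse_attribute_line(line: str) -> SmartAttribute:
    """Parse one whitespace-separated attribute row (assumed 10 columns)."""
    fields = line.split()
    if len(fields) < 10:
        raise ValueError(f"expected 10 columns, got {len(fields)}")
    return SmartAttribute(
        attr_id=int(fields[0]),
        name=fields[1],
        flag=int(fields[2], 16),
        value=int(fields[3]),
        worst=int(fields[4]),
        threshold=int(fields[5]),
        attr_type=fields[6],
        updated=fields[7],
        when_failed=fields[8],
        raw=int(fields[9]),
    )


line = "5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 327697"
attr = parse_attribute_line(line)
print(attr.name, "value", attr.value, "threshold", attr.threshold, "raw", attr.raw)
print("normalized value above threshold:", attr.value > attr.threshold)
```

The normalized value (100) is well above the threshold (5), and the "-" in the when-failed column means the drive itself has never flagged this attribute as failed.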

Jose Mico (jose-mico) wrote on 2009-10-29:

#16

Most people with six-digit reallocated sector counts seem to be using Hitachi drives. Could it be that the raw value is not really the "bad sector" count for these drives?

neclepsio (neclepsio) wrote on 2009-10-29:

#17

I have the same problem with a Hitachi disk, counting 458798 bad sectors. Every other parameter is ok.

greyor (greyor) wrote on 2009-10-30:

#18

I have this problem as well, which was quite alarming just after I'd installed 9.10 on my laptop. It's a Dell Inspiron 1525n that I've had a little over a year -- and Palimpsest counts 858 bad sectors. The reallocated sectors count is 68295.

The drive is a ~120 GB Samsung HM121HI.

I highly doubt that this drive has seen that much wear in the last year, and I don't really have the money to replace it at this time, so I'm wondering what's going on.

inigmatus (inigmatus) wrote on 2009-10-30:

#19

see http://ubuntuforums.org/showthread.php?p=8193949 for more examples of potential false positives.

peersi (ianpeers) wrote on 2009-10-30:

#20

I have the same problem - ATA Maxtor 6Y16OPO 164 Gig drive
ID 5 - Normalised 110, Worst 100, Threshold 63, Value 1447 sectors

All other IDs show no errors. It makes bootup considerably slower than under 9.04.

Eros Zanchetta (eros) wrote on 2009-10-30:

#21

9VS0S1Q1.log (766 bytes, text/plain)

Same here on my fresh Karmic installation (x86_64), using a Seagate Barracuda ST31500341AS. I also tried the 32-bit version of Karmic and got the same results.

I ran SeaTools for Windows' "Long Generic" test to confirm that the drive was OK and it passed the test (see attached log file), so I guess it is indeed a false positive.

BRY (brypie) wrote on 2009-10-30:

#22

I don't have enough hardware knowledge to say for sure that my "Many bad sectors" is true, but I have been using the disk for a long time now, without problem.

I saw this error message when using the Live CD also.

mercutio22 (macabro22) wrote on 2009-10-30:

#23

Or maybe some OEMs are issuing refurbished hard drives inside their brand new PCs. Wild.

amar (amarendra) wrote on 2009-10-30:

#24

Same error, though I got nothing from other tools on Windows such as HDDLife and HDDHealth, and the drive has been working well since Sep 2007.

Angel Aguilera (angel-aguilera) wrote on 2009-10-30:

#25

Same error in a DELL XPS M1330 with an ATA SAMSUNG HM400LI hard disk drive.

bhuvi (bhuvanesh) wrote on 2009-10-31:

#26

I too get this message in my new karmic installation.

bhuvi (bhuvanesh) wrote on 2009-10-31:

#27

Screenshot-SMART Data.png (72.6 KiB, image/png)

Alan Burbol (aburbol) wrote on 2009-11-01:

#28

"SMART Data" window capture (85.1 KiB, image/png)

I have this issue as well; fresh install, 9.10 release, amd64, Dell Inspiron 1521 dual booting with Windows 7. At first, I thought this message had something to do with Windows 7 (perhaps Win7 does crazy things with disk partitions?). Glad to see it's actually -not- Microsoft's fault this time.

Ramiro Castro (castro-ramiro) wrote on 2009-11-01:

#29

Hi, I can confirm this false positive too. I have a TOSHIBA MK2035GSS. Hope the fix comes soon!

Emile Ong (emilemail) wrote on 2009-11-02:

#30

Yup, same problem here. Seagate 1500.11.

CHKDSK /R reports no problems. Nor does SeaTools.

Palimpsest reports 55 bad sectors.

kon_nos (konsnos) wrote on 2009-11-02:

#31

Hi, I also confirm this false positive. I have a HP G7000.

s.ketrat (s-ketrat) wrote on 2009-11-02:

#32

Same on my HP 6510b with Hitachi drives

sk

Jose Mico (jose-mico) wrote on 2009-11-02:

#33

I really do have several reallocated sectors on my disk, maybe due to a knock or something. But the point is that the disk is working fine, I've had no data loss, and the number of reallocated sectors is not increasing. The bug is just the warning about "imminent failure"... I don't think that the disk will fail in the next months (and neither does the manufacturer). We'll see...

Anant (infyniti) wrote on 2009-11-02:

#34

Same problem on a Dell XPS 1330 using a Samsung HM250JI. In fact, I upgraded two different machines which are way older without any errors. Strange that most of those who reported this problem are using Samsung HDs. Does this have something to do with this brand of HD?

Anant

Daniele Napolitano (dnax88) wrote on 2009-11-02:

#35

In my experience, I have one hard disk with 7 reallocated sectors and it works fine (obviously).

Another case is my friend's computer: Windows had stopped working, and after an analysis with Palimpsest Disk Utility (on the Ubuntu 9.10 live CD) I read up to 250 reallocated damaged sectors. So, no false positive for me (all the hard disks are Maxtor).

A clarification: a reallocated sector means that the hard disk has internally isolated the sector, so badblocks doesn't report errors! This is a hardware data recovery.

P4man (duvel123) wrote on 2009-11-02:

#36

To everyone reporting this as a bug: while it's clear there IS a bug (negative relocated sector count, or people seeing 65500 relocated sectors), it's not clear to me whether those people seeing credible values (between 1 and 300 or so) are blaming this bug incorrectly or not. Please run another SMART monitoring tool or your hard drive vendor's hardware diagnosis program to verify.

Doing read/write tests or filesystem checks does NOT disprove palimpsest's warning. Relocated sectors are invisible to the filesystem and the operating system; the hard drive manages them automatically and transparently until it runs out of spare sectors, and only exposes this information through S.M.A.R.T.

If anyone can confirm that a seemingly credible relocated sector count is in fact incorrect, I would love to learn about it. Until then I would be very reluctant to call this a bug if you get a warning from palimpsest for a reasonable-looking number of bad sectors.

Daniele Napolitano (dnax88) wrote on 2009-11-02:

#37

@P4man: Finally! Thanks for clarification.

Eros Zanchetta (eros) wrote on 2009-11-02:

#38

Screenshot.png (201.7 KiB, image/png)

@p4man: thanks for the clarification. I'd love to help, but I'm not sure how. As I said above I ran SeaTools for Windows' "Long Generic Test" and it didn't report any problems (you can find the log file in my previous post) while palimpsest reports 466 bad sectors (see attached screenshot). I don't know if this is a credible number of relocated sectors.

I'm willing to run more tests, just tell me what to do.

I could try SpinRite, but I'd rather not because it'll probably take forever to run the test on a 1.5 TB disk.

magoo (martingagnon5) wrote on 2009-11-03:

#40

Same for me, brand new installation on a system with a separate partition for the home directory. Even though I have an amd64 machine, I preferred to use the i386 version of 9.10. It never occurred before and began right after the installation.

Sergey Sventitski (sergey-sventitski-deactivatedaccount-deactivatedaccount) on 2009-11-08

Changed in gnome-disk-utility (Ubuntu):
assignee:	nobody → Sergey Sventitski (sergey-sventitski)

Sergey Sventitski (sergey-sventitski-deactivatedaccount-deactivatedaccount) on 2009-11-08

Changed in gnome-disk-utility (Ubuntu):
assignee:	Sergey Sventitski (sergey-sventitski) → nobody

Kees Cook (kees) on 2009-11-09

Changed in gnome-disk-utility (Ubuntu Karmic):
status:	New → Confirmed
importance:	Undecided → Medium
Changed in gnome-disk-utility (Ubuntu Lucid):
importance:	Undecided → Medium
status:	New → Confirmed

Kevin Krafthefer (krafthefer) on 2009-11-10

Changed in gnome-disk-utility (Ubuntu Lucid):
assignee:	nobody → Canonical Ubuntu QA Team (canonical-qa)

Steve Beattie (sbeattie) on 2009-11-10

Changed in gnome-disk-utility (Ubuntu Lucid):
assignee:	Canonical Ubuntu QA Team (canonical-qa) → Canonical Desktop Team (canonical-desktop-team)

Martin Pitt (pitti) on 2009-11-10

affects:

gnome-disk-utility (Ubuntu Karmic) → libatasmart (Ubuntu Karmic)

Martin Pitt (pitti) on 2009-11-10

Changed in libatasmart (Ubuntu Lucid):
assignee:	Canonical Desktop Team (canonical-desktop-team) → Martin Pitt (pitti)
Changed in libatasmart (Ubuntu Karmic):
status:	Confirmed → Triaged
Changed in libatasmart (Ubuntu Lucid):
status:	Confirmed → Triaged

Bug Watch Updater (bug-watch-updater) on 2009-11-11

Changed in libatasmart (Fedora):
status:	Unknown → Confirmed

Jean-Louis (jean-louis) on 2009-12-14

Changed in libatasmart (Ubuntu Lucid):
assignee:	Martin Pitt (pitti) → nobody

Vitaliy Kulikov (slonua) on 2009-12-15

tags:

added: apport-collected

tereza.am (tereza-am) on 2010-01-07

description:

updated

gnuckx (gnuckx) wrote on 2010-03-03:

#138

I confirm the same problem "palimpsest bad sectors false positive" on Karmic and Lucid Alpha 3. Palimpsest reports errors for IDs 5 and 197 on my 1 TB Samsung HD, model HD103UJ. Meanwhile, no error is reported on my second HD, a 500 GB Hitachi, model HDT725050VLA360.

Kees Cook (kees) on 2010-03-03

tags:	added: regression-potential
Changed in libatasmart (Ubuntu Lucid):
milestone:	none → ubuntu-10.04-beta-1

Dhruva Sagar (dhruva-sagar) wrote on 2010-03-04:

#139

I am also facing the issue. I had an old SATA hard disk and once I upgraded to Karmic it started giving me hard disk failure errors. I tried to ignore them for a while, but slowly things deteriorated and soon half of my hard disk was rendered useless, even if I tried to format that partition Ubuntu would crash and drives would become read-only and I couldn't save / edit anything on the hard disk.
At times when rebooting it would say that no bootable media was found and that it was unable to mount the hard disk. I would restart go to GRUB and select another image to boot from and it would boot fine.

Anyway, considering the kind of errors, I assumed that something was indeed wrong with the hard disk. I went ahead and purchased a brand new hard disk (Toshiba), installed Karmic from scratch, and installed all the software and utilities that I needed for my development setup all over again, which took me almost a week. But the good thing was that the errors disappeared!

One month into that, and the errors have now reappeared. Ubuntu shows me again that there are bad sectors and slowly but steadily they are increasing! Ubuntu would hang suddenly at times and I would have to reboot, go to the recovery console, it would be unable to mount my hard disk and I would have to do a FSCK to repair some inodes and when I then reboot and come back, I am able to boot normally but only to see that the number of bad sector count has increased.

I have a terrible feeling that Ubuntu is somehow corrupting my hard disk, I have no reason to believe that my 1 month old brand new hard disk could have any problems whatsoever.

This is really pathetic! I have been an Ubuntu fan for over 4 years now and have never looked back to windows. But this whole incident has left me haunted. I can't afford to buy new hard disks every couple of months. This is just not acceptable! Someone please do something. I beg you!

Oded Arbel (oded-geek) wrote on 2010-03-04:

#140

Dhruva: the problem you are reporting sounds like you do have a problem with the drive and libatasmart is reporting the issue correctly - so this is not the issue that is reported in this bug.

Regarding your actual problem, as you had escalating problems with an old drive and now have an escalating problem with a new drive, I would guess that your problem is not the drive but something else. I don't think the problem is that Ubuntu is corrupting the drive as it doesn't seem likely that it has that capability and there is no one else with a similar experience. I would think that you have another hardware problem that causes disks to fail - either insufficient cooling or vibration problems are the most likely issues. I suggest you contact someone with appropriate knowledge to help you resolve this problem.

Christoph Buchner (bilderbuchi) wrote on 2010-03-05:

#141

I also see the same behaviour on a dual-boot (ubuntu 9.10 and winXP) hp 8530w laptop with a Hitachi drive. It suddenly appeared after defragging a shared ntfs data partition. My reallocation sector count is 65538, and the rest seems unremarkable to me (only reallocation event count is 1).
I attached screenshots of 3 different SMART tools i ran under windows (HDD health, CrystalDiskInfo, and the windows version of smartctl) to check the palimpsest output.
All programs agree about the current/normalised value and threshold (indicating nothing is amiss), but the reading of the raw value of the different programs is interesting (indicated in the screenshot). Remarkably, CrystalDiskInfo also cautions me about the Reallocated Sector count...

so far i've come to the conclusion that it's a bug in the way the raw-values of my hitachi drive are translated, but am i right? or is my drive in fact dying, and should i replace it asap?

anything else i can post to help?

Is there any dev on this? 84 people affected, and copious user reports, but not even an assignee? or does the assignment to redhat-bugs mean it's going to be resolved there first?

Dhruva Sagar (dhruva-sagar) wrote on 2010-03-05:

#142

@Oded Arbel : hmmm now that I am a little less irritable, I seem to agree with your opinion. I have in fact been witnessing some cooling problems lately, although I thought they were pertaining to the CPU only, I didn't know or anticipate that they could be harming my hard disk too, but now I guess I do, I will follow your lead and have it checked out. It is just that I started to experience this only after I upgraded to karmic and while searching I was seeing a lot of such reports that made me feel it is something similar...Thanks.

Mikko Saarinen (mikk0) wrote on 2010-03-12:

#143

I got a computer from a friend who said it was not working well.

As soon as I booted it with Live CD, I got the error of a disk failing. I did a backup, but some of the files were unreadable, even though S.M.A.R.T says the Read Error Rate is 0 (Raw 0x000000000000)

Reallocated sectors = 335 and Pending = 122.
Obviously the disk is not OK, because of the read failures, but shouldn't the read error rate be higher then?

In my case, Palimpsest gives reasonable figures and is working very well =)

Martin Pitt (pitti) on 2010-03-16

Changed in libatasmart (Ubuntu Lucid):
assignee:	nobody → Martin Pitt (pitti)

Martin Pitt (pitti) on 2010-03-17

Changed in libatasmart (Ubuntu Lucid):
milestone:	ubuntu-10.04-beta-1 → ubuntu-10.04-beta-2

In freedesktop.org Bugzilla #25772, Martin Pitt (pitti) wrote on 2010-03-19:

#144

The bigger problem of this is (as you already mentioned) that the raw value is misparsed way too often. Random examples from bug reports:

http://launchpadlibrarian.net/34574037/smartctl.txt
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 327697

http://launchpadlibrarian.net/35971054/smartctl_tests.log
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 65542

http://launchpadlibrarian.net/36599746/smartctl_tests-deer.log
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 65552

https://bugzilla.redhat.com/attachment.cgi?id=382378
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 655424

https://bugzilla.redhat.com/show_bug.cgi?id=506254
reallocated-sector-count 100/100/ 5 FAIL 1900724 sectors Prefail Online

It seems that "no officially accepted spec about SMART attribute decoding" also hits here, in the sense that way too many drives get the raw counts wrong. In all the 30 or so logs that I looked at in the various Launchpad/RedHat/fd.o bug reports related to this, I didn't see an implausible normalized value, though.

I appreciate the effort of doing vendor independent bad blocks checking, but a lot of people get tons of false alarms due to that, and thus won't believe it any more if there is really a disk failing some day.

My feeling is that a more cautious approach would be to use the normalized value vs. threshold for the time being, and use the raw values if/when that can be made more reliable (then we should use something in between logarithmic and linear, though, since due to sheer probabilities, large disks will have more bad sectors and also more reserve sectors than small ones).

In freedesktop.org Bugzilla #25772, Martin Pitt (pitti) wrote on 2010-03-19:

#145

Created an attachment (id=34234)
smart blob with slightly broken sectors

BTW, I use this smart blob for playing around and testing, which is a particularly interesting one: It has a few bad sectors (correctly parsed), but not enough yet to be below the vendor specified threshold.

5 reallocated-sector-count 77 1 63 1783 sectors 0xf70600000000 prefail online yes no
197 current-pending-sector 83 6 0 1727 sectors 0xbf0600000000 old-age offline n/a n/a

So this can be loaded into skdump or udisks for testing the desktop integration all the way through:

$ sudo udisks --ata-smart-refresh /dev/sda --ata-smart-simulate /tmp/smart.blob
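
As a side note on reading the two attribute lines above: the pretty value and the raw hex column agree if the six raw bytes are read least-significant byte first. The sketch below checks that for the values shown in this comment. It is an observation about this particular output, not a statement about how skdump formats every attribute, and `raw_bytes_to_int` is a hypothetical helper name.

```python
def raw_bytes_to_int(raw_hex: str) -> int:
    """Interpret a hex string such as '0xf70600000000' as bytes in
    storage order and decode them as a little-endian integer."""
    digits = raw_hex[2:] if raw_hex.startswith("0x") else raw_hex
    return int.from_bytes(bytes.fromhex(digits), "little")


# (raw column, pretty value) pairs from the two lines above.
samples = [
    ("0xf70600000000", 1783),  # reallocated-sector-count
    ("0xbf0600000000", 1727),  # current-pending-sector
]

for raw_hex, pretty in samples:
    decoded = raw_bytes_to_int(raw_hex)
    print(raw_hex, "->", decoded, "(matches pretty value)" if decoded == pretty else "(differs)")
```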

In freedesktop.org Bugzilla #25772, Martin Pitt (pitti) wrote on 2010-03-19:

#146

Created an attachment (id=34242)
Drop our own "many bad sectors" heuristic

This patch just uses the standard "compare normalized value against threshold". I know that it's not necessarily how you really want it to work, but it's a pragmatic solution to avoid all those false positives, which don't help people either.

So of course feel free to entirely ignore it, but at least I want to post it here for full disclosure. (I'll apply it to Debian/Ubuntu, we have to get a release out).

This patch is against the one in bug 26834.

In freedesktop.org Bugzilla #25772, Martin Pitt (pitti) wrote on 2010-03-19:

#147

Oh, forgot: I compared

for i in blob-examples/*; do echo "-- $i"; ./skdump --load=$i; done > /tmp/atasmart-test.out

before and after, and get two differences like

-^[[1mOverall Status: BAD_SECTOR_MANY^[[0m
+^[[1mOverall Status: BAD_SECTOR^[[0m

The first one is against blob-examples/Maxtor_96147H8--BAC51KJ0:
5 reallocated-sector-count 226 226 63 69 sectors 0x450000000000 prefail online yes yes

and the second one against blob-examples/WDC_WD5000AAKS--00TMA0-12.01C01

5 reallocated-sector-count 192 192 140 63 sectors 0x3f0000000000 prefail online yes yes

so under the premise of changing the evaluation to use the normalized numbers those are correct and expected changes.

Martin Pitt (pitti) on 2010-03-19

Changed in libatasmart (Ubuntu Lucid):
status:	Triaged → In Progress

Bug Watch Updater (bug-watch-updater) on 2010-03-19

Changed in libatasmart:
status:	Unknown → Confirmed

Martin Pitt (pitti) wrote on 2010-03-19:

#148

I sent a patch to the upstream freedesktop.org bug, and uploaded a new libatasmart package to lucid. It's currently stuck in UNAPPROVED and will land after the beta-1 release.

Changed in libatasmart (Ubuntu Lucid):
status:	In Progress → Fix Committed

Jean-Louis (jean-louis) wrote on 2010-03-19:

#149

I'm very happy for this decision.

Before investigating this problem, I bought a new HDD for safety reasons (to back up all my data), but now, after 3 months, the number of reallocated sectors is stable and hasn't increased.

This patch will save unneeded e-waste.

Launchpad Janitor (janitor) wrote on 2010-03-19:

#150

This bug was fixed in the package libatasmart - 0.17+git20100219-1git2

---------------
libatasmart (0.17+git20100219-1git2) lucid; urgency=low

Upload current Debian packaging git head.

  * debian/local/apport-hook.py: Update for udisks.
  * Add 0001-Speed-up-get_overall-and-get_bad.patch: Speed up get_overall()
    and get_bad(). (fd.o #26834)
  * Add 0002-Drop-our-own-many-bad-sectors-heuristic.patch: Drop our own "many
    bad sectors" heuristic.This currently causes a lot of false positives,
    because in many cases our treshold is either overly pessimistically low,
    or the raw value is implausibly high. Just use the normalized values vs.
    treshold for now. (LP: #438136, fd.o #25772)
-- Martin Pitt <email address hidden> Fri, 19 Mar 2010 22:21:47 +0100

Changed in libatasmart (Ubuntu Lucid):
status:	Fix Committed → Fix Released

Richard Gunn (ubuntu-deckard) wrote on 2010-03-19:

#151

I definitely have a false positive with my main drive (Hitachi), but something else just occurred to me: when I tried to install Karmic from the live CD, it refused to recognize my Hitachi drive as a viable target for installation. In the end, I was forced to upgrade from Intrepid to Jaunty to Karmic using the dist upgrade option in synaptic.

I only discovered the false positive issue with palimpsest AFTER I had upgraded to Karmic through synaptic, so in retrospect, I wonder if some sort of integrity check is done on the drive before the Karmic CD installer lists it prior to partitioning, and whether this false positive issue actually prevented me from installing Karmic from the live CD onto my Hitachi hard drive.

If that is true, perhaps an additional issue should be added to the list for the Karmic live CD?

Jerone Young (jerone) wrote on 2010-03-25:

#152

@Martin Pitt

Can this fix to the heuristics be backported to 9.10 Karmic via SRU?

Benjamin Drung (bdrung) wrote on 2010-03-27:

#153

I unsubscribed ubuntu-sponsors, because there is no debdiff to sponsor.

primefalcon (primefalcon) wrote on 2010-03-31:

#154

Just adding that I am getting this as well on my asus 900ha

Kevin Krafthefer (krafthefer) on 2010-04-05

Changed in libatasmart (Ubuntu Karmic):
assignee:	nobody → Canonical Platform QA Team (canonical-platform-qa)

Joe Claunch (catalina22) wrote on 2010-04-19:

#155

I encountered this problem under Karmic on my 6-month-old Dell Mini-9 with a 4 GB SSD. I zeroed the SSD with the "dd" command and installed Lucid Beta 2. Within 30 seconds of the post-install restart I was getting the same error message. I loaded and installed all available patches with the software update utility but the problem persisted. I then repeated the zero, install, and patch operation with the same results. At this point I went back to 9.04 and my Dell Mini-9 is again working perfectly. The specific error message I am encountering is as follows:

3.8 GB Hard Disk - ATA STEC ATA DISK vS020.1.0
DISK IS BEING USED OUTSIDE DESIGN PARAMETERS

Brian Murray (brian-murray) on 2010-04-22

Changed in libatasmart (Ubuntu Karmic):
milestone:	none → karmic-updates

Martin Pitt (pitti) on 2010-04-22

Changed in libatasmart (Ubuntu Karmic):
status:	Triaged → In Progress
assignee:	Canonical Platform QA Team (canonical-platform-qa) → Martin Pitt (pitti)

Martin Pitt (pitti) on 2010-04-23

description:

updated

Martin Pitt (pitti) on 2010-04-23

description:	updated
description:	updated

Martin Pitt (pitti) wrote on 2010-04-23:

#156

karmic debdiff (5.5 KiB, text/plain)

Ugh, the karmic code is quite a bit different, so I basically needed to implement the same logic for a rather different code base. It's working now, though (see attached debdiff). The SRU test case (see description) is working now, and I also ran the old and new version against all the blob examples which are in the source code:

for i in blob-examples/*; do echo "-- $i"; ./skdump --load=$i; done

The diff between the old and new version is

--- atasmart-test.old 2010-04-23 15:20:42.636609956 +0200
+++ atasmart-test.new 2010-04-23 16:06:49.966609923 +0200
@@ -214,7 +214,7 @@
Average Powered On Per Power Cycle: 1.1 h
Temperature: No such file or directory
Attribute Parsing Verification: Good
-Overall Status: BAD_SECTOR_MANY
+Overall Status: BAD_SECTOR
ID# Name Value Worst Thres Pretty Raw Type Updates Good Good/Past
   1 raw-read-error-rate 253 252 0 343062 0x163c05000000 old-age online n/a n/a
   3 spin-up-time 196 191 63 62 ms 0x3e000000fa37 prefail online yes yes
@@ -620,7 +620,7 @@
Average Powered On Per Power Cycle: 11.2 days
Temperature: 40.0 C
Attribute Parsing Verification: Good
-Overall Status: BAD_SECTOR_MANY
+Overall Status: BAD_SECTOR
ID# Name Value Worst Thres Pretty Raw Type Updates Good Good/Past
   1 raw-read-error-rate 200 200 51 18 0x120000000000 prefail online yes yes
   3 spin-up-time 208 164 21 4.6 s 0xd61100000000 prefail online yes yes

The first one is against blob-examples/Maxtor_96147H8--BAC51KJ0:
5 reallocated-sector-count 226 226 63 69 sectors 0x450000000000
prefail online yes yes

and the second one against blob-examples/WDC_WD5000AAKS--00TMA0-12.01C01

5 reallocated-sector-count 192 192 140 63 sectors 0x3f0000000000
prefail online yes yes

so under the premise of changing the evaluation to use the normalized numbers those are correct and expected changes. (I. e. in those two cases you would have gotten a "many bad blocks" warning before).

Ugh, the karmic code is quite a bit different, so I basically needed to implement the same logic for a rather different code base. It's working now, though (see attached debdiff). The SRU test case (see description) is working now, and I also ran the old and new versions against all the blob examples which are in the source code:

for i in blob-examples/*; do echo "-- $i"; ./skdump --load=$i; done

The diff between the old and new version is

--- atasmart-test.old	2010-04-23 15:20:42.636609956 +0200
+++ atasmart-test.new	2010-04-23 16:06:49.966609923 +0200
@@ -214,7 +214,7 @@
 Average Powered On Per Power Cycle: 1.1 h
 Temperature: No such file or directory
 Attribute Parsing Verification: Good
-Overall Status: BAD_SECTOR_MANY
+Overall Status: BAD_SECTOR
 ID# Name                        Value Worst Thres Pretty      Raw            Type    Updates Good Good/Past
   1 raw-read-error-rate         253   252     0   343062      0x163c05000000 old-age online  n/a  n/a 
   3 spin-up-time                196   191    63   62 ms       0x3e000000fa37 prefail online  yes  yes 
@@ -620,7 +620,7 @@
 Average Powered On Per Power Cycle: 11.2 days
 Temperature: 40.0 C
 Attribute Parsing Verification: Good
-Overall Status: BAD_SECTOR_MANY
+Overall Status: BAD_SECTOR
 ID# Name                        Value Worst Thres Pretty      Raw            Type    Updates Good Good/Past
   1 raw-read-error-rate         200   200    51   18          0x120000000000 prefail online  yes  yes 
   3 spin-up-time                208   164    21   4.6 s       0xd61100000000 prefail online  yes  yes

The first one is against blob-examples/Maxtor_96147H8--BAC51KJ0:

  5 reallocated-sector-count    226   226    63   69 sectors  0x450000000000 prefail online  yes  yes

and the second one against blob-examples/WDC_WD5000AAKS--00TMA0-12.01C01:

  5 reallocated-sector-count    192   192   140   63 sectors  0x3f0000000000 prefail online  yes  yes

so under the premise of changing the evaluation to use the normalized numbers those are correct and expected changes. (I.e., in those two cases you would have gotten a "many bad blocks" warning before.)
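
For readers following along, the kind of rule described here can be sketched roughly as below. This is only an illustration, not the libatasmart code: the function name `overall_status` and the exact rule (report "many" only once the normalized value has dropped to the vendor threshold, otherwise a plain bad-sector notice when the raw count is non-zero) are assumptions made for the example. The inputs are the two reallocated-sector-count rows quoted above.

```python
def overall_status(value, threshold, raw_sectors):
    """Hypothetical status rule driven by the vendor's normalized value."""
    # Assumption: the drive is only "failing" on this attribute once the
    # normalized value has fallen to (or below) the vendor-set threshold.
    if value <= threshold:
        return "BAD_SECTOR_MANY"
    # Some sectors were reallocated, but the vendor threshold is not reached.
    if raw_sectors > 0:
        return "BAD_SECTOR"
    return "GOOD"


# Maxtor_96147H8: value 226, threshold 63, 69 sectors
print(overall_status(226, 63, 69))   # BAD_SECTOR
# WDC_WD5000AAKS: value 192, threshold 140, 63 sectors
print(overall_status(192, 140, 63))  # BAD_SECTOR
```

Both drives sit well above their thresholds (226 vs. 63 and 192 vs. 140), which is why the status drops from BAD_SECTOR_MANY to BAD_SECTOR once the raw-count heuristic is out of the picture.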

Revision history for this message

Martin Pitt (pitti) wrote on 2010-04-23:

#157

Uploaded to karmic-proposed queue (needs another SRU team member to review now) and to my PPA at https://launchpad.net/~pitti/+archive/sru-test (sudo add-apt-repository ppa:pitti/sru-test).

Changed in libatasmart (Ubuntu Karmic):
status:	In Progress → Fix Committed

Jerone Young (jerone) on 2010-04-26

Changed in oem-priority:
status:	New → In Progress

Revision history for this message

Vitaliy Kulikov (slonua) wrote on 2010-05-07:

#158

confirm as fixed in Lucid =).

Revision history for this message

Colin Watson (cjwatson) wrote on 2010-05-14: Please test proposed package

#159

Accepted into karmic-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

tags:	added: verification-needed

Jerone Young (jerone) on 2010-05-18

Changed in oem-priority:
status:	In Progress → Fix Released

Revision history for this message

adMcb (amar4mx) wrote on 2010-06-08:

#160

OK. We all know that the bug is affecting us, but what is the solution for this? I already repaired my disk with HDD Regenerator, which showed only 10 bad sectors and repaired them completely, yet Ubuntu 4.10 Netbook keeps telling me I have 29 000 bad blocks and that drive failure is imminent. If it is a bug we need a patch; where do we get it? Or what should we do in this case? THANKS

Revision history for this message

Martin Pitt (pitti) wrote on 2010-06-09: Re: [Bug 438136] Re: palimpsest bad sectors false positive

#161

adMcb [2010-06-08 15:07 -0000]:
> Ok. We all know that the bug is affecting us, but what is the solution
> for this?

It got fixed in 10.04, and for 9.10 (karmic) the fix is in
karmic-proposed, waiting to be tested. Please see the updated
description for how to test it.

> I already repaired my disk with HDD Regenerator, and only 10
> sectors showed me that same bad completely repaired, and Ubuntu 4.10
> Netbook keeps telling me I have 29 000 for bad blocks

The patch only changes the threshold at which it starts notifying you
(which was very low and incorrect previously). 29.000 bad blocks
does sound like something you should start being concerned about,
though. Apparently your HDD still has enough spare blocks to cope, but
you should watch out whether this number increases over time. If it
rapidly does, consider getting a new HDD before you get serious data
loss.

Revision history for this message

Graham Inggs (ginggs) wrote on 2010-06-09:

#162

> 29.000 bad blocks does sound like something you should start being concerned about, though.

The problem is that 29000 is the RAW value of the re-allocated sector count, not the actual count of bad sectors.

I have a failing Seagate drive that I have been monitoring for several weeks and I have established that on this particular drive, the lower four bits of the RAW value are not part of the count. Palimpsest tells me this drive has 893 bad sectors, but I calculate that it only has 55. Seagate will only replace the drive when it has around 160 bad sectors.
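
The arithmetic behind those two numbers can be checked with a short sketch. It assumes, as observed on this one drive only, that the low four bits of the raw value are not part of the count; the function name and the `ignored_low_bits` parameter are made up for illustration, and other drives may encode the raw value differently.

```python
def reallocated_count(raw_value, ignored_low_bits=4):
    """Drop the low bits that, on this particular drive, are not part of the count."""
    return raw_value >> ignored_low_bits


# Palimpsest shows the raw value 893; shifting out four bits gives the real count.
print(reallocated_count(893))  # 55
```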

Revision history for this message

Martin Pitt (pitti) wrote on 2010-06-09:

#163

Graham Inggs [2010-06-09 8:50 -0000]:
> The problem is that 29000 is the RAW value of the re-allocated sector
> count, not the actual count of bad sectors.

Right. But the notification about "Your disk is about to die" now
checks the normalized value/threshold, which is under the control of
the drive manufacturer. Do you still get those notifications with the
current lucid or karmic-proposed packages?

Revision history for this message

Graham Inggs (ginggs) wrote on 2010-06-11:

#164

> Do you still get those notifications with the current lucid or karmic-proposed packages?

I no longer get the notifications, but the SMART data view in palimpsest still warns that I have "893 bad sectors", which is incorrect.

Revision history for this message

Martin Pitt (pitti) wrote on 2010-06-11:

#165

Graham Inggs [2010-06-11 9:11 -0000]:
> I no longer get the notifications, but the SMART data palimpsest still
> warns that I have "893 bad sectors", which is incorrect.

Right, the updated package wasn't supposed to actually reinterpret the
raw values. Thanks for testing!

tags:	added: verification-done
removed: verification-needed

Revision history for this message

Launchpad Janitor (janitor) wrote on 2010-06-14:

#166

This bug was fixed in the package libatasmart - 0.16-1ubuntu0.1

---------------
libatasmart (0.16-1ubuntu0.1) karmic-proposed; urgency=low

  * debian/rules: Enable simple-patchsys.
  * Add 01_use_manufacturer_bad_blocks.patch: Drop our own "many bad sectors"
    heuristic. This currently causes a lot of false positives, because in many
    cases our treshold is either overly pessimistically low, or the raw value
    is implausibly high. Just use the normalized values vs. treshold for now.
    (LP: #438136)
-- Martin Pitt <email address hidden> Fri, 23 Apr 2010 15:05:48 +0200

Changed in libatasmart (Ubuntu Karmic):
status:	Fix Committed → Fix Released

Revision history for this message

In freedesktop.org Bugzilla #25772, cowbutt (cowbutt6) wrote on 2010-07-04:

#167

(In reply to comment #1)

> The reason I picked log2() here is simply that we do want to allow more bad
> sectors on bigger drives than on small ones. But a linearly related threshold
> seemed to increase too quickly, so the next choice was logarithmic.
>
> Do you have any empiric example where the current thresholds do not work as
> they should?

According to http://www.seagate.com/ww/v/index.jsp?locale=en-US&name=SeaTools_Error_Codes_-_Seagate_Technology&vgnextoid=d173781e73d5d010VgnVCM100000dd04090aRCRD (which I first read about 18 months ago, when 1.5TB drives were brand new), "Current disk drives contain *thousands* [my emphasis] of spare sectors which are automatically reallocated if the drive senses difficulty reading or writing". Therefore, it is my belief that your heuristic is off by somewhere between one and two orders of magnitude as your heuristic only allows for 30 bad sectors on a 1TB drive (Seagate's article would imply it has at least 2000 spare sectors - and maybe more - of which 30 are only 1.5%).

As you say, though, this is highly manufacturer- and model-dependent; Seagate's drives might be designed with very many more spare sectors than other manufacturers' drives. The only sure-fire way to interpret the SMART attributes is to compare the cooked value with the vendor-set threshold for that attribute.

If you are insistent upon doing something with the raw reallocated sector count attribute, I believe it would be far more useful to alert when it changes, or changes by a large number of sectors in a short period of time.
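
To put numbers on the "30 bad sectors on a 1TB drive" figure, here is a minimal sketch. It assumes the heuristic is the integer log2 of the drive's sector count with 512-byte sectors; that form is an assumption that happens to reproduce the figure above, and the function name is hypothetical.

```python
def log2_threshold(size_bytes, sector_size=512):
    """Assumed form of the heuristic: floor(log2(number of sectors))."""
    sectors = size_bytes // sector_size  # assumes size_bytes >= sector_size
    return sectors.bit_length() - 1      # integer floor(log2(sectors))


limit = log2_threshold(10**12)  # a 1 TB (10^12 byte) drive
print(limit)                    # 30
# Share of an assumed pool of 2000 spare sectors, in percent.
print(100 * limit / 2000)       # 1.5
```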

Robert (robertkanabis) on 2010-09-07

Changed in oem-priority:
status:	Fix Released → Incomplete
assignee:	nobody → Robert (robertkanabis)
status:	Incomplete → Confirmed

Jean-Baptiste Lallement (jibel) on 2010-09-07

Changed in oem-priority:
status:	Confirmed → Fix Released

Bug Watch Updater (bug-watch-updater) on 2010-09-14

Changed in libatasmart:
importance:	Unknown → Medium

Revision history for this message

Sam_ (and-sam) wrote on 2010-12-27:

#168

drive-check (14.2 KiB, text/plain)

Revision history for this message

Sam_ (and-sam) wrote on 2010-12-27:

#169

Affects new hardware and Maverick installation.
Thanks to comment #130 I also ran the advanced check with Hitachi DFT; result:
Operation completed successfully
Disposition Code: 0x00

Revision history for this message

Sam_ (and-sam) wrote on 2010-12-27:

#170

ATA-Hitachi-HDS5C1050CLA382– SMART.png (136.7 KiB, image/png)

Revision history for this message

Sam_ (and-sam) wrote on 2010-12-28:

#171

ATA-Hitachi-HDS5C1050CLA382–SMART.png (127.9 KiB, image/png)

After a successful scan with Hitachi DFT, palimpsest now shows 15 moved sectors instead of the 1 it showed before.

Revision history for this message

Sam_ (and-sam) wrote on 2011-01-05:

#172

Did another scan with the vendor's CD; it also shows SMART status OK. Palimpsest currently reports 25 reallocated sectors.

Bug Watch Updater (bug-watch-updater) on 2011-01-25

Changed in libatasmart:
importance:	Medium → Unknown

Bug Watch Updater (bug-watch-updater) on 2011-02-03

Changed in libatasmart:
importance:	Unknown → Medium

Revision history for this message

In freedesktop.org Bugzilla #25772, Lennart-poettering (lennart-poettering) wrote on 2011-10-11:

#174

So, I wanna give this one more try. I kept the log2() in there, but multiplied it now with 1024 which should be a safe margin.

If this brings bad results we can drop this entirely. In that case, please reopen.

Bug Watch Updater (bug-watch-updater) on 2011-10-12

Changed in libatasmart:
status:	Confirmed → Fix Released

Revision history for this message

Sam_ (and-sam) wrote on 2012-02-11:

#173

#165
> Right, the updated package wasn't supposed to actually reinterpret the
> raw values.

Is it supposed to reinterpret on fresh installations?
After fresh Oneiric and Precise installations during the year, palimpsest still counted up reallocated sectors; since #172 the count has increased to 53. Thresholds in the UI didn't change.

Revision history for this message

In freedesktop.org Bugzilla #25772, Stuart Gathman (stuart-gathman) wrote on 2012-02-15:

#175

Just want to reiterate what a bad idea it is to:

a) make your own seat of the pants algorithm to determine how many bad sectors is "too many" based on no significant data.

b) do so when you can't even read the raw number correctly (due to varying format of raw values).

My wife's 120G laptop drive has 10 bad sectors, but palimpsest still reads this as 655424. (The 0x0a is the low-order byte in Intel byte order; see https://bugzilla.redhat.com/show_bug.cgi?id=498115#c61 for details. This still fails in Fedora 16, gnome-disk-utility-3.0.2.) The 1024 factor *still* sees the disk as failing - it does not address the underlying problem of not having a reliable raw value, and not knowing the design parameters or even the type of technology.

Please, please, just use the vendor numbers. The only thing you could add would be to keep a history, and warn of *changes* in the value (but don't say "OH MY GOD YOUR DISK IS ABOUT TO DIE!" unless the scaled value passes the vendor threshold).
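
The numbers in this comment can be checked with a small sketch. It assumes the heuristic is 1024 times the integer log2 of the sector count, that "120G" means 120 * 10^9 bytes, and that sectors are 512 bytes; the function name is hypothetical and nothing here is taken from the libatasmart source.

```python
def scaled_log2_threshold(size_bytes, factor=1024, sector_size=512):
    """Assumed form of the revised heuristic: factor * floor(log2(sectors))."""
    sectors = size_bytes // sector_size
    return factor * (sectors.bit_length() - 1)


displayed = 655424                       # what palimpsest shows for the drive
print(hex(displayed))                    # 0xa0040
print((displayed >> 16) & 0xFF)          # 10, the real count sits in this byte

limit = scaled_log2_threshold(120 * 10**9)
print(limit)                             # 27648
print(displayed > limit)                 # True: still flagged as failing
```

So even with the extra factor of 1024, a misread raw value of 655424 is far above the limit, while the actual count of 10 would not be.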

Bug Watch Updater (bug-watch-updater) on 2012-02-15

Changed in libatasmart:
status:	Fix Released → Confirmed

Sam_ (and-sam) on 2012-02-15

tags:	added: oneiric precise

Revision history for this message

John Wilson (info-princeofpalms) wrote on 2012-03-02:

#176

Screenshot at 2012-03-02 17:51:33.png (208.3 KiB, image/png)

Running dual-boot Windows 7 / Ubuntu 11.10 Oneiric on Dell M90. Windows CHKDSK reports no problems with my hard drive. Ubuntu S.M.A.R.T. reports a staggering 7 million+ bad sectors with green light status: "Disk has a few bad sectors". My system runs just fine, which is why I'm adding my 2 cents.

Revision history for this message

wilbur.harvey (wilbur-harvey) wrote on 2012-03-02: Re: [Bug 438136] Re: palimpsest bad sectors false positive

#177

Selection_033.jpeg (45.6 KiB, image/jpeg; name="Selection_033.jpeg")

I am seeing similar issues with my SSD, lots of errors, but system seems to
run fine.
On my previous drive however, it started to run slowly, due to recovering
errors, and finally reported an error, so something funny is going on.
Regards
Wilbur Harvey

On Fri, Mar 2, 2012 at 9:06 AM, John Wilson <email address hidden> wrote:

> Running dual-boot Windows 7 / Ubuntu 11.10 Oneiric on Dell M90. Windows
> CHKDSK reports no problems with my hard drive. Ubuntu S.M.A.R.T. reports
> a staggering 7 million+ bad sectors with green light status: "Disk has a
> few bad sectors". My system runs just fine, which is why I'm adding my 2
> cents.

Bug Watch Updater (bug-watch-updater) on 2017-10-28

Changed in libatasmart (Fedora):
importance:	Unknown → High
status:	Confirmed → Won't Fix

Affects	Status	Importance	Assigned to	Milestone
OEM Priority Project	Fix Released	Undecided	Robert
libatasmart	Confirmed	Medium	freedesktop-bugs #25772
libatasmart (Fedora)	Won't Fix	High	redhat-bugs #498115
libatasmart (Mandriva)	New	Undecided	Unassigned
libatasmart (Ubuntu)	Fix Released	Medium	Martin Pitt	Ubuntu ubuntu-10.04-beta-2
Karmic	Fix Released	Medium	Martin Pitt	Ubuntu karmic-updates
Lucid	Fix Released	Medium	Martin Pitt	Ubuntu ubuntu-10.04-beta-2
libatasmart (zUbuntu)	New	Undecided	Unassigned