palimpsest bad sectors false positive

Reported by Benjamin Drung on 2009-09-28
This bug affects 109 people
Affects                  Status     Importance  Assigned to  Milestone
OEM Priority Project                Undecided   Robert
libatasmart              Confirmed  Medium
libatasmart (Fedora)     Confirmed  Unknown
libatasmart (Mandriva)   New        Undecided   Unassigned
libatasmart (Ubuntu)                Medium      Martin Pitt
  Karmic                            Medium      Martin Pitt
  Lucid                             Medium      Martin Pitt
libatasmart (zUbuntu)    New        Undecided   Unassigned

Bug Description

Binary package hint: gnome-disk-utility

palimpsest complains that the disk has many bad sectors. It considers SMART attribute 5, "Reallocated Sector Count", to be failing (screenshot attached). smartctl reports " 5 Reallocated_Sector_Ct 0x0033 097 097 010 Pre-fail Always - 117" (full log attached), which seems to be OK: the normalized value of 97 is well above the threshold of 10. This error appears on a different system, too.

SRU information:
 - Impact: Way too trigger happy about "broken disk" notifications, which both scares people and also makes them ignore situations where the disk is actually about to die
 - Fixed in lucid by reverting from our own bad sectors heuristics (using the raw numbers) to the manufacturer normalized numbers and manufacturer thresholds: http://bugs.freedesktop.org/attachment.cgi?id=34242
 - No regression reports since then in lucid.

SRU TEST CASE:
- Download seb128's demo SMART data which have a few bad blocks, but not enough to be over the manufacturer threshold:

   wget -O /tmp/smart.blob http://bugs.freedesktop.org/attachment.cgi?id=34234

- Install libatasmart-bin

- Run

  skdump --load=/tmp/smart.blob --overall

With the karmic final version this says "BAD_SECTOR_MANY", which the GUI will react to with a scary notification.
The updated version should just say BAD_SECTOR.

If you leave out the --overall argument, you get a detailed list of the attributes. The broken ones will be printed in bold.

On a healthy system, "sudo ./skdump /dev/sda --overall" should still say "GOOD", and on a genuinely broken hard disk it should give the appropriate BAD_SECTOR/BAD_SECTOR_MANY answer.

ProblemType: Bug
Architecture: amd64
Date: Mon Sep 28 15:20:15 2009
DistroRelease: Ubuntu 9.10
Package: gnome-disk-utility 2.28.0-0ubuntu2
ProcEnviron:
 PATH=(custom, user)
 LANG=de_DE.UTF-8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 2.6.31-11.36-generic
SourcePackage: gnome-disk-utility
Uname: Linux 2.6.31-11-generic x86_64

Matthew Murphy (chthonical) wrote :

I can confirm. I am suffering from the same issue in 9.10, where Palimpsest says my Hitachi HTS541680J9SA00 has many bad sectors (Reallocated Sector Count). The warning pops up every time I restart the computer.

J. J. Ramsey (jjramsey) wrote :

Looks like I have a false positive as well. Palimpsest is reporting the correct raw value for the "reallocated sector count," which in my case is 3268608. However, I have two reasons for thinking that Palimpsest is drawing the wrong conclusions from this value. First, it appears that a similarly high value for the reallocated sector count is reported by smartctl for a *new* drive in the MacBook Air. The discussion on the smartmontools mailing list can be found here:

http://thread.gmane.org/gmane.linux.utilities.smartmontools/5252

or here (same discussion):

http://marc.info/?l=smartmontools-support&m=120407420622544&w=2

While my SSD is in an X41 Thinkpad rather than a MacBook Air, it is the exact same model as the one described in the mailing list messages to which I linked above, namely a Samsung SSD with the model number MCCOE64GEMPP. For the owner of the Macbook Air, the reallocated sector count is 2617344, the same order of magnitude as mine. One of the participants in the discussion speculated, "maybe the author of the SMART code in this disk was (ab)using this attribute to track the number of times that blocks have been moved about by the wear levelling algorithm."

Second, as seen in the screenshot, the "Self Assessment" of the self-test is "Passed." Apparently, whoever was the Samsung firmware programmer who wrote the self-test wasn't bothered by the reported raw value of the reallocated sector count.

I get the same results from running smartctl from SystemRescueCD 1.3.0. The reallocated sector count is 3268608, but the self-test nonetheless reports the drive as healthy.

Przemysław Kulczycki (azrael) wrote :
tags: added: disk karmic palimpsest
Tim (darkxst) wrote :

palimpsest seems to check the raw value against the threshold for reallocated sectors.

All other smart utilities seem to check the normalised value against the threshold. This seems to be more logical, however the developer of palimpsest seems to think the first behavior is correct as noted by him in this bug report
https://bugzilla.redhat.com/show_bug.cgi?id=500079
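
The "normalised value against the threshold" check that other tools apply can be sketched roughly as follows. This is a minimal illustration, not code from palimpsest, libatasmart, or smartmontools: the names `SmartAttribute`, `parse_smartctl_line`, and `fails_normalized_check` are hypothetical, and the column layout is assumed to match the `smartctl -A` rows quoted in this report. It assumes the usual SMART convention that an attribute fails when its normalized value is at or below a non-zero threshold.

```python
from typing import NamedTuple


class SmartAttribute(NamedTuple):
    attr_id: int
    name: str
    value: int      # normalized current value (higher is better)
    worst: int      # worst normalized value recorded so far
    threshold: int  # manufacturer-defined failure threshold
    raw: int        # vendor-specific raw value


def parse_smartctl_line(line: str) -> SmartAttribute:
    """Parse one attribute row in the layout quoted in this report."""
    fields = line.split()
    # Assumed columns:
    # ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    return SmartAttribute(
        attr_id=int(fields[0]),
        name=fields[1],
        value=int(fields[3]),
        worst=int(fields[4]),
        threshold=int(fields[5]),
        raw=int(fields[9]),
    )


def fails_normalized_check(attr: SmartAttribute) -> bool:
    """True if the normalized value has dropped to or below the threshold."""
    return attr.threshold > 0 and attr.value <= attr.threshold


line = " 5 Reallocated_Sector_Ct 0x0033 097 097 010 Pre-fail Always - 117"
attr = parse_smartctl_line(line)
print(attr.value, attr.threshold, fails_normalized_check(attr))
```

For the row from the bug description, the normalized value 97 is far above the threshold 10, so this check does not flag the disk even though the raw value is 117.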

Georg (georg-lippold) wrote :

Would be a good option to let the user override certain reported values until the value goes up the next time. Then one wouldn't be bugged by Palimpsest on every boot but only if the disk degrades further.

yareckon (yareckon) wrote :

Hi guys, I have a 3 month old Samsung SSD, which has 2179072 reallocated sectors (probably due to flash wear levelling on the drive, as I have 0 "uncorrectable sector count" and 0 "Reallocation count"). I get yelled at every time I log into karmic that my drive is failing. The reallocated sector count has not changed in a month of heavy usage, so I don't think the drive is collapsing. I would also like an error dialog that says something like

Caution! Your drive has a high number of reallocated sectors, which may be a result of failing hardware. Currently the drive reports it is *passing* SMART checks, which are designed to detect a failing drive, so this warning may be incorrect. Certain types of storage such as solid state drives (ssds) have large numbers of reallocated sectors to extend their life. It is recommended that you back up your data in case your drive is about to fail.
What would you like to do now:
> Display the error messages and stats
> Inform me if the drive health further deteriorates

Naturally the scary warnings wouldn't be tempered if the SMART status was actually failing.

What do you say? I love Palimpsest, but people will ignore the warnings if every netbook and SSD drive cries wolf.

Hello everybody. I think this is a bug. Palimpsest and GSmartControl both report that I have bad sectors on my hard disk. Please see the attachments.

Confirm. I get a reallocated sector count of 65551 on my Hitachi HTS541680J9SA00, very similar to the value 65543 reported by Tapas Bose. Note 2**16 = 65536.
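
The observation that these values sit just above 2**16 can be checked with a little arithmetic. The sketch below splits a raw value into the part above bit 16 and the low 16 bits; `split_raw` is a hypothetical helper, and whether the low 16 bits are the real sector count on these drives is speculation that nothing in this thread confirms.

```python
from typing import Tuple


def split_raw(raw: int) -> Tuple[int, int]:
    """Split a raw SMART value into (bits above bit 15, low 16 bits)."""
    return raw >> 16, raw & 0xFFFF


# Raw reallocated sector counts quoted by reporters in this thread.
for raw in (65551, 65543, 327697):
    high, low = split_raw(raw)
    print(f"{raw}: high part = {high}, low 16 bits = {low}")
```

If the drive really packs two fields into the raw value, the low part (15, 7, and 17 for these three values) would be a far more plausible sector count than the full number.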

I have no clue if there are bad sectors or not; All I know is the machine is running fine, and previous versions of Ubuntu don't notify me of bad sectors, for what that is worth. See attached screen shot for hard drive model, etc.

see additional attachment

I can confirm the "false-positive" issue on my Dell XPS M1530 with Samsung HM250JI hard drive.
The Disk Utility warns me about 'bad sectors' all the time however testing it using HDD manufacturer's diagnostic tools (http://www.samsung.com/global/business/hdd/support/utilities/Support_HUTIL.html) gives no error. I did full disk surface scan.

Jose Mico (jose-mico) wrote :

I think that I also have a false positive with a Hitachi Travelstar disk on an HP530 notebook, fresh Ubuntu 9.10. Palimpsest warns about imminent failure, even though the current value of Reallocated_Sector_Ct (100) is very far from the threshold (005):
  5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 327697

Jose Mico (jose-mico) wrote :

Most people with six-digit reallocated sector counts seem to be using Hitachi drives. Could it be that the raw value is not really the "bad sector" count for these drives?

neclepsio (neclepsio) wrote :

I have the same problem with a Hitachi disk, counting 458798 bad sectors. Every other parameter is ok.

greyor (greyor) wrote :

I have this problem as well, which was quite alarming just after I'd installed 9.10 on my laptop. It's a Dell Inspiron 1525n that I've had a little over a year -- and Palimpsest counts 858 bad sectors. The reallocated sectors count is 68295.

The drive is a ~120 GB Samsung HM121HI.

I highly doubt that this drive has seen that much wear in the last year, and I don't really have the money to replace it at this time, so I'm wondering what's going on.

inigmatus (inigmatus) wrote :

see http://ubuntuforums.org/showthread.php?p=8193949 for more examples of potential false positives.

peersi (ianpeers) wrote :

I have the same problem - ATA Maxtor 6Y16OPO 164 Gig drive
ID 5 - Normalised 110, Worst 100, Threshold 63, Value 1447 sectors

All other ID's no errors .... it makes bootup considerably slower than under 9.04

Eros Zanchetta (eros) wrote :

Same here on my fresh Karmic installation (x86_64), using a Seagate Barracuda ST31500341AS. I also tried the 32 bit version of Karmic and got the same results.

I ran SeaTools for Windows' "Long Generic Test" to confirm that the drive was OK and it passed the test (see attached log file), so I guess it is indeed a false positive.

BRY (brypie) wrote :

I don't have enough hardware knowledge to say for sure that my "Many bad sectors" is true, but I have been using the disk for a long time now, without problem.

I saw this error message when using the Live CD also.

mercutio22 (macabro22) wrote :

Or maybe some OEMs are issuing refurbished hard drives inside their brand new PCs. Wild.

amar (amarendra) wrote :

Same error here, though other tools on Windows such as HDDLife and HDDHealth report nothing, and the drive has been working well since Sep 2007.

Same error in a DELL XPS M1330 with an ATA SAMSUNG HM400LI hard disk drive.

bhuvi (bhuvanesh) wrote :

I too get this message in my new karmic installation.

bhuvi (bhuvanesh) wrote :
Alan Burbol (aburbol) wrote :

I have this issue as well; fresh install, 9.10 release, amd64, Dell Inspiron 1521 dual booting with Windows 7. At first, I thought this message had something to do with Windows 7 (perhaps Win7 does crazy things with disk partitions?). Glad to see it's actually -not- Microsoft's fault this time.

Ramiro Castro (castro-ramiro) wrote :

Hi, I can confirm this false positive too. I have a TOSHIBA MK2035GSS. Hope the fix comes soon!

Emile Ong (emilemail) wrote :

Yup, same problem here. Seagate 1500.11.

CHKDSK /R reports no problems. Nor does SeaTools.

Palimpsest reports 55 bad sectors.

kon_nos (konsnos) wrote :

Hi, I also confirm this false positive. I have a HP G7000.

s.ketrat (s-ketrat) wrote :

Same on my HP 6510b with Hitachi drives

sk

Jose Mico (jose-mico) wrote :

I really do have several reallocated sectors on my disk, maybe due to a knock or something. But the point is that the disk is working fine, I've had no data loss, and the number of reallocated sectors is not increasing. The bug is just the warning about "imminent failure"... I don't think that the disk will fail in the next months (and neither does the manufacturer). We'll see...

Anant (infyniti) wrote :

Same problem on a Dell XPS 1330 using a Samsung HM250JI. In fact, I upgraded two different machines which are way older without any errors. Strange that most of those who reported this problem are using Samsung HDs. Does this have something to do with this brand of HD?

Anant

Daniele Napolitano (dnax88) wrote :

In my experience, I have one hard disk with 7 reallocated sectors and it works fine (obviously).

Another case is my friend's computer: Windows stopped working, and after an analysis with Palimpsest Disk Utility (on the Ubuntu 9.10 live CD) I read up to 250 reallocated damaged sectors. So, no false positive for me (all hard disks are Maxtor).

A clarification: a reallocated sector means that the hard disk has internally isolated the sector, so badblocks doesn't report errors! This is a hardware-level data recovery.

P4man (duvel123) wrote :

To everyone reporting this as a bug: while it's clear there IS a bug (negative relocated sector count, or people seeing 65500 relocated sectors), it's not clear to me whether those people seeing credible values (between 1 and 300 or so) are blaming this bug incorrectly. Please run another SMART monitoring tool or your hard drive vendor's hardware diagnosis program to verify.

Doing read/write tests or filesystems checks does NOT disprove palimpsest's warning. Relocated sectors are invisible to the filesystem or operating system, the harddrive manages them automatically and transparently until it runs out of spare sectors, and only exposes this information through S.M.A.R.T.

If anyone can confirm that a seemingly credible relocated sector count is in fact incorrect, I would love to learn about it. Until then I would be very reluctant to call this a bug if you get a warning from palimpsest for a reasonable looking number of bad sectors.

Daniele Napolitano (dnax88) wrote :

@P4man: Finally! Thanks for clarification.

Eros Zanchetta (eros) wrote :

@p4man: thanks for the clarification. I'd love to help, but I'm not sure how. As I said above I ran SeaTools for Windows' "Long Generic Test" and it didn't report any problems (you can find the log file in my previous post) while palimpsest reports 466 bad sectors (see attached screenshot). I don't know if this is a credible number of relocated sectors.

I'm willing to run more tests, just tell me what to do.

I could try SpinRite, but I'd rather not because it'll probably take forever to run the test on a 1.5 TB disk.

magoo (martingagnon5) wrote :

Same for me: brand new installation on a system with a separate partition for the home directory. Even though I have an amd64 machine, I preferred to use the i386 version of 9.10. It never occurred before and began right after the installation.

Changed in gnome-disk-utility (Ubuntu):
assignee: nobody → Sergey Sventitski (sergey-sventitski)
Changed in gnome-disk-utility (Ubuntu):
assignee: Sergey Sventitski (sergey-sventitski) → nobody
Kees Cook (kees) on 2009-11-09
Changed in gnome-disk-utility (Ubuntu Karmic):
status: New → Confirmed
importance: Undecided → Medium
Changed in gnome-disk-utility (Ubuntu Lucid):
importance: Undecided → Medium
status: New → Confirmed
Changed in gnome-disk-utility (Ubuntu Lucid):
assignee: nobody → Canonical Ubuntu QA Team (canonical-qa)
Steve Beattie (sbeattie) on 2009-11-10
Changed in gnome-disk-utility (Ubuntu Lucid):
assignee: Canonical Ubuntu QA Team (canonical-qa) → Canonical Desktop Team (canonical-desktop-team)
Martin Pitt (pitti) on 2009-11-10
affects: gnome-disk-utility (Ubuntu Karmic) → libatasmart (Ubuntu Karmic)
Martin Pitt (pitti) on 2009-11-10
Changed in libatasmart (Ubuntu Lucid):
assignee: Canonical Desktop Team (canonical-desktop-team) → Martin Pitt (pitti)
Changed in libatasmart (Ubuntu Karmic):
status: Confirmed → Triaged
Changed in libatasmart (Ubuntu Lucid):
status: Confirmed → Triaged
Changed in libatasmart (Fedora):
status: Unknown → Confirmed
Jean-Louis (jean-louis) on 2009-12-14
Changed in libatasmart (Ubuntu Lucid):
assignee: Martin Pitt (pitti) → nobody
Vitaliy Kulikov (slonua) on 2009-12-15
tags: added: apport-collected
tereza.am (tereza-am) on 2010-01-07
description: updated
gnuckx (gnuckx) wrote :

I confirm the same problem, "palimpsest bad sectors false positive", on Karmic and Lucid Alpha 3. Palimpsest reports errors for attribute IDs 5 and 197 on my 1 terabyte Samsung HD, model HD103UJ. Meanwhile, no error is reported on my second HD, a 500 GB Hitachi, model HDT725050VLA360.

Kees Cook (kees) on 2010-03-03
tags: added: regression-potential
Changed in libatasmart (Ubuntu Lucid):
milestone: none → ubuntu-10.04-beta-1
Dhruva Sagar (dhruva-sagar) wrote :

I am also facing the issue. I had an old SATA hard disk and once I upgraded to Karmic it started giving me hard disk failure errors. I tried to ignore them for a while, but slowly things deteriorated and soon half of my hard disk was rendered useless, even if I tried to format that partition Ubuntu would crash and drives would become read-only and I couldn't save / edit anything on the hard disk.
At times when rebooting it would say that no bootable media was found and that it was unable to mount the hard disk. I would restart go to GRUB and select another image to boot from and it would boot fine.

Anyway, considering the kind of errors, I assumed that something was indeed wrong with the hard disk. I went ahead and purchased a brand new hard disk (Toshiba), installed karmic from scratch, and installed all the software and utilities that I needed for my development setup all over again, which took me almost a week. But the good thing was that the errors disappeared!

One month into that, and the errors have now reappeared. Ubuntu shows me again that there are bad sectors and slowly but steadily they are increasing! Ubuntu would hang suddenly at times and I would have to reboot, go to the recovery console, it would be unable to mount my hard disk and I would have to do a FSCK to repair some inodes and when I then reboot and come back, I am able to boot normally but only to see that the number of bad sector count has increased.

I have a terrible feeling that Ubuntu is somehow corrupting my hard disk, I have no reason to believe that my 1 month old brand new hard disk could have any problems whatsoever.

This is really pathetic! I have been an Ubuntu fan for over 4 years now and have never looked back to windows. But this whole incident has left me haunted. I can't afford to buy new hard disks every couple of months. This is just not acceptable! Someone please do something. I beg you!

Oded Arbel (oded-geek) wrote :

Dhruva: the problem you are reporting sounds like you do have a problem with the drive and libatasmart is reporting the issue correctly - so this is not the issue that is reported in this bug.

Regarding your actual problem, as you had escalating problems with an old drive and now have an escalating problem with a new drive, I would guess that your problem is not the drive but something else. I don't think the problem is that Ubuntu is corrupting the drive as it doesn't seem likely that it has that capability and there is no one else with a similar experience. I would think that you have another hardware problem that causes disks to fail - either insufficient cooling or vibration problems are the most likely issues. I suggest you contact someone with appropriate knowledge to help you resolve this problem.

I also see the same behaviour on a dual-boot (ubuntu 9.10 and winXP) hp 8530w laptop with a Hitachi drive. It suddenly appeared after defragging a shared ntfs data partition. My reallocation sector count is 65538, and the rest seems unremarkable to me (only reallocation event count is 1).
I attached screenshots of 3 different SMART tools i ran under windows (HDD health, CrystalDiskInfo, and the windows version of smartctl) to check the palimpsest output.
All programs agree about the current/normalised value and threshold (indicating nothing is amiss), but the reading of the raw value of the different programs is interesting (indicated in the screenshot). Remarkably, CrystalDiskInfo also cautions me about the Reallocated Sector count...

so far i've come to the conclusion that it's a bug in the way the raw-values of my hitachi drive are translated, but am i right? or is my drive in fact dying, and should i replace it asap?

anything else i can post to help?

Is there any dev on this? 84 people affected, and copious user reports, but not even an assignee? or does the assignment to redhat-bugs mean it's going to be resolved there first?

Dhruva Sagar (dhruva-sagar) wrote :

@Oded Arbel : hmmm now that I am a little less irritable, I seem to agree with your opinion. I have in fact been witnessing some cooling problems lately, although I thought they were pertaining to the CPU only, I didn't know or anticipate that they could be harming my hard disk too, but now I guess I do, I will follow your lead and have it checked out. It is just that I started to experience this only after I upgraded to karmic and while searching I was seeing a lot of such reports that made me feel it is something similar...Thanks.

Mikko Saarinen (mikk0) wrote :

I got a computer from a friend who said it was not working well.

As soon as I booted it with Live CD, I got the error of a disk failing. I did a backup, but some of the files were unreadable, even though S.M.A.R.T says the Read Error Rate is 0 (Raw 0x000000000000)

Reallocated sectors = 335 and Pending = 122.
Obviously the disk is not O.K., because of the read failures, but shouldn't the read error rate be higher then?

In my case, palimpsest gives reasonable figures and is working very well =)

Martin Pitt (pitti) on 2010-03-16
Changed in libatasmart (Ubuntu Lucid):
assignee: nobody → Martin Pitt (pitti)
Martin Pitt (pitti) on 2010-03-17
Changed in libatasmart (Ubuntu Lucid):
milestone: ubuntu-10.04-beta-1 → ubuntu-10.04-beta-2

The bigger problem of this is (as you already mentioned) that the raw value is misparsed way too often. Random examples from bug reports:

  http://launchpadlibrarian.net/34574037/smartctl.txt
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 327697

  http://launchpadlibrarian.net/35971054/smartctl_tests.log
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 65542

  http://launchpadlibrarian.net/36599746/smartctl_tests-deer.log
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 65552

  https://bugzilla.redhat.com/attachment.cgi?id=382378
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 655424

  https://bugzilla.redhat.com/show_bug.cgi?id=506254
reallocated-sector-count 100/100/ 5 FAIL 1900724 sectors Prefail Online

It seems that "no officially accepted spec about SMART attribute decoding" also hits here, in the sense that way too many drives get the raw counts wrong. In all the 30 or so logs that I looked at in the various Launchpad/RedHat/fd.o bug reports related to this, I didn't see an implausible normalized value, though.

I appreciate the effort of doing vendor independent bad blocks checking, but a lot of people get tons of false alarms due to that, and thus won't believe it any more if there is really a disk failing some day.

My feeling is that a more cautious approach would be to use the normalized value vs. threshold for the time being, and use the raw values if/when that can be made more reliable (then we should use something in between logarithmic and linear, though, since due to sheer probabilities, large disks will have more bad sectors and also more reserve sectors than small ones).

Created an attachment (id=34234)
smart blob with slightly broken sectors

BTW, I use this smart blob for playing around and testing, which is a particularly interesting one: It has a few bad sectors (correctly parsed), but not enough yet to be below the vendor specified threshold.

  5 reallocated-sector-count 77 1 63 1783 sectors 0xf70600000000 prefail online yes no
197 current-pending-sector 83 6 0 1727 sectors 0xbf0600000000 old-age offline n/a n/a
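
The relationship between the hex raw column and the "pretty" sector count in these rows can be reproduced with a short sketch. It assumes that skdump prints the six raw bytes in storage order and that the count is their little-endian integer value, which is consistent with the rows shown here but is an inference from this output rather than a documented format. `decode_raw_le` is a hypothetical helper name.

```python
def decode_raw_le(raw_hex: str) -> int:
    """Interpret a hex string such as '0xf70600000000' as little-endian bytes."""
    digits = raw_hex[2:] if raw_hex.lower().startswith("0x") else raw_hex
    return int.from_bytes(bytes.fromhex(digits), byteorder="little")


# Raw columns from the two attribute rows of the test blob.
print(decode_raw_le("0xf70600000000"))  # reallocated-sector-count
print(decode_raw_le("0xbf0600000000"))  # current-pending-sector
```

Under that assumption the two raw values decode to 1783 and 1727, matching the sector counts printed next to them.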

So this can be loaded into skdump or udisks for testing the desktop integration all the way through:

$ sudo udisks --ata-smart-refresh /dev/sda --ata-smart-simulate /tmp/smart.blob

Created an attachment (id=34242)
Drop our own "many bad sectors" heuristic

This patch just uses the standard "compare normalized value against threshold". I know that it's not necessarily how you really want it to work, but it's a pragmatic solution to avoid all those false positives, which don't help people either.

So of course feel free to entirely ignore it, but at least I want to post it here for full disclosure. (I'll apply it to Debian/Ubuntu, we have to get a release out).

This patch is against the one in bug 26834.

Oh, forgot: I compared

  for i in blob-examples/*; do echo "-- $i"; ./skdump --load=$i; done > /tmp/atasmart-test.out

before and after, and get two differences like

-^[[1mOverall Status: BAD_SECTOR_MANY^[[0m
+^[[1mOverall Status: BAD_SECTOR^[[0m

The first one is against blob-examples/Maxtor_96147H8--BAC51KJ0:
 5 reallocated-sector-count 226 226 63 69 sectors 0x450000000000 prefail online yes yes

and the second one against blob-examples/WDC_WD5000AAKS--00TMA0-12.01C01

  5 reallocated-sector-count 192 192 140 63 sectors 0x3f0000000000 prefail online yes yes

so under the premise of changing the evaluation to use the normalized numbers those are correct and expected changes.
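
Why both blobs now come out as BAD_SECTOR can be illustrated with a deliberately simplified model of the new evaluation. This is not libatasmart's actual code: the real overall-status logic looks at more attributes and at the drive's self-assessment, and the function name `overall_status` and its exact status mapping are assumptions made for this sketch.

```python
def overall_status(value: int, threshold: int, bad_sectors: int) -> str:
    """Simplified status for one sector-count attribute (illustrative only)."""
    # Assumed rule: the attribute itself fails once the normalized value
    # reaches the manufacturer threshold.
    if threshold > 0 and value <= threshold:
        return "BAD_SECTOR_MANY"
    # Some bad sectors exist, but the manufacturer still rates the drive OK.
    if bad_sectors > 0:
        return "BAD_SECTOR"
    return "GOOD"


# (normalized value, threshold, pretty sector count) from the two blobs.
print(overall_status(226, 63, 69))    # Maxtor_96147H8
print(overall_status(192, 140, 63))   # WDC_WD5000AAKS
```

Both drives have a few dozen reallocated sectors while their normalized values (226 and 192) remain above the thresholds (63 and 140), so in this model they are reported as BAD_SECTOR rather than BAD_SECTOR_MANY.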

Martin Pitt (pitti) on 2010-03-19
Changed in libatasmart (Ubuntu Lucid):
status: Triaged → In Progress
Changed in libatasmart:
status: Unknown → Confirmed
Martin Pitt (pitti) wrote :

I sent a patch to the upstream freedesktop.org bug, and uploaded a new libatasmart package to lucid. It's currently stuck in UNAPPROVED and will land after the beta-1 release.

Changed in libatasmart (Ubuntu Lucid):
status: In Progress → Fix Committed
Jean-Louis (jean-louis) wrote :

I'm very happy for this decision.

Before investigating this problem, I bought a new HDD for safety reasons (to back up all my data), but now, after 3 months, the number of reallocated sectors is stable and hasn't increased.

This patch could save a lot of unneeded e-waste.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package libatasmart - 0.17+git20100219-1git2

---------------
libatasmart (0.17+git20100219-1git2) lucid; urgency=low

  Upload current Debian packaging git head.

  * debian/local/apport-hook.py: Update for udisks.
  * Add 0001-Speed-up-get_overall-and-get_bad.patch: Speed up get_overall()
    and get_bad(). (fd.o #26834)
  * Add 0002-Drop-our-own-many-bad-sectors-heuristic.patch: Drop our own "many
    bad sectors" heuristic.This currently causes a lot of false positives,
    because in many cases our treshold is either overly pessimistically low,
    or the raw value is implausibly high. Just use the normalized values vs.
    treshold for now. (LP: #438136, fd.o #25772)
 -- Martin Pitt <email address hidden> Fri, 19 Mar 2010 22:21:47 +0100

Changed in libatasmart (Ubuntu Lucid):
status: Fix Committed → Fix Released
Richard Gunn (ubuntu-deckard) wrote :

I definitely have a false positive with my main drive (Hitachi), but something else just occurred to me: when I tried to install Karmic from the live CD, it refused to recognize my Hitachi drive as a viable target for installation. In the end, I was forced to upgrade from Intrepid to Jaunty to Karmic using the dist upgrade option in synaptic.

I only discovered the false positive issue with palimpsest AFTER I had upgraded to Karmic through synaptic, so in retrospect, I wonder if some sort of integrity check is done on the drive before the Karmic CD installer lists it prior to partitioning, and whether this false positive issue actually prevented me from installing Karmic from the live CD onto my Hitachi hard drive.

If that is true, perhaps an additional issue should be added to the list for the Karmic live CD?

Jerone Young (jerone) wrote :

@Martin Pitt

       Can this fix to the heuristics be backported to 9.10 Karmic via SRU?

Benjamin Drung (bdrung) wrote :

I unsubscribed ubuntu-sponsors, because there is no debdiff to sponsor.

primefalcon (primefalcon) wrote :

Just adding that I am getting this as well on my asus 900ha

Changed in libatasmart (Ubuntu Karmic):
assignee: nobody → Canonical Platform QA Team (canonical-platform-qa)
Joe Claunch (catalina22) wrote :

I encountered this problem under Karmic on my 6 month old Dell Mini-9 with a 4 GB SSD. I zeroed the SSD with the "dd" command and installed Lucid Beta 2. Within 30 seconds of the post-install restart I was getting the same error message. I loaded and installed all available patches with the software update utility, but the problem persisted. I then repeated the zero, install, and patch operation with the same results. At this point I went back to 9.04 and my Dell Mini-9 is again working perfectly. The specific error message I am encountering is as follows:

3.8 GB Hard Disk - ATA STEC ATA DISK vS020.1.0
DISK IS BEING USED OUTSIDE DESIGN PARAMETERS

Changed in libatasmart (Ubuntu Karmic):
milestone: none → karmic-updates
Martin Pitt (pitti) on 2010-04-22
Changed in libatasmart (Ubuntu Karmic):
status: Triaged → In Progress
assignee: Canonical Platform QA Team (canonical-platform-qa) → Martin Pitt (pitti)
Martin Pitt (pitti) on 2010-04-23
description: updated
Martin Pitt (pitti) on 2010-04-23
description: updated
description: updated
Martin Pitt (pitti) wrote :

Ugh, the karmic code is quite a bit different, so I basically needed to implement the same logic for a rather different code base. It's working now, though (see attached debdiff). The SRU test case (see description) is working now, and I also ran the old and new versions against all the blob examples which are in the source code:

  for i in blob-examples/*; do echo "-- $i"; ./skdump --load=$i; done

The diff between the old and new version is

--- atasmart-test.old 2010-04-23 15:20:42.636609956 +0200
+++ atasmart-test.new 2010-04-23 16:06:49.966609923 +0200
@@ -214,7 +214,7 @@
 Average Powered On Per Power Cycle: 1.1 h
 Temperature: No such file or directory
 Attribute Parsing Verification: Good
-Overall Status: BAD_SECTOR_MANY
+Overall Status: BAD_SECTOR
 ID# Name Value Worst Thres Pretty Raw Type Updates Good Good/Past
   1 raw-read-error-rate 253 252 0 343062 0x163c05000000 old-age online n/a n/a
   3 spin-up-time 196 191 63 62 ms 0x3e000000fa37 prefail online yes yes
@@ -620,7 +620,7 @@
 Average Powered On Per Power Cycle: 11.2 days
 Temperature: 40.0 C
 Attribute Parsing Verification: Good
-Overall Status: BAD_SECTOR_MANY
+Overall Status: BAD_SECTOR
 ID# Name Value Worst Thres Pretty Raw Type Updates Good Good/Past
   1 raw-read-error-rate 200 200 51 18 0x120000000000 prefail online yes yes
   3 spin-up-time 208 164 21 4.6 s 0xd61100000000 prefail online yes yes

The first one is against blob-examples/Maxtor_96147H8--BAC51KJ0:
 5 reallocated-sector-count 226 226 63 69 sectors 0x450000000000 prefail online yes yes

and the second one against blob-examples/WDC_WD5000AAKS--00TMA0-12.01C01

  5 reallocated-sector-count 192 192 140 63 sectors 0x3f0000000000 prefail online yes yes

so under the premise of changing the evaluation to use the normalized numbers those are correct and expected changes. (I. e. in those two cases you would have gotten a "many bad blocks" warning before).

Martin Pitt (pitti) wrote :

Uploaded to karmic-proposed queue (needs another SRU team member to review now) and to my PPA at https://launchpad.net/~pitti/+archive/sru-test (sudo add-apt-repository ppa:pitti/sru-test).

Changed in libatasmart (Ubuntu Karmic):
status: In Progress → Fix Committed
Jerone Young (jerone) on 2010-04-26
Changed in oem-priority:
status: New → In Progress
Vitaliy Kulikov (slonua) wrote :

confirm as fixed in Lucid =).

Accepted into karmic-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

tags: added: verification-needed
Jerone Young (jerone) on 2010-05-18
Changed in oem-priority:
status: In Progress → Fix Released
adMcb (amar4mx) wrote :

Ok. We all know that the bug is affecting us, but what is the solution for this? I already repaired my disk with HDD Regenerator, and only 10 sectors showed me that same bad completely repaired, and Ubuntu 4.10 Netbook keeps telling me I have 29 000 for bad blocks and that the drive failure is imminent. If it is a bug, we need a patch; where do we get it? Or what should we do in this case? THANKS

adMcb [2010-06-08 15:07 -0000]:
> Ok. We all know that the bug is affecting us, but what is the solution
> for this?

It got fixed in 10.04, and for 9.10 (karmic) the fix is in
karmic-proposed, waiting to be tested. Please see the updated
description for how to test it.

> I already repaired my disk with HDD Regenerator, and only 10
> sectors showed me that same bad completely repaired, and Ubuntu 4.10
> Netbook keeps telling me I have 29 000 for bad blocks

The patch only changes the threshold at which it starts notifying you
(which was very low and incorrect previously). 29.000 bad blocks
does sound like something you should start being concerned about,
though. Apparently your HDD still has enough spare blocks to cope, but
you should watch out whether this number increases over time. If it
rapidly does, consider getting a new HDD before you get serious data
loss.

Graham Inggs (ginggs) wrote :

> 29.000 bad blocks does sound like something you should start being concerned about, though.

The problem is that 29000 is the RAW value of the re-allocated sector count, not the actual count of bad sectors.

I have a failing Seagate drive that I have been monitoring for several weeks and I have established that on this particular drive, the lower four bits of the RAW value are not part of the count. Palimpsest tells me this drive has 893 bad sectors, but I calculate that it only has 55. Seagate will only replace the drive when it has around 160 bad sectors.
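
The two numbers above are consistent with each other: dropping the lower four bits of 893 leaves 55. A small Python sketch of that arithmetic follows. The helper name `count_without_low_bits` is hypothetical, and the four-bit offset is only the observation made for this one drive, not a documented Seagate format.

```python
def count_without_low_bits(raw: int, ignored_bits: int = 4) -> int:
    """Discard the low bits that, on this drive, do not belong to the sector count."""
    return raw >> ignored_bits


print(count_without_low_bits(893))  # 55
```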

Martin Pitt (pitti) wrote :

Graham Inggs [2010-06-09 8:50 -0000]:
> The problem is that 29000 is the RAW value of the re-allocated sector
> count, not the actual count of bad sectors.

Right. But the notification about "Your disk is about to die" now
checks the normalized value/threshold, which is under the control of the
drive manufacturer. Do you still get those notifications with the
current lucid or karmic-proposed packages?

Graham Inggs (ginggs) wrote :

> Do you still get those notifications with the current lucid or karmic-proposed packages?

I no longer get the notifications, but the SMART data palimpsest still warns that I have "893 bad sectors", which is incorrect.

Martin Pitt (pitti) wrote :

Graham Inggs [2010-06-11 9:11 -0000]:
> I no longer get the notifications, but the SMART data palimpsest still
> warns that I have "893 bad sectors", which is incorrect.

Right, the updated package wasn't supposed to actually reinterpret the
raw values. Thanks for testing!

tags: added: verification-done
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package libatasmart - 0.16-1ubuntu0.1

---------------
libatasmart (0.16-1ubuntu0.1) karmic-proposed; urgency=low

  * debian/rules: Enable simple-patchsys.
  * Add 01_use_manufacturer_bad_blocks.patch: Drop our own "many bad sectors"
    heuristic. This currently causes a lot of false positives, because in many
    cases our threshold is either overly pessimistically low, or the raw value
    is implausibly high. Just use the normalized values vs. threshold for now.
    (LP: #438136)
 -- Martin Pitt <email address hidden> Fri, 23 Apr 2010 15:05:48 +0200

Changed in libatasmart (Ubuntu Karmic):
status: Fix Committed → Fix Released

(In reply to comment #1)

> The reason I picked log2() here is simply that we do want to allow more bad
> sectors on bigger drives than on small ones. But a linearly related threshold
> seemed to increase too quickly, so the next choice was logarithmic.
>
> Do you have any empiric example where the current thresholds do not work as
> they should?

According to http://www.seagate.com/ww/v/index.jsp?locale=en-US&name=SeaTools_Error_Codes_-_Seagate_Technology&vgnextoid=d173781e73d5d010VgnVCM100000dd04090aRCRD (which I first read about 18 months ago, when 1.5TB drives were brand new), "Current disk drives contain *thousands* [my emphasis] of spare sectors which are automatically reallocated if the drive senses difficulty reading or writing". Therefore, it is my belief that your heuristic is off by somewhere between one and two orders of magnitude as your heuristic only allows for 30 bad sectors on a 1TB drive (Seagate's article would imply it has at least 2000 spare sectors - and maybe more - of which 30 are only 1.5%).

As you say, though, this is highly manufacturer- and model-dependent; Seagate's drives might be designed with very many more spare sectors than other manufacturers' drives. The only sure-fire way to interpret the SMART attributes is to compare the cooked value with the vendor-set threshold for that attribute.

If you are insistent upon doing something with the raw reallocated sector count attribute, I believe it would be far more useful to alert when it changes, or changes by a large number of sectors in a short period of time.
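
The figure of 30 bad sectors for a 1TB drive can be reproduced with a short Python sketch. It assumes 512-byte sectors, 1TB taken as 10^12 bytes, and the logarithm rounded down; the function name `log2_threshold` is hypothetical and the code is an illustration of the heuristic as described in this thread, not a copy of libatasmart.

```python
def log2_threshold(size_bytes: int, sector_size: int = 512) -> int:
    """Floor of log2(number of sectors), the bad-sector limit discussed above."""
    sectors = size_bytes // sector_size
    return sectors.bit_length() - 1  # exact integer floor(log2(sectors))


limit = log2_threshold(10**12)
print(limit)                # 30
print(100 * limit / 2000)   # 1.5 (percent of 2000 spare sectors)
```

Because the logarithm grows so slowly, the limit rises by only one sector each time the drive capacity doubles.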

Robert (robertkanabis) on 2010-09-07
Changed in oem-priority:
status: Fix Released → Incomplete
assignee: nobody → Robert (robertkanabis)
status: Incomplete → Confirmed
Changed in oem-priority:
status: Confirmed → Fix Released
Changed in libatasmart:
importance: Unknown → Medium
Sam_ (and-sam) wrote :

Affects new hardware and Maverick installation.
Thanks to comment #130 I did also advanced check with Hitachi DFT, result:
Operation completed successfully
Disposition Code: 0x00

Sam_ (and-sam) wrote :

After successful scan with Hitachi DFT palimpsest now shows 15 moved sectors instead of 1 before.

Sam_ (and-sam) wrote :

Did another scan with CD from vendor, it also shows SMART status ok. Palimpsest says at the moment 25 reallocated sectors.

Changed in libatasmart:
importance: Medium → Unknown
Changed in libatasmart:
importance: Unknown → Medium

So, I wanna give this one more try. I kept the log2() in there, but multiplied it now with 1024 which should be a safe margin.

If this brings bad results we can drop this entirely. In that case, please reopen.

Changed in libatasmart:
status: Confirmed → Fix Released
Sam_ (and-sam) wrote :

#165
> Right, the updated package wasn't supposed to actually reinterpret the
> raw values.

Is it supposed to reinterpret on fresh installations?
After fresh Oneiric and Precise installations during the year, palimpsest still counted up reallocated sectors; since #172 the count has increased to 53. Thresholds in the UI didn't change.


Just want to reiterate what a bad idea it is to:

a) make your own seat of the pants algorithm to determine how many bad sectors is "too many" based on no significant data.

b) do so when you can't even read the raw number correctly (due to varying format of raw values).

My wife's 120G laptop drive has 10 bad sectors, but palimpsest still reads this as 655424. (The 0x0a is the low order byte in intel byte order see https://bugzilla.redhat.com/show_bug.cgi?id=498115#c61 for details, still fails in Fedora 16, gnome-disk-utility-3.0.2.) The 1024 factor *still* sees the disk as failing - it does not address the underlying problem of not having a reliable raw value, and not knowing the design parameters or even the type of technology.
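
To illustrate the numbers in this paragraph, here is a hedged Python sketch. It splits the misread value 655424 into six little-endian bytes to show where the 10 ends up, then compares both readings against a threshold of floor(log2(sectors)) * 1024. The 120 GB capacity in decimal bytes, the 512-byte sector size, and the exact form of the threshold are assumptions based on this thread, not taken from the libatasmart source.

```python
misread = 655424
raw_bytes = misread.to_bytes(6, "little")
print(raw_bytes.hex())   # 40000a000000
print(raw_bytes[2])      # 10, the real count sits in the third byte

# Assumed threshold: floor(log2(number of 512-byte sectors)) * 1024
sectors = 120 * 10**9 // 512
threshold = (sectors.bit_length() - 1) * 1024
print(threshold)             # 27648
print(misread >= threshold)  # True: the misread value still trips the warning
print(10 >= threshold)       # False: the real count would not
```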

Please, please, just use the vendor numbers. The only thing you could add would be to keep a history, and warn of *changes* in the value (but don't say "OH MY GOD YOUR DISK IS ABOUT TO DIE!" unless the scaled value passes the vendor threshold).
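
A rough Python sketch of what that could look like: raise the alarm only when the normalized value reaches the vendor threshold, and otherwise just note when the raw count has grown since the last reading. The function name `assess` and the three status strings are hypothetical, chosen for illustration only.

```python
from typing import Optional


def assess(previous_raw: Optional[int], current_raw: int,
           value: int, threshold: int) -> str:
    """Classify one reading of the reallocated-sector attribute."""
    if value <= threshold:
        return "FAILING"   # vendor says the attribute has failed
    if previous_raw is not None and current_raw > previous_raw:
        return "CHANGED"   # worth a low-key notice, not an alarm
    return "OK"


print(assess(None, 10, 100, 36))
print(assess(10, 25, 100, 36))
print(assess(25, 25, 30, 36))
```

Only the growth of the raw value is used here, so the check does not depend on decoding the raw field into an exact sector count.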

Changed in libatasmart:
status: Fix Released → Confirmed
Sam_ (and-sam) on 2012-02-15
tags: added: oneiric precise

Running dual-boot Windows 7 / Ubuntu 11.10 Oneiric on Dell M90. Windows CHKDSK reports no problems with my hard drive. Ubuntu S.M.A.R.T. reports a staggering 7 million+ bad sectors with green light status: "Disk has a few bad sectors". My system runs just fine, which is why I'm adding my 2 cents.

I am seeing similar issues with my SSD, lots of errors, but system seems to
run fine.
On my previous drive however, it started to run slowly, due to recovering
errors, and finally reported an error, so something funny is going on.
Regards
Wilbur Harvey


On Fri, Mar 2, 2012 at 9:06 AM, John Wilson <email address hidden> wrote:

> Running dual-boot Windows 7 / Ubuntu 11.10 Oneiric on Dell M90. Windows
> CHKDSK reports no problems with my hard drive. Ubuntu S.M.A.R.T. reports
> a staggering 7 million+ bad sectors with green light status: "Disk has a
> few bad sectors". My system runs just fine, which is why I'm adding my 2
> cents.
>
> ** Attachment added: "Screenshot at 2012-03-02 17:51:33.png"
>
> https://bugs.launchpad.net/ubuntu/+source/libatasmart/+bug/438136/+attachment/2801866/+files/Screenshot%20at%202012-03-02%2017%3A51%3A33.png
