Ubuntu

gnome-disk-utility nags me too much that my disk is failing

Reported by Alan Pope ㋛ on 2009-08-11
This bug affects 47 people
Affects                      Status        Importance  Assigned to  Milestone
gnome-desktop                Invalid       Medium
gnome-disk-utility (Fedora)  Invalid       Unknown
gnome-disk-utility (Ubuntu)  Fix Released  Medium      Unassigned
Nominated for Lucid by Papamatti

Bug Description

Binary package hint: gnome-disk-utility

Every time I boot gdu nags me that my disk is about to die. It has 5 reallocated sectors, and indeed I get a green light for "reallocated sector count" on the detail screen, but on the summary screen the status is yellow and tells me I have bad sectors. Surely it shouldn't nag me until the threshold (in this case 36) is reached?

See screenshot.

ProblemType: Bug
Architecture: amd64
Date: Tue Aug 11 21:18:29 2009
DistroRelease: Ubuntu 9.10
NonfreeKernelModules: nvidia
Package: gnome-disk-utility 0.4-0ubuntu1
ProcEnviron:
 PATH=(custom, user)
 LANG=en_GB.UTF-8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 2.6.31-5.24-generic
SourcePackage: gnome-disk-utility
Uname: Linux 2.6.31-5-generic x86_64

Alan Pope ㋛ (popey) wrote :
CravingPine (cravingpine) wrote :

1 sector here

Changed in gnome-disk-utility (Ubuntu):
status: New → Confirmed
cariboo907 (cariboo907) wrote :

I have the same problem. I've attached the results of running both smartctl and gsmartcontrol.

cariboo907 (cariboo907) wrote :
Patrice Vetsel (vetsel-patrice) wrote :

Indeed. I confirm and set importance to medium.

My disk has bad sectors. OK. But we need to provide a way to tell the palimpsest applet to stop reporting the same error again and again, every time I boot or log in! It's very annoying.

Changed in gnome-disk-utility (Ubuntu):
importance: Undecided → Medium
Conn O Griofa (psyke83) wrote :

Patrice - while it's good to have the ability to dismiss chronic notification warnings (in the same way you can dismiss defective-battery warnings on laptops, for example), this is not the intention of the bug report.

Alan's problem is that the notification that's displayed indicates drive failure, but the SMART status is saying that the drive is healthy - therefore the notification is being triggered erroneously, or the SMART status is incorrect.

Just a guess, but perhaps the notification is displayed due to the drive having a reallocated sector count >0 - but this doesn't merit a warning (as the drive's firmware remapped the bad sector before any data loss occurred).

Noel J. Bergman (noeljb) wrote :

According to the author (see the upstream bug report), this is not a bug: "Palimpsest [is] more picky than smartctl is - e.g. just a single bad sector will trigger this warning. [Your] disk has bad sectors and that's exactly why we report it as failing. It's supposed to work this way. If this was my disk, I'd back up the data and move it to a more healthy disk."

Changed in gnome-disk-utility (Fedora):
status: Unknown → Invalid
Alan Pope ㋛ (popey) wrote :

"If it were my disk" is an excuse any developer could use for not implementing any feature they wish. Fact is it's _not_ David's disk, it's mine, and as such I'd like control over when and whether I get nagged about it.

In this situation it's software mirrored so if the disk fails later I frankly don't care because I have comprehensive backups and a raid mirror of this disk.

I would like to be able to dismiss this error unless it escalates - i.e. tell me when the number is higher than the last time you checked - meaning, the disk is getting _worse_. Mine's been sat at 5 bad blocks for a while, and it works fine.
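The "only tell me when it gets worse" behaviour requested above can be sketched in a few lines. This is a minimal illustration, not how Palimpsest is implemented; the class name `BadSectorWatcher`, its methods, and the disk name "sda" are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class BadSectorWatcher:
    """Hypothetical helper: remembers the bad-sector count the user last dismissed."""

    acknowledged: dict = field(default_factory=dict)

    def should_notify(self, disk: str, bad_sectors: int) -> bool:
        # Nag only when the count is higher than what the user last dismissed.
        return bad_sectors > self.acknowledged.get(disk, 0)

    def dismiss(self, disk: str, bad_sectors: int) -> None:
        # Record the count the user has seen and accepted.
        self.acknowledged[disk] = bad_sectors


watcher = BadSectorWatcher()
first_boot = watcher.should_notify("sda", 5)   # new information: warn
watcher.dismiss("sda", 5)
later_boot = watcher.should_notify("sda", 5)   # unchanged count: stay quiet
escalated = watcher.should_notify("sda", 6)    # count went up: warn again
print(first_boot, later_boot, escalated)
```

A disk that sits at 5 reallocated sectors would then warn once, stay silent after being dismissed, and warn again only if the count rises.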

Vish (vish) wrote :

My disk has just 3 errors, while my threshold is 50!
If the utility is not smart enough to know when a failure is really imminent, why does it label everything "Pre-fail"?

Conn O Griofa (psyke83) wrote :

"Pre-fail" is the type of attribute in SMART, not the status of your particular drive. When a pre-fail attribute rises unusually or reaches a threshold, it's a sign that the drive will die in a very short time period (as small as 24 hours).

Contrast this to "old-age" attributes which can reach the threshold, but the drive may continue to function for an indeterminate time.

Sebastien Bacher (seb128) wrote :
Changed in gnome-disk-utility (Ubuntu):
status: Confirmed → Triaged
Alan Pope ㋛ (popey) wrote :

Screenshot requested on irc by Chris Coulson.

Jean-Louis Dupond (dupondje) wrote :

Here it behaves even more weirdly! Not a single bad sector, but it keeps warning me about 'bad sectors' ...

Conn O Griofa (psyke83) wrote :

Jean-Louis,

Your disk has a bad sector that is waiting to be re-allocated. See the "Current Pending Sector Count". Should you be warned? Probably not. But you do have a bad sector.

elxiliath (elxiliath) wrote :

My Ubuntu 9.10 install says that I have bad sectors, and that the S.M.A.R.T. system says my drive will fail soon. I used my HDD manufacturer's tools to determine that there are no problems with my drive.

gpk (gpk-kochanski) wrote :

May I point out that if Palimpsest raises false positives, it will cost people real money?
About $50 per false warning!

It will cause some people to trash their computers. It will cause many people to spend hours trying to diagnose their disks. Many people will lose irreplaceable data in the course of responding to these warnings.

This is an extremely serious bug, if bug it be.

Me? I chucked out an old disk drive and spent about 5 hours reinstalling Ubuntu. On the *second*, newer computer, I got a bit suspicious. I found a paper by Pinheiro, Weber and Barroso
at Google: http://research.google.com/archive/disk_failures.pdf

They found that "Despite [correlations between failure rates and SMART data] we conclude that models based on SMART parameters alone are unlikely to be useful for predicting individual drive failures."

Specifically, from discussions on Redhat's list, it seems that Palimpsest puts up a warning when one sector is reallocated, but Google says "85% of drives survive more than 8 months after their first reallocation." And further, they state "Out of all failed drives, over 56% have no signal on any of the four strong SMART signals... Actual useful models, which need to have small false-positive error rates are in fact likely to do much worse than these limits might suggest."

That says it all. Palimpsest is a misbegotten piece of software that means well, but will ultimately hurt people more than it helps. Sorry, developers. Ya gotta do more than write good code: you need to pass a real statistical cost-benefit analysis, and the initial results suggest that it's a bad idea.

gpk (gpk-kochanski) wrote :

Actually, there's another paper, G. F. Hughes, J. F. Murray, K. Kreutz-Delgado and C. Elkan, IEEE Transactions on Reliability, September 2002, "Improved Disk Drive Failure Warnings".
http://dsp.ucsd.edu/~jfmurray/publications/Hughes2002.pdf

It looks at an improved version of the current SMART threshold scheme, and shows that even the improved scheme yields unimpressive results.

They were able to predict about 30% (i.e. 40% for one kind of drive and 10% for another) of failures, in exchange for a number of false alarms that is about 20% of the real failure rate. That's not a spectacular result. It means that about 1/3 of people are saved from a failure, but 1/5 of people unnecessarily throw out a disk.

Perhaps that's a good trade-off, but it's the result of a careful statistical analysis. If this software does a half-baked job of it, it could easily do far worse, enough worse to be actually harmful.
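To make the trade-off concrete, here is a rough worked example using only the two figures quoted above (about 30% of failures predicted, false alarms at about 20% of the real failure count). The population of 100 failing drives is an arbitrary assumption chosen to keep the numbers round.

```python
# Figures taken from the comment above; the sample size is an assumption.
real_failures = 100
detection_pct = 30      # share of real failures that get a warning in advance
false_alarm_pct = 20    # false alarms, as a percentage of real failures

true_warnings = real_failures * detection_pct // 100
false_warnings = real_failures * false_alarm_pct // 100
missed_failures = real_failures - true_warnings

# Of all warnings shown, what fraction preceded a real failure?
precision = true_warnings / (true_warnings + false_warnings)

print(f"warned in time:  {true_warnings}")
print(f"failed silently: {missed_failures}")
print(f"false alarms:    {false_warnings}")
print(f"share of warnings that were real: {precision:.0%}")
```

Under these assumptions, two out of every five warnings would be shown for a disk that did not go on to fail, while most real failures would still arrive unannounced.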

Xavier Guillot (valeryan-24) wrote :

Hello,

I have the same observation: Palimpsest warns me that one drive could be in pre-failure, but G-smart and smartmontools do not reach the same conclusion (although it is true that there are bad sectors).

I hope the warning levels can be defined more precisely.

I don't think that this is helpful. I don't understand what it's trying to say, so it's basically useless to me. But that's not my problem; my problem is that I don't know how I can easily disable it.

This tool reported every drive to be defective, even one that had only been running for 30 days.

The strange thing is that I have yet to experience any data loss or other problems, so I believe that this tool is extremely inaccurate.

If this tool is so inaccurate, how come it's enabled by default anyway? If it only works correctly for some drives, why does it check drives that it can't scan correctly?

I recommend disabling this tool by default and putting an option for enabling it in the system settings; otherwise it will really annoy users and, of course, waste their CPU cycles.

Vish (vish) wrote :

DjDarkman,
The notification can be disabled from System > Preferences > Startup Applications > Disk Notifications.

If you want to check the disk status later, you can do so from System > Administration > Disk Utility.

But that is not a solution, since this would probably prevent the notification when the disk is really failing.
So, if you know your disk is not too damaged, you could just disable the startup item until this is fixed properly.

Now this was a good answer. Thank you DjDarkman!

Dan Halbert (dhalbert) wrote :

A more thorough discussion of this issue is in yet another Fedora bug: https://bugzilla.redhat.com/show_bug.cgi?id=498115.

When I installed karmic, I too was falsely alarmed by palimpsest's warning about a single reallocated bad sector in a disk I've been using for years. I ran the manufacturer's test, which said there was no problem.

I believe palimpsest's author, as quoted in #7, is not as well informed as he could be about the non-zero number of failures one typically sees in disks these days.

The other answer is

sudo aptitude remove gnome-disk-utility

This is really a statistical problem. Computer programmers shouldn't
argue. The thing is, it's very much like earthquake prediction.

If you predict an earthquake in San Francisco often enough, eventually
you'll be right. The San Andreas fault is there, and eventually it's
going to let go. The same with everyone's disk drives. However,
it's really not a good idea to warn San Francisco every month when
it might be 50 years until the next big earthquake.

People who predict earthquakes have understood the problem and don't
issue predictions because they know they cannot predict well enough
to be useful. People who predict disk drive failures should do the same.
If you read the actual research papers on the subject, they come
to that conclusion.

The trouble is that if you are predicting an event that doesn't happen
very often, you can be in a difficult situation where
(a) You know the event is much more likely than normal, and
(b) the event is still very improbable.

So, for instance with disk drives, the average probability of failure
is about 1 in a thousand per month: 0.1% per month. If you
look at the SMART information, you can sometimes tell that this
disk drive is, perhaps, 5 times more likely to fail than normal.
But that's _still_ _small_. It's still 0.5% per month.

So, even after certain indicators of failure happen, there
might still be less than a 1% chance that the disk drive will
fail soon. In that case, the best thing to do is to
remain silent.

Why? Because if someone listens to the warning, and acts on it,
the cost will be relatively large. Whereas the probability
of failure is still relatively small.
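The base-rate argument above can be checked with a few lines of arithmetic. This sketch uses only the two figures given in the comment (0.1% per month, and a drive that is 5 times more likely to fail); the group of 1000 warned users is an assumption for illustration.

```python
base_monthly_failure = 0.001   # 0.1% per month, as stated above
relative_risk = 5              # "5 times more likely to fail than normal"

flagged_monthly_failure = base_monthly_failure * relative_risk

# Assumed group size, chosen only to make the counts easy to read.
warned_users = 1000
expected_failures = warned_users * flagged_monthly_failure
unaffected = warned_users - expected_failures

print(f"flagged drive: {flagged_monthly_failure:.1%} chance of failure per month")
print(f"of {warned_users} warned users, about {expected_failures:.0f} lose a disk this month")
print(f"about {unaffected:.0f} were warned about a disk that kept working")
```

Even with the risk multiplied by five, nearly everyone who sees the warning in a given month would be looking at a disk that keeps working.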

If you really have your heart tied to this software, if you really
love it like your first child, and you really don't want it disabled,
then the thing to do is to minimize the damage it causes.
In that case do the following:

Change the warning message from "Your disk is failing"
to "Back up your data: SMART predicts a larger-than-normal
chance of disk failure."

Are you the MIT Halbert from '81?
If so, hello!

Noel J. Bergman (noeljb) wrote :

There is a patch to fix the bogus reports, and we should already have it in Karmic.

The patch is http://git.0pointer.de/?p=libatasmart.git;a=commitdiff;h=b7f3834654cb05f5a8aae6b2381d548f49d72987.

I have verified that the fix IS in the current Karmic package. So is anyone STILL seeing this problem?

Alan Pope ㋛ (popey) wrote :

I'm using latest karmic updated as of a few minutes ago and the problem still occurs.

Indeed it's got worse, as now it nags me about a disk that doesn't even have _any_ reallocated sectors. So I am now being nagged about two disks (of the four in the machine).

See screenshot.

Noel J. Bergman (noeljb) wrote :

Alan,

Can you see what in the list is claimed as failing? If not, that's a problem in and of itself -- I hate "idiot light" reports. Tell the user WHAT is wrong, not just that something is wrong.

As you note, it should not be the sectors, at least not from libatasmart4, given the presence of:

+                /* We use log2(n_sectors) as a threshold here. We had to pick
+                 * something, and this makes a bit of sense, or doesn't it? */
+                sector_threshold = u64log2(d->size/512);
+
+                if (sectors >= sector_threshold) {
+                        *overall = SK_SMART_OVERALL_BAD_SECTOR_MANY;
+                        return 0;
+                }
+        }

and the fact that you don't have any marked as failing. Lennart Poettering seems to be entirely pragmatic and willing to adjust, so let's just see if we can find out exactly what is being complained about, and tweak that too, as necessary.
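To get a feel for what the log2(n_sectors) rule in the quoted patch means in practice, the calculation can be reproduced outside libatasmart. This is a rough Python sketch, not the library's code: it assumes `u64log2` rounds down to a whole number, and it takes "500GB" to mean 500 * 10^9 bytes. The function names are made up for the example.

```python
def sector_threshold(size_bytes: int, sector_size: int = 512) -> int:
    """Floor of log2(number of sectors), mirroring u64log2(d->size/512).

    Assumption: u64log2 rounds down; the patch excerpt does not show it.
    """
    n_sectors = size_bytes // sector_size
    if n_sectors <= 1:
        return 0
    # For a positive integer n, bit_length() - 1 equals floor(log2(n)).
    return n_sectors.bit_length() - 1


def too_many_bad_sectors(bad_sectors: int, size_bytes: int) -> bool:
    # Same comparison as the patch: sectors >= sector_threshold.
    return bad_sectors >= sector_threshold(size_bytes)


disk_500gb = 500 * 10**9
threshold = sector_threshold(disk_500gb)
print(f"threshold for a 500GB disk: {threshold} bad sectors")
print(f"5 bad sectors counts as 'many': {too_many_bad_sectors(5, disk_500gb)}")
```

Under these assumptions a disk with 5 reallocated sectors stays well below the "many bad sectors" level, which is consistent with the expectation that something other than this check is producing the warning.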

I don't get errors on either of my 500GB drives, but I'll check my older SATA drives to see if they have any. That would make it easier for me to track down.

Alan Pope ㋛ (popey) wrote :

One has 5 reallocated sectors; the other has one pending reallocation.

Screenshots attached.

Alan Pope ㋛ (popey) wrote :
Noel J. Bergman (noeljb) wrote :

Alan,

I have a drive here that generates the warning, but it does have some reallocated sectors, although far below the threshold. Same as the one of yours with 5 reallocated sectors.

As I said to mac_v earlier today, I'll pull the code later this afternoon or evening and start digging at this thing.

Chris Coulson (chrisccoulson) wrote :

This was fixed by the upload today - you will only get notified now if your disk is _really_ failing, and you can opt out of the notifications on a per-disk basis too:

gnome-disk-utility (2.28.0-0ubuntu1) karmic; urgency=low

  * New upstream release:
    - Bug 592006 - Untranslatable string construction in gdu-drive.c
    - Merge branch 'master' into ata-smart-ui-rework
    - Land new ATA SMART user interface
    - Fix build
    - Fix POTFILES.in
    - Bug 593381 - fix error caused by zh_CN translation
    - Depend on the latest DeviceKit-disks API
  * Drop 02-fix-plural-declaration-in-zh_CN-help-translation.patch, applied
    upstream.
  * debian/control:
    - Bump devicekit-disks dependencies to >= 007.
    - Add libatasmart-dev build dependency.
  * debian/libgdu0.symbols, debian/libgdu-gtk0.symbols: Update for new
    version.

 -- Martin Pitt < <email address hidden>> Sat, 19 Sep 2009 16:59:56 +0200

Changed in gnome-disk-utility (Ubuntu):
status: Triaged → Fix Released
James Tait (jamestait) wrote :

Hurrah! I no longer get nagged at boot time, and selecting Disk Utility from the System menu the reallocated sector count now matches what the command line tools say. For me at least, this does indeed appear to be fixed.

mibo (info-mibotech) wrote :

After my eeepc comes back from standby, the tool says "drive is operating out of range" or something like that; my system is in German ;-)
The eee is equipped with a 4GB SSD. A second eeepc shows the same error.
It occurs only when the system wakes up from standby.

CPO_Mendez (cpo-mendez) on 2009-12-07
Changed in gnome-disk-utility (Ubuntu):
assignee: nobody → CPO_Mendez (cpo-mendez)
CPO_Mendez (cpo-mendez) on 2009-12-07
Changed in gnome-disk-utility (Ubuntu):
assignee: CPO_Mendez (cpo-mendez) → nobody
assignee: nobody → Alan Pope (popey)
Alan Pope ㋛ (popey) on 2009-12-08
Changed in gnome-disk-utility (Ubuntu):
assignee: Alan Pope (popey) → nobody
map-j (electrodread) on 2010-02-13
Changed in gnome-disk-utility (Ubuntu):
status: Fix Released → Fix Committed
status: Fix Committed → Fix Released
Changed in gnome-disk-utility (Ubuntu):
status: Fix Released → Fix Committed
status: Fix Committed → Fix Released
Papamatti (matti-lx) wrote :

@mibo: I have the same issue with my eeePC 701 4G on karmic 9.10 unr and the beta2 of lucid 10.04, and I can confirm that after a suspend and resume gnome-disk-utility reports that the solid state disk will fail soon and should be replaced. The icon in the panel shows "DISK IS BEING USED OUTSIDE DESIGN PARAMETERS".

I can reproduce it:
1. Turn on the eeePC and boot unr 10.04 beta2.
2. Run Disk Utility, click "SMART Data" and then click "Refresh" - all is OK, no errors reported.
3. Close Disk Utility.
4. Suspend the PC - the power LED blinks.
5. After a moment, resume the eeePC (power button) - after 2 or 3 seconds the eeePC is up and running.
6. Run Disk Utility, click "SMART Data" and then click "Refresh" - an error comes up reporting that the disk will fail soon (screenshot).
7. After rebooting the system all is fine again! No errors.

When Disk Utility reports the error there are attributes marked in red (32, 50, 73 and 84), but they are not in the list before the error occurs! Maybe this is the issue? There are also additional attributes (16 and 52) which are marked as good. All of these attributes show "No description for attribute xy".

I think there is no error on the solid state disk, but the system reports some "attributes" which are not defined in the Disk Utility.

Hope it may help you...

Changed in gnome-desktop:
importance: Unknown → Medium
status: Unknown → New

The current release no longer tells me that my disk is failing.

m1fcj (hakan-koseoglu) wrote :

Natty still reports the Eee 701 4GB SSD failing after a suspend & restore. The disk is healthy, the error is shown for attributes 32 & 56.

Hi, m1fcj. This bug report has been reported as fixed. If you are having problems you would do well to report a new bug (both in Launchpad and upstream at bugzilla.redhat.com). Thanks!

Changed in gnome-desktop:
status: New → Invalid