Comment 250 for bug 59695

Brian Ealdwine (eode) wrote : Re: [Bug 59695] Re: High frequency of load/unload cycles on some hard disks may shorten lifetime

-- Please pardon this if it is a duplicate; I accidentally sent it from the wrong account and I don't know how Launchpad handles that.

> It is not the place for the operating system to save the user from
> themselves.

Whose opinion is that? I would argue that it is, indeed, the operating
system's place to save the user from themselves. As a matter of fact,
if all these poor sots using email these days had to write their own
program to do it, they would likely corrupt the data on their hardware
in a matter of moments. But the operating system makes sure that, at a
basic level, everything is operational, and things which can destroy the
system don't happen. You are, day in and day out, saved from your own
mistakes by others who have made them and fixed them. But in this case,
it's not even the user's mistake! If I install Ubuntu (default
install), turn my system on, and leave it -- doing nothing else -- the
system will break in less than two years. That is not normal. It is
not good. It is not user error. Even if it were user error, which it's
not, the system should work by default; breaking it can and should be
left to technically-oriented users who choose to do so. A user cannot
drag /usr to the trashcan, for good reason.
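
To put rough numbers on the "less than two years" claim, here is a
back-of-the-envelope sketch. It assumes a drive rated for 600,000
load/unload cycles (a figure often quoted for laptop drives; check the
data sheet for yours) and some hypothetical parking intervals. None of
these numbers are measurements from this bug report.

```python
# Back-of-the-envelope estimate of how long a drive lasts before it
# reaches its rated load/unload cycle count. All inputs are assumptions.


def days_until_rated_limit(rated_cycles, seconds_between_parks, hours_on_per_day=24.0):
    """Days of use before the load cycle count reaches rated_cycles.

    Assumes one load/unload cycle every seconds_between_parks seconds
    while the machine is powered on.
    """
    cycles_per_day = (hours_on_per_day * 3600) / seconds_between_parks
    return rated_cycles / cycles_per_day


if __name__ == "__main__":
    rated = 600_000  # assumed rating, not taken from any specific drive
    for interval in (30, 60, 120):  # hypothetical seconds between parks
        days = days_until_rated_limit(rated, interval)
        print(f"one park every {interval:>3} s -> {days:6.0f} days ({days / 365:.1f} years)")
```

Under these assumptions, a machine left running that parks its heads
once a minute reaches the rated count in a little over a year.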

> You are correct in that the user could write a program that
> was detrimental to their hardware, but that is their choice. Similarly,
> the user can write a program that writes constantly to one area of the
> disk - this will wear the disk out much faster than the expected life
> time of the disk, but there is nothing and should be nothing that any OS
> can do to stop that.

Perhaps. An OS certainly cannot prevent that altogether. However, an
OS by default should not exhibit such behavior itself. In my opinion,
an OS by default should not even allow such behavior, but that is
merely an opinion. What is not merely an opinion is this: an OS that,
by action or negligence, fails to protect the hardware from the OS's
own behavior ends up preventing itself from performing its basic
purpose.

> We can't stop them from hitting their laptop with
> a hammer, either. Incidentally, it would be more likely to survive this
> if the hard drive heads were parked, and disabling APM will disable
> that.

Yes, it would be more likely to survive being hit with a hammer if the
heads were parked. However, the heads remain parked for only a few
seconds at a time, so the protection is negligible anyway (in this
particular instance) because of the default activity of the OS.

> Furthermore, it has been shown that disabling APM can cause some drives
> to over-heat, so they will be definitely damaged if you do that, and by
> putting extra load on the battery you will be reducing its operational
> lifespan, too.

The power savings from parking the heads, in this case, are also
negligible -- again, because as soon as the heads are parked, they are
unparked again. If a disk is overheating through sustained operation,
that is *another* (and more drastic) design flaw, and one that is much
less common. In that case, the user will run into major problems anyway
-- for example, when watching a movie off of their hard disk, or
performing any activity which accesses the disk regularly. Also, I
don't think that the overheating issue has been sufficiently proven, but
it should be looked into. The basic statement is that hard disks will
overheat if they don't sleep, yes? This can be checked against the
value/worst/thresh columns that smartctl reports for the temperature
attribute. Perhaps it would be a good idea for people who *are* running
systems with APM disabled to post their value/worst/thresh for
temperature? While I prefer to do some research before moving into the
unknown, I'll take a probably-safe unknown over a definitely-unsafe
known.
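
For anyone who wants to post those numbers, a minimal sketch of pulling
value/worst/thresh out of the attribute table printed by
`smartctl -A /dev/sdX`. It assumes the usual ten-column ATA attribute
layout and the attribute names Temperature_Celsius and
Load_Cycle_Count; some drives report temperature under a different
name, so adjust `wanted` as needed. The sample rows are made up for
illustration and are not from a real drive.

```python
# Pull the normalized value/worst/thresh triple for chosen SMART attributes
# out of the text printed by `smartctl -A /dev/sdX`.


def parse_smart_attributes(text, wanted=("Temperature_Celsius", "Load_Cycle_Count")):
    """Return {attribute_name: {"value", "worst", "thresh", "raw"}}.

    Expects the usual ATA attribute table columns:
    ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    found = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # this also skips the header row, which starts with "ID#".
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name = fields[1]
        if name in wanted:
            found[name] = {
                "value": int(fields[3]),
                "worst": int(fields[4]),
                "thresh": int(fields[5]),
                "raw": " ".join(fields[9:]),
            }
    return found


# Made-up sample rows for illustration only; not output from a real drive.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
193 Load_Cycle_Count        0x0032   085   085   000    Old_age   Always       -       150000
194 Temperature_Celsius     0x0022   110   095   000    Old_age   Always       -       40
"""

if __name__ == "__main__":
    for name, attr in parse_smart_attributes(SAMPLE).items():
        print(name, attr)
```

In real use the text would come from running smartctl (typically as
root) rather than from the SAMPLE string.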

> Rolling out the workaround on every system, including those not
> currently affected, is a mistake. You will make the experience worse
> for some people (e.g. me. I have fixed all my idle-writers manually - my
> disk sleeps like a baby now), and you will make it possible for people
> to get lazy and ignore the problem, so it will never be fixed properly.

Congratulations on fixing all of your idle-writers manually. It still
stands that the system by default installs many idle-writers -- and
either those should be fixed by default, or the system should account
for its own default behavior in a way that prevents damage to the
hardware.

> A better short-term workaround would be to monitor the disks, and bring
> up a pop-up bubble offering to disable APM if the LCC is increasing too
> fast. I believe someone already suggested this.

Unless you're volunteering to write and maintain software that addresses
the issue, I don't see that as a good short-term solution. I see the
temporary solution of disabling APM as imperfect, but the best fit for
now.

I think a more ideal situation would be a smartctl daemon that checks
for problematic usage and adjusts settings accordingly, with either
longevity or power saving in mind, depending on whether the system is on
AC or on battery, as well as providing a user warning on drive-fail
situations. But I don't personally have the time to do that, at least,
not right now, and not likely in the next few months -- so, ideal or no,
it's understandable to me that it doesn't exist.
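
To make the idea concrete, here is a rough sketch of the decision such
a daemon might make on each poll. This is not existing software: the
function names and the rate thresholds are hypothetical, and it assumes
that APM level 254 is enough to stop the frequent parking on an
affected drive. The only part taken from documentation is the meaning
of the `hdparm -B` levels (128 through 254 do not permit spin-down, 255
turns APM off where the drive supports that).

```python
# Sketch of the per-poll decision for a hypothetical disk-longevity daemon.
# Thresholds are assumptions chosen for illustration, not recommendations.

APM_NO_SPINDOWN_MAX = 254  # hdparm -B 254: APM on, least aggressive setting
APM_OFF = 255              # hdparm -B 255: APM off (not supported by every drive)


def cycles_per_hour(lcc_before, lcc_after, seconds_elapsed):
    """Load cycle rate between two readings of the Load_Cycle_Count raw value."""
    if seconds_elapsed <= 0:
        raise ValueError("seconds_elapsed must be positive")
    return (lcc_after - lcc_before) * 3600.0 / seconds_elapsed


def choose_apm_level(rate, on_ac, max_rate_ac=15.0, max_rate_battery=60.0):
    """Return the APM level to request, or None to leave the drive alone.

    On AC power the limit is strict (longevity first); on battery a higher
    rate is tolerated (power saving first).
    """
    limit = max_rate_ac if on_ac else max_rate_battery
    if rate > limit:
        return APM_NO_SPINDOWN_MAX
    return None


if __name__ == "__main__":
    # Hypothetical readings taken 30 minutes apart.
    rate = cycles_per_hour(150_000, 150_030, 30 * 60)
    for on_ac in (True, False):
        print(f"on_ac={on_ac}: rate={rate:.0f}/h -> level {choose_apm_level(rate, on_ac)}")
```

A real daemon would also have to apply the level (for example by
invoking hdparm), detect the power source, and warn the user; none of
that is shown here.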

Next in my order of preferences is that the temporary fix that has been
committed in Debian be allowed to go through.

Next is that a utility is created to warn the user and implement the
change if the problem is discovered on the system and the user
chooses to have the change implemented.

Regardless, something must be done so that those (nearly everyone with a
laptop) who have the problem are provided a solution. Given a choice
between marginally higher battery usage and significantly shorter disk
life, I'll choose the marginally higher battery usage any day, and I
think I speak for a majority of people in that.