updatedb should be weekly, not daily

Bug #271272 reported by Kevin Hunter
This bug affects 6 people
Affects           Status     Importance  Assigned to  Milestone
mlocate (Ubuntu)  Confirmed  Undecided   Unassigned

Bug Description

This is an enhancement or suggestion to save a little bit of wear and tear on hard disks.

Basically, I think that a daily update of the locate database is too much. Most folks don't add *that* many files on a daily basis. So, why not reduce wear-and-tear on the HDDs by making the update weekly?

If they do need the locate db updated *right now*, there could be a simple GUI invocation, as through the System->Administration menu.

Pros:

- Disk usage is reduced a significant amount for the vast majority of end-users (those who do simple web, email, business writing, music listening, etc.)
- Ubuntu is that much more "green"

Cons:

- The locate DB lags further behind the filesystem (up to a week instead of a day).

Modulo the GUI part, implementing this is as simple as moving /etc/cron.daily/mlocate to /etc/cron.weekly/ .
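
A minimal sketch of that move, for anyone who wants to try it locally. The function name and the optional root argument are hypothetical, added only so it can be tried on a scratch directory before touching /etc:

```shell
#!/bin/sh
# Move the mlocate cron job from cron.daily to cron.weekly.
# $1 is an optional (hypothetical) root directory; it defaults to /etc.
move_mlocate_weekly() {
    root="${1:-/etc}"
    src="$root/cron.daily/mlocate"
    dst_dir="$root/cron.weekly"

    # Nothing to do if the daily job is not there (e.g. already moved).
    if [ ! -f "$src" ]; then
        echo "no daily mlocate job at $src" >&2
        return 1
    fi

    mkdir -p "$dst_dir" && mv "$src" "$dst_dir/"
}
```

On a real system this would be run as root, e.g. by calling move_mlocate_weekly with no argument. A package upgrade may reinstall the daily script, so treat this as a local workaround.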

Mackenzie Morgan (maco.m) wrote :

Marking this confirmed because weekly would be easier on hard drives, and it is currently daily.

Changed in mlocate:
status: New → Confirmed
Kevin Hunter (hunteke) wrote :

A further optimization might be to split the updates by device. Clearly, not everyone separates their data from their system, but for those who do, being able to scan just the /home dir would reduce the wear-and-tear further.

Then, though it makes the mlocate script a little more complicated, you only need to do system drive index updating if apt has (un)installed anything.

Mackenzie Morgan (maco.m) wrote :

An apt-file tie-in could make it not have to even do index updating for apt. If there's some way to get data from apt-file to the index, that is.

Kevin Hunter (hunteke) wrote :

Point. On the other hand, I'm suggesting a small incremental improvement.

In order of difficulty:

1. Move mlocate to /etc/cron.weekly/. (Very easy.)
2. Only update system dirs (i.e. /) if dpkg has been run. (Easy.)
3. Split updates of / and /home. (Hard.)
4. Update system dirs when dpkg is run.
  4a. Parse /var/log/dpkg.log and see what's been done since previous update invocation. (Hard)
  4b. Asynchronously tie into dpkg invocation so as to keep database up-to-date with no latency. (Very hard.)
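
As a rough illustration of 4a, here is a sketch that lists packages dpkg reports as installed after a given timestamp. It assumes each dpkg.log line starts with "YYYY-MM-DD HH:MM:SS" followed by the action fields (for example "status installed foo 1.0"); the function name is hypothetical, and turning the package list into paths to rescan is left open:

```shell
#!/bin/sh
# Print the names of packages that reached "status installed" after SINCE.
# $1: timestamp in the form "YYYY-MM-DD HH:MM:SS"
# $2: log file (defaults to /var/log/dpkg.log)
dpkg_changes_since() {
    since="$1"
    log="${2:-/var/log/dpkg.log}"

    # Timestamps in this format sort lexically, so a plain string
    # comparison in awk is enough to select the newer entries.
    awk -v since="$since" '
        ($1 " " $2) > since && $3 == "status" && $4 == "installed" { print $5 }
    ' "$log" | sort -u
}
```

Calling dpkg_changes_since with the time of the previous updatedb run would give the set of packages whose files may have changed since then.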

For something as minor as the locate database, I'm not suggesting we put in a lot of work; it'll be diminishing returns very shortly. However, I believe that 1, 2, and 3 are doable.

We can easily integrate 1.
For 2, I can give you the logic of *when* to update /, but actually doing it is reliant on 3. See below.
For 3, I can update /home, or / and /home, but not just /. I'm unable to update *just* the system dirs. Hopefully someone else will be able to figure out what I'm not seeing.

For the here-and-now, I note the existence of /var/log/dpkg.*. An enterprising someone could use the modification date of those files, or parse the final line of dpkg.log, as a measure of when to update. That would be a very simple addition to the mlocate script. For example, here's a script that only updates the system dirs if necessary:

#!/bin/sh

DPKG_FILE='/var/log/dpkg.log'
UPD_FILE='/var/log/last_updatedb'

# Without a dpkg log there is nothing to compare against.
[ -f "$DPKG_FILE" ] || exit 0

if [ ! -f "$UPD_FILE" ]; then
    # Didn't exist: initialize the timestamp to "zero" (the epoch)
    touch -t 197001010000.00 "$UPD_FILE"
    echo "Locate database not yet created."
fi

# Modification times in epoch seconds (date -r FILE is a GNU extension)
LAST_APT_USAGE=$(date -r "$DPKG_FILE" +%s)
LAST_UPDATE=$(date -r "$UPD_FILE" +%s)

if [ "$LAST_UPDATE" != "$LAST_APT_USAGE" ]; then
    # The timestamps of the files suggest the database is not in sync. So ...

    # 1. Update the database
    #    The part I don't know how to do is to only update entries relating
    #    to a single directory or filesystem, so I leave that part to someone else.

    # 2. Locate database now in sync. Let the next invocation know when:
    touch --reference "$DPKG_FILE" "$UPD_FILE"
fi

Mackenzie Morgan (maco.m) wrote :

Tracker's in charge of home directories anyway, right? So no real need for having both locate *and* tracker index them. Ignore for the moment that I disable tracker due to knowing where my files are.

Kevin Hunter (hunteke) wrote :

Heh. Right there with you. Tracker takes too many resources for me, even on this dual-core 2.2 GHz machine with 4 GB of RAM, so I too have disabled it.

But, for the vast majority of folks who won't disable it -- or when tracker graduates to less of a resource hog and a more mature product overall -- that's a good point.

In that case, here is the mojo for step 1 in the script above:

updatedb -U / -e /home

# -e adds the following paths to PRUNEPATHS. In other words, updatedb does not scan them (and removes them from the database if they were there previously).
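
Building on that, here is a sketch of how the / and /home split might look with two separate databases. It relies on mlocate's updatedb -o (output file) and locate -d (colon-separated database list); the database file names, the DB_DIR and RUN variables, and the function names are all hypothetical:

```shell
#!/bin/sh
# Directory holding the databases; system.db and home.db are made-up names.
DB_DIR="${DB_DIR:-/var/lib/mlocate}"
# Set RUN=echo to print the commands instead of executing them.
RUN="${RUN:-}"

# Index everything under / except /home.
update_system_db() {
    $RUN updatedb -U / -e /home -o "$DB_DIR/system.db"
}

# Index only the /home subtree.
update_home_db() {
    $RUN updatedb -U /home -o "$DB_DIR/home.db"
}

# Search both databases at once.
search_both() {
    $RUN locate -d "$DB_DIR/system.db:$DB_DIR/home.db" "$@"
}
```

With this layout the system database would only need refreshing after dpkg has run, while the /home database could stay on its own schedule.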

Martin von Wittich (martin.von.wittich) wrote :

I think this should remain a daily cron job. If you can't trust locate to be at least halfway up-to-date, you might as well use find instead, or you'd always be forced to run updatedb before using locate. That would negate locate's benefits.

OTOH I don't believe that reducing reads would in any way increase a hard disk's life expectancy or significantly decrease its power consumption. Hard disks age, and after a few years (about when the warranty runs out) the failure rate increases drastically; that's just the way it is. And many things that you might expect to prolong a disk's life are actually statistically insignificant. There's a Google white paper on this that explains it.

Kevin Hunter (hunteke) wrote :

One issue for me is the fact that I use my laptop's battery a LOT. The cron job for this isn't currently smart enough to realize that I'm on a battery and not run that job. Reducing to once a week would help my particular situation, as it takes about 15-20 minutes of continuous spinning to update the database. When compared to a normal 15-20 minutes when I'm able to run entirely from RAM without touching my drive, a solid 15-20 minutes of HDD spinning currently equates to an extra ~10% battery charge gone. That's a significant chunk of power and work time lost.

Colin Watson (cjwatson) wrote : Re: [Bug 271272] Re: updatedb should be weekly, not daily

On Thu, Sep 10, 2009 at 04:19:15AM -0000, Kevin Hunter wrote:
> One issue for me is the fact that I use my laptop's battery a LOT. The
> cron job for this isn't currently smart enough to realize that I'm on a
> battery and not run that job.

I thought we fixed that a while back - can you attach your
/etc/cron.daily/mlocate? Mine has this fragment:

if which on_ac_power >/dev/null 2>&1; then
    ON_BATTERY=0
    on_ac_power >/dev/null 2>&1 || ON_BATTERY=$?
    if [ "$ON_BATTERY" -eq 1 ]; then
        echo >&2 "System on battery power, not running updatedb."
        exit 1
    fi
fi

Kevin Hunter (hunteke) wrote :

Huh. Sho' nuff. I have a similar fragment:

if which on_ac_power >/dev/null 2>&1; then
    AC_POWER=0
    on_ac_power >/dev/null 2>&1 || AC_POWER=$?
    if [ "$AC_POWER" -eq 1 ]; then
        echo >&2 "On battery power, not running today."
        exit 1
    fi
fi

At first glance and trial, the logic appears to work too. I wonder what's awry on my machine? I'll have to suss that out later. Good deal.
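
For anyone else debugging this, a small sketch that reports what on_ac_power says, using the same "|| STATUS=$?" idiom as the fragments above. It assumes the documented exit codes (0 on mains, 1 on battery, anything else undetermined); the function name is hypothetical:

```shell
#!/bin/sh
# Report the power state as seen by on_ac_power.
describe_power() {
    if ! command -v on_ac_power >/dev/null 2>&1; then
        echo "on_ac_power not installed"
        return 0
    fi

    status=0
    on_ac_power >/dev/null 2>&1 || status=$?

    case "$status" in
        0) echo "on AC power" ;;
        1) echo "on battery" ;;
        *) echo "power state unknown" ;;
    esac
}
```

Note that the cron fragments only skip updatedb when the status is exactly 1, so an undetermined power state still lets the job run.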

Cruncher (ubuntu-wkresse) wrote :

The optimal solution of course would be an incremental database update - only change stuff that actually *does* change, and don't rescan all filesystems of which 99% are unchanged between updates. But what's missing there is a decent idea on how to make sure all filesystem changes are recognized by some updatedb daemon.
There was some locate project a while ago which tried something like that:
http://rlocate.sourceforge.net/
The related Ubuntu and Debian wishlist entries, mostly unnoticed:
https://bugs.launchpad.net/ubuntu/+bug/123752
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=448104

trans (transfire) wrote :

Why hasn't this been done? This is the one issue that has me wanting to throw Ubuntu out the window. Every morning at 8am it eats my hard drive for breakfast for more than 10 minutes straight. And for what? I've tried using Ubuntu's search feature and it never finds anything. So what the f is it even doing?

Mackenzie Morgan (maco.m) wrote :

Are you referring to a graphical search feature (which should have its own
indexing process) or the locate command?

You can change the configuration on your own system until a decision is made
on this issue.

Maco
On Oct 9, 2011 8:01 AM, "trans" <email address hidden> wrote:

> Why hasn't this been done? This is the one issue that has me wanting to
> throw Ubuntu out the window. Every morning at 8am it eats my hard drive
> for breakfast for more than 10 minutes straight. And for what? I've
> tried using Ubuntu's search feature and it never finds anything. So
> what the f is it even doing?

Mikkel Kamstrup Erlandsen (kamstrup) wrote :

locate/updatedb/findutils is *destroying* my laptop on a daily basis. Recent Ubuntu releases (in particular after 12.04) seem to have become particularly susceptible to UI hangs under heavy IO load. This makes Unity (and all other desktop systems I've tried) unusable for ~20 minutes each day.

halfbeing (halfbeing) wrote :

I agree with Mikkel. I use 13.10 and I find every day that my machine becomes completely unusable for around 15 minutes or so because of updatedb. This is not behaviour I used to get when I was using Windows, and I have never heard a Mac user complain about anything like this.
