Comment 7 for bug 905669

Ilkka Tuohela (hile) wrote : Re: [Bug 905669] Re: Take file mtime's into account when calculating freshness hashes during scanning.

My Python tool 'musa-db' does something like this, and it's quite fast:

db_files = 'SELECT path,mtime from songs'
for each directory in library:
       dirfiles = listdir(directory)
       for each file in dirfiles:
              if file not in db_files:
                    append(file)
              else if file.mtime != db_song.mtime:
                    db_reload_tags(file)
       db_dir_files = filter db_files where db_file.directory == directory
       for db_file in db_dir_files:
              if db_file not in dirfiles:
                    db_mark_file_removed(db_file)
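
For illustration, the loop above could be sketched in runnable Python roughly as follows. This is not the actual musa-db code: the `songs` schema, the `SCHEMA` constant and the `sync_directory` name are assumptions made for the example, tag reloading is reduced to storing the new mtime, and vanished files are deleted rather than marked as removed.

```python
import os
import sqlite3

# Hypothetical schema for this sketch; the real musa-db tables may differ.
SCHEMA = (
    "CREATE TABLE IF NOT EXISTS songs "
    "(path TEXT PRIMARY KEY, directory TEXT, mtime INTEGER)"
)


def sync_directory(conn, directory):
    """Sync one directory with the songs table.

    Returns three lists of paths: (added, reloaded, removed).
    """
    # Rows already known for this directory, as {path: mtime}.
    db_files = dict(
        conn.execute(
            "SELECT path, mtime FROM songs WHERE directory = ?", (directory,)
        )
    )
    added, reloaded, seen = [], [], set()

    with os.scandir(directory) as entries:
        for entry in entries:
            if not entry.is_file():
                continue
            seen.add(entry.path)
            mtime = entry.stat().st_mtime_ns
            if entry.path not in db_files:
                # New file: add it to the database.
                conn.execute(
                    "INSERT INTO songs (path, directory, mtime) VALUES (?, ?, ?)",
                    (entry.path, directory, mtime),
                )
                added.append(entry.path)
            elif db_files[entry.path] != mtime:
                # Known file whose mtime changed: this is where tags
                # would be re-read; here only the mtime is refreshed.
                conn.execute(
                    "UPDATE songs SET mtime = ? WHERE path = ?",
                    (mtime, entry.path),
                )
                reloaded.append(entry.path)

    # Rows whose file is no longer on disk.
    removed = sorted(set(db_files) - seen)
    conn.executemany("DELETE FROM songs WHERE path = ?", [(p,) for p in removed])
    return added, reloaded, removed
```

Calling `sync_directory(conn, d)` for every directory found by `os.walk()` would cover the whole library. The only per-file cost when nothing has changed is one `stat()` call, which is why no hash is needed.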

I keep no hashes; I haven't seen any use for them (though I might add a SHA for each file to the DB).

Here are the timings for a normal run of this process against my database of 80000 m4a songs, when I haven't run it for a while:

> time musa-db --cleanup --update
real 1m21.403s
user 0m32.343s
sys 0m16.223s

And a second run, after the inodes are in the fs cache (no changes to files):

> time musa-db --cleanup --update
real 0m45.790s
user 0m29.514s
sys 0m12.966s

It still takes considerable time, but it is looking at about 7000 folders after all, so I'm OK with the numbers.

On 20 Mar 2012, at 14:36, Ben Clark wrote:

> The process is roughly:
>
> scan directories recursively:
>     for each filename in directory:
>         newHashStr += filename
>     if hash(newHashStr) != hash_in_database(dir):
>         for each filename in directory:
>             if exists_in_db(filename):
>                 mark as verified
>             else:
>                 add file to tracksToAdd list
>
> add tracksToAdd list to the database
>
>
> Simply adding the mtime to the directory hash doesn't cause the files to
> be updated, since if the file exists in the database at all, it's
> considered to be the same and isn't updated. So the patch I created in
> #4 won't do anything in this case. This will likely require a bit of
> refactoring to change. Given that the tracks shouldn't actually update,
> I'm not sure what is causing the slowdown.
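
To make the point in the quoted message concrete, here is a small Python sketch of a directory-hash scan like the one described. It is not Mixxx's code: `directory_hash`, `scan`, the SHA-1 choice and the in-memory `dir_hashes` / `known_files` containers are all assumptions for illustration, and it handles a single directory rather than recursing. It shows that mixing mtimes into the hash makes the directory look changed, but the inner loop still skips every file that already exists in the database.

```python
import hashlib
import os


def directory_hash(directory, include_mtime=False):
    """Hash the sorted file names of a directory, optionally with mtimes."""
    h = hashlib.sha1()
    for name in sorted(os.listdir(directory)):
        h.update(name.encode("utf-8", "surrogateescape"))
        if include_mtime:
            mtime = os.stat(os.path.join(directory, name)).st_mtime_ns
            h.update(str(mtime).encode("ascii"))
    return h.hexdigest()


def scan(directory, dir_hashes, known_files, include_mtime=False):
    """Return the paths the quoted process would add for one directory.

    dir_hashes:  {directory: last stored hash}  (stands in for the DB)
    known_files: set of paths already in the DB
    """
    new_hash = directory_hash(directory, include_mtime)
    tracks_to_add = []
    if dir_hashes.get(directory) != new_hash:
        for name in sorted(os.listdir(directory)):
            path = os.path.join(directory, name)
            if path in known_files:
                # "mark as verified": the existing entry is never refreshed,
                # even if its mtime is what changed the directory hash.
                continue
            tracks_to_add.append(path)
        dir_hashes[directory] = new_hash
    known_files.update(tracks_to_add)
    return tracks_to_add
```

After touching a file that is already known, `directory_hash(d, include_mtime=True)` changes, yet `scan(...)` returns an empty list. Refreshing the track would need a per-file mtime comparison in the `if path in known_files` branch.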