Improve library rescan speed

Bug #966963 reported by Ben Clark
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Mixxx
Fix Released
Medium
Ben Clark
1.10
Fix Released
Medium
Ben Clark

Bug Description

Library rescan seems very slow compared to other applications that perform a similar function, even when there have been no modifications at all.

I have attached a small fix that (on my machine) improves rescan times by about 35-40% by surrounding the calls:

m_libraryHashDao.updateDirectoryStatus(dirPath, false, true);
m_trackDao.markTracksInDirectoryAsVerified(dirPath);

in a transaction. These calls are made when nothing has changed in a directory and mark the directory as verified and mark the tracks inside that directory as verified.

This was originally found in Bug #905669 but this strayed quite far from the suggestion in that bug so I'm opening a separate issue to track this and any other potential improvements.

Revision history for this message
Ben Clark (bencoder) wrote :
Revision history for this message
Daniel Schürmann (daschuer) wrote :

Hi Ben,

I have tried you patch on the top of lp:mixxx with:
Ubuntu lucid, 2GB Ram, Intel Atom N270@1.6GHz.
My 3000 Tracks are partly on an 16 GB SD Card in 761 Subfolders.

There was no significant speed improvement on this system.
I have tested a complete new scan after deleting the mixxxdb.sqlite and a rescan with an unchanged library.
The only thing I have noticed is an improvement with the second scan. I think there must be a cache somewhere.

What are your test results?

Revision history for this message
Ben Clark (bencoder) wrote :

Hi Daniel,

thanks for trying out the patch - I have retested and noticed that the improvements weren't as large as I initially thought - probably due to caching as you mentioned, but there was still a fairly significant increase:

I tested on lp:mixxx (optimize=9) with 32bit Ubuntu Lucid, 2GB ram, AMD Athlon X2

4742 tracks on external drive across about 500 subdirectories.

                        Initial Scan Rescan 1 Rescan 2

Test 1 Without patch 220s 94s 96s
Test 2 Without patch 94s 88s 89s

Test 3 With patch 65s 59s 58s
Test 4 With patch 64s 70s* 55s

Test 5 Without patch 60s 78s 77s

* = I may have slowed this scan down inadvertently by performing other actions on the computer.

I have the log file of each of these 5 runs. I am quite surprised by the initial scan speed of test 5 (when I went back to without the patch, to ensure the improvement was not just in caching).

Revision history for this message
Sean M. Pappalardo (pegasus-renegadetech) wrote :

Have you taken into account disk caching performed by the kernel and the controller? You should actually power cycle (or at least restart) your system between each test to be sure it's cleared.

Revision history for this message
Ben Clark (bencoder) wrote :

All my testing has indicated that the slowness in rescanning is due to the writes to the database when verifying directories and tracks, and doing these inside a transaction improves this by reducing the number of writes required.

Taking this further, I can't actually determine any reason why we need to mark the directories and tracks as verified while performing the recursive scan. I'm attaching a patch that speeds my rescan time up to less than 4 seconds(!), by holding a QStringList of verified directories, and only verifying them (and the tracks inside) inside the transaction that occurs after the recursive scan, before we delete anything that is unverified. This makes our rescanning speed competitive with similar applications.

Would appreciate if someone could review this patch.

Ben Clark (bencoder)
Changed in mixxx:
assignee: nobody → Ben Clark (bencoder)
status: New → In Progress
RJ Skerry-Ryan (rryan)
Changed in mixxx:
importance: Undecided → Medium
Revision history for this message
Owen Williams (ywwg) wrote :

Uh, *wow*, that works as advertised. Rescan speed is about 3-4 seconds (used to be a few minutes).

The only thing I notice is that before, even if the scan was canceled we would mark the directories as verified (because that marking was occurring as the scan proceeds). Now, we only mark things as verified if the scan finishes cleanly. Is this an ok change?

RJ Skerry-Ryan (rryan)
Changed in mixxx:
milestone: none → 1.10.1
Revision history for this message
Ben Clark (bencoder) wrote :

Hi Owen,

I can't see that being an issue. I don't think we use the verified flag for anything except determining which files need to be deleted, which we don't do except on a fully completed scan anyway, so I don't think there's a difference between leaving them fully unverified vs. partially verified on an incomplete scan.

Revision history for this message
RJ Skerry-Ryan (rryan) wrote :

I agree I don't think any harm can come from not confirming the files/directories. The logic that actually marks them missing/moved/deleted only runs if the scan wasn't cancelled.

Thanks for the patch Ben! I'll review it and get it in this weekend. It should be in Mixxx 1.10.1 (if we end up doing a 1.10.x point release) and Mixxx 1.11.0.

Is it ok to credit you as "Ben Clark"?

Also, please sign our contributor agreement here: https://docs.google.com/a/mixxx.org/spreadsheet/viewform?formkey=dEpYN2NkVEFnWWQzbkFfM0ZYYUZ5X2c6MQ#gid=0

This gives us permission to distribute your change with MIxxx.

Revision history for this message
RJ Skerry-Ryan (rryan) wrote :

The patch looks good to me -- here are my comments:

1) It's fine to keep the emit() for the hash progress where it was previously. I think even though it's a little bit of a fib it's better to have the progress dialog constantly updating so you don't think it's slow / hung.

2) The two methods updateDirectoryStatus and markTracksInDirectoryAsVerified were both created specifically for this part of the scanner and aren't used anywhere else. It would be even faster if you updated these two methods so that they take the whole list of directories and just issue one big update query instead of doing 1 query for each directory.

Revision history for this message
Daniel Schürmann (daschuer) wrote :

Here are my test results on system from #2

Initial Scan
unpatched patched
142 s 144 s

Rescan
unpatched patched
24 s 8 s

Congratulation!
Initial Scan is slightly longer due to the patch, but its only a one time issue.

Changed in mixxx:
milestone: 1.10.1 → none
RJ Skerry-Ryan (rryan)
Changed in mixxx:
status: In Progress → Fix Committed
Revision history for this message
RJ Skerry-Ryan (rryan) wrote :

Thanks Ben -- committed to lp:mixxx/1.10

RJ Skerry-Ryan (rryan)
Changed in mixxx:
status: Fix Committed → Fix Released
Revision history for this message
Swiftb0y (swiftb0y) wrote :

Mixxx now uses GitHub for bug tracking. This bug has been migrated to:
https://github.com/mixxxdj/mixxx/issues/6334

lock status: Metadata changes locked and limited to project staff
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.