Initial library import slows to a crawl

Bug #1286944 reported by alex decker on 2014-03-03
This bug affects 3 people
Affects: Noise
Status: Fix Released
Importance: Medium
Assigned to: Sergey "Shnatsel" Davidoff
Milestone: freya-beta2

Bug Description

Running a current elementaryOS 64bit, recent install, up-to-date.
AMD Turion™ X2 Dual-Core Mobile Processor, 2 GHz
3 GB RAM

I've got a Music folder with about 20,000 mp3s organized in nested Artist > Album folders.

I don't see any errors, aside from the UI becoming unresponsive at times, but the initial import from the Music folder takes progressively longer as it goes on. As a result I have yet to fully import all of my music into Noise.

The UI counts 1, then +200 files during the import, so: 1, 201, 401, 601, etc.

As the number of files imported increases, the time between UI updates increases. It takes about 4 hrs on my system to import about 8000 songs. It then takes another 4 hrs to import the next 2000 (total: 10000). At this rate, it'd take, what? A few days to import all 20000?

What I've tried is to force quit, restart Noise, and re-scan, but this doesn't seem to go anywhere at all. I've also tried deleting the library database and re-importing, but I end up with the same slow-death result.

When I do restart Noise after force-quitting an import, all of the imported tracks are there and work well, but obviously I'm missing a large amount of my library. Is this maybe a database issue?


Daniel Fore (danrabbit) wrote :

OP, can you try again with the latest version? There have been some changes to this functionality.

Changed in noise:
status: New → Incomplete
Changed in noise:
status: Incomplete → Confirmed
status: Confirmed → In Progress
assignee: nobody → Sergey "Shnatsel" Davidoff (shnatsel)

Thanks for reporting the bug!

Turns out this was an issue with deduplication: it was deduplicating by scanning a linked list (a linear search for every incoming file, so quadratic time overall) instead of using a dedicated auto-deduplicating data structure such as a hash set or tree set.

I've switched the data structure for the files to import to a TreeSet, which should fix the issue. I'm still testing whether it did, because importing 35,000 tracks with the old implementation is indeed slow!
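Noise itself is written in Vala using libgee, so the following is only a language-neutral sketch (Python, with hypothetical function names, not the actual Noise code) of why the data-structure swap matters: a linked-list scan pays a linear membership check for every file, while a set answers the same question in constant (hash set) or logarithmic (tree set, like Gee.TreeSet) time.

```python
# Hypothetical sketch of the two deduplication strategies.
# Not Noise's actual code; names are illustrative only.

def dedup_list_scan(paths):
    """O(n^2) overall: membership is checked by walking a list."""
    seen = []
    for p in paths:
        if p not in seen:       # linear scan over everything seen so far
            seen.append(p)
    return seen

def dedup_set(paths):
    """O(n) overall with a hash set; a tree set (as in the fix)
    would be O(n log n). Either avoids the quadratic blowup."""
    seen = set()
    out = []
    for p in paths:
        if p not in seen:       # constant-time membership check
            seen.add(p)
            out.append(p)
    return out
```

Both produce identical results; only the cost differs, which matches the reported symptom that each additional batch of files took longer than the previous one.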

Changed in noise:
importance: Undecided → Medium
Changed in noise:
status: In Progress → Fix Committed

I also did an import under sysprof, and now most of the time is spent in mpegaudioparse2 and in wcsxfrm called from it, so there's not much more we can do to speed up the import.

tags: added: performance
Daniel Fore (danrabbit) on 2014-10-05
Changed in noise:
milestone: none → freya-beta2
Daniel Fore (danrabbit) on 2015-02-09
Changed in noise:
status: Fix Committed → Fix Released
Felipe Escoto (philip.scott) wrote :

Did this bug fix really get merged? Last night when I was importing my library, it took about an hour to import ~2500 songs.

Joren Verspeurt (musicide) wrote :

In 0.3.1, when I change my music folder, the interface freezes from the moment I click "Open" in the file dialog. While the import does seem to stop at some point (Noise's CPU usage goes down after 10–20 minutes), the interface remains frozen and unusable. When I force-kill it and start it up again, it's as if the import didn't happen...
Also, gdb gives me this as a fatal error: [GLib-GObject] Read-only property 'read-only-view' on class 'GeeReadOnlyBidirSortedSet' has type 'GeeSortedSet' which is not equal to or more restrictive than the type 'GeeBidirSortedSet' of the property on the interface 'GeeBidirSortedSet'
Is that related? By the way, I also get the "FileUtils.vala:229: Could not pre-scan music folder. Progress percentage may be off: Permission denied" warning, which I think already has two duplicate bugs filed about it.

James Henstridge (jamesh) wrote :

I just tried getting noise 0.3.1+r1909+pkg95~daily~ubuntu0.4.1 to scan my 24 GB music collection, and it has also slowed to a crawl, currently a bit below the 8 GB mark. Two things I've noticed:

1. I can hear disk activity each time it says it has scanned another song. It would probably help to batch the inserts into transactions (e.g. only commit the changes every N seconds, or every N files during the scan).

2. When I checked the schema of the sqlite3 database, there were no indexes. I'm guessing there is a SELECT query being done before inserting each file that results in a full table scan. This would be consistent with the process slowing down as the database gets larger. If you can add an index that can be used by this query (or queries), it will likely make a huge performance improvement.

The second is probably the easier fix for getting rid of the slowdown. The first would probably only make a noticeable difference after the second is in place.
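Both suggestions can be sketched together. This is a minimal illustration, not Noise's actual schema or code: the table name `media`, the column `uri`, and the batch size are all hypothetical. The index serves the per-file existence check so it no longer scans the whole table, and commits are batched so the disk syncs once per N files rather than once per file.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the library database
conn.execute("CREATE TABLE media (uri TEXT, title TEXT)")

# Suggestion 2: an index on the looked-up column turns the pre-insert
# SELECT from a full table scan into an O(log n) index lookup.
conn.execute("CREATE INDEX idx_media_uri ON media (uri)")

BATCH = 200  # Suggestion 1: commit every N files, not every file

def import_files(files):
    pending = 0
    for uri, title in files:
        exists = conn.execute(
            "SELECT 1 FROM media WHERE uri = ?", (uri,)).fetchone()
        if exists is None:
            conn.execute("INSERT INTO media (uri, title) VALUES (?, ?)",
                         (uri, title))
        pending += 1
        if pending % BATCH == 0:
            conn.commit()       # one sync per batch during the scan
    conn.commit()               # flush the final partial batch

import_files([("file:///a.mp3", "A"), ("file:///b.mp3", "B"),
              ("file:///a.mp3", "A")])
print(conn.execute("SELECT COUNT(*) FROM media").fetchone()[0])  # → 2
```

The duplicate `a.mp3` is skipped by the indexed lookup, so only two rows are stored. With an on-disk database, the batched commits are what remove the per-song disk activity described above.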

Tobias Paar (eldarion) wrote :

Am I the only one still experiencing this extremely slow import process, even with all updates installed?
