duplicate titles when using Symlinks in Library

Bug #1451778 reported by -MK-
14
This bug affects 3 people
Affects Status Importance Assigned to Milestone
Mixxx
Confirmed
Medium
Unassigned

Bug Description

I'm using build branch-1.12 r5411.

I have symlinked /home/user/Music/ to /mnt/data/Music.
I've set /home/user/Music/SomeFolder as my LibraryRoot.

Some titles that I play in Mixxx 1.12 will be inserted a second time to my library with the path /mnt/data/Music/...
I can't at the moment reproduce when excactly this happens, but I now have multiple files two times in my library, once with path /home/... and once with /mnt/...
the one from /mnt/... is the newer one and mostly has some additional metadata (as album-artist) set. Maybe this could hint in the right direction.

Sometimes I noticed that titles will be analyzed when loading to a deck even if they have bmp and key attached. This could be issued by that.

Cheers,
Markus

Revision history for this message
Uwe Klotz (uklotzde-deactivatedaccount) wrote :

I see many invocations of QFileInfo::canonicalFilePath() in the code base which replaces symbolic links in the file path. But QFileInfo::absoluteFilePath() is also used which preserves symbolic links.

What is missing is a common function that normalizes file paths consistently instead of calling different Qt functions that are scattered all around.

Revision history for this message
Uwe Klotz (uklotzde-deactivatedaccount) wrote :

Using QFileInfo::canonicalFilePath() is basically correct, but some post-processing is needed. The prefix of the canonical file path has to be matched against the canonical file path of the configured music directories. If a match is found it has to be replaced by QFileInfo::absoluteFilePath() of that specific music directory which might contain a symbolic link.

Note: Music directories must be referenced by their absolute file path! Their canonical file path may vary, i.e. if the target directory of a symbolic link is changed in the file system.

Precondition: The canonical file paths of all music directories must be disjunct, i.e. none is the prefix of any other. This restriction should be checked not only when adding new music directories, but also on every startup of Mixxx! The canonical file paths may vary even while Mixxx is running, but that is definitely out of scope.

Revision history for this message
Uwe Klotz (uklotzde-deactivatedaccount) wrote :

Please note that my previous proposal does not account for symbolic inside/underneath a music directory. This would require even more post-processing, not only of the prefix (= reference of music directory = absolute path of music directory) but also of the suffix (= relative path inside the referenced music directory).

Revision history for this message
Uwe Klotz (uklotzde-deactivatedaccount) wrote :

Simple rule: QFileInfo::canonicalFilePath() is needed for normalization purposes. But since canonical file paths may vary (-> symbolic links) they should never be stored persistently as a reference to any file or directory!

Revision history for this message
Uwe Klotz (uklotzde-deactivatedaccount) wrote :

I will try to find a solution that is both robust and future proof. But definitely out of scope for 1.12.

Changed in mixxx:
status: New → Confirmed
assignee: nobody → Uwe Klotz (uklotzde)
Revision history for this message
Uwe Klotz (uklotzde-deactivatedaccount) wrote :

I was able to reproduce this behavior! It happens when you open a track from the library through the browse view. This track is then stored in the library without symlinks in the file path, i.e. the "canonical" file path.

A quick fix for 1.12 might be possible.

Changed in mixxx:
status: Confirmed → In Progress
Revision history for this message
Uwe Klotz (uklotzde-deactivatedaccount) wrote :

I've found a pragmatic solution:
- basically use the absolute path of files and directories
- normalize it by using QDir::cleanPath()
- use the canonical path only for security tokens

This strategy preserves symlinks, while redundant parts of the path are removed. The corresponding static functions will be located in MFile and MDir. All invocations of QDir::absolutePath(), QFileInfo::absolutePath(), and QFileInfo::absoluteFilePath() will be replace by MDir::normalizedPath(), MFile::normalizedPath(), and MFile::normalizedFilePath() respectively.

PR will follow shortly. I've prepared a branch for 1.12, but rebasing it on master would also be possible.

Revision history for this message
Uwe Klotz (uklotzde-deactivatedaccount) wrote :
RJ Skerry-Ryan (rryan)
Changed in mixxx:
milestone: none → 1.12.0
importance: Undecided → Medium
Revision history for this message
Uwe Klotz (uklotzde-deactivatedaccount) wrote :

A short note about the scope and limitations of this fix:

There is no efficient way to reliably detect if an arbitrary file is already contained in the library if the path may contain symbolic links. The same file can be referenced by any number of paths. It is not possible to determine all paths to a single file.

For the decision if two files are the same their canonical paths have to be compared. But the canonical path of a file may change over time and cannot be stored and indexed in the database as I already mentioned. A full scan of the whole library to determine the current canonical path of each file for comparison is definitely not an option.

If you browse and open a file through the same path that is referenced in the library, no duplicates should be created with this fix applied. Browsing and opening the file through a different path will still add duplicate entries to the library.

Revision history for this message
Daniel Schürmann (daschuer) wrote :

The only thing we can do is to compare files with equal or Artist and Title.

Like this:
* User picks a track from outside the library path
* Find tracks with same Artist / Title
* compare canonical path
* Use the library path for the folowing operations if the canonical path matches.

Revision history for this message
-MK- (mk42) wrote :

Sorry for not giving feedback till now, didn't get notifications from Launchpad.

Thanks for your fix. But I think there must be another way then using the "Browse-View" since I never used it with mixxx.

Is there any possibility to clean up my library now?

Revision history for this message
Daniel Schürmann (daschuer) wrote :

Did you use drag and drop from a file manager?

It looks like your original issue is still open.

To clean up your DB can Hide the duplicates and then Purge them from the hidden track view.
Unfortunately your will loose the history of these tracks.

If you wish to start with a fresh database, you may rename or delete
~/.mixxx/mixxxdb.sqlite

Revision history for this message
-MK- (mk42) wrote :

No, I don't do that normally - means I can't remember that i've done that.
I think its something else.

I will try to reproduce this.

Revision history for this message
-MK- (mk42) wrote :

I got it. I looked over the files and tryed to remember.

It happens when you drag and drop such a file from Library to a Deck (maybe in some other circumstances as well). Then a file with the path outside of Music-Directory (which is a symlinked path) will be added to the library

Revision history for this message
-MK- (mk42) wrote :

Happens also when dragging to preview-Deck

Revision history for this message
Uwe Klotz (uklotzde-deactivatedaccount) wrote :

I experienced this issue one again when analyzing a track that has not yet been played. Unfortunately I was not able to reproduce this error.

File system access indeed needs to be reworked by avoiding to directly invoke filePath()/absoluteFilePath()/relativeFilePath() from the various Qt classes.

Revision history for this message
-MK- (mk42) wrote :

Maybe a dumb question, but isn't double-clicking and context-menu -> load to deck X doing it 'right'?
When I use those ways to load tracks i won't get duplicate tracks.
Wouldn't it be possible to use the way tracks (and paths) are loaded for now until it is reworked as you said?

Revision history for this message
-MK- (mk42) wrote :

I've just tested it.

For me replacing canonicalFilePath with absoluteFilePath in the onDrop functions works. Should I file a pull request for those changes?

Revision history for this message
Uwe Klotz (uklotzde-deactivatedaccount) wrote :

I was unsure if I should change the onDrop functions. Of course, another pull request for those functions (-> 1.12) makes sense, if you've tested it.

Revision history for this message
Uwe Klotz (uklotzde-deactivatedaccount) wrote :

Another fix for the duplicate tracks issue:
https://github.com/mixxxdj/mixxx/pull/814

Revision history for this message
Jean Claveau (jean-claveau-g) wrote :

There is question I think about for a while now :
I have a lot of duplicates in my library due to the fact that my music is stored on an external hdd and the default location of Ubuntu can change whether or not it's plugged during boot.

More, I often move my tracks from a folder to another as I store them by genre and split my folders in subgenres when they are too big.

I need their location to be well tidied to use them on the CDJs of my friends with a USB pen.

So my question is : why don't we use Chromaprint's fingerprints to find duplicates and ?

I don't know if the idea has already been discussed or not. Tell me if I should file a new bug.

Revision history for this message
Uwe Klotz (uklotzde-deactivatedaccount) wrote :

If you store your music on external storage (like I do) use symbolic links for your music library. This ensures that your files always appear at the same location, independent of where the volume is actually mounted. Example: I use <home directory>/Music/Collection

Each track in Mixxx has a "location" that references the corresponding file. Track locations are normalized by QFileInfo::absoluteFilePath(). This value that is stored in the library persistently.

Track locations might contain symbolic links. Different locations can reference the same file, this is what you noticed and might lead to duplicates. At runtime we additionally use QFileInfo::canonicalFilePath() to determine the actual file path. But this path can change dynamically (depending on mount points) so we must not store it in the library.

See also my latest PR that encapsulates those concepts in TrackRef:
https://github.com/mixxxdj/mixxx/pull/901

The chroma fingerprint (as any other "hash" value) is not guaranteed to be unique. We should think about using it for relocating moved files and finding duplicates, nice idea. But it is not a replacement for the track's location that we need to store in the library.

Revision history for this message
RJ Skerry-Ryan (rryan) wrote :

Uwe, is this fixed?

ronso0 (ronso0)
Changed in mixxx:
assignee: Uwe Klotz (uklotzde) → nobody
milestone: 2.0.0 → none
status: In Progress → Confirmed
Revision history for this message
Uwe Klotz (uklotzde-deactivatedaccount) wrote :

Related: Since canonical file paths might not be unique within the library we need to account for this fact when caching tracks.

https://github.com/mixxxdj/mixxx/pull/3027

Revision history for this message
Swiftb0y (swiftb0y) wrote :

Mixxx now uses GitHub for bug tracking. This bug has been migrated to:
https://github.com/mixxxdj/mixxx/issues/8009

lock status: Metadata changes locked and limited to project staff
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.