possible memory corruption with "update metadata" active

Bug #728197 reported by Owen Williams on 2011-03-03
This bug report is a duplicate of:  Bug #1181869: Tagging Files. Edit Remove
34
This bug affects 7 people
Affects Status Importance Assigned to Milestone
Mixxx
Critical
Uwe Klotz
1.10
Critical
RAFFI TEA
1.9
Critical
RAFFI TEA

Bug Description

soundsourcemp3 uses QFile.map for fast access to an mp3 file for playback. This might be causing a problem when paired with the UI option to update track metadata. The doc for mmap says:

"If the size of the mapped file changes after the call to mmap() as a result of some other operation on the mapped file, the effect of references to portions of the mapped region that correspond to added or removed portions of the file is unspecified."

I have had (hard-to-reproduce) experiences where a track's audio has suddenly gotten corrupted, with squawks and bleeps of bad data playing back (total party-killer). Scrubbing has revealed that the bad sound is in the track, not the playback. (I can scratch the awful noises). Reloading the track into the player fixes the problem. I suspect that this is a result of the audiotagger writing to the file while it's still being mapped.

This is a tough one to fix because trackdao doesn't have an easy way to know which tracks are loaded. For now I did this by adding a flag to the trackinfoobject that is set if it is loaded, but a better way would be some function that trackdao could call to ask the players if this track is loaded. This could be implemented in many various hacky ways, so a good way needs to be chosen.

RJ Ryan (rryan) on 2011-03-07
Changed in mixxx:
importance: Undecided → High
RJ Ryan (rryan) on 2011-09-30
Changed in mixxx:
status: New → Confirmed
RJ Ryan (rryan) wrote :

Anything capable of stopping hte party, however, rare, is Critical :)

Changed in mixxx:
importance: High → Critical
RJ Ryan (rryan) wrote :

I agree this is tricky. Adding the 'loaded' bit to a TIO is not good enough because there isn't a 1 to 1 track to loaded SoundSource mapping. You could have opened the same track from the iTunes library and the Mixxx library so there are two TIOs for the loaded soundsource. There isn't even a one SoundSource per file restriction. Two decks playing the same track have two separate mmaps of that file.

So we need a way to delay the metadata updates until the location is not present in any deck. We could keep a global list of all currently active SoundSources.

Owen Williams (ywwg) wrote :

If we're not going to fix this bug for 1.10.0, I would strongly suggest disabling the metadata update feature entirely. I got memory corruption fairly regularly on my machine, so I wouldn't call it "rare". This might be the type of thing that doesn't really pop up on some machines at all, but on mine it was an issue.

RJ Ryan (rryan) wrote :

My bad -- didn't realize it happened so often and it was late at night so I wasn't thinking straight :).

Maybe disabling it is a good idea .. .or maybe we can delay the updates to shutdown or something.

Owen Williams (ywwg) wrote :

Would it be a total hack to ask QT/the OS if the file is open for use, and only update the tag if it isn't? That would be a way around all of the corner cases you mentioned. Here's one option I found: http://doc.trolltech.com/solutions/3/qtlockedfile/qtlockedfile.html

RAFFI TEA (raffitea) wrote :

+1 to disable the feature temporarily.

Just make sure you hide the select box with QT designer. Moreover, you need make sure that the config has 'update metadata' disabled.

RAFFI TEA (raffitea) wrote :

I've deactivated the feature for safety reasons. As RJ stated in comment #3 fixing this bug is not easy. We need a kind of map that maps filenames to TrackInfoObjects to prevent creating more than one TIO for a particular loaded file.

Changed in mixxx:
status: Confirmed → Fix Committed
assignee: nobody → RAFFI TEA (raffitea)
RJ Ryan (rryan) on 2011-12-25
Changed in mixxx:
status: Fix Committed → Fix Released
Changed in mixxx:
assignee: RAFFI TEA (raffitea) → Daniel Schürmann (daschuer)
status: Fix Released → In Progress
importance: Critical → Undecided
Daniel Schürmann (daschuer) wrote :

I have reopen the bug because of discussions on
http://sourceforge.net/mailarchive/forum.php?thread_name=4F78A7E4.1010302%40gmx.de&forum_name=mixxx-devel

I am currently working on a proposal of reusing the external clementine-tagreader process.
I'm curious if this will be successfully.

Daniel Schürmann (daschuer) wrote :

The Link from #9 was somehow incomplete, here is the rest:
http://sourceforge.net/mailarchive/message.php?msg_id=29071506

investigations:
QFile::map():
* On Windows XP, the file is automatically locked against writes from other processes when it is mapped by one process.
* On Unix (man mmap), "It is unspecified whether changes made to the file after the mmap() call are visible in the mapped region"
** Its likely that we get a seek offset when mapping a new page after extending a tag stored in front of the audio stream.
* see tfile.cpp File::insert()

Warning in taglib tfile.h:
"On UNIX multiple processes are able to write to the same file at the same time. This can result in serious file corruption.
If you are developing a program that makes use of TagLib from multiple processes you must insure that you are only doing writes to a particular file from one of them."

On Mac/Linux the traditional thing to do would be:
1) Copy the file to a temporary file
2) Write tags to the copied file
3) Unlink the original file (Mixxx will still have that one mmap'd)
4) Move the temporary file to where the old file was

On Windows I'm not sure what we can do since the file would be locked
anyway. Make a temporary file and keep a queue of temporary files to move
to final locations once they are unloaded by Mixxx?

2012/4/24 Daniel Schürmann <email address hidden>

> The Link from #9 was somehow incomplete, here is the rest:
> http://sourceforge.net/mailarchive/message.php?msg_id=29071506
>
> investigations:
> QFile::map():
> * On Windows XP, the file is automatically locked against writes from
> other processes when it is mapped by one process.
> * On Unix (man mmap), "It is unspecified whether changes made to the file
> after the mmap() call are visible in the mapped region"
> ** Its likely that we get a seek offset when mapping a new page after
> extending a tag stored in front of the audio stream.
> * see tfile.cpp File::insert()
>
> Warning in taglib tfile.h:
> "On UNIX multiple processes are able to write to the same file at the same
> time. This can result in serious file corruption.
> If you are developing a program that makes use of TagLib from multiple
> processes you must insure that you are only doing writes to a particular
> file from one of them."
>
> --
> You received this bug notification because you are a member of Mixxx
> Development Team, which is subscribed to Mixxx.
> https://bugs.launchpad.net/bugs/728197
>
> Title:
> possible memory corruption with "update metadata" active
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/mixxx/+bug/728197/+subscriptions
>

Daniel Schürmann (daschuer) wrote :

Hi RJ,

I don't like the idea to copy the file around just. This did not entirely solve the problem because we have a write back anyway.
Such a solution will likely introduce other issues.

A suitable solution to me is:
* use an external process for taglib (based on clementine tagreader)
* use an explicit file lock for all file operations (based on http://libqxt.bitbucket.org/doc/tip/qxtfilelock.html)

Advantages:
* a own process reduces the risk of corrupting the track by a Mixxx error elsewhere.
* A crash due to a corrupt file does not effect Mixxx itself. This was reported by a Clementine developer, can be found by Google and I might remember that we had a mixxx bug like that.
* File locks will solve the seek offset problem on XP entirely and with all cooperative processes on Linux and Mac OS
* Writing Tags is as fast as possible especially if there is no need to grow the tag region of the file.

So there is minimum rest risk if a user users uses a external tag writer writing a track while it is currently mmaped in Mixxx (playing) .

Currently the clementine-tagreader works in my trunk experimental. If I find time I will work on unit tests on that.

Regards,

Daniel

Owen Williams (ywwg) wrote :

I'm a little confused, have we abandoned the idea of waiting until program exit to update metadata? We wouldn't need to make any other changes (external tag writers, locking mechanisms) with this solution.

Daniel Schürmann (daschuer) wrote :

Hi Owen,

as discussed on mixxx-devel list, writing at Mixxx shutdown is only the second-best solution.

My current favourite is the solution in #12. It might needs little more effort but is likely that what the user expect.
... and we could benefit from the clementine solution for mass tagging and cover arts
... and I have a already a working prototype

Are there additional obstacles?

Regards,

Daniel

Daniel Schürmann (daschuer) wrote :

Additional Investigations:
Rythmbox:
* has en external tagreader process called "rhythmbox-metadata"
* It communicates via DBus
* uses gstreamer for tag writing
* the new tags a written to a new file
* new file is checked if tags are readable
* file is moved back to original location after a simple check if the new tags are readable

Pickard (Musicbrains)
* Uses phyton mutagen as taging library
* In place editing
* If possible taging area is enlarged by mmap.move
* If that fails, file io is used with a file lock

Daniel Schürmann (daschuer) wrote :

I have checked the behavior of map function win WinXP and Linux. Here are my results:

Writing a mapped file from an external process:
* WinXP local -> file is locked
* WinXP net -> mapped memory is updated
* Linux -> mapped memory is updated

Reading from a deleted file previously still mapped:
* WinXP local + net -> not able to delete
* Linux -> File is deleted, but the handle is still valid, the disk space is reported occupied until the file is unmapped (no overwriting is possible)

Reading mapped from a physical removed file (Memory stick)
* WinXP -> Crash when reading from a still unmapped page (Segfault)
* Linux -> Crash when reading from a still unmapped page (Bus Error)

Conclusion:
* My concerns from #12 against "copy write" in Linux a re invalid. This seems to be the best solution to achieve a save write in Linux.
* In Win Xp a solution requires to wait until the file is unmapped (unlocked) from other processes.
* I will prepare a solution, still using the clementine tagreader but patched with the copy write addition templating from Rhythmbox.
* On Medias without space for a copy, tag writing is not possible, because its not safe.

Daniel Schürmann (daschuer) wrote :

The SaveFile solution is not as easy as it seams.
glib has included a complete solution see gio/glocalfileoutputstream.c in function "handle_overwrite_open".
Qt seems to have nothing similar, see:
http://lists.qt-project.org/pipermail/development/2012-January/thread.html#1262
KDE uses KSaveFile:
http://herschel.dsf.unica.it/cgi-bin/dwww/usr/share/doc/kde/HTML/en/kdelibs-apidocs/kdecore/html/ksavefile_8cpp_source.html
It seems that glib covers additional corner cases, but the KDE solution should work for us.

Does anyone have a port to pure Qt? Or additional hints?
If not I would just copy the required files to lib/kdecore
(.. this is only a temporary solution until we have complete GIO support see Bug #918222 :-) )

Daniel Schürmann (daschuer) wrote :

Status Update (slow progress) :

QSaveFile is now part of Qt 5.
I have a version with the Clementine Tagreader working in my trunk in my trunk but there are license issues because of GPL 2 and 3 incompatibility.
The Clementine Tagreader does not perform atomic file writes, so it is only a medium god solution anyway.

Todo:
- Port my changes to QSavefile to the Qt 5 version
- Write our own tag reader process
- Use it for tag read and write
( we had resonantly a report about a segfault in taglib Bug #1166267)

RJ Ryan (rryan) on 2013-05-10
Changed in mixxx:
importance: Undecided → Critical

Is there any progress on this? What is the status of this bug in 1.11 ?

Daniel Schürmann (daschuer) wrote :

I am sorry, there is currently but no progress. It is unlikely that this becomes part of 1.12.
Do you like to pick it up?

ewan colsell (ewanuno) wrote :

is it possible to re-enable tag writing at build time?
i really need to write tags to my mp3s

am i correct in thinging that if i don't load any files into decks i won't run into this bug?

Daniel Schürmann (daschuer) wrote :

Yes, it "might" still work.

Uwe Klotz (uklotzde) wrote :

My proposal for an in-process solution:
- Use QSaveFile (Qt 5) whenever writing to files
   - Not essential, but a nice safety feature
- Implement caching of TIOs by their absolute file path to ensure that there is only a single TIO for each file at any time
   - TIOs are only created by the cache
   - The file path of a TIO must only be modified through the cache
   - The cache is responsible for deleting the TIO after all references to it have been dropped
- SoundSourceProxy: Always construct with a TIO and store both (smart) pointers in a TrackAudioSource wrapper object
   - Delete constructor SoundSourceProxy(QString qFilename, ...)
   - SoundSourceProxy::openAudioSource() returns an AudioSourcePointer that actually points to a TrackAudioSource wrapper object
   - TrackAudioSource
      - Implements the AudioSource API by delegating to the actual AudioSource
      -1st member: TrackPointer
      - 2nd member: AudioSourcePointer
      -> AudioSource is closed/destroyed before TIO when TrackAudioSource is destroyed

Consequences: If a TIO is finally destroyed by the TIO cache there will be no more open AudioSources reading from the file. The file can now safely be modified before the TIO object is actually deleted.

Uwe Klotz (uklotzde) wrote :

If metadata writing is only enabled for tracks from the Mixxx library with a valid ID the TIO cache might not be needed in the first place.

Even the TIO cache will not be able to detect all duplicate tracks that refer to the same file, but have different file paths.

Daniel Schürmann (daschuer) wrote :

I will free this up.
Here my current work:
A Qt4 Savefile version
https://github.com/daschuer/mixxx/tree/daschuers_trunk/lib/kdelibs/libkdeqt5staging/src
Protobuf interface to celemtine tag reader process:
https://github.com/daschuer/mixxx/blob/daschuers_trunk/src/trackinfoobject.cpp

Changed in mixxx:
assignee: Daniel Schürmann (daschuer) → nobody
Daniel Schürmann (daschuer) wrote :

The jumping point here is: Do we need a quarantine process for tag writing?
Tag reading seams be become stable int the latest taglib version. I think we have no taglib crash report lately.
How will be the writing stability?

The single File access point solution sound reasonable.
The only issue can be that a file is not accessible if the cache starts the update, but a user requests to load the file at the same time.

> - The file path of a TIO must only be modified through the cache

What does this mean? Why we need to modify the path of a cached file?

> If metadata writing is only enabled for tracks from the Mixxx library with a valid ID the TIO cache might not be needed in the first place.
> Even the TIO cache will not be able to detect all duplicate tracks that refer to the same file, but have different file paths.

Do you have an example?

Uwe Klotz (uklotzde) wrote :

I've implemented a thread-safe TrackPointer cache that replaces the weak-reference cache in TrackDAO. All TIOs in Mixxx must be allocated through this cache that is instantiated as a singleton. When the last reference to a TIO is dropped modified metadata could safely be written back into the file.

The cache is implemented on top of the following PR:
https://github.com/mixxxdj/mixxx/pull/592

I'm successfully using this code for managing a huge music library. In my use case the metadata in the audio files is the master and the Mixxx library is only a big and fast cache with additional information. I even prefer to reload (possibly modified) metadata from the files each time a track is (re-)loaded into the cache. Both writing metadata back into files and automatically reloading metadata from files can easily be configured.

It's experimental work that I prefer to hold back until the prerequisite PRs are integrated.

Uwe Klotz (uklotzde) on 2015-08-04
Changed in mixxx:
assignee: nobody → Uwe Klotz (uklotzde)
Uwe Klotz (uklotzde) wrote :

The following PR finally re-enables tag writing:
https://github.com/mixxxdj/mixxx/pull/592

Uwe Klotz (uklotzde) wrote :
Uwe Klotz (uklotzde) on 2016-05-13
tags: added: metadata
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers