flac tracks getting corrupt on disk -- write metadata bug?

Bug #1815305 reported by Owen Williams on 2019-02-09
This bug affects 1 person
Affects      Status  Importance  Assigned to  Milestone
Mixxx 2.1            Critical    Uwe Klotz
Mixxx 2.2            Critical    Uwe Klotz

Bug Description

I'm having problems with FLAC tracks becoming corrupted after playing them in Mixxx. This has now happened to two tracks that were definitely working before. I suspect it's related to the write-track-metadata feature in Mixxx.

What happens is the track just cuts to silence at a specific point in the track (where the corruption is I guess?). If I try to transcode the flac file to mp3, there's a small gap in the audio where the corruption exists in the flac.

Here's the error message I get:
Warning [CachingReaderWorker 1]: SoundSourceFLAC - FLAC decoding error "STREAM_DECODER_ERROR_STATUS_LOST_SYNC" in file "/music/ogg/00MIXING/BeatPort/Relief/Green Velvet, Harvard Bass/Green Velvet, Harvard Bass - Lazer Beams (Original Mix).flac"
Warning [CachingReaderWorker 1]: SoundSourceFLAC - FLAC decoding error "STREAM_DECODER_ERROR_STATUS_LOST_SYNC" in file "/music/ogg/00MIXING/BeatPort/Relief/Green Velvet, Harvard Bass/Green Velvet, Harvard Bass - Lazer Beams (Original Mix).flac"
Warning [CachingReaderWorker 1]: SoundSourceFLAC - FLAC decoding error "STREAM_DECODER_ERROR_STATUS_BAD_HEADER" in file "/music/ogg/00MIXING/BeatPort/Relief/Green Velvet, Harvard Bass/Green Velvet, Harvard Bass - Lazer Beams (Original Mix).flac"
Warning [CachingReaderWorker 1]: SoundSourceFLAC - FLAC decoding error "STREAM_DECODER_ERROR_STATUS_LOST_SYNC" in file "/music/ogg/00MIXING/BeatPort/Relief/Green Velvet, Harvard Bass/Green Velvet, Harvard Bass - Lazer Beams (Original Mix).flac"
Warning [CachingReaderWorker 1]: SoundSourceFLAC - FLAC decoding error "STREAM_DECODER_ERROR_STATUS_BAD_HEADER" in file "/music/ogg/00MIXING/BeatPort/Relief/Green Velvet, Harvard Bass/Green Velvet, Harvard Bass - Lazer Beams (Original Mix).flac"
Warning [CachingReaderWorker 1]: SoundSourceFLAC - FLAC decoding error "STREAM_DECODER_ERROR_STATUS_LOST_SYNC" in file "/music/ogg/00MIXING/BeatPort/Relief/Green Velvet, Harvard Bass/Green Velvet, Harvard Bass - Lazer Beams (Original Mix).flac"
Warning [CachingReaderWorker 1]: SoundSourceFLAC - Unexpected frame index 2019328 > 1982464 while decoding FLAC file "/music/ogg/00MIXING/BeatPort/Relief/Green Velvet, Harvard Bass/Green Velvet, Harvard Bass - Lazer Beams (Original Mix).flac"
Warning [CachingReaderWorker 1]: CachingReaderWorker - Failed to read chunk samples for frame index range: actual = [1982464 -> 1982464) , expected = [1982464 -> 1990656)
Warning [CachingReaderWorker 1]: CachingReaderWorker - Readable frames in audio source reduced to [0 -> 1982464) from originally [0 -> 16730745)
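
The frame indices in these messages can be related to each other with a little arithmetic. The following is only a sketch: the numbers are copied from the log above, while the 44100 Hz sample rate and all variable names are assumptions that do not appear in the log.

```python
# Values copied from the log output above.
CHUNK_START, CHUNK_END = 1982464, 1990656   # expected chunk range
DECODER_INDEX = 2019328                     # "Unexpected frame index"
TOTAL_FRAMES = 16730745                     # original readable range

# Assumption: the sample rate is not stated anywhere in the log.
ASSUMED_SAMPLE_RATE = 44100

chunk_frames = CHUNK_END - CHUNK_START            # size of one cached chunk
overshoot = DECODER_INDEX - CHUNK_START           # how far the decoder jumped ahead
lost_fraction = 1 - CHUNK_START / TOTAL_FRAMES    # share of the track cut off
cut_seconds = CHUNK_START / ASSUMED_SAMPLE_RATE   # only valid for the assumed rate

print(f"chunk size: {chunk_frames} frames")
print(f"decoder resynchronised {overshoot} frames past the chunk start")
print(f"truncated part of the track: {lost_fraction:.0%}")
print(f"cut position at the assumed rate: {cut_seconds:.1f} s")
```

The chunk is 8192 frames long and starts exactly on a chunk boundary, and the decoder only found valid data again 36864 frames later, which is why the reader marked everything from frame 1982464 onwards (roughly 88% of the track) as unreadable.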

I'm turning off the feature now, and hopefully this will stop happening.

Uwe Klotz (uklotzde) wrote :

I have many FLAC files that constantly get updated by Mixxx, but none of them ever got corrupt.

Be (be.ing) wrote :

Have you run fsck on the filesystem? Might there be hardware damage? Can you identify how to reproduce this (how to make a track that is currently working not work anymore)?

Changed in mixxx:
importance: Undecided → Critical

Owen Williams (ywwg) wrote :

I'll see if I can pinpoint a cause, but for now I don't have a good theory. It's definitely files that I've played recently, though. Also, this is on the trunk version.

Owen Williams (ywwg) wrote :

I do see a reference to an orphan inode in /var/log/boot.log, although the entries don't have timestamps, so it's hard to know when it happened. But this is decent evidence for the file system or hardware damage theory.

But more interestingly, damage to the middle of the audio data doesn't seem to point to Mixxx as the cause. Writing metadata should cause some sort of problem at the ends of the files, right?

I'm going to keep this feature off for now because I have a gig coming up, but I will turn it back on and watch out for this. I also haven't had any problems with mixxx writing data recently, so if I can't reproduce after a while I'll close the bug. Thanks for the ideas!

Changed in mixxx:
status: New → Incomplete

Uwe Klotz (uklotzde) wrote :

I was also wondering why the corruption occurs in the middle of the stream. We don't know what actually happens if the padding in the metadata header doesn't provide enough free space while writing and TagLib needs to reorganize the contents of the file.

It might be a very special combination of prerequisites, i.e. encoder + file system + metadata + TagLib version + ... that leads to these issues.

Daniel Schürmann (daschuer) wrote :

If the metadata is at the beginning of the file and it needs to be enlarged, TagLib rewrites the file in place. It extends the file by the required space and then copies byte by byte from the original position to the new position, starting at the last bytes.

If this is interrupted, you get a file with a duplicated region somewhere in the middle. However, Mixxx guards against this by doing the write on a temporary copy of the file, as far as I remember. Am I still right? So the scenario would be that we treat a half-written file as finished and replace the original with it. Is this realistic?
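
A toy model of this in-place enlargement may help to explain the symptom. The function name, the `fail_after` parameter, and the byte values are hypothetical; this is not TagLib's actual code, only a sketch of a backward copy that stops part-way through.

```python
from typing import Optional


def grow_header_in_place(data: bytearray, header_len: int, extra: int,
                         fail_after: Optional[int] = None) -> None:
    """Shift data[header_len:] right by `extra` bytes, copying from the end.

    `fail_after` aborts the copy after that many bytes to mimic an
    interruption (crash, I/O error, full disk).
    """
    old_len = len(data)
    data.extend(b"\0" * extra)            # extend the "file" first
    copied = 0
    # Copy backwards so that bytes not yet moved are never overwritten.
    for src in range(old_len - 1, header_len - 1, -1):
        if fail_after is not None and copied == fail_after:
            return                        # simulated interruption
        data[src + extra] = data[src]
        copied += 1
    # The freed gap right after the header now holds the new metadata.
    data[header_len:header_len + extra] = b"P" * extra


complete = bytearray(b"HHabcdefgh")       # "HH" = header, rest = audio
grow_header_in_place(complete, header_len=2, extra=3)

partial = bytearray(b"HHabcdefgh")
grow_header_in_place(partial, header_len=2, extra=3, fail_after=4)

print(bytes(complete))
print(bytes(partial))
```

In the interrupted run only the last four audio bytes have been moved, so the result is `HHabcdefgefgh`: the header is unchanged and the bytes `efg` appear twice in the middle of the audio, which matches the "duplicated region" described above.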

Uwe Klotz (uklotzde) wrote :

We perform the TagLib save() operation on a temporary copy and replace the original file only after all operations have completed successfully. This is not very SSD-friendly (...thinking about how to avoid and disable it on demand...), but the safe route.

I will add a size check of the temporary copy and the original file before starting to save the tags for an additional level of safety. But we still need to rely on the consistency and reliability of the underlying file system and that TagLib saves metadata without corrupting files.
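
The copy, check, write, and replace sequence described here could look roughly like the following. This is a minimal sketch and not the Mixxx implementation: `save_tags_safely` and the `write_tags` callback are hypothetical names, and the callback stands in for the TagLib save() call.

```python
import os
import shutil
import tempfile


def save_tags_safely(path, write_tags):
    """Run write_tags(temp_path) on a copy and swap it in only on success."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the original so that the final
    # rename stays on the same file system.
    fd, temp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    os.close(fd)
    try:
        shutil.copyfile(path, temp_path)
        # Additional safety: the copy must be complete before tags are written.
        if os.path.getsize(temp_path) != os.path.getsize(path):
            raise OSError("temporary copy is incomplete")
        write_tags(temp_path)             # hypothetical tag-writing callback
        os.replace(temp_path, path)       # replace the original only now
    except BaseException:
        # Any failure leaves the original untouched; discard the copy.
        if os.path.exists(temp_path):
            os.remove(temp_path)
        raise
```

If any step fails, the exception propagates and the original file is never opened for writing. As noted above, this still relies on the file system and the tag writer behaving correctly.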

Uwe Klotz (uklotzde) wrote :

I have an (alarming) idea: If the temporary file could not be created successfully, then SafelyWritableFile will continue with the original file. The constructor takes the wrong path, because the member containing the original file name has already been initialized and is not reset upon failure!!

This failure might happen if the file system is (almost) full, so that the temporary copy cannot be created and TagLib is also unable to finish writing into the original file.

Unfortunately this bug also affects 2.1.
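
The failure mode described in this comment can be modelled in a few lines. This is a hypothetical reconstruction based only on the description above, not the actual SafelyWritableFile source: the class name, the `make_temp_copy` callback, and the `buggy` switch are invented for illustration.

```python
class SafelyWritableFileSketch:
    """Toy model of a writer that should only ever touch a temporary copy."""

    def __init__(self, orig_name, make_temp_copy, buggy=False):
        self.orig_name = orig_name        # initialised before the copy attempt
        self.temp_name = None
        try:
            self.temp_name = make_temp_copy(orig_name)
        except OSError:
            if not buggy:
                # Fix: reset the member so that nothing is written at all.
                self.orig_name = None

    def file_name(self):
        """File the tag writer will open, or None if writing must be skipped."""
        if self.temp_name is not None:
            return self.temp_name
        return self.orig_name


def failing_copy(name):
    raise OSError("no space left on device")


def working_copy(name):
    return name + ".tmp"


print(SafelyWritableFileSketch("a.flac", failing_copy, buggy=True).file_name())
print(SafelyWritableFileSketch("a.flac", failing_copy).file_name())
print(SafelyWritableFileSketch("a.flac", working_copy).file_name())
```

With `buggy=True` the failed copy goes unnoticed and the writer falls back to the original file name, which is exactly the path that should never be taken.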

Changed in mixxx:
status: Incomplete → Confirmed
assignee: nobody → Uwe Klotz (uklotzde)

Uwe Klotz (uklotzde) on 2019-02-10
Changed in mixxx:
status: Confirmed → In Progress

Owen Williams (ywwg) wrote :

Would it be helpful to upload the broken file somewhere? My disk is an SSD but it's not nearly full (126G free out of 1T).

Daniel Schürmann (daschuer) wrote :

Yes, please, ideally together with a backup of the original file if it still exists.

Owen Williams (ywwg) wrote :

My SSD is now having lots of problems (remounting read-only, fscking, etc.), so it's likely that my problems stem from that. But I'm glad we found that edge case too!

Uwe Klotz (uklotzde) wrote :

Phew, what an awful situation. Partial failure is always the worst case.

I still consider the bug we discovered critical, and it needs to be fixed ASAP. Thanks for reporting and alerting us ;)

Owen Williams (ywwg) wrote :

I got a new drive and transferred everything over. So far, so good.

Another thing I noticed during this failure: while only a single frame of the FLAC file was broken, the caching reader gives up and truncates the file at that point. In actuality, the rest of the file is still playable (I can seek past the breakage just fine). In a live performance, what happens is that the track hits the breakage and then just goes dead silent.

What would be better would be for the caching reader to mark the frame as bad, but recover if possible and cache the rest of the data. That way there would maybe be a brief dropout, but the rest of the track could keep going. It wouldn't be quite as much of a party stopper.
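
The difference between the two behaviours can be sketched as follows. This is not Mixxx's CachingReader: `read_all`, the `decode_chunk` callback, and the use of `ValueError` to signal a decoding error are all assumptions made for illustration.

```python
def read_all(decode_chunk, num_chunks, chunk_frames, skip_bad=False):
    """Decode all chunks and return (samples, indices_of_bad_chunks).

    With skip_bad=False the first decoding error truncates the stream.
    With skip_bad=True a bad chunk is replaced by silence and reading goes on.
    """
    samples = []
    bad_chunks = []
    for index in range(num_chunks):
        try:
            chunk = decode_chunk(index)
        except ValueError:                    # stand-in for a decoder error
            bad_chunks.append(index)
            if not skip_bad:
                break                         # truncate: the rest is never read
            chunk = [0.0] * chunk_frames      # brief dropout instead of a stop
        samples.extend(chunk)
    return samples, bad_chunks


def flaky_decoder(index):
    """Hypothetical decoder whose third chunk (index 2) is corrupt."""
    if index == 2:
        raise ValueError("lost sync")
    return [1.0] * 4


truncated, _ = read_all(flaky_decoder, num_chunks=5, chunk_frames=4)
recovered, bad = read_all(flaky_decoder, num_chunks=5, chunk_frames=4,
                          skip_bad=True)
print(len(truncated), len(recovered), bad)
```

The sketch assumes that the decoder can resume cleanly at the next chunk boundary. A real decoder may resynchronise somewhere else, as the unexpected frame index in the log above suggests, so an actual implementation would need to handle that case.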

Uwe Klotz (uklotzde) wrote :

We had various issues while decoding corrupt files in the past. The "truncation" strategy for handling such failures has proven to be the safest fallback so far. If someone is able to contribute a more sophisticated failure handling strategy (e.g. blacklisting individual parts of an audio stream) that doesn't cause any unwanted side effects we could improve on this.

no longer affects: mixxx