Crash due to SIGBUS error while loading a track

Bug #1452005 reported by Jean Claveau
This bug affects 1 person
Affects: Mixxx
Status: Confirmed
Importance: Critical
Assigned to: Unassigned
Milestone: (none)

Bug Description

On the master branch, the first time I load a track (MP3), I get a SIGBUS error on Xubuntu 14.04 (even if I change the output).

The log from the terminal :
Debug [Main]: PlaylistTableModel(0x35dda00) select() took 14 ms 70
Debug [Main]: Successfully deserialized BeatGrid
Debug [Main]: Successfully deserialized KeyMap

Program received signal SIGBUS, Bus error.
[Switching to Thread 0x7fffccb3b700 (LWP 7705)]
0x00007ffff36180ff in mad_header_decode () from /usr/lib/x86_64-linux-gnu/libmad.so.0

And the backtrace :
#0 0x00007ffff36180ff in mad_header_decode () from /usr/lib/x86_64-linux-gnu/libmad.so.0
#1 0x0000000000afaf75 in Mixxx::(anonymous namespace)::decodeFrameHeader (pMadHeader=pMadHeader@entry=0x7fffccb3a9a0, pMadStream=pMadStream@entry=0x7fffb4014e78,
    skipId3Tag=skipId3Tag@entry=true) at src/sources/soundsourcemp3.cpp:111
#2 0x0000000000afef05 in Mixxx::SoundSourceMp3::tryOpen (this=0x7fffb4014df0) at src/sources/soundsourcemp3.cpp:224
#3 0x0000000000af436e in Mixxx::SoundSource::open (this=0x7fffb4014df0, audioSrcCfg=...) at src/sources/soundsource.cpp:27
#4 0x0000000000aeeb15 in SoundSourceProxy::openAudioSource (this=this@entry=0x7fffccb3ad00, audioSrcCfg=...) at src/soundsourceproxy.cpp:326
#5 0x00000000004dcf14 in openAudioSourceForReading (audioSrcCfg=..., pTrack=...) at src/cachingreaderworker.cpp:139
#6 CachingReaderWorker::loadTrack (this=this@entry=0x1ef2720, pTrack=...) at src/cachingreaderworker.cpp:174
#7 0x00000000004de37b in CachingReaderWorker::run (this=0x1ef2720) at src/cachingreaderworker.cpp:122
#8 0x00007ffff540032f in QThreadPrivate::start (arg=0x1ef2720) at thread/qthread_unix.cpp:349
#9 0x00007ffff31e6182 in start_thread (arg=0x7fffccb3b700) at pthread_create.c:312
#10 0x00007ffff189247d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Tags: soundsource
Jean Claveau (jean-claveau-g) wrote :

After digging a bit, I realized that my data HDD wasn't available anymore (it happens sometimes). After a reboot, everything looks fine, so this error probably occurs only rarely (but an error message would be nicer than a crash :) ).

Uwe Klotz (uklotzde-deactivatedaccount) wrote :

Did this happen only once or are you able to reproduce the crash?
Does it only happen for a specific MP3 file or when trying to open any MP3 file?
Do you build Mixxx from source? What settings do you use for the build?
Are you able to run mixxx-test successfully?

At first sight it looks like some strange memory corruption or an attempt to access uninitialized memory. I've never seen anything like this on Fedora 21 x86_64, not even for corrupt MP3 files. Maybe a version incompatibility of libmad or an inconsistent build?

Changed in mixxx:
assignee: nobody → Uwe Klotz (uklotzde)
Uwe Klotz (uklotzde-deactivatedaccount) wrote :

Base version of libmad should be identical: 0.15.1b

Uwe Klotz (uklotzde-deactivatedaccount) wrote :

Just saw your last comment, Jean ;)

The content of MP3 files is mapped to memory for reading. Unfortunately we are not able to detect if the underlying file disappears unexpectedly while mapped. This kind of crash will occur with any version of Mixxx.

"You can also get it from accessing a memory mapped device if there's an error of some kind."
(http://stackoverflow.com/questions/2089167/debugging-sigbus-on-x86-linux)

I don't know whether it is possible to handle SIGBUS signals gracefully. Maybe this is an edge case that we need to accept.

Uwe Klotz (uklotzde-deactivatedaccount) wrote :

From the manpage of mmap():
"SIGBUS Attempted access to a portion of the buffer that does not correspond to the file (for example, beyond the end of the file, including the case where another process has truncated the file)."

Changed in mixxx:
status: New → Confirmed
Uwe Klotz (uklotzde-deactivatedaccount) wrote :

Even with a global signal handler installed, the only thing we could do is print an error message and terminate Mixxx.
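
A minimal sketch of such a "print and terminate" handler, assuming a POSIX system. The names terminateOnSigbus and installSigbusTerminationHandler are hypothetical and do not come from the Mixxx code base; the message text is also only an illustration.

```cpp
#include <signal.h>
#include <string.h>
#include <unistd.h>

namespace {

// Hypothetical handler: runs on the faulting thread. Only async-signal-safe
// calls are used here (write and _exit), so no Qt, no iostreams, no malloc.
void terminateOnSigbus(int /*signum*/) {
    static const char kMessage[] =
            "Fatal: bus error while reading a memory-mapped file "
            "(was the storage device removed?)\n";
    // The result is ignored on purpose; there is nothing left to do on failure.
    ssize_t written = write(STDERR_FILENO, kMessage, sizeof(kMessage) - 1);
    (void)written;
    _exit(1);
}

} // namespace

// Installs the handler process-wide. Returns false if sigaction() fails.
bool installSigbusTerminationHandler() {
    struct sigaction action;
    memset(&action, 0, sizeof(action));
    action.sa_handler = terminateOnSigbus;
    sigemptyset(&action.sa_mask);
    return sigaction(SIGBUS, &action, nullptr) == 0;
}
```

The handler cannot tell which mapped file caused the fault, so the message has to stay generic, which matches the limitation described above.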

Jean Claveau (jean-claveau-g) wrote :

+ I'm running Mixxx built from source (master branch), using the default settings (scons -j 4, nothing more).
+ I reproduced it at least five times, with different MP3 files (no other format).
+ My data HDD is in a caddy replacing my optical drive, and I think it is sometimes not physically stable and gets unplugged (maybe it's due to a hardware issue, but I have never been able to diagnose it clearly). When that happens, a reboot fixes it, sometimes after a disk check first, and it's quite rare. So my computer wasn't in a normal state (I realized that just after filing the bug report).
+ mixxx-test runs all the tests successfully (after the reboot; I hadn't run them before).

As I probably won't be able to reproduce it for a while, I think we can close this bug.

Uwe Klotz (uklotzde-deactivatedaccount) wrote :

For the time being, I will add a note in the code where we map the file, saying that SIGBUS errors might occur if the file disappears unexpectedly while mapped, just so we remember this rare edge case.

Changed in mixxx:
status: Confirmed → In Progress
Jean Claveau (jean-claveau-g) wrote :

In case it helps, here is the exact commit of my build:
https://github.com/demos/mixxx/commit/99e9700cfda0f1158036b2e1ca0f6bea62b127b1

Daniel Schürmann (daschuer) wrote :

Interesting topic. I think there is a way to handle it gracefully, since SIGBUS is thread-specific:

http://stackoverflow.com/questions/6533373/is-sigsegv-delivered-to-each-thread/6533431#6533431

Here is the code fragment:
http://www.linuxprogrammingblog.com/code-examples/SIGBUS-handling

It is probably not worth the work for such a rare condition, but it is fun :-)
At least the test code works.

Uwe Klotz (uklotzde-deactivatedaccount) wrote :

Thread-specific is not sufficient. Different SoundSources may use memory-mapped files and even multiple instances of SoundSourceMp3 may exist simultaneously in the same thread.

The only thing we could do is print a generic error message and terminate the application immediately. We can't get hold of any information about the cause in the signal handler. Only the last invocation of sigaction()/sigsetjmp() is effective, but this is not necessarily the same context where the failure actually occurred.

Daniel Schürmann (daschuer) wrote :

Is it an issue of different instances? It looks as if the call stack is still valid after SIGBUS.

One idea is to guard all accesses to the mapped memory with a wrapper class.
In the signal handler, we can check whether we are in the guarded region.
If so, stop decoding the file in the current context and clean up.
In the end, this may look like a try/catch region.

But never mind, probably it is not possible because I have not found an example on the web. :-/

Daniel Schürmann (daschuer) wrote :
Owen Williams (ywwg) wrote :

RJ has been designing a cleaner solution to the problem of crashing while decoding by doing decoding in a separate process. This prevents us needing to hack our way around to prevent crashes in the decode pipeline. Please talk to him before doing any work on this type of workaround.

Daniel Schürmann (daschuer) wrote :

I don't think that this is a question of a future quarantine process.
Graceful recovery from such an error should be done in both cases.

By the way: does it make sense to use mmap for very fast IO, which introduces the SIGBUS risk in one place, and then
pipe the stream across a process boundary in an additional step? Probably not.
As far as I remember, all the crashes we had were metadata related. The sample file IO itself seems to be pretty stable.
So we need to carefully consider the performance gain of such an external process.

Changed in mixxx:
status: In Progress → Confirmed
RJ Skerry-Ryan (rryan) wrote :

Does this still reproduce using Mixxx 2.1.4?

Changed in mixxx:
status: Confirmed → Incomplete
RJ Skerry-Ryan (rryan) wrote :

Answering myself, we still mmap files so this is definitely still a problem.

@daschuer -- you're quite right RE: mmap'ing within a quarantine process. We would have to carefully measure the benefit.

I'm not actively working on anything in this area -- it's just an idea we've batted around for years.

I still think there's benefit to a quarantine process. This SIGBUS issue is just one instance of the set of crashes that audio decoding can produce, so even if it comes with a performance regression we should still consider it. Just as a crashed tab shouldn't take down your web browser, a corrupt audio file shouldn't take down your DJ mix.

Changed in mixxx:
status: Incomplete → Confirmed
RJ Skerry-Ryan (rryan) wrote :

I think this qualifies as a critical crash bug though, and we should prioritize fixing it -- since a common DJing use case is audio on a removable USB stick, the case of a file being removed while we've mmap'd it isn't really an edge case.

Changed in mixxx:
importance: Undecided → Critical
Daniel Schürmann (daschuer) wrote :

A related issue is cache misses on random seeks. If we consider caching all samples of the whole file, there is probably no need for blazing-fast file IO. A philosophical question :-/
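
As a rough illustration of what giving up the mapping could look like, here is a minimal C++17 sketch that reads the raw file bytes into memory once with ordinary stream IO. It is an assumption-laden example rather than anything Mixxx does: readWholeFile is a hypothetical helper, and it buffers the encoded file, not the decoded samples mentioned above.

```cpp
#include <cstddef>
#include <fstream>
#include <optional>
#include <string>
#include <vector>

// Reads the complete file into a buffer. Any IO problem (missing file,
// device gone, short read) is reported as std::nullopt instead of a signal.
std::optional<std::vector<char>> readWholeFile(const std::string& path) {
    // Open at the end so that tellg() yields the file size.
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) {
        return std::nullopt;
    }
    const std::streamoff size = in.tellg();
    if (size < 0) {
        return std::nullopt;
    }
    std::vector<char> bytes(static_cast<std::size_t>(size));
    in.seekg(0, std::ios::beg);
    if (size > 0 && !in.read(bytes.data(), size)) {
        return std::nullopt;
    }
    return bytes;
}
```

Once the bytes are in the buffer, a device that disappears later can no longer cause a bus error while decoding. The trade-off is the memory footprint and the up-front read time for every loaded track.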

Changed in mixxx:
assignee: Uwe Klotz (uklotzde) → nobody
tags: added: soundsource
Swiftb0y (swiftb0y) wrote :

Mixxx now uses GitHub for bug tracking. This bug has been migrated to:
https://github.com/mixxxdj/mixxx/issues/8011
