Playback goes choppy and stutters

Bug #240006 reported by Albert Santoni
18
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Mixxx
Fix Released
Critical
olo
1.6
Fix Released
Undecided
Unassigned

Bug Description

Test setup:
Any Mixxx 1.6.0 beta
Windows
Multiple soundcards in use (different master and cue devices)

Start mixing for a few minutes, and the playback should become choppy and stutter to the point where Mixxx becomes unusable.

First started occurring after multiple soundcard support was added. The software faux-clock-sync algorithm is a suspect, perhaps there's some quirk about the way QMutex works on Windows that's causing this.

Update: I was able to reproduce this bug under Linux...

Revision history for this message
Albert Santoni (gamegod) wrote :

Confirmed by several users.

Changed in mixxx:
assignee: nobody → gamegod
importance: Undecided → Critical
status: New → Confirmed
Revision history for this message
Albert Santoni (gamegod) wrote :

http://mixxx.org/forums/viewtopic.php?f=3&t=113&start=0&st=0&sk=t&sd=a
Erus says he solved it by installing ASIO4ALL.... (I don't know what this means for my QMutex theory. Maybe the quirk is in PortAudio's DirectSound implementation?)

Revision history for this message
olo (zulumantee) wrote :

hey there, I have the same prob on linux with beta3. when I play files (mp3 mixtapes longer then 30 minutes) mixxx freezes my system for several minutes, then the system recovers but the audio then only stutters and I have to close mixxx and restart jack. the freeze seems to occur at random times. sometimes after three minutes, sometimes after 20 minutes or longer (thats the reason why i choose long files for reproducing).

i tried several things:

the problem seems not to occur when i set the waveform display to "simple" so that there is no running waveform. i havent yet tried to let mixxx run several hours but I never experience the problem when i let mixxx play for one hour or so, only with the simple waveform enabled.

I run strace and I saw that at the moment of the freezz a "futex" system call hangs for 40 seconds( in that case). I will provide the strace logs later!

I also added some logging around lock calls of QT related classes. when my system freezes and recovers i could see that the while loop in Reader::run stuck for several ten seconds. from there, ReaderExtractWave::getchunk gets also called which seems to calculat the waveform display if i understood right. this call takes also a long time for processing when the freeze occurs.

i will try to add more logging with timestamps and thread ids to get a better picture. until now its hard to say what causes the problem but looks like some mutex/lock/thread prob when calculating the waveform display.

i hope i can create some good debug and trace log-files soon.
peace, olo

Revision history for this message
Albert Santoni (gamegod) wrote :

Hey olo,

I also finally managed to reproduce this on Linux for the first time today. I was doing a mix and it slowly started to get stuttery and eventually it just made my CPU go crazy. I managed to hook gdb onto the running process and looked at the backtrace several times. The one part that caught my eye was this mutex lock that is eventually called from ControlObject::sync(). The sync() function updates all the controls inside Mixxx, and gets called once every buffer (ie. once every latency period, so like 25 ms or whatever). The backtrace showed that Mixxx was spending an abnormally large amount of time on that sync() call, as best as I could tell.

I might be calling sync() in the wrong place. If you look in src/sounddeviceportaudio.cpp, you'll find my call to sync() (there's only one). When I rewrote the audio core last summer, I was unsure of exactly where it should be placed, and I think this might be causing the problem.

See if you can get additional debugging and trace info olo, and we'll see if we can find a common problem. Might be something funky with all these mutexes overloading something... ?

Revision history for this message
illume (illume) wrote :

On windows I've noticed this too.

I know what causes it for me... hard disk usage by other programs.

Work around Fix: disable HD using programs. Including virus scanners etc.

I've also noticed it can stutter more if I'm searching through the library. However if I stick to browsing, or my play lists stuttering is avoided.

- windows vista home basic.
- 1.6.0 beta 2.
- 1 gig ram.
- new cheap dell laptop
- dual core duo cpu
- on board sound, + usb sound card.

cheers,

Albert Santoni (gamegod)
description: updated
Revision history for this message
ironstorm (ironstorm-gmail) wrote :

I've had problems with chop and stutter on Linux with the eeePC... The problem tended to occur quite badly on the real-time kernel variants, so much so that I gave up on them, thinking that the machine was just too slow to run -rt.

I also found laptop cpu_freq modules cause problems with ALSA during cpu frequency upclocks (rmmoding *freq* modules help) and disk I/O can also contribute (causing me to disable automatic package update checking) to momentary audio drop-outs, I wonder if these are symptoms of the same thing.

-G

Revision history for this message
olo (zulumantee) wrote :

hey albert,

i profiled the code the last two days. i could not generate the profile file gmon.out coz after the freeze the ui is dead thus I couldnt stop mixxx nicely, not even in gdb, thus the profiling file couldnt be generated.

so i added more logging, especially, logging of all the events that are send by QT and I found at least the reason of the freeze.

I looked in the sync call that u mentioned. it finally arrives at ControlObject::sync() which has three while-loops that parses the following queue member objects

- m_sqQueueThread
- m_sqQueueMutexMidi
- m_sqQueueMutexChanges

the loop that processes m_sqQueueMutexChanges causes the freeze. in this loop, the method
"updateProxies()" is called on every ControlObject in the queue. when "updateProxies()" fails, the object is put on the list again. under certain conditions, the call always fails which means that the ControlObject is put on the queue over and over again and the while loop runs for a long time. I attached the log-file and one can see the continuous "updateProxies()" calls from timestap "21:20:34.596" on until "21:21:11.298"

after that, the sounds continues choppy but the ui is dead. the chopped sound might then be caused in "EngineBuffer::process" coz the trylock call to the reader mutex always fails. QT still send events but something prevents the ui from working....

hope that gives u ppl some more clues. i will also try to figure out what the problem in the updateProxy() call causes. I may give names to all ControlObjects so that they show up in the log and one can see which causes its "updateProxies()" call to fail. I did it already for some of them but not for all...

what i found interesting when I looked in the log, was that there is in general a massive amount of events send by QT. also the sync calls are many (caused by the amount of jack callbacks, I guess ).

greetings and peace, olo

Revision history for this message
ironstorm (ironstorm-gmail) wrote :

In the past I've found that mixxx can still be controlled via my Hercules controllers (and likely other midi controllers) after the UI thread is dead.

If you have access to such a control it might be possible to map one of the buttons to tell mixxx to terminate itself, not sure if that would help... but with some steps to reproduce + debug I could have a try at it.

-G

Revision history for this message
Albert Santoni (gamegod) wrote :

Thanks again for your great troubleshooting work olo.

I have two thoughts to share:

1) Is the behaviour at the bottom of ControlObject::sync() correct? That is, if the update of a ControlObject isn't successful, should it really be re-enqueued? This seems like it's leading to "livelock" instead of a deadlock. I've only ever read about livelock, but this sounds like it might be a case of it.

2) In SoundManager.cpp, I have this requestBufferMutex.lock() call. If I make this a tryLock() call, it might improve the situation. I'm trying to think hard about what the effects of this will be, but I think it still preserve our multiple soundcard support. The case when the soundcard's clocks drift too much will still be covered with a tryLock(). (It's part of my multiple soundcard sync algorithm.)

I'm going to think about this for a bit, and I might play with #2. If I find that it's OK for me, I might commit it and ask for wider testing.

Thanks again!
Albert

Revision history for this message
olo (zulumantee) wrote :
Download full text (6.9 KiB)

hey albert, here are some more updates.

i was curious how and why the lock problem occurs and after some more tracing i think i found the reason.

ok, we know that a call to sync() on an instance of ControlObject calls at some time ControlObject::updateProxies() on every object in its
m_sqQueueChanges member. ControlObject::updateProxies() calls the setExtern() method on every ControlObjectThread object in its
m_qProxyList member. if setExtern() fails, false gets returned to updateProxies() which also returns false to sync(). then the freezing occurs
due to re-enqueueing of the corresponding object to m_sqQueueChanges.

if we now look why the ControlObjectThread::setExtern() fails, we can see in the corresponding sources that setExtern() tries to lock its m_sqMutex member.
if this trylock fails, setExtern() fails as well and returns false. the only situation the trylock fails, is when the m_sqMutex
is already locked. something special about the m_sqMutex is that it is static, thus, any object/thread that calls the thread safe public interface methods of
any ControlObjectThread instance, cause all other threads/objects to wait if they also work on instances of a ControlObjectThread.

with a bit more logging in ControlObjectThreadMain, one could see the following informations right before the freeze occurs

Debug: "tid[3047253744] [19:46:31.946]" QApplicationNotifyMonitor::notify( object[ "WVisualWaveform" ] event[ 1 ] ) : enter
Debug: "tid[3047253744] [19:46:31.946]" WVisualWaveform::timerEvent() : enter
Debug: "tid[3047253744] [19:46:31.948]" ControlObjectThread::get() : [ "ControlObjectThreadMain" ] begin lock m_sqMutex...
Debug: "tid[3047253744] [19:46:31.948]" ControlObjectThread::get() : [ "ControlObjectThreadMain" ] ...finished lock m_sqMutex
Debug: "tid[3047253744] [19:46:31.948]" ControlObjectThread::get() : [ "ControlObjectThreadMain" ] begin unlock m_sqMutex...
Debug: "tid[3047253744] [19:46:31.948]" ControlObjectThread::get() : [ "ControlObjectThreadMain" ] ...finished unlock m_sqMutex...
Debug: "tid[2995948448] [19:46:31.949]" ControlObject::sync() : enter
Debug: "tid[2995948448] [19:46:31.949]" ControlObject::sync() : time elapsed since m_sqQueueMutexThread.tryLock() [ 0 ]
Debug: "tid[2995948448] [19:46:31.949]" ControlObject::sync() : time elapsed since m_sqQueueMutexMidi.tryLock() [ 0 ]
Debug: "tid[2995948448] [19:46:31.950]" ControlObject::sync() : time elapsed since m_sqQueueMutexChanges.tryLock() [ 0 ]
Debug: "tid[2995948448] [19:46:31.950]" ControlObject::sync() : exit time elapsed [ 1 ]
Debug: "tid[3047253744] [19:46:31.954]" ControlObjectThread::get() : [ "ControlObjectThreadMain" ] begin lock m_sqMutex...
Debug: "tid[2995948448] [19:46:31.954]" ControlObject::sync() : enter
Debug: "tid[2995948448] [19:46:31.954]" ControlObject::sync() : time elapsed since m_sqQueueMutexThread.tryLock() [ 0 ]
Debug: "tid[2995948448] [19:46:31.954]" ControlObject::sync() : time elapsed since m_sqQueueMutexMidi.tryLock() [ 0 ]
Debug: "tid[2995948448] [19:46:31.955]" ControlObject::updateProxies() : obj->setExtern [ true ] for ctrlobj thread [ "ControlObjectThreadWidget" ] ctrlobj key[ "[Channel1]" | "VuMeter" ]
Debug: ...

Read more...

Revision history for this message
olo (zulumantee) wrote :

hey there, i played around with the mutexes in ControlObjectThread.

due to the single static mutex that is used for synchronization in the original code, one always halts all other ControlObjectThread instances when changing values in one instance. I splitted the member protection onto two mutexes

- a local mutex for the class member ControlObjectThread::m_dValue
- a static mutex for the static class member ControlObjectThread::m_sqQueue

the instances of ControlObjectThread are only synchronized application wide, if methods that access
the m_sqQueue member are invoked. this methods are

- ControlObjectThread::update
- ControlObjectThread::setExtern

all other public interface methods only lock the local mutex which is not shared by the other instances.
thus, when calling for example ControlObjectThread::get(), one only synchronizes the calls of the corresponding
instance by locking the local mutex, but doesnt lock the other instances.

I run mixxx now with these changes and it seems to be stable. no freeze after several hours of playing.
on the other hand, i think this change wont solve the live-lock problem completely but just drastically reduce the probability of getting in this situation. i think its complementary to the sync call fix...

the patch is attached. feel free to check it if u like. peace up, olo

Revision history for this message
Albert Santoni (gamegod) wrote :

Hi olo,

Thanks for this awesome patch! I can confirm that I no longer get the live-lock on Linux with your patch applied. You've done fantastic debugging work, and I owe you a great deal of thanks. :)

I tried playing with putting the sync() on a timer, but for some reason I get these strange ASSERTs firing every once and a while with it. I experimented with this idea about a year ago and ran into the same problem.

Since the big live-lock problem is solved and I'd like to get 1.6.0 released soon, would you feel comfortable if we left the sync() inside requestBuffer() as it is right now? If anyone encounters this live-lock problem again in the future, then we'd have to figure it out...

Thoughts?

Thanks again!
Albert
P.S. I just applied your patch to SVN.

Revision history for this message
Albert Santoni (gamegod) wrote :

Fixed in SVN, thanks to olo!

Changed in mixxx:
assignee: gamegod → zulumantee
status: Confirmed → Fix Committed
Revision history for this message
JoeM (dm2) wrote :

I'd like to add I'm having this bug under Leopard as well, using a NI Audio 8 DJ plus internal sound. Also, creating an aggregate device in Audio MIDI Setup and directing both Montior and Headphones output to different channels of the aggregate device has the same stuttering problem. This problem is immediate on playback.
Will try to test the latest SVN this evening if I can get it to build relatively painlessly.

Changed in mixxx:
status: Fix Committed → Fix Released
Revision history for this message
Swiftb0y (swiftb0y) wrote :

Mixxx now uses GitHub for bug tracking. This bug has been migrated to:
https://github.com/mixxxdj/mixxx/issues/4981

lock status: Metadata changes locked and limited to project staff
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.