Bug #1641153 “Improvement: Multithreaded BPM analysis” : Bugs : Mixxx

Revision history for this message

JosepMa (josepma) wrote on 2016-11-12:

#1

I've taken a look at the code.

__This is what we have now:__

- A class AnalyzerQueue that extends from QThread and contains a QQueue of TrackPointer to process, and a QList of Analyzer* to use.

- Two instances of this class (i.e. two threads): One used by the library (playlist, crates and analyzer view, created in AnalyzerFeature) and the other used by the engine (decks, samplers and preview deck, created in PlayerManager). It probably is this way since the option to not do the waveform analysis when doing the batch analysis was added in 2.0 (player always require waveform analysis).

- When the thread runs, it loads one track, prepares all the analyzers and does the analysis in blocks of samples (currently 4096 frames).
It can be stopped in any iteration of this loop.

- There's a priorization option which makes a track to be analyzed immediately, stopping an ongoing track analysis if it needs to. It is currently unused (Because it has two independent threads), but the implementation is cumbersome. Instead of adding the new track to the front of the queue, it checks all the tracks to see if any of them is a track from a deck and use that instead of the one that is next in the queue. (a QQueue is a QList after all).

- While this runs, some signals are emited, so that different screen updates can happen.

- When it finishes, each analyzer saves its data to the track and other signals are emited. One of these signals calls analyzerqueue.stop() and the thread ends.

_
_
__This is how we could change it:__

- New class AnalyzerWorker that will have most of the functionality of the AnalyzerQueue (i.e. it will be the thread and contain a list of analyzer pointers).

- Modify the class AnalyzerQueue so that it becomes the manager of the workers. It contains the instances of the analizers, the queue of tracks to process and the pool of workers. Add the feature to calculate the number of threads of the CPU and/or read the max threads from the configuration.

-AnalyzerQueue becomes a single-instance class and so playermanager and library will use again the same instance.

-AnalyzerQueue contains a method to add to the queue, with a priority boolean parameter. If the amount of workers in the pool is less than the maximum number of threads allowed, add a new one and let it run. (We can assume that if there is any other worker, it is working already).
If the track was marked with priority, it is added in front of the the track queue, selects one worker from the pool and tells it to skip the current work and start with the next in the queue. (if one worker has just been created, do nothing, as it will just get the track once the semaphore lets it run)

- The run and analyze methods will not change much, except to cleanup part of the old priority code.

- Signals and slots need to be adapted (or we need to call analyzerqueue from the workers to emit the existing ones)

- Worker threads will die just like they do now when they have no more work to do.

I've taken a look at the code.

__This is what we have now:__

- A class AnalyzerQueue that extends from QThread and contains a QQueue of TrackPointer to process, and a QList of Analyzer* to use.

- Two instances of this class (i.e. two threads): One used by the library (playlist, crates and analyzer view, created in AnalyzerFeature) and the other used by the engine (decks, samplers and preview deck, created in PlayerManager). It probably is this way since the option to not do the waveform analysis when doing the batch analysis was added in 2.0 (player always require waveform analysis).

- When the thread runs, it loads one track, prepares all the analyzers and does the analysis in blocks of samples (currently 4096 frames).
It can be stopped in any iteration of this loop.

- There's a priorization option which makes a track to be analyzed immediately, stopping an ongoing track analysis if it needs to. It is currently unused (Because it has two independent threads), but the implementation is cumbersome. Instead of adding the new track to the front of the queue, it checks all the tracks to see if any of them is a track from a deck and use that instead of the one that is next in the queue. (a QQueue is a QList after all).

- While this runs, some signals are emited, so that different screen updates can happen.

- When it finishes, each analyzer saves its data to the track and other signals are emited. One of these signals calls analyzerqueue.stop() and the thread ends.

_
_
__This is how we could change it:__

- New class AnalyzerWorker that will have most of the functionality of the AnalyzerQueue (i.e. it will be the thread and contain a list of analyzer pointers).

- Modify the class AnalyzerQueue so that it becomes the manager of the workers. It contains the instances of the analizers, the queue of tracks to process and the pool of workers. Add the feature to calculate the number of threads of the CPU and/or read the max threads from the configuration.

-AnalyzerQueue becomes a single-instance class and so playermanager and library will use again the same instance.

-AnalyzerQueue contains a method to add to the queue, with a priority boolean parameter. If the amount of workers in the pool is less than the maximum number of threads allowed, add a new one and let it run. (We can assume that if there is any other worker, it is working already).
If the track was marked with priority, it is added in front of the the track queue, selects one worker from the pool and tells it to skip the current work and start with the next in the queue. (if one worker has just been created, do nothing, as it will just get the track once the semaphore lets it run)

- The run and analyze methods will not change much, except to cleanup part of the old priority code.

- Signals and slots need to be adapted (or we need to call analyzerqueue from the workers to emit the existing ones)

- Worker threads will die just like they do now when they have no more work to do.

Revision history for this message

JosepMa (josepma) wrote on 2016-11-12:

#2

I forgot to mention one thing.

The idea of the worker having only pointers to the analyzers is not so much an efficiency decision as it is a decision to dynamically allocate the analyzers for a concrete track.

This way, when we prioritize one track (which we assume comes from the playermanager), the worker that will take it can be configured to include the waveform analyzer. The exact way to implement this can change, but maybe the queue of tracks could have an instance of another class that could contain the priority flag.

We cannot assume that we can always get a worker and make it skip its work. We should only do that on workers that are working on a non-prioritized track (think about loading three decks on a 2 core machine).

Revision history for this message

RJ Skerry-Ryan (rryan) wrote on 2016-11-12:

#3

Sounds good to support a thread pool of analyzers dequeueing work from the pool. The UI interaction is a little tricky so it'd be nice to discuss that a little more before starting work on implementation.

I am worried about the plan of sizing the analyzer thread pool based on the number of cores. My concern is that no matter what our heuristic, it will go wrong for some folks. For these people, a problem like this can make Mixxx unsuitable for live use (e.g. Bug #529614). To this end, I think we should expose the # of analyzer threads as a setting to the user in Preferences so they have a recourse if something goes wrong. Maybe we can set the default based on # of available hardware threads.

Some thoughts on thread priorities: In 2014 we re-organized our thread priorities into priority classes, Bug #1270583.

QThread::setPriority(QThread::IdlePriority) translates to pthread SCHED_IDLE policy with priority 0 (BSD, OSX and Linux) and THREAD_PRIORITY_IDLE on Windows. Let's refer to concrete details when discussing priority changes. Anything above QThread::IdlePriority still uses the thread's original scheduling policy (e.g. SCHED_OTHER) with pthreads.

We have had problems with IdlePriority in the past causing tracks to take a long time to analyze (e.g. we had one user report analyzing a 3 minute track took 1 hour IIRC). I would use caution here. Ultimately, the "priority group" scheme we introduced in Bug #1270583 addresses expressing the relative importance of Mixxx threads to the OS. I'm not sure we need to do more.

Setting thread priorities (especially SCHED_IDLE policy) on a CPU bound thread can also have unintended consequences of causing excessive thread pre-emption which can slow down analysis massively because every pre-emption causes CPU caches and TLB to flush.

Changed in mixxx:
status:	New → Confirmed
importance:	Undecided → Wishlist

Revision history for this message

Daniel Schürmann (daschuer) wrote on 2016-11-12:

#4

One more thing to consider is, that the track analysis needs a CPU core, memory access and HDD access, which is shared with Sqlite and CachingReaders.

We have already users complaining about audio dropouts when analyzing the track on the fly.

The assumption: "If it is playing, max-1" is probably wrong, since Mixxx uses already several threads.

Multitheading analysis will be a great benefit when Mixxx is not playing.
Than we can use all Hardware recourses.

We may also consider to split the analysis itself into parallel task. Only one should require HDD.

Has anyone OpenMP experience? This could also be interesting for us.

Revision history for this message

Mel Grubb (melgrubb) wrote on 2016-11-14:

#5

As an end user, I think that keeping total resource usage low during a performance is the most important thing. From a very high level, I would think the prioritization logic should be more along the lines of:

If a track is currently playing, stop processing from the queue and do only on-demand analysis when a track is loaded. Alternately, limit to a user configurable max thread value, defaulted to 1.
If there is no track currently playing, use all available resources.

Then again, I don't understand why anyone would load up the analysis queue while performing. That is something I only do offline, and usually only when I know I've recently added a lot of new music to the computer. I rescan the library, select all "new" and hit Analyze. Then I go get dinner or watch TV for a while.

Revision history for this message

JosepMa (josepma) wrote on 2016-11-14:

#6

Download full text (4.7 KiB)

Let's put everything in context first:

A) The batch analysis of tracks is only performed when the user explicitly presses analyze on the analyze view.
B) This analysis can be stopped if needed by pressing the same button that started it.
C) We are trying to move from a single thread analysis to a multithread analysis where the CPU resources are maximized (as in used if available).

D) The "on demand" analisys is that which happens when the track is loaded into a deck, and such track has not been analyzed yet.

If an user does "A" during a live performance, we can only provide a degree of glitch-free response dependant on the OSs ability of priorizing threads. And if it is not enough, the user has option "B".

About "C", this is about maximizing performance so we are interested in using the resources, if they are available. It is usual to offer the option to the user to change the amount of threads to use in case the detected setting isn't working as good as it should.

In this context, we can only think about "D" as being a "live performance" case, and the truth is that this case is not changing, because that is one single analysis. I am not proposing to do each analysis type into a different thread, since that requires thread synchronization and most probably the waveform analysis takes most of the time, so the whole analysis time would not improve much. (We might try that at some point, but it's not the main idea in this bug)
_
_
_
Now, how would I do it?

If there isn't any value for our "multithreaded_analysis" setting, or its value is zero, calculate the number of threads and set it.
We have two options to make this available to the user:
A) have an editbox to enter the value, and use always this value, except if it is less than 1 or higher than max-detected-threads, where we would change it to max-detected-threads and show it to the user that we changed it.
B) have an editbox to enter the value and another editbox, non-editable, that shows the actual number of threads that we are going to use, and use always this value, except if it is less than 1 or higher than max-detected-threads, where we would show to that we use max-detected-threads.
_
_
_
The process itself, as i described above consists of generating the desired amount of threads, and have them "consume" an initially filled queue of tracks.
All threads work independently and the queue is guarded so that only one thread at a time can update it.

In this context, thread priority was supposed to play a role since this is by definition a "long-running" task, and we still want to continue working on other parts of the application.
As said by RJ Ryan, we should not use idle priority since that doesn't work too well (seems that it's more geared to servers, where these tasks would only work when there is no user interaction).

Still, we can give hints with LOW_PRIORITY, NORMAL_PRIORITY and HIGH_PRIORITY.
Since we are not necessarily blocking other things, we should not find ourselves with problems. (Not sure about that cachereaderworker... )

About the amount of threads: The amount of threads only matters when they need to do something, since it will take longer for a lower priority ...

Let's put everything in context first:

A) The batch analysis of tracks is only performed when the user explicitly presses analyze on the analyze view.
B) This analysis can be stopped if needed by pressing the same button that started it.
C) We are trying to move from a single thread analysis to a multithread analysis where the CPU resources are maximized (as in used if available).

D) The "on demand" analisys is that which happens when the track is loaded into a deck, and such track has not been analyzed yet.

If an user does "A" during a live performance, we can only provide a degree of glitch-free response dependant on the OSs ability of priorizing threads. And if it is not enough, the user has option "B".

About "C", this is about maximizing performance so we are interested in using the resources, if they are available. It is usual to offer the option to the user to change the amount of threads to use in case the detected setting isn't working as good as it should.

In this context, we can only think about "D" as being a "live performance" case, and the truth is that this case is not changing, because that is one single analysis. I am not proposing to do each analysis type into a different thread, since that requires thread synchronization and most probably the waveform analysis takes most of the time, so the whole analysis time would not improve much. (We might try that at some point, but it's not the main idea in this bug)
_
_
_
Now, how would I do it?

If there isn't any value for our "multithreaded_analysis" setting, or its value is zero, calculate the number of threads and set it.
We have two options to make this available to the user: 
A) have an editbox to enter the value, and use always this value, except if it is less than 1 or higher than max-detected-threads, where we would change it to max-detected-threads and show it to the user that we changed it.
B) have an editbox to enter the value and another editbox, non-editable, that shows the actual number of threads that we are going to use, and use always this value, except if it is less than 1 or higher than max-detected-threads, where we would show to that we use max-detected-threads.
_
_
_
The process itself, as i described above consists of generating the desired amount of threads, and have them "consume" an initially filled queue of tracks.
All threads work independently and the queue is guarded so that only one thread at a time can update it.

In this context, thread priority was supposed to play a role since this is by definition a "long-running" task, and we still want to continue working on other parts of the application.
As said by RJ Ryan, we should not use idle priority since that doesn't work too well (seems that it's more geared to servers, where these tasks would only work when there is no user interaction).

Still, we can give hints with LOW_PRIORITY, NORMAL_PRIORITY and HIGH_PRIORITY.
Since we are not necessarily blocking other things, we should not find ourselves with problems. (Not sure about that cachereaderworker... )

About the amount of threads: The amount of threads only matters when they need to do something, since it will take longer for a lower priority thread to get its share of CPU if other more priorized threads are active.
But then, if a thread doesn't have anything to do (as in it is sleeping, waiting for events, synchronizing, or otherwise not ready to run), we are not making anything go slower by having a low priority thread run for the whole time-slice.

The real problem happens if we need an immediate response. An OS, except for interruptions, will not pause a running thread if it hasn't had it's time-slice. Since this is dependant on the OS, our priorized task might need to wait for a whole time-slice to start (and that's when there's no other similarly priorized task or other types of locks).

At last, when i said about reducing by one the number of threads if a track is playing, it is only so that in that case we know it is going to require some extra CPU. (And we UI graphics count too)
There's yet another option to play nicely with the user, reduce the number of threads to use if we detect an audio xrun. Do so until there is just one thread.
_
_
_
I started looking at how are threads best done with QT, and the more i read about, the more I'm convinced that I can only use "barebones" QThread (just like we do now). We need a Per-thread instance of analyzers (but not a per-task one), we have a constant number of threads and we want them all to do the same, taking as input a synchronized queue of tracks to process.
I.e. We are not in a scenario of list of black-box-like tasks to do (that include everything they need in order to acomplish that task) that are passed to a waiting thread that just tells them that they can run.

JosepMa (josepma) on 2016-11-25

Changed in mixxx:
assignee:	nobody → JosepMa (josepma)
status:	Confirmed → In Progress

Revision history for this message

JosepMa (josepma) wrote on 2016-11-25:

#7

I started working on this some days ago and I've got what I consider to be the workers ready.
Now I have to implement the worker manager, which is the one that will create the threads and feed them with the tracks to analyze.

It was a bit complicated to understand correctly how do the slots/signals operate in a multithreaded scenario, and combine them correctly with mutex, waitconditions and the likes.

I've done it this way:

- Manager creates the worker and thread and assigns them (todo)
- worker creates the analyzers (each worker needs their own instances) when starting (not in the constructor) sends a signal to the manager to ask for a track and waits on a waitcondition.
- The manager receives the message and calls the worker directly feeding it one track (todo. The manager will know the worker because the worker itself tells how it is with the message).
- The worker wakes up and processes the track. It sends signals to the manager to inform about the progress.
- The manager receives the progress signals, and sends more signals for UI update.
- The worker finishes analyzing the track and sends again the message to the manager that it needs a new track.
- If the manager has no more tracks to analyze, it calls the worker method to stop.
- The worker then sends a signal to the manager to indicate that it is finishing, and also sends a signal to the thread and they both clean up automatically.

I.e. threads will be alive only while they are required.

The scenario of prioritizing tracks (i.e. analysis of the player tracks) goes as follow:
- If the manager has allocated max number of threads, take one of the existing workers and tell it to pause. Pause means wait on a waitcondition similarly to what they do to get a new track. The granularity is each processed buffer.
- Then, it allocates a new worker, which is then configured for the foreground analysis type (i.e. ensuring waveform analysis is on). This worker will work similarly to other workers and when it finishes, it will ask too for a new track.
- If there is no more priority tracks to analyze, the manager will tell the priority worker to stop and cleanup, and then it will tell the paused worker to continue.

Most of the manager part is still on todo, but the parts on the worker that the manager will use are already there.

Hope this will be good enough and not too complicate to understand either (hey.. it wasn't that easy right now anyway)

I started working on this some days ago and I've got what I consider to be the workers ready. 
Now I have to implement the worker manager, which is the one that will create the threads and feed them with the tracks to analyze.

It was a bit complicated to understand correctly how do the slots/signals operate in a multithreaded scenario, and combine them correctly with mutex, waitconditions and the likes.

I've done it this way:

- Manager creates the worker and thread and assigns them (todo)
- worker creates the analyzers (each worker needs their own instances) when starting (not in the constructor) sends a signal to the manager to ask for a track and waits on a waitcondition.
- The manager receives the message and calls the worker directly feeding it one track (todo. The manager will know the worker because the worker itself tells how it is with the message).
- The worker wakes up and processes the track. It sends signals to the manager to inform about the progress.
- The manager receives the progress signals, and sends more signals for UI update.
- The worker finishes analyzing the track and sends again the message to the manager that it needs a new track.
- If the manager has no more tracks to analyze, it calls the worker method to stop.
- The worker then sends a signal to the manager to indicate that it is finishing, and also sends a signal to the thread and they both clean up automatically.

I.e. threads will be alive only while they are required.

The scenario of prioritizing tracks (i.e. analysis of the player tracks) goes as follow: 
- If the manager has allocated max number of threads, take one of the existing workers and tell it to pause. Pause means wait on a waitcondition similarly to what they do to get a new track. The granularity is each processed buffer.
- Then, it allocates a new worker, which is then configured for the foreground analysis type (i.e. ensuring  waveform analysis is on). This worker will work similarly to other workers and when it finishes, it will ask too for a new track.
- If there is no more priority tracks to analyze, the manager will tell the priority worker to stop and cleanup, and then it will tell the paused worker to continue.

Most of the manager part is still on todo, but the parts on the worker that the manager will use are already there.

Hope this will be good enough and not too complicate to understand either (hey.. it wasn't that easy right now anyway)

Revision history for this message

Sébastien BLAISOT (sblaisot) wrote on 2016-11-25:

#8

can you also please explain the failing/error handling scenarios, like if a worker thread crash/hang and never send back any signal to the manager ?
Is there some sort of "timeout" or something which can be used by the manager to terminate the worker thread ?

Revision history for this message

JosepMa (josepma) wrote on 2016-11-25:

#9

Mmmm. I didn't think about that. If i didn't overlook it, there's nothing in this regard right now either.

The crashing scenario might require a simple try-catch on the main process method, so that it can get captured and inform the manager that it ends.

The hang scenario is a bit more controversial. Maybe it could get detected (checking the periodicity of calls to progress), but nevertheless, it needs a forced stop, which means telling the thread directly to die. Maybe a QTimer once it has asked the thread to stop could test if it is still alive and then force it to stop.

Revision history for this message

JosepMa (josepma) wrote on 2016-12-03:

#10

First version of the changes here:

https://github.com/JosepMaJAZ/mixxx/tree/multithreaded-analysis

Revision history for this message

Daniel Schürmann (daschuer) wrote on 2016-12-05:

#11

Cool!
If you like an early review, please issue a normal PR against master and describe the unfinished state in the description.

Revision history for this message

Uwe Klotz (uklotzde-deactivatedaccount) wrote on 2018-04-18:

#12

Superseded by the responsive analysis framework that supports multi-threading implicitly by design:
https://github.com/mixxxdj/mixxx/pull/1624

Sorry, JosepMa, I didn't plan to capture this task from you! The multitude of issues around our existing analysis framework just became apparent while digging deeper and deeper into the topic, leading to a complete rewrite literally.

Changed in mixxx:
milestone:	none → 2.2.0

Be (be.ing) on 2018-06-24

Changed in mixxx:
milestone:	2.2.0 → 2.3.0
assignee:	JosepMa (josepma) → Uwe Klotz (uklotzde)

Be (be.ing) on 2018-12-30

Changed in mixxx:
status:	In Progress → Fix Committed

Daniel Schürmann (daschuer) on 2021-06-29

Changed in mixxx:
status:	Fix Committed → Fix Released

Revision history for this message

Swiftb0y (swiftb0y) wrote on 2022-10-08:

#13

Mixxx now uses GitHub for bug tracking. This bug has been migrated to:
https://github.com/mixxxdj/mixxx/issues/8686

lock status:

Metadata changes locked and limited to project staff

Mixxx

Improvement: Multithreaded BPM analysis

Bug Description

Duplicates of this bug

Other bug subscribers

Remote bug watches