priority inversion in the engine

Bug #1641360 reported by Be
This bug affects 2 people
Affects: Mixxx
Status: In Progress
Importance: High
Assigned to: Daniel Schürmann

Bug Description

When a deck has a track loaded to it for the first time since opening Mixxx, CPU usage spikes. This can cause an xrun when loading the second track to be played in a set. It does not occur when loading subsequent tracks to the deck. This occurs with tracks that are already analyzed, so it is not an issue of analyses being run. I think whatever initialization is done on first track load should be done as Mixxx is starting.

Be (be.ing)
description: updated
Revision history for this message
Be (be.ing) wrote :

This occurs with sampler decks too so it is likely an issue with BaseTrackPlayer.

Revision history for this message
Daniel Schürmann (daschuer) wrote :

Yesterday I did some investigation into whether the valueChanged emits and the mutex are an issue.
I used my netbook with a 5 ms buffer, LR EQ, one deck playing and the other loading + analyzing.
I get a very reproducible underflow when the analysis is finished.

We have a significant delay inside EngineBuffer::process - pauselock when the underrun occurs. Inside this block the delay can happen at different positions.
I have not actually caught a single valueChanged signal that causes the underflow, but they can take up to 69 ms. This includes the direct connections as well, of course.

Every time a second thread writes a CO that is connected to the main thread while the engine tries to write as well, there is a priority inversion. Shame on Qt that there is no priority inheritance.

I will experiment with whether an additional emit worker thread can help here. We can collect all pending value change signals in a linked list, and that thread can take over the job of actually emitting them.

This has some effects:
* The overflow situation is still there, but instead of the engine, the emit worker is blocked.
* I think we can omit value change signals when there is still a signal pending from the same CO, and always send the current value. The worst case would be that all possible COs are chained up.
* This means we lose the synced nature of the COs, where we can track short changes.
If this is a problem we may introduce a sync type which does not omit signals.
* All engine value change callbacks are moved from the engine thread to the emit worker.
This is a nice gain for multicore CPUs, but may introduce new race conditions because the code is not aware of a second thread.

Implementation (see the sketch after this list):
* Each CO gets a "next" atomic pointer to build the linked list.
* If the pointer has a value or an end marker, the CO is already chained up.
* Before emitting, the emitting thread is checked; in case of the engine thread, the CO is chained.
* The emit worker empties the chain as a FIFO, without locking.
* It is scheduled after the callback, like the sidechain.
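
A minimal sketch of this chaining, assuming a hypothetical ControlObjectPrivate with an atomic "next" pointer and an end-marker sentinel (illustrative names, not actual Mixxx code):

#include <atomic>
#include <vector>

struct ControlObjectPrivate {
    std::atomic<ControlObjectPrivate*> next{nullptr};
    double value = 0.0; // stand-in for the real value storage
};

// Sentinel: a chained CO's "next" points to another CO or to kEnd, so a
// non-null "next" doubles as the "signal already pending" flag.
static ControlObjectPrivate* const kEnd =
        reinterpret_cast<ControlObjectPrivate*>(-1);

static std::atomic<ControlObjectPrivate*> g_pendingHead{nullptr};

// Engine thread: chain the CO instead of emitting directly. If the CO is
// already chained, the pending emit will pick up the newest value anyway.
void chainForEmit(ControlObjectPrivate* co) {
    ControlObjectPrivate* expected = nullptr;
    if (!co->next.compare_exchange_strong(expected, kEnd)) {
        return; // already pending; this value change is coalesced
    }
    ControlObjectPrivate* head = g_pendingHead.load();
    do {
        co->next.store(head ? head : kEnd);
    } while (!g_pendingHead.compare_exchange_weak(head, co));
}

// Emit worker: detach the whole chain atomically, then emit in FIFO order.
void drainAndEmit() {
    ControlObjectPrivate* node = g_pendingHead.exchange(nullptr);
    std::vector<ControlObjectPrivate*> fifo;
    while (node && node != kEnd) {
        ControlObjectPrivate* following = node->next.load();
        fifo.push_back(node);
        node = following;
    }
    for (auto it = fifo.rbegin(); it != fifo.rend(); ++it) {
        (*it)->next.store(nullptr); // allow the CO to be chained again
        // ... emit valueChanged((*it)->value) via the Qt dispatcher ...
    }
}

The compare-and-swap on "next" makes the chaining idempotent, which is exactly what allows omitting redundant emits.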

What do you think?

Changed in mixxx:
importance: Undecided → High
assignee: nobody → Daniel Schürmann (daschuer)
status: New → In Progress
Revision history for this message
RJ Skerry-Ryan (rryan) wrote :

Daniel -- I like your conclusion (that we need to get rid of priority inversion in the engine), but I don't think this is a good direction for the implementation. It seems extremely complicated to me and will not help us simplify the engine code (which is really what it needs, given how complex it is -- for example, very simple changes like https://github.com/mixxxdj/mixxx/pull/651 are hard due to the concurrency involved).

I think a better solution is to remove Controls from the engine entirely. Instead, we should move all the COs and handlers into src/mixer/ classes -- BaseTrackPlayer, etc. Then we communicate back and forth between the engine and the src/mixer classes with a pair of FIFOs for bidirectional communication, as sketched below.
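
A minimal sketch of such a FIFO pair, assuming a hypothetical Message type (not Mixxx's actual MessagePipe): two single-producer/single-consumer ring buffers, one per direction, so neither side ever takes a lock.

#include <atomic>
#include <cstddef>

template <typename T, size_t N>
class SpscFifo {
    T m_buffer[N];
    std::atomic<size_t> m_head{0}; // advanced by the consumer
    std::atomic<size_t> m_tail{0}; // advanced by the producer
  public:
    bool push(const T& item) {
        size_t tail = m_tail.load(std::memory_order_relaxed);
        size_t next = (tail + 1) % N;
        if (next == m_head.load(std::memory_order_acquire)) {
            return false; // full: the caller must cope, never block
        }
        m_buffer[tail] = item;
        m_tail.store(next, std::memory_order_release);
        return true;
    }
    bool pop(T* item) {
        size_t head = m_head.load(std::memory_order_relaxed);
        if (head == m_tail.load(std::memory_order_acquire)) {
            return false; // empty
        }
        *item = m_buffer[head];
        m_head.store((head + 1) % N, std::memory_order_release);
        return true;
    }
};

struct Message { int id; double value; }; // hypothetical message type

// One FIFO per direction: main thread -> engine, engine -> main thread.
SpscFifo<Message, 1024> g_toEngine;
SpscFifo<Message, 1024> g_fromEngine;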

This has worked well in 2.0 for the effect system. The CO API is entirely implemented in the Qt main thread and the Effect* / EngineEffect* classes enforce the boundary between the engine and the rest of Mixxx.

Revision history for this message
RJ Skerry-Ryan (rryan) wrote :

> This has some effects:
> * I think we can omit value change signals when there is still a signal pending from the same CO, and always send the current value. The worst case would be that all possible COs are chained up.

Note: I don't think this is a safe assumption. The value-based COs are probably OK, but for COs that trigger actions, I think it is unsafe in the general case.

Revision history for this message
RJ Skerry-Ryan (rryan) wrote :

(to clarify -- I mean unsafe as in it would produce a logic error, not unsafe as in would produce a crash or race condition or something)

Revision history for this message
Daniel Schürmann (daschuer) wrote :

I considered an independent FIFO solution first, but our engine code is simply not aware of it. It is currently based on COs, and I think it is a good idea to stick with them for a while so as not to overcomplicate the solution.

In the ideal case we do not need to touch the engine code at all. All proposed changes are done inside the ControlObjectPrivate.

It is a nice fact that the issue currently matters only for the signals from the engine. The https://github.com/mixxxdj/mixxx/pull/651 issue is just the other way around.

If you look closely at the linked list solution, it is ultimately very similar to an external FIFO solution, but with the benefit that it makes use of the Qt signal dispatcher and can reach any thread.

The trigger-type COs are really an issue with this chained list. I have not had a close look, but I hope that we have only one CO of this type (play), which can receive special treatment.
Maybe we can use a fixed-size lock-free FIFO for it, like you proposed.

On the other hand, we have the value-type COs, which especially benefit from omitting redundant value changes. If you consider my 69 ms lock, it saves us the CO updates of ~14 engine callbacks (69 ms at a 5 ms buffer) that are outdated anyway. It would be much better to process only the last engine call.
This will also help to avoid FIFO overflows, which are much worse since we lose random values.
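
A minimal sketch of that coalescing, assuming a hypothetical scheduleEmit() that wakes the emit worker: the engine just stores the newest value, and at most one emit is queued per CO.

#include <atomic>

struct CoalescedControl;
void scheduleEmit(CoalescedControl* co); // hypothetical: wakes the worker

struct CoalescedControl {
    std::atomic<double> value{0.0};
    std::atomic<bool> emitPending{false};

    // Engine thread: wait-free, never blocks on the GUI thread.
    void set(double v) {
        value.store(v, std::memory_order_release);
        if (!emitPending.exchange(true)) {
            scheduleEmit(this);
        }
    }

    // Emit worker: clear the flag first so a concurrent set() re-arms it,
    // then emit whatever value is current; the outdated ones are skipped.
    void doEmit() {
        emitPending.store(false);
        double v = value.load(std::memory_order_acquire);
        // ... emit valueChanged(v) ...
        (void)v;
    }
};

void scheduleEmit(CoalescedControl*) { /* e.g. wake a worker thread */ }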

Revision history for this message
Be (be.ing) wrote :

Thanks for looking into this issue.

Revision history for this message
Be (be.ing) wrote : Re: spike in CPU usage when loading track to a deck

This issue has happened a few times even on my powerful new laptop (Intel Core i7 8550U, 16 GB RAM, 1 TB M.2 SSD), so there is something wrong in Mixxx. Originally, when I reported this, I encountered the issue most often when loading a track to a deck for the first time, but sometimes it happens when loading a track to a deck at a random time. The discussion above is interesting, but if that general code architecture were causing the problem, I do not understand why it would not happen much more frequently than at the specific time that a track is loaded. Uwe, do you have any ideas what may be going on here? It is difficult to get a grasp of everything that goes on when a track is loaded, given the rampant use of Qt's signals and slots.

summary: - spike in CPU usage when loading track to a deck for first time
+ spike in CPU usage when loading track to a deck
Revision history for this message
Daniel Schürmann (daschuer) wrote :

This can be explained by the CO locking.

The engine thread sends a lot of updates to the main thread. This works well as long as no other thread also sends changes to the main thread.
If a low-priority thread is sending updates and is inhibited by another thread, then the engine thread is blocked, waiting for both the higher- and the low-priority thread.

Loading a track sends a lot of changes to the GUI.
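
In code, the inversion looks roughly like this (a simplified sketch with hypothetical names, not the actual CO implementation):

#include <mutex>

std::mutex g_coMutex;  // guards a CO's value and its signal emission
double g_coValue = 0.0;

// Low-priority thread (e.g. analysis or track loading) updating the CO:
void lowPrioUpdate(double v) {
    std::lock_guard<std::mutex> lock(g_coMutex);
    g_coValue = v;
    // If a medium-priority thread preempts this thread right here, the
    // lock stays held for an arbitrarily long time.
}

// High-priority real-time engine callback touching the same CO:
void engineCallback(double v) {
    std::lock_guard<std::mutex> lock(g_coMutex);
    g_coValue = v;
    // The engine now waits on the low-priority holder: a priority
    // inversion. Without priority inheritance the wait is unbounded,
    // and the audible result is an xrun.
}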

Revision history for this message
Be (be.ing) wrote :

Maybe we can limit the number of signals sent when a track is loaded to alleviate this issue before spending the effort to overhaul the architecture.

Be (be.ing)
summary: - spike in CPU usage when loading track to a deck
+ priority inversion in the engine
Revision history for this message
Be (be.ing) wrote :

I don't entirely understand the details of Daniel's proposal, but I think it is on the right track. I think the MessagePipe FIFO is appropriate for the effects system because it does more than simply pass numeric values back and forth. It is also used for transferring pointers to objects allocated on the heap in the main thread to the engine thread. However, I don't think this model should be followed throughout the engine because it requires a lot of overhead (in terms of code complexity, not computational overhead). With that approach, every message between the main thread and engine thread needs its own message type and special code to send the message from the main thread and process the received message in the engine thread. I agree with Daniel that rewriting the whole engine to use this approach would not be a great solution to the problem. Instead, I think working on the ControlObject system would be a better path forward.

If I understand Daniel's proposal correctly, it would be somewhat like the MessagePipe FIFO approach but each ControlObject would have its own private FIFO that is transparent to the rest of Mixxx. Do I understand that correctly?

Revision history for this message
Daniel Schürmann (daschuer) wrote :

Yes, I am trying to add a global linked list of ControlObjectPrivates.
If the engine thread updates a CO, it is linked into this list instead of signaling the change immediately. I am currently considering how to handle direct connections here. They should still be direct.

Revision history for this message
Be (be.ing) wrote :

Should we target this for 2.3.0?

Revision history for this message
Daniel Schürmann (daschuer) wrote :

Let's decide that once we have a working solution. Currently we have some strongly blocking issues with Qt5. We have to find out how and whether this is related.

Revision history for this message
Be (be.ing) wrote :

I don't understand what this issue has to do with moving to Qt5. Could you explain?

Revision history for this message
Daniel Schürmann (daschuer) wrote :

I don't know either.
It is only a suspicion.

Revision history for this message
Be (be.ing) wrote :

This happened with Qt4, so I don't think it's related to whatever other problems you're referring to.

Revision history for this message
RJ Skerry-Ryan (rryan) wrote :

RE: Be,
> I think the MessagePipe FIFO is appropriate for the effects system because it does more than simply pass numeric values back and forth.

The engine *does* pass much more than numeric values back and forth; we either ignore the thread-safety issues or put a mutex on the shared state (which triggers this bug). Bug #1759637 is an instance of this. We also pass TrackPointer instances back and forth, which can cause race conditions and priority inversion (since TrackPointer requires a mutex to access). The engine should ideally get a copy of the Track object (and its Beats and Key objects) instead of touching the one used by the rest of Mixxx. That copy ought to be delivered via a message pipe "TrackLoadRequest" message, as sketched below.
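
A sketch of that idea with hypothetical types: the main thread copies the track data once (under the Track's lock) and moves ownership of the copy into the message, so the engine never touches the shared, mutex-guarded objects.

#include <memory>
#include <utility>
#include <vector>

struct BeatsSnapshot { std::vector<double> beatPositions; }; // copied grid

struct TrackSnapshot {           // engine-private copy, no mutex required
    double durationSeconds = 0.0;
    BeatsSnapshot beats;
    // ... key, cue points, ...
};

struct TrackLoadRequest {        // posted through the message pipe
    std::unique_ptr<TrackSnapshot> snapshot;
};

// Main thread: build the request; the engine takes ownership on receipt.
TrackLoadRequest makeTrackLoadRequest(/* const Track& track */) {
    auto snapshot = std::make_unique<TrackSnapshot>();
    // copy duration, beats, key, ... from the shared Track here
    return TrackLoadRequest{std::move(snapshot)};
}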

RE: Daniel

> If the Engine thread updates a CO, it is linked into this list instead of signal this message immediately.

This is funny, because it goes back to how COs were implemented in Mixxx 1.0 by Tue and Ken :). There was a global list of COs that were updated by the engine, which the GUI thread polled later.

This does not solve the bug, unfortunately -- because we still have shared mutable state that requires mutexes to access, shared between the engine and the rest of Mixxx. We also have race conditions and logic errors caused by the design of COs -- we "flatten" conceptually grouped controls into multiple COs, and as you know, groups of COs cannot be updated atomically, leading to race conditions. It would be better to move the CO API out of the engine (the rest of Mixxx can still use the existing CO API, so no major code rewrites would be needed other than within the engine).

> I am currently consider how to handle direct connections here. They should be still direct.

Direct connections to COs updated by the engine are one of the main causes of this bug! :)

The effects message FIFO approach has worked reliably with no thread safety or priority inversion issues for the effects system -- when I wrote it, I thought of it as a test-bed for a broader fix in the engine. I still think this is the right way forward.

Revision history for this message
Daniel Schürmann (daschuer) wrote :

First steps are in progress for any solution.
I would be happy to receive a review for:
https://github.com/mixxxdj/mixxx/pull/1700
https://github.com/mixxxdj/mixxx/pull/1713
https://github.com/mixxxdj/mixxx/pull/1717

Revision history for this message
Be (be.ing) wrote :

> We also pass TrackPointer instances back and forth, which can cause race conditions and priority inversion (since TrackPointer requires a mutex to access).

Interesting. I have not looked at the code for track loading in detail. Do you think this explains why audible artifacts seem to occur more often when loading tracks? Is a message pipe needed here though? Do we just need to make a copy of the object before passing it to the engine?

Revision history for this message
RJ Skerry-Ryan (rryan) wrote : Re: [Bug 1641360] Re: priority inversion in the engine

Bug #1799256 is a potential mitigation for this.

Revision history for this message
Swiftb0y (swiftb0y) wrote :

Mixxx now uses GitHub for bug tracking. This bug has been migrated to:
https://github.com/mixxxdj/mixxx/issues/8688
