Drop use of ControlObjectThread->get() in performance-sensitive code

Bug #1088227 reported by Owen Williams
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Mixxx
Fix Released
Low
Owen Williams

Bug Description

Calls to ControlObject ->get() have a lot of overhead, so instead of using get() on every render pass, instead we should bind a slot to the ValueChanged signal so that the value only updates when the control object changes.

and I have stats!

Before:
Debug [Main]: Stat("WaveformWidgetFactory::refresh() 2waveforms","count=2524,sum=2.105e+09ns,average=833994ns,min=0ns,max=8e+06ns,variance=3.35094e+11ns^2,stddev=578873ns")

After:
Debug [Main]: Stat("WaveformWidgetFactory::refresh() 2waveforms","count=2827,sum=1.836e+09ns,average=649452ns,min=0ns,max=4.9e+07ns,variance=2.05082e+12ns^2,stddev=1.43207e+06ns")

there are lots more opportunities to get rid of get() calls, but this is a start.

Tags: performance

Related branches

Revision history for this message
Owen Williams (ywwg) wrote :
Changed in mixxx:
status: New → In Progress
Revision history for this message
Daniel Schürmann (daschuer) wrote :

Wow, that saves a lot.

This is a simple solution suitable for 1.11. Thank you.

For later versions I would prefer to have a general solution.

Current code:
double ControlObjectThread::get() {
    m_dataMutex.lock();
    double v = m_dValue;
    m_dataMutex.unlock();
    return v;
}

for 64 bit architecture we can simply write it like this:
inline double ControlObjectThread::get() {
    return m_dValue;
}

because reading and writing a double is an atomic 64 bit operation

For 32 bit architectures we should consider if float precision is exact enough for some (most)
control objects so we can do it like that
inline float ControlObjectThread::get() {
    return m_dValue;
}

Revision history for this message
RJ Skerry-Ryan (rryan) wrote :

This only affects ControlObjectThread/ThreadMain/ThreadWidget.
Nice catch! It's easy to forget COT uses a mutex to read values.

ControlObject::get() is an inlined direct read of the double value while ControlObjectThread::get() guards the read with a mutex. That said, I'm pretty shocked at the difference in average values there. The max values lead me to believe it ended up blocking the thread.

COT doesn't emit valueChanged. COTM emits valueChanged and the slot is guaranteed to be invoked by the main thread (unless you DirectConnection it).

The patch uses DirectConnection's which is actually the equivalent to just removing the mutex lock in ControlObjectThread::get(). Since double writes are not atomic on x86 this can lead to incorrect values if the GUI thread is reading the values while the engine worker thread or the MIDI thread are in a ControlObject::sync(). I wouldn't use DirectConnection.

I would propose a different change. In COTM, add an m_dValueMain and override the get() method and in eventFilter update the m_dValueMain value. This eliminates the code verbosity of keeping a main-thread specific value of the variable and keeps reads of the COTM lock-free since they are in the main thread. That eliminates any changes in the waveform renderer.

If we could somehow make double-writes atomic or come up with a lock-free construct that lets the sync thread write to a FIFO that the COT would then check for new values from on each get() that could eliminate the lock in COT::get(). We can't use a FIFO directly because it will fill up if the owner of the COT isn't constantly reading it so once it's full the latest value of the CO won't be present.

Changed in mixxx:
milestone: none → 1.11.0
tags: added: performance
summary: - reduce use of control object ->get()'s in waveform rendering
+ Drop use of ControlObjectThread->get() in performance-sensitive code
Revision history for this message
RJ Skerry-Ryan (rryan) wrote :

Oops! I didn't read this closely enough.

COTM::setExtern is over-ridden and doesn't touch m_dValue.

COTM::eventFilter sets m_dValue by locking the mutex and it emits the signal from the main thread so a DirectConnection would be slotted by the main thread (even if you AutoConnection'd it to a main-thread object (almost all of our objects are main-thread objects) then it would be an effective DirectConnection).

This means the mutex lock in eventFilter() is unnecessary. Also, COTM::get() should be overridden to be a lock-free read of m_dValue since it is never written anywhere other than the main thread and by using a COTM you are saying "I will only use this control from the main thread."

Revision history for this message
RJ Skerry-Ryan (rryan) wrote :

I was also wrong about COT's not emitting valueChanged. I forgot we changed that in 1.11. They do but it's not great to do tons of work driven on this signal since you are executing the thread of whomever called ControlObject::sync().

The controller engine hooks valueChanged() on COT to run script events but since the ControllerEngine lives in the controller thread (which runs its own Qt event loop) the valueChanged signal is proxied from the sync-worker thread to the controller engine thread. If you use COT::valueChanged for anything you have to be careful to get the thread it does the work in right.

Revision history for this message
Owen Williams (ywwg) wrote :

So I applied this patch (attached) and the numbers are similar to the original ones:

New stats:Debug [Main]: Stat("WaveformWidgetFactory::refresh() 2waveforms","count=1181,sum=1.043e+09ns,average=883150ns,min=0ns,max=3.2e+07ns,variance=1.20159e+12ns^2,stddev=1.09617e+06ns")

I think perhaps the numbers can be easily skewed. One period of IO blockage causing many missed frames will shift the average greatly. I'm having trouble reproducing the original fantastic numbers :/

In any case, preventing unnecessary locks is a good thing.

Revision history for this message
Owen Williams (ywwg) wrote :

(clarifying: I'm having trouble getting the original numbers even when I reapply just that patch and test again.)

Revision history for this message
Daniel Schürmann (daschuer) wrote :

Puh .. hard to understand ... I think Its really a good Idea to refactor that.
Do we have a drawing about all the control object stuff?

RJs Idea sounds good. We should also add an assert for calling those unprotected COTM function from other than the GUI thread.

RJ Skerry-Ryan (rryan)
Changed in mixxx:
importance: Undecided → Low
Revision history for this message
Daniel Schürmann (daschuer) wrote :

My last post was intendet to be at #6 sorry for the delay.

I wonder why ControlObjectThread::get is virtual and why ControlObjectThread has virtual functions at all?
The only Inherited class is ControlObjectThreadMain right?
Is ControlObjectThreadMain used by a ControlObjectThread pointer anywhere? This would make would remove the mutex protection without any notice.

If its possible I would prefer not to have virtuals in ControlObjectThread and add the assert like proposed.

Revision history for this message
RJ Skerry-Ryan (rryan) wrote :

Maybe not an assert :) but a --developer mode warning?

Owen -- the patch is good except slotSet/add/sub use the mutex also.

Another thing I noticed is that updateControlObject calls get() so a slotSet() results in a lock from setting the value and then an additional lock from the get() in updateControlObject. This is avoidable by making updateControlObject take the new value.

The entire skin / GUI uses COTM/COTW heavily (all <Connection> blocks are a COTW) so this should actually be a nice improvement all over the GUI.

I also removed some old cruft in ControlObjectThread for updating every COT by using a static QQueue. It's been unused for years and is problematic in a lot of ways so I removed it to make things easier to read.

The attached patch does all this. I committed it to lp:mixxx/1.11 r3594

Changed in mixxx:
status: In Progress → Fix Committed
Revision history for this message
RJ Skerry-Ryan (rryan) wrote :

BTW, I was able to reproduce:

before:
Debug [Main]: Stat("WaveformWidgetFactory::refresh() 2waveforms","count=1443,sum=4.45201e+09ns,average=3.08524e+06ns,min=322578ns,max=2.75494e+07ns,variance=1.60048e+13ns^2,stddev=4.00059e+06ns")

after:
Debug [Main]: Stat("WaveformWidgetFactory::refresh() 2waveforms","count=827,sum=5.65783e+08ns,average=684139ns,min=303737ns,max=9.56538e+06ns,variance=4.30306e+11ns^2,stddev=655977ns")

On average, 4.5 times faster rendering.
Max value is 2.8 times faster
Min time is 1.06 times faster

Revision history for this message
RJ Skerry-Ryan (rryan) wrote :

It goes:
ControlObjectThread- > ControlObjectThreadMain -> ControlObjectThreadWidget

Certain methods between COT/COTM/COTW rely on virtual methods to invoke the right behavior (e.g. updateControlObject).

If you create a COTM and address it via a COT pointer the "right" behavior imo would be for it to behave like a COTM no matter what. If you didn't have virtual methods for get()/slotSet() etc. then you would go back to locking when using a COTM but addressing it by a COT pointer.

Revision history for this message
Owen Williams (ywwg) wrote :

Fantastic! I'm going to take these changes for a spin right now :)

Revision history for this message
Daniel Schürmann (daschuer) wrote :

http://stackoverflow.com/questions/449827/virtual-functions-and-performance-c

If you are unsure if virtual is required in this case, we should not use it.
Virtual functions cannot be in-lined, but in our case it would be nice if get() becomes an inline function.

Revision history for this message
RJ Skerry-Ryan (rryan) wrote : Re: [Bug 1088227] Re: Drop use of ControlObjectThread->get() in performance-sensitive code

Which virtuals do you want to remove? I think they should stay in the COT
for correctness when using a COTM in a COT* but could be removed from COTM
since COTW doesn't override any of those methods.

I know virtual functions come at a cost of a vtable lookup and that they
can't be inlined unless the compiler knows the true type (i.e. you allocate
the control on the stack).
CPU designs over the years have drastically improved in their ability to
speed up virtual function calls. The advice to remove virtual function
calls is generally a holdover from the 80s and 90s.

In my opinion, not using virtual function calls can lead to dangerous
software since it has assumptions built in about how you will use the class
(e.g. nobody ever calls methods on a COTM from a COT*, or COTW doesn't
over-ride get/add/sub/slotSet today). What's true about today's use of
these objects might not be true tomorrow.

So the question is what benefit speed-wise does it get you and is that
benefit worth the risk. I would say that we should just measure it but
since the difference between a virtualized call and non-virtualized is
something like 20 CPU cycles, I really don't think this is measurable since
the noise of the measurement will be massive. And if it
isn't measurable then that begs the question how much impact does it have?
I'd say very little if it's so small you can't measure it.

On Sun, Dec 9, 2012 at 5:38 PM, Daniel Schürmann <<email address hidden>
> wrote:

> http://stackoverflow.com/questions/449827/virtual-functions-and-
> performance-c
>
> If you are unsure if virtual is required in this case, we should not use
> it.
> Virtual functions cannot be in-lined, but in our case it would be nice if
> get() becomes an inline function.
>
> --
> You received this bug notification because you are a member of Mixxx
> Development Team, which is subscribed to Mixxx.
> https://bugs.launchpad.net/bugs/1088227
>
> Title:
> Drop use of ControlObjectThread->get() in performance-sensitive code
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/mixxx/+bug/1088227/+subscriptions
>

Revision history for this message
Daniel Schürmann (daschuer) wrote :

Ah, I think I have looked to much to my embedded gases ;-)

I think the performance gain is big if we inline the get functions and compare a virtual call. But sure it will not be measurable if we test the duration of refresh();

get() is always called by a pointer so "inline" keyword is ignored and misleading in this case. The virtual keyword in COTM is optional, but dos not change the virtual of get defined in the base class COT. Its best practice to have it in this case.

If we still want to save those 20 CPU cycles some day I think it is save to remove all virtuals from COT because I have not found polymorphic calls to COT/COTM.

RJ Skerry-Ryan (rryan)
Changed in mixxx:
status: Fix Committed → Fix Released
Revision history for this message
Swiftb0y (swiftb0y) wrote :

Mixxx now uses GitHub for bug tracking. This bug has been migrated to:
https://github.com/mixxxdj/mixxx/issues/6759

lock status: Metadata changes locked and limited to project staff
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.