link against SSE enabled rubberband / soundtouch
Bug #1406117 reported by Daniel Schürmann
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Mixxx | Confirmed | Wishlist | Unassigned |
Bug Description
It turns out that Ubuntu's rubberband and soundtouch packages are compiled for i386 without taking advantage of the fast SSE registers.
We should consider providing SSE-enabled versions of these libraries, since they are major CPU eaters in the audio callback.
soundtouch uses just -O3 in Ubuntu Trusty
rubberband uses just -O2 in Ubuntu Trusty
Adding -msse -mfpmath=sse will do the job.
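To verify that a build really picked up these flags, a small check like the following could be compiled with the same options as the library. This is only a sketch: it relies on the `__SSE_MATH__` and `__SSE2_MATH__` macros that GCC and Clang predefine, and the function names are made up for illustration. Note that SSE1 only handles single precision, so with plain `-msse -mfpmath=sse` the `double` math still goes through the x87 unit.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if this file was compiled so that
   single-precision math uses SSE registers, 0 otherwise.
   Assumes a GCC- or Clang-compatible compiler. */
int built_with_sse_float_math(void)
{
#if defined(__SSE_MATH__)
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: same check for double precision,
   which needs SSE2 (for example -msse2 -mfpmath=sse). */
int built_with_sse_double_math(void)
{
#if defined(__SSE2_MATH__)
    return 1;
#else
    return 0;
#endif
}

/* Prints which unit the compiler targets for each precision. */
void report_fp_unit(void)
{
    printf("float math:  %s\n",
           built_with_sse_float_math() ? "SSE" : "x87 or other");
    printf("double math: %s\n",
           built_with_sse_double_math() ? "SSE2" : "x87 or other");
}
```

Calling `report_fp_unit()` from a test program built once with and once without the flags should show the difference on an i386 target.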
Changed in mixxx:
importance: Undecided → Wishlist
milestone: none → 1.12.0
Changed in mixxx:
milestone: 2.0.0 → none
Note this comment block in rubberband:
"/* evaluation of 4 sines at onces, using only SSE1+MMX intrinsics so
it runs also on old athlons XPs and the pentium III of your grand
mother.
The code is the exact rewriting of the cephes sinf function.
Precision is excellent as long as x < 8192 (I did not bother to
take into account the special handling they have for greater values
-- it does not return garbage for arguments over 8192, though, but
the extra precision is missing).
Note that it is such that sinf((float)M_PI) = 8.74e-8, which is the
surprising but correct result.
Performance is also surprisingly good, 1.33 times faster than the
macos vsinf SSE2 function, and 1.5 times faster than the
__vrs4_sinf of amd's ACML (which is only available in 64 bits). Not
too bad for an SSE1 function (with no special tuning) !
However the latter libraries probably have a much better handling of NaN,
Inf, denormalized and other special arguments..
On my core 1 duo, the execution of this function takes approximately 95 cycles.
From what I have observed on the experiments with Intel AMath lib, switching to an
SSE2 version would improve the perf by only 10%.
Since it is based on SSE intrinsics, it has to be compiled at -O2 to
deliver full speed.
*/
"
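The 8.74e-8 figure in the quoted comment can be reproduced without any sine routine at all. A sketch, with a made-up function name: `(float)M_PI` rounds slightly above π, and for a small offset δ we have sin(π + δ) ≈ -δ, so the exact sine of the float argument has the magnitude of that rounding error (with a negative sign).

```c
#include <stdio.h>

/* pi with more digits than a double can hold */
#define PI_DOUBLE 3.14159265358979323846

/* Hypothetical helper: rounding error introduced when pi is stored
   in a 32-bit float, i.e. (float)pi - pi, computed in double. */
double float_pi_rounding_error(void)
{
    float pi_f = (float)PI_DOUBLE;
    return (double)pi_f - PI_DOUBLE;
}

/* Prints the error; sin((float)pi) is approximately its negative. */
void print_float_pi_error(void)
{
    double delta = float_pi_rounding_error();
    printf("(float)pi - pi = %.3e\n", delta);
    printf("sin((float)pi) is approximately %.3e\n", -delta);
}
```

So a result of this size is not a precision bug in the SSE routine but the correct answer for the float input it was given.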