I have an incomplete patch for the moment, but the results are already good:
(somehow the machine I'm using got faster since yesterday, so I reran all the benchmarks for comparison)
original code:
direct took 2.20 seconds
invoke took 29.81 seconds
So, invoke overhead was ~ 27.61 seconds (~ 93%)
with attachment 433119:
direct took 2.20 seconds
invoke took 22.66 seconds
So, invoke overhead was ~ 20.45 seconds (~ 90%)
with the WIP patch:
direct took 2.22 seconds
invoke took 16.94 seconds
So, invoke overhead was ~ 14.72 seconds (~ 87%)
We're looking at an almost 2x improvement in invoke overhead (27.61 seconds down to 14.72 seconds), though the speed test is biased: it only exercises one type of call, one that doesn't even need to use the stack.
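For anyone who wants to recheck the figures, here is a minimal sketch of how the overhead lines can be derived from the direct/invoke timings. It is not part of the benchmark itself: the helper name `invoke_overhead` is made up for illustration, and I'm assuming overhead = invoke - direct and percentage = overhead / invoke, which matches the numbers above. Since it works from the rounded timings, the last digit can differ from the reported values (e.g. 20.46 vs 20.45).

```python
def invoke_overhead(direct_s, invoke_s):
    """Return (overhead in seconds, overhead as a fraction of invoke time).

    Hypothetical helper: assumes overhead is simply invoke minus direct.
    """
    overhead = invoke_s - direct_s
    return overhead, overhead / invoke_s


# (direct, invoke) timings in seconds, copied from the runs above
runs = {
    "original code": (2.20, 29.81),
    "attachment 433119": (2.20, 22.66),
    "WIP patch": (2.22, 16.94),
}

for name, (direct, invoke) in runs.items():
    overhead, share = invoke_overhead(direct, invoke)
    print(f"{name}: overhead ~ {overhead:.2f} seconds (~ {share:.0%})")

# Ratio of the original overhead to the WIP overhead
base, _ = invoke_overhead(*runs["original code"])
wip, _ = invoke_overhead(*runs["WIP patch"])
print(f"overhead reduction: {base / wip:.2f}x")
```

The last line compares overhead only; comparing total invoke time (29.81 vs 16.94 seconds) gives a smaller ratio.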
The current status of the WIP is that it is free of any assembly and works properly, except that it lacks proper comments and, more importantly, breaks the non-EABI case (which I unfortunately can't test, but I can at least try to make it work in theory).