This is still a size and speed optimization at -Os, but not for this benchmark at -O2 and -O3. It's also a speed optimization when a routine generates both the quotient and the remainder of 2 different numbers. It's not a speed optimization for this particular benchmark because the multiplies are still generated at -O2 and -O3, though it does no harm there. So for this benchmark we won't get any further speed benefit from code generation improvements.
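To make the two cases concrete, here is a minimal C sketch. The function and struct names are hypothetical and are not taken from the benchmark or from the branch. The first routine needs both the quotient and the remainder of two run-time values, which is the case where a single combined divide (for example one call to the ARM EABI helper __aeabi_idivmod on cores without a hardware divider) can replace two separate operations. The second divides by a constant, which GCC typically expands into a multiply and shifts at -O2 and -O3, so a combined divide would not be expected to help there.

```c
/* Hypothetical illustration only; names are not from the benchmark. */
struct qr {
    int quot;
    int rem;
};

/* Quotient and remainder of the same two run-time operands.
 * A compiler can serve both results from one combined divide. */
struct qr divmod_pair(int num, int den)
{
    struct qr out;
    out.quot = num / den;
    out.rem = num % den;
    return out;
}

/* Division by a constant. At -O2/-O3 GCC usually emits a multiply
 * by a precomputed reciprocal plus shifts instead of a divide. */
int div_by_ten(int x)
{
    return x / 10;
}
```

For example, divmod_pair(47, 5) yields quot 9 and rem 2, and div_by_ten(47) yields 4. Comparing the output of gcc -S at -Os and at -O2 for these two functions is one way to see which form the compiler actually picks on a given target.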
I've prototyped this on trunk with the branch
https://code.launchpad.net/~ramana/gcc-linaro/divmodsi4-experiments
Ramana