Fails to take advantage of rev16 instruction on v6+
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
GNU Arm Embedded Toolchain | New | Undecided | Unassigned |
Bug Description
Hi,
This is more of an enhancement request than a bug report: the generated code is correct but suboptimal.
Environment: Linux 64-bit (Arch), arm-gcc toolchain from the distro package.
% arm-none-eabi-gcc --version | head -n1
arm-none-eabi-gcc (Arch Repository) 8.1.0
Test file:
#include <stdint.h>

uint32_t rot(uint32_t x) {
    return (x << 16) | (x >> 16);
}

uint32_t rev(uint32_t x) {
    return (((x      ) & 0xff) << 24) |
           (((x >>  8) & 0xff) << 16) |
           (((x >> 16) & 0xff) <<  8) |
           (((x >> 24) & 0xff)      );
}

uint32_t rev16(uint32_t x) {
    return rev(rot(x));
}
Build command: arm-none-eabi-gcc -march=armv6-m -mthumb -Wall -Wextra -Os -S -o - bytes.c
Observed behaviour: rev() is correctly optimised to a single instruction, but rev16() is compiled to three instructions.
Expected behaviour: rev16() should compile to a single REV16 instruction: http://
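To make the expected semantics concrete, here is a small sketch showing that rev(rot(x)) swaps the two bytes inside each 16-bit halfword, which is what REV16 does. The names rev16_composed and rev16_direct are made up for this illustration and are not part of the test file; rev16_direct is just one plausible mask-and-shift spelling of the same operation.

```c
#include <stdint.h>

/* Swap the two 16-bit halves (same as rot() in the test file). */
static uint32_t rot(uint32_t x) {
    return (x << 16) | (x >> 16);
}

/* Reverse all four bytes (same as rev() in the test file). */
static uint32_t rev(uint32_t x) {
    return ((x & 0xff) << 24) |
           (((x >> 8) & 0xff) << 16) |
           (((x >> 16) & 0xff) << 8) |
           ((x >> 24) & 0xff);
}

/* The composition from the report (hypothetical name). */
static uint32_t rev16_composed(uint32_t x) {
    return rev(rot(x));
}

/* Hypothetical direct form: swap the bytes within each halfword. */
static uint32_t rev16_direct(uint32_t x) {
    return ((x & 0x00ff00ffu) << 8) | ((x >> 8) & 0x00ff00ffu);
}
```

For example, both functions map 0x11223344 to 0x22114433.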
Why it (sometimes) matters: this was found while working on an implementation of the Korean block cipher ARIA, which has many rev16 operations in its inner loop. Having GCC recognize this possible optimisation (just like it does for rev and rotations) would yield measurably better performance (without the need for inline asm).
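For context, the inline asm workaround I would like to avoid looks roughly like the sketch below. The function name rev16_workaround and the preprocessor guard are assumptions made for this example; the "l" constraint is used so that Thumb-1 code gets low registers, and the portable fallback is there only so the snippet builds on non-Arm hosts.

```c
#include <stdint.h>

/* Hypothetical helper: force REV16 on Arm v6+, plain C elsewhere. */
static inline uint32_t rev16_workaround(uint32_t x) {
#if defined(__GNUC__) && defined(__arm__) && defined(__ARM_ARCH) && (__ARM_ARCH >= 6)
    uint32_t y;
    /* "l" = low registers r0-r7 in Thumb state (alias of "r" in Arm state). */
    __asm__("rev16 %0, %1" : "=l"(y) : "l"(x));
    return y;
#else
    /* Portable fallback: swap the two bytes within each halfword. */
    return ((x & 0x00ff00ffu) << 8) | ((x >> 8) & 0x00ff00ffu);
#endif
}
```

This works, but it hides the operation from the optimiser and ties the source to one architecture, which is why having the compiler recognise the pattern would be preferable.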
Thanks!