Fails to take advantage of rev16 instruction on v6+

Bug #1775261 reported by Manuel Pégourié-Gonnard
This bug affects 1 person
Affects                     Status  Importance  Assigned to  Milestone
GNU Arm Embedded Toolchain  New     Undecided   Unassigned

Bug Description

Hi,

This is more an enhancement request than a bug report, as the generated code is correct but suboptimal.

Environment: 64-bit Linux (Arch), arm-none-eabi-gcc toolchain from the distro package.
% arm-none-eabi-gcc --version | head -n1
arm-none-eabi-gcc (Arch Repository) 8.1.0

Test file:
#include <stdint.h>

uint32_t rot(uint32_t x) {
    return (x << 16) | (x >> 16);
}

uint32_t rev(uint32_t x) {
    return (((x      ) & 0xff) << 24) |
           (((x >>  8) & 0xff) << 16) |
           (((x >> 16) & 0xff) <<  8) |
           (((x >> 24) & 0xff)      );
}

uint32_t rev16(uint32_t x) {
    return rev(rot(x));
}

Build command: arm-none-eabi-gcc -march=armv6-m -mthumb -Wall -Wextra -Os -S -o - bytes.c

Observed behaviour: rev() is correctly optimised to a single instruction, but rev16() is compiled to three instructions.

Expected behaviour: rev16() should compile to a single REV16 instruction: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0489h/Cihjgdid.html
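
For reference, here is a small host-side sketch of why a single REV16 would be a valid result: rev(rot(x)) swaps the two bytes inside each 16-bit halfword, which is what REV16 does. The names rev16_direct and check_rev16_equivalence are made up for this illustration and are not part of the test file. The report does not establish whether GCC recognises the direct mask-and-shift form either.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* rot() and rev() exactly as in the test file from the report. */
uint32_t rot(uint32_t x) {
    return (x << 16) | (x >> 16);
}

uint32_t rev(uint32_t x) {
    return (((x      ) & 0xff) << 24) |
           (((x >>  8) & 0xff) << 16) |
           (((x >> 16) & 0xff) <<  8) |
           (((x >> 24) & 0xff)      );
}

/* Hypothetical helper: swap the bytes within each halfword directly,
 * i.e. the operation REV16 performs on a 32-bit register. */
uint32_t rev16_direct(uint32_t x) {
    return ((x & 0x00ff00ffu) << 8) | ((x >> 8) & 0x00ff00ffu);
}

/* Hypothetical check: both formulations agree on a few sample values. */
void check_rev16_equivalence(void) {
    static const uint32_t samples[] = {
        0x00000000u, 0xffffffffu, 0x11223344u, 0x80000001u, 0xdeadbeefu
    };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        assert(rev(rot(samples[i])) == rev16_direct(samples[i]));
    }
}
```

For example, 0x11223344 becomes 0x33441122 after rot() and 0x22114433 after rev(), which is the input with the bytes of each halfword swapped.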

Why it (sometimes) matters: this was found while working on an implementation of the Korean block cipher ARIA, which has many rev16 operations in its inner loop. Having GCC recognize this possible optimisation (just like it does for rev and rotations) would yield measurably better performance (without the need for inline asm).
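
For context, the inline-asm workaround that this request would make unnecessary might look roughly like the sketch below. It is only an illustration under assumptions: rev16_asm and rev16_asm_check are made-up names, the "l" constraint is chosen on the assumption that Thumb-1 needs low registers for REV16, and the preprocessor guard relies on the ACLE macro __ARM_ARCH being defined by the compiler. On other targets it falls back to plain C so the snippet still builds on a host.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical wrapper: use the REV16 instruction where available,
 * otherwise fall back to portable mask-and-shift code. */
static inline uint32_t rev16_asm(uint32_t x) {
#if defined(__GNUC__) && defined(__arm__) && \
    defined(__ARM_ARCH) && (__ARM_ARCH >= 6)
    uint32_t out;
    /* "l" requests a low register (r0-r7) in Thumb state (assumption:
     * needed for the 16-bit Thumb encoding on ARMv6-M). */
    __asm__("rev16 %0, %1" : "=l"(out) : "l"(x));
    return out;
#else
    /* Portable fallback: swap the bytes within each halfword. */
    return ((x & 0x00ff00ffu) << 8) | ((x >> 8) & 0x00ff00ffu);
#endif
}

/* Hypothetical sanity check on one known value. */
void rev16_asm_check(void) {
    assert(rev16_asm(0x11223344u) == 0x22114433u);
}
```

The drawback is the one implied above: the asm statement is opaque to the optimiser and ties the source to one architecture, which is why having GCC recognise the pattern itself would be preferable.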

Thanks!
