Generates useless mov instructions with 64-bit add on 32-bit arch

Bug #1775263 reported by Manuel Pégourié-Gonnard on 2018-06-05
This bug affects 1 person
Affects: GNU Arm Embedded Toolchain
Importance: Undecided
Assigned to: Unassigned

Bug Description

Hi,

This is more of an enhancement request than a bug report: the generated code is correct, but quite suboptimal.

Environment: Linux 64-bit (Arch), arm-none-eabi-gcc toolchain from the distro package.
% arm-none-eabi-gcc --version | head -n1
arm-none-eabi-gcc (Arch Repository) 8.1.0

Test file (64.c):
#include <stdint.h>
uint32_t foo(uint32_t a, uint32_t b, uint32_t c) {
    return ((uint64_t) a + b + c) >> 32;
}
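To make the intent of foo() concrete: the sum of three 32-bit values is at most 3 * (2^32 - 1), so the high word is 0, 1 or 2, i.e. the number of carries out of bit 31. The following is a minimal sketch that checks this against a reference written with 32-bit arithmetic only. The names carries_ref, xorshift32 and check_foo are hypothetical and only serve the illustration. Whether GCC compiles carries_ref any better than foo() has not been checked here.

```c
#include <assert.h>
#include <stdint.h>

/* foo() exactly as in the report: high word of the 64-bit sum. */
uint32_t foo(uint32_t a, uint32_t b, uint32_t c) {
    return ((uint64_t) a + b + c) >> 32;
}

/* Hypothetical reference using only 32-bit arithmetic: an unsigned
   addition wrapped around iff the result is smaller than an operand. */
static uint32_t carries_ref(uint32_t a, uint32_t b, uint32_t c) {
    uint32_t s = a + b;
    uint32_t carries = (s < a);   /* carry out of a + b */
    s += c;
    carries += (s < c);           /* carry out of (a + b) + c */
    return carries;
}

/* Simple xorshift32 pseudo-random generator (fixed seed, reproducible). */
static uint32_t xorshift32(uint32_t *state) {
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *state = x;
}

/* Compare foo() with the reference on the all-ones edge case and on
   n pseudo-random triples; aborts via assert() on any mismatch. */
static void check_foo(unsigned n) {
    uint32_t state = 2463534242u;
    assert(foo(UINT32_MAX, UINT32_MAX, UINT32_MAX) == 2);
    for (unsigned i = 0; i < n; i++) {
        uint32_t a = xorshift32(&state);
        uint32_t b = xorshift32(&state);
        uint32_t c = xorshift32(&state);
        uint32_t r = foo(a, b, c);
        assert(r <= 2);
        assert(r == carries_ref(a, b, c));
    }
}
```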

Build command: arm-none-eabi-gcc -march=armv6-m -mthumb -Wall -Wextra -Os -S -o - 64.c | sed -n '/^foo/,/^$/p'

Observed behaviour: the generated code uses many registers (so r4-r7 must be saved and restored on the stack, which are useless memory accesses) and contains 7 useless movs instructions (15 instructions in total, more than twice the 7 of the clang output below):

 push {r4, r5, r6, r7, lr}
 movs r4, #0
 movs r5, r0
 movs r6, r1
 movs r7, r4
 movs r0, r2
 movs r1, r4
 movs r2, r5
 adds r0, r0, r6
 adcs r1, r1, r7
 movs r3, r4
 adds r0, r0, r2
 adcs r1, r1, r3
 @ sp needed
 movs r0, r1
 pop {r4, r5, r6, r7, pc}

Expected behaviour: the generated code should use a reasonably minimal number of registers and not add useless instructions. For example, the code generated by clang 6.0 is as expected:

 movs r3, #0
 adds r1, r1, r0
 mov r0, r3
 adcs r0, r0
 adds r1, r1, r2
 adcs r0, r3
 bx lr
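To see why clang's sequence is sufficient, here is a hedged C model of it, one statement per instruction. It assumes the usual ARMv6-M semantics: ADDS sets C to the carry out of bit 31, ADCS adds C in and sets C again, and the non-flag-setting MOV (register) leaves C untouched. The helper names adds, adcs and foo_clang_model are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* ADDS: 32-bit add; *c receives the carry out of bit 31. */
static uint32_t adds(uint32_t *c, uint32_t rn, uint32_t rm) {
    uint64_t wide = (uint64_t) rn + rm;
    *c = (uint32_t) (wide >> 32);
    return (uint32_t) wide;
}

/* ADCS: add with carry-in *c; *c receives the new carry out. */
static uint32_t adcs(uint32_t *c, uint32_t rd, uint32_t rm) {
    uint64_t wide = (uint64_t) rd + rm + *c;
    *c = (uint32_t) (wide >> 32);
    return (uint32_t) wide;
}

/* Register-level replay of the clang output; a, b, c arrive in r0, r1, r2. */
static uint32_t foo_clang_model(uint32_t r0, uint32_t r1, uint32_t r2) {
    uint32_t carry = 0;          /* models the C flag */
    uint32_t r3 = 0;             /* movs r3, #0     : C not affected */
    r1 = adds(&carry, r1, r0);   /* adds r1, r1, r0 : r1 = a + b, C = carry #1 */
    r0 = r3;                     /* mov r0, r3      : r0 = 0, flags preserved */
    r0 = adcs(&carry, r0, r0);   /* adcs r0, r0     : r0 = 0 + 0 + C = carry #1 */
    r1 = adds(&carry, r1, r2);   /* adds r1, r1, r2 : low word, C = carry #2 */
    r0 = adcs(&carry, r0, r3);   /* adcs r0, r3     : r0 = carry #1 + carry #2 */
    (void) r1;                   /* low word is discarded by the >> 32 */
    assert(r0 <= 2);
    return r0;                   /* bx lr           : result returned in r0 */
}
```

Only one zero register (r3) and two adds/adcs pairs are needed; the zero is copied into r0 with a flag-preserving mov so that the pending carry survives until the first adcs.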

Why this matters: casting to a double-width type is a common idiom for exploiting the carry flag without resorting to inline assembly, for example in multi-precision (bignum) arithmetic. GCC should generate efficient code for this idiom, since it is often used in performance-critical code.
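As an illustration of the idiom in that setting, here is a minimal sketch of multi-precision addition over little-endian arrays of 32-bit limbs. The function mp_add and its signature are hypothetical and not taken from any particular library. Each add-with-carry step is written as a 64-bit sum, the same pattern foo() exercises, so on a 32-bit target one would hope the loop body compiles to an adds/adcs pair rather than the register shuffling shown above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: r = x + y over n little-endian 32-bit limbs.
   Returns the final carry out of the most significant limb (0 or 1). */
static uint32_t mp_add(uint32_t *r, const uint32_t *x, const uint32_t *y,
                       size_t n) {
    uint32_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* x[i] + y[i] + carry <= 2^33 - 1, so it fits in 64 bits. */
        uint64_t t = (uint64_t) x[i] + y[i] + carry;
        r[i] = (uint32_t) t;             /* low word: result limb */
        carry = (uint32_t) (t >> 32);    /* high word: carry into next limb */
        assert(carry <= 1);
    }
    return carry;
}
```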

Thanks!
