GNU Arm Embedded Toolchain

Generates useless mov instructions with 64-bit add on 32-bit arch

Bug #1775263 reported by Manuel Pégourié-Gonnard on 2018-06-05

This bug affects 1 person

Affects	Status	Importance	Assigned to	Milestone
GNU Arm Embedded Toolchain	New	Undecided	Unassigned

Bug Description

Hi,

This is more of an enhancement request than a bug report: the generated code is correct but quite suboptimal.

Environment: Linux 64-bit (Arch), arm-gcc toolchain from the distro package.
% arm-none-eabi-gcc --version | head -n1
arm-none-eabi-gcc (Arch Repository) 8.1.0

Test file:
#include <stdint.h>
uint32_t foo(uint32_t a, uint32_t b, uint32_t c) {
    return ((uint64_t) a + b + c) >> 32;
}

Build command: arm-none-eabi-gcc -march=armv6-m -mthumb -Wall -Wextra -Os -S -o - 64.c | sed -n '/^foo/,/^$/p'

Observed behaviour: the generated code uses a lot of registers (r4-r7 have to be saved and restored with push/pop, resulting in useless memory accesses) and contains 7 useless movs instructions (more than doubling the total number of instructions):

push {r4, r5, r6, r7, lr}
movs r4, #0
movs r5, r0
movs r6, r1
movs r7, r4
movs r0, r2
movs r1, r4
movs r2, r5
adds r0, r0, r6
adcs r1, r1, r7
movs r3, r4
adds r0, r0, r2
adcs r1, r1, r3
@ sp needed
movs r0, r1
pop {r4, r5, r6, r7, pc}

Expected behaviour: the generated code should use a reasonably minimal number of registers and contain no useless instructions. For example, the code generated by clang 6.0 is as expected:

movs r3, #0
adds r1, r1, r0
mov r0, r3
adcs r0, r0
adds r1, r1, r2
adcs r0, r3
bx lr

Why this matters: casting to a double-sized type is a common idiom for taking advantage of the carry flag (without having to resort to asm). GCC should generate efficient code for this idiom, which is often used in performance-critical code.
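
To illustrate where this idiom typically shows up, here is a minimal sketch of a multi-word addition that uses a 64-bit intermediate to recover the carry. The function name add_limbs and the limb layout (least significant word first) are assumptions made for this example; they are not taken from any particular library.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (name and layout are assumptions for illustration):
 * computes r = a + b over n 32-bit limbs, least significant limb first,
 * and returns the final carry (0 or 1). */
uint32_t add_limbs(uint32_t *r, const uint32_t *a, const uint32_t *b, size_t n)
{
    uint32_t carry = 0;

    for (size_t i = 0; i < n; i++) {
        /* Widen to 64 bits so the sum cannot overflow; the carry out of
         * the 32-bit addition ends up in bit 32 of t. */
        uint64_t t = (uint64_t) a[i] + b[i] + carry;

        r[i] = (uint32_t) t;           /* low 32 bits: result limb */
        carry = (uint32_t) (t >> 32);  /* high 32 bits: carry, 0 or 1 */
    }

    return carry;
}
```

Each loop iteration has the same shape as foo above, so one would hope for an add / add-with-carry sequence per limb rather than extra register moves.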

Thanks!
