GCC is generating horrifically inefficient code for division by a constant
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
GNU Arm Embedded Toolchain | New | Undecided | Unassigned |
Bug Description
I'm using the latest available version of Arm GCC:
arm-none-eabi-gcc (GNU Arm Embedded Toolchain 10-2020-q4-major) 10.2.1 20201103 (release)
Copyright (C) 2020 Free Software Foundation, Inc.
When I compile this code using "-mcpu=cortex-m0 -mthumb -Ofast":
int main(void) {
    uint16_t num = (uint16_t) ADC1->DR;
    ADC1->DR = num / 7;
}
I would expect that the division would be accomplished by a multiplication and a shift, but instead this code is being generated:
08000b5c <main>:
8000b5c: b510 push {r4, lr}
8000b5e: 4c05 ldr r4, [pc, #20] ; (8000b74 <main+0x18>)
8000b60: 2107 movs r1, #7
8000b62: 6c20 ldr r0, [r4, #64] ; 0x40
8000b64: b280 uxth r0, r0
8000b66: f7ff facf bl 8000108 <__udivsi3>
8000b6a: b280 uxth r0, r0
8000b6c: 6420 str r0, [r4, #64] ; 0x40
8000b6e: 2000 movs r0, #0
8000b70: bd10 pop {r4, pc}
8000b72: 46c0 nop ; (mov r8, r8)
8000b74: 40012400 .word 0x40012400
Calling __udivsi3 here instead of using a multiply and a shift is terribly inefficient.
This seems to be an issue with the M0 backend, as good code is generated for M3/M4.
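For reference, here is a minimal sketch of the multiply-and-shift sequence I had in mind, written by hand in C. It is not what GCC emits for any target. The names div7_u16 and div7_u16_mismatches are hypothetical helpers made up for this illustration. It assumes the dividend really is limited to 16 bits, so that the product fits in the 32-bit result of the Cortex-M0 multiply instruction, and it uses the constant 9363 = ceil(2^19 / 7) - 2^16.

```c
#include <stdint.h>

/* Hypothetical helper (not a GCC or CMSIS function): computes n / 7 for a
 * 16-bit n using one 32x32->32 multiply, one add and two shifts. */
uint16_t div7_u16(uint16_t n)
{
    /* 9363 = ceil(2^19 / 7) - 2^16. The product n * 9363 is at most
     * 65535 * 9363 = 613604205, which fits in 32 bits. */
    uint32_t t = ((uint32_t)n * 9363u) >> 16;

    /* (n + t) >> 3 equals floor(n * 74899 / 2^19), which matches n / 7
     * for every 16-bit n. The sum n + t stays below 2^17. */
    return (uint16_t)((n + t) >> 3);
}

/* Hypothetical exhaustive check: returns how many 16-bit inputs give a
 * result that differs from the ordinary division n / 7. */
unsigned div7_u16_mismatches(void)
{
    unsigned bad = 0;
    for (uint32_t n = 0; n <= 0xFFFFu; ++n) {
        if (div7_u16((uint16_t)n) != (uint16_t)(n / 7u)) {
            ++bad;
        }
    }
    return bad;
}
```

In the snippet above this would be used as ADC1->DR = div7_u16(num); as a manual workaround. The extra add is needed because the plain magic constant for 16-bit division by 7 (74899) does not fit in 16 bits, so a direct n * 74899 would overflow a 32-bit product for large n.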