Avoid unnecessarily saving $lr on Thumb-1
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Linaro GCC | Won't Fix | Wishlist | Michael Collison |
Bug Description
GCC produces sub-optimal code for ARM Thumb1 due to the limited range of the Thumb1 branch instruction.
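The long story is that Thumb-1 branches have extremely limited range: on the order of 256 bytes. The way we "solve" this is to use the branch-and-link instruction as a long branch. Branch-and-link overwrites LR, so any function that might need a long branch has to save LR first.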
Unfortunately we need to know register usage for reload to calculate frame offsets, and this is way before we know how big all the instructions are. The current solution is to assume that any branch may be big.
Currently we have no way of fixing up out-of-range branches, so this estimate must be conservatively correct.
We may be able to do slightly better if we assume that we will never get more than, say, 10x code expansion between freezing frame offsets and final assembly.
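The expansion-bound idea above can be sketched as follows. This is only an illustration, not GCC code: the function names, the 10x factor taken from the suggestion above, and the 254-byte reach (a deliberately conservative figure for a Thumb-1 conditional branch) are all assumptions made for the example.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical constants for illustration; these are not GCC internals. */
enum {
    COND_BRANCH_REACH_BYTES = 254, /* conservative reach of a Thumb-1 B<cond> */
    ASSUMED_MAX_EXPANSION   = 10   /* assumed worst-case code growth factor */
};

/* A short branch is treated as safe only if it still fits after the
   estimated distance is scaled by the worst-case expansion factor. */
static bool short_branch_is_safe(long estimated_distance_bytes)
{
    long worst_case = labs(estimated_distance_bytes) * ASSUMED_MAX_EXPANSION;
    return worst_case <= COND_BRANCH_REACH_BYTES;
}

/* LR must be saved if any branch in the function might need the
   branch-and-link long form. */
static bool must_save_lr(const long *branch_distances, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (!short_branch_is_safe(branch_distances[i]))
            return true;
    }
    return false;
}
```

Under these assumptions a tiny function like the reproducer below, whose branches span only a few bytes, would not need to save LR, while any branch estimated at more than about 25 bytes would still force the save.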
One possibility is to teach gcc how to generate branch islands within a function, chaining if necessary to extend the range. This would allow us to make a less conservative guess on branch ranges, and fix up out-of-range branches when LR is not saved. It may also have secondary code size benefits by allowing multiple long branches/calls to be chained using smaller instructions.
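To give a rough feel for the chaining, here is a minimal sketch that counts how many islands a single out-of-range branch would need. It assumes each hop is a Thumb-1 unconditional branch with a conservative reach of 2046 bytes, that islands can be placed exactly where needed, and that the islands themselves add no distance; the name `islands_needed` is hypothetical and nothing here reflects an existing GCC pass.

```c
#include <stdlib.h>

/* Hypothetical constant: conservative reach of a Thumb-1 unconditional B. */
enum { UNCOND_BRANCH_REACH_BYTES = 2046 };

/* Number of intermediate islands needed to cover distance_bytes when
   every hop can span at most UNCOND_BRANCH_REACH_BYTES. */
static long islands_needed(long distance_bytes)
{
    long distance = labs(distance_bytes);
    if (distance <= UNCOND_BRANCH_REACH_BYTES)
        return 0; /* a single direct branch reaches the target */

    /* Round up to the number of hops, then subtract the final hop. */
    long hops = (distance + UNCOND_BRANCH_REACH_BYTES - 1)
                / UNCOND_BRANCH_REACH_BYTES;
    return hops - 1;
}
```

A real implementation would also have to find places where an island can be emitted without disturbing fall-through code, which this sketch ignores.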
How to reproduce:
void foo(int *i) { if (*i>1) *i = 1; else *i = 2; }
With thumb2, all is well:
foo:
ldr r3, [r0, #0]
cmp r3, #1
ble .L2
movs r3, #1
str r3, [r0, #0]
bx lr
.L2:
movs r3, #2
str r3, [r0, #0]
bx lr
But, with thumb1, there's an unnecessary stack push of lr:
foo:
push {lr}
ldr r3, [r0]
cmp r3, #1
ble .L2
mov r3, #1
str r3, [r0]
.L4:
pop {r1}
bx r1
.L2:
mov r3, #2
str r3, [r0]
b .L4
[CodeSourcery Tracker ID #797]
Changed in gcc-linaro: | |
importance: | Undecided → Low |
tags: | added: armel toolchain |
tags: | added: speed task |
Is this just for tracking, or do we want to spend time improving Thumb-1 support?!