Comment 15 for bug 1401316

Strntydog (strntydog) wrote :

I am also seeing similarly poor code optimisation on a Cortex-M0+ target.

Using gcc-arm-none-eabi-4_9-2015q3, for a Cortex-M0+ target.

While writing code, I noticed that the literal tables are full of redundant entries, at ANY optimization level.

Here is an example at level -O2 :

This is a disassembly from objdump of a section from my test main:

 148: 4b0b ldr r3, [pc, #44] ; (178 <main+0x34>)
 14a: 601a str r2, [r3, #0]
 14c: 4b0b ldr r3, [pc, #44] ; (17c <main+0x38>)
 14e: 3215 adds r2, #21
 150: 705a strb r2, [r3, #1]
....

 170: 4b09 ldr r3, [pc, #36] ; (198 <main+0x54>)
 172: 0512 lsls r2, r2, #20
 174: 601a str r2, [r3, #0]
 176: e7fe b.n 176 <main+0x32>
 178: 40002800 .word 0x40002800
 17c: 40002804 .word 0x40002804
 180: 40002808 .word 0x40002808
 184: 4000280c .word 0x4000280c
 188: 00001234 .word 0x00001234
 18c: 40002810 .word 0x40002810
 190: 40002818 .word 0x40002818
 194: 98760000 .word 0x98760000
 198: 4000281c .word 0x4000281c

At 148, r3 is loaded with the address 0x40002800.
At 14a, a word is stored at offset 0 from that address.
At 14c, r3 is loaded with the address 0x40002804.
At 150, a byte is stored at offset 1 from that address. It could instead have used an offset of 5 from the address loaded at 148, skipping both the load at 14c AND its literal table entry. That would save 6 bytes (2 for the ldr, 4 for the table entry) and produce faster code.
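
The offset arithmetic above can be checked on a host machine with a short C sketch. It is only an illustration, not compiler code: the function names are made up for this comment, and the limits are those of the 16-bit Thumb immediate-offset store encodings available on ARMv6-M (a 5-bit immediate scaled by the access size, so 0 to 31 for strb, 0 to 62 for strh and 0 to 124 for str).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Largest immediate offset for the 16-bit Thumb store encodings:
   imm5 scaled by the access size, so STRB 31, STRH 62, STR 124. */
static uint32_t max_imm_offset(uint32_t access_bytes)
{
    return 31u * access_bytes;
}

/* True if a store of access_bytes (1, 2 or 4) to addr can be encoded as
   [base, #imm], i.e. without loading a new base register first. */
static bool reachable_from(uint32_t base, uint32_t addr, uint32_t access_bytes)
{
    assert(access_bytes == 1 || access_bytes == 2 || access_bytes == 4);
    if (addr < base)
        return false;   /* these encodings have no negative offsets */
    uint32_t off = addr - base;
    return off % access_bytes == 0 && off <= max_imm_offset(access_bytes);
}

/* Bytes saved by dropping one pc-relative ldr (2 bytes) together with
   its literal table word (4 bytes). */
static uint32_t bytes_saved_per_dropped_load(void)
{
    return 2u + 4u;
}
```

Calling reachable_from(0x40002800, 0x40002805, 1) returns true, since the byte store at 150 needs an offset of 5 and strb allows up to 31.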

When you are writing code that bit-bangs GPIO registers, this sort of thing makes a huge difference to performance and code density. GPIO registers are usually (on most micros) clustered in a tight bunch, so they can typically all be reached by offsets from a fixed base. I am not sure whether this is exactly the problem reported in this bug, but if not, it is very similar. The literal table is full of closely spaced addresses. Even though they are not used contiguously, there is no need to store anything but 0x40002800, because all of the rest can be reached with offsets.
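
To put a number on the literal table claim, here is a small host-side C sketch that scans the address entries from the listing. It is a rough check only: count_redundant and pool_addrs are names invented for this comment, and the access width behind each entry is not visible in the elided listing, so the tightest limit (31 bytes, the strb range) is assumed.

```c
#include <stddef.h>
#include <stdint.h>

/* Address entries copied from the literal table in the listing. The two
   data constants (0x00001234 and 0x98760000) are deliberately left out. */
static const uint32_t pool_addrs[] = {
    0x40002800u, 0x40002804u, 0x40002808u, 0x4000280cu,
    0x40002810u, 0x40002818u, 0x4000281cu,
};

/* Count the entries, other than base itself, that lie no more than
   max_off bytes above base and so need no table entry of their own. */
static size_t count_redundant(const uint32_t *addrs, size_t n,
                              uint32_t base, uint32_t max_off)
{
    size_t redundant = 0;
    for (size_t i = 0; i < n; i++) {
        if (addrs[i] > base && addrs[i] - base <= max_off)
            redundant++;
    }
    return redundant;
}
```

With base 0x40002800 and max_off 31, six of the seven address entries count as redundant, which is 24 bytes of literal table before counting the 2 bytes for each ldr that would go with them. All six offsets are also multiples of 4, so the result is the same for word stores.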

I tried declaring the memory as an array and as a bunch of pointers; neither seems to make any difference to this problem.