Comment 16 for bug 1401316

Strntydog (strntydog) wrote :

Follow up:
I tried changing my register access to use a structure (as recommended here because it's "more friendly to the compiler") rather than using the register addresses directly. It made no difference to the redundancy generated in the literal tables.

One thing I notice is that the addresses are constants the compiler knows at compile time, and so are the values I am writing to them. Where it can, the compiler generates a constant value with an 8-bit immediate load and then transforms one constant into the next with a single arithmetic instruction (add/subtract with an 8-bit immediate). For addresses it does not do the same optimization, even though it could add to or subtract from the base address to generate every address in the literal table except the first. This would not be as optimal as using appropriate offsets in the loads/stores, but it would be significantly better than the current code generation. Even if the compiler can't reach the new destination address with an offset because it's further away than the offset allows, this kind of optimization should happen before a new literal table entry is created (at least at -Os), as it is denser.
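To make the addressing I have in mind concrete, here is a minimal sketch in C where every register is reached from one base pointer, so only the base would need a literal table entry and the rest become store offsets. The name write_block is made up for illustration, and the 16-bit stores from my test case are left out to keep it short. I am not claiming any particular GCC version emits the denser code for this form; as noted above, the structure variant did not change the output for me.

```c
#include <stdint.h>

/* Hypothetical helper (not from the test case): performs the 32-bit
   stores of the test case, reaching every register from one base. */
void write_block(volatile uint32_t *base)
{
    base[0] = 0x1234u;      /* *first  = p1, base + 0  */
    base[1] = 0x12349876u;  /* *second = p3, base + 4  */
    base[2] = 0x21u;        /* *third  = p4, base + 8  */
    base[3] = 0x33u;        /* *fourth = p5, base + 12 */
}
```

On the target this would be called as write_block((volatile uint32_t *)0x40002800U); on a host it can be pointed at an ordinary four-word array to check the stores.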

Also, here is a test case:

#include <stdbool.h>
#include <stdint.h>

int main (void)
{
    const uint16_t p1 = 0x1234;
    const uint16_t p2 = 0x9876;
    const uint32_t p3 = 0x12349876;

    const uint32_t p4 = 0x21;
    const uint32_t p5 = 0x33;

    volatile uint16_t* const first16 = (uint16_t*)(0x40002800U);
    volatile uint32_t* const first = (uint32_t*)(0x40002800U);
    volatile uint32_t* const second = (uint32_t*)(0x40002804U);
    volatile uint32_t* const third = (uint32_t*)(0x40002808U);
    volatile uint32_t* const fourth = (uint32_t*)(0x4000280CU);

    *first16 = p1;
    *first = p1;
    *first16 = p2;
    *second = p3;
    *third = p4;
    *fourth = p5;

    while (true) {}
}

Which, using gcc-arm-none-eabi-4_9-2015q3, generates this on a Cortex-M0+ (compiler options: -Os -mcpu=cortex-m0plus -mthumb):

00000118 <main>:
    volatile uint32_t* const first = (uint32_t*)(0x40002800U);
    volatile uint32_t* const second = (uint32_t*)(0x40002804U);
    volatile uint32_t* const third = (uint32_t*)(0x40002808U);
    volatile uint32_t* const fourth = (uint32_t*)(0x4000280CU);

    *first16 = p1;
 118: 4b07 ldr r3, [pc, #28] ; (138 <main+0x20>)
 11a: 4a08 ldr r2, [pc, #32] ; (13c <main+0x24>)
 11c: 801a strh r2, [r3, #0]
    *first = p1;
 11e: 601a str r2, [r3, #0]
    *first16 = p2;
 120: 4a07 ldr r2, [pc, #28] ; (140 <main+0x28>)
 122: 801a strh r2, [r3, #0]
    *second = p3;
 124: 4a07 ldr r2, [pc, #28] ; (144 <main+0x2c>)
 126: 4b08 ldr r3, [pc, #32] ; (148 <main+0x30>)
 128: 601a str r2, [r3, #0]
    *third = p4;
 12a: 2221 movs r2, #33 ; 0x21
 12c: 4b07 ldr r3, [pc, #28] ; (14c <main+0x34>)
 12e: 601a str r2, [r3, #0]
    *fourth = p5;
 130: 4b07 ldr r3, [pc, #28] ; (150 <main+0x38>)
 132: 3212 adds r2, #18
 134: 601a str r2, [r3, #0]
 136: e7fe b.n 136 <main+0x1e>
 138: 40002800 .word 0x40002800
 13c: 00001234 .word 0x00001234
 140: ffff9876 .word 0xffff9876
 144: 12349876 .word 0x12349876
 148: 40002804 .word 0x40002804
 14c: 40002808 .word 0x40002808
 150: 4000280c .word 0x4000280c
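
To put a number on the redundancy in the listing above, here is a small host-side sketch that counts the literal table words lying within store-offset reach of an earlier word in the same table. The function name and the array are made up for illustration; the array contents are copied from the listing. It assumes the 16-bit Thumb STR (immediate) form, whose offset is a 5-bit field scaled by 4, so 0 to 124 bytes. It only compares numbers, so it is a rough measure and not a model of what the compiler does.

```c
#include <stddef.h>
#include <stdint.h>

/* Largest byte offset a 16-bit Thumb STR (immediate) can encode:
   a 5-bit field scaled by 4, i.e. 0..124. */
#define THUMB_STR_MAX_OFFSET 124u

/* Literal table words copied from the listing (0x138..0x150). */
const uint32_t listing_pool[7] = {
    0x40002800u, 0x00001234u, 0xffff9876u, 0x12349876u,
    0x40002804u, 0x40002808u, 0x4000280cu
};

/* Hypothetical helper: counts words that sit within STR-offset reach
   of an earlier word in the same table. */
size_t count_reachable_literals(const uint32_t *pool, size_t n)
{
    size_t count = 0;
    for (size_t i = 1; i < n; i++) {
        for (size_t j = 0; j < i; j++) {
            /* Unsigned subtraction wraps when pool[i] < pool[j],
               which yields a large value that fails the range test. */
            uint32_t delta = pool[i] - pool[j];
            if (delta != 0 && delta <= THUMB_STR_MAX_OFFSET && delta % 4 == 0) {
                count++;
                break; /* count each word once */
            }
        }
    }
    return count;
}
```

For the table in the listing, count_reachable_literals(listing_pool, 7) returns 3: the entries for 0x40002804, 0x40002808 and 0x4000280c, which is 12 of the 28 literal bytes.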