peephole optimisation misses obvious uxtb removal

Bug #1812887 reported by Dominic Plunkett on 2019-01-22
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
GNU Arm Embedded Toolchain
Undecided
Unassigned

Bug Description

armv6 target

The arrays are byte arrays so are written to with strb. The code does have & 0xFF which the optimiser should remove.

C code :

static uint8_t * ram;
static uint8_t * Memory;

void ram_emulator_byte_write(unsigned int gpio)
{
   uint8_t data = gpio & 0xFF;
   ram[ram_addr] = data;
   Memory[3] = data;
}

Code produced by gcc version 7.3.1 20180622

01f01af4 <ram_emulator_byte_write>:
 1f01af4: e59f201c ldr r2, [pc, #28] ; 1f01b18 <ram_emulator_byte_write+0x24>
 1f01af8: e59f301c ldr r3, [pc, #28] ; 1f01b1c <ram_emulator_byte_write+0x28>
 1f01afc: e5921000 ldr r1, [r2]
 1f01b00: e5932000 ldr r2, [r3]
 1f01b04: e59f3014 ldr r3, [pc, #20] ; 1f01b20 <ram_emulator_byte_write+0x2c>
 1f01b08: e6ef0070 uxtb r0, r0
 1f01b0c: e7c10002 strb r0, [r1, r2]
 1f01b10: e5c30003 strb r0, [r3, #3]
 1f01b14: e12fff1e bx lr
 1f01b18: 01f1cc34 mvnseq ip, r4, lsr ip
 1f01b1c: 01f1cc38 mvnseq ip, r8, lsr ip
 1f01b20: 01f1e164 mvnseq lr, r4, ror #2

looking at the code in more detail the first two ldrs could be also be be peepholed to ldrd

Ramana Radhakrishnan (ramana) wrote :

The testcase is not compilable and there is no input on the options that are required to provoke this code generation.

Modifying the testcase to

#include <stdint.h>
static uint8_t * ram;
static uint8_t * Memory;
unsigned int ram_addr;

void ram_emulator_byte_write(unsigned int gpio)
{
   uint8_t data = gpio & 0xFF;
   ram[ram_addr] = data;
   Memory[3] = data;
}

with a recent gcc 7 and using -O2 -march=armv6 produces

am_emulator_byte_write:
 @ args = 0, pretend = 0, frame = 0
 @ frame_needed = 0, uses_anonymous_args = 0
 @ link register save eliminated.
 ldr r2, .L3
 mov r3, #0
 ldr r2, [r2]
 strb r0, [r2]
 strb r3, [r3, #3]
 .inst 0xe7f000f0
.L4:
 .align 2
.L3:
 .word ram_addr

At this point I am speculating about code generated as the example is incomplete.

Changed in gcc-arm-embedded:
status: New → Incomplete
Dominic Plunkett (dp111) wrote :

Okay , I've played a bit more .

I'm now using gcc 8 in my full case I still get the output I posted, but it turns out the critical line which I had simplified was

  uint8_t data = gpio & 0x3FC>>2;

My current test case :

#include <stdint.h>
static uint8_t * ram;
static uint8_t * Memory;
static unsigned int ram_addr;

void test(unsigned int gpio)
{
   uint8_t data = gpio & 0x3FC>>2;
   ram[ram_addr] = data;
   Memory[3] = data;
}

void main()
{
   ram_addr= 3;
   Memory = (uint8_t *) 0x8000000;
   ram = (uint8_t *) 0x9000000;

   test(0x7745);
   ram_addr= 4;
   test(0x7712);
   while(1);
}

void _exit(unsigned int fred)
{
   while(1);
}

gives :

00008230 <test>:
    8230: e59f3010 ldr r3, [pc, #16] ; 8248 <test+0x18>
    8234: e6ef0070 uxtb r0, r0
    8238: e893000e ldm r3, {r1, r2, r3}
    823c: e7c10002 strb r0, [r1, r2]
    8240: e5c30003 strb r0, [r3, #3]
    8244: e12fff1e bx lr
    8248: 00018af0 .word 0x00018af0

David Brown (davidbrown) wrote :

It is useful to give the compiler flags as well as the version.

But yes, the compiler is generating an unnecessary "uxtb" (effectively "and 0xff") instruction. So I agree this is a missed optimisation opportunity. I don't know if it would be a peeophole optimisation - I suspect not.

Ramana Radhakrishnan (ramana) wrote :

 I am pretty sure I have seen something like this before . Worth trawling https://gcc.gnu.org/bugzilla for more such examples and I'll take a look when I can. If someone beats me to it, please do :) It's also reproducible with upstream trunk (or what will be GCC-9)

Changed in gcc-arm-embedded:
status: Incomplete → Confirmed
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers