ternary optimization improves with cast to prvalue

Bug #1752178 reported by Nachum Kanovsky on 2018-02-27
Affects: GNU Arm Embedded Toolchain
Importance: Undecided
Assigned to: Unassigned

Bug Description

As described here:
https://answers.launchpad.net/gcc-arm-embedded/+question/664087

The cast converts the operands to prvalues, which improves the generated code. The same code using uint32_t instead of mytype doesn't require the cast. This appears to be a missed optimization.
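The value-category part of this claim can be checked at compile time. The sketch below is a minimal illustration, assuming C++14 and a simplified copy of mytype without the GCC attributes; the names MAX_PLAIN, MAX_CAST, p, c, x and y are introduced here for illustration only. It verifies only the language-level type of the two ternary forms and says nothing about the code GCC generates for them.

```cpp
#include <cstdint>
#include <type_traits>

// Simplified copy of the report's type (GCC attributes omitted; an assumption
// made here so the sketch compiles on any C++14 compiler).
struct mytype {
    std::uint32_t value;
    constexpr friend bool operator>(const mytype& t, const mytype& a) {
        return t.value > a.value;
    }
};

// Both macro variants from the report, renamed (hypothetical names) so they can coexist.
#define MAX_PLAIN(a, b) (((a) > (b)) ? ((a)) : ((b)))
#define MAX_CAST(a, b)  (((a) > (b)) ? (decltype(a)(a)) : (decltype(b)(b)))

// Declared only so the expressions below name lvalues; they are never evaluated.
extern mytype p, c;
extern std::uint32_t x, y;

// decltype((expr)) is T& when expr is an lvalue and plain T when it is a prvalue.
static_assert(std::is_same<decltype((MAX_PLAIN(p, c))), mytype&>::value,
              "ternary over two lvalues of the same type is an lvalue");
static_assert(std::is_same<decltype((MAX_CAST(p, c))), mytype>::value,
              "casting both operands makes the ternary a prvalue");

// The same language rule applies to uint32_t operands.
static_assert(std::is_same<decltype((MAX_PLAIN(x, y))), std::uint32_t&>::value,
              "plain ternary over uint32_t lvalues is also an lvalue");
static_assert(std::is_same<decltype((MAX_CAST(x, y))), std::uint32_t>::value,
              "casted ternary over uint32_t is also a prvalue");
```

Since the value categories are the same for uint32_t and mytype, the difference reported here lies in how the two cases are optimized, not in what the language requires.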

Here's the original text of the post:

In the code below, defining WITH_CAST significantly improves the compiled output (I see the same effect in my larger codebase), even though the cast appears to be superfluous. I am running this in Keil 5.25pre2, used only as a simulator. To measure performance, I read the elapsed microseconds shown by the simulator's t1 timer.

Snippet from code:
#if defined (WITH_CAST)
#define MAX(a,b) (((a) > (b)) ? (decltype(a)(a)) : (decltype(b)(b)))
#else
#define MAX(a,b) (((a) > (b)) ? ((a)) : ((b)))
#endif
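As a sanity check that the cast does not change the computed value, the two variants can be compared in constant expressions. This is a minimal sketch, assuming C++14 and a simplified mytype without the GCC attributes; MAX_PLAIN, MAX_CAST, max_plain and max_cast are hypothetical names used only here.

```cpp
#include <cstdint>

// Simplified copy of the report's type (GCC attributes omitted; an assumption).
struct mytype {
    std::uint32_t value;
    constexpr friend bool operator>(const mytype& t, const mytype& a) {
        return t.value > a.value;
    }
};

// Both macro variants from the report, renamed (hypothetical names) so they can coexist.
#define MAX_PLAIN(a, b) (((a) > (b)) ? ((a)) : ((b)))
#define MAX_CAST(a, b)  (((a) > (b)) ? (decltype(a)(a)) : (decltype(b)(b)))

// Thin wrappers so each variant can be evaluated at compile time.
constexpr std::uint32_t max_plain(mytype a, mytype b) { return MAX_PLAIN(a, b).value; }
constexpr std::uint32_t max_cast(mytype a, mytype b)  { return MAX_CAST(a, b).value; }

// Both variants select the same operand: second larger, first larger, and equal.
static_assert(max_plain(mytype{3}, mytype{7}) == max_cast(mytype{3}, mytype{7}),
              "variants agree when the second operand is larger");
static_assert(max_plain(mytype{9}, mytype{2}) == max_cast(mytype{9}, mytype{2}),
              "variants agree when the first operand is larger");
static_assert(max_plain(mytype{5}, mytype{5}) == max_cast(mytype{5}, mytype{5}),
              "variants agree when the operands are equal");
```

The check covers only the resulting values; it does not measure or predict the generated code.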

Toolchain: GNU Arm Embedded Toolchain, version 7-2017-q4-major.

Compiler options:
-c -mcpu=cortex-m4 -mthumb -gdwarf-2 -MD -Wall -O -mapcs-frame -mthumb-interwork -std=c++14 -Ofast -I./RTE/_Target_1 -IC:/Keil_v525pre/ARM/PACK/ARM/CMSIS/5.2.0/CMSIS/Include -IC:/Keil_v525pre/ARM/PACK/ARM/CMSIS/5.2.0/Device/ARM/ARMCM4/Include -I"C:/Program Files (x86)/GNU Tools ARM Embedded/7 2017-q4-major/arm-none-eabi/include" -I"C:/Program Files (x86)/GNU Tools ARM Embedded/7 2017-q4-major/lib/gcc/arm-none-eabi/7.2.1/include" -I"C:/Program Files (x86)/GNU Tools ARM Embedded/7 2017-q4-major/arm-none-eabi/include/c++/7.2.1" -I"C:/Program Files (x86)/GNU Tools ARM Embedded/7 2017-q4-major/arm-none-eabi/include/c++/7.2.1/arm-none-eabi" -D__UVISION_VERSION="525" -D__GCC -D__GCC_VERSION="721" -D_RTE_ -DARMCM4 -Wa,-alhms="*.lst" -o *.o

Assembler options:
-mcpu=cortex-m4 -mthumb --gdwarf-2 -mthumb-interwork --MD *.d -I./RTE/_Target_1 -IC:/Keil_v525pre/ARM/PACK/ARM/CMSIS/5.2.0/CMSIS/Include -IC:/Keil_v525pre/ARM/PACK/ARM/CMSIS/5.2.0/Device/ARM/ARMCM4/Include -I"C:/Program Files (x86)/GNU Tools ARM Embedded/7 2017-q4-major/arm-none-eabi/include" -I"C:/Program Files (x86)/GNU Tools ARM Embedded/7 2017-q4-major/lib/gcc/arm-none-eabi/7.2.1/include" -I"C:/Program Files (x86)/GNU Tools ARM Embedded/7 2017-q4-major/arm-none-eabi/include/c++/7.2.1" -I"C:/Program Files (x86)/GNU Tools ARM Embedded/7 2017-q4-major/arm-none-eabi/include/c++/7.2.1/arm-none-eabi" -alhms="*.lst" -o *.o

Linker options:
-T ./RTE/Device/ARMCM4/gcc_arm.ld -mcpu=cortex-m4 -mthumb -mthumb-interwork -Wl,-Map="./Optimization.map"
-o Optimization.elf
*.o -lm

#include <cstdlib>
#include <cstring>
#include <cstdint>

#define WITH_CAST
struct mytype {
 uint32_t value;
 __attribute__((const, always_inline)) constexpr friend bool operator>(const mytype & t, const mytype & a) {
  return t.value > a.value;
 }
};
static mytype output_buf [32];
static mytype * output_memory_ptr = output_buf;
static mytype * volatile * output_memory_tmpp = &output_memory_ptr;
static mytype input_buf [32];
static mytype * input_memory_ptr = input_buf;
static mytype * volatile * input_memory_tmpp = &input_memory_ptr;
#if defined (WITH_CAST)
#define MAX(a,b) (((a) > (b)) ? (decltype(a)(a)) : (decltype(b)(b)))
#else
#define MAX(a,b) (((a) > (b)) ? ((a)) : ((b)))
#endif
int main (void) {
 const mytype * input = *input_memory_tmpp;
 mytype * output = *output_memory_tmpp;
 mytype p = input[0];
 mytype c = input[1];
 mytype pc = MAX(p, c);
 output[0] = pc;
 for (int i = 1; i < 31; i ++) {
  mytype n = input[i + 1];
  mytype cn = MAX(c, n);
  output[i] = MAX(pc, cn);
  p = c;
  c = n;
  pc = cn;
 }
 output[31] = pc;
}
