ternary optimization improves with cast to prvalue

Bug #1752178 reported by Nachum Kanovsky
This bug affects 1 person
Affects: GNU Arm Embedded Toolchain
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

As described here:
https://answers.launchpad.net/gcc-arm-embedded/+question/664087

The cast converts the operands to prvalues, which improves the generated code. The same code using uint32_t instead of mytype does not need the cast. This appears to be a missed optimization.
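To make the value-category difference concrete, here is a minimal sketch that checks it at compile time. It reuses the mytype struct from the report without its attributes. The two helper function names are hypothetical and exist only for this illustration. It shows only what the language says about the two expressions, not why GCC optimizes one better than the other.

```cpp
#include <cstdint>
#include <type_traits>

// Same shape as the struct in the report, minus the attributes.
struct mytype {
    std::uint32_t value;
    constexpr friend bool operator>(const mytype& t, const mytype& a) {
        return t.value > a.value;
    }
};

// Hypothetical helper: without the cast, both arms are lvalues of type
// mytype, so the conditional expression is itself an lvalue.
inline bool plain_conditional_is_lvalue() {
    mytype a{1}, b{2};
    using plain_t = decltype(((a > b) ? a : b));
    static_assert(std::is_same<plain_t, mytype&>::value,
                  "plain ternary on lvalues yields an lvalue");
    return std::is_lvalue_reference<plain_t>::value;
}

// Hypothetical helper: decltype(a)(a) is a functional cast that produces
// a prvalue copy, so the conditional expression is a prvalue.
inline bool cast_conditional_is_prvalue() {
    mytype a{1}, b{2};
    using cast_t = decltype(((a > b) ? decltype(a)(a) : decltype(b)(b)));
    static_assert(std::is_same<cast_t, mytype>::value,
                  "ternary on prvalue copies yields a prvalue");
    return !std::is_reference<cast_t>::value;
}
```

Both static_asserts hold under -std=c++14, so the only source-level difference between the two MAX variants is whether the ternary yields an lvalue or a prvalue copy.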

Here's the original text of the post:

In the code below, defining WITH_CAST significantly improves the compiled output (I see the same effect in my larger codebase), even though the cast appears to be superfluous. I am running this in Keil 5.25pre2, used only as a simulator. I measured execution speed in the Keil simulator by reading the elapsed microseconds shown by the t1 timer.

Snippet from code:
#if defined (WITH_CAST)
#define MAX(a,b) (((a) > (b)) ? (decltype(a)(a)) : (decltype(b)(b)))
#else
#define MAX(a,b) (((a) > (b)) ? ((a)) : ((b)))
#endif
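As a sanity check that the cast does not change behavior, the following sketch compares the two macro variants on a few sample values. The names MAX_CAST, MAX_PLAIN and variants_agree are hypothetical and used only so both definitions can coexist in one file. The struct mirrors the one in the report without its attributes.

```cpp
#include <cstdint>

struct mytype {
    std::uint32_t value;
    constexpr friend bool operator>(const mytype& t, const mytype& a) {
        return t.value > a.value;
    }
};

// The two definitions from the report, renamed (hypothetical names) so
// they can be compared side by side.
#define MAX_CAST(a,b)  (((a) > (b)) ? (decltype(a)(a)) : (decltype(b)(b)))
#define MAX_PLAIN(a,b) (((a) > (b)) ? ((a)) : ((b)))

// Returns true if both variants select the same value for every pair
// of samples, including equal operands and the uint32_t extremes.
inline bool variants_agree() {
    const std::uint32_t samples[] = {0u, 1u, 7u, 0xFFFFFFFFu};
    for (std::uint32_t x : samples) {
        for (std::uint32_t y : samples) {
            mytype a{x};
            mytype b{y};
            if (MAX_CAST(a, b).value != MAX_PLAIN(a, b).value) {
                return false;
            }
        }
    }
    return true;
}
```

This only confirms that the results are the same. It says nothing about the generated code, which is what the report is about.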

GNU Arm Tools Embedded v. 7 2017-q4-major.

Compiler options:
-c -mcpu=cortex-m4 -mthumb -gdwarf-2 -MD -Wall -O -mapcs-frame -mthumb-interwork -std=c++14 -Ofast -I./RTE/_Target_1 -IC:/Keil_v525pre/ARM/PACK/ARM/CMSIS/5.2.0/CMSIS/Include -IC:/Keil_v525pre/ARM/PACK/ARM/CMSIS/5.2.0/Device/ARM/ARMCM4/Include -I"C:/Program Files (x86)/GNU Tools ARM Embedded/7 2017-q4-major/arm-none-eabi/include" -I"C:/Program Files (x86)/GNU Tools ARM Embedded/7 2017-q4-major/lib/gcc/arm-none-eabi/7.2.1/include" -I"C:/Program Files (x86)/GNU Tools ARM Embedded/7 2017-q4-major/arm-none-eabi/include/c++/7.2.1" -I"C:/Program Files (x86)/GNU Tools ARM Embedded/7 2017-q4-major/arm-none-eabi/include/c++/7.2.1/arm-none-eabi" -D__UVISION_VERSION="525" -D__GCC -D__GCC_VERSION="721" -D_RTE_ -DARMCM4 -Wa,-alhms="*.lst" -o *.o

Assembler options:
-mcpu=cortex-m4 -mthumb --gdwarf-2 -mthumb-interwork --MD *.d -I./RTE/_Target_1 -IC:/Keil_v525pre/ARM/PACK/ARM/CMSIS/5.2.0/CMSIS/Include -IC:/Keil_v525pre/ARM/PACK/ARM/CMSIS/5.2.0/Device/ARM/ARMCM4/Include -I"C:/Program Files (x86)/GNU Tools ARM Embedded/7 2017-q4-major/arm-none-eabi/include" -I"C:/Program Files (x86)/GNU Tools ARM Embedded/7 2017-q4-major/lib/gcc/arm-none-eabi/7.2.1/include" -I"C:/Program Files (x86)/GNU Tools ARM Embedded/7 2017-q4-major/arm-none-eabi/include/c++/7.2.1" -I"C:/Program Files (x86)/GNU Tools ARM Embedded/7 2017-q4-major/arm-none-eabi/include/c++/7.2.1/arm-none-eabi" -alhms="*.lst" -o *.o

Linker options:
-T ./RTE/Device/ARMCM4/gcc_arm.ld -mcpu=cortex-m4 -mthumb -mthumb-interwork -Wl,-Map="./Optimization.map"
-o Optimization.elf
*.o -lm

#include <cstdlib>
#include <cstring>
#include <cstdint>

#define WITH_CAST
struct mytype {
 uint32_t value;
 __attribute__((const, always_inline)) constexpr friend bool operator>(const mytype & t, const mytype & a) {
  return t.value > a.value;
 }
};
static mytype output_buf [32];
static mytype * output_memory_ptr = output_buf;
static mytype * volatile * output_memory_tmpp = &output_memory_ptr;
static mytype input_buf [32];
static mytype * input_memory_ptr = input_buf;
static mytype * volatile * input_memory_tmpp = &input_memory_ptr;
#if defined (WITH_CAST)
#define MAX(a,b) (((a) > (b)) ? (decltype(a)(a)) : (decltype(b)(b)))
#else
#define MAX(a,b) (((a) > (b)) ? ((a)) : ((b)))
#endif
int main (void) {
 const mytype * input = *input_memory_tmpp;
 mytype * output = *output_memory_tmpp;
 mytype p = input[0];
 mytype c = input[1];
 mytype pc = MAX(p, c);
 output[0] = pc;
 for (int i = 1; i < 31; i ++) {
  mytype n = input[i + 1];
  mytype cn = MAX(c, n);
  output[i] = MAX(pc, cn);
  p = c;
  c = n;
  pc = cn;
 }
 output[31] = pc;
}
