Assembler errors in code built with lto

Bug #1379250 reported by Terry Guo
Affects: GNU Arm Embedded Toolchain
Status: Confirmed
Importance: Undecided
Assigned to: Terry Guo

Bug Description

When building the attached project with the options -flto and -fomit-frame-pointer, we see many assembler errors like the ones below. Removing those two options makes the errors disappear.

/tmp/ccRHu5Bs.s: Assembler messages:
/tmp/ccRHu5Bs.s:250: Error: lo register required -- `add r1,r1,#3'
/tmp/ccRHu5Bs.s:264: Error: cannot honor width suffix -- `mov r0,#128'
/tmp/ccRHu5Bs.s:266: Error: cannot honor width suffix -- `mov r3,#0'
/tmp/ccRHu5Bs.s:271: Error: lo register required -- `add r3,r3,#23'
/tmp/ccRHu5Bs.s:272: Error: lo register required -- `add r4,r4,#4'
/tmp/ccRHu5Bs.s:275: Error: lo register required -- `add r3,r3,#16'

Terry Guo (terry.guo) wrote :
Changed in gcc-arm-embedded:
status: New → Confirmed
demiurg_spb (demiurg-spb-h) wrote :

static inline void delay_4cy(unsigned int cy)
{
        __asm__ __volatile__
        (
            "loop%=:" "\n\t"
            " subs %[cnt],#1" "\n\t"
            " bne loop%=" "\n\t"
            : [cnt]"+r"(cy) // output: +r means input+output
            : // input:
            : "cc" // clobbers:
        );
}

When built with -flto -mcpu=cortex-m1 -mthumb -msoft-float -Os

I get this error:

C:\...\ccsln0ZZ.s: Assembler messages:
C:\...\ccsln0ZZ.s:2701: Error: instruction not supported in Thumb16 mode -- `subs r3,#1'
lto-wrapper.exe: fatal error: arm-none-eabi-gcc returned 1 exit status

demiurg_spb (demiurg-spb-h) wrote :

Strange case #2
When my previous function delay_4cy is compiled with -mcpu=cortex-m3, the error message is gone.

Strange case #3
When I change the instruction SUBS to SUB in my previous function delay_4cy, the error message is gone, but in the LST and LSS files I still see SUBS!

080034bc <loop1613>:
 80034bc: 3b01 subs r3, #1
 80034be: d1fd bne.n 80034bc <loop1613>

Thomas Preud'homme (thomas-preudhomme) wrote :

Hi demiurg_spb,

The reason for this error is that, for Thumb-1, inline assembly is treated as divided syntax rather than unified syntax. This has nothing to do with lto (I can reproduce it with your example without -flto). The compiler prefixes the inline assembly with .syntax divided, so the assembler expects sub instead of subs.

Case #2 is explained by the fact that Thumb-2 will use unified syntax by default for inline assembly. When compiling for Cortex-M3, you are targeting Thumb-2 so the compiler will prefix the inline assembly with .syntax unified.

I'm not sure what you mean by LST and LSS in case #3, but I guess they are some form of disassembly. What you see there is expected, because the difference in syntax is only relevant to the assembler. sub in divided syntax is the same instruction as subs in unified syntax. It's akin to one programming language writing x:=1 where another writes x=1. Once assembled, it's the same underlying instruction. The disassembler chooses one syntax for its output, independent of the one you fed to the assembler. In this example the assembler is fed divided syntax (sub) and the disassembler prefers to display unified syntax (subs).
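
The explanation above can be sketched in code. This is a minimal sketch, assuming GCC with the GNU assembler; delay_loop is a hypothetical name that does not appear in this report. It selects unified syntax explicitly for Thumb-1 and then switches back to divided syntax. The switch back is an assumption based on GCC releases before 5 emitting divided syntax for their own Thumb-1 output, so leaving the assembler in unified mode may break the compiler-generated code that follows the inline assembly.

```c
#include <stdint.h>

/* Hypothetical helper, not from the report: counts `count` down to zero.
 * `count` must be at least 1; a value of 0 wraps around to 2^32 iterations. */
static inline uint32_t delay_loop(uint32_t count)
{
#if defined(__thumb__) && !defined(__thumb2__)
    /* Thumb-1 (e.g. Cortex-M0/M1): choose unified syntax explicitly, then
     * restore divided syntax (assumed to be what older GCC expects for the
     * code it generates after this block). "l" requests a low register. */
    __asm__ __volatile__(
        ".syntax unified\n\t"
        "1:\n\t"
        " subs %[cnt], #1\n\t"
        " bne 1b\n\t"
        ".syntax divided\n\t"
        : [cnt] "+l"(count)
        :
        : "cc");
#elif defined(__arm__)
    /* Thumb-2 or ARM state: the three-operand form is written the same way
     * in both syntaxes, so no .syntax directive is needed. */
    __asm__ __volatile__(
        "1:\n\t"
        " subs %[cnt], %[cnt], #1\n\t"
        " bne 1b\n\t"
        : [cnt] "+r"(count)
        :
        : "cc");
#else
    /* Portable fallback so the sketch also builds on a host machine;
     * it has no defined cycle count. */
    volatile uint32_t n = count;
    do {
        n--;
    } while (n != 0u);
    count = n;
#endif
    return count; /* 0 once the loop has finished */
}
```

Numeric local labels (1: and 1b) are used here instead of loop%= purely as a design choice; both avoid duplicate label names when the function is inlined several times.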

Hope this helps.

Best regards.

demiurg_spb (demiurg-spb-h) wrote :

Thank you very much!
Allow me to share the results!
The function delay_cycles generates a delay that is accurate to 1 clock cycle:

delay.h:
...
//=============================================================================
static __is_always_inline void delay_4cycles(uint32_t cy) // +1 cycle
{
 #if ARCH_PIPELINE_RELOAD_CYCLES<2
 # define EXTRA_NOP_CYCLES "nop"
 #else
 # define EXTRA_NOP_CYCLES ""
 #endif

 __asm__ __volatile__
 (
  ".syntax unified" "\n\t" // prevents the non-unified (divided) syntax on CM0, CM1
  "loop%=:" "\n\t"
  " subs %[cnt],#1" "\n\t"
          EXTRA_NOP_CYCLES "\n\t"
  " bne loop%=" "\n\t"
  : [cnt]"+r"(cy) // output: +r means input+output http://www.nongnu.org/avr-libc/user-manual/inline_asm.html
  : // input:
  : "cc" // clobbers:
 );
}

//=============================================================================
static __is_always_inline void delay_cycles(uint32_t x)
{
 #define MAXNOPS 4 // delay_4cycles
 if (x<=MAXNOPS)
 {
  if (x==1) {nop();}
  else if (x==2) {nop(); nop();}
  else if (x==3) {nop(); nop(); nop();}
  else if (x==4) {nop(); nop(); nop(); nop();}
 }
 else // because of +1 cycle inside delay_4cycles
 {
   uint32_t rem = (x-1)%MAXNOPS;
   if (rem==1) {nop();}
   else if (rem==2) {nop(); nop();}
   else if (rem==3) {nop(); nop(); nop();}
   if ((x=(x-1)/MAXNOPS)) delay_4cycles(x); // beyond 4 nops, a loop is more efficient
 }
}
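
The split between nops and loop iterations in delay_cycles can be checked on a host machine. The following is a rough sketch, assuming the cost model stated in the comments above: each loop iteration costs 4 cycles and the loop adds 1 extra cycle. The names delay_plan, plan_delay and planned_cycles are hypothetical and only mirror the arithmetic; they do not measure real hardware timing.

```c
#include <stdint.h>

#define LOOP_CYCLES   4u /* assumed cost of one loop iteration (MAXNOPS above) */
#define LOOP_OVERHEAD 1u /* assumed extra cycle, per the "+1 cycle" comment */

/* Hypothetical record of how a delay of x cycles is split up. */
typedef struct {
    uint32_t nops;  /* single-cycle nops emitted before the loop */
    uint32_t loops; /* iterations passed to the 4-cycle loop */
} delay_plan;

/* Mirrors the branch structure of delay_cycles without touching hardware. */
static delay_plan plan_delay(uint32_t x)
{
    delay_plan p = {0u, 0u};
    if (x <= LOOP_CYCLES) {
        p.nops = x; /* short delays use nops only */
    } else {
        p.nops = (x - LOOP_OVERHEAD) % LOOP_CYCLES;
        p.loops = (x - LOOP_OVERHEAD) / LOOP_CYCLES;
    }
    return p;
}

/* Total cycles the plan is expected to take under the assumed cost model. */
static uint32_t planned_cycles(delay_plan p)
{
    uint32_t total = p.nops;
    if (p.loops != 0u) {
        total += p.loops * LOOP_CYCLES + LOOP_OVERHEAD;
    }
    return total;
}
```

For example, x = 7 splits into 2 nops plus 1 loop iteration, which is 2 + 4 + 1 = 7 cycles under this model.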

demiurg_spb (demiurg-spb-h) wrote :

But with gcc version 4.9.3, even without -flto and -fomit-frame-pointer, my project fails to compile:

C:\Users\A7DDC~1.IVA\AppData\Local\Temp\ccxI4e8j.s: Assembler messages:
C:\Users\A7DDC~1.IVA\AppData\Local\Temp\ccxI4e8j.s:194: Error: cannot honor width suffix -- `lsl r0,r0,r4'
C:\Users\A7DDC~1.IVA\AppData\Local\Temp\ccxI4e8j.s:201: Error: cannot honor width suffix -- `orr r0,r1'
C:\Users\A7DDC~1.IVA\AppData\Local\Temp\ccxI4e8j.s:231: Error: lo register required -- `sub r2,r2,#1'
C:\Users\A7DDC~1.IVA\AppData\Local\Temp\ccxI4e8j.s:279: Error: cannot honor width suffix -- `add r6,r0,r1'
C:\Users\A7DDC~1.IVA\AppData\Local\Temp\ccxI4e8j.s:292: Error: lo register required -- `add r4,r4,#1'
C:\Users\A7DDC~1.IVA\AppData\Local\Temp\ccxI4e8j.s:328: Error: cannot honor width suffix -- `add r6,r0,r1'
C:\Users\A7DDC~1.IVA\AppData\Local\Temp\ccxI4e8j.s:336: Error: cannot honor width suffix -- `mov r0,#0'
C:\Users\A7DDC~1.IVA\AppData\Local\Temp\ccxI4e8j.s:340: Error: lo register required -- `add r4,r4,#1'
C:\Users\A7DDC~1.IVA\AppData\Local\Temp\ccxI4e8j.s:372: Error: cannot honor width suffix -- `add r5,r0,r1'
C:\Users\A7DDC~1.IVA\AppData\Local\Temp\ccxI4e8j.s:383: Error: lo register required -- `add r4,r4,#1'

I have no such problems with gcc-5.2.1.

demiurg_spb (demiurg-spb-h) wrote :

This happens only if the CM1 target is selected (with CM3 it is OK).
