C compiler: an unexpected access to the volatile type

Bug #1608882 reported by JiriJ
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
GNU Arm Embedded Toolchain
Confirmed
Low
Thomas Preud'homme

Bug Description

Hi all,

GCC version: 5.4 2016q2p1 - an own build (the big endian patch included)
CPU core: ARM Cortex-R5
External device accessed via the parallel bus: FPGA
* The FPGA proper behavior is negatively influenced by the unexpected read access!
Optimization level: -O2 //another levels do behave in the same way

C snippet:

  volatile struct tExtMemDev
  {
    unsigned short regA;
    unsigned short regB;
    unsigned short regC;
  } __attribute__((packed)) T_EXT_MEM_DEV;
  T_EXT_MEM_DEV rxBuff __attribute__ ((section (.external_device)));

  rxBuff.regA = 0xDEAD;

Generated Assembler:
* Scenario 1 - the struct WITH volatile keyword - the unexpected LOAD of the address for only write operation
* Incomplete - just the most important instructions:
 02aa F06F0221 mvn r2, #33
 02b0 F06F0352 mvn r3, #82
//UNEXPECTED and UNWANTED read
 02a8 7923 ldrb r3, [r4, #4] @ zero_extendqisi2 // !!!
 02b4 7122 strb r2, [r4, #4]
//UNEXPECTED and UNWANTED read
 02b6 7962 ldrb r2, [r4, #5] @ zero_extendqisi2 // !!!
 02b8 7163 strb r3, [r4, #5]

Scenario 2 - the struct WITHOUT volatile keyword - the proper behavior - only write operation
* The proper instructions generated - no LOAD operation.

Naturally the memory space of FPGA is intended to have a volatile access type.

Thanks for your hints in advance,
Best regards,
Jiri

JiriJ (jirij-0)
description: updated
Revision history for this message
Thomas Preud'homme (thomas-preudhomme) wrote :

Hi JiriJ,

I cannot reproduce the behavior you observed. I compiled with -S -O2 -mcpu=cortex-r5 the following code:

typedef volatile struct tExtMemDev
{
  unsigned short regA;
  unsigned short regB;
  unsigned short regC;
} __attribute__((packed)) T_EXT_MEM_DEV;

  T_EXT_MEM_DEV rxBuff __attribute__ ((section (".external_device")));

void
foo (void)
{
  rxBuff.regA = 0xDEAD;
}

NB: I had to add typedef and quotes around the section name.

Once compiled, I got:

        movw r3, #:lower16:.LANCHOR0
        movw r2, #57005
        movt r3, #:upper16:.LANCHOR0
        movt r2, 65535
        strh r2, [r3] @ movhi
        bx lr

Can you provide the command line option you used for me to reproduce?

Best regards.

Revision history for this message
JiriJ (jirij-0) wrote :

Hi Thomas,

thanks for your effort!
The compiler switches used as follows:

CFLAGS= -mcpu=cortex-r5 -mbig-endian -mthumb -mthumb-interwork
CFLAGS+= -gdwarf-2 -Wall -Wundef -ffast-math -fomit-frame-pointer -std=gnu99
CFLAGS+= -g -Wa,-a,-ad
CFLAGS+= -mstructure-size-boundary=8 -fno-strict-aliasing -MD
CFLAGS+= -O2 -falign-functions=8

Best regards,
Jiri

Revision history for this message
Thomas Preud'homme (thomas-preudhomme) wrote :

Hi JiriJ,

I'm sorry, but I still cannot reproduce it. Please provide us with a .c file and a command line that works with the 2016Q2 release we provide. Since it's a compile time issue you should not need the big endian multilib patch to reproduce the issue.

Best regards.

Changed in gcc-arm-embedded:
status: New → Incomplete
assignee: nobody → Thomas Preud'homme (thomas-preudhomme)
Revision history for this message
JiriJ (jirij-0) wrote :

Dear Thomas,

I have got the same results as you had.

C test code (a new separated C module created and added to my build) is as you pasted above, the compiler options are identical to the defined ones in my previous post. Additionally my section ".external_device" has now following dummy (but still meaningful) definition in the linker description file:

MEMORY
{
    EXT_DEV_MEM (RW) : org = 0x68000000, len = 0x00000070
}
SECTIONS
{
    .external_device (NOLOAD) :
    {
        . = ALIGN(2); /* 16-bit variables aligned without any padding */
        external_device_start = .;
        *(.EXT_DEV_SECTION)
    } > EXT_DEV_MEM
    external_device_size = SIZEOF(.external_device);

}

Oops, I must investigate our project in more detail - sorry, I cannot easily share this source code (although that would be really helpful for deeper analysis).

Thanks for your support up to now, I will be back with news (hopefully) soon...

Best regards,
Jiri

Revision history for this message
JiriJ (jirij-0) wrote :

Hi Thomas,
I must reformulate my original description - I have missed very important fact: the consecutive write-read operations above the same address. CPU writes some command value X at the address A and the device (FPGA, in our case) "responds" by writing some confirmation value Y at the same address A - CPU reads this value Y back to get the device acceptance.
Well, such consecutive write-read operations can be somehow understood by the optimizer - but I would expect no optimization impact in this case - the variable (the type respectively) is defined as the volatile one... and that is obviously done as the piece of Assembler below shows:

So the better C snippet is as follows:

  typedef volatile struct tExtMemDev
  {
    unsigned short regA;
    unsigned short regB;
    unsigned short regC;
  } __attribute__((packed)) T_EXT_MEM_DEV;

  T_EXT_MEM_DEV rxBuff __attribute__ ((section (".external_device")));

  int foo (void)
  {
    int ret;

    rxBuff.regA = 0xDEAD;

    if(rxBuff.regA == 0xBEEF)
    { ret = 0; }
    else
    { ret = -1; }

    return ret;
  }

The Assembler looks good ;-)

  39 0010 801A strh r2, [r3] @ movhi
  16:test.c **** { ret = 0; }
  40 .loc 1 16 0
  41 0012 881B ldrh r3, [r3]

I am continuing...

Revision history for this message
JiriJ (jirij-0) wrote :

Hi Thomas,
I try to summarize th current state and observation. In attachment, you can find two modules and one header for testing purpose. Compiler options / switches retain the same ones as in the previous posts.

Scenario 1:
* Types defined in the header another.h have the attribute "__attribute__ ((aligned (1)))" attached.
* Both write operations in foo() are properly compiled - only "strh" instructions detected.

Scenario 2:
* The attribute of the types defined in the header another.h changes to "__attribute__ ((packed))" attached.
* The write operation to the external variable tPldRxRegSet_Test (defined in the module "another.c") is in the strange, already reported way:

movw r3, #:lower16:tPldRxRegSet_Test
movt r3, #:upper16:tPldRxRegSet_Test

mvn r4, #33
mvn r0, #82

ldrb r1, [r3, #4] @ zero_extendqisi2 // !!! UNWANTED reading !!!
strb r4, [r3, #4]
ldrb r4, [r3, #5] @ zero_extendqisi2 // !!! UNWANTED reading !!!
strb r0, [r3, #5]

Hopefully this could describe the issue in better way...

Thanks for your hints,
BR, Jiri

Revision history for this message
JiriJ (jirij-0) wrote :

Hi Thomas,

is there any news regarding this issue?
Or would you recommend just to apply "__attribute__ ((aligned (1)))" and suppress any use of "__attribute__ ((packed))" for this purpose? Is there some explanation behind such a statement? I cannot differ between these two attributes for this use case.

Thanks for your hints in advance,
Best regards, Jiri

Revision history for this message
Thomas Preud'homme (thomas-preudhomme) wrote :

Hi JiriJ,

I have not had time to look into it but it is in my TODO list. I'll come back to you once I have an answer but if you have time constraints on this I would indeed recommend this workaround as solving the problem would in the *best* case be solved only in the next update release.

Best regards.

Revision history for this message
JiriJ (jirij-0) wrote :

Hi Thomas,
thanks a lot for your reply!
No, there is no pressure for any urgent solution - I absolutely agree with you, this workaround is sufficient.
I am just curious whether our usage of these attributes was improper... but I can surely wait for further explanation ;-)

Cheers, Jiri

Revision history for this message
Thomas Preud'homme (thomas-preudhomme) wrote :

Hi JiriJ,

I managed to reproduced the issue with the code you provided, thank you. It is still early in the investigation but what I can say is that the use of strb rather than strh is due to GCC not knowing the alignment of tPldRxRegSet_Test. Since the structure is packed it assumes no particular alignment and thus accessing reg2 with strh could be an unaligned access. Since the structure is volatile, it will not do such an unaligned access, even in the presence of -munaligned-access and so it breaks it into two strb. You can get an strh by adding an __atribute__ ((aligned(2))) on the declaration and definition of tPldRxRegSet_Test (ie. in another.h and another.c)

Best regards.

Changed in gcc-arm-embedded:
status: Incomplete → Confirmed
importance: Undecided → Low
Revision history for this message
Thomas Preud'homme (thomas-preudhomme) wrote :

Hi JiriJ,

The reason packed and aligned are different is because aligned specify a *minimum* alignment. When doing aligned (1) it requests the structure to be at least 1byte aligned but since it must already be 2byte aligned because of the short it's effectively a nop. If you want the structure to be packed but have the code generation using strh, you should use both packed and aligned(2).

Now I agree that though strb is expected when using only packed, it should not generate ldrb but that's an optimization issue and thus is less critical.

Revision history for this message
JiriJ (jirij-0) wrote :

Hi Thomas,
thanks for your explanation!

Sorry I still don't understand why the variable tPldRxRegSet defined and used in the same module is handled in proper way (Store only) but the variable tPldRxRegSet_Test defined in one module and used in other one is written (Store) and also surprisingly read (Load). Does it depends...?

In our application (FPGA accessed), the additional Load instruction(s) generated is more than less critical issue - but I agree that is only personal view :-)

Thanks,
Jiri

Revision history for this message
JiriJ (jirij-0) wrote :

Just a remark: that is not a big issue for our case whether STRB instructions or STRH instruction are/is generated - such a sequence access is allowed (no read, no cry).

Revision history for this message
Thomas Preud'homme (thomas-preudhomme) wrote :

Hi JiriJ,

TL;DR: the compiler has more alignment information for object in the file it's compiling than for objects in other files (declared with extern keyword).

From elf (5) manual:

"All data structures that the file format defines follow the "natural" size and alignment guidelines for the relevant class. If necessary, data structures contain explicit padding to ensure 4-byte alignment for 4-byte objects, to force structure sizes to a multiple of 4, etc."

This means the compiler can rely on the fact that the first object in a section will be 4byte aligned in the final executable so the compilers know the structure in test.c has the proper alignment. For the structure defined elsewhere, it cannot know where the structure will be in the object file that defines it: it could be at a non aligned address and thus be non aligned once linked into an executable.

Best regards.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Bug attachments

Remote bug watches

Bug watches keep track of this bug in other bug trackers.