Wrong code generated for Cortex-M0/M0+

Bug #1722849 reported by Miro Samek on 2017-10-11

Bug Description

GCC for ARM generates incorrect code for disabling interrupts around a function call. Interrupts are disabled by first reading the PRIMASK value into an automatic variable and then setting PRIMASK; they are re-enabled by restoring the saved PRIMASK value. Inline assembly is used for these operations (why are there no intrinsic functions for this?).

The problem is the incorrect reordering of the instructions for disabling/enabling interrupts with optimization level -O.

I've been able to distill the problem to the following test code:

#include <stdint.h>

/*! Private QS data to keep track of the filters and the trace buffer. */
typedef struct {
    uint8_t glbFilter[16]; /*!< global on/off QS filter */
    void const *locFilter[16]; /*!< local QS filters */
    //...
} QSPriv;

extern QSPriv QS_priv_;
extern void *QS_obj_;

void foo(void); /* prototype */

void bug_test(void) {
    uint8_t i;
    for(i = 0; i < 10; i++) {
        if ((((uint_fast8_t)QS_priv_.glbFilter[(uint8_t)(123) >> 3]
            & (uint_fast8_t)((uint_fast8_t)1 << ((uint8_t)(123) & (uint8_t)7)))
                != (uint_fast8_t)0)
            && ((QS_priv_.locFilter[3] == (void *)0)
                || (QS_priv_.locFilter[3] == (QS_obj_))))
        {
            uint32_t status;

            /* save the PRIMASK into 'status' */
            __asm volatile ("mrs %0,PRIMASK" : "=r" (status) :: );

            /* set PRIMASK to disable interrupts */
            __asm volatile ("cpsid i");

            foo(); /* call a function */

            /* restore PRIMASK from 'status' */
            __asm volatile ("msr PRIMASK,%0" :: "r" (status) : );
        }
    }
}

The code compiled with GNU Tools for ARM Embedded Processors 6-2017-q2-update, 6.3.1 20170620 release on Windows 10 64-bit. The following command line was used:

C:\tools\GNU_ARM\bin\arm-none-eabi-gcc -c -O -mcpu=cortex-m0plus -mthumb -Wall bug.c -o bug.o
C:\tools\GNU_ARM\bin\arm-none-eabi-objdump -d bug.o > bug.dasm

Here is the generated disassembly:

Disassembly of section .text:

00000000 <bug_test>:
   0: b5f8 push {r3, r4, r5, r6, r7, lr}
   2: 240a movs r4, #10
   4: 4d0c ldr r5, [pc, #48] ; (38 <bug_test+0x38>)
   6: 002e movs r6, r5
   8: 4f0c ldr r7, [pc, #48] ; (3c <bug_test+0x3c>)
   a: e00a b.n 22 <bug_test+0x22>
   c: b672 cpsid i
   e: f7ff fffe bl 0 <foo>
  12: f3ef 8310 mrs r3, PRIMASK
  16: f383 8810 msr PRIMASK, r3
  1a: 3c01 subs r4, #1
  1c: b2e4 uxtb r4, r4
  1e: 2c00 cmp r4, #0
  20: d009 beq.n 36 <bug_test+0x36>
  22: 7beb ldrb r3, [r5, #15]
  24: 071b lsls r3, r3, #28
  26: d5f8 bpl.n 1a <bug_test+0x1a>
  28: 69f3 ldr r3, [r6, #28]
  2a: 2b00 cmp r3, #0
  2c: d0ee beq.n c <bug_test+0xc>
  2e: 683a ldr r2, [r7, #0]
  30: 4293 cmp r3, r2
  32: d1f2 bne.n 1a <bug_test+0x1a>
  34: e7ea b.n c <bug_test+0xc>
  36: bdf8 pop {r3, r4, r5, r6, r7, pc}
 ...

The problem is the incorrect disabling of interrupts around the function call foo(): at offsets 0xc-0x16, the mrs instruction that should save PRIMASK before interrupts are disabled has been moved after both the cpsid i and the call to foo(). The value saved (and immediately restored by the msr) is therefore the already-disabled PRIMASK state, so interrupts remain disabled after the critical section.

As I experimented with this code, the excessive type casting in the condition of the if statement seems to be implicated (the bug goes away if I remove some of the casts). The casting was added in the first place to satisfy static analysis with PC-Lint for MISRA-C compliance.

The *same* code compiled with optimization -O2 is as follows:

Disassembly of section .text:

00000000 <bug_test>:
   0: b5f0 push {r4, r5, r6, r7, lr}
   2: 46c6 mov lr, r8
   4: 4b0f ldr r3, [pc, #60] ; (44 <bug_test+0x44>)
   6: 240a movs r4, #10
   8: 2608 movs r6, #8
   a: 4698 mov r8, r3
   c: b500 push {lr}
   e: 4d0e ldr r5, [pc, #56] ; (48 <bug_test+0x48>)
  10: 7beb ldrb r3, [r5, #15]
  12: 421e tst r6, r3
  14: d006 beq.n 24 <bug_test+0x24>
  16: 69eb ldr r3, [r5, #28]
  18: 2b00 cmp r3, #0
  1a: d00a beq.n 32 <bug_test+0x32>
  1c: 4642 mov r2, r8
  1e: 6812 ldr r2, [r2, #0]
  20: 4293 cmp r3, r2
  22: d006 beq.n 32 <bug_test+0x32>
  24: 3c01 subs r4, #1
  26: b2e4 uxtb r4, r4
  28: 2c00 cmp r4, #0
  2a: d1f1 bne.n 10 <bug_test+0x10>
  2c: bc04 pop {r2}
  2e: 4690 mov r8, r2
  30: bdf0 pop {r4, r5, r6, r7, pc}
  32: f3ef 8710 mrs r7, PRIMASK
  36: b672 cpsid i
  38: f7ff fffe bl 0 <foo>
  3c: f387 8810 msr PRIMASK, r7
  40: e7f0 b.n 24 <bug_test+0x24>
  42: 46c0 nop ; (mov r8, r8)
 ...

This seems to be correct in terms of the critical section.

Miro Samek
state-machine.com


This is very interesting. I had always thought that volatile acted as a code barrier. However, the documentation [1] explicitly states that this is not the case:

"Note that the compiler can move even volatile asm instructions relative to other code, including across jump instructions. For example, on many targets there is a system register that controls the rounding mode of floating-point operations. Setting it with a volatile asm, as in the following PowerPC example, does not work reliably."

[1] https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Volatile

It is therefore not a bug, since the compiler is working as intended, though I agree it is surprising. Worse yet, the only solution I can propose is to write atomic sections as a single assembly block. I've looked into GCC's atomic and synchronization primitives but couldn't find any that provide a code barrier, despite GCC having code barriers internally. I'll continue searching for a solution and will get back to you if I find one.

Best regards.

Leo Havmøller (leh-p) wrote :

> why there are no intrinsic functions for this?
These are available in the ARM CMSIS: __get_PRIMASK(), __disable_irq() and __set_PRIMASK().

Miro Samek (mirosamek) wrote :

I have tested the code with the "intrinsic" functions provided in the CMSIS (header file "cmsis_gcc.h"):
__get_PRIMASK(), __disable_irq() and __set_PRIMASK()

The answer is that the "intrinsic" functions don't help and the bug persists.
--MMS

Adding a clobber on memory does solve the problem on this particular testcase though.

Tejas Belagod (belagod-tejas) wrote :

I think this is a bug in CMSIS. The __get_PRIMASK () intrinsic implementation in cmsis_gcc.h needs to have a memory clobber in the inline asm statement to prevent it from being reordered.

David Brown (davidbrown) wrote :

I agree that it is surprising that "asm volatile" statements can be re-arranged with respect to each other, and with respect to other volatile accesses. This seems to be a particular problem with asm statements that return outputs - "volatile" is used primarily to tell the compiler that you can get different outputs at different times, even if the inputs (if any) are the same. For asm statements with no outputs, the compiler appears to assume they have a special function and should not be moved.

As far as I know, there is no way in C or as a gcc extension to specify ordering of statements or executable code - you can only specify ordering of memory (via volatile accesses). Even the traditional method of declaring a function (like "foo" in the sample) externally is in danger - with link-time optimisation, the compiler knows everything and can re-arrange bits of "foo" with respect to the asm statements or volatile accesses.

A related problem is well documented for the AVR gcc port:

http://www.nongnu.org/avr-libc/user-manual/optimization.html

There is, however, a solution to all this. (I have told the avr-libc folks about it a number of times, but they have not added my solution to their webpage. I have given up trying to persuade them.)

I have three macros defined that I use in circumstances like these:

#define forceDependency(val) \
                asm volatile("" :: "" (val) : )

#define forgetCompilerKnowledge(v) \
                asm ("" : "+g" (v))

#define forgetCompilerBlock(start, size) \
    do { typedef struct { char x[size]; } XS; XS *p = (XS *) start; \
      asm ("" : "+m" (*p)); \
    } while (0)

The first one tells the compiler "I am using 'val' here, so you have to evaluate it before this point". The second one tells the compiler "I am using 'val' here, and I might change it, so you have to evaluate it before this point, and afterwards forget anything you know about it". The third one is just another version of the second that can handle data of any size.

Putting "forceDependency(status)" after the "mrs" instruction, but before the "foo()" call, ensures that the compiler has evaluated "status" before calling foo. It makes most sense to put it before the "cpsid i" instruction, but that does not appear to be critical. The neatest arrangement is to combine it with the cpsid:

    uint32_t status;
    asm volatile ("mrs %0, PRIMASK" : "=r" (status) :: );
    asm volatile ("cpsid i" :: "" (status) :);

    foo();

    asm volatile ("msr PRIMASK, %0" :: "r" (status) : );

Miro Samek (mirosamek) wrote :

The latest comment by David Brown seems to address the real issue and fixes the problem in my experiments. This is an acceptable fix of this "bug" for me, although the issue is perhaps worth documenting to help others avoid this pitfall in the future.

In summary, here is the correct code for disabling interrupts by "saving and restoring interrupt status" for Cortex-M0(+):

/* critical section (save and restore interrupt status) */
#define CRIT_ENTRY(primask_) do { \
    __asm volatile ("mrs %0,PRIMASK" : "=r" (primask_) :: ); \
    __asm volatile ("cpsid i" :: "" (primask_) : ); \
} while (0)

#define CRIT_EXIT(primask_) \
    __asm volatile ("msr PRIMASK,%0" :: "r" (primask_) : )

Here is the intended use of this critical section:

uint32_t status;
. . .
CRIT_ENTRY(status);
/* inside critical section */
CRIT_EXIT(status);
. . .

Please feel free to improve on this code.

--MMS

Leo Havmøller (leh-p) wrote :

> The answer is that the "intrinsic" functions don't help and the bug persists.
You asked for intrinsics, and I pointed you to where they are. Nobody claimed they would make any difference.

Miro Samek (mirosamek) wrote :

Regarding the "intrinsic" functions for getting and setting PRIMASK, I should probably have used the term "built-in functions" for GNU-ARM, such as the collection listed at https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html. Unfortunately, that built-in collection has no functions for getting/setting PRIMASK or for disabling/enabling interrupts.

The CMSIS functions __get_PRIMASK(), __disable_irq() and __set_PRIMASK() are not really "built in", because they are simply defined by means of inline assembly in the "cmsis_gcc.h" header file. They are therefore subject to the same reordering and don't work.

Quite specifically, the following textbook recipe for disabling interrupts by "saving and restoring" the interrupt context (PRIMASK in this case) will NOT work:

/* critical section (save and restore interrupt status) -- DOES NOT WORK! */
#define CRIT_ENTRY(primask_) do { \
    (primask_) = __get_PRIMASK(); \
    __disable_irq(); \
} while (0)

#define CRIT_EXIT(primask_) __set_PRIMASK((primask_))

--MMS

Johan Tufvesson (t-jot) wrote :

I have had the questionable pleasure to implement or correct macros for critical sections on several target platforms, all with the help of GCC. My thoughts on the information collected in this issue so far:

I have always thought that volatile asm statements are as volatile as volatile memory accesses. If that is true, the original problem with GCC reordering volatile asm statements relative to an external function call which (without the compiler's knowledge) could hold volatile memory accesses is a bug.

Regarding the summary by mirosamek in #8, I would really recommend using "memory" as a clobber instead of only the primask_ variable. This is important because otherwise the values of non-volatile variables sampled outside the critical section might be cached in registers and reused inside it, which the author of the surrounding code is probably not aware of. If the critical section is used to keep several global variables consistent while reading/writing them, this might be fatal.

I also see no reason to break up assembly code which must be executed in sequence, but that is more a matter of taste I guess.

Suggestion, given the general functionality of the previous macros:

/* critical section (save and restore interrupt status) */
#define CRIT_ENTRY(primask_) do { \
    __asm volatile ("mrs %0,PRIMASK \n" \
                    "cpsid i" \
                    : "=r" (primask_) :: "memory" ); \
} while (0)

#define CRIT_EXIT(primask_) \
    __asm volatile ("msr PRIMASK,%0" :: "r" (primask_) : "memory" )

The "memory" in CRIT_ENTRY prevents variables cached before the critical section from being reused within it. The "memory" in CRIT_EXIT prevents code that needs protection from being moved past the end of the critical section.

David Brown (davidbrown) wrote :

Memory clobbers can be costly, and may not necessarily do what you want anyway unless you have a matching dsb or dmb instruction (which can be even more costly). On the Cortex M0, however, these instructions are unlikely to be needed. But for a general critical section entry/exit macro, memory clobbers are not a bad idea to match user expectation.

I agree that, in the general case, it makes more sense to put a sequence of assembly instructions in one asm statement rather than break them up. However, there is good reason to split them up here. The compiler seems to be happy to move asm volatile statements around if they return a result, but not if they don't (the documentation of asm volatile is not clear about this). That is why I have two statements with an artificial dependency on the second one, as shown in my earlier comment here. This has been confirmed by testing (albeit without the memory clobber). Without the memory clobber, the compiler (-O1, -mcpu=cortex-m0) will re-arrange your version (with combined mrs and cpsid) from:

  CRIT_ENTRY(primask_);
  foo();
  CRIT_EXIT(primask_);

into

  foo();
  CRIT_ENTRY(primask_);
  CRIT_EXIT(primask_);

I recommend defining these macros with separate asm statements, as I gave them, and then adding memory clobbers if that suits the application or is useful for user convenience.

Johan Tufvesson (t-jot) wrote :

Interesting that there is a difference between one and several asm statements, but without a clear statement in the GCC manual I would not dare to rely on that fact. Of course the "memory" clobber is a vital part of this solution. I have seen way too many bugs caused by omitting the "memory" clobber in general critical-section macros, so to me it is a must-have, and on simple targets it takes care of the ordering in a guaranteed way.

The only slightly related note I find in the manual is:
"asm statements that have no output operands, including asm goto statements, are implicitly volatile."

Any thoughts on the original behavior, which I think is faulty? If "volatile" code is not ordered in line with "volatile" memory accesses, I think it is either a buggy implementation or a buggy design, but I find nothing in the GCC manual to rely on. I would not have protested if the original foo() was known by the compiler to not have anything volatile, but in this case GCC does not know anything about foo(). foo() could for example contain another "asm volatile" statement, which I would have thought to be executed in the right order.

From the manual:
"Do not expect a sequence of asm statements to remain perfectly consecutive after compilation, even when you are using the volatile qualifier. If certain instructions need to remain consecutive in the output, put them in a single multi-instruction asm statement."

I thought (English not being my native language) that "perfectly consecutive" here meant that I should expect GCC to be able to put other code between my volatile asm statements, but that they would still remain in order with respect to each other. On the other hand, the second sentence only mentions "consecutive" and not "perfectly consecutive", so I might very well be wrong in that assumption. If I am wrong, I must say I am disappointed in the GCC design, with such different behavior for volatile asm and volatile accesses.

David Brown (davidbrown) wrote :

I have written a post to the gcc-help mailing list, to try to get an opinion from the gcc developers on this issue.

https://gcc.gnu.org/ml/gcc-help/2017-10/msg00061.html

Miro Samek (mirosamek) wrote :

I have tested the new CMSIS_5 code (commit 3520562) with my test case (a bit more involved than the distilled version posted in the original description in #1) and it seems to work correctly.

Specifically, I tested the critical section defined as follows (see also post #10):

#define CRIT_ENTRY(primask_) do { \
    (primask_) = __get_PRIMASK(); \
    __disable_irq(); \
} while (0)

#define CRIT_EXIT(primask_) __set_PRIMASK((primask_))

I first tested the unmodified CMSIS_5 and verified that the problem exists when compiled with the flags -mcpu=cortex-m0plus -O.

Next, I replaced the old "cmsis_gcc.h" with the new version (commit 3520562) and recompiled my test code. The problem was fixed.

--MMS

David Brown (davidbrown) wrote :

After some discussion in the gcc mailing list, one of the developers there declared it a bug in gcc - "asm volatile" statements can move across normal code, but not across other asm volatiles, volatile memory accesses, or other observable behaviour. He filed an upstream gcc bug, and also posted a patch:

<https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82602>

The patch is very simple, and should be directly applicable to any version of gcc as the relevant code has not changed in years.

Hopefully the gcc-arm-embedded folk will therefore be able to include it in all their currently open toolchain branches, to be included in their next releases.

Certainly, once the fix is committed to the stable GCC branches it will make its way into the GNU Arm Embedded Toolchain. Thanks for reporting the issue, David.

Best regards.

David Brown (davidbrown) wrote :

I didn't spot the bug or report it - credit for that goes to Miro. I just tested it and shuffled it around mailing lists.

Leo Havmøller (leh-p) wrote :

An IMHO important detail is that this issue is only seen with -O, which is the same as -O1.
The issue is not seen with -O0, -O2, -O3, -Os, -Ofast or -Og.

David Brown (davidbrown) wrote :

I think the -O1 observation is interesting, but probably not important. Certainly with no optimisation (-O0) code rarely moves around, so the issue does not arise. The bug in gcc means that the volatile asm /can/ move around - not that it must. Once it is free to move, the actual code generated will depend on the details of the surrounding code and, in particular, on register pressure. These details vary with the code around the problem area and with the optimisation choices. It appears to be just luck that this particular combination of code (and register usage) before the loop, and these particular flags, happens to result in a bad move. Changing apparently irrelevant details in the code also stops the problem from appearing - and I expect that with enough trials of different code combinations, you could also find examples of the bug with -Os, -O2, -O3, cortex-m4, and other variations of the flags. But you'd need plenty of patience, as combinations that trigger a dangerous move are rare.

This bug has been in gcc for many years, and applies equally to all ports (not just ARM), all variants, and all optimisation levels (except -O0). The fact that no one has noticed it until now shows it is very unlikely to cause a noticeable effect in practice. (Though it may, of course, have caused bugs before without people realising it.) Thanks to Miro for spotting the problem - such rare bugs are always hard to find.
