Not thread-safe: C++ Exception Processing Crashes in GNU Arm Embedded Toolchain

Bug #1905459 reported by Dave Nadler
30
This bug affects 6 people
Affects Status Importance Assigned to Milestone
GNU Arm Embedded Toolchain
New
Undecided
Unassigned

Bug Description

Concurrently throwing C++ exceptions in multiple threads crashes.

This bug has impacted multiple commercial projects, some examples:
https://forums.xilinx.com/t5/Embedded-Development-Tools/Re-building-the-SDK-libc-newlib-and-libgcc-for-reentrancy/m-p/1164973
https://www.arangodb.com/2019/09/when-exceptions-collide/
https://forums.freertos.org/t/freertos-and-c-exceptions/10911

BACKGROUND
Exception unwinding requires some data structures; specifically
the libgcc module unwind-dw2-fde.c has two-three statics.
These statics are protected by a gthread_mutex.

BUG
Arm build of libgcc builds gthread_mutex as no-op.
Thus applications cannot provide for example an RTOS
mutex with priority inheritance to protect above statics.
If I understand correctly, ARM toolchain build includes:
  gthr-single.h
which dummies out the gthread functions as inline no-ops;
consequently they cannot be overridden.

SUGGESTIONS
A couple different approaches to fix this:

1) Create+use a separate gthreads module to provide weak
no-op functions for the gthreads functions. Thus RTOS users
can provide an alternate implementation of gthreads
appropriate for their RTOS, and non-RTOS users are
unaffected.

2) Use thread-local storage instead of statics in unwind-dw2-fde.c?
Is there a architecture-independent TLS wrapper in gcc libcc?

Please let me know if you agree, and if I can help...
Thanks!
Best Regards, Dave

Tags: c++ exceptions
Dave Nadler (drn-nadler)
description: updated
Dave Nadler (drn-nadler)
description: updated
summary: - Not thread-safe: C++ Exception Processing in GNU Arm Embedded Toolchain
+ Not thread-safe: C++ Exception Processing Crashes in GNU Arm Embedded
+ Toolchain
Revision history for this message
Joshua Booth (xenoamor) wrote :

I'm hitting this same issue using FreeRTOS on a single core MCU.

Any ideas on if this is possible to work around with a GCC wrap linker option perhaps?

Revision history for this message
Toni Neubert (viatorus) wrote :

Looking at the source code:

https://github.com/gcc-mirror/gcc/blob/master/libgcc/unwind-dw2-fde.c#L85

__register_frame_info_bases
__register_frame_info_table_bases
__deregister_frame_info_bases
_Unwind_Find_FDE

You could try to use GCC wrap functionality to disable threads/interrupts before accessing these function.

Keep in mind, that wrap does not work probably with LTO... The best way would be to change the sources and add lock/unlock functions at the right place (similar to __malloc_lock/unlock).

Revision history for this message
Toni Neubert (viatorus) wrote :

Also keep in mind that other places are also not thread safe when using exceptions:

* fallback allocator for exceptions
https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/libsupc%2B%2B/eh_alloc.cc#L105

* cxa_eh_globals:
https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/libsupc%2B%2B/eh_globals.cc#L51

But these can be overridden by redefining.

Revision history for this message
Dave Nadler (drn-nadler) wrote (last edit ):

Joshua and Toni - You cannot use wrap here as the gthread calls are dummied out at compile time. My suggested fix (1) is a separate module containing no-op functions; these could then be replaced (or wrapped). Hope that helps,
Best Regards, Dave

Revision history for this message
Dave Nadler (drn-nadler) wrote :

PS: Please make sure to click on "This bug also affects me" in launchpad to help push up the priority...

Revision history for this message
Joshua Booth (xenoamor) wrote (last edit ):

Okay so I think I see what's going on. build-toolchain.sh is building gcc with the --disable-threads compile option. As you say this is pulling in gthr-single.h which is going to optimise out all of the __gthread_mutex_... calls.

Having a gthr-weak.h header or something similar with dummy functions that are marked weak could be a good solution. However this will require GCC to add a new --enable-threads=weak compile option.

If the GCC team are okay with it then these could just be weak in the --enable-threads=single and thus in the gthr-single.h header. There will be some small overhead for applications that don't use it however as it jumps to and from the dummy function.

I think what Toni is suggesting would be a good temporary patch until something like this can be implemented. Although the desired "__gthread_mutex_lock/unlock()" functions we actually want are absent due to the gthr-single.h file the functions that use them can still be wrapped. It's worth noting though that this will lock the entire function rather then the sections within them that are necessary. This will mean the lock will be in place for slightly longer than it needs to be. As a whole this is pretty brittle as function names and what requires wrapping may change between versions.

As pointed out though this issue continues into libstdc++ so things can get messy.

Revision history for this message
Dave Nadler (drn-nadler) wrote :

The ARM distribution uses newlib, which provides hooks for thread-safe malloc/free. The short-term solution I see is just replacing/wrapping unwind-dw2-fde.c with a version built using proper locks. Joshua, are you a GCC maintainer or just a poor suffering user like me?
Thanks,
Best Regards, Dave

Revision history for this message
Joshua Booth (xenoamor) wrote :

I'm new to pretty much all of this as I've never had to look under the hood this deeply before but from my understanding newlib/newlib_nano only implements the C standard library. Libstdc++ then sits on top of this to provide the C++ support but with the lighter weight foundation. This issue seems to be in the core of GCC itself, I'm unfortunately not a maintainer though.

I think I've ended up down this rabbit hole for the same reasons as you mind, trying to run FreeRTOS on an embedded device with C++. Thanks for the great blog post about this by the way.

I've found a couple of pieces of source-code that may well do what we want though:

https://github.com/grygorek/FreeRTOS_cpp11 (see the GCC Hook section)
https://github.com/microHAL/microhal/blob/master/microhal/os/freeRTOS/gthr-freeRTOS.h (not sure how this one gets injected)

If it is indeed possible for us to inject our own gthread implementation without rebuilding the library then we should be quids in. Simply implementing the __gthread_mutex should achieve everything we want, exceptions would then become thread-safe as will libstdc++ which relies on it. As a side effect we'll get std::mutex functionality for free should we want it

Revision history for this message
Joshua Booth (xenoamor) wrote :

Okay looks like I was wrong. I made a gthr-freeRTOS.h file based on the others which I could inject using the gthr-default.h override method but this only gives access to some of the additional libstdc++ parts. As the GCC unwinding functionality is compiled into libgcc this was uneffected. I can't seem to --wrap it either as none of these functions are appearing in my .map file.

What's weird is the only references to unwinding I can find come from a unwind-arm.c file. Is unwind-dw2-fde.c even being built, I'm starting to suspect it might not be? If so I'm at a bit of a loss for where the unwinding code actually lives.

I'm not sure doing a generic --enable-threads=weak implementation is possible either as gthr-weak.h would have to define the types of __gthread_key_t, __gthread_once_t, __gthread_mutex_t and __gthread_recursive_mutex_t. These can obviously change depending on what each individual implementation ends up requiring. In gthr-single.h these are defined as int

A --enable-threads=freeRTOS would definitely be possible but I doubt it would ever make it's way into GCC or Arm GCC and patching and compiling the whole toolchain is a real pain. The plus side to that is if it's implemented properly you'll get all of the C++ threading bells and whistles

Revision history for this message
Joshua Booth (xenoamor) wrote :

This is quite the rabbit hole...

So unwind-dw2-fde.c is definitely not being built and instead unwind-arm.c is.

I hooked up a debugger to my device and ran some FreeRTOS tasks in parallel constantly throwing exceptions that they themselves were catching. I set both configUSE_PREEMPTION and configUSE_TIME_SLICING to 1 to ensure these were getting switched out randomly. This very quickly replicated the issue.

From here I could see __cxa_throw was getting called which was then calling std::terminate.
See here: https://github.com/gcc-mirror/gcc/blob/16e2427f50c208dfe07d07f18009969502c25dc8/libstdc%2B%2B-v3/libsupc%2B%2B/eh_throw.cc#L95

I tried wrapping __cxa_throw(), however this didn't work as I believe it uses a goto/jump to exit the function so was never returning.

What I did find is wrapping both _cxa_begin_catch() and _cxa_end_catch() has worked for me. I need to spend some time figuring out why and if this is safe for all cases but I've ran this for some time now and am not seeing any failures.

My linker options now include:
-Wl,--wrap=__cxa_begin_catch
-Wl,--wrap=__cxa_end_catch

And I now have the following in my main.cpp:

SemaphoreHandle_t cxa_throw_mutex = nullptr;

extern "C" void *__real___cxa_begin_catch(void *);

extern "C" void *__wrap___cxa_begin_catch(void *ptr)
{
    xSemaphoreTakeRecursive(cxa_throw_mutex, portMAX_DELAY);
    return __real___cxa_begin_catch(ptr);
}

extern "C" void __real___cxa_end_catch();

extern "C" void __wrap___cxa_end_catch()
{
    __real___cxa_end_catch();
    xSemaphoreGiveRecursive(cxa_throw_mutex);
}

Make sure you create the mutex with the following in your main() before you generate any exceptions:
cxa_throw_mutex = xSemaphoreCreateRecursiveMutex();

Also note that __cxa_allocate_exception() calls malloc so you will need a threadsafe malloc:
https://github.com/gcc-mirror/gcc/blob/16e2427f50c208dfe07d07f18009969502c25dc8/libstdc%2B%2B-v3/libsupc%2B%2B/eh_alloc.cc#L284

Dave has a great blog post about making malloc thread-safe over here:
https://nadler.com/embedded/newlibAndFreeRTOS.html

Revision history for this message
Dave Nadler (drn-nadler) wrote (last edit ):

Super work Joshua. Do you have a short example of generated code to share at godbot - to illustrate and confirm how the generated code is calling exception processing?

Joshua, unwind-arm.c appears to be ARM floating point exception unwinding, not C++ exception unwinding?
Unfortunately the commentary at the top does not make this clear... See:
https://github.com/gcc-mirror/gcc/blob/master/libgcc/config/arm/unwind-arm.c

Revision history for this message
Joshua Booth (xenoamor) wrote (last edit ):

Unfortunately on Godbolt the "Compile to binary" option is greyed out for the arm-none-eabi compilers. I think this maybe necessary for the wrapping functions to work which are done at link time? Not sure, but either way it's not wrapping the function calls.

Here's the link anyway: https://godbolt.org/z/Mcj39xxKh What we can see though is that __cxa_begin_catch and __cxa_end_catch are both being called on the "catch (int e)" line where I believe the stack unwinding is occurring.

If we nest throw/catches I can actually see 3x __cxa_end_catch calls but only 2x __cxa_begin_catch calls so perhaps this solution won't work? https://godbolt.org/z/4b4WbEbfK

Good catch on the unwind files. I've no clue what the Libstdc++ __cxa_* calls are using under the hood then as I couldn't find any reference to the unwind-dw2-fde.c file with my .map or debugger. I think I'd have to add all the standard library files to my project non-statically so that my debugger has full access to step into every function call.

Revision history for this message
Dave Nadler (drn-nadler) wrote :

Hi Joshua - Some notes:
1) Wrap is done at link time so you will not see any effect on Godblot.
2) There's a lot of incorrect/incomplete info around about exception internals but try this:
   https://monkeywritescode.blogspot.com/p/c-exceptions-under-hood.html
3) Sorry I have not had time to study this - but we should make sure nothing
   unsafe happens between the __cxa_throw and the catch processing...
Thanks for this work!
Best Regards, Dave

Revision history for this message
Joshua Booth (xenoamor) wrote (last edit ):

I misread the assembly output actually. The nested throw/catch is fine, I missed a branch so this should still work with a recursive mutex.

It does mean you're locking the entirety of your unwinding and also everything within your catch section though which probably isn't optimum. There really should be a better way of doing this but I don't understand where the unwinding is actually taking place. I'm starting to suspect that there might not be a way of achieving this without changing the GCC compile flags or source-code.

You'd have thought that unwinding could be done with Thread Local Storage TLS as you first mentioned Dave but I'm not seeing any clear path to do this either

Thanks for the links, I'll read up on it and see if I can get a better understanding of it all

Revision history for this message
Dave Nadler (drn-nadler) wrote : Re: [Bug 1905459] Re: Not thread-safe: C++ Exception Processing Crashes in GNU Arm Embedded Toolchain

Locking only blocks out other threads doing catch processing;
shouldn't be too bad with normally-coded applications!
TLS would replicate data-structures for each thread that
does exception processing which could be bad (space+time)
in some circumstances.
Probably generally the least bad with a single mutex...

On 6/9/2021 10:41 AM, Joshua Booth wrote:
> I misread the assembly output actually. The nested throw/catch is fine,
> I missed a branch so this should still work with a recursive mutex.
>
> It does mean you're locking the entirety of your unwinding and also
> everything within your catch section though which probably isn't
> optimum. There really should be a better way of doing this but I don't
> understand where the unwinding is actually taking place. I'm starting to
> suspect that there might not be a way of achieving this without changing
> the GCC compile flags or source-code.
>
> You'd have thought that unwinding could be done with Thread Local
> Storage TLS as you first mentioned Dave but I'm not seeing any clear
> path to do this either

--
Dave Nadler, USA East Coast voice (978) 263-0097, <email address hidden>, Skype
  Dave.Nadler1

Revision history for this message
Toni Neubert (viatorus) wrote :

Hello Dave and Joshua, super work! Too sad, that all solutions are not 100% promising.

Could you please share your test code, where you could trigger the issue? So I could test it my self.

Revision history for this message
Joshua Booth (xenoamor) wrote :

Hi Tony, here's a quick example of what I've been using. https://gist.github.com/Xenoamor/cacf18cb19e2b260fe291b3474e1605d

I've trimmed a lot of the fat off it that I had. I've also included my binding of malloc/free to the FreeRTOS functions as well as it's imperative that malloc and free are threadsafe for this to work. You should only need to compile this file and it will override the newlib versions.

You'll need the following linker settings as well to wrap the key functions:
-Wl,--wrap=__cxa_begin_catch
-Wl,--wrap=__cxa_end_catch

You will need the following FreeRTOSConfig settings to ensure that the throw/catches get interrupted by other tasks:
#define configUSE_PREEMPTION 1
#define configUSE_TIME_SLICING 1

I'd love to share compilable code but there's a lot of platform dependant stuff involved so this is probably the best I can do.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.