Bug in vxWorks epicsAtomicCmpAndSwapIntT

Bug #1932118 reported by Dirk Zimoch
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
EPICS Base
Fix Released
Critical
Dirk Zimoch

Bug Description

The usage of epicsAtomicCmpAndSwapIntT() in callback.c suggests that the function should return the previous value. But the VxWorks implementation uses vxCas() which is documented as returning "TRUE if the write actually occurs, FALSE otherwise."

vxAtomicLib.html:
vxCas( )
NAME
    vxCas( ) - atomically compare-and-swap the contents of a memory location

SYNOPSIS
    BOOL vxCas
        (
        atomic_t * target,
        atomicVal_t oldValue,
        atomicVal_t newValue
        )

DESCRIPTION
    This routine performs an atomic compare-and-swap; testing that *target contains oldValue, and if it does, setting the value of *target to newValue.

RETURNS
    TRUE if the swap is actually executed, FALSE otherwise.

Comparing this with the gcc documentation:
bool __sync_bool_compare_and_swap (type *ptr, type oldval type newval, ...)
type __sync_val_compare_and_swap (type *ptr, type oldval type newval, ...)

These builtins perform an atomic compare and swap. That is, if the current value of *ptr is oldval, then write newval into *ptr.

The “bool” version returns true if the comparison is successful and newval was written.
The “val” version returns the contents of *ptr before the operation.

That means epicsAtomicCmpAndSwapIntT() behaves behaves like __sync_bool_compare_and_swap(). Simply casting the return value from BOOL to the type of oldVal does not change that.

But the usage implies an expected bahavior like __sync_val_compare_and_swap (), wich is used in the gcc implementation of epicsAtomicCmpAndSwapIntT().

This bugs prevents the callback threads from being started in vxWorks 6.7, maybe more or all vxWorks 6 versions.

Revision history for this message
Dirk Zimoch (dirk.zimoch) wrote :

This patch makes vxWorks use the generic version of epicsAtomicCmpAndSwapIntT() instead of using vxCas().

Revision history for this message
Andrew Johnson (anj) wrote :

I'm having difficulty downloading your patch, Launchpad's librarian web-server seems to be misbehaving and neither of my browsers nor curl can fetch it. Is your fix to just delete that routine and its associated #define from the libCom osi/os/vxWorks/epicsAtomicOSD.h file, thus falling back to the gcc-specific version?

We really should have a test that can actually detect this bug, could you look at what needs doing to add that to epicsAtomicTest.cpp please?

We should fix this on the 3.15 branch and merge it up rather than just fixing it on 7.0. Note that we'll be releasing EPICS 7.0.6 next week so I'd like to get it included ASAP.

Changed in epics-base:
status: New → In Progress
importance: Undecided → Critical
assignee: nobody → Dirk Zimoch (dirk.zimoch)
Andrew Johnson (anj)
Changed in epics-base:
milestone: none → 7.0.6
Revision history for this message
mdavidsaver (mdavidsaver) wrote :

epicsAtomicTest has coverage which tries to test the return value of the three CAS variants.

void testBasic()
{
...
    testOk1(compareAndSwap(Int, -34, -10)==-44);
    testOk1(compareAndSwap(Sizet, 34, 10)==40);
    testOk1(compareAndSwap(voidp, NULL, (void*)&Sizet)==(void*)&voidp);

Revision history for this message
Dirk Zimoch (dirk.zimoch) wrote :

Andrew, my patch deletes the compareAndSwap* functions that use vxCas together with the #defines AND pulls epicsAtomicLock/Unlock in front of the vxWorks version preprocessor branch, because it is now needed in either case. Now, compareAndSwap* defaults to the same implementation for all vxWorks versions.

Michael, does any CI system actually run the tests for vxWorks? if yes, which version?

Revision history for this message
mdavidsaver (mdavidsaver) wrote :

Unfortunately, no. It's all still manual for vxWorks. I'm not in a position to automate this as I did with RTEMS and QEMU. Doing so would be a great addition, but if possible, would likely be a significant investment of time. The RTEMS testing certainly was.

Personally, manually running the Base test suite is my first task after successfully compiling for a new (to me at least) target/OS version. It doesn't take much time compared with what I might otherwise spend on unnecessary troubleshooting.

Revision history for this message
Andrew Johnson (anj) wrote :

Thanks to Michael I have the patch files now, although I still can't download them myself.

We don't have any CI systems that are able to run our tests on VxWorks, nor is there any infrastructure that I know of to do so automatically. I have run them by hand at times, but possibly not for a couple of years now. I would probably have used 6.9.4.1 when I last did that.

Revision history for this message
Dirk Zimoch (dirk.zimoch) wrote :

How do you run tests on vxWorks? I never did. I see several *TestHarness.munch, but what to do with them?

Revision history for this message
Andrew Johnson (anj) wrote :

@Dirk The VxWorks *TestHarness.munch files provide entry points for running each test program individually, and one "epicsRun*Tests()" that runs all the tests in a known working order and shows a summary of any failures at the end. In most cases you need to mount the disk containing the source code (preferably over NFS although netDrv mounts should work too) and cd into the O.<target> directory where the .munch file was built before you can run any test program.

For the libCom tests though you don't actually have to mount the source disk because they don't need to load any .dbd or .db files like the others do, but you should cd to a directory which VxWorks can write into to allow epicsStdioTest() to create a file and read it back (if you forget that cd it should just skip those tests). The final test in epicsRunLibComTests() checks epicsAtExit() functionality and doesn't really work properly with the test harness, so I display the summary results from all the other tests before calling that; don't worry if it hangs, I get that too and it would be a pain to fix it (we could drop running it).

It seems a shame to replace a truly atomic operation (which uses the lwarx + stwcx. instructions on the PowerPC so it should be fast) with a slower one that has to lock out interrupts to achieve atomicity. It would be nice to compare the output from epicsAtomicPerform() before and after applying this fix to see what effect it has. Performance tests are not run by epicsRunLibComTests() but that test is included in the libComTestHarness.munch file so you could run it by hand.

I looked through the source code for the vxAtomic library yesterday but it provides no routine with which we could directly implement the epicsAtomicCmpAndSwapXxx() semantics that our API requires, although it should be trivial to implement a __sync_bool_compare_and_swap() equivalent from a __sync_val_compare_and_swap() routine. Our internal uses could easily be converted to a new API that returns bool, and I see no reason not to switch to that, although it's too late now to develop that kind of change for the 7.0.6 release. It's getting a bit late to even apply this patch, but I trust Dirk to have tested it sufficiently that I'm going to anyway.

Andrew Johnson (anj)
Changed in epics-base:
status: In Progress → Fix Committed
Revision history for this message
Dirk Zimoch (dirk.zimoch) wrote :

> It seems a shame [...]

I fully agree. vxCas() is basically the same as __sync_bool_compare_and_swap. I too would prefer to use it, even if that means to change all uses of epicsAtomicCmpAndSwap* to the "bool" version. I can provide a patch (or rather a pull request). I guess we have to keep the existing epicsAtomicCmpAndSwap* functions in case anyone uses outside base. But I can try to provide new functions and change all base internal uses. As far as I can see, callback.c is the only code, that uses it. Unfortunatey, there is no way to emulate the old version using the new one while staying atomic.

I wonder how this problem got unnoticed for 10 years? (commit date of vxWorks/epicsAtomicOSD.h) On the other hand, it seems it has never been used before November 2017 (commit 89f0f133). But still that is 3 and a half years ago.

Revision history for this message
Dirk Zimoch (dirk.zimoch) wrote :

Here are the test results on vxWorks 6.7 ppc604

Before fix:
-> epicsAtomicTest
[...]
not ok 8 - compareAndSwap(Int, -34, -10)==-44
not ok 9 - compareAndSwap(Sizet, 34, 10)==40
not ok 10 - compareAndSwap(voidp, NULL, (void*)&Sizet)==(void*)&voidp
[...]
not ok 14 - compareAndSwap(Int, -44, -10)==-44
not ok 15 - compareAndSwap(Sizet, 40, 10)==40
not ok 16 - compareAndSwap(voidp, (void*)&voidp, (void*)&Sizet)==(void*)&voidp
[...]
    Results
    =======
       Tests: 50
      Passed: 44 = 88.00%
      Failed: 6 = 12.00%

-> epicsAtomicPerform
# epicsAtomicSet of "i" takes 0.080311 microseconds
# epicsAtomicSet of "j" takes 0.080309 microseconds
# epicsAtomicSet of "Pv" takes 0.080330 microseconds
# raw incr of "i" and a NOOP function call takes 0.039048 microseconds
# raw incr of "j" and a NOOP function call takes 0.039061 microseconds
# epicsAtomicIncr "i" takes 0.074861 microseconds
# epicsAtomicIncr "j" takes 0.074872 microseconds
# epicsAtomicCmpAndSwap of "i" takes 0.077072 microseconds
# epicsAtomicCmpAndSwap of "j" takes 0.077066 microseconds
# epicsAtomicCmpAndSwap of "Pv" takes 0.077709 microseconds
# retOwnership() takes 0.148657 microseconds
# passRefOwnership() takes 0.291261 microseconds

After fix:
-> epicsAtomicTest
[...]
    Results
    =======
       Tests: 50
      Passed: 50 = 100.00%

-> epicsAtomicPerform
# epicsAtomicSet of "i" takes 0.080310 microseconds
# epicsAtomicSet of "j" takes 0.080309 microseconds
# epicsAtomicSet of "Pv" takes 0.080331 microseconds
# raw incr of "i" and a NOOP function call takes 0.039049 microseconds
# raw incr of "j" and a NOOP function call takes 0.039061 microseconds
# epicsAtomicIncr "i" takes 0.075062 microseconds
# epicsAtomicIncr "j" takes 0.074873 microseconds
# epicsAtomicCmpAndSwap of "i" takes 0.035710 microseconds
# epicsAtomicCmpAndSwap of "j" takes 0.036103 microseconds
# epicsAtomicCmpAndSwap of "Pv" takes 0.034744 microseconds
# retOwnership() takes 0.148437 microseconds
# passRefOwnership() takes 0.291260 microseconds

That means, using EpicsAtomicLockKey is actually twice as fast as using vxCas!
So I was curious and compiled again using only the pre-6.6 implementation without vxAtomic*:

-> epicsAtomicPerform
# epicsAtomicSet of "i" takes 0.080327 microseconds
# epicsAtomicSet of "j" takes 0.080339 microseconds
# epicsAtomicSet of "Pv" takes 0.080339 microseconds
# raw incr of "i" and a NOOP function call takes 0.039070 microseconds
# raw incr of "j" and a NOOP function call takes 0.039071 microseconds
# epicsAtomicIncr "i" takes 0.034906 microseconds
# epicsAtomicIncr "j" takes 0.034893 microseconds
# epicsAtomicCmpAndSwap of "i" takes 0.036107 microseconds
# epicsAtomicCmpAndSwap of "j" takes 0.035703 microseconds
# epicsAtomicCmpAndSwap of "Pv" takes 0.034886 microseconds
# retOwnership() takes 0.092999 microseconds
# passRefOwnership() takes 0.179527 microseconds

Now epicsAtomicIncr is even faster than raw incr! Is there maybe something wrong with the time measurement?

Revision history for this message
mdavidsaver (mdavidsaver) wrote :

> ... Is there maybe something wrong with the time measurement?

I am at least somewhat suspicious when only a statistical mean is reported. eg. without min, max, and std.

Also, this test isn't so realistic as there is no threading, and so no contention.

Revision history for this message
Dirk Zimoch (dirk.zimoch) wrote :

Here is a re-implemtation of atomic compare and swap returning boolean.
https://github.com/epics-base/epics-base/pull/177

Revision history for this message
Dirk Zimoch (dirk.zimoch) wrote :

I just found that gcc/epicsAtomicCD.h only considers version 4.1 atomic intrinsics on x86 architectures. But it seems to compile with ppc as well. Testing now if it works and if yes, which vxWorks versions support it.

Revision history for this message
Dirk Zimoch (dirk.zimoch) wrote :

Using the gcc __sync_* buildins, in particular __sync_val_compare_and_swap, seems to work for vxWorks 6.6 or higher (using gcc 4.1.2 or higher) on the PPC architecture.

I see two possibilities to modify gcc/epicsAtomicCD.h so that the compiler builtins are used on PPC as well (and not only on x86):
a. Remove the architecture filter and assume gcc 4.1+ will always have __sync_*
b. Add __ppc to the filter.

I would prefer solution a, but cannot test it as I have no other architectures.
It would affect all non x86 architectures (not only vxWorks) with gcc >= 4.1 but not yet defining __GCC_HAVE_SYNC_COMPARE_AND_SWAP_* macros.
The vxWorks 6.9 gcc 4.3.3 already defines those macros, (which explains that this bug is not visible in vxWorks 6.9) thus probably only gcc 4.1 and maybe 4.2 are affected.

Opinions?

Revision history for this message
Dirk Zimoch (dirk.zimoch) wrote :
Revision history for this message
mdavidsaver (mdavidsaver) wrote :

The attached gcc-atomic-predef.txt has the output of running

> cpp -dM -x c /dev/null |egrep 'ATOMIC|SYNC'

for the various architectures I have easy access to: i386 (linux, mingw, rtems), powerpc (linux, rtems), arm (linux), aarch64 (linux). I also added some -march= variations.

All are GCC 8.3 with the exception of RTEMS5, which is 7.5.

Bottom line, __GCC_HAVE_SYNC_COMPARE_AND_SWAP_1, 2, and 4 are available on all except arm-linux with the default -march (whatever that is), and i386 with -march=i386 .

Of those with __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4, only the powerpc targets lack __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8

Revision history for this message
mdavidsaver (mdavidsaver) wrote :

While not immediately relevant, all targets and variations define __GCC_ATOMIC_POINTER_LOCK_FREE. cf. https://github.com/epics-base/epics-base/issues/178

Revision history for this message
mdavidsaver (mdavidsaver) wrote :

The __sync_synchronize() builtin expands on all (including -march=i386) except arm-linux-gnueabi-cpp w/ the default march (appears to be '-march=armv5te').

Revision history for this message
mdavidsaver (mdavidsaver) wrote (last edit ):

I was confusing arm-linux-gnueabihf-gcc (hard float) with arm-linux-gnueabi-gcc (soft float). The raspberry pi boards I have are hard float (pi3 is 'armv7l', zero is 'armv6l'). The raspbian OS defaults to '-march=armv6+fp'.

And, for some added variety... this configuration defines only __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4, not 1, 2, or 8.

Happily I don't think this will be an issue for us as sizeof(int)==sizeof(size_t)==sizeof(void*)==4 on these targets.

New bottom line.

> #if defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_4) || defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_8) || defined(__i386)

Seems like a reasonable test for use of '__sync_synchronize()'.

The existing test of __GCC_HAVE_SYNC_COMPARE_AND_SWAP_* using __SIZEOF_*__ should work for use of the other __sync_* builtins.

Revision history for this message
mdavidsaver (mdavidsaver) wrote :
Revision history for this message
Dirk Zimoch (dirk.zimoch) wrote :

Here are the results of my collection of compilers.
I found the following:
 * gcc 4.1 on 32 bit ppc does not define any __GCC_HAVE_SYNC_COMPARE_AND_SWAP_x macro but supports __sync_val_compare_and_swap_4 but not data size 1, 2, or 8
 * gcc 4.2 on 32 bit ppc does not define any __GCC_HAVE_SYNC_COMPARE_AND_SWAP_x macro but supports supports data size 1, 2, and 4 but not 8.
 * gcc 4.3 on 32 bit architectures define the macros for data sizes 1, 2, and 4 but not 8 and support __sync_val_compare_and_swap accordingly
 * gcc 4.9 on 64 bit architectures define the macros for data sizes 1, 2, 4, and 8 and support __sync_val_compare_and_swap accordingly
 * all architectures that support any __sync_val_compare_and_swap (gcc 4.1+) so do for at least int, size_t and void*.

Andrew Johnson (anj)
Changed in epics-base:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.