Bug #1932118 “Bug in vxWorks epicsAtomicCmpAndSwapIntT” : Bugs : EPICS Base

Revision history for this message

Dirk Zimoch (dirk.zimoch) wrote on 2021-06-16:

#1

atomic.patch Edit (2.9 KiB, text/plain)

This patch makes vxWorks use the generic version of epicsAtomicCmpAndSwapIntT() instead of using vxCas().

Revision history for this message

Andrew Johnson (anj) wrote on 2021-06-17:

#2

I'm having difficulty downloading your patch, Launchpad's librarian web-server seems to be misbehaving and neither of my browsers nor curl can fetch it. Is your fix to just delete that routine and its associated #define from the libCom osi/os/vxWorks/epicsAtomicOSD.h file, thus falling back to the gcc-specific version?

We really should have a test that can actually detect this bug, could you look at what needs doing to add that to epicsAtomicTest.cpp please?

We should fix this on the 3.15 branch and merge it up rather than just fixing it on 7.0. Note that we'll be releasing EPICS 7.0.6 next week so I'd like to get it included ASAP.

Changed in epics-base:
status:	New → In Progress
importance:	Undecided → Critical
assignee:	nobody → Dirk Zimoch (dirk.zimoch)

Andrew Johnson (anj) on 2021-06-17

Changed in epics-base:
milestone:	none → 7.0.6

Revision history for this message

mdavidsaver (mdavidsaver) wrote on 2021-06-17:

#3

epicsAtomicTest has coverage which tries to test the return value of the three CAS variants.

void testBasic()
{
...
    testOk1(compareAndSwap(Int, -34, -10)==-44);
    testOk1(compareAndSwap(Sizet, 34, 10)==40);
    testOk1(compareAndSwap(voidp, NULL, (void*)&Sizet)==(void*)&voidp);

Revision history for this message

Dirk Zimoch (dirk.zimoch) wrote on 2021-06-17:

#4

Andrew, my patch deletes the compareAndSwap* functions that use vxCas together with the #defines AND pulls epicsAtomicLock/Unlock in front of the vxWorks version preprocessor branch, because it is now needed in either case. Now, compareAndSwap* defaults to the same implementation for all vxWorks versions.

Michael, does any CI system actually run the tests for vxWorks? if yes, which version?

Revision history for this message

mdavidsaver (mdavidsaver) wrote on 2021-06-17:

#5

Unfortunately, no. It's all still manual for vxWorks. I'm not in a position to automate this as I did with RTEMS and QEMU. Doing so would be a great addition, but if possible, would likely be a significant investment of time. The RTEMS testing certainly was.

Personally, manually running the Base test suite is my first task after successfully compiling for a new (to me at least) target/OS version. It doesn't take much time compared with what I might otherwise spend on unnecessary troubleshooting.

Revision history for this message

Andrew Johnson (anj) wrote on 2021-06-17:

#6

Thanks to Michael I have the patch files now, although I still can't download them myself.

We don't have any CI systems that are able to run our tests on VxWorks, nor is there any infrastructure that I know of to do so automatically. I have run them by hand at times, but possibly not for a couple of years now. I would probably have used 6.9.4.1 when I last did that.

Revision history for this message

Dirk Zimoch (dirk.zimoch) wrote on 2021-06-18:

#7

How do you run tests on vxWorks? I never did. I see several *TestHarness.munch, but what to do with them?

Revision history for this message

Andrew Johnson (anj) wrote on 2021-06-18:

#8

@Dirk The VxWorks *TestHarness.munch files provide entry points for running each test program individually, and one "epicsRun*Tests()" that runs all the tests in a known working order and shows a summary of any failures at the end. In most cases you need to mount the disk containing the source code (preferably over NFS although netDrv mounts should work too) and cd into the O.<target> directory where the .munch file was built before you can run any test program.

For the libCom tests though you don't actually have to mount the source disk because they don't need to load any .dbd or .db files like the others do, but you should cd to a directory which VxWorks can write into to allow epicsStdioTest() to create a file and read it back (if you forget that cd it should just skip those tests). The final test in epicsRunLibComTests() checks epicsAtExit() functionality and doesn't really work properly with the test harness, so I display the summary results from all the other tests before calling that; don't worry if it hangs, I get that too and it would be a pain to fix it (we could drop running it).

It seems a shame to replace a truly atomic operation (which uses the lwarx + stwcx. instructions on the PowerPC so it should be fast) with a slower one that has to lock out interrupts to achieve atomicity. It would be nice to compare the output from epicsAtomicPerform() before and after applying this fix to see what effect it has. Performance tests are not run by epicsRunLibComTests() but that test is included in the libComTestHarness.munch file so you could run it by hand.

I looked through the source code for the vxAtomic library yesterday but it provides no routine with which we could directly implement the epicsAtomicCmpAndSwapXxx() semantics that our API requires, although it should be trivial to implement a __sync_bool_compare_and_swap() equivalent from a __sync_val_compare_and_swap() routine. Our internal uses could easily be converted to a new API that returns bool, and I see no reason not to switch to that, although it's too late now to develop that kind of change for the 7.0.6 release. It's getting a bit late to even apply this patch, but I trust Dirk to have tested it sufficiently that I'm going to anyway.

@Dirk The VxWorks *TestHarness.munch files provide entry points for running each test program individually, and one "epicsRun*Tests()" that runs all the tests in a known working order and shows a summary of any failures at the end. In most cases you need to mount the disk containing the source code (preferably over NFS although netDrv mounts should work too) and cd into the O.<target> directory where the .munch file was built before you can run any test program.

For the libCom tests though you don't actually have to mount the source disk because they don't need to load any .dbd or .db files like the others do, but you should cd to a directory which VxWorks can write into to allow epicsStdioTest() to create a file and read it back (if you forget that cd it should just skip those tests). The final test in epicsRunLibComTests() checks epicsAtExit() functionality and doesn't really work properly with the test harness, so I display the summary results from all the other tests before calling that; don't worry if it hangs, I get that too and it would be a pain to fix it (we could drop running it).

It seems a shame to replace a truly atomic operation (which uses the lwarx + stwcx. instructions on the PowerPC so it should be fast) with a slower one that has to lock out interrupts to achieve atomicity. It would be nice to compare the output from epicsAtomicPerform() before and after applying this fix to see what effect it has. Performance tests are not run by epicsRunLibComTests() but that test is included in the libComTestHarness.munch file so you could run it by hand.

I looked through the source code for the vxAtomic library yesterday but it provides no routine with which we could directly implement the epicsAtomicCmpAndSwapXxx() semantics that our API requires, although it should be trivial to implement a __sync_bool_compare_and_swap() equivalent from a __sync_val_compare_and_swap() routine. Our internal uses could easily be converted to a new API that returns bool, and I see no reason not to switch to that, although it's too late now to develop that kind of change for the 7.0.6 release. It's getting a bit late to even apply this patch, but I trust Dirk to have tested it sufficiently that I'm going to anyway.

Andrew Johnson (anj) on 2021-06-20

Changed in epics-base:
status:	In Progress → Fix Committed

Revision history for this message

Dirk Zimoch (dirk.zimoch) wrote on 2021-06-21:

#9

> It seems a shame [...]

I fully agree. vxCas() is basically the same as __sync_bool_compare_and_swap. I too would prefer to use it, even if that means to change all uses of epicsAtomicCmpAndSwap* to the "bool" version. I can provide a patch (or rather a pull request). I guess we have to keep the existing epicsAtomicCmpAndSwap* functions in case anyone uses outside base. But I can try to provide new functions and change all base internal uses. As far as I can see, callback.c is the only code, that uses it. Unfortunatey, there is no way to emulate the old version using the new one while staying atomic.

I wonder how this problem got unnoticed for 10 years? (commit date of vxWorks/epicsAtomicOSD.h) On the other hand, it seems it has never been used before November 2017 (commit 89f0f133). But still that is 3 and a half years ago.

Revision history for this message

Dirk Zimoch (dirk.zimoch) wrote on 2021-06-22:

#10

Here are the test results on vxWorks 6.7 ppc604

Before fix:
-> epicsAtomicTest
[...]
not ok 8 - compareAndSwap(Int, -34, -10)==-44
not ok 9 - compareAndSwap(Sizet, 34, 10)==40
not ok 10 - compareAndSwap(voidp, NULL, (void*)&Sizet)==(void*)&voidp
[...]
not ok 14 - compareAndSwap(Int, -44, -10)==-44
not ok 15 - compareAndSwap(Sizet, 40, 10)==40
not ok 16 - compareAndSwap(voidp, (void*)&voidp, (void*)&Sizet)==(void*)&voidp
[...]
    Results
    =======
       Tests: 50
      Passed: 44 = 88.00%
      Failed: 6 = 12.00%

-> epicsAtomicPerform
# epicsAtomicSet of "i" takes 0.080311 microseconds
# epicsAtomicSet of "j" takes 0.080309 microseconds
# epicsAtomicSet of "Pv" takes 0.080330 microseconds
# raw incr of "i" and a NOOP function call takes 0.039048 microseconds
# raw incr of "j" and a NOOP function call takes 0.039061 microseconds
# epicsAtomicIncr "i" takes 0.074861 microseconds
# epicsAtomicIncr "j" takes 0.074872 microseconds
# epicsAtomicCmpAndSwap of "i" takes 0.077072 microseconds
# epicsAtomicCmpAndSwap of "j" takes 0.077066 microseconds
# epicsAtomicCmpAndSwap of "Pv" takes 0.077709 microseconds
# retOwnership() takes 0.148657 microseconds
# passRefOwnership() takes 0.291261 microseconds

After fix:
-> epicsAtomicTest
[...]
    Results
    =======
       Tests: 50
      Passed: 50 = 100.00%

-> epicsAtomicPerform
# epicsAtomicSet of "i" takes 0.080310 microseconds
# epicsAtomicSet of "j" takes 0.080309 microseconds
# epicsAtomicSet of "Pv" takes 0.080331 microseconds
# raw incr of "i" and a NOOP function call takes 0.039049 microseconds
# raw incr of "j" and a NOOP function call takes 0.039061 microseconds
# epicsAtomicIncr "i" takes 0.075062 microseconds
# epicsAtomicIncr "j" takes 0.074873 microseconds
# epicsAtomicCmpAndSwap of "i" takes 0.035710 microseconds
# epicsAtomicCmpAndSwap of "j" takes 0.036103 microseconds
# epicsAtomicCmpAndSwap of "Pv" takes 0.034744 microseconds
# retOwnership() takes 0.148437 microseconds
# passRefOwnership() takes 0.291260 microseconds

That means, using EpicsAtomicLockKey is actually twice as fast as using vxCas!
So I was curious and compiled again using only the pre-6.6 implementation without vxAtomic*:

-> epicsAtomicPerform
# epicsAtomicSet of "i" takes 0.080327 microseconds
# epicsAtomicSet of "j" takes 0.080339 microseconds
# epicsAtomicSet of "Pv" takes 0.080339 microseconds
# raw incr of "i" and a NOOP function call takes 0.039070 microseconds
# raw incr of "j" and a NOOP function call takes 0.039071 microseconds
# epicsAtomicIncr "i" takes 0.034906 microseconds
# epicsAtomicIncr "j" takes 0.034893 microseconds
# epicsAtomicCmpAndSwap of "i" takes 0.036107 microseconds
# epicsAtomicCmpAndSwap of "j" takes 0.035703 microseconds
# epicsAtomicCmpAndSwap of "Pv" takes 0.034886 microseconds
# retOwnership() takes 0.092999 microseconds
# passRefOwnership() takes 0.179527 microseconds

Now epicsAtomicIncr is even faster than raw incr! Is there maybe something wrong with the time measurement?

Here are the test results on vxWorks 6.7 ppc604

Before fix:
-> epicsAtomicTest
[...]
not ok  8 - compareAndSwap(Int, -34, -10)==-44
not ok  9 - compareAndSwap(Sizet, 34, 10)==40
not ok 10 - compareAndSwap(voidp, NULL, (void*)&Sizet)==(void*)&voidp
[...]
not ok 14 - compareAndSwap(Int, -44, -10)==-44
not ok 15 - compareAndSwap(Sizet, 40, 10)==40
not ok 16 - compareAndSwap(voidp, (void*)&voidp, (void*)&Sizet)==(void*)&voidp
[...]
    Results
    =======
       Tests: 50 
      Passed:  44 = 88.00%
      Failed:   6 = 12.00%

-> epicsAtomicPerform
# epicsAtomicSet of "i" takes 0.080311 microseconds
# epicsAtomicSet of "j" takes 0.080309 microseconds
# epicsAtomicSet of "Pv" takes 0.080330 microseconds
# raw incr of "i" and a NOOP function call takes 0.039048 microseconds
# raw incr of "j" and a NOOP function call takes 0.039061 microseconds
# epicsAtomicIncr "i" takes 0.074861 microseconds
# epicsAtomicIncr "j" takes 0.074872 microseconds
# epicsAtomicCmpAndSwap of "i" takes 0.077072 microseconds
# epicsAtomicCmpAndSwap of "j" takes 0.077066 microseconds
# epicsAtomicCmpAndSwap of "Pv" takes 0.077709 microseconds
# retOwnership() takes 0.148657 microseconds
# passRefOwnership() takes 0.291261 microseconds

After fix:
-> epicsAtomicTest
[...]
    Results
    =======
       Tests: 50 
      Passed:  50 = 100.00%

-> epicsAtomicPerform
# epicsAtomicSet of "i" takes 0.080310 microseconds
# epicsAtomicSet of "j" takes 0.080309 microseconds
# epicsAtomicSet of "Pv" takes 0.080331 microseconds
# raw incr of "i" and a NOOP function call takes 0.039049 microseconds
# raw incr of "j" and a NOOP function call takes 0.039061 microseconds
# epicsAtomicIncr "i" takes 0.075062 microseconds
# epicsAtomicIncr "j" takes 0.074873 microseconds
# epicsAtomicCmpAndSwap of "i" takes 0.035710 microseconds
# epicsAtomicCmpAndSwap of "j" takes 0.036103 microseconds
# epicsAtomicCmpAndSwap of "Pv" takes 0.034744 microseconds
# retOwnership() takes 0.148437 microseconds
# passRefOwnership() takes 0.291260 microseconds

That means, using EpicsAtomicLockKey is actually twice as fast as using vxCas!
So I was curious and compiled again using only the pre-6.6 implementation without vxAtomic*:

-> epicsAtomicPerform
# epicsAtomicSet of "i" takes 0.080327 microseconds
# epicsAtomicSet of "j" takes 0.080339 microseconds
# epicsAtomicSet of "Pv" takes 0.080339 microseconds
# raw incr of "i" and a NOOP function call takes 0.039070 microseconds
# raw incr of "j" and a NOOP function call takes 0.039071 microseconds
# epicsAtomicIncr "i" takes 0.034906 microseconds
# epicsAtomicIncr "j" takes 0.034893 microseconds
# epicsAtomicCmpAndSwap of "i" takes 0.036107 microseconds
# epicsAtomicCmpAndSwap of "j" takes 0.035703 microseconds
# epicsAtomicCmpAndSwap of "Pv" takes 0.034886 microseconds
# retOwnership() takes 0.092999 microseconds
# passRefOwnership() takes 0.179527 microseconds

Now epicsAtomicIncr is even faster than raw incr! Is there maybe something wrong with the time measurement?

Revision history for this message

mdavidsaver (mdavidsaver) wrote on 2021-06-22:

#11

> ... Is there maybe something wrong with the time measurement?

I am at least somewhat suspicious when only a statistical mean is reported. eg. without min, max, and std.

Also, this test isn't so realistic as there is no threading, and so no contention.

Revision history for this message

Dirk Zimoch (dirk.zimoch) wrote on 2021-06-23:

#12

Here is a re-implemtation of atomic compare and swap returning boolean.
https://github.com/epics-base/epics-base/pull/177

Revision history for this message

Dirk Zimoch (dirk.zimoch) wrote on 2021-06-23:

#13

I just found that gcc/epicsAtomicCD.h only considers version 4.1 atomic intrinsics on x86 architectures. But it seems to compile with ppc as well. Testing now if it works and if yes, which vxWorks versions support it.

Revision history for this message

Dirk Zimoch (dirk.zimoch) wrote on 2021-06-24:

#14

Using the gcc __sync_* buildins, in particular __sync_val_compare_and_swap, seems to work for vxWorks 6.6 or higher (using gcc 4.1.2 or higher) on the PPC architecture.

I see two possibilities to modify gcc/epicsAtomicCD.h so that the compiler builtins are used on PPC as well (and not only on x86):
a. Remove the architecture filter and assume gcc 4.1+ will always have __sync_*
b. Add __ppc to the filter.

I would prefer solution a, but cannot test it as I have no other architectures.
It would affect all non x86 architectures (not only vxWorks) with gcc >= 4.1 but not yet defining __GCC_HAVE_SYNC_COMPARE_AND_SWAP_* macros.
The vxWorks 6.9 gcc 4.3.3 already defines those macros, (which explains that this bug is not visible in vxWorks 6.9) thus probably only gcc 4.1 and maybe 4.2 are affected.

Opinions?

Revision history for this message

Dirk Zimoch (dirk.zimoch) wrote on 2021-06-24:

#15

fixAtomicGcc41ppc.patch Edit (552 bytes, text/plain)

Revision history for this message

mdavidsaver (mdavidsaver) wrote on 2021-06-24:

#16

GCC selection of predefined macros Edit (9.7 KiB, text/plain)

The attached gcc-atomic-predef.txt has the output of running

> cpp -dM -x c /dev/null |egrep 'ATOMIC|SYNC'

for the various architectures I have easy access to: i386 (linux, mingw, rtems), powerpc (linux, rtems), arm (linux), aarch64 (linux). I also added some -march= variations.

All are GCC 8.3 with the exception of RTEMS5, which is 7.5.

Bottom line, __GCC_HAVE_SYNC_COMPARE_AND_SWAP_1, 2, and 4 are available on all except arm-linux with the default -march (whatever that is), and i386 with -march=i386 .

Of those with __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4, only the powerpc targets lack __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8

Revision history for this message

mdavidsaver (mdavidsaver) wrote on 2021-06-24:

#17

While not immediately relevant, all targets and variations define __GCC_ATOMIC_POINTER_LOCK_FREE. cf. https://github.com/epics-base/epics-base/issues/178

Revision history for this message

mdavidsaver (mdavidsaver) wrote on 2021-06-24:

#18

The __sync_synchronize() builtin expands on all (including -march=i386) except arm-linux-gnueabi-cpp w/ the default march (appears to be '-march=armv5te').

Revision history for this message

mdavidsaver (mdavidsaver) wrote on 2021-06-24 (last edit on 2021-06-24):

#19

I was confusing arm-linux-gnueabihf-gcc (hard float) with arm-linux-gnueabi-gcc (soft float). The raspberry pi boards I have are hard float (pi3 is 'armv7l', zero is 'armv6l'). The raspbian OS defaults to '-march=armv6+fp'.

And, for some added variety... this configuration defines only __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4, not 1, 2, or 8.

Happily I don't think this will be an issue for us as sizeof(int)==sizeof(size_t)==sizeof(void*)==4 on these targets.

New bottom line.

> #if defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_4) || defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_8) || defined(__i386)

Seems like a reasonable test for use of '__sync_synchronize()'.

The existing test of __GCC_HAVE_SYNC_COMPARE_AND_SWAP_* using __SIZEOF_*__ should work for use of the other __sync_* builtins.

Revision history for this message

mdavidsaver (mdavidsaver) wrote on 2021-06-24:

#20

My attempt to apply: https://github.com/epics-base/epics-base/pull/180

Revision history for this message

Dirk Zimoch (dirk.zimoch) wrote on 2021-06-25:

#21

dirks-atomic-predef.txt Edit (5.5 KiB, text/plain)

Here are the results of my collection of compilers.
I found the following:
* gcc 4.1 on 32 bit ppc does not define any __GCC_HAVE_SYNC_COMPARE_AND_SWAP_x macro but supports __sync_val_compare_and_swap_4 but not data size 1, 2, or 8
* gcc 4.2 on 32 bit ppc does not define any __GCC_HAVE_SYNC_COMPARE_AND_SWAP_x macro but supports supports data size 1, 2, and 4 but not 8.
* gcc 4.3 on 32 bit architectures define the macros for data sizes 1, 2, and 4 but not 8 and support __sync_val_compare_and_swap accordingly
* gcc 4.9 on 64 bit architectures define the macros for data sizes 1, 2, 4, and 8 and support __sync_val_compare_and_swap accordingly
* all architectures that support any __sync_val_compare_and_swap (gcc 4.1+) so do for at least int, size_t and void*.

Andrew Johnson (anj) on 2021-07-04

Changed in epics-base:
status:	Fix Committed → Fix Released

EPICS Base

Bug in vxWorks epicsAtomicCmpAndSwapIntT

Bug Description

Other bug subscribers

Patches

Bug attachments

Remote bug watches