Comment 7 for bug 1868486

rivers (rivers) wrote :

Hi Andrew,

Thanks for your comment; I only just saw it. For some reason, when other people (e.g. Michael) add comments here I receive an e-mail about it, but when you write a comment I do not. I have seen the same thing previously. Any idea why?

> Do you have the ability to run code on VxWorks at the moment?

Yes, I can test on VxWorks.

I just came up with a very simple patch that almost fixes the problem.

diff --git a/modules/libcom/src/osi/os/default/osdMessageQueue.cpp b/modules/libcom/src/osi/os/default/osdMessageQueue.cpp
index c86d8cc..e0d10be 100644
--- a/modules/libcom/src/osi/os/default/osdMessageQueue.cpp
+++ b/modules/libcom/src/osi/os/default/osdMessageQueue.cpp
@@ -339,8 +339,10 @@ myReceive(epicsMessageQueueId pmsg, void *message, unsigned int size,
     epicsMutexUnlock(pmsg->mutex);

     epicsEventStatus status;
-    if (timeout > 0)
+    if (timeout > 0) {
         status = epicsEventWaitWithTimeout(threadNode.evp->event, timeout);
+        if (status != epicsEventWaitOK) status = epicsEventTryWait(threadNode.evp->event);
+    }
     else
         status = epicsEventWait(threadNode.evp->event);

This allows for a very brief window in which epicsEventWaitWithTimeout reports a timeout even though the event has actually been signaled. When the timed wait does not return epicsEventWaitOK, the patch calls epicsEventTryWait to detect this case.
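To make the control flow of the patch above easier to see in isolation, here is a minimal sketch that does not use EPICS at all. BinaryEvent, WaitStatus and waitThenRecheck are hypothetical stand-ins that I made up for illustration, built on the C++ standard library; they are not the EPICS implementation, and this stand-in does not itself have the race. It only shows the "timed wait, then one non-blocking re-check" pattern.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical status codes, loosely modelled on epicsEventStatus.
enum class WaitStatus { OK, Timeout };

// Hypothetical binary event: signal() sets a flag, a successful wait clears it.
class BinaryEvent {
public:
    void signal() {
        std::lock_guard<std::mutex> lock(mutex_);
        signaled_ = true;
        cond_.notify_one();
    }

    // Block for up to `seconds` waiting for the flag to be set.
    WaitStatus waitWithTimeout(double seconds) {
        std::unique_lock<std::mutex> lock(mutex_);
        bool ok = cond_.wait_for(lock, std::chrono::duration<double>(seconds),
                                 [this] { return signaled_; });
        if (!ok) return WaitStatus::Timeout;
        signaled_ = false;
        return WaitStatus::OK;
    }

    // Non-blocking check: consume the flag if it is set, otherwise report Timeout.
    WaitStatus tryWait() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (!signaled_) return WaitStatus::Timeout;
        signaled_ = false;
        return WaitStatus::OK;
    }

private:
    std::mutex mutex_;
    std::condition_variable cond_;
    bool signaled_ = false;
};

// Same shape as the patch: if the timed wait did not succeed, look once more
// without blocking, in case the signal arrived just as the timeout expired.
WaitStatus waitThenRecheck(BinaryEvent &event, double timeout) {
    WaitStatus status = event.waitWithTimeout(timeout);
    if (status != WaitStatus::OK) status = event.tryWait();
    return status;
}
```

The re-check only narrows the window. A signal that arrives after tryWait() has returned is still missed by the caller, which may be why the patch reduces the failure rate without eliminating it.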

Here is the output of running the attached test program for 10,000 loops with a delay time of 0.01, before the patch:
Loops=10000, numSent=10000, numReceived=9770

So it failed 2.3% of the time.

Here is the output of running the test program for 10,000 loops after the patch:
Loops=10000, numSent=10000, numReceived=9999

So it failed 0.01% of the time, or 230 times less often than before the patch. Unfortunately it still failed once, so it is not a complete solution.