I understand that Michael does not have the time to fix this problem, but I really think it should be fixed. While there is no code in base 7.0.3.1 that uses the timeout variant of epicsMessageQueue::receive(), there are plenty of places in epics-modules and areaDetector that do:
corvette:~/devel>find . -name '*.c*' -type f -exec grep -H ReceiveWithTimeout {} \;
./mca-7-8/mcaApp/CanberraSrc/nmc_comm_subs_1.c: len = epicsMessageQueueReceiveWithTimeout(m->responseQ, pkt, sizeof(*pkt), (double)(i->timeout_time/1000.));
./asyn-4-39/asyn/drvAsynUSBTMC/drvAsynUSBTMC.c: s = epicsMessageQueueReceiveWithTimeout(
./ipUnidig-2-11/ipUnidigApp/src/drvIpUnidig.cpp: status = epicsMessageQueueReceiveWithTimeout(msgQId_,
./autosave-5-10/asApp/src/save_restore.c: while (epicsMessageQueueReceiveWithTimeout(opMsgQueue, (void*) &msg, OP_MSG_SIZE, (double)MIN_DELAY) >= 0) {
./caputRecorder-1-7-2/caputRecorderApp/src/caputRecorder.c: msg_size = epicsMessageQueueReceiveWithTimeout(caputRecorderMsgQueue, (void *)pmsg, MSG_SIZE, 5.0);
./softGlue-2-8-2/softGlueApp/src/drvIP_EP201.c: if (epicsMessageQueueReceiveWithTimeout(pPvt->msgQId,
./areaDetector-3-9/aravisGigE/aravisGigEApp/src/aravisCamera.cpp: if (epicsMessageQueueReceiveWithTimeout(this->msgQId, &buffer, sizeof(&buffer), 0.005) == -1) {
./areaDetector-3-9/ADAravis/aravisApp/src/ADAravis.cpp: if (epicsMessageQueueReceiveWithTimeout(this->msgQId, &buffer, sizeof(&buffer), 0.005) == -1) {
This is where I believe epicsMessageQueue::receive() with a timeout is used:
corvette:~/devel>find . -name '*.c*' -type f -exec grep -H 'receive(' {} \;
./dante-1-0/danteApp/danteSrc/dante.cpp: numRecv = msgQ_->receive(&message, sizeof(message), MESSAGE_TIMEOUT);
./areaDetector-3-9/ADCore/ADApp/pluginSrc/NDPluginDriver.cpp: numBytes = pFromThreadMsgQ_->receive(&fromMsg, sizeof(fromMsg), 2.0);
./areaDetector-3-9/ADCore/ADApp/pluginSrc/NDPluginDriver.cpp: numBytes = pFromThreadMsgQ_->receive(&fromMsg, sizeof(fromMsg), 2.0);
The bug is such that if the time between messages is ever the same as the timeout, there is a significant probability that the message will be lost. This can lead to unreliable behavior and problems that are difficult to track down.
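To make the failure mode concrete, here is a minimal sketch in standard C++ of a timed receive that cannot drop a message when it arrives at the same moment the timeout expires. This is not the EPICS base implementation and says nothing about how the bug should be fixed there; the class name TimedQueue and its methods are hypothetical. The point is only that the receiver decides based on the queue state, checked under the lock, rather than on whether the wait reported a timeout.

```cpp
#include <chrono>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <utility>

// Hypothetical illustration only; not the EPICS base epicsMessageQueue.
template <typename T>
class TimedQueue {
public:
    // Append a message and wake one waiting receiver.
    void send(T msg) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push_back(std::move(msg));
        }
        notEmpty_.notify_one();
    }

    // Wait up to timeoutSeconds for a message.
    // Returns true and fills 'out' if a message was available, false otherwise.
    bool receive(T& out, double timeoutSeconds) {
        std::unique_lock<std::mutex> lock(mutex_);
        // wait_for with a predicate returns the predicate's value, evaluated
        // while holding the lock, even when the timeout has expired. A message
        // queued just as the timeout fires is therefore still seen here.
        const bool haveMessage = notEmpty_.wait_for(
            lock, std::chrono::duration<double>(timeoutSeconds),
            [this] { return !queue_.empty(); });
        if (!haveMessage) {
            return false;  // timed out and the queue really is empty
        }
        out = std::move(queue_.front());
        queue_.pop_front();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable notEmpty_;
    std::deque<T> queue_;
};
```

With this structure a message that was sent before the receiver gives up is either returned by this call or stays in the queue for the next one; it is never discarded because the wait happened to time out.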
I spent 2 days isolating this problem to EPICS base. Since the symptom was that frames were being dropped on a FLIR camera, my first suspects were the hardware, the areaDetector driver, and the vendor library, in that order. It took a long time to realize the problem was with EPICS base.