scanIoRequest is sometimes not processing record?

Bug #541352 reported by rivers
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
EPICS Base
Fix Released
High
Eric Norum

Bug Description

An asyn port driver makes rapid callbacks to
asynDevice support for about 10 records. The asyn device support callback
function puts each value in a ring buffer and calls scanIoRequest each time. On
several tested systems this works fine, and the record processes exactly as many
times as the callback is called, i.e. it processes once for each call to
scanIoRequest. On one tested system, however, the record appears to not be processed
as many times as scanIoRequest is being called. The evidence is that there are
items remaining in the ring buffer after a burst of driver callbacks.

System configuration
3 systems were tested. All were running EPICS 3.14.8.2, asyn R4-10, and version
1-2beta of the areaDetector simulator driver.

The 2 systems with no problems were:
1) win32-x86 architecture, Visual Studio 2005 C++ compiler

2) linux-x86 architechure, /proc/version=
Linux version 2.4.20-20.9smp (<email address hidden>) (gcc
version 3.2.2 20030222 (Red Hat Linux 3.2.2-5)) #1 SMP
Mon Aug 18 11:32:15 EDT 2003

So 2) is an older Linux kernal and an older gcc compiler. It is a 4-processor
system.

The system exhibiting the problem is:
linux-x86 system, /proc/version=
Linux version 2.6.25.14-108.fc9.i686 (mockbuild@) (gcc version 4.3.0 20080428
(Red Hat 4.3.0-8) (GCC) ) #1 SMP Mon Aug 4 14:08:11 EDT 2008

So it is a newer Linux kernal and a newer gcc compiler.

Here is the code that the asyn driver calls in asyn/devEpics/devAsynInt32.c
each time there is a new value for a record:

static void interruptCallbackInput(void *drvPvt, asynUser *pasynUser,
                epicsInt32 value)
{
    devInt32Pvt *pPvt = (devInt32Pvt *)drvPvt;
    dbCommon *pr = pPvt->pr;
    int count, size = sizeof(epicsInt32);

    if (pPvt->bipolar && (value & pPvt->signBit)) value |= ~pPvt->mask;
    asynPrint(pPvt->pasynUser, ASYN_TRACEIO_DEVICE,
        "%s devAsynInt32::interruptCallbackInput new value=%dn",
        pr->name, value);
    count = epicsRingBytesPut(pPvt->ringBuffer, (char *)&value, size);
    if (count != size) {
        epicsMutexLock(pPvt->mutexId);
        pPvt->ringBufferOverflows++;
        epicsMutexUnlock(pPvt->mutexId);
    }
    /* There is a problem. A driver could be calling us with a value after
     * this record has registered for callbacks but before EPICS has set
     * interruptAccept,
     * which means that scanIoRequest will return immediately.
     * This is very bad, because we will have pushed a value into the ring
     * buffer
     * but it won't get popped off because the record won't process. The
     * values
     * read the next time the record processes would then be stale.
     * Work around this problem by waiting here for interruptAccept */
    while (!interruptAccept) epicsThreadSleep(0.1);
    scanIoRequest(pPvt->ioScanPvt);
}

The asynPrint call in this function was added since asyn R4-10 to help in
debugging, but otherwise the code is that from asyn R4-10. Note that this
code does not take a lock around the call to epicsRingBytesPut, because that
is not supposed to be required according to the Application Developer's Guide.
Note also that every call to epicsRingBytesPut is followed uncondionally by a
call to scanIoRequest.

When the record processes (for example longin record) it calls the following
function in devAsynInt32.c

static long processLi(longinRecord *pr)
{
    devInt32Pvt *pPvt = (devInt32Pvt *)pr->dpvt;
    int status;

    getCallbackValue(pPvt);
    if(!pPvt->gotValue && !pr->pact) {
        if(pPvt->canBlock) pr->pact = 1;
        status = pasynManager->queueRequest(pPvt->pasynUser, 0, 0);
        if((status==asynSuccess) && pPvt->canBlock) return 0;
        if(pPvt->canBlock) pr->pact = 0;
        if(status != asynSuccess) {
            asynPrint(pPvt->pasynUser, ASYN_TRACE_ERROR,
                "%s devAsynInt32 queueRequest %sn",
                pr->name,pPvt->pasynUser->errorMessage);
            recGblSetSevr(pr, READ_ALARM, INVALID_ALARM);
        }
    }
    if(pPvt->status==asynSuccess) {
        pr->val = pPvt->value; pr->udf=0;
    }
    pPvt->gotValue = 0; pPvt->status = asynSuccess;
    return 0;
}

This function calls getCallbackValue to read the value from the ring buffer,
if any. If there is data in the ring buffer pPvt->gotValue is set to 1 and
the value from the ring buffer is put in pPvt->value. Thus, if there is data
in the ring buffer, the large block of code that can set PACT=1 is never
executed.

This is the code for getCallbackValue():

static void getCallbackValue(devInt32Pvt *pPvt)
{
    int count, size=sizeof(epicsInt32);

    if (pPvt->ringBuffer && (epicsRingBytesUsedBytes(pPvt->ringBuffer) >=
size)) {
        epicsMutexLock(pPvt->mutexId);
        if (pPvt->ringBufferOverflows > 0) {
            asynPrint(pPvt->pasynUser, ASYN_TRACE_ERROR,
                "%s devAsynInt32 getCallbackValue error, %d ring buffer
overflowsn",
                pPvt->pr->name, pPvt->ringBufferOverflows);
                pPvt->ringBufferOverflows = 0;
        }
        count = epicsRingBytesGet(pPvt->ringBuffer, (char *)&pPvt->value,
size);
        if (count == size)
            pPvt->gotValue = 1;
        else
            asynPrint(pPvt->pasynUser, ASYN_TRACE_ERROR,
                "%s devAsynInt32 getCallbackValue error, ring read failedn",
                pPvt->pr->name);
        epicsMutexUnlock(pPvt->mutexId);
    }
}

Since the records in question are only being processed because of the
scanIoRequest callbacks which happen after the data is put in the ring buffer,
processLi should never be setting PACT=1.

There are 2 reasons I can think of why the records would not process when
scanIoRequest is called.

1) The EPICS callback queue is full. However, if this happens EPICS prints an
error message.

2) The callback task in dbProcess() finds PACT=1. This should not be
happening, because PACT does not get set to 1 under the circumstances that
should be happening, i.e. value is in ring buffer.

To see which if either of these is happening I put "printf" statements in
dbAccess/dbProcess()

        /* If already active dont process */
        if (precord->pact) {
                unsigned short monitor_mask;

printf("DbProcess, record active %sn", precord->name);
                if (*ptrace) printf("%s: Active %sn",
                        epicsThreadGetNameSelf(), precord->name);
                /* raise scan alarm after MAX_LOCK times */
                if (precord->stat==SCAN_ALARM) goto all_done;

and in the devAsynFloat64::processAi, devAsynInt32::processLi, and
devAsynInt32ProcessMbbi, .eg.

static long processLi(longinRecord *pr)
{
    devInt32Pvt *pPvt = (devInt32Pvt *)pr->dpvt;
    int status;

    getCallbackValue(pPvt);
    if(!pPvt->gotValue && !pr->pact) {
printf("%s devAsynInt32::processLi, no ring buffer value, reading from
drivern",
pr->name);
        if(pPvt->canBlock) pr->pact = 1;
        status = pasynManager->queueRequest(pPvt->pasynUser, 0, 0);
        if((status==asynSuccess) && pPvt->canBlock) return 0;
        if(pPvt->canBlock) pr->pact = 0;

When running on the problematic Linux system I do see very rare (less than 0.1% of the time)
messages from dbProcess and devAsyn printing, e.g.

13SIM1:ROI1:UniqueId_RBV devAsynInt32::processLi, no ring buffer value,
reading from driver
DbProcess, record active 13SIM1:ROI1:UniqueId_RBV
13SIM1:ROI1:0:Net_RBV devAsynFloat64::processAi, no ring buffer value, reading
from driver
DbProcess, record active 13SIM1:ROI1:0:Net_RBV
13SIM1:ROI1:UniqueId_RBV devAsynInt32::processLi, no ring buffer value,
reading from driver
DbProcess, record active 13SIM1:ROI1:UniqueId_RBV

I do not see any messages on Windows or the other Linux system.

There is something wrong, because the record should only be processing when
the ring buffer has something in it. Because device support thinks the ring
buffer is empty it queues a call to the driver and sets PACT (this is
asynchronous device support). On the next callback dbProcess sees PACT=1 and
does not process the record. Thus I understand dbProcess finding PACT=1,
but the reason for ring buffer appearing to be empty needs to be tracked down.

However, this cannot be the only problem. There are other records, the frame
counters for example, that never appear in the printf calls in dbProcess or
device support, but which are ending up with extra ring buffer values when the
driver is done with its burst of activity.

Version: R3.14.8.2

Original Mantis Bug: mantis-323
    http://www.aps.anl.gov/epics/mantis/view_bug_page.php?f_id=323

Tags: db 3.14 3.14.8.2
Revision history for this message
Eric Norum (wenorum) wrote :

Problem was a reordering of instructions by newer, more aggressive, GCC optimizers.
Adding some 'volatile' modifiers seems to have fixed the problem

Revision history for this message
Andrew Johnson (anj) wrote :

R3.14.10 released.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.