Large array problem in 3.14.12

Reported by Andrew Johnson on 2011-01-11
Affects: EPICS Base
Importance: High
Assigned to: Unassigned

Bug Description

Mark Rivers reports:

There appears to be a serious problem with EPICS Channel Access when using large arrays in 3.14.12.

This is easy to demonstrate. Simply load a waveform record with more than 64K elements, and then use cainfo to report on the record. It does not matter which version of cainfo is used: 3.14.12, 3.14.11, 3.14.10 and 3.14.8.2 all show the same thing. The problem is in the server.

I just built the example application from base, and loaded a one-record database.

Here is the database:

corvette:example/iocBoot/iocexample>more test.template
record(waveform, "TEST:waveform") {
   field(FTVL, "UCHAR")
   field(NELM, "2000000")
}

So it is a waveform record, type UCHAR, 2 million elements.

I have set EPICS_CA_MAX_ARRAY_BYTES to 20000000 (20 MB) on both client and server.

When I run the IOC and do caget on the NELM field of the waveform record I get the correct value:

corvette:areaDetector/iocBoot/iocProsilica>caget TEST:waveform.NELM
TEST:waveform.NELM 2e+06

However, when I run cainfo I get 33920 for the element count:

corvette:areaDetector/iocBoot/iocProsilica>cainfo TEST:waveform
TEST:waveform
    State: connected
    Host: corvette.cars.aps.anl.gov:48884
    Access: read, write
    Data type: DBR_CHAR (native: DBF_CHAR)
    Element count: 33920

33920 is 2000000 modulo 65536.

I then did a systematic study and found that cainfo reports the correct element count up to 65535, but rolls over to 0 at 65536.

Load the database with NELM=65535
corvette:areaDetector/iocBoot/iocProsilica>caget TEST:waveform.NELM
TEST:waveform.NELM 65535
corvette:areaDetector/iocBoot/iocProsilica>cainfo TEST:waveform
TEST:waveform
    State: connected
    Host: corvette.cars.aps.anl.gov:5064
    Access: read, write
    Data type: DBR_CHAR (native: DBF_CHAR)
    Element count: 65535

Load the database with NELM=65536
corvette:areaDetector/iocBoot/iocProsilica>caget TEST:waveform.NELM
TEST:waveform.NELM 65536
corvette:areaDetector/iocBoot/iocProsilica>cainfo TEST:waveform
TEST:waveform
    State: connected
    Host: corvette.cars.aps.anl.gov:5064
    Access: read, write
    Data type: DBR_CHAR (native: DBF_CHAR)
    Element count: 0

It thus appears that the high word for the native element count is set to 0 in 3.14.12.
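
The three observations above (33920, 65535 and 0) are all consistent with the element count being cut down to its low 16 bits somewhere on the way to the client. As a minimal sketch of that arithmetic only, and not of the actual server code, the hypothetical helper `truncate_count` below keeps the low 16 bits of a 32-bit count:

```c
#include <stdint.h>

/* Hypothetical helper, for illustration only: keep the low 16 bits of a
 * 32-bit element count, as happens when the count is stored in a 16-bit
 * field. This is equivalent to taking the count modulo 65536. */
uint16_t truncate_count(uint32_t nelm)
{
    return (uint16_t)(nelm & 0xFFFFu);
}
```

With this helper, `truncate_count(2000000)` gives 33920, `truncate_count(65535)` gives 65535 and `truncate_count(65536)` gives 0, matching the values reported by cainfo.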

This means that applications that send large arrays over Channel Access (e.g. areaDetector viewers) will not work in 3.14.12.

Interestingly the CAJ native Java library does not have a problem, and the ImageJ viewer in areaDetector does work.

Jeff Hill (johill-lanl) wrote :

This is the fix. Apparently an optimization was added that is causing this bug:

=== modified file 'src/rsrv/caserverio.c'
--- src/rsrv/caserverio.c 2010-08-13 17:59:50 +0000
+++ src/rsrv/caserverio.c 2011-01-12 22:30:09 +0000
@@ -33,10 +33,6 @@
 #define epicsExportSharedSymbols
 #include "server.h"

-/* As an optimisation, any message allocated with a large header is resized to
- * use a small header if the payload size is below this threshold. */
-#define SMALL_MESSAGE_THRESHOLD 65
-
 /*
  * cas_send_bs_msg()
  *
@@ -357,19 +353,8 @@
     if ( pMsg->m_postsize == htons ( 0xffff ) ) {
         ca_uint32_t * pLW = ( ca_uint32_t * ) ( pMsg + 1 );
         assert ( size <= ntohl ( *pLW ) );
-        if (size < SMALL_MESSAGE_THRESHOLD) {
-            /* If the message is sufficiently small it can be worth converting a
-             * large message header into a small header. This saves us all of 8
-             * bytes over the wire, so it's not such a big deal. */
-            pMsg->m_postsize = htons((ca_uint16_t) size);
-            pMsg->m_count = htons((ca_uint16_t) ntohl(pLW[1]));
-            memmove(pLW, pLW + 2, size);
-            size += sizeof(caHdr);
-        }
-        else {
-            pLW[0] = htonl ( size );
-            size += sizeof ( caHdr ) + 2 * sizeof ( *pLW );
-        }
+        pLW[0] = htonl ( size );
+        size += sizeof ( caHdr ) + 2 * sizeof ( *pLW );
     }
     else {
         assert ( size <= ntohs ( pMsg->m_postsize ) );
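
To see why the removed branch can lose the element count, here is a simplified sketch. It assumes, as the diff suggests, that a large header is marked by a post size of 0xffff and is followed by two 32-bit words holding the payload size and the element count. The types `sketch_hdr` and `sketch_large_msg` and both functions are hypothetical stand-ins kept in host byte order; they are not the real `caHdr` layout or the rsrv code, and the payload `memmove` is left out.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the CA message header. */
typedef struct {
    uint16_t cmmd;
    uint16_t postsize;   /* payload size; 0xffff marks a large header */
    uint16_t data_type;
    uint16_t count;      /* element count, only 16 bits wide */
    uint32_t cid;
    uint32_t available;
} sketch_hdr;

/* Large header: the fixed header plus two 32-bit extension words. */
typedef struct {
    sketch_hdr hdr;
    uint32_t ext[2];     /* ext[0] = payload size, ext[1] = element count */
} sketch_large_msg;

/* Mimics the removed branch: the large header is rewritten as a small one,
 * so the 32-bit count is squeezed into the 16-bit count field. */
void sketch_shrink_header(sketch_large_msg *msg, uint32_t size)
{
    msg->hdr.postsize = (uint16_t) size;
    msg->hdr.count = (uint16_t) msg->ext[1];   /* high 16 bits are lost */
}

/* Mimics the fixed code path: the large header is kept and only the
 * 32-bit payload size word is updated. */
void sketch_keep_large_header(sketch_large_msg *msg, uint32_t size)
{
    msg->ext[0] = size;                        /* ext[1] stays intact */
}
```

In this sketch, a message with an empty payload and an element count of 2000000 ends up with `hdr.count == 33920` after `sketch_shrink_header`, while `sketch_keep_large_header` leaves `ext[1]` at 2000000. The point is that the removed branch tested only the payload size, not whether the count still fit in 16 bits.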

Jeff Hill (johill-lanl) wrote :

committed a fix

Andrew Johnson (anj) on 2011-01-13
Changed in epics-base:
status: Confirmed → Fix Committed
Andrew Johnson (anj) on 2011-04-26
Changed in epics-base:
status: Fix Committed → Fix Released