Large array problem in 3.14.12
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
EPICS Base | Fix Released | High | Unassigned | |
Bug Description
Mark Rivers reports:
There appears to be a serious problem with EPICS Channel Access when using large arrays in 3.14.12.
This is easy to demonstrate. Simply load a waveform record with more than 64K elements, and then use cainfo to report on the record. It does not matter which version of cainfo is used: 3.14.12, 3.14.11, 3.14.10 and 3.14.8.2 all show the same thing. The problem is in the server.
I just built the example application from base, and loaded a one-record database.
Here is the database:
corvette:
record(waveform, "TEST:waveform") {
field(FTVL, "UCHAR")
field(NELM, "2000000")
}
So it is a waveform record, type UCHAR, 2 million elements.
I have set EPICS_CA_
When I run the IOC and do caget on the NELM field of the waveform record I get the correct value:
corvette:
TEST:waveform.NELM 2e+06
However, when I run cainfo I get 33920 for the element count:
corvette:
TEST:waveform
State: connected
Host: corvette.
Access: read, write
Data type: DBR_CHAR (native: DBF_CHAR)
Element count: 33920
33920 is 2000000 modulo 65536.
I then did a systematic study and found that cainfo reports the correct element count up to 65535, but rolls over to 0 at 65536.
Load the database with NELM=65535
corvette:
TEST:waveform.NELM 65535
corvette:
TEST:waveform
State: connected
Host: corvette.
Access: read, write
Data type: DBR_CHAR (native: DBF_CHAR)
Element count: 65535
Load the database with NELM=65536
corvette:
TEST:waveform.NELM 65536
corvette:
TEST:waveform
State: connected
Host: corvette.
Access: read, write
Data type: DBR_CHAR (native: DBF_CHAR)
Element count: 0
It thus appears that the high word for the native element count is set to 0 in 3.14.12.
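The three element counts reported above are consistent with a 32-bit count being squeezed into a 16-bit field somewhere on the server side. The following is only a minimal sketch of that arithmetic; `truncate_count` and `check_reported_values` are hypothetical names used for illustration and are not part of EPICS Base.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: keep only the low 16 bits of a 32-bit element
 * count, as happens when the count is stored in a 16-bit field. */
uint16_t truncate_count(uint32_t count)
{
    return (uint16_t)(count & 0xffffu);
}

/* Hypothetical self-check using the three NELM values from the report. */
void check_reported_values(void)
{
    assert(truncate_count(2000000u) == 33920u); /* 2000000 modulo 65536 */
    assert(truncate_count(65535u) == 65535u);   /* still fits in 16 bits */
    assert(truncate_count(65536u) == 0u);       /* rolls over to 0 */
}
```

Calling `check_reported_values()` passes silently, which matches the observation that the high word of the count is dropped while the low word arrives intact.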
This means that applications that send large arrays over Channel Access (e.g. areaDetector viewers) will not work in 3.14.12.
Interestingly the CAJ native Java library does not have a problem, and the ImageJ viewer in areaDetector does work.
Changed in epics-base:
status: Confirmed → Fix Committed
Changed in epics-base:
status: Fix Committed → Fix Released
This is the fix. Apparently an optimization was added that is causing this bug.
=== modified file 'src/rsrv/caserverio.c'
--- src/rsrv/caserverio.c 2010-08-13 17:59:50 +0000
+++ src/rsrv/caserverio.c 2011-01-12 22:30:09 +0000
@@ -33,10 +33,6 @@
 #define epicsExportSharedSymbols
 #include "server.h"
-/* As an optimisation, any message allocated with a large header is resized to
- * use a small header if the payload size is below this threshold. */
-#define SMALL_MESSAGE_THRESHOLD 65
-
 /*
  * cas_send_bs_msg()
  *
@@ -357,19 +353,8 @@
     ca_uint32_t * pLW = ( ca_uint32_t * ) ( pMsg + 1 );
     if ( pMsg->m_postsize == htons ( 0xffff ) ) {
         assert ( size <= ntohl ( *pLW ) );
-        if (size < SMALL_MESSAGE_THRESHOLD) {
-            /* If the message is sufficiently small it can be worth converting a
-             * large message header into a small header. This saves us all of 8
-             * bytes over the wire, so it's not such a big deal. */
-            pMsg->m_postsize = htons((ca_uint16_t) size);
-            pMsg->m_count = htons((ca_uint16_t) ntohl(pLW[1]));
-            memmove(pLW, pLW + 2, size);
-            size += sizeof(caHdr);
-        }
-        else {
-            pLW[0] = htonl ( size );
-            size += sizeof ( caHdr ) + 2 * sizeof ( *pLW );
-        }
+        pLW[0] = htonl ( size );
+        size += sizeof ( caHdr ) + 2 * sizeof ( *pLW );
     }
     else {
         assert ( size <= ntohs ( pMsg->m_postsize ) );
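To illustrate why the removed branch loses the element count, here is a simplified model, not the real rsrv code. It assumes the reply that carries the native element count has a payload smaller than the threshold, so the removed code shrinks the header based on payload size alone and casts the 32-bit count to 16 bits. `struct model_hdr`, `shrink_by_size_only` and `shrink_if_both_fit` are hypothetical names; the second function only shows what a count-aware check might look like, whereas the committed fix simply removes the optimisation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified, hypothetical model of a CA message header; not the real caHdr. */
struct model_hdr {
    bool     large;     /* true: 32-bit size and count are kept */
    uint32_t postsize;  /* payload size in bytes */
    uint32_t count;     /* element count as seen by the client */
};

/* Models the removed optimisation: the decision looks at payload size only. */
struct model_hdr shrink_by_size_only(uint32_t size, uint32_t count)
{
    struct model_hdr h = { true, size, count };
    if (size < 65u) {               /* SMALL_MESSAGE_THRESHOLD in the diff */
        h.large = false;
        h.count = (uint16_t)count;  /* high 16 bits of the count are lost */
    }
    return h;
}

/* Hypothetical safer variant: shrink only when the count fits in 16 bits too. */
struct model_hdr shrink_if_both_fit(uint32_t size, uint32_t count)
{
    struct model_hdr h = { true, size, count };
    if (size < 65u && count <= 0xffffu) {
        h.large = false;
    }
    return h;
}
```

In this model, `shrink_by_size_only(0, 2000000)` yields a count of 33920, the value cainfo reported, while `shrink_if_both_fit(0, 2000000)` keeps the large header and the full count.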