IOC crashes for DBR_CTRL_CHAR request with 1 array element

Bug #1678494 reported by mdavidsaver
14
This bug affects 2 people
Affects Status Importance Assigned to Milestone
EPICS Base
Fix Released
High
mdavidsaver
3.14
Invalid
Undecided
Unassigned
3.15
Fix Released
High
mdavidsaver
3.16
Fix Released
High
mdavidsaver

Bug Description

http://www.aps.anl.gov/epics/core-talk/2017/msg00060.php

On 04/01/2017 09:20 AM, Kasemir, Kay wrote:
> Hi:
>
>
> When reading a "long string" PV as DBR_CTRL_CHAR with 1 array element,
> an EPICS 3.15.5 IOC crashes.
>
> Has been OK with R3.14.x.
>
>
> Example C client and stack trace of server is below. Basically, CA
> server seems to try return all elements of the CHAR array, but the
> buffer only allows for the 1 requested.
>
>
> Why I ran into this:
>
> Older IOCs, i.e. most of those in operation ;-), don't support
> DBE_PROPERTY.
>
> So clients fetch the DBR_CTRL_native_type once, then subscribe to
> for example DBR_STS_* for the complete array.
>
> When fetching the initial meta data, they don't need the complete value.
> In fact for large arrays it would be wasteful to keep a copy of the
> whole outdated array, just need the meta data and then one instance of
> the 'current' array.
>
>
> Thanks,
>
> Kay
>
>
> Example C code that crashes IOC:
> ````
> /* USAGE: cademo SomePV.INP$
> *
> * Fetches a "long string" channel as DBR_CTRL_CHAR[1].
> * OK for R4.14.*, crashes R3.15.5
> */
> #include <stdio.h>
> #include <string.h>
> #include <epicsStdlib.h>
> #include <epicsString.h>
> #include <cadef.h>
>
> int main (int argc, char *argv[])
> {
> const char *name = argv[1];
> int result;
> chid chid;
> struct dbr_ctrl_char value;
>
> puts(name);
>
> result = ca_context_create(ca_disable_preemptive_callback);
> SEVCHK(result, "connect");
> result = ca_create_channel(name, 0, 0, 0, &chid);
> SEVCHK(result, "create");
> result = ca_pend_io(2.0);
> SEVCHK(result, "pend create");
> result = ca_array_get(DBR_CTRL_CHAR, 1, chid, &value);
> SEVCHK(result, "get");
> result = ca_pend_io(2.0);
> SEVCHK(result, "pend get");
>
> ca_context_destroy();
>
> return result;
> }
> ````
>
> For an IOC created via `makeBaseApp.pl -t example`, when reading
> `$(user):ai1.INP$`, the IOC crashes while placing the value,
> `$(user):calcExample1.VAL NPP NMS`, in the ca-get return buffer:
> ````
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7fffd75f8700 (LWP 4974)]
> 0x00007ffff66eb795 in __strncpy_sse2_unaligned () from /lib64/libc.so.6
> Missing separate debuginfos, use: debuginfo-install
> glibc-2.17-157.el7.x86_64 libgcc-4.8.5-11.el7.x86_64
> libstdc++-4.8.5-11.el7.x86_64 ncurses-libs-5.9-13.20130511.el7.x86_64
> readline-6.2-9.el7.x86_64
> (gdb) bt
> #0 0x00007ffff66eb795 in __strncpy_sse2_unaligned () from /lib64/libc.so.6
> #1 0x00007ffff7735ebf in getLinkValue (paddr=0x7fffe4013020,
> paddr=0x7fffe4013020,
> nRequest=<optimized out>, pbuf=0x7ffff7e53035
> "training:calcExample1.VAL NPP NMS",
> dbrType=<optimized out>) at ../../../src/ioc/db/dbAccess.c:773
> #2 dbGet (paddr=paddr@entry=0x7fffe4013020, dbrType=dbrType@entry=2,
> pbuffer=<optimized out>,
> options=options@entry=0x7fffd75f76e0,
> nRequest=nRequest@entry=0x7fffd75f7c30,
> pflin=pflin@entry=0x0) at ../../../src/ioc/db/dbAccess.c:857
> #3 0x00007ffff773888c in dbChannelGet (chan=chan@entry=0x7fffe4013018,
> type=type@entry=2,
> pbuffer=<optimized out>, options=options@entry=0x7fffd75f76e0,
> nRequest=nRequest@entry=0x7fffd75f7c30, pfl=pfl@entry=0x0)
> at ../../../src/ioc/db/dbChannel.c:668
> #4 0x00007ffff774b8d3 in dbChannel_get_count
> (chan=chan@entry=0x7fffe4013018,
> buffer_type=<optimized out>, pbuffer=0x7ffff7e53020,
> nRequest=nRequest@entry=0x7fffd75f7c30,
> pfl=pfl@entry=0x0) at ../../../src/ioc/db/db_access.c:685
> #5 0x00007ffff776f6c2 in read_reply (pfl=0x0, eventsRemaining=1,
> dbch=0x7fffe4013018,
> pArg=0x7fffd75f7c50) at ../../../src/ioc/rsrv/camessage.c:587
> #6 read_notify_action (mp=<optimized out>, pPayload=<optimized out>,
> client=<optimized out>)
> at ../../../src/ioc/rsrv/camessage.c:800
> #7 0x00007ffff77708df in camessage (client=client@entry=0x7fffec000f90)
> at ../../../src/ioc/rsrv/camessage.c:2628
> #8 0x00007ffff776d26c in camsgtask (pParm=0x7fffec000f90)
> at ../../../src/ioc/rsrv/camsgtask.c:129
> #9 0x00007ffff727a58c in start_routine (arg=0x7fffe0001130)
> at ../../../src/libCom/osi/os/posix/osdThread.c:403
> #10 0x00007ffff6441dc5 in start_thread () from /lib64/libpthread.so.0
> #11 0x00007ffff674d73d in clone () from /lib64/libc.so.6
> ````

Revision history for this message
mdavidsaver (mdavidsaver) wrote :

This is quite easy to reproduce.

> $ cat <<EOF > test.db
> record(calc, "cnt") {
> field(INPA, "cnt")
> field(CALC, "A+1")
> field(SCAN, "1 second")
> }
> EOF
> $ ./bin/linux-x86_64-debug/softIoc -d test.db

then in another shell run

> $ caget -# 1 -d CTRL_CHAR cnt.INPA

Changed in epics-base:
status: New → Confirmed
importance: Undecided → High
Revision history for this message
mdavidsaver (mdavidsaver) wrote :

The logic in getLinkValue() clearly needs some work. I find that strncpy() is being called with maxlen==-1. *nRequest==1 and maxlen=*nRequest-1, then --maxlen.

Revision history for this message
mdavidsaver (mdavidsaver) wrote :

A potential fix for 3.15.5.

Revision history for this message
mdavidsaver (mdavidsaver) wrote :

3.14 does not appear to crash.

Revision history for this message
Ralph Lange (ralph-lange) wrote :

I started working on a branch that adds a test for this stuff.

Revision history for this message
mdavidsaver (mdavidsaver) wrote :
Revision history for this message
mdavidsaver (mdavidsaver) wrote :

I wanted to check with Andrew on git conversion progress before applying.

Revision history for this message
Andrew Johnson (anj) wrote :

The very similar adjacent routine getAttrValue() has exactly the same bug and segfaults with this:

$ caget -# 1 -d CTRL_CHAR cnt.RTYP

Revision history for this message
Andrew Johnson (anj) wrote :

This fix does cause a subtle change in behavior (which I am in favor of making, I was going to propose a similar change this week although I wouldn't have found this particular bug).

Previously, monitoring/getting a link field using a char array type would always return a fixed size array, whereas now it will only return the number of elements required for the string plus the terminating NUL byte. The change is caused by these lines (we didn't previously update *nRequest):

+ if(dbrType!=DBR_STRING)
+ nReq = strlen(pbuf)+1;
+ if(nRequest) *nRequest = nReq;

If we apply this change and its equivalent in getAttrValue() we should also make regular DBF_STRING fields behave similarly by applying something like this change to dbGet():

@@ -908,6 +912,9 @@
             ;/*do nothing*/
         } else if (!pfl || pfl->type == dbfl_type_rec) {
             status = convert(paddr, pbuf, n, capacity, offset);
+ if (!status && dbrType == DBR_CHAR && nRequest &&
+ paddr->pfldDes && paddr->pfldDes->field_type == DBF_STRING)
+ *nRequest = strlen(pbuf) + 1;
         } else {
             DBADDR localAddr = *paddr; /* Structure copy */

Revision history for this message
Ralph Lange (ralph-lange) wrote :

OK, I trashed the tests I was about to commit.
Let's please coordinate better.

Revision history for this message
Ralph Lange (ralph-lange) wrote :

Please also test with nRequest == 0, which is legal as per dbGet() documentation.

Revision history for this message
mdavidsaver (mdavidsaver) wrote :

> The very similar adjacent routine getAttrValue()

Noted and hopefully fixed as well. I'll commit these on the 3.16 git branch after my travis-ci tests pass.

https://github.com/mdavidsaver/epics-base/commits/integration

Revision history for this message
mdavidsaver (mdavidsaver) wrote :

> Please also test with nRequest == 0

Now done.

Revision history for this message
Ralph Lange (ralph-lange) wrote :

Why not commit to 3.15 and upmerge as usual?

Revision history for this message
mdavidsaver (mdavidsaver) wrote :

The test failed on rtems and I haven't had time to investigate yet.

Revision history for this message
mdavidsaver (mdavidsaver) wrote :

> Please also test with nRequest == 0

Ha, I hadn't realized that with rtems/newlib, malloc(0) returns NULL on success (which is allowed).

Revision history for this message
mdavidsaver (mdavidsaver) wrote :

Fixed on 3.15 with 188825309309be74f3a4c2f26397c9ce36a806a5 along with a similar fix for attributes 8ebfd0821aca0a29555a119db87ddb552132c524 and a consistent change fpr regular field 739a112becd12b85b2c5a184a1c9a8e8e4c45909

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.