Loop in casDGClient::processDG() after zero-length PV

Bug #1686787 reported by Andrew Johnson
This bug affects 1 person
Affects      Status      Importance   Assigned to   Milestone
EPICS Base   Confirmed   Medium       Unassigned

Bug Description

From Shuei Yamada in http://www.aps.anl.gov/epics/tech-talk/2017/msg00714.php:

I have a problem with CA gateway: it runs away and all PVs
subscribing via ca-gateway become disconnected. The problem occurred 4
times in the past 6 months, and each time I had to restart CA gateway. I'm
using CA gateway 2.1.0 with EPICS base 3.14.12.3 on Scientific Linux
6.8 (RHEL 6.8 clone).

The log file contains the following message:
====
CAS Request: ? on jkjnuc31.ccr.jkcont:40325: cmd=6 cid=76 typ=5 cnt=11 psz=32 avail=4c
CAS:
Apr 20 06:29:30 !!! Errlog message received (message is above)
zero length PV name in UDP search request?
====
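
For readers decoding the log line above, here is a minimal sketch of how such a request could be parsed and checked for an empty PV name. It assumes the standard 16-byte big-endian CA message header, in which command code 6 is a search request; the struct, the field names (taken from the labels in the log line) and the helper functions are hypothetical and are not the libcas API. The payload bytes are not shown in the log, so the assumption that the name starts with a NUL byte despite psz=32 is inferred from the "zero length PV name" message. Requires C++17.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// Assumed layout of the 16-byte CA message header (all fields big-endian).
// Field names follow the labels in the log line, not any EPICS source file.
struct CaHeader {
    std::uint16_t cmd;    // command code; 6 is a search request
    std::uint16_t psz;    // payload size in bytes
    std::uint16_t typ;    // data type field
    std::uint16_t cnt;    // data count field
    std::uint32_t cid;    // first 32-bit parameter
    std::uint32_t avail;  // second 32-bit parameter (logged in hex)
};

constexpr std::size_t kHeaderSize = 16;
constexpr std::uint16_t kCmdSearch = 6;

inline std::uint16_t readBe16(const unsigned char* p) {
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

inline std::uint32_t readBe32(const unsigned char* p) {
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) << 8) |
           static_cast<std::uint32_t>(p[3]);
}

// Returns the header, or nothing if the buffer is shorter than the header
// or shorter than the payload size the header announces.
inline std::optional<CaHeader> parseHeader(const unsigned char* buf,
                                           std::size_t len) {
    if (len < kHeaderSize) return std::nullopt;
    CaHeader h{readBe16(buf),     readBe16(buf + 2), readBe16(buf + 4),
               readBe16(buf + 6), readBe32(buf + 8), readBe32(buf + 12)};
    if (len - kHeaderSize < h.psz) return std::nullopt;
    return h;
}

// Length of the NUL-terminated PV name, never reading past psz bytes.
inline std::size_t pvNameLength(const unsigned char* payload, std::size_t psz) {
    std::size_t n = 0;
    while (n < psz && payload[n] != 0) ++n;
    return n;
}

// True for a well-formed search request whose PV name is empty.
inline bool isEmptySearch(const unsigned char* buf, std::size_t len) {
    auto h = parseHeader(buf, len);
    return h && h->cmd == kCmdSearch &&
           pvNameLength(buf + kHeaderSize, h->psz) == 0;
}
```

In this reading, cid=76 and avail=4c are the same value (0x4c is 76 in decimal), and a request with psz=32 can still carry an empty name if its first payload byte is NUL.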
The machine in the logfile, jkjnuc31.ccr.jkcont (we're using a private
network for accelerator control), is one of our PCs running
CS-Studio. I replaced the machine with a different one and upgraded CSS
from 4.1.1 to 4.4.1, but jkjnuc31 is involved every time.

I attached to the process during the inspection and got the following backtrace:
====
(gdb) bt
#0 0x00007f5a180a74e0 in outBuf::commitMsg() () from /svjk/jk/epics/R3.14.12/base-3.14.12.3-CSA/lib/linux-x86_64/libcas.so.3.14
#1 0x00007f5a18099512 in casDGClient::sendVersion() () from /svjk/jk/epics/R3.14.12/base-3.14.12.3-CSA/lib/linux-x86_64/libcas.so.3.14
#2 0x00007f5a1809abb3 in casDGClient::processDG() () from /svjk/jk/epics/R3.14.12/base-3.14.12.3-CSA/lib/linux-x86_64/libcas.so.3.14
#3 0x00007f5a180aa16b in casDGIntfOS::recvCB(inBufClient::fillParameter) () from /svjk/jk/epics/R3.14.12/base-3.14.12.3-CSA/lib/linux-x86_64/libcas.so.3.14
#4 0x00007f5a17388fa6 in fdManager::process(double) () from /svjk/jk/epics/R3.14.12/base-3.14.12.3-CSA/lib/linux-x86_64/libCom.so.3.14
#5 0x0000000000412f0e in gateServer::mainLoop (this=0x7a2440) at ../gateServer.cc:280
#6 0x0000000000406830 in startEverything (prefix=0x7ffcd6f010f5 "MRCO:GW:MR-CCR01") at ../gateway.cc:685
#7 0x0000000000408c16 in main (argc=26, argv=0x7ffcd6effbf8) at ../gateway.cc:1353
====
It seems that the gateway falls into an infinite loop within
casDGClient::processDG().
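
The report does not show the code inside casDGClient::processDG(), so the following is only a hypothetical illustration of one common way a datagram-processing loop can spin forever: a handler rejects a message (here, an empty PV name) without consuming it, and the loop retries the same message. None of the types or functions below come from libcas; they are invented for the sketch, and the iteration cap exists only so the demonstration terminates.

```cpp
#include <cstddef>
#include <string>
#include <vector>

enum class Status { Ok, BadMessage };

// Hypothetical stand-in for a received datagram: a queue of PV names
// and the index of the next unprocessed search request.
struct Datagram {
    std::vector<std::string> names;
    std::size_t next = 0;
};

// Hypothetical handler: rejects an empty PV name WITHOUT consuming it,
// which is the kind of error path that can stall a processing loop.
inline Status handleSearch(Datagram& dg) {
    if (dg.names[dg.next].empty()) return Status::BadMessage;
    ++dg.next;  // message consumed
    return Status::Ok;
}

// Processes the datagram and returns the number of loop iterations used.
// With discardOnError == false the loop retries the rejected message
// and would never finish; maxIter caps it for demonstration purposes.
// With discardOnError == true the rest of the datagram is dropped on the
// first bad message, so the loop always makes progress.
inline std::size_t drain(Datagram& dg, bool discardOnError,
                         std::size_t maxIter) {
    std::size_t iterations = 0;
    while (dg.next < dg.names.size() && iterations < maxIter) {
        ++iterations;
        if (handleSearch(dg) == Status::BadMessage && discardOnError) {
            dg.next = dg.names.size();  // skip everything that remains
        }
    }
    return iterations;
}
```

For a datagram holding the names "pv1", "" and "pv2", drain with discardOnError set to false uses every allowed iteration, while drain with it set to true stops after two. Whether the actual defect in processDG() has this shape would need to be confirmed against the libcas source.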

Related (fix in CS-Studio): https://github.com/ControlSystemStudio/cs-studio/issues/2199

Tags: cas