Gateway crashes with SIGSEGV when cleaning up channels using ca_clear_channel
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
PV Gateway | Invalid | Undecided | Unassigned | | |
Bug Description
At LCLS, the archiver appliances connect to the IOCs through a CA gateway. The gateway crashes once in a while. This does not seem to be related to an “out-of-memory” issue or a “Gateway has been running for a long time” issue. Instead, it seems to be related to the gateway cleaning up PVs (Feb 07 04:42) from an IOC that is CPU-overloaded and keeps disconnecting (Feb 07 02:41).
From the gateway logs...
>> Unexpected problem with CA circuit to server "eioc-und1-
>> Feb 07 02:21:23 Warning: Virtual circuit disconnect eioc-und1-
>> Feb 07 02:21:23 !!! Errlog message received (message is above)
>> Unexpected problem with CA circuit to server "eioc-und1-
>> Feb 07 02:41:49 !!! Errlog message received (message is above)
>> Feb 07 02:41:49 Warning: Virtual circuit disconnect eioc-und1-
>> Feb 07 04:42:32 PV Gateway Aborting (SIGSEGV)
I have core dumps and can examine the variables; the gateway is indeed trying to clean up the PVs from this IOC using ca_clear_channel. However, the crash occurs in a fundamental place in EPICS base (tsDLList.h:238). I can provide more details or the core files if needed.
Regards,
Murali
(gdb) bt
#0 0x0016c410 in __kernel_vsyscall ()
#1 0x0086de30 in raise () from /lib/libc.so.6
#2 0x0086f741 in abort () from /lib/libc.so.6
#3 0x080513a4 in sig_end (sig=11) at ../gateway.cc:300
#4 <signal handler called>
#5 0x0075a8c9 in remove (this=0xaf728260, guard=..., chan=...) at ../../.
#6 tcpiiu:
#7 0x007512b7 in nciu::destroy (this=0x17e24b88, guard=...) at ../nciu.cpp:93
#8 0x00768347 in oldChannelNotif
#9 0x00749039 in ca_clear_channel (pChan=0x17e179f0) at ../access.cpp:386
#10 0x080582e0 in gatePvData:
#11 0x08062064 in gatePvNode::destroy (this=0x1ca02110) at ../gateServer.h:69
#12 0x0805d6e7 in gateServer:
#13 0x08060fc8 in gateServer:
#14 0x0804ef18 in startEverything (prefix=0xbfd7bbe2 "GWLCLSARCH") at ../gateway.cc:656
#15 0x080511a8 in main (argc=16, argv=0xbfd7b494) at ../gateway.cc:1299
……
(gdb) up
#4 <signal handler called>
(gdb) up
#5 0x0075a8c9 in remove (this=0xaf728260, guard=..., chan=...) at ../../.
238 prevNode.pNext = theNode.pNext;
(gdb) print theNode
$1 = (tsDLNode<nciu> &) @0x17e24b98: {pNext = 0x17d44d68, pPrev = 0x0}
(gdb) up
#6 tcpiiu:
1981 this->createReq
(gdb) print chan
$2 = (nciu &) @0x17e24b88: {<cacChannel> = {_vptr.cacChannel = 0x781168, static priorityMax = 99, static priorityMin = 0, static priorityDefault = 0, static priorityLinksDB = 99,
static priorityArchive = 49, static priorityOPI = 0, callback = @0x17e179f0}, <chronIntIdRes<
id = 833073}, <No data fields>}, <tsSLNode<nciu>> = {pNext = 0x0}, <No data fields>}, <channelNode> = {<tsDLNode<nciu>> = {pNext = 0x17d44d68, pPrev = 0x0},
listMember = cs_createReqPend}, <privateInterfa
f_readPermit = false, f_writePermit = false, f_operatorConfi
sid = 4294967295, count = 0, retry = 1, nameLength = 30, typeCode = 65535, priority = 0 '\000'}
(gdb) quit
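To illustrate the failure mode the gdb output points at, here is a simplified sketch of an intrusive doubly linked list. It is not the EPICS tsDLList source; the names Node, List, canRemove, staleRemoveIsRejected and normalRemoveWorks are hypothetical and exist only for this example. The sketch assumes (this is not a confirmed diagnosis) that the node being removed has pPrev == 0 but is not the first element of the list it is being removed from, so an unguarded "prevNode.pNext = theNode.pNext" would write through a null pointer. The guarded remove below reports the inconsistency instead of crashing.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for an intrusive doubly linked list (illustration only).
struct Node {
    Node *pNext = nullptr;
    Node *pPrev = nullptr;
};

struct List {
    Node *pFirst = nullptr;
    Node *pLast = nullptr;
    std::size_t count = 0;

    // Append n at the tail of this list.
    void add(Node &n) {
        n.pNext = nullptr;
        n.pPrev = pLast;
        if (pLast) pLast->pNext = &n; else pFirst = &n;
        pLast = &n;
        ++count;
    }

    // True when unlinking n from this list would only dereference valid pointers.
    bool canRemove(const Node &n) const {
        if (count == 0) return false;
        if (pFirst != &n && n.pPrev == nullptr) return false; // the state seen in gdb
        if (pLast != &n && n.pNext == nullptr) return false;
        return true;
    }

    // Unlink n if its links are consistent with this list; otherwise return false.
    bool remove(Node &n) {
        if (!canRemove(n)) return false;
        if (pLast == &n) pLast = n.pPrev; else n.pNext->pPrev = n.pPrev;
        if (pFirst == &n) pFirst = n.pNext; else n.pPrev->pNext = n.pNext;
        n.pNext = n.pPrev = nullptr;
        --count;
        return true;
    }
};

// Node 'a' is the head of listA (pPrev == nullptr, pNext != nullptr), but the
// caller asks listB to remove it. An unguarded remove would dereference pPrev.
bool staleRemoveIsRejected() {
    List listA, listB;
    Node a, b, other;
    listA.add(a);
    listA.add(b);
    listB.add(other);
    return !listB.remove(a) && listB.count == 1 && listA.count == 2;
}

// Removing a middle node and then the head from the correct list still works.
bool normalRemoveWorks() {
    List l;
    Node a, b, c;
    l.add(a);
    l.add(b);
    l.add(c);
    bool ok = l.remove(b) && a.pNext == &c && c.pPrev == &a;
    ok = ok && l.remove(a) && l.pFirst == &c && c.pPrev == nullptr;
    return ok && l.count == 1;
}
```

Both helper functions return true when called from a small main(). A check like canRemove only turns the crash into a detectable error; it does not explain how the channel ended up in a list other than the one its state flag indicates.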
Changed in epics-base: | |
assignee: | nobody → Ralph Lange (ralph-lange) |
no longer affects: | epics-base |
More information
This is PV Gateway Version 2.0.3.0 [Mar 2 2012 09:46:57]
Gateway is built against base-R3-14-12 with a few patches applied (I can provide a full list if needed).
IOC eioc-und1-mp01 runs on RTEMS-4.9.4-slac_p0 on top of EPICS R3.14.12-SLAC_1 $Date 2010/11/27\