ipAddrToAscii immediate callback deadlock
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
EPICS Base | Fix Released | High | Jeff Hill |
Bug Description
From Kay:
I'm looking at the hang with multiply defined PVs on a 32-bit machine now, using the basic 'camonitor'. So this is nothing specific to 64-bit or to the archiver code.
I simply started the same softIoc twice. It has a database created like this:
# Generate 500 self-incrementing calc records, count0 .. count499
for ($i=0; $i<500; ++$i)
{
    print "record(calc, \"count$i\")\n";
    print "{\n";
    print "    field(SCAN, \"1 second\")\n";
    print "    field(INPA, \"count$i\")\n";
    print "    field(CALC, \"A+1\")\n";
    print "}\n";
}
When I run

    camonitor count0 count1 count2 count3

I get warnings about the multiple PVs, but then the updates arrive OK.
But when I monitor more, as the archive engine would do:
perl >test
print "camonitor ";
for ($i=0; $i<500; ++$i) { print "count$i "; }
print "\n";
^D
sh ./test
... camonitor hangs: sometimes immediately, sometimes after a bunch of warnings about identical PVs, but always within a few seconds.
I didn't see the problem with R3.14.7.
I do see it with the basic R3.14.8.2.
When attaching the debugger, there's not much info:
(gdb) thread 1
[Switching to thread 1 (Thread -1222069600 (LWP 21363))]#0  0xb75326e1 in __lll_mutex_
(gdb) bt
#0 0xb75326e1 in __lll_mutex_
#1 0xb752f7a0 in _L_mutex_lock_78 () from /lib/tls/
#2 0xb758b850 in __JCR_LIST__ () from /ade/epics/
#3 0x0804da14 in ?? ()
#4 0xbfffa598 in ?? ()
#5 0xb7579375 in epicsMutexOsdLock (pmutex=0xb758b850) at ../../../src/libCom/
Previous frame identical to this frame (corrupt stack?)
(gdb) thread 2
[Switching to thread 2 (Thread -1225532496 (LWP 21371))]#0  0xb75301fb in pthread_
(gdb) bt
#0 0xb75301fb in pthread_
#1 0xb7579786 in epicsEventWait (pevent=0xb700a8d0) at ../../../src/libCom/
#2 0xb7566fac in errlogThread () at ../../.
errlog.c:468
#3 0xb7578d55 in start_routine (arg=0xb700afc0) at ../../../src/libCom/
#4 0xb752ddec in start_thread () from /lib/tls/
#5 0xb736ce8a in clone () from /lib/tls/libc.so.6
(gdb) thread 3
[Switching to thread 3 (Thread -1224737872 (LWP 21368))]#0  0xb75301fb in pthread_
(gdb) bt
#0 0xb75301fb in pthread_
#1 0xb7579786 in epicsEventWait (pevent=0xb70045f8) at ../../../src/libCom/
#2 0xb757320a in epicsEvent::wait (this=0xfffffffc) at ../../../src/libCom/
#3 0xb75c096a in tcpSendThread::run (this=0x8085528) at ../tcpiiu.cpp:85
#4 0xb75715e4 in epicsThreadCall
#5 0xb7578d55 in start_routine (arg=0xb70004d0) at ../../../src/libCom/
#6 0xb752ddec in start_thread () from /lib/tls/
#7 0xb736ce8a in clone () from /lib/tls/libc.so.6
(gdb) thread 4
[Switching to thread 4 (Thread -1222997072 (LWP 21367))]#0  0xb75326e1 in __lll_mutex_
(gdb) bt
#0 0xb75326e1 in __lll_mutex_
#1 0xb752f7a0 in _L_mutex_lock_78 () from /lib/tls/
#2 0xb758b850 in __JCR_LIST__ () from /ade/epics/
#3 0x0804d9dc in ?? ()
#4 0xb71a8548 in ?? ()
#5 0xb7579375 in epicsMutexOsdLock (pmutex=0xb758b850) at ../../../src/libCom/
Previous frame identical to this frame (corrupt stack?)
(gdb) thread 5
[Switching to thread 5 (Thread -1222730832 (LWP 21366))]#0  0xb75326e1 in __lll_mutex_
(gdb) bt
#0 0xb75326e1 in __lll_mutex_
#1 0xb752f7a0 in _L_mutex_lock_78 () from /lib/tls/
#2 0xb758b850 in __JCR_LIST__ () from /ade/epics/
#3 0x0804da14 in ?? ()
#4 0xb71e9198 in ?? ()
#5 0xb7579375 in epicsMutexOsdLock (pmutex=0xb758b850) at ../../../src/libCom/
Previous frame identical to this frame (corrupt stack?)
(gdb) thread 6
[Switching to thread 6 (Thread -1222206544 (LWP 21365))]#0  0xb75326e1 in __lll_mutex_
(gdb) bt
#0 0xb75326e1 in __lll_mutex_
#1 0xb752f7a0 in _L_mutex_lock_78 () from /lib/tls/
#2 0xb758b850 in __JCR_LIST__ () from /ade/epics/
#3 0x0804d9dc in ?? ()
#4 0xb72696f8 in ?? ()
#5 0xb7579375 in epicsMutexOsdLock (pmutex=0xb758b850) at ../../../src/libCom/
Previous frame identical to this frame (corrupt stack?)
(gdb) thread 7
[Switching to thread 7 (Thread -1222071376 (LWP 21364))]#0  0xb75326e1 in __lll_mutex_
(gdb) bt
#0 0xb75326e1 in __lll_mutex_
#1 0xb752f7a0 in _L_mutex_lock_78 () from /lib/tls/
#2 0xb758b850 in __JCR_LIST__ () from /ade/epics/
#3 0x0804d9dc in ?? ()
#4 0xb728a5f8 in ?? ()
#5 0xb7579375 in epicsMutexOsdLock (pmutex=0xb758b850) at ../../../src/libCom/
Previous frame identical to this frame (corrupt stack?)
So they're almost all waiting and have a corrupted stack.
I only get errors when trying to print the "pmutex".
When switching to the 64-bit computer, with tsFreeList in EPICS_FREELIST_
I'll pack that in a follow-up email.
Original Mantis Bug: mantis-260
http://
I see that the primary mutex is held while initiating the I/O for the IP-address-to-ASCII translation. Unfortunately, the ioInitiate function calls the callback directly if the queue quota is exceeded. Since this callback takes the callback mutex while the primary mutex is still held, the two mutexes are acquired in the reverse of the order used elsewhere (a lock inversion), and therefore a deadlock can occur.