ipAddrToAscii immediate callback deadlock

Bug #541293 reported by Jeff Hill
Affects:      EPICS Base
Status:       Fix Released
Importance:   High
Assigned to:  Jeff Hill

Bug Description

From Kay:

I'm looking at the hangup with multiply defined PVs on a 32-bit machine now, with the basic 'camonitor', so this is nothing specific to 64-bit or the archiver code.

I simply started the same softIoc twice. It has a database created like this:
for ($i=0; $i<500; ++$i)
{
    print "record(calc, \"count$i\")\n";
    print "{\n";
    print "    field(SCAN, \"1 second\")\n";
    print "    field(INPA, \"count$i\")\n";
    print "    field(CALC, \"A+1\")\n";
    print "}\n";
}
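
For reference, one way to generate the database and start the two colliding IOCs (a sketch only; the file names here are made up, and it assumes the softIoc from this EPICS Base build is on the PATH):

    perl gendb.pl > count.db     # gendb.pl is the generator shown above
    softIoc -d count.db &        # first instance
    softIoc -d count.db &        # second instance serving the same PV names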

When I run

    camonitor count0 count1 count2 count3

I get warnings about the multiple PVs, but then the updates arrive OK.

But when I monitor more, as the archive engine would do:
  perl >test
  print "camonitor ";
  for ($i=0; $i<500; ++$i) { print "count$i "; }
  print "n";
  ^D
  sh ./test

... the camonitor hangs. Sometimes immediately, sometimes after a bunch of warnings about identical PVs. But always after a few seconds.

I didn't see the problem with R3.14.7.
I do see it with the basic R3.14.8.2.
When attaching the debugger, there's not much info:

(gdb) thread 1
[Switching to thread 1 (Thread -1222069600 (LWP 21363))]#0
0xb75326e1 in __lll_mutex_lock_wait () from /lib/tls/libpthread.so.0
(gdb) bt
#0 0xb75326e1 in __lll_mutex_lock_wait () from /lib/tls/libpthread.so.0
#1 0xb752f7a0 in _L_mutex_lock_78 () from /lib/tls/libpthread.so.0
#2 0xb758b850 in __JCR_LIST__ () from /ade/epics/supTop/base/R3.14.8.2/lib/linux-x86/libCom.so
#3 0x0804da14 in ?? ()
#4 0xbfffa598 in ?? ()
#5 0xb7579375 in epicsMutexOsdLock (pmutex=0xb758b850) at ../../../src/libCom/osi/os/posix/osdMutex.c:44
Previous frame identical to this frame (corrupt stack?)
(gdb) thread 2
[Switching to thread 2 (Thread -1225532496 (LWP 21371))]#0
0xb75301fb in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/tls/libpthread.so.0
(gdb) bt
#0 0xb75301fb in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/tls/libpthread.so.0
#1 0xb7579786 in epicsEventWait (pevent=0xb700a8d0) at ../../../src/libCom/osi/os/posix/osdEvent.c:75
#2 0xb7566fac in errlogThread () at ../../../src/libCom/error/errlog.c:468
#3 0xb7578d55 in start_routine (arg=0xb700afc0) at ../../../src/libCom/osi/os/posix/osdThread.c:320
#4 0xb752ddec in start_thread () from /lib/tls/libpthread.so.0
#5 0xb736ce8a in clone () from /lib/tls/libc.so.6
(gdb) thread 3
[Switching to thread 3 (Thread -1224737872 (LWP 21368))]#0
0xb75301fb in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/tls/libpthread.so.0
(gdb) bt
#0 0xb75301fb in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/tls/libpthread.so.0
#1 0xb7579786 in epicsEventWait (pevent=0xb70045f8) at ../../../src/libCom/osi/os/posix/osdEvent.c:75
#2 0xb757320a in epicsEvent::wait (this=0xfffffffc) at ../../../src/libCom/osi/epicsEvent.cpp:63
#3 0xb75c096a in tcpSendThread::run (this=0x8085528) at ../tcpiiu.cpp:85
#4 0xb75715e4 in epicsThreadCallEntryPoint (pPvt=0x808552c) at ../../../src/libCom/osi/epicsThread.cpp:59
#5 0xb7578d55 in start_routine (arg=0xb70004d0) at ../../../src/libCom/osi/os/posix/osdThread.c:320
#6 0xb752ddec in start_thread () from /lib/tls/libpthread.so.0
#7 0xb736ce8a in clone () from /lib/tls/libc.so.6
(gdb) thread 4
[Switching to thread 4 (Thread -1222997072 (LWP 21367))]#0
0xb75326e1 in __lll_mutex_lock_wait () from /lib/tls/libpthread.so.0
(gdb) bt
#0 0xb75326e1 in __lll_mutex_lock_wait () from /lib/tls/libpthread.so.0
#1 0xb752f7a0 in _L_mutex_lock_78 () from /lib/tls/libpthread.so.0
#2 0xb758b850 in __JCR_LIST__ () from /ade/epics/supTop/base/R3.14.8.2/lib/linux-x86/libCom.so
#3 0x0804d9dc in ?? ()
#4 0xb71a8548 in ?? ()
#5 0xb7579375 in epicsMutexOsdLock (pmutex=0xb758b850) at ../../../src/libCom/osi/os/posix/osdMutex.c:44
Previous frame identical to this frame (corrupt stack?)
(gdb) thread 5
[Switching to thread 5 (Thread -1222730832 (LWP 21366))]#0
0xb75326e1 in __lll_mutex_lock_wait () from /lib/tls/libpthread.so.0
(gdb) bt
#0 0xb75326e1 in __lll_mutex_lock_wait () from /lib/tls/libpthread.so.0
#1 0xb752f7a0 in _L_mutex_lock_78 () from /lib/tls/libpthread.so.0
#2 0xb758b850 in __JCR_LIST__ () from /ade/epics/supTop/base/R3.14.8.2/lib/linux-x86/libCom.so
#3 0x0804da14 in ?? ()
#4 0xb71e9198 in ?? ()
#5 0xb7579375 in epicsMutexOsdLock (pmutex=0xb758b850) at ../../../src/libCom/osi/os/posix/osdMutex.c:44
Previous frame identical to this frame (corrupt stack?)
(gdb) thread 6
[Switching to thread 6 (Thread -1222206544 (LWP 21365))]#0
0xb75326e1 in __lll_mutex_lock_wait () from /lib/tls/libpthread.so.0
(gdb) bt
#0 0xb75326e1 in __lll_mutex_lock_wait () from /lib/tls/libpthread.so.0
#1 0xb752f7a0 in _L_mutex_lock_78 () from /lib/tls/libpthread.so.0
#2 0xb758b850 in __JCR_LIST__ () from /ade/epics/supTop/base/R3.14.8.2/lib/linux-x86/libCom.so
#3 0x0804d9dc in ?? ()
#4 0xb72696f8 in ?? ()
#5 0xb7579375 in epicsMutexOsdLock (pmutex=0xb758b850) at ../../../src/libCom/osi/os/posix/osdMutex.c:44
Previous frame identical to this frame (corrupt stack?)
(gdb) thread 7
[Switching to thread 7 (Thread -1222071376 (LWP 21364))]#0
0xb75326e1 in __lll_mutex_lock_wait () from /lib/tls/libpthread.so.0
(gdb) bt
#0 0xb75326e1 in __lll_mutex_lock_wait () from /lib/tls/libpthread.so.0
#1 0xb752f7a0 in _L_mutex_lock_78 () from /lib/tls/libpthread.so.0
#2 0xb758b850 in __JCR_LIST__ () from /ade/epics/supTop/base/R3.14.8.2/lib/linux-x86/libCom.so
#3 0x0804d9dc in ?? ()
#4 0xb728a5f8 in ?? ()
#5 0xb7579375 in epicsMutexOsdLock (pmutex=0xb758b850) at ../../../src/libCom/osi/os/posix/osdMutex.c:44
Previous frame identical to this frame (corrupt stack?)

So they're almost all waiting and have a corrupted stack.
I only get errors when trying to print the "pmutex".

When switching to the 64-bit computer, with tsFreeList in EPICS_FREELIST_DEBUG mode and the recent Mantis patches for the ipAddrToAsciiAsync.cpp mutex and for ca/searchTimer.cpp and tcpRecvWatchdog.cpp, there is a bit more detail.
I'll pack that in a follow-up email.

Original Mantis Bug: mantis-260
    http://www.aps.anl.gov/epics/mantis/view_bug_page.php?f_id=260

Tags: ca 3.14
Revision history for this message
Jeff Hill (johill-lanl) wrote :

I see that the primary mutex is held when initiating the IO for the IP address to ASCII translation. Unfortunately, the ioInitiate function will call the callback directly if the queue quota is exceeded. Since this callback takes the callback mutex, we can have a lock inversion and therefore a deadlock.
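
To make the inversion concrete, here is a minimal standalone sketch in plain C++ (std::mutex and std::thread stand in for the EPICS primitives; none of these names come from cac.cpp or the ipAddrToAscii code). Thread A takes the primary lock and then, because the queue quota is exceeded, runs the callback synchronously, which needs the callback lock; thread B takes the two locks in the opposite order, so each thread can end up waiting for the lock the other holds:

    #include <mutex>
    #include <thread>

    std::mutex primaryMutex;   // stands in for the CA client's primary mutex
    std::mutex callbackMutex;  // stands in for the callback mutex

    // Callback that the IO-initiate path may run immediately when its queue is full.
    void immediateCallback ()
    {
        std::lock_guard < std::mutex > cb ( callbackMutex );
        // ... format the "multiply defined PV" warning ...
    }

    void threadA ()
    {
        // Holds the primary mutex, then initiates the address-to-name IO.
        std::lock_guard < std::mutex > primary ( primaryMutex );
        // Queue quota exceeded: the callback runs right here, so callbackMutex
        // is requested while primaryMutex is still held.
        immediateCallback ();
    }

    void threadB ()
    {
        // Ordinary callback dispatch takes callbackMutex first,
        // then needs the primary mutex.
        std::lock_guard < std::mutex > cb ( callbackMutex );
        std::lock_guard < std::mutex > primary ( primaryMutex );
    }

    int main ()
    {
        std::thread a ( threadA ), b ( threadB );
        a.join ();
        b.join ();   // with unlucky timing neither join returns: deadlock
    }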

Revision history for this message
Jeff Hill (johill-lanl) wrote :

Here is the fix that I am testing:

cvs diff -u -wb -i -- cac.cpp (in directory D:users/hill/R3.14.fix)
Index: cac.cpp
===================================================================
RCS file: /net/phoebus/epicsmgr/cvsroot/epics/base/src/ca/cac.cpp,v
retrieving revision 1.180.2.33
diff -c -u -w -b -i -r1.180.2.33 cac.cpp
cvs diff: conflicting specifications of output style
--- cac.cpp 13 Feb 2006 16:33:46 -0000 1.180.2.33
+++ cac.cpp 30 May 2006 19:32:10 -0000
@@ -497,6 +497,12 @@
         if ( ! sockAddrAreIdentical ( &addr, &chanAddr ) ) {
             char acc[64];
             pChan->getPIIU(guard)->getHostName ( guard, acc, sizeof ( acc ) );
+ // It is possible for the ioInitiate call below to
+ // call the callback directly if queue quota is exceeded.
+ // This callback takes the callback lock and therefore we
+ // must release the primary mutex here to avoid a lock
+ // hierarchy inversion.
+ epicsGuardRelease < epicsMutex > unguard ( guard );
             msgForMultiplyDefinedPV * pMsg = new ( this->mdpvFreeList )
                 msgForMultiplyDefinedPV ( this->ipToAEngine,
                     *this, pChan->pName ( guard ), acc );

***** CVS exited normally with code 1 *****

edited on: 2006-05-30 14:03
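
As background on the patch above (a generic sketch assuming the usual epicsGuard/epicsGuardRelease semantics from libCom; the class and function names below are illustrative, not taken from cac.cpp): constructing an epicsGuardRelease inside a scope temporarily unlocks the mutex held by the enclosing epicsGuard and re-locks it when the release object goes out of scope, so callback-prone work can run without the primary lock held.

    #include "epicsMutex.h"
    #include "epicsGuard.h"

    struct example {                 // illustrative only, not the real cac class
        epicsMutex primaryMutex;

        void callbackProneWork ( epicsGuard < epicsMutex > & guard )
        {
            {
                // Temporarily drop the primary mutex around code that may
                // invoke a callback taking another lock.
                epicsGuardRelease < epicsMutex > unguard ( guard );
                // ... initiate IO that might call back immediately ...
            }
            // the primary mutex is held again from here on
        }

        void run ()
        {
            epicsGuard < epicsMutex > guard ( this->primaryMutex );
            this->callbackProneWork ( guard );
        }
    };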

Revision history for this message
Jeff Hill (johill-lanl) wrote :

I believe that we have tested the multiply defined PVs message before, so I suspect that this is influenced by the last round of patches we installed against R3.14.8.

edited on: 2006-05-30 14:11

Revision history for this message
Jeff Hill (johill-lanl) wrote :

fixed in R3.14.9

Revision history for this message
Andrew Johnson (anj) wrote :

R3.14.9 Released.
