osiSpawnDetachedProcess() child calls atexit() handlers

Bug #541322 reported by Andrew Johnson
This bug affects 1 person
Affects      Status        Importance  Assigned to     Milestone
EPICS Base   Fix Released  Medium      Andrew Johnson

Bug Description

From Till Straumann:

IF
 - no caRepeater is running
 - CA client tries to fork a caRepeater
   but the 'exec' syscall fails (e.g., because the
   caRepeater is not found in PATH)
THEN
   the forked process may end up blocking
   for an event that never happens and hence
   may never exit and therefore never release system
   resources.
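The bug title names the underlying mechanism. A child created with fork() inherits the atexit() handlers its parent registered, and exit() runs them. _exit() terminates immediately without running them. The following is a minimal POSIX C sketch, not EPICS code, that demonstrates this difference. child_runs_parent_handler() and parent_handler() are hypothetical names. A plain atexit() handler stands in for the epicsAtExit handlers that, per the report, exit() ends up running via epicsExitCallAtExits().

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Write end of a pipe the handler reports through; -1 when unused. */
static int marker_fd = -1;

/* Hypothetical stand-in for an exit handler registered by the parent. */
static void parent_handler(void)
{
    if (marker_fd >= 0) {
        char c = 'x';
        ssize_t n = write(marker_fd, &c, 1); /* "I ran" marker */
        (void)n;
    }
}

/* Fork a child that terminates via exit() (use_exit == true) or via
 * _exit() (use_exit == false). Returns true if the parent's handler
 * ran inside the child. */
bool child_runs_parent_handler(bool use_exit)
{
    static bool registered = false;
    if (!registered) {
        atexit(parent_handler);          /* registered before fork() */
        registered = true;
    }

    int fds[2];
    if (pipe(fds) != 0)
        return false;
    marker_fd = fds[1];

    pid_t pid = fork();
    if (pid == 0) {
        close(fds[0]);
        if (use_exit)
            exit(1);   /* runs inherited handlers: the buggy path */
        _exit(1);      /* skips them: the proposed fix */
    }

    close(fds[1]);     /* parent keeps only the read end */
    marker_fd = -1;    /* parent's own exit must not write */
    if (pid > 0)
        waitpid(pid, NULL, 0);

    /* One byte means the child ran the handler; EOF means it did not. */
    char c;
    bool ran = read(fds[0], &c, 1) == 1;
    close(fds[0]);
    return ran;
}
```

Called with true, the function should report that the handler ran in the child. Called with false, it should not. In the EPICS case, the handler that runs is the errlog handler that then waits for a thread that does not exist in the child.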

Steps to reproduce:
#include <cadef.h>
#include <errlog.h>
#include <epicsThread.h>

/* Test program to demonstrate a problem with the CA client
 * (base-3.14.8.2 and maybe earlier, base-3.14.9):
 *
 * 1) make sure no caRepeater is running
 * 2) make sure no caRepeater is found in the PATH
 * 3) make sure no other instance of this program is running
 * the output of 'ps' may look like this:
 *
 * tillbook:~/test> ps
 * PID TTY TIME CMD
 * 29357 pts/1 00:00:00 tcsh
 * 1753 pts/1 00:00:00 ps
 *
 * 4) execute this program; the message
 *
 * **** The executable "caRepeater" couldn't be located
 * **** because of errno = "No such file or directory".
 * **** You may need to modify your PATH environment variable.
 * **** Unable to start "CA Repeater" process.
 *
 * should be printed to the console.
 *
 * 5) after termination, check for more instances of this
 * process. There should be one hanging around. Check
 * the output of 'ps':
 *
 * tillbook:~/test> ps
 * PID TTY TIME CMD
 * 29357 pts/1 00:00:00 tcsh
 * 1834 pts/1 00:00:00 ca_zombie_tst <<<< leftover process
 * 1835 pts/1 00:00:00 ps
 */

int main(int argc, char **argv)
{
    chid cid;
    ca_context_create(ca_enable_preemptive_callback);
    /* errlogInit spawns a thread 'errlog' */
    errlogInit(0);
    /* suspend for some time so that the errlog thread can
     * run, register an epicsAtExit handler and block for work
     */
    epicsThreadSleep(.5);
    /* ca_create_channel forks and tries to exec the caRepeater.
     * If the 'exec' syscall fails then the forked process
     * calls 'exit()' which ends up calling epicsExitCallAtExits().
     * The errlog exit handler sends the errlog thread a
     * 'termination request' event and blocks for the errlog thread
     * to terminate. However, the 'fork'ed process doesn't inherit
     * threads and therefore the exit handler blocks forever since
     * it will never receive the 'errlog termination done' event
     * because there is no errlog thread in the forked process.
     * Thus, the forked process is stuck and will never exit.
     *
     * IMO, the forked process should _exit rather than exit
     * if exec("caRepeater") fails.
     */
    ca_create_channel("dummy_nonexisting_PV", 0, 0, 0, &cid);
    ca_pend_io(1.0);
    ca_clear_channel(cid);
    ca_context_destroy();
    return 0;
}

Additional information:
FIX: libCom/osi/os/posix/osdProcess.c:osiSpawnDetachedProcess()
 should call _exit() rather than exit() if execle() fails.

Version: R3.14.9

Original Mantis Bug: mantis-292
    http://www.aps.anl.gov/epics/mantis/view_bug_page.php?f_id=292

Tags: 3.14 3.14.9
Andrew Johnson (anj) wrote :

Partly confirmed on linux-x86 (Fedora-5) against R3.14.9 as follows:

  uranus% cau
    cau: get no_PV
  **** The executable "caRepeater" couldn't be located
  **** because of errno = "No such file or directory".
  **** You may need to modify your PATH environment variable.
  **** Unable to start "CA Repeater" process.
  error on search for no_PV
  couldn't open no_PV
    cau:

While cau is still running, a ps from another terminal gives this:

  uranus% ps -ef | grep cau
  anj 1440 2430 0 09:45 pts/7 00:00:00 cau
  anj 1444 1440 0 09:45 pts/7 00:00:00 [cau] <defunct>
  anj 1446 2428 0 09:45 pts/6 00:00:00 grep cau

In my case, though, the <defunct> process's parent is still the original cau process, and when the parent exits the defunct process goes away too. Any long-running CA client application should be able to demonstrate this.

In libCom/osi/os/posix we can fix the problem using Till's fix of having osiSpawnDetachedProcess() call _exit(), which is a POSIX.1 routine. Both vxWorks and RTEMS just return osiSpawnDetachedProcessNoSupport, so there's no issue there; the other implementations of osiSpawnDetachedProcess() are for VMS and WIN32, neither of which calls exit().

Andrew Johnson (anj) wrote :

From Till:

I'm not familiar with the internals of cau. However, note that the bug I reported bites only if a thread whose exit handler synchronizes with that thread's termination (the 'errlog' thread is an example of such a thread) is already running before the attempt to spawn the caRepeater. Therefore the sequence

errlogInit(0);          // creates the errlog thread *before* the attempt to spawn caRepeater
epicsThreadSleep(0.5);  // gives the errlog thread time to register its epicsAtExit handler

prior to the first CA activity is essential.

The problem occurs because:
1. the forked process doesn't inherit the parent's threads (only the thread that called fork() is duplicated);
2. execle("caRepeater") fails, and the spawning wrapper then calls exit();
3. exit() runs the epicsAtExit handlers;
4. the errlog thread's exit handler blocks waiting for the errlog thread to terminate and *** hangs forever *** because there is no errlog thread in the forked process (see 1).

Andrew Johnson (anj) wrote :

Till's last note explains the different cleanup behavior that I saw. His fix was committed to CVS.

Andrew Johnson (anj) wrote :

R3.14.10 released.
