osiSpawnDetachedProcess() child calls atexit() handlers
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
EPICS Base | Fix Released | Medium | Andrew Johnson |
Bug Description
From Till Strauman:
IF
- no caRepeater is running
- CA client tries to fork a caRepeater
but the 'exec' syscall fails (e.g., because the
caRepeater is not found in PATH)
THEN
the forked process may end up blocking
for an event that never happens and hence
may never exit and therefore never release system
resources.
Steps to reproduce:
#include <cadef.h>
#include <errlog.h>
#include <epicsThread.h>
/* Test program to demonstrate a problem with the CA client
* (base-3.14.8.2 and maybe earlier, base-3.14.9):
*
* 1) make sure no caRepeater is running
* 2) make sure no caRepeater is found in the PATH
* 3) make sure no other instance of this program is running
* the output of 'ps' may look like this:
*
* tillbook:~/test> ps
* PID TTY TIME CMD
* 29357 pts/1 00:00:00 tcsh
* 1753 pts/1 00:00:00 ps
*
* 4) execute this program; the message
*
* **** The executable "caRepeater" couldn't be located
* **** because of errno = "No such file or directory".
* **** You may need to modify your PATH environment variable.
* **** Unable to start "CA Repeater" process.
*
* should be printed to the console.
*
* 5) after termination, check for more instances of this
* process. There should be one hanging around. Check
* the output of 'ps':
*
* tillbook:~/test> ps
* PID TTY TIME CMD
* 29357 pts/1 00:00:00 tcsh
* 1834 pts/1 00:00:00 ca_zombie_tst <<<< leftover process
* 1835 pts/1 00:00:00 ps
*/
int main(int argc, char**argv)
{
chid cid;
ca_context_
/* errlogInit spawns a thread 'errlog' */
errlogInit(0);
/* suspend for some time so that the errlog thread can
* run, register an epicsAtExit handler and block for work
*/
epicsThreadSlee
/* ca_create_channel forks and tries to exec the caRepeater.
* If the 'exec' syscall fails then the forked process
* calls 'exit()' which ends up calling epicsExitCallAt
* The errlog exit handler sends the errlog thread a
* 'termination request' event and blocks for the errlog thread
* to terminate. However, the 'fork'ed process doesn't inherit
* threads and therefore the exit handler blocks forever since
* it will never receive the 'errlog termination done' event
* because there is no errlog thread in the forked process.
* Thus, the forked process is stuck and will never exit.
*
* IMO, the forked process should _exit rather than exit
* if exec("caRepeater") fails.
*/
ca_create_
ca_pend_io(1.0);
ca_clear_
ca_context_
return 0;
}
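The mechanism described in the comment above can be observed in isolation. The following is a minimal sketch, not EPICS code: it assumes a POSIX system, and the names handlers_run_in_child and note_handler_ran are hypothetical. It registers an atexit() handler in the parent, forks, and reports whether that handler ran in the child when the child terminates with exit() versus _exit().

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Write end of a pipe; set just before fork(), -1 when unused. */
static int handler_fd = -1;

/* atexit handler (hypothetical): records that it ran by writing
 * one byte to the pipe, if a pipe is currently set up. */
static void note_handler_ran(void)
{
    if (handler_fd >= 0) {
        ssize_t rc = write(handler_fd, "x", 1);
        (void)rc;
    }
}

/* Fork a child that terminates with exit() if use_exit is nonzero,
 * otherwise with _exit(). Returns how many times the inherited
 * atexit handler ran in the child, or -1 on error. */
int handlers_run_in_child(int use_exit)
{
    static int registered = 0;
    int fds[2];
    pid_t pid;
    char buf[8];
    ssize_t n;

    assert(use_exit == 0 || use_exit == 1);

    if (pipe(fds) != 0) {
        return -1;
    }
    if (!registered) {
        if (atexit(note_handler_ran) != 0) {
            close(fds[0]);
            close(fds[1]);
            return -1;
        }
        registered = 1;
    }
    handler_fd = fds[1];

    pid = fork();
    if (pid < 0) {
        handler_fd = -1;
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: the handler registered by the parent is inherited. */
        close(fds[0]);
        if (use_exit) {
            exit(1);    /* runs atexit handlers */
        }
        _exit(1);       /* skips atexit handlers */
    }

    /* Parent: close the write end so read() sees EOF once the child is gone. */
    handler_fd = -1;
    close(fds[1]);
    n = read(fds[0], buf, sizeof buf);
    close(fds[0]);
    (void)waitpid(pid, NULL, 0);
    return (int)n;
}
```

In this sketch the handler only writes a byte, so the child still terminates either way. In the reported bug the inherited handler is the errlog exit handler, which waits for a thread that does not exist in the forked child, so the exit() path never completes.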
Additional information:
FIX: libCom/
should call _exit() rather than exit() if execle() fails.
Version: R3.14.9
Original Mantis Bug: mantis-292
http://
Partly confirmed on linux-x86 (Fedora-5) against R3.14.9 as follows:
uranus% cau
cau: get no_PV
**** The executable "caRepeater" couldn't be located
**** because of errno = "No such file or directory".
**** You may need to modify your PATH environment variable.
**** Unable to start "CA Repeater" process.
error on search for no_PV
couldn't open no_PV
cau:
While cau is still running, a ps from another terminal gives this:
uranus% ps -ef | grep cau
anj 1440 2430 0 09:45 pts/7 00:00:00 cau
anj 1444 1440 0 09:45 pts/7 00:00:00 [cau] <defunct>
anj 1446 2428 0 09:45 pts/6 00:00:00 grep cau
In my case, though, the <defunct> process's parent is still the original cau process, and when the parent exits, so does the defunct process. Any long-running CA client application should be usable to demonstrate this.
In libCom/osi/os/posix we can fix the problem using Till's fix of having osiSpawnDetachedProcess() call _exit(), which is a POSIX.1 routine. Both vxWorks and RTEMS just return osiSpawnDetachedProcessNoSupport so there's no issue there; the other implementations of osiSpawnDetachedProcess() are for VMS and WIN32, neither of which calls exit().
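The shape of that fix can be sketched as follows. This is an illustration under stated assumptions, not the actual libCom source: spawn_sketch and wait_exit_status are hypothetical names, the exit code 127 is an arbitrary choice, and execlp() stands in for whichever exec variant the real routine uses.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child and try to exec `prog` from PATH.
 * Returns the child's pid in the parent, or -1 if fork() fails. */
pid_t spawn_sketch(const char *prog)
{
    static const char msg[] = "spawn_sketch: exec failed\n";
    pid_t pid;
    ssize_t rc;

    assert(prog != NULL);

    pid = fork();
    if (pid != 0) {
        return pid;     /* parent (pid > 0) or fork failure (pid == -1) */
    }

    /* Child: on success execlp() does not return. */
    execlp(prog, prog, (char *)NULL);

    /* Only reached if exec failed. Report with write(), then leave
     * with _exit() so that atexit handlers inherited from the parent
     * are not run in this process. */
    rc = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)rc;
    _exit(127);
}

/* Hypothetical helper for checking the result: wait for the child
 * and return its exit status, or -1 if it did not exit normally. */
int wait_exit_status(pid_t pid)
{
    int status;

    if (pid < 0 || waitpid(pid, &status, 0) != pid) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A truly detached spawn would not wait for the child; wait_exit_status() is included only so the outcome can be checked. For example, wait_exit_status(spawn_sketch("no-such-program-for-this-sketch")) should yield 127, provided no program of that name is on PATH.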