Comment 9 for bug 185601

Abdulaziz Ghuloum (aghuloum) wrote : Re: [Bug 185601] Re: Need to be able to know if child process failed

On Apr 2, 2008, at 2:18 AM, Derick Eddington wrote:

> It looks like the only solution is to use a SIGCHLD signal handler.
> Not to "test" (sorry) but to be notified when a specific process has
> died. An idea: register a procedure for a child and have that
> procedure called when the SIGCHLD telling of that child's death is
> delivered to the signal handler (you'd use SA_SIGINFO with
> sa_sigaction to install a signal handler that would be given a
> siginfo_t telling what child PID died); without screwing up ikarus's
> stack or run-time of course. Using an alternate signal stack (via
> sigaltstack and SA_ONSTACK) might be noteworthy. Ah, it looks like
> ikarus already does use sa_sigaction and an alternate stack for
> SIGINT, but the handler doesn't call back into Scheme.

Exactly. In the signal handler, you're pretty much helpless because
you don't even know whether you're in the Scheme code, in the GC, in
GMP, in some system call (read, write, select, ...) or just in the
middle of a cons that did not initialize its car or cdr fields.

So, for SIGINT, all that Ikarus does right now is set two fields in
the pcb record:

void handler(int signo, siginfo_t* info, void* uap){
   the_pcb->engine_counter = -1;
   the_pcb->interrupted = 1;
}

and that's it. In the Scheme code, on entry to every procedure, the
value of engine_counter is decremented and, if negative, the engine
handler is called, which resets the counter, then checks and resets
the interrupted flag and calls either the interrupt handler (which
raises a continuable "interrupted" condition and returns) or the
timeout handler, which just returns (iirc).
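
A rough sketch of that split, in plain C rather than Ikarus's actual
code: the signal handler only sets two fields, and a check that stands
in for the procedure-entry code notices them later. The names
engine_counter and interrupted come from the handler above; everything
else (the globals standing in for the pcb, ENGINE_TICKS,
install_sigint_handler, procedure_entry_check) is made up for
illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Hypothetical reset value for the counter; the real one is not given above. */
#define ENGINE_TICKS 1000

/* Stand-ins for the two pcb fields named in the handler above. */
static volatile sig_atomic_t engine_counter = ENGINE_TICKS;
static volatile sig_atomic_t interrupted = 0;

/* The handler only sets fields; it never calls back into the runtime. */
static void handler(int signo, siginfo_t *info, void *uap) {
    (void)signo; (void)info; (void)uap;
    engine_counter = -1;
    interrupted = 1;
}

/* Install the handler for SIGINT with SA_SIGINFO. Returns 0 on success. */
static int install_sigint_handler(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGINT, &sa, NULL);
}

/* Stand-in for the check done on entry to every procedure.
 * Returns 1 if an interrupt was pending (the interrupt handler would
 * run here), 0 otherwise (nothing to do, or a plain timeout). */
static int procedure_entry_check(void) {
    engine_counter = engine_counter - 1;
    if (engine_counter >= 0)
        return 0;                   /* counter not exhausted yet */
    engine_counter = ENGINE_TICKS;  /* engine handler resets the counter */
    if (interrupted) {
        interrupted = 0;            /* check and reset the flag */
        return 1;
    }
    return 0;                       /* timeout: just return */
}
```

The point of the sketch is only that the real work happens at a safe
point chosen by the runtime, not inside the handler.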

So, calling into Scheme from the signal handler is just not possible.

So, you add another field (say pcb->child_died) and, from the handler,
you set engine_counter to -1 and the child_died flag to 1. In Scheme,
the engine handler would now have to check for this flag and, if it is
set, call waitpid to reap the dead child, collect the info, and stash
it somewhere (a hash table of some sort) to be retrieved later, so
that you know whether your child has exited and what the exit status
was.
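
Something along these lines, sketched in C under a few assumptions: a
global child_died flag stands in for the proposed pcb field, a small
fixed array stands in for the hash table, and reap_children plays the
role of the check in the engine handler. All function names here are
hypothetical. The reap step loops with WNOHANG because several
children exiting close together may produce only one SIGCHLD.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>

#define MAX_REAPED 64   /* arbitrary size for this sketch */

struct reaped { pid_t pid; int status; };
static struct reaped reaped_table[MAX_REAPED];  /* stand-in for a hash table */
static int reaped_count = 0;
static volatile sig_atomic_t child_died = 0;    /* stand-in for pcb->child_died */

/* Handler: set the flag and nothing else. In Ikarus it would also set
 * engine_counter to -1 so that the engine handler runs soon. */
static void on_sigchld(int signo, siginfo_t *info, void *uap) {
    (void)signo; (void)info; (void)uap;
    child_died = 1;
}

/* Install the SIGCHLD handler. Returns 0 on success. */
static int install_sigchld_handler(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_sigchld;
    sa.sa_flags = SA_SIGINFO | SA_RESTART;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Called from a safe point: reap every child that has already exited
 * and record its wait status. */
static void reap_children(void) {
    if (!child_died)
        return;
    child_died = 0;  /* clear first so a SIGCHLD arriving now is not lost */
    for (;;) {
        int status;
        pid_t pid = waitpid(-1, &status, WNOHANG);
        if (pid <= 0)
            break;   /* 0: nothing more to reap yet; -1: no children or error */
        if (reaped_count < MAX_REAPED) {
            reaped_table[reaped_count].pid = pid;
            reaped_table[reaped_count].status = status;
            reaped_count++;
        }
    }
}

/* Look up a recorded status. Returns 1 and fills *status if pid was reaped. */
static int lookup_status(pid_t pid, int *status) {
    for (int i = 0; i < reaped_count; i++) {
        if (reaped_table[i].pid == pid) {
            *status = reaped_table[i].status;
            return 1;
        }
    }
    return 0;
}
```

A caller would fork, keep running, and later ask lookup_status for the
child's pid; a 0 result there only means "not reaped so far".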

I'm just thinking out loud here, so I don't know if any of this
would work. I don't know off the top of my head which of these calls
are interruptible/restartable, what happens if multiple children die
at the same time, or what happens when one child dies while you're
collecting another.

But all of this does not answer the question: how to know if a child
process failed. The fact that you did not get a SIGCHLD does not
mean that the process did not fail. All it means is that it did not
fail *yet* and might fail any time now. (I just read in waitpid(2)
that you can pass a WNOHANG option to waitpid so that it doesn't
hang, but that too does not answer the question.)
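
To make the WNOHANG point concrete, here is a small C sketch
(poll_child is a made-up helper name). waitpid with WNOHANG returns 0
when the child exists but has not changed state, so a 0 from this
helper says only "still running right now", not "started fine".

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>

/* Non-blocking check on one child.
 * Returns  1 if the child has terminated (*status holds the wait status),
 *          0 if it has not terminated yet,
 *         -1 on error (for example, pid is not a child of this process). */
static int poll_child(pid_t pid, int *status) {
    pid_t r = waitpid(pid, status, WNOHANG);
    if (r == pid)
        return 1;
    if (r == 0)
        return 0;
    return -1;
}
```

When poll_child returns 1, WIFEXITED and WEXITSTATUS (or WIFSIGNALED
and WTERMSIG) on the status tell how the child ended.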

Let me repeat the problem statement: You want the call to (process
"foo") to return the usual values if the process is started, or raise
an exception if that process was not started for whatever reason.
Right? If so, then all this business with interrupts/waitpid/etc
does not give that behavior, and I don't know how to do it.

> Would this also be possible: If the callback procedure returns, the
> continuation of the program from where it was at when the signal
> handler was called is resumed, but the callback procedure could
> possibly not return as its way of dealing with the death (hahaha).

That's fine. We do that all the time. That's how we break out of
"read" when we get SIGINT, which I just realized I somehow broke at
some point. Ouch! BRB! (Okay. I'm back. Just reported bug 210744.)
So, it used to be fine and now it's not. :-(

I'll go to bed now.

Aziz,,,