On Apr 2, 2008, at 2:18 AM, Derick Eddington wrote:

> It looks like the only solution is to use a SIGCHLD signal handler. Not
> to "test" (sorry) but to be notified when a specific process has died.
> An idea: register a procedure for a child and have that procedure called
> when the SIGCHLD telling of that child's death is delivered to the
> signal handler (you'd use SA_SIGINFO with sa_sigaction to install a
> signal handler that would be given a siginfo_t telling what child PID
> died); without screwing up ikarus's stack or run-time of course. Using
> an alternate signal stack (via sigaltstack and SA_ONSTACK) might be
> noteworthy. Ah, it looks like ikarus already does use sa_sigaction and
> an alternate stack for SIGINT, but the handler doesn't call back into
> Scheme.

Exactly. In the signal handler, you're pretty much helpless because
you don't even know whether you're in the Scheme code, in the GC, in
GMP, in some system call (read, write, select, ...) or just in the
middle of a cons that has not yet initialized its car or cdr fields.
So, for SIGINT, all that Ikarus does right now is set two fields in
the pcb record:

    void handler(int signo, siginfo_t* info, void* uap){
      the_pcb->engine_counter = -1;
      the_pcb->interrupted = 1;
    }

and that's it. In the Scheme code, on entry to every procedure, the
value of engine_counter is decremented and, if negative, the engine
handler is called, which resets the counter, then checks and resets
the interrupted flag and either calls the interrupt handler (which
raises an interrupted continuable condition and returns) or the
timeout handler, which just returns (iirc).

So, calling into Scheme from the signal handler is just not possible.
Instead, you'd add another field (say pcb->child_died) and, from the
handler, set the engine_counter to -1 and the child_died flag to 1.
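
To make the flag-setting idea concrete, here is a rough, untested-in-Ikarus
sketch in C of such a SIGCHLD handler plus a deferred check that runs outside
the handler. Every name in it (sketch_pcb, the_pcb_sketch, reaped_child,
install_sigchld_handler, reap_dead_children) is made up for illustration and
is not Ikarus's actual pcb layout or API. It assumes that several SIGCHLDs can
be merged into one delivery, so the deferred step loops over
waitpid(-1, ..., WNOHANG) until nothing is left to reap.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the two pcb fields discussed above. */
typedef struct {
  volatile sig_atomic_t engine_counter;
  volatile sig_atomic_t child_died;
} sketch_pcb;

static sketch_pcb the_pcb_sketch = {0, 0};

/* Hypothetical record for one reaped child. */
typedef struct {
  pid_t pid;
  int status; /* raw status from waitpid; decode with WIFEXITED etc. */
} reaped_child;

/* The handler only stores to sig_atomic_t fields and does nothing else. */
static void sigchld_handler(int signo, siginfo_t *info, void *uap) {
  (void)signo; (void)info; (void)uap;
  the_pcb_sketch.engine_counter = -1;
  the_pcb_sketch.child_died = 1;
}

/* Install the handler with SA_SIGINFO; returns 0 on success, -1 on error. */
static int install_sigchld_handler(void) {
  struct sigaction sa;
  memset(&sa, 0, sizeof sa);
  sigemptyset(&sa.sa_mask);
  sa.sa_sigaction = sigchld_handler;
  sa.sa_flags = SA_SIGINFO;
  return sigaction(SIGCHLD, &sa, NULL);
}

/* Deferred step, called from ordinary code (never from the handler).
   Stores up to max reaped children in out and returns how many it stored. */
static int reap_dead_children(reaped_child *out, int max) {
  int n = 0;
  assert(out != NULL && max > 0);
  if (!the_pcb_sketch.child_died)
    return 0;
  /* Clear first: a SIGCHLD arriving during the loop sets the flag again. */
  the_pcb_sketch.child_died = 0;
  while (n < max) {
    int status;
    pid_t pid = waitpid(-1, &status, WNOHANG);
    if (pid > 0) {
      out[n].pid = pid;
      out[n].status = status;
      n++;
    } else if (pid == -1 && errno == EINTR) {
      continue; /* interrupted; try again */
    } else {
      break; /* 0: no dead child right now; -1: no children at all */
    }
  }
  if (n == max)
    the_pcb_sketch.child_died = 1; /* out is full; look again next time */
  return n;
}
```

A caller would run install_sigchld_handler() once at startup and call
reap_dead_children() from wherever the flag gets polled, then file each
(pid, status) pair in whatever table it keeps for child processes.
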
In Scheme, the engine handler would now have to check for this flag
and, if it is set, call waitpid to reap the dead child, collect the
info, and stash it somewhere (a hash table of some sort) to be
retrieved later, so that you know whether your child has exited and
what its exit status was.

I'm just thinking out loud here, so I don't know if any of this would
work. I don't know off the top of my head which of these calls are
interruptible/restartable, what happens if multiple children die at
the same time, or what happens when one child dies while you're
collecting another.

But all of this does not answer the question: how to know if a child
process failed. The fact that you did not get a SIGCHLD does not mean
that the process did not fail. All it means is that it did not fail
*yet* and might fail any time now. (I just read in waitpid(2) that
you can pass a WNOHANG option to waitpid so that it doesn't hang, but
that too does not answer the question.)

Let me repeat the problem statement: You want the call to
(process "foo") to return the usual values if the process is started,
or raise an exception if that process was not started for whatever
reason. Right? If so, then all this business with
interrupts/waitpid/etc does not give that behavior, and I don't know
how to do it.

> Would this also be possible: If the callback procedure returns, the
> continuation of the program from where it was at when the signal handler
> was called is resumed, but the callback procedure could possibly not
> return as its way of dealing with the death (hahaha).

That's fine. We do that all the time. That's how we break from
"read" when we get SIGINT, which I just realized I somehow broke at
some point. Ouch! BRB!

(Okay. I'm back. Just reported bug 210744.)

So, it used to be fine and now it's not. :-(

I'll go to bed now.

Aziz,,,