Comment 8 for bug 57731

Rogers Veber (labelsarl) wrote : Re: [Bug 57731] Re: Futex hang when exiting using the window close button

Hi Wayne,

A - Signals on Linux show some "strange" behaviours since the advent of
POSIX threads. I mean "strange" for a programmer who is used to the
simple fork()/wait()/exit() model.
     What I concluded from all of this is:
     if you choose to use signal handlers, you have to avoid calling
certain library functions inside them (such as fprintf, ...), and you
have to do as little as possible within those handlers.
     See
http://www.gnu.org/s/libc/manual/html_node/Nonreentrancy.html#Nonreentrancy
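
To illustrate "do as little as possible", here is a minimal sketch of a
handler that only records that the signal arrived and leaves the printing
to the normal program flow. The names got_signal, on_signal,
install_flag_handler and poll_signal_flag are made up for this example;
they are not taken from any program discussed in this bug.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* Written by the handler, read by normal code. */
static volatile sig_atomic_t got_signal = 0;

/* Handler: no stdio, no malloc, it only records that the signal arrived. */
static void on_signal(int signo)
{
    (void)signo;
    got_signal = 1;
}

/* Install the handler for signo. Returns 0 on success, -1 on error. */
int install_flag_handler(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(signo, &sa, NULL);
}

/* Called from the normal program flow (main loop, timer, ...).
   Returns 1 and reports if a signal was seen since the last call, else 0. */
int poll_signal_flag(void)
{
    if (!got_signal)
        return 0;
    got_signal = 0;
    fprintf(stderr, "signal received\n");   /* fine here: not inside a handler */
    return 1;
}
```

The main loop would call install_flag_handler(SIGCHLD) once at startup and
then poll_signal_flag() at regular intervals.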

B - If you are the maintainer of the program you use, you may have to
handle the death of child processes differently.
     1) If you do not care about the exit status of the children, nor
about when they die, just use SIG_IGN as the handler.
         There should then be no zombie processes at all, because the
kernel removes a dead child from the process list when the parent is not
interested in SIGCLD (SIGCHLD).
     2) If you do care, it is a bit more complicated. You should
maintain two lists of children, one for the living ones and one for
the dead ones.
         When you receive the signal, in the handler, just get the status
with the wait() syscall, look the child up in the "Living" list, remove
it and add it to the "Dead" list.
         Then, outside the handler, you may use any function (fprintf
and other nonreentrant functions). Of course you should periodically
take a look at the "Dead" list.
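
The two-list idea from point 2 could be sketched roughly as follows. This
is only an illustration under my own assumptions: fixed-size arrays stand
in for the lists, and MAX_CHILDREN, spawn_child and take_dead_child are
hypothetical names. I also block SIGCHLD with sigprocmask() while the
normal code touches the lists, so that the handler never sees them half
updated; that detail is an addition, not something described above.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_CHILDREN 64                      /* hypothetical fixed capacity */

struct child { pid_t pid; int status; };

static pid_t living[MAX_CHILDREN];           /* "Living" list: still running */
static struct child dead[MAX_CHILDREN];      /* "Dead" list: reaped, not yet reported */
static volatile sig_atomic_t n_living = 0, n_dead = 0;

/* SIGCHLD handler: only waitpid() and plain array stores, no stdio. */
static void on_sigchld(int signo)
{
    int saved_errno = errno;
    int status;
    pid_t pid;

    (void)signo;
    /* One signal may stand for several dead children, so loop. */
    while ((pid = waitpid(-1, &status, WNOHANG)) > 0) {
        for (int i = 0; i < n_living; i++) {
            if (living[i] == pid) {          /* remove from "Living" */
                living[i] = living[n_living - 1];
                n_living--;
                break;
            }
        }
        if (n_dead < MAX_CHILDREN) {         /* append to "Dead" */
            dead[n_dead].pid = pid;
            dead[n_dead].status = status;
            n_dead++;
        }
    }
    errno = saved_errno;
}

int install_sigchld_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Block or unblock SIGCHLD around code that touches the lists. */
static void sigchld_mask(int how)
{
    sigset_t set;

    sigemptyset(&set);
    sigaddset(&set, SIGCHLD);
    sigprocmask(how, &set, NULL);
}

/* Fork a child that exits at once with exit_code.
   Returns the pid in the parent, -1 on error or when the lists are full. */
pid_t spawn_child(int exit_code)
{
    pid_t pid = -1;

    sigchld_mask(SIG_BLOCK);
    if (n_living + n_dead < MAX_CHILDREN) {
        pid = fork();
        if (pid == 0)
            _exit(exit_code);                /* child: nothing else to do here */
        if (pid > 0) {
            living[n_living] = pid;
            n_living++;
        }
    }
    sigchld_mask(SIG_UNBLOCK);
    return pid;
}

/* Called periodically from normal code: takes one entry off the "Dead"
   list. Returns 1 if an entry was copied to *out, 0 if the list is empty. */
int take_dead_child(struct child *out)
{
    int found = 0;

    sigchld_mask(SIG_BLOCK);
    if (n_dead > 0) {
        n_dead--;
        *out = dead[n_dead];
        found = 1;
    }
    sigchld_mask(SIG_UNBLOCK);
    return found;
}
```

Once take_dead_child() has returned an entry, the caller is back in normal
code and can report it with fprintf() or any other function.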

C - In my program I write a 0 to the futex.
     I did not use any specific document; I was just expecting the futex
to work similarly to a traditional semaphore (see semop(2)).
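
For reference, here is a small Linux-only sketch of why the value stored
in the futex word matters. It is not the program I mentioned above; the
helpers futex_wait_ms and futex_set_and_wake are hypothetical names. The
kernel only puts the caller to sleep in FUTEX_WAIT if the word still holds
the expected value, so once another value (here 0) has been written, a
waiter that retries returns immediately instead of blocking.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* glibc provides no futex() wrapper, so go through syscall(2). */
static long futex(uint32_t *uaddr, int op, uint32_t val,
                  const struct timespec *timeout)
{
    return syscall(SYS_futex, uaddr, op, val, timeout, NULL, 0);
}

/* Sleep while *word == expected, for at most timeout_ms milliseconds.
   Returns 0 if woken, otherwise -1 with errno set:
     EAGAIN    *word already differed from expected, no sleep at all
     ETIMEDOUT nobody woke us in time */
int futex_wait_ms(uint32_t *word, uint32_t expected, long timeout_ms)
{
    struct timespec ts;

    ts.tv_sec = timeout_ms / 1000;
    ts.tv_nsec = (timeout_ms % 1000) * 1000000L;
    return (int)futex(word, FUTEX_WAIT, expected, &ts);
}

/* Store a new value, then wake at most nwake waiters.
   Returns the number of waiters actually woken. */
int futex_set_and_wake(uint32_t *word, uint32_t value, int nwake)
{
    __atomic_store_n(word, value, __ATOMIC_SEQ_CST);
    return (int)futex(word, FUTEX_WAKE, (uint32_t)nwake, NULL);
}
```

With a word holding 1, futex_wait_ms(&word, 1, 50) times out when nobody
wakes it. After futex_set_and_wake(&word, 0, 1), the same call fails at
once with EAGAIN because the word no longer matches.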

D - About the FUTEX_WAKE.
     I am not quite sure, but I suppose the FUTEX_WAKE should be done by
the user process, NOT by the kernel.
     And maybe it really was done, but at some point in the code that
should not have been interrupted by a signal.

     Just suppose your program is calling fprintf(stderr, ....) and is
waiting for some event inside this function (buffer, fflush, ...) when a
signal occurs (e.g. SIGCLD).
     The signal handler uses fprintf(stderr, ...) too. This could easily
lead to data corruption, or to a hang if the handler waits for a lock
that the interrupted fprintf still holds.
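
If a handler really has to print something itself, one alternative is
write(2), which is on the POSIX list of async-signal-safe functions and
does not go through the stdio buffers. The following is only a sketch; the
names report_fd, on_signal_report and install_report_handler are invented
for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Descriptor the handler writes to; stderr unless told otherwise. */
static volatile sig_atomic_t report_fd = STDERR_FILENO;

/* Handler: a single write(2) of a constant string, no stdio involved. */
static void on_signal_report(int signo)
{
    static const char msg[] = "signal caught\n";
    int saved_errno = errno;     /* write() may change errno */
    ssize_t n;

    (void)signo;
    n = write(report_fd, msg, sizeof msg - 1);
    (void)n;                     /* nothing sensible to do on failure here */
    errno = saved_errno;
}

/* Install the handler for signo, reporting to fd.
   Returns 0 on success, -1 on error. */
int install_report_handler(int signo, int fd)
{
    struct sigaction sa;

    report_fd = fd;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal_report;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(signo, &sa, NULL);
}
```

For example, install_report_handler(SIGCHLD, STDERR_FILENO) prints the
message on stderr without touching the FILE that the interrupted code may
be in the middle of using.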

What was your program doing at the time the signal was delivered?

I hope this will help you.

Cheers.

-Rogers

> Hi Rogers,
> Thanks for your interesting C-code!
> I occasionally experience the same problem.
> The program freezes due to a FUTEX_WAIT call (detected by the use of strace), directly after the arrival of SIGCHLD.
> After reading some documents on futexes, I believe that the reason for the deadlock is a missing FUTEX_WAKE call by the kernel to wake up the suspended processes/threads again.
> In your program you are writing a zero to the futex value.
> Why is this working? Did you refer to a specific document?
>
> Cheers,
> Wayne
>