Comment 9 for bug 1466273

Colin Ian King (colin-king) wrote :

It may be that the process is swapped out, so it takes a while for it to be swapped back in and hence to receive the SIGKILL.

To test this hypothesis:

a) one could disable swap and see whether the SIGKILL gets delivered to the process and how quickly it responds. However, turning swap off may change the way the OOM killer behaves, and hence this is not a viable test.

b) run smemstat (from ppa:colin-king/white) and see how much of the blocked process's memory is swapped out compared to how much is resident. If it is mostly swapped out, that could explain why it is taking a while to get swapped back in and to respond to the SIGKILL.

c) one could add an mlockall() call to the slow-responding process, or even mlock() just the range of text pages that handle the signal and do the ppoll, so that they are not swapped out. Note that the process needs either the CAP_IPC_LOCK capability or a large enough RLIMIT_MEMLOCK soft limit, which can be raised using ulimit -l.
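
As a rough illustration of (b) and (c), here is a minimal C sketch, assuming a Linux system with /proc mounted. The helper names read_status_kb() and lock_all_memory() are made up for this example and are not part of smemstat or of the affected program. The first helper reads a field such as VmRSS or VmSwap from a /proc/<pid>/status file, which gives a crude resident versus swapped comparison (it is only a stand-in, not how smemstat gathers its figures). The second wraps mlockall(), which may fail if RLIMIT_MEMLOCK is too low and the process lacks CAP_IPC_LOCK. Locking only the text pages with mlock() is not shown.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: return the value (in kB) of a "Key:   1234 kB"
 * line from a /proc/<pid>/status style file, or -1 if the file cannot
 * be opened or the key is not present.
 */
long read_status_kb(const char *path, const char *key)
{
    assert(path != NULL && key != NULL);

    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    char line[256];
    size_t keylen = strlen(key);
    long value = -1;

    while (fgets(line, sizeof line, fp) != NULL) {
        /* Match the key exactly, followed by the ':' separator. */
        if (strncmp(line, key, keylen) == 0 && line[keylen] == ':') {
            if (sscanf(line + keylen + 1, "%ld", &value) != 1)
                value = -1;
            break;
        }
    }
    fclose(fp);
    return value;
}

/*
 * Hypothetical helper: lock all current and future pages of the calling
 * process into RAM so they cannot be swapped out. Returns 0 on success,
 * -1 on failure (for example when the memlock limit is too low).
 */
int lock_all_memory(void)
{
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
        perror("mlockall");
        return -1;
    }
    return 0;
}
```

A process would call lock_all_memory() early in its startup. To check a blocked process from outside, one could compare read_status_kb("/proc/<pid>/status", "VmSwap") with read_status_kb("/proc/<pid>/status", "VmRSS"); a VmSwap value much larger than VmRSS would be consistent with the swapped-out hypothesis.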