Comment 0 for bug 1425579

Bogdan Dobrelya (bogdando) wrote :

The common approach to the stop action in Pacemaker OCF resource agent (RA) scripts is:
1) send kill (SIGTERM) and use its exit code to decide whether it succeeded;
2) if it did not, issue kill -9 (SIGKILL).

However, 'kill' returns 0 whenever the signal was sent to a process with the given PID. It never checks whether the process actually terminated. This is easy to verify, for example:
p=`pidof cron`; sudo kill -STOP $p; sudo kill $p; echo $?; ls /proc/$p; ps -p $p
As a result, cron stays alive, because the stopped process cannot handle the pending SIGTERM, yet kill reports exit code 0.
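The same experiment can be run without root on a throwaway `sleep` process instead of cron. This is a minimal sketch assuming a Linux procfs; it reads the process state from the `State:` line of /proc/<pid>/status, where `T` means stopped.

```sh
#!/bin/sh
# Same experiment as the cron one-liner, on a disposable process (no root needed).
sleep 300 &
pid=$!

kill -STOP "$pid"
sleep 1                      # give the process time to enter the stopped (T) state

kill -TERM "$pid"
rc=$?                        # exit status of kill itself, not of the target
sleep 1
state=$(awk '/^State:/ { print $2; exit }' "/proc/$pid/status" 2>/dev/null)

echo "kill exit code: $rc"
echo "process state after SIGTERM: ${state:-gone}"

kill -KILL "$pid"            # clean up: SIGKILL also terminates a stopped process
```

On Linux this should report exit code 0 and state `T`: the SIGTERM stays pending while the process is stopped, so the process is still alive even though kill "succeeded".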

This is a problem because it drastically increases the number of stop actions that end in an undesired SIGKILL, whereas the RA should stop processes gracefully whenever possible. Moreover, on a heavily loaded system a process may be unable to handle SIGTERM instantly and end up being killed outright with SIGKILL.

The solution is to check the result of the kill command against procfs (whether /proc/<pid> still exists) instead of relying on its exit code, and to retry the graceful kill as well. SIGKILL should be issued only when no retries are left for graceful termination.
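A minimal POSIX sh sketch of that approach, assuming a Linux procfs. The function names `proc_alive` and `graceful_stop`, the default of 5 retries at 1-second intervals, and the return codes (0: stopped by SIGTERM or already gone, 1: SIGKILL was needed, 2: still alive) are all hypothetical and not taken from any existing resource agent. One refinement over a plain `/proc/<pid>` existence test: a zombie (state `Z`) keeps its /proc entry until its parent reaps it, so it is treated as terminated.

```sh
#!/bin/sh

# Hypothetical helper: succeed if the PID exists in procfs and is not a zombie.
# A zombie has already exited; only its /proc entry lingers until it is reaped.
proc_alive() {
    local state
    state=$(awk '/^State:/ { print $2; exit }' "/proc/$1/status" 2>/dev/null)
    [ -n "$state" ] && [ "$state" != "Z" ]
}

# Hypothetical helper: graceful_stop PID [RETRIES] [INTERVAL_SECONDS]
# Returns 0 if the process exited after SIGTERM (or was already gone),
# 1 if SIGKILL was needed, 2 if it survived even SIGKILL.
graceful_stop() {
    local pid=$1 retries=${2:-5} interval=${3:-1} i=0

    proc_alive "$pid" || return 0

    while [ "$i" -lt "$retries" ]; do
        # kill's exit status only says the signal was sent, so ignore it
        # and ask procfs whether the process is really gone.
        kill -TERM "$pid" 2>/dev/null
        sleep "$interval"
        proc_alive "$pid" || return 0
        i=$((i + 1))
    done

    # No retries left for graceful termination: escalate to SIGKILL.
    kill -KILL "$pid" 2>/dev/null
    sleep "$interval"
    proc_alive "$pid" && return 2
    return 1
}
```

For example, `graceful_stop "$pid" 10 2` waits up to 20 seconds for a graceful exit before resorting to SIGKILL. In a real RA, retries × interval (plus the final interval after SIGKILL) has to fit within the stop operation timeout configured in Pacemaker.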