Comment 15 for bug 1541029

Bogdan Dobrelya (bogdando) wrote :

The root cause may be that the stop action cannot kill an unresponsive "beam.smp" process.
How to reproduce:
# kill -STOP `pidof beam.smp`
# ocf_handler_rabbitmq-server stop
(it throws an error, see http://pastebin.com/d4Ki8wi5, and rabbitmqctl segfaults)
# ps -f -p `pidof beam.smp`

Expected: the ps output is empty once the stop action has finished
Actual: beam.smp is left running and the stop action fails with
lrmd: ERROR: RMQ-runtime (beam) couldn't be stopped and will likely became unmanaged. Take care of it manually!
lrmd: INFO: p_rabbitmq-server: stop: action end.
Exit status: Error: Generic (1)

Normally, with fencing enabled, the failed node would be recovered by STONITH. But since we don't use fencing, the only option we have is to ensure that the stop action kills beam.smp and succeeds.
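
A rough sketch of what such a stop escalation could look like is below. This is not the actual OCF agent code; the function names (proc_alive, stop_with_escalation) and the timeouts are made up for illustration, and the liveness check assumes Linux /proc with a ps fallback. The idea is to send SIGTERM first, wait for a grace period, then fall back to SIGKILL, which cannot be caught or ignored and also terminates a process that was frozen with SIGSTOP.

```shell
# Hypothetical helper: return 0 if the PID exists and is not a zombie.
proc_alive() {
    if [ -d /proc/self ]; then
        # First character of the State: field, e.g. R, S, T, Z.
        pa_state=$(sed -n 's/^State:[[:space:]]*\(.\).*/\1/p' "/proc/$1/status" 2>/dev/null)
    else
        pa_state=$(ps -o stat= -p "$1" 2>/dev/null | tr -d ' ')
    fi
    case "$pa_state" in
        ""|Z*) return 1 ;;   # gone, or dead and only waiting to be reaped
    esac
    return 0
}

# Hypothetical helper: stop PID $1, allowing $2 seconds (default 10)
# for a graceful exit before escalating to SIGKILL.
# Returns 0 if the process is gone, 1 if it survived even SIGKILL.
stop_with_escalation() {
    se_pid=$1
    se_grace=${2:-10}

    proc_alive "$se_pid" || return 0

    # Polite request first. A SIGSTOP'ed process will not act on it.
    kill -TERM "$se_pid" 2>/dev/null
    se_waited=0
    while [ "$se_waited" -lt "$se_grace" ]; do
        proc_alive "$se_pid" || return 0
        sleep 1
        se_waited=$((se_waited + 1))
    done

    # Escalate: SIGKILL works even on a stopped process.
    kill -KILL "$se_pid" 2>/dev/null
    se_waited=0
    while [ "$se_waited" -lt 5 ]; do
        proc_alive "$se_pid" || return 0
        sleep 1
        se_waited=$((se_waited + 1))
    done
    return 1
}
```

With the reproduction steps above, something like stop_with_escalation "$pid" 10 for each PID reported by pidof beam.smp should leave no beam.smp behind, so the stop action can report success instead of a generic error.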