Okay, so the cherry-pick (for version trusty_pacemaker_1.1.10+git20130802-1ubuntu2.2, based on an upstream commit) seems ok, since it makes lrmd (services, services_linux) avoid removing a repeat timer whose source was already removed from the glib main loop context. Example:

+ if (op->opaque->repeat_timer) {
+     g_source_remove(op->opaque->repeat_timer);
+     op->opaque->repeat_timer = 0;
etc...

This actually solved the lrmd crashes I was getting with the testcase (explained in this bug summary).

===

Explanation:

g_source_remove -> http://oss.clusterlabs.org/pipermail/pacemaker/2014-October/022690.html
libglib2 changes -> http://oss.clusterlabs.org/pipermail/pacemaker/2014-October/022699.html

===

Analyzing your crash file (from stonith, not lrmd), it looks like we have the following scenario:

==============

exited = child_waitpid(child, WNOHANG);
|_> child->callback(child, child->pid, core, signo, exitcode);
    |_> stonith_action_async_done (stack shows: stonith_action_destroy()) <----> calls g_source_remove() 2 times
        |_> stonith_action_clear_tracking_data(action);
            |_> g_source_remove(action->timer_sigterm);
                |_> g_critical ("Source ID %u was not found when attempting to remove it", tag);

WHERE
==============

The child here is the "monitor" (0x7f1f63a08b70 "monitor"):

/usr/sbin/fence_legacy "Helper that presents a RHCS-style interface for Linux-HA stonith plugins"

This is the script responsible for monitoring a stonith resource, and it has returned (triggering the monitor callback) with the following data:

------ data (begin) ------
agent=fence_legacy
action=monitor
plugin=external/ssh
hostlist=kjpnode2
timeout=20
async=1
tries=1
remaining_timeout=20
timer_sigterm=13
timer_sigkill=14
max_retries=2
pid=1464
rc=0 (RETURN CODE)
string buffer: "Performing: stonith -t external/ssh -S\nsuccess: 0\n"
------ data (end) ------

Note: this means that fence_legacy returned after checking that st_kjpnode2 was ok, and its cleanup operation (callback) caused the problem we faced.

As soon as the child dies, the callback for this process is called:

if (child->callback) {
    child->callback(child, child->pid, core, signo, exitcode);
}

In our case, the callback is 0x7f1f6189cec0, which calls 0x7f1f6189af10 and then 0x7f1f6189ae60, generating the 2nd removal. With that 2nd call to g_source_remove(), after the glib2.0 change explained above, we get:

g_critical ("Source ID %u was not found when attempting to remove it", tag);

and this generates the crash (g_log() runs at a critical log level, causing crm_abort() to be called).

POSSIBLE CAUSE:
==============

Under stonith_action_async_done() we have stonith_action_t *action = 0x7f1f639f5b50 and:

if (action->timer_sigterm > 0) {
    g_source_remove(action->timer_sigterm);
}
if (action->timer_sigkill > 0) {
    g_source_remove(action->timer_sigkill);
}

Under stonith_action_destroy() we have the same stonith_action_t *action = 0x7f1f639f5b50 and a call to:

stonith_action_clear_tracking_data(action);

Under stonith_action_clear_tracking_data(stonith_action_t * action) we have, AGAIN, for the same action (0x7f1f639f5b50):

if (action->timer_sigterm > 0) {
    g_source_remove(action->timer_sigterm);
    action->timer_sigterm = 0;
}
if (action->timer_sigkill > 0) {
    g_source_remove(action->timer_sigkill);
    action->timer_sigkill = 0;
}

This logic probably triggered the same problem the cherry-pick addressed for lrmd, but now for stonith (calling g_source_remove() 2 times for the same source after the glib2.0 change).

I'll try to discuss this with upstream and confirm that this is happening.

Thank you
Rafael Tinoco
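
PS: to make the glib2.0 behavior change concrete, here is a minimal standalone sketch (my own test code, not pacemaker's) that reproduces the double-removal abort. The g_log_set_always_fatal() call stands in for pacemaker's glib log handler, which turns critical logs into crm_abort():

#include <glib.h>

/* Dummy timer callback; it never fires because the source is removed first. */
static gboolean on_timeout(gpointer data)
{
    return G_SOURCE_REMOVE;
}

int main(void)
{
    guint tag;

    /* Stand-in for pacemaker's behavior: make critical logs fatal. */
    g_log_set_always_fatal(G_LOG_LEVEL_CRITICAL);

    tag = g_timeout_add_seconds(60, on_timeout, NULL);

    g_source_remove(tag);   /* 1st removal: ok, source is destroyed */
    g_source_remove(tag);   /* 2nd removal: with the newer glib2.0 this logs
                               "Source ID %u was not found when attempting
                               to remove it" as critical -> abort */
    return 0;
}

Build with: gcc test.c $(pkg-config --cflags --libs glib-2.0). With the older glib2.0, the second call just returned FALSE silently, which is why the double removal in lrmd/stonith only started crashing after the libglib2 update.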
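And this is the kind of guard I would expect the stonith fix to need, mirroring what the lrmd cherry-pick does: zero the stored tag right after removing the source, so whichever cleanup path runs second becomes a no-op. Just a sketch of the pattern (the helper name is mine, not pacemaker's):

#include <glib.h>

/* Hypothetical helper: remove a glib source at most once. Zeroing the
 * stored tag makes any later cleanup path skip the removal, the same
 * guard the lrmd cherry-pick adds for op->opaque->repeat_timer. */
static void remove_source_once(guint *tag)
{
    if (*tag > 0) {
        g_source_remove(*tag);
        *tag = 0;
    }
}

If both stonith_action_async_done() and stonith_action_clear_tracking_data() went through such a guard (i.e. if async_done also zeroed action->timer_sigterm / action->timer_sigkill after removing them, as clear_tracking_data already does), the second pass would be a no-op and g_critical() would never fire.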