Pacemaker (stonith) can seg fault in Trusty and Utopic after the following message: Source ID XX was not found when attempting to remove it
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
pacemaker (Ubuntu) | Fix Released | Undecided | Rafael David Tinoco |
Trusty | Fix Released | Undecided | Rafael David Tinoco |
Utopic | Fix Released | Undecided | Rafael David Tinoco |
Bug Description
[IMPACT]
- Pacemaker seg faults (stonith and lrmd) because:
- Newer glib versions use a hash table to look up GSources
- glib asserts (logs a critical error) when the same source is removed more than once
[TEST CASE]
- Described by user
[REGRESSION POTENTIAL]
- Based on small fixes made by upstream commits
- User reports problem has been fixed
[OTHER INFO]
The following situation was brought to my attention:
"""
The lrmd process crashed when repeating "crm node standby" and "crm node online"
It was brought to my attention that pacemaker could seg fault (stonith) under some conditions. This problem
came up while I was solving the following bug:
https:/
You can check the problem here:
https:/
https:/
https:/
https:/
https:/
And a possible explanation here:
https:/
https:/
(Copied here):
So the cherry-pick (for version trusty_
example:
+ if (op->opaque-
+ g_source_
++ op->opaque-
etc...
This actually solved the lrmd crashes I was getting with the test case (described in this bug summary).
===
Explanation:
g_source_remove -> http://
libglib2 changes -> http://
===
Analyzing your crash file (from stonith, not lrmd), it looks like we have the following scenario:
==============
exited = child_waitpid(
|_> child->
|_> stonith_
|_> stonith_
|_> g_source_
|_> g_critical ("Source ID %u was not found when attempting to remove it", tag);
WHERE
==============
Child here is the "monitor" (0x7f1f63a08b70 "monitor"): /usr/sbin/
"Helper that presents a RHCS-style interface for Linux-HA stonith plugins"
This is the script responsible for monitoring a stonith resource. It returned (triggering the monitor callback) with the following data:
------ data (begin) ------
agent=fence_legacy
action=monitor
plugin=external/ssh
hostlist=kjpnode2
timeout=20
async=1
tries=1
remaining_
timer_sigterm=13
timer_sigkill=14
max_retries=2
pid=1464
rc=0 (RETURN CODE)
string buffer: "Performing: stonith -t external/ssh -S\nsuccess: 0\n"
------ data (end) ------
Note: this means that fence_legacy returned after checking that
st_kjpnode2 was OK, and that its cleanup operation (callback) caused
the problem we faced.
As soon as it dies, the callback for this process is called:
if (child->callback) {
In our case, the callback is:
0x7f1f6189cec0 <stonith_
0x7f1f6189af10 <stonith_
0x7f1f6189ae60 <stonith_
With the second call to g_source_remove, after the glib2.0 change explained earlier, we get a
g_critical ("Source ID %u was not found when attempting to remove it", tag);
and this generates the crash (since g_log is called with a critical log level, causing crm_abort to be called).
POSSIBLE CAUSE:
==============
Under <stonith_
stonith_action_t *action = 0x7f1f639f5b50.
if (action-
}
if (action-
}
Under <stonith_
and a call to: stonith_
Under stonith_
stonith_action_t *action = 0x7f1f639f5b50.
if (action-
}
if (action-
}
This logic probably triggered the same problem the cherry-pick addressed for lrmd, but now for stonith (calling g_source_remove twice for the same source after glib2.0 was changed).
##############
commit 0326f05c9e26f39
Author: Andrew Beekhof <email address hidden>
Date: Thu Aug 7 13:49:24 2014 +1000
Fix: stonith-ng: Reset mainloop source IDs after removing them
diff --git a/lib/fencing/
index 64bd8f3..2837682 100644
--- a/lib/fencing/
+++ b/lib/fencing/
@@ -663,9 +663,11 @@ stonith_
if (action-
+ action-
}
if (action-
+ action-
}
if (action-
##############
under <stonith_
I will provide you with a hotfix containing this fix and ask for feedback.
Changed in pacemaker (Ubuntu): | |
assignee: | nobody → Rafael David Tinoco (inaddy) |
summary: |
- Stonith can seg fault in Trusty and Utopic after following message: - Source ID XX was not found when attempting to remove it + Pacemaker (stonith) can seg fault in Trusty and Utopic after following + message: Source ID XX was not found when attempting to remove it |
tags: | added: cts |
description: | updated |
Changed in pacemaker (Ubuntu Trusty): | |
assignee: | nobody → Rafael David Tinoco (inaddy) |
Changed in pacemaker (Ubuntu Utopic): | |
assignee: | nobody → Rafael David Tinoco (inaddy) |
Peter,
I have created one PPA to be tested:
https://launchpad.net/~inaddy/+archive/ubuntu/lp1412962
# add-apt-repository ppa:inaddy/lp1412962
# apt-get update
# apt-get install pacemaker
The right package version, for now, will be:
1.1.10+git20130802-1ubuntu2.3~lp1412962~1 (for Trusty)
It will be replaced by the appropriate version if the stable release
update (SRU) proposal is accepted into the -updates repository.
Please give me feedback on the fix (whether it solved the problem
for you).
Thank you very much
Rafael Tinoco