Segfault: pacemaker segfaults randomly on Ubuntu trusty 14.04
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
corosync (Ubuntu) | Invalid | Undecided | Unassigned |
pacemaker (Ubuntu) | Invalid | Undecided | Unassigned |
Bug Description
I'm running a two-node HA cluster with pacemaker/corosync and a pretty simple configuration: only an IP address, one service, and two clone sets of resources are managed (see below). However, I run into constant crashes of pacemaker (it looked like corosync at first) on both nodes. At the moment this behaviour makes the cluster unusable.
I attached the cluster config, the cib.xml and the crash dumps to the bug; hopefully someone can make something of it.
~# crm_mon -1
Last updated: Fri Jun 6 15:43:14 2014
Last change: Fri Jun 6 10:28:17 2014 via cibadmin on lbsrv52
Stack: corosync
Current DC: lbsrv51 (1) - partition with quorum
Version: 1.1.10-42f2063
2 Nodes configured
6 Resources configured
Online: [ lbsrv51 lbsrv52 ]
Resource Group: grp_HAProxy-
res_
res_pdnsd_pdnsd (lsb:pdnsd): Started lbsrv51
Clone Set: cl_isc-
Started: [ lbsrv51 lbsrv52 ]
Clone Set: cl_tftpd-hpa_1 [res_tftpd-hpa_1]
Started: [ lbsrv51 lbsrv52 ]
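For readers without the attachments, a configuration of the shape described (one virtual IP, one service, two clone sets) would look roughly like the crm sketch below. All names and parameter values here are placeholders for illustration only; the names truncated in the status output above are not completed, and the reporter's actual configuration is in the attached files.

```
# Illustrative sketch only -- names and parameters are placeholders,
# not the reporter's actual values (those are in the attachment).
primitive res_IP ocf:heartbeat:IPaddr2 params ip=192.0.2.10 cidr_netmask=24
primitive res_pdnsd_pdnsd lsb:pdnsd
primitive res_svc_1 lsb:some-service
primitive res_tftpd-hpa_1 lsb:tftpd-hpa
group grp_HAProxy res_IP res_pdnsd_pdnsd
clone cl_svc_1 res_svc_1
clone cl_tftpd-hpa_1 res_tftpd-hpa_1
```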
== corosync.log ==
Jun 06 15:14:56 [2324] lbsrv51 cib: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Jun 06 15:14:56 [2327] lbsrv51 attrd: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Jun 06 15:14:56 [2327] lbsrv51 attrd: crit: attrd_cs_destroy: Lost connection to Corosync service!
Jun 06 15:14:56 [2327] lbsrv51 attrd: notice: main: Exiting...
Jun 06 15:14:56 [2324] lbsrv51 cib: error: cib_cs_destroy: Corosync connection lost! Exiting.
Jun 06 15:14:56 [2327] lbsrv51 attrd: notice: main: Disconnecting client 0x7f1f86244a10, pid=2329...
Jun 06 15:14:56 [2324] lbsrv51 cib: info: terminate_cib: cib_cs_destroy: Exiting fast...
Jun 06 15:14:56 [2324] lbsrv51 cib: info: crm_client_destroy: Destroying 0 events
Jun 06 15:14:56 [2327] lbsrv51 attrd: error: attrd_cib_
Jun 06 15:14:56 [2324] lbsrv51 cib: info: qb_ipcs_
Jun 06 15:14:56 [2324] lbsrv51 cib: info: crm_client_destroy: Destroying 0 events
Jun 06 15:14:56 [2324] lbsrv51 cib: info: crm_client_destroy: Destroying 0 events
Jun 06 15:14:56 [2325] lbsrv51 stonith-ng: error: crm_ipc_read: Connection to cib_rw failed
Jun 06 15:14:56 [2325] lbsrv51 stonith-ng: error: mainloop_
Jun 06 15:14:56 [2324] lbsrv51 cib: info: qb_ipcs_
Jun 06 15:14:56 [2324] lbsrv51 cib: info: crm_client_destroy: Destroying 0 events
Jun 06 15:14:56 [2324] lbsrv51 cib: info: qb_ipcs_
Jun 06 15:14:56 [2324] lbsrv51 cib: info: crm_xml_cleanup: Cleaning up memory from libxml2
Jun 06 15:14:56 [2325] lbsrv51 stonith-ng: notice: cib_connection_
Jun 06 15:14:56 [2325] lbsrv51 stonith-ng: info: stonith_shutdown: Terminating with 1 clients
Jun 06 15:14:56 [2325] lbsrv51 stonith-ng: info: crm_client_destroy: Destroying 0 events
Jun 06 15:14:56 [2325] lbsrv51 stonith-ng: info: qb_ipcs_
Jun 06 15:14:56 [2325] lbsrv51 stonith-ng: info: main: Done
Jun 06 15:14:56 [2325] lbsrv51 stonith-ng: info: crm_xml_cleanup: Cleaning up memory from libxml2
Jun 06 15:14:56 [2329] lbsrv51 crmd: error: crm_ipc_read: Connection to cib_shm failed
Jun 06 15:14:56 [2329] lbsrv51 crmd: error: mainloop_
Jun 06 15:14:56 [2329] lbsrv51 crmd: error: crmd_cib_
Jun 06 15:14:56 [2329] lbsrv51 crmd: error: do_log: FSA: Input I_ERROR from crmd_cib_
Jun 06 15:14:56 [2329] lbsrv51 crmd: notice: do_state_
Jun 06 15:14:56 [2329] lbsrv51 crmd: warning: do_recover: Fast-tracking shutdown in response to errors
Jun 06 15:14:56 [2329] lbsrv51 crmd: warning: do_election_vote: Not voting in election, we're in state S_RECOVERY
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: do_dc_release: DC role released
Jun 06 15:14:56 [2322] lbsrv51 pacemakerd: info: pcmk_child_exit: Child process stonith-ng (2325) exited: OK (0)
Jun 06 15:14:56 [2322] lbsrv51 pacemakerd: info: crm_cs_flush: Sent 0 CPG messages (1 remaining, last=10): Library error (2)
Jun 06 15:14:56 [2322] lbsrv51 pacemakerd: notice: pcmk_process_exit: Respawning failed child process: stonith-ng
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: pe_ipc_destroy: Connection to the Policy Engine released
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: do_te_control: Transitioner is now inactive
Jun 06 15:14:56 [2329] lbsrv51 crmd: error: do_log: FSA: Input I_TERMINATE from do_recover() received in state S_RECOVERY
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: do_state_
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: do_shutdown: Disconnecting STONITH...
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: tengine_
Jun 06 15:14:56 [2322] lbsrv51 pacemakerd: info: start_child: Forked child 59988 for process stonith-ng
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: stop_recurring_
Jun 06 15:14:56 [2322] lbsrv51 pacemakerd: error: pcmk_child_exit: Child process attrd (2327) exited: Transport endpoint is not connected (107)
Jun 06 15:14:56 [2322] lbsrv51 pacemakerd: notice: pcmk_process_exit: Respawning failed child process: attrd
Jun 06 15:14:56 [2328] lbsrv51 pengine: info: crm_client_destroy: Destroying 0 events
Jun 06 15:14:56 [2322] lbsrv51 pacemakerd: info: start_child: Using uid=111 and group=119 for process attrd
Jun 06 15:14:56 [2322] lbsrv51 pacemakerd: info: start_child: Forked child 59989 for process attrd
Jun 06 15:14:56 [2322] lbsrv51 pacemakerd: info: mcp_quorum_destroy: connection closed
Jun 06 15:14:56 [2322] lbsrv51 pacemakerd: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Jun 06 15:14:56 [2322] lbsrv51 pacemakerd: error: mcp_cpg_destroy: Connection destroyed
Jun 06 15:14:56 [2322] lbsrv51 pacemakerd: info: crm_xml_cleanup: Cleaning up memory from libxml2
Jun 06 15:14:56 [2326] lbsrv51 lrmd: info: cancel_
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: stop_recurring_
Jun 06 15:14:56 [2326] lbsrv51 lrmd: info: cancel_
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: stop_recurring_
Jun 06 15:14:56 [2326] lbsrv51 lrmd: info: cancel_
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: stop_recurring_
Jun 06 15:14:56 [2326] lbsrv51 lrmd: info: cancel_
Jun 06 15:14:56 [59989] lbsrv51 attrd: notice: crm_cluster_
Jun 06 15:14:56 [59989] lbsrv51 attrd: error: cluster_
Jun 06 15:14:56 [59989] lbsrv51 attrd: error: main: HA Signon failed
Jun 06 15:14:56 [2329] lbsrv51 crmd: notice: lrm_state_
Jun 06 15:14:56 [59989] lbsrv51 attrd: error: main: Aborting startup
Jun 06 15:14:56 [2329] lbsrv51 crmd: notice: lrm_state_
Jun 06 15:14:56 [2329] lbsrv51 crmd: notice: lrm_state_
Jun 06 15:14:56 [2329] lbsrv51 crmd: notice: lrm_state_
Jun 06 15:14:56 [2329] lbsrv51 crmd: error: lrm_state_
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: do_lrm_control: Disconnecting from the LRM
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: lrmd_api_
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: lrmd_ipc_
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: lrm_connection_
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: lrmd_api_
Jun 06 15:14:56 [2329] lbsrv51 crmd: notice: do_lrm_control: Disconnected from the LRM
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: crm_cluster_
Jun 06 15:14:56 [2329] lbsrv51 crmd: notice: terminate_
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: crm_cluster_
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: do_ha_control: Disconnected from the cluster
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: do_cib_control: Disconnecting CIB
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: qb_ipcs_
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: do_exit: Performing A_EXIT_0 - gracefully exiting the CRMd
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: do_exit: [crmd] stopped (0)
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: crmd_exit: Dropping I_PENDING: [ state=S_TERMINATE cause=C_
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: crmd_exit: Dropping I_RELEASE_SUCCESS: [ state=S_TERMINATE cause=C_
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: crmd_exit: Dropping I_TERMINATE: [ state=S_TERMINATE cause=C_
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: crmd_quorum_
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: crmd_cs_destroy: connection closed
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: crmd_init: 2329 stopped: OK (0)
Jun 06 15:14:56 [2329] lbsrv51 crmd: error: crmd_fast_exit: Could not recover from internal error
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: crm_xml_cleanup: Cleaning up memory from libxml2
Jun 06 15:14:56 [2326] lbsrv51 lrmd: info: crm_client_destroy: Destroying 0 events
Jun 06 15:14:56 [59988] lbsrv51 stonith-ng: info: crm_log_init: Changed active directory to /var/lib/
Jun 06 15:14:56 [59988] lbsrv51 stonith-ng: info: get_cluster_type: Verifying cluster type: 'corosync'
Jun 06 15:14:56 [59988] lbsrv51 stonith-ng: info: get_cluster_type: Assuming an active 'corosync' cluster
Jun 06 15:14:56 [59988] lbsrv51 stonith-ng: notice: crm_cluster_
Jun 06 15:14:56 [59988] lbsrv51 stonith-ng: error: cluster_
Jun 06 15:14:56 [59988] lbsrv51 stonith-ng: crit: main: Cannot sign in to the cluster... terminating
Jun 06 15:14:56 [59988] lbsrv51 stonith-ng: info: crm_xml_cleanup: Cleaning up memory from libxml2
== dmesg: ==
[60379.304488] show_signal_msg: 18 callbacks suppressed
[60379.304493] crm_resource[
[60379.858371] cib[2234]: segfault at 0 ip 00007f59013760aa sp 00007fff0e21a0d8 error 4 in libc-2.
== syslog: ==
Jun 6 15:14:56 lbsrv51 cibmon[15100]: error: crm_ipc_read: Connection to cib_ro failed
Jun 6 15:14:56 lbsrv51 cibmon[15100]: error: mainloop_
Jun 6 15:14:56 lbsrv51 cibmon[15100]: error: cib_connection_
Jun 6 15:14:56 lbsrv51 attrd[59989]: notice: crm_add_logfile: Additional logging available in /var/log/
Jun 6 15:14:56 lbsrv51 crm_simulate[
Jun 6 15:14:56 lbsrv51 stonith-ng[59988]: notice: crm_add_logfile: Additional logging available in /var/log/
Jun 6 15:14:56 lbsrv51 crm_simulate[
Jun 6 15:14:56 lbsrv51 crm_simulate[
description: updated
summary:
- Segfault: corosync segfaults randomly on Ubuntu trusty 14.04
+ Segfault: pacemaker segfaults randomly on Ubuntu trusty 14.04
Changed in corosync (Ubuntu):
status: New → Invalid
Changed in pacemaker (Ubuntu):
status: New → Invalid
At the moment I'm running corosync in debug mode, so I should get more logs soon.
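For reference, corosync's debug mode is typically enabled via the logging section of /etc/corosync/corosync.conf; a sketch along these lines (the logfile path is an assumption and may differ from the reporter's setup):

```
# /etc/corosync/corosync.conf -- logging section sketch (restart corosync after editing)
logging {
        to_logfile: yes
        logfile: /var/log/corosync/corosync.log
        to_syslog: yes
        debug: on
}
```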