Segfault: pacemaker segfaults randomly on Ubuntu trusty 14.04

Bug #1327222 reported by born2chill
This bug affects 3 people
Affects              Status    Importance   Assigned to   Milestone
corosync (Ubuntu)    Invalid   Undecided    Unassigned    -
pacemaker (Ubuntu)   Invalid   Undecided    Unassigned    -

Bug Description

I'm running a two-node HA cluster with pacemaker/corosync and a pretty simple configuration: only an IP address, one service and two clone sets of resources are managed (see below). However, I run into constant crashes of pacemaker (it looked like corosync at first) on both nodes. At the moment this behaviour makes the cluster unusable.

I attached the cluster config, cib.xml and the crash dumps to the bug; hopefully someone can make something of it.

~# crm_mon -1
Last updated: Fri Jun 6 15:43:14 2014
Last change: Fri Jun 6 10:28:17 2014 via cibadmin on lbsrv52
Stack: corosync
Current DC: lbsrv51 (1) - partition with quorum
Version: 1.1.10-42f2063
2 Nodes configured
6 Resources configured

Online: [ lbsrv51 lbsrv52 ]

 Resource Group: grp_HAProxy-Front-IPs
     res_IPaddr2_Test (ocf::heartbeat:IPaddr2): Started lbsrv51
 res_pdnsd_pdnsd (lsb:pdnsd): Started lbsrv51
 Clone Set: cl_isc-dhcp-server_1 [res_isc-dhcp-server_1]
     Started: [ lbsrv51 lbsrv52 ]
 Clone Set: cl_tftpd-hpa_1 [res_tftpd-hpa_1]
     Started: [ lbsrv51 lbsrv52 ]
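
For reference, the resource layout shown above corresponds to a crm shell configuration roughly like the sketch below. The IP address/netmask and the agent classes of the two cloned services are placeholders (assumptions), and the operation definitions are simplified; the exact configuration and cib.xml are attached.

node lbsrv51
node lbsrv52
# Front-end IP managed by the IPaddr2 agent; address/netmask are placeholders
primitive res_IPaddr2_Test ocf:heartbeat:IPaddr2 \
    params ip=192.0.2.10 cidr_netmask=24 \
    op monitor interval=10s
# DNS proxy as a plain LSB service
primitive res_pdnsd_pdnsd lsb:pdnsd \
    op monitor interval=15s
# Cloned services; the lsb agent names here are assumptions
primitive res_isc-dhcp-server_1 lsb:isc-dhcp-server \
    op monitor interval=15s
primitive res_tftpd-hpa_1 lsb:tftpd-hpa \
    op monitor interval=15s
group grp_HAProxy-Front-IPs res_IPaddr2_Test
clone cl_isc-dhcp-server_1 res_isc-dhcp-server_1
clone cl_tftpd-hpa_1 res_tftpd-hpa_1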

== corosync.log: ==
Jun 06 15:14:56 [2324] lbsrv51 cib: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Jun 06 15:14:56 [2327] lbsrv51 attrd: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Jun 06 15:14:56 [2327] lbsrv51 attrd: crit: attrd_cs_destroy: Lost connection to Corosync service!
Jun 06 15:14:56 [2327] lbsrv51 attrd: notice: main: Exiting...
Jun 06 15:14:56 [2324] lbsrv51 cib: error: cib_cs_destroy: Corosync connection lost! Exiting.
Jun 06 15:14:56 [2327] lbsrv51 attrd: notice: main: Disconnecting client 0x7f1f86244a10, pid=2329...
Jun 06 15:14:56 [2324] lbsrv51 cib: info: terminate_cib: cib_cs_destroy: Exiting fast...
Jun 06 15:14:56 [2324] lbsrv51 cib: info: crm_client_destroy: Destroying 0 events
Jun 06 15:14:56 [2327] lbsrv51 attrd: error: attrd_cib_connection_destroy: Connection to the CIB terminated...
Jun 06 15:14:56 [2324] lbsrv51 cib: info: qb_ipcs_us_withdraw: withdrawing server sockets
Jun 06 15:14:56 [2324] lbsrv51 cib: info: crm_client_destroy: Destroying 0 events
Jun 06 15:14:56 [2324] lbsrv51 cib: info: crm_client_destroy: Destroying 0 events
Jun 06 15:14:56 [2325] lbsrv51 stonith-ng: error: crm_ipc_read: Connection to cib_rw failed
Jun 06 15:14:56 [2325] lbsrv51 stonith-ng: error: mainloop_gio_callback: Connection to cib_rw[0x7f52f2d82c10] closed (I/O condition=17)
Jun 06 15:14:56 [2324] lbsrv51 cib: info: qb_ipcs_us_withdraw: withdrawing server sockets
Jun 06 15:14:56 [2324] lbsrv51 cib: info: crm_client_destroy: Destroying 0 events
Jun 06 15:14:56 [2324] lbsrv51 cib: info: qb_ipcs_us_withdraw: withdrawing server sockets
Jun 06 15:14:56 [2324] lbsrv51 cib: info: crm_xml_cleanup: Cleaning up memory from libxml2
Jun 06 15:14:56 [2325] lbsrv51 stonith-ng: notice: cib_connection_destroy: Connection to the CIB terminated. Shutting down.
Jun 06 15:14:56 [2325] lbsrv51 stonith-ng: info: stonith_shutdown: Terminating with 1 clients
Jun 06 15:14:56 [2325] lbsrv51 stonith-ng: info: crm_client_destroy: Destroying 0 events
Jun 06 15:14:56 [2325] lbsrv51 stonith-ng: info: qb_ipcs_us_withdraw: withdrawing server sockets
Jun 06 15:14:56 [2325] lbsrv51 stonith-ng: info: main: Done
Jun 06 15:14:56 [2325] lbsrv51 stonith-ng: info: crm_xml_cleanup: Cleaning up memory from libxml2
Jun 06 15:14:56 [2329] lbsrv51 crmd: error: crm_ipc_read: Connection to cib_shm failed
Jun 06 15:14:56 [2329] lbsrv51 crmd: error: mainloop_gio_callback: Connection to cib_shm[0x7f97ed1f6980] closed (I/O condition=17)
Jun 06 15:14:56 [2329] lbsrv51 crmd: error: crmd_cib_connection_destroy: Connection to the CIB terminated...
Jun 06 15:14:56 [2329] lbsrv51 crmd: error: do_log: FSA: Input I_ERROR from crmd_cib_connection_destroy() received in state S_IDLE
Jun 06 15:14:56 [2329] lbsrv51 crmd: notice: do_state_transition: State transition S_IDLE -> S_RECOVERY [ input=I_ERROR cause=C_FSA_INTERNAL origin=crmd_cib_connection_destroy ]
Jun 06 15:14:56 [2329] lbsrv51 crmd: warning: do_recover: Fast-tracking shutdown in response to errors
Jun 06 15:14:56 [2329] lbsrv51 crmd: warning: do_election_vote: Not voting in election, we're in state S_RECOVERY
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: do_dc_release: DC role released
Jun 06 15:14:56 [2322] lbsrv51 pacemakerd: info: pcmk_child_exit: Child process stonith-ng (2325) exited: OK (0)
Jun 06 15:14:56 [2322] lbsrv51 pacemakerd: info: crm_cs_flush: Sent 0 CPG messages (1 remaining, last=10): Library error (2)
Jun 06 15:14:56 [2322] lbsrv51 pacemakerd: notice: pcmk_process_exit: Respawning failed child process: stonith-ng
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: pe_ipc_destroy: Connection to the Policy Engine released
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: do_te_control: Transitioner is now inactive
Jun 06 15:14:56 [2329] lbsrv51 crmd: error: do_log: FSA: Input I_TERMINATE from do_recover() received in state S_RECOVERY
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: do_state_transition: State transition S_RECOVERY -> S_TERMINATE [ input=I_TERMINATE cause=C_FSA_INTERNAL origin=do_recover ]
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: do_shutdown: Disconnecting STONITH...
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: tengine_stonith_connection_destroy: Fencing daemon disconnected
Jun 06 15:14:56 [2322] lbsrv51 pacemakerd: info: start_child: Forked child 59988 for process stonith-ng
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: stop_recurring_actions: Cancelling op 27 for res_tftpd-hpa_1 (res_tftpd-hpa_1:27)
Jun 06 15:14:56 [2322] lbsrv51 pacemakerd: error: pcmk_child_exit: Child process attrd (2327) exited: Transport endpoint is not connected (107)
Jun 06 15:14:56 [2322] lbsrv51 pacemakerd: notice: pcmk_process_exit: Respawning failed child process: attrd
Jun 06 15:14:56 [2328] lbsrv51 pengine: info: crm_client_destroy: Destroying 0 events
Jun 06 15:14:56 [2322] lbsrv51 pacemakerd: info: start_child: Using uid=111 and group=119 for process attrd
Jun 06 15:14:56 [2322] lbsrv51 pacemakerd: info: start_child: Forked child 59989 for process attrd
Jun 06 15:14:56 [2322] lbsrv51 pacemakerd: info: mcp_quorum_destroy: connection closed
Jun 06 15:14:56 [2322] lbsrv51 pacemakerd: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Jun 06 15:14:56 [2322] lbsrv51 pacemakerd: error: mcp_cpg_destroy: Connection destroyed
Jun 06 15:14:56 [2322] lbsrv51 pacemakerd: info: crm_xml_cleanup: Cleaning up memory from libxml2
Jun 06 15:14:56 [2326] lbsrv51 lrmd: info: cancel_recurring_action: Cancelling operation res_tftpd-hpa_1_status_15000
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: stop_recurring_actions: Cancelling op 35 for res_IPaddr2_Test (res_IPaddr2_Test:35)
Jun 06 15:14:56 [2326] lbsrv51 lrmd: info: cancel_recurring_action: Cancelling operation res_IPaddr2_Test_monitor_10000
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: stop_recurring_actions: Cancelling op 41 for res_pdnsd_pdnsd (res_pdnsd_pdnsd:41)
Jun 06 15:14:56 [2326] lbsrv51 lrmd: info: cancel_recurring_action: Cancelling operation res_pdnsd_pdnsd_status_15000
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: stop_recurring_actions: Cancelling op 47 for res_isc-dhcp-server_1 (res_isc-dhcp-server_1:47)
Jun 06 15:14:56 [2326] lbsrv51 lrmd: info: cancel_recurring_action: Cancelling operation res_isc-dhcp-server_1_status_15000
Jun 06 15:14:56 [59989] lbsrv51 attrd: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
Jun 06 15:14:56 [59989] lbsrv51 attrd: error: cluster_connect_cpg: Could not connect to the Cluster Process Group API: 2
Jun 06 15:14:56 [59989] lbsrv51 attrd: error: main: HA Signon failed
Jun 06 15:14:56 [2329] lbsrv51 crmd: notice: lrm_state_verify_stopped: Stopped 4 recurring operations at (null) (3942893656 ops remaining)
Jun 06 15:14:56 [59989] lbsrv51 attrd: error: main: Aborting startup
Jun 06 15:14:56 [2329] lbsrv51 crmd: notice: lrm_state_verify_stopped: Recurring action res_pdnsd_pdnsd:41 (res_pdnsd_pdnsd_monitor_15000) incomplete at shutdown
Jun 06 15:14:56 [2329] lbsrv51 crmd: notice: lrm_state_verify_stopped: Recurring action res_isc-dhcp-server_1:47 (res_isc-dhcp-server_1_monitor_15000) incomplete at shutdown
Jun 06 15:14:56 [2329] lbsrv51 crmd: notice: lrm_state_verify_stopped: Recurring action res_IPaddr2_Test:35 (res_IPaddr2_Test_monitor_10000) incomplete at shutdown
Jun 06 15:14:56 [2329] lbsrv51 crmd: error: lrm_state_verify_stopped: 3 resources were active at shutdown.
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: do_lrm_control: Disconnecting from the LRM
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: lrmd_api_disconnect: Disconnecting from lrmd service
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: lrmd_ipc_connection_destroy: IPC connection destroyed
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: lrm_connection_destroy: LRM Connection disconnected
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: lrmd_api_disconnect: Disconnecting from lrmd service
Jun 06 15:14:56 [2329] lbsrv51 crmd: notice: do_lrm_control: Disconnected from the LRM
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: crm_cluster_disconnect: Disconnecting from cluster infrastructure: corosync
Jun 06 15:14:56 [2329] lbsrv51 crmd: notice: terminate_cs_connection: Disconnecting from Corosync
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: crm_cluster_disconnect: Disconnected from corosync
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: do_ha_control: Disconnected from the cluster
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: do_cib_control: Disconnecting CIB
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: qb_ipcs_us_withdraw: withdrawing server sockets
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: do_exit: Performing A_EXIT_0 - gracefully exiting the CRMd
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: do_exit: [crmd] stopped (0)
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: crmd_exit: Dropping I_PENDING: [ state=S_TERMINATE cause=C_FSA_INTERNAL origin=do_election_vote ]
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: crmd_exit: Dropping I_RELEASE_SUCCESS: [ state=S_TERMINATE cause=C_FSA_INTERNAL origin=do_dc_release ]
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: crmd_exit: Dropping I_TERMINATE: [ state=S_TERMINATE cause=C_FSA_INTERNAL origin=do_stop ]
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: crmd_quorum_destroy: connection closed
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: crmd_cs_destroy: connection closed
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: crmd_init: 2329 stopped: OK (0)
Jun 06 15:14:56 [2329] lbsrv51 crmd: error: crmd_fast_exit: Could not recover from internal error
Jun 06 15:14:56 [2329] lbsrv51 crmd: info: crm_xml_cleanup: Cleaning up memory from libxml2
Jun 06 15:14:56 [2326] lbsrv51 lrmd: info: crm_client_destroy: Destroying 0 events
Jun 06 15:14:56 [59988] lbsrv51 stonith-ng: info: crm_log_init: Changed active directory to /var/lib/heartbeat/cores/root
Jun 06 15:14:56 [59988] lbsrv51 stonith-ng: info: get_cluster_type: Verifying cluster type: 'corosync'
Jun 06 15:14:56 [59988] lbsrv51 stonith-ng: info: get_cluster_type: Assuming an active 'corosync' cluster
Jun 06 15:14:56 [59988] lbsrv51 stonith-ng: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
Jun 06 15:14:56 [59988] lbsrv51 stonith-ng: error: cluster_connect_cpg: Could not connect to the Cluster Process Group API: 2
Jun 06 15:14:56 [59988] lbsrv51 stonith-ng: crit: main: Cannot sign in to the cluster... terminating
Jun 06 15:14:56 [59988] lbsrv51 stonith-ng: info: crm_xml_cleanup: Cleaning up memory from libxml2

== dmesg: ==
[60379.304488] show_signal_msg: 18 callbacks suppressed
[60379.304493] crm_resource[19768]: segfault at 0 ip 00007f276681c0aa sp 00007fffe49ea2a8 error 4 in libc-2.19.so[7f27666db000+1bc000]
[60379.858371] cib[2234]: segfault at 0 ip 00007f59013760aa sp 00007fff0e21a0d8 error 4 in libc-2.19.so[7f5901235000+1bc000]
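
In case it helps with analysing the attached crash dumps: a backtrace can be pulled from one of the cores along these lines (the binary path, core file name and debug package names are assumptions; stonith-ng above logs /var/lib/heartbeat/cores/root as its core directory):

~# apt-get install gdb pacemaker-dbg libc6-dbg
~# gdb /usr/lib/pacemaker/cib /var/lib/heartbeat/cores/root/core
(gdb) bt full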

== syslog: ==
Jun 6 15:14:56 lbsrv51 cibmon[15100]: error: crm_ipc_read: Connection to cib_ro failed
Jun 6 15:14:56 lbsrv51 cibmon[15100]: error: mainloop_gio_callback: Connection to cib_ro[0x7f188c76f240] closed (I/O condition=17)
Jun 6 15:14:56 lbsrv51 cibmon[15100]: error: cib_connection_destroy: Connection to the CIB terminated... exiting
Jun 6 15:14:56 lbsrv51 attrd[59989]: notice: crm_add_logfile: Additional logging available in /var/log/corosync/corosync.log
Jun 6 15:14:56 lbsrv51 crm_simulate[59990]: notice: crm_log_args: Invoked: crm_simulate -s -S -VVVVV -L
Jun 6 15:14:56 lbsrv51 stonith-ng[59988]: notice: crm_add_logfile: Additional logging available in /var/log/corosync/corosync.log
Jun 6 15:14:56 lbsrv51 crm_simulate[60012]: notice: crm_log_args: Invoked: crm_simulate -s -S -VVVVV -L
Jun 6 15:14:56 lbsrv51 crm_simulate[60038]: notice: crm_log_args: Invoked: crm_simulate -s -S -VVVVV -L

Revision history for this message
born2chill (david-gabriel) wrote :
affects: ubuntu → corosync (Ubuntu)
Revision history for this message
born2chill (david-gabriel) wrote :

At the moment I'm running corosync in debug mode, so I should get more logs soon.
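
For the record, debug output was turned on with roughly the following in the logging section of /etc/corosync/corosync.conf (a sketch of the relevant directives, not the exact file):

logging {
    to_logfile: yes
    logfile: /var/log/corosync/corosync.log
    debug: on
}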

description: updated
summary: - Segfault: corosync segfaults randomly on Ubuntu trusty 14.04
+ Segfault: pacemaker segfaults randomly on Ubuntu trusty 14.04
Changed in corosync (Ubuntu):
status: New → Invalid
Changed in pacemaker (Ubuntu):
status: New → Invalid
Revision history for this message
born2chill (david-gabriel) wrote :

I found out that it was not the cluster stack itself causing the issues, but the tool I used to configure the cluster: LCMC. Although LCMC has worked flawlessly for me on older versions of corosync/pacemaker, it seems it hasn't been updated to work with corosync 2.3.x and pacemaker 1.1.x. So watch out until LCMC gets updated (at least 1.6.8, as of 2014-06-16, doesn't work reliably).
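
In case anyone else runs into this: a quick way to check whether a management tool has written a configuration the running stack actually accepts is to validate the live CIB, e.g.:

~# crm_verify --live-check -V

crm_verify ships with pacemaker; -V just makes any warnings or errors it finds more verbose.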
