Hi Krunal,

This has already happened 3 times today after an SST. Each time, the donor node had no workload on it.

The scenario is as follows. All nodes are up. Node 1 has active workload on it, and node 2 is sitting 'idle', meaning it only receives replication events. We then add node 3, which receives an SST from node 2 (the node with no workload). The SST finishes successfully and node 3 starts preparing the backup. Then, all of a sudden, the donor node crashes. Node 3 finishes preparing the backup but also fails to start, because it can no longer communicate with the donor node. The end result is 2 dead nodes out of 3. Node 1 then thinks it is in a split-brain situation and pushes itself into 'initializing' mode, since it is alone in the cluster.

Here are the logs from the donor node; as you can see, the SST started at 08:17:

2016-08-10 08:17:41 44480 [Note] WSREP: New cluster view: global state: 83e07a6f-96f8-11e3-b8f4-679cb41d96f9:36537410774, view# 5: Primary, number of nodes: 3, my index: 1, protocol version 3
2016-08-10 08:17:41 44480 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2016-08-10 08:17:41 44480 [Note] WSREP: REPL Protocols: 7 (3, 2)
2016-08-10 08:17:41 44480 [Note] WSREP: Service thread queue flushed.
2016-08-10 08:17:41 44480 [Note] WSREP: Assign initial position for certification: 36537410774, protocol version: 3
2016-08-10 08:17:41 44480 [Note] WSREP: Service thread queue flushed.
2016-08-10 08:17:41 44480 [Note] WSREP: Member 0.0 (dbm03) requested state transfer from 'dbm01'. Selected 1.0 (dbm01)(SYNCED) as donor.
2016-08-10 08:17:41 44480 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 36537410786)
2016-08-10 08:17:41 44480 [Note] WSREP: IST request: 83e07a6f-96f8-11e3-b8f4-679cb41d96f9:36537262283-36537410774|ssl://192.168.3.72:4568
2016-08-10 08:17:41 44480 [Note] WSREP: IST first seqno 36537262284 not found from cache, falling back to SST
2016-08-10 08:17:41 44480 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2016-08-10 08:17:41 44480 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'donor' --address '192.168.3.72:4444/xtrabackup_sst//1' --socket '/var/lib/mysql/mysql.sock' --datadir '/data/mysql/databases/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --binlog '/data/mysql-bin/mysql-bin' --gtid '83e07a6f-96f8-11e3-b8f4-679cb41d96f9:36537410786''
2016-08-10 08:17:41 44480 [Note] WSREP: sst_donor_thread signaled with 0
WSREP_SST: [INFO] Streaming with xbstream (20160810 08:17:41.878)
WSREP_SST: [INFO] Using socat as streamer (20160810 08:17:41.881)
WSREP_SST: [INFO] Using /tmp/tmp.5BxMILbRlr as innobackupex temporary directory (20160810 08:17:41.898)
WSREP_SST: [INFO] Streaming GTID file before SST (20160810 08:17:41.906)
WSREP_SST: [INFO] Evaluating xbstream -c ${INFO_FILE} | socat -u stdio TCP:192.168.3.72:4444; RC=( ${PIPESTATUS[@]} ) (20160810 08:17:41.920)
WSREP_SST: [INFO] Sleeping before data transfer for SST (20160810 08:17:41.927)
2016-08-10 08:17:43 44480 [Note] WSREP: (5e4347dd, 'ssl://0.0.0.0:4567') turning message relay requesting off
WSREP_SST: [INFO] Streaming the backup to joiner at 192.168.3.72 4444 (20160810 08:17:51.934)
WSREP_SST: [INFO] Evaluating innobackupex --defaults-file=/etc/my.cnf --defaults-group=mysqld --no-version-check --no-backup-locks $tmpopts $INNOEXTRA --galera-info --stream=$sfmt $itmpdir 2>${DATA}/innobackup.backup.log | socat -u stdio TCP:192.168.3.72:4444; RC=( ${PIPESTATUS[@]} ) (20160810 08:17:51.938)
2016-08-10 09:13:03 7f0437b60700 InnoDB: Buffer pool(s) load completed at 160810 9:13:03
2016-08-10 09:19:07 44480 [Note] WSREP: Provider paused at 83e07a6f-96f8-11e3-b8f4-679cb41d96f9:36540726932 (3444511)
2016-08-10 09:24:50 44480 [Note] WSREP: resuming provider at 3444511
2016-08-10 09:24:50 44480 [Note] WSREP: Provider resumed.
2016-08-10 09:24:50 44480 [Note] WSREP: 1.0 (dbm01): State transfer to 0.0 (dbm03) complete.
2016-08-10 09:24:50 44480 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 36540912520)
2016-08-10 09:24:50 44480 [Note] WSREP: 1.0 (dbm01): State transfer to 0.0 (dbm03) complete.
WSREP_SST: [INFO] Total time on donor: 0 seconds (20160810 09:24:50.941)
WSREP_SST: [INFO] Cleaning up temporary directories (20160810 09:24:50.962)
2016-08-10 09:25:20 44480 [ERROR] WSREP: FSM: no such a transition JOINED -> JOINED
13:25:20 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail.
Please help us make Percona XtraDB Cluster better by reporting any bugs at https://bugs.launchpad.net/percona-xtradb-cluster

key_buffer_size=8388608
read_buffer_size=131072
max_used_connections=6
max_threads=1002
thread_count=13
connection_count=2
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 407388 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7f0400000990
Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong...
stack_bottom = 7f043635c9d8 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x35)[0x906e45]
/usr/sbin/mysqld(handle_fatal_signal+0x4b4)[0x66ac44]
/lib64/libpthread.so.0(+0xf100)[0x7f21dee16100]
/lib64/libc.so.6(gsignal+0x37)[0x7f21dcfa85f7]
/lib64/libc.so.6(abort+0x148)[0x7f21dcfa9ce8]
/usr/lib64/libgalera_smm.so(_ZN6galera3FSMINS_10Replicator5StateENS_13ReplicatorSMM10TransitionENS_10EmptyGuardENS3_11StateActionEE8shift_toES2_+0x212)[0x7f216b46c302]
/usr/lib64/libgalera_smm.so(_ZN6galera13ReplicatorSMM12process_joinEll+0x178)[0x7f216b467478]
/usr/lib64/libgalera_smm.so(_ZN6galera15GcsActionSource8dispatchEPvRK10gcs_actionRb+0x21d)[0x7f216b44287d]
/usr/lib64/libgalera_smm.so(_ZN6galera15GcsActionSource7processEPvRb+0x63)[0x7f216b442ed3]
/usr/lib64/libgalera_smm.so(_ZN6galera13ReplicatorSMM10async_recvEPv+0x73)[0x7f216b4678a3]
/usr/lib64/libgalera_smm.so(galera_recv+0x24)[0x7f216b477e14]
/usr/sbin/mysqld[0x5a20f1]
/usr/sbin/mysqld(start_wsrep_THD+0x36e)[0x58b73e]
/lib64/libpthread.so.0(+0x7dc5)[0x7f21dee0edc5]
/lib64/libc.so.6(clone+0x6d)[0x7f21dd069c9d]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0): is an invalid pointer
Connection ID (thread ID): 11
Status: NOT_KILLED

Here is the donor's innobackup.backup.log:

MySQL binlog position: filename 'mysql-bin.024586', position '271', GTID of the last change '169ec9bb-9e17-11e3-b2bc-0afb0a372a8d:1-390, 25151af3-a8b0-11e3-ac95-02f8906a41be:1-6, 7c1f8590-6907-ee1c-470b-98634be26906:1-36321872376'
160810 09:24:50 [00] Streaming backup-my.cnf
160810 09:24:50 [00] ...done
160810 09:24:50 [00] Streaming xtrabackup_info
160810 09:24:50 [00] ...done
xtrabackup: Transaction log of lsn (43424291037740) to (43428346893050) was copied.
160810 09:24:50 completed OK!

And this is the log from the receiving node (node 3):

WSREP_SST: [INFO] Waiting for SST streaming to complete! (20160810 08:17:53.523)
2016-08-10 09:24:50 9742 [Note] WSREP: 1.0 (dbm01): State transfer to 0.0 (dbm03) complete.
2016-08-10 09:24:50 9742 [Note] WSREP: 1.0 (dbm01): State transfer to 0.0 (dbm03) complete.
WSREP_SST: [INFO] Preparing the backup at /data/mysql/databases//.sst (20160810 09:24:50.955)
WSREP_SST: [INFO] Evaluating innobackupex --no-version-check --apply-log $rebuildcmd ${DATA} &>${DATA}/innobackup.prepare.log (20160810 09:24:50.957)
2016-08-10 09:25:23 9742 [Note] WSREP: (35e1aeed, 'ssl://0.0.0.0:4567') turning message relay requesting on, nonlive peers: ssl://192.168.1.24:4567
2016-08-10 09:25:24 9742 [Note] WSREP: (35e1aeed, 'ssl://0.0.0.0:4567') reconnecting to 5e4347dd (ssl://192.168.1.24:4567), attempt 0
2016-08-10 09:25:25 9742 [Note] WSREP: evs::proto(35e1aeed, GATHER, view_id(REG,35e1aeed,104)) suspecting node: 5e4347dd
2016-08-10 09:25:25 9742 [Note] WSREP: evs::proto(35e1aeed, GATHER, view_id(REG,35e1aeed,104)) suspected node without join message, declaring inactive
2016-08-10 09:25:26 9742 [Note] WSREP: declaring d71f93b5 at ssl://192.168.1.58:4567 stable
2016-08-10 09:25:26 9742 [Note] WSREP: Node 35e1aeed state prim
2016-08-10 09:25:26 9742 [Note] WSREP: view(view_id(PRIM,35e1aeed,105) memb { 35e1aeed,0 d71f93b5,0 } joined { } left { } partitioned { 5e4347dd,0 })
2016-08-10 09:25:26 9742 [Note] WSREP: save pc into disk
2016-08-10 09:25:26 9742 [Note] WSREP: forgetting 5e4347dd (ssl://192.168.1.24:4567)
2016-08-10 09:25:26 9742 [Note] WSREP: deleting entry ssl://192.168.1.24:4567
2016-08-10 09:25:26 9742 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 2
2016-08-10 09:25:26 9742 [Note] WSREP: (35e1aeed, 'ssl://0.0.0.0:4567') turning message relay requesting off
2016-08-10 09:25:26 9742 [ERROR] WSREP: handshake with remote endpoint ssl://192.168.1.24:4567 failed: 1: 'Operation aborted.' ( )
2016-08-10 09:25:26 9742 [Note] WSREP: STATE_EXCHANGE: sent state UUID: e645e670-5efd-11e6-8266-f393e1a8f01d
2016-08-10 09:25:26 9742 [Note] WSREP: STATE EXCHANGE: sent state msg: e645e670-5efd-11e6-8266-f393e1a8f01d
2016-08-10 09:25:26 9742 [Note] WSREP: STATE EXCHANGE: got state msg: e645e670-5efd-11e6-8266-f393e1a8f01d from 0 (dbm03)
2016-08-10 09:25:26 9742 [Note] WSREP: STATE EXCHANGE: got state msg: e645e670-5efd-11e6-8266-f393e1a8f01d from 1 (dbm02)
2016-08-10 09:25:26 9742 [Note] WSREP: Quorum results: version = 4, component = PRIMARY, conf_id = 5, members = 1/2 (joined/total), act_id = 36540931761, last_appl. = 36540931593, protocols = 0/7/3 (gcs/repl/appl), group UUID = 83e07a6f-96f8-11e3-b8f4-679cb41d96f9
2016-08-10 09:25:26 9742 [Warning] WSREP: Donor 5e4347dd-5edb-11e6-8d36-0f054601ebe3 is no longer in the group. State transfer cannot be completed, need to abort. Aborting...
2016-08-10 09:25:26 9742 [Note] WSREP: /usr/sbin/mysqld: Terminated.
160810 09:25:26 mysqld_safe mysqld from pid file /var/lib/mysql/mysqld.pid ended
WSREP_SST: [INFO] Moving the backup to /data/mysql/databases/ (20160810 09:31:59.142)
WSREP_SST: [INFO] Evaluating innobackupex --defaults-file=/etc/my.cnf --defaults-group=mysqld --no-version-check --move-back --force-non-empty-directories ${DATA} &>${DATA}/innobackup.move.log (20160810 09:31:59.143)
WSREP_SST: [INFO] Move successful, removing /data/mysql/databases//.sst (20160810 09:32:06.818)
WSREP_SST: [INFO] Galera co-ords from recovery: 83e07a6f-96f8-11e3-b8f4-679cb41d96f9:36540726932 (20160810 09:32:07.237)
WSREP_SST: [ERROR] Cleanup after exit with status:141 (20160810 09:32:07.239)

Thanks for the help!