a) Why is wsrep_desync used on donor since it goes to DONOR/Desynced anyways? If it is to reproduce the issue then fine. b) I tried to reproduce this. Yes, if desync=ON, the joiner hangs, but after it is set to OFF, it goes forward fine. Also, the second "wsrep_desync=OFF;" should be idempotent now where it gives: ===================== MySQL [test]> set global wsrep_desync=OFF; ERROR 1231 (42000): Variable 'wsrep_desync' can't be set to the value of 'OFF' ======================= or should it be " 5. donor> SET GLOBAL wsrep_desync=ON;" ? Here are my logs: Donor: 140308 20:47:35 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 92107) 140308 20:47:50 [Note] WSREP: declaring cfb2ec81-a6d4-11e3-98d8-1ef20a3c488f stable 140308 20:47:50 [Note] WSREP: Node 049f90ce-a6d1-11e3-bb55-3bbb14bffdcd state prim 140308 20:47:50 [Note] WSREP: view(view_id(PRIM,049f90ce-a6d1-11e3-bb55-3bbb14bffdcd,6) memb { 049f90ce-a6d1-11e3-bb55-3bbb14bffdcd, cfb2ec81-a6d4-11e3-98d8-1ef20a3c488f, } joined { } left { } partitioned { }) 140308 20:47:50 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 2 140308 20:47:50 [Note] WSREP: STATE_EXCHANGE: sent state UUID: cffff748-a6d4-11e3-96a3-bbee5342f5db 140308 20:47:50 [Note] WSREP: STATE EXCHANGE: sent state msg: cffff748-a6d4-11e3-96a3-bbee5342f5db 140308 20:47:50 [Note] WSREP: STATE EXCHANGE: got state msg: cffff748-a6d4-11e3-96a3-bbee5342f5db from 0 (Arch1) 140308 20:47:50 [Note] WSREP: STATE EXCHANGE: got state msg: cffff748-a6d4-11e3-96a3-bbee5342f5db from 1 (Arch2) 140308 20:47:50 [Note] WSREP: Quorum results: version = 2, component = PRIMARY, conf_id = 5, members = 1/2 (joined/total), act_id = 92107, last_appl. = 0, protocols = 0/4/2 (gcs/repl/appl), group UUID = d4bea759-94bb-11e3-b9c3-be3eb9eca9c6 140308 20:47:50 [Note] WSREP: Flow-control interval: [23, 23] 140308 20:47:50 [Note] WSREP: New cluster view: global state: d4bea759-94bb-11e3-b9c3-be3eb9eca9c6:92107, view# 6: Primary, number of nodes: 2, my index: 0, protocol version 2 140308 20:47:50 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification. 140308 20:47:50 [Note] WSREP: Assign initial position for certification: 92107, protocol version: 2 140308 20:48:25 [Note] WSREP: Node 0 (Arch1) resyncs itself to group 140308 20:48:25 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 92107) 140308 20:48:25 [Note] WSREP: Member 0 (Arch1) synced with group. 140308 20:48:25 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 92107) 140308 20:48:25 [Note] WSREP: Synchronized with group, ready for connections 140308 20:48:26 [Note] WSREP: Node 1 (Arch2) requested state transfer from '*any*'. Selected 0 (Arch1)(SYNCED) as donor. 140308 20:48:26 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 92107) 140308 20:48:26 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification. 140308 20:48:26 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'donor' --address '127.0.0.1:15001/xtrabackup_sst' --auth '(null)' --socket '/pxc/datadir/pxc.sock' --datadir '/pxc/datadir/' --defaults-file '/pxc/etc/my.cnf.local' --gtid 'd4bea759-94bb-11e3-b9c3-be3eb9eca9c6:92107'' 140308 20:48:26 [Note] WSREP: sst_donor_thread signaled with 0 WSREP_SST: [INFO] Streaming with xbstream (20140308 20:48:26.981) WSREP_SST: [INFO] Using socat as streamer (20140308 20:48:26.985) WSREP_SST: [INFO] Streaming GTID file before SST (20140308 20:48:27.271) WSREP_SST: [INFO] Evaluating xbstream -c ${INFO_FILE} | socat -u stdio TCP:127.0.0.1:15001; RC=( ${PIPESTATUS[@]} ) (20140308 20:48:27.380) WSREP_SST: [INFO] Sleeping before data transfer for SST (20140308 20:48:27.389) 140308 20:48:29 [ERROR] WSREP: Node desync failed.: 11 (Resource temporarily unavailable) at galera/src/replicator_smm.cpp:desync():1544 140308 20:48:29 [Warning] WSREP: SET desync failed 3 for set global wsrep_desync=ON 140308 20:48:37 [Note] WSREP: 0 (Arch1): State transfer to 1 (Arch2) complete. 140308 20:48:37 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 92107) 140308 20:48:37 [Note] WSREP: Member 0 (Arch1) synced with group. 140308 20:48:37 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 92107) 140308 20:48:37 [Note] WSREP: Synchronized with group, ready for connections 140308 20:48:37 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification. WSREP_SST: [INFO] Streaming the backup to joiner at 127.0.0.1 15001 (20140308 20:48:37.395) WSREP_SST: [INFO] Evaluating innobackupex --defaults-file=/pxc/etc/my.cnf.local --kill-long-query-type=select --parallel=8 --rebuild-threads=4 --no-version-check $INNOEXTRA --galera-info --stream=$sfmt ${TMPDIR} 2>${DATA}/innobackup.backup.log | socat -u stdio TCP:127.0.0.1:15001; RC=( ${PIPESTATUS[@]} ) (20140308 20:48:37.399) 140308 20:48:43 [Note] WSREP: Provider paused at d4bea759-94bb-11e3-b9c3-be3eb9eca9c6:92107 (28) 140308 20:48:46 [Note] WSREP: resuming provider at 28 140308 20:48:46 [Note] WSREP: Provider resumed. 140308 20:48:46 [ERROR] WSREP: sst sent called when not SST donor, state SYNCED WSREP_SST: [INFO] Total time on donor: 0 seconds (20140308 20:48:46.595) 140308 20:49:03 [Note] WSREP: 1 (Arch2): State transfer from 0 (Arch1) complete. 140308 20:49:03 [Note] WSREP: Member 1 (Arch2) synced with group. Joiner: ============================== 140308 20:47:53 [Note] WSREP: Requesting state transfer failed: -11(Resource temporarily unavailable). Will keep retrying every 1 second(s) WSREP_SST: [INFO] Evaluating socat -u TCP-LISTEN:15001,reuseaddr stdio | xbstream -x; RC=( ${PIPESTATUS[@]} ) (20140308 20:47:53.512) 140308 20:48:25 [Warning] WSREP: Could not find peer: 140308 20:48:25 [Note] WSREP: 0 (Arch1): State transfer to -1 (left the group) complete. 140308 20:48:25 [Note] WSREP: Member 0 (Arch1) synced with group. 140308 20:48:26 [Note] WSREP: Node 1 (Arch2) requested state transfer from '*any*'. Selected 0 (Arch1)(SYNCED) as donor. 140308 20:48:26 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 92107) 140308 20:48:26 [Note] WSREP: Requesting state transfer: success after 34 tries, donor: 0 WSREP_SST: [INFO] Proceeding with SST (20140308 20:48:27.411) WSREP_SST: [INFO] Cleaning the existing datadir (20140308 20:48:27.413) removed ‘/pxc/datadir2/backup-my.cnf’ removed ‘/pxc/datadir2/test/db.opt’ .................. removed ‘/pxc/datadir2/xtrabackup_checkpoints’ removed ‘/pxc/datadir2/ibdata1’ removed ‘/pxc/datadir2/xtrabackup_binlog_pos_innodb’ removed ‘/pxc/datadir2/ib_logfile0’ removed ‘/pxc/datadir2/ib_logfile1’ removed ‘/pxc/datadir2/mysql_bin.index’ removed ‘/pxc/datadir2/mysql_bin.000001’ WSREP_SST: [INFO] Evaluating socat -u TCP-LISTEN:15001,reuseaddr stdio | xbstream -x; RC=( ${PIPESTATUS[@]} ) (20140308 20:48:27.540) 140308 20:48:37 [Note] WSREP: 0 (Arch1): State transfer to 1 (Arch2) complete. 140308 20:48:37 [Note] WSREP: Member 0 (Arch1) synced with group. WSREP_SST: [INFO] Compressed qpress files found (20140308 20:48:46.613) WSREP_SST: [INFO] Decompression with 8 threads (20140308 20:48:46.618) WSREP_SST: [INFO] Evaluating find /pxc/datadir2/ -type f -name '*.qp' -printf '%p\n%h\n' | xargs -n 2 qpress -T8d (20140308 20:48:46.621) WSREP_SST: [INFO] Removing qpress files after decompression (20140308 20:48:50.612) WSREP_SST: [INFO] Preparing the backup at /pxc/datadir2/ (20140308 20:48:50.640) WSREP_SST: [INFO] Evaluating innobackupex --no-version-check --kill-long-query-type=select --throttle=30 --use-memory=2G --tmpdir=/dev/shm --apply-log $rebuildcmd ${DATA} &>${DATA}/innobackup.prepare.log (20140308 20:48:50.645) WSREP_SST: [INFO] Total time on joiner: 0 seconds (20140308 20:49:02.051) WSREP_SST: [INFO] Removing the sst_in_progress file (20140308 20:49:02.053) 140308 20:49:02 [Note] WSREP: SST complete, seqno: 92107 140308 20:49:02 InnoDB: !!!!!!!! UNIV_DEBUG switched on !!!!!!!!! 140308 20:49:02 InnoDB: The InnoDB memory heap is disabled 140308 20:49:02 InnoDB: Mutexes and rw_locks use GCC atomic builtins 140308 20:49:02 InnoDB: Compressed tables use zlib 1.2.8 140308 20:49:02 InnoDB: Using Linux native AIO 140308 20:49:02 InnoDB: Initializing buffer pool, size = 500.0M 140308 20:49:02 InnoDB: Completed initialization of buffer pool 140308 20:49:02 InnoDB: highest supported file format is Barracuda. 140308 20:49:02 InnoDB: Waiting for the background threads to start 140308 20:49:03 Percona XtraDB (http://www.percona.com) 5.5.36-34.0-29.3 started; log sequence number 6196028940 140308 20:49:03 [Note] WSREP: Initial TC log open: binlog 140308 20:49:03 [Note] Event Scheduler: Loaded 0 events 140308 20:49:03 [Note] WSREP: Signalling provider to continue. 140308 20:49:03 [Note] WSREP: SST received: d4bea759-94bb-11e3-b9c3-be3eb9eca9c6:92107 140308 20:49:03 [Note] WSREP: 1 (Arch2): State transfer from 0 (Arch1) complete. 140308 20:49:03 [Note] WSREP: Shifting JOINER -> JOINED (TO: 92107) 140308 20:49:03 [Note] WSREP: Member 1 (Arch2) synced with group. 140308 20:49:03 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 92107) 140308 20:49:03 [Note] WSREP: Synchronized with group, ready for connections 140308 20:49:03 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification. 140308 20:49:03 [Note] WSREP: Nobody is waiting for SST. 140308 20:49:03 [Note] /pxcd/bin/mysqld: ready for connections. Version: '5.5.36-34.0-debug-log' socket: '/pxc/datadir2/pxc.sock' port: 15000 MySQL Community Server (GPL), wsrep_25.10.r666