Crash after SST donation

Bug #1611728 reported by Danny Gueta
Affects: Percona XtraDB Cluster (moved to https://jira.percona.com/projects/PXC)
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned

Bug Description

Hi,

I've encountered a strange issue today which I believe is a bug.

We are running version 5.6.30-76.3-56-log, Release rel76.3, Revision aa929cb, WSREP version 25.16.

We had a node crash today right after it finished serving as an SST donor.

I.e., node 03 was donating to node 01; the transfer completed successfully and the donor then crashed with a strange error. Here are the logs:

2016-08-10 05:18:16 15533 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'donor' --address '192.168.1.24:4444/xtrabackup_sst//1' --socket '/var/lib/mysql/mysql.sock' --datadir '/data/mysql/databases/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --binlog '/data/mysql-bin/mysql-bin' --gtid '83e07a6f-96f8-11e3-b8f4-679cb41d96f9:36534791772''
2016-08-10 05:18:16 15533 [Note] WSREP: sst_donor_thread signaled with 0
WSREP_SST: [INFO] Streaming with xbstream (20160810 05:18:18.470)
WSREP_SST: [INFO] Using socat as streamer (20160810 05:18:18.474)
WSREP_SST: [INFO] Using /tmp/tmp.qJd6p81xmJ as innobackupex temporary directory (20160810 05:18:18.568)
WSREP_SST: [INFO] Streaming GTID file before SST (20160810 05:18:18.578)
WSREP_SST: [INFO] Evaluating xbstream -c ${INFO_FILE} | socat -u stdio TCP:192.168.1.24:4444; RC=( ${PIPESTATUS[@]} ) (20160810 05:18:18.601)
2016-08-10 05:18:18 15533 [Note] WSREP: (35e1aeed, 'ssl://0.0.0.0:4567') turning message relay requesting off
WSREP_SST: [INFO] Sleeping before data transfer for SST (20160810 05:18:18.682)
WSREP_SST: [INFO] Streaming the backup to joiner at 192.168.1.24 4444 (20160810 05:18:28.690)
WSREP_SST: [INFO] Evaluating innobackupex --defaults-file=/etc/my.cnf --defaults-group=mysqld --no-version-check --no-backup-locks $tmpopts $INNOEXTRA --galera-info --stream=$sfmt $itmpdir 2>${DATA}/innobackup.backup.log | socat -u stdio TCP:192.168.1.24:4444; RC=( ${PIPESTATUS[@]} ) (20160810 05:18:28.695)
2016-08-10 06:20:25 15533 [Note] WSREP: Provider paused at 83e07a6f-96f8-11e3-b8f4-679cb41d96f9:36537107651 (1081311812)
2016-08-10 06:25:33 15533 [Note] WSREP: resuming provider at 1081311812
2016-08-10 06:25:33 15533 [Note] WSREP: Provider resumed.
2016-08-10 06:25:33 15533 [Note] WSREP: 0.0 (dbm03): State transfer to 1.0 (dbm01) complete.
2016-08-10 06:25:33 15533 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 36537262238)
WSREP_SST: [INFO] Total time on donor: 0 seconds (20160810 06:25:35.361)
2016-08-10 06:25:35 15533 [Note] WSREP: 0.0 (dbm03): State transfer to 1.0 (dbm01) complete.
WSREP_SST: [INFO] Cleaning up temporary directories (20160810 06:25:35.411)
2016-08-10 06:25:59 15533 [ERROR] WSREP: FSM: no such a transition JOINED -> JOINED
10:25:59 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona XtraDB Cluster better by reporting any
bugs at https://bugs.launchpad.net/percona-xtradb-cluster

key_buffer_size=8388608
read_buffer_size=131072
max_used_connections=21
max_threads=1002
thread_count=11
connection_count=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 407388 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7fe070000990
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7ffb3c2f59d8 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x35)[0x906e45]
/usr/sbin/mysqld(handle_fatal_signal+0x4b4)[0x66ac44]
/lib64/libpthread.so.0(+0xf100)[0x7ffe3b559100]
/lib64/libc.so.6(gsignal+0x37)[0x7ffe396e75f7]
/lib64/libc.so.6(abort+0x148)[0x7ffe396e8ce8]
/usr/lib64/libgalera_smm.so(_ZN6galera3FSMINS_10Replicator5StateENS_13ReplicatorSMM10TransitionENS_10EmptyGuardENS3_11StateActionEE8shift_toES2_+0x212)[0x7ffdc7bab302]
/usr/lib64/libgalera_smm.so(_ZN6galera13ReplicatorSMM12process_joinEll+0x178)[0x7ffdc7ba6478]
/usr/lib64/libgalera_smm.so(_ZN6galera15GcsActionSource8dispatchEPvRK10gcs_actionRb+0x21d)[0x7ffdc7b8187d]
/usr/lib64/libgalera_smm.so(_ZN6galera15GcsActionSource7processEPvRb+0x63)[0x7ffdc7b81ed3]
/usr/lib64/libgalera_smm.so(_ZN6galera13ReplicatorSMM10async_recvEPv+0x73)[0x7ffdc7ba68a3]
/usr/lib64/libgalera_smm.so(galera_recv+0x24)[0x7ffdc7bb6e14]
/usr/sbin/mysqld[0x5a20f1]
/usr/sbin/mysqld(start_wsrep_THD+0x36e)[0x58b73e]
/lib64/libpthread.so.0(+0x7dc5)[0x7ffe3b551dc5]
/lib64/libc.so.6(clone+0x6d)[0x7ffe397a8c9d]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.

Revision history for this message
Danny Gueta (danny-gueta) wrote :

This has just occurred again; it seems the donating node is stuck in the 'Joined' state after the donation has already completed.

I.e.: wsrep_local_state_comment | Joined
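
For reference, a minimal way to check this from the shell, assuming the default socket path shown in the logs above:

mysql --socket=/var/lib/mysql/mysql.sock -e "SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';"
# A healthy node returns to 'Synced' shortly after the SST; the stuck donor keeps reporting 'Joined'.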

Revision history for this message
Danny Gueta (danny-gueta) wrote :

Sorry, I forgot to add the log snippet:

2016-08-10 07:52:25 15775 [Note] WSREP: 0.0 (dbm02): State transfer to 1.0 (dbm01) complete.
2016-08-10 07:52:25 15775 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 36537395115)
2016-08-10 07:52:25 15775 [Note] WSREP: SYNC message ignored as node 0.0 (dbm02) was re-transitioned to DONOR mode before it synced.
2016-08-10 07:52:26 15775 [ERROR] WSREP: sst sent called when not SST donor, state JOINED
WSREP_SST: [INFO] Total time on donor: 0 seconds (20160810 07:52:26.375)
WSREP_SST: [INFO] Cleaning up temporary directories (20160810 07:52:26.381)
2016-08-10 07:54:24 15775 [Note] WSREP: 1.0 (dbm01): State transfer from 0.0 (dbm02) complete.
2016-08-10 07:54:25 15775 [Note] WSREP: Member 1.0 (dbm01) synced with group.
2016-08-10 07:55:51 7f53c1ffb700 InnoDB: Buffer pool(s) load completed at 160810 7:55:51

Revision history for this message
Krunal Bauskar (krunal-bauskar) wrote :

Hi Danny,

I'm wondering if you have an active workload, given that we see the node moving to a paused state and then resuming.

2016-08-10 06:20:25 15533 [Note] WSREP: Provider paused at 83e07a6f-96f8-11e3-b8f4-679cb41d96f9:36537107651 (1081311812)
2016-08-10 06:25:33 15533 [Note] WSREP: resuming provider at 1081311812
2016-08-10 06:25:33 15533 [Note] WSREP: Provider resumed.

Can you confirm this?

Regards,
Krunal

Revision history for this message
Danny Gueta (danny-gueta) wrote :

Hi Krunal,

This has already occurred 3 times today after an SST.

Each time the donating node had zero workload on it. The scenario is: the nodes are up, node 1 has an active workload on it, and node 2 is sitting 'idle', meaning it is only receiving replication events.

We then add node 3, which receives a donation from node 2, the node with no workload on it.

The SST finishes successfully and node 3 starts preparing the backup; then, all of a sudden, the donating node crashes. Node 3, which has finished preparing the backup, also fails to start since it can no longer communicate with the donating node. The end result is 2 dead nodes out of 3, with node 1 thinking it is in a split-brain situation and pushing itself into 'initializing' mode since it is alone in the cluster.

Here are the logs from the donating node; as you can see, the SST started at 08:17.

2016-08-10 08:17:41 44480 [Note] WSREP: New cluster view: global state: 83e07a6f-96f8-11e3-b8f4-679cb41d96f9:36537410774, view# 5: Primary, number of nodes: 3, my index: 1, protocol version 3
2016-08-10 08:17:41 44480 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2016-08-10 08:17:41 44480 [Note] WSREP: REPL Protocols: 7 (3, 2)
2016-08-10 08:17:41 44480 [Note] WSREP: Service thread queue flushed.
2016-08-10 08:17:41 44480 [Note] WSREP: Assign initial position for certification: 36537410774, protocol version: 3
2016-08-10 08:17:41 44480 [Note] WSREP: Service thread queue flushed.
2016-08-10 08:17:41 44480 [Note] WSREP: Member 0.0 (dbm03) requested state transfer from 'dbm01'. Selected 1.0 (dbm01)(SYNCED) as donor.
2016-08-10 08:17:41 44480 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 36537410786)
2016-08-10 08:17:41 44480 [Note] WSREP: IST request: 83e07a6f-96f8-11e3-b8f4-679cb41d96f9:36537262283-36537410774|ssl://192.168.3.72:4568
2016-08-10 08:17:41 44480 [Note] WSREP: IST first seqno 36537262284 not found from cache, falling back to SST
2016-08-10 08:17:41 44480 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2016-08-10 08:17:41 44480 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'donor' --address '192.168.3.72:4444/xtrabackup_sst//1' --socket '/var/lib/mysql/mysql.sock' --datadir '/data/mysql/databases/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --binlog '/data/mysql-bin/mysql-bin' --gtid '83e07a6f-96f8-11e3-b8f4-679cb41d96f9:36537410786''
2016-08-10 08:17:41 44480 [Note] WSREP: sst_donor_thread signaled with 0
WSREP_SST: [INFO] Streaming with xbstream (20160810 08:17:41.878)
WSREP_SST: [INFO] Using socat as streamer (20160810 08:17:41.881)
WSREP_SST: [INFO] Using /tmp/tmp.5BxMILbRlr as innobackupex temporary directory (20160810 08:17:41.898)
WSREP_SST: [INFO] Streaming GTID file before SST (20160810 08:17:41.906)
WSREP_SST: [INFO] Evaluating xbstream -c ${INFO_FILE} | socat -u stdio TCP:192.168.3.72:4444; RC=( ${PIPESTATUS[@]} ) (20160810 08:17:41.920)
WSREP_SST: [INFO] Sleeping before data transfer for SST (20160810 08:17:41.927)
2016-08-10 08:17:43 44480 [Note] WSREP: (5e4347dd, 'ssl://0.0.0.0:4567') turning message relay requesting off
WSREP_SST: [INFO] Streaming the backup to joiner at 192.168.3.72 4444 (20160810 08:17:51.934)
WSREP_SST: [INFO] Eval...

Revision history for this message
Krunal Bauskar (krunal-bauskar) wrote :

Hi Danny,

What is interesting to understand are the following lines in the donor log:

2016-08-10 06:20:25 15533 [Note] WSREP: Provider paused at 83e07a6f-96f8-11e3-b8f4-679cb41d96f9:36537107651 (1081311812)
2016-08-10 06:25:33 15533 [Note] WSREP: resuming provider at 1081311812
2016-08-10 06:25:33 15533 [Note] WSREP: Provider resumed.

It suggests that the node was paused and then resumed, and we are trying to understand what could cause that if there is no workload running on the donor node.

We are trying to reproduce the scenario locally, but if you have a reproducible test case, that would help us hunt this down quicker.

Regards,
Krunal

Revision history for this message
Danny Gueta (danny-gueta) wrote :

Krunal,

I ended up doing a manual xtrabackup and rsyncing it to nodes 1 and 3; of course they finished with an IST. It seems to have worked fine, so now I have all 3 nodes up.
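
Roughly along these lines (a rough sketch only; the backup directory and exact innobackupex options are placeholders and depend on the local setup):

# Take a backup on the healthy node and prepare it (backup path is a placeholder)
innobackupex --defaults-file=/etc/my.cnf --galera-info /backups
innobackupex --apply-log /backups/<timestamp-dir>
# With mysqld stopped on the target node, copy the prepared backup into its datadir,
# then start mysqld there so it catches up via IST
rsync -av /backups/<timestamp-dir>/ dbm01:/data/mysql/databases/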

I believe I can re-create this scenario, but let me know what further information you require before I do so.

Thanks,
Danny.

Revision history for this message
Krunal Bauskar (krunal-bauskar) wrote :

Hi Danny,

I was able to reproduce something similar which hints at the problem you are facing.

You are certainly using XtraBackup with backup locks disabled, either through --no-backup-locks or FORCE_FTWRL=1, and I see we hit this scenario only when XtraBackup tries to do a full FLUSH TABLES WITH READ LOCK.

(As per the updated flow, FLUSH TABLES WITH READ LOCK desyncs and pauses the node, which is already done as part of the SST flow, and there is probably some race in that code path.)

-------------------------------
From your log snippet:

WSREP_SST: [INFO] Evaluating innobackupex --defaults-file=/etc/my.cnf --defaults-group=mysqld --no-version-check ***********--no-backup-locks************ $tmpopts $INNOEXTRA --galera-info --stream=$sfmt $itmpdir 2>${DATA}/innobackup.backup.log | socat -u stdio TCP:192.168.1.24:4444; RC=( ${PIPESTATUS[@]} ) (20160810 05:18:28.695)
------------------------------

What is the quick workaround:
* Avoid suppressing backup locks. As I understand it, backup locks were unstable before, but they are good now. I am still running some local tests to ensure that we can get rid of the limitation added to the PXC documents.

"Backup locks used during SST or with Xtrabackup can crash, on donor, either use inno-backup-opts=’–no-backup-locks’ under [sst] in my.cnf or set FORCE_FTWRL=1 in /etc/sysconfig/mysql (or /etc/sysconfig/mysql.%i for corresponding unit/service) for CentOS/RHEL or /etc/default/mysql in debian/ubuntu. You can also use rsync as the alternate SST method if those don’t work."

Revision history for this message
Danny Gueta (danny-gueta) wrote :

Hi Krunal,

You are correct, I am using the following option in my my.cnf file:

[sst]
inno-backup-opts='--no-backup-locks'

So I'll just go ahead and temporarily remove that option.
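
For reference, the [sst] section would then simply end up with that line removed (or commented out), e.g.:

[sst]
# inno-backup-opts='--no-backup-locks'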

Thanks for the help, highly appreciated!

Changed in percona-xtradb-cluster:
status: New → Fix Committed
Changed in percona-xtradb-cluster:
milestone: none → 5.6.32-25.17
status: Fix Committed → Fix Released
Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports, so this bug report has been migrated to: https://jira.percona.com/browse/PXC-1919
