Percona XtraDB Cluster node gets stuck in joining
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Percona XtraBackup moved to https://jira.percona.com/projects/PXB |
Expired
|
Undecided
|
Unassigned |
Bug Description
When using xtrabackup version 2.2.4 or 2.2.5, percona cluster has an inconsistent report of the node being synced. Downgrading to 2.2.3 resolves the issue.
Steps to repeat:
apt-get install percona-
( should install the latest version of xtrabackup 2.2.5)
Packages installed as seen from apt:
ls /var/cache/
/var/cache/
/var/cache/
/var/cache/
/var/cache/
/var/cache/
/var/cache/
/var/cache/
/var/cache/
Bring up nodes:
Log from donor:
=======
2014-10-06 23:37:56 30411 [Note] WSREP: SSL handshake successful, remote endpoint ssl://172.
2014-10-06 23:37:56 30411 [Note] WSREP: (51a665e9, 'ssl://
2014-10-06 23:37:56 30411 [Note] WSREP: declaring cc905e66 at ssl://172.
2014-10-06 23:37:56 30411 [Note] WSREP: declaring e7c165d0 at ssl://172.
2014-10-06 23:37:56 30411 [Note] WSREP: Node 51a665e9 state prim
2014-10-06 23:37:56 30411 [Note] WSREP: save pc into disk
2014-10-06 23:37:56 30411 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 3
2014-10-06 23:37:56 30411 [Note] WSREP: STATE_EXCHANGE: sent state UUID: ccdbeebc-
2014-10-06 23:37:56 30411 [Note] WSREP: STATE EXCHANGE: sent state msg: ccdbeebc-
2014-10-06 23:37:56 30411 [Note] WSREP: STATE EXCHANGE: got state msg: ccdbeebc-
2014-10-06 23:37:56 30411 [Note] WSREP: STATE EXCHANGE: got state msg: ccdbeebc-
2014-10-06 23:37:57 30411 [Note] WSREP: STATE EXCHANGE: got state msg: ccdbeebc-
2014-10-06 23:37:57 30411 [Note] WSREP: Quorum results:
version = 3,
component = PRIMARY,
conf_id = 16,
members = 2/3 (joined/total),
act_id = 34,
last_appl. = 0,
protocols = 0/6/3 (gcs/repl/appl),
group UUID = 51a6f655-
2014-10-06 23:37:57 30411 [Note] WSREP: Flow-control interval: [28, 28]
2014-10-06 23:37:57 30411 [Note] WSREP: New cluster view: global state: 51a6f655-
2014-10-06 23:37:57 30411 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2014-10-06 23:37:57 30411 [Note] WSREP: REPL Protocols: 6 (3, 2)
2014-10-06 23:37:57 30411 [Note] WSREP: Service thread queue flushed.
2014-10-06 23:37:57 30411 [Note] WSREP: Assign initial position for certification: 34, protocol version: 3
2014-10-06 23:37:57 30411 [Note] WSREP: Service thread queue flushed.
2014-10-06 23:37:58 30411 [Note] WSREP: Member 1.0 (172.31.12.65) requested state transfer from '*any*'. Selected 0.0 (172.31.
2014-10-06 23:37:58 30411 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 34)
2014-10-06 23:37:58 30411 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2014-10-06 23:37:58 30411 [Note] WSREP: Running: 'wsrep_
2014-10-06 23:37:58 30411 [Note] WSREP: sst_donor_thread signaled with 0
WSREP_SST: [INFO] Streaming with xbstream (20141006 23:37:58.460)
WSREP_SST: [INFO] Using socat as streamer (20141006 23:37:58.464)
WSREP_SST: [INFO] Using openssl based encryption with socat: with key and crt (20141006 23:37:58.474)
WSREP_SST: [INFO] Encrypting with certificate /etc/mysql/
WSREP_SST: [INFO] Using /tmp/tmp.y4HCqjrCdf as innobackupex temporary directory (20141006 23:37:58.728)
WSREP_SST: [INFO] Streaming GTID file before SST (20141006 23:37:58.732)
WSREP_SST: [INFO] Evaluating xbstream -c ${INFO_FILE} | socat -u stdio openssl-
WSREP_SST: [INFO] Sleeping before data transfer for SST (20141006 23:37:58.794)
2014-10-06 23:37:59 30411 [Note] WSREP: (51a665e9, 'ssl://
WSREP_SST: [INFO] Streaming the backup to joiner at 172.31.12.65 4444 (20141006 23:38:08.805)
WSREP_SST: [INFO] Evaluating innobackupex --defaults-
2014-10-06 23:38:11 30411 [Note] WSREP: Provider paused at 51a6f655-
2014-10-06 23:38:13 30411 [Note] WSREP: resuming provider at 86
2014-10-06 23:38:13 30411 [Note] WSREP: Provider resumed.
2014-10-06 23:38:13 30411 [Note] WSREP: 0.0 (172.31.35.68): State transfer to 1.0 (172.31.12.65) complete.
2014-10-06 23:38:13 30411 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 34)
2014-10-06 23:38:13 30411 [Note] WSREP: Member 0.0 (172.31.35.68) synced with group.
2014-10-06 23:38:13 30411 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 34)
2014-10-06 23:38:13 30411 [Note] WSREP: Synchronized with group, ready for connections
2014-10-06 23:38:13 30411 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
WSREP_SST: [INFO] Total time on donor: 0 seconds (20141006 23:38:13.529)
WSREP_SST: [INFO] Cleaning up temporary directories (20141006 23:38:13.535)
2014-10-06 23:38:21 30411 [Note] WSREP: 1.0 (172.31.12.65): State transfer from 0.0 (172.31.35.68) complete.
2014-10-06 23:38:21 30411 [Note] WSREP: Member 1.0 (172.31.12.65) synced with group.
================ we see it sync'd.
Check the node joining
=======
mysql> mysql> show status like 'wsrep_
+------
| Variable_name | Value |
+------
| wsrep_local_
+------
1 row in set (0.00 sec)
mysql> show status like 'wsrep_
+------
| Variable_name | Value |
+------
| wsrep_cluster_
+------
1 row in set (0.00 sec)
mysql> create database foobar;
ERROR 1047 (08S01): Unknown command
full wsrep
=======
+------
| Variable_name | Value |
+------
| wsrep_local_
| wsrep_protocol_
| wsrep_last_
| wsrep_replicated | 0 |
| wsrep_replicate
| wsrep_repl_keys | 0 |
| wsrep_repl_
| wsrep_repl_
| wsrep_repl_
| wsrep_received | 3 |
| wsrep_received_
| wsrep_local_commits | 0 |
| wsrep_local_
| wsrep_local_replays | 0 |
| wsrep_local_
| wsrep_local_
| wsrep_local_
| wsrep_local_
| wsrep_local_
| wsrep_local_
| wsrep_local_
| wsrep_local_
| wsrep_local_
| wsrep_flow_
| wsrep_flow_
| wsrep_flow_
| wsrep_flow_
| wsrep_cert_
| wsrep_apply_oooe | 0.000000 |
| wsrep_apply_oool | 0.000000 |
| wsrep_apply_window | 0.000000 |
| wsrep_commit_oooe | 0.000000 |
| wsrep_commit_oool | 0.000000 |
| wsrep_commit_window | 0.000000 |
| wsrep_local_state | 1 |
| wsrep_local_
| wsrep_cert_
| wsrep_causal_reads | 0 |
| wsrep_cert_interval | 0.000000 |
| wsrep_incoming_
| wsrep_evs_
| wsrep_cluster_
| wsrep_cluster_size | 3 |
| wsrep_cluster_
| wsrep_cluster_
| wsrep_connected | ON |
| wsrep_local_
| wsrep_local_index | 1 |
| wsrep_provider_name | Galera |
| wsrep_provider_
| wsrep_provider_
| wsrep_ready | OFF |
+------
============== rollback to 2.2.3
on failed joining node, downgrade..
pkill -9 mysql
rm -rf /var/lib/mysql/*
wget http://
dpkg -i percona-
dpkg: warning: downgrading percona-xtrabackup from 2.2.5-5027-
(Reading database ... 79323 files and directories currently installed.)
Preparing to replace percona-xtrabackup 2.2.5-5027-
Unpacking replacement percona-xtrabackup ...
Setting up percona-xtrabackup (2.2.3-
root@ec2-
xtrabackup version 2.2.3 based on MySQL server 5.6.17 Linux (x86_64) (revision id: )
service mysql start
mysql> show status like 'wsrep_
+------
| Variable_name | Value |
+------
| wsrep_local_
+------
1 row in set (0.00 sec)
Log from donor:
=======
2014-10-06 23:51:50 30411 [Note] WSREP: SSL handshake successful, remote endpoint ssl://172.
2014-10-06 23:51:50 30411 [Note] WSREP: (51a665e9, 'ssl://
2014-10-06 23:51:51 30411 [Note] WSREP: declaring bddb8200 at ssl://172.
2014-10-06 23:51:51 30411 [Note] WSREP: declaring e7c165d0 at ssl://172.
2014-10-06 23:51:51 30411 [Note] WSREP: Node 51a665e9 state prim
2014-10-06 23:51:51 30411 [Note] WSREP: save pc into disk
2014-10-06 23:51:51 30411 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 3
2014-10-06 23:51:51 30411 [Note] WSREP: STATE_EXCHANGE: sent state UUID: be2751bf-
2014-10-06 23:51:51 30411 [Note] WSREP: STATE EXCHANGE: sent state msg: be2751bf-
2014-10-06 23:51:51 30411 [Note] WSREP: STATE EXCHANGE: got state msg: be2751bf-
2014-10-06 23:51:51 30411 [Note] WSREP: STATE EXCHANGE: got state msg: be2751bf-
2014-10-06 23:51:51 30411 [Note] WSREP: STATE EXCHANGE: got state msg: be2751bf-
2014-10-06 23:51:51 30411 [Note] WSREP: Quorum results:
version = 3,
component = PRIMARY,
conf_id = 18,
members = 2/3 (joined/total),
act_id = 34,
last_appl. = 0,
protocols = 0/6/3 (gcs/repl/appl),
group UUID = 51a6f655-
2014-10-06 23:51:51 30411 [Note] WSREP: Flow-control interval: [28, 28]
2014-10-06 23:51:51 30411 [Note] WSREP: New cluster view: global state: 51a6f655-
2014-10-06 23:51:51 30411 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2014-10-06 23:51:51 30411 [Note] WSREP: REPL Protocols: 6 (3, 2)
2014-10-06 23:51:51 30411 [Note] WSREP: Service thread queue flushed.
2014-10-06 23:51:51 30411 [Note] WSREP: Assign initial position for certification: 34, protocol version: 3
2014-10-06 23:51:51 30411 [Note] WSREP: Service thread queue flushed.
2014-10-06 23:51:52 30411 [Note] WSREP: Member 1.0 (172.31.12.65) requested state transfer from '*any*'. Selected 0.0 (172.31.
2014-10-06 23:51:52 30411 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 34)
2014-10-06 23:51:52 30411 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2014-10-06 23:51:52 30411 [Note] WSREP: Running: 'wsrep_
2014-10-06 23:51:52 30411 [Note] WSREP: sst_donor_thread signaled with 0
WSREP_SST: [INFO] Streaming with xbstream (20141006 23:51:52.288)
WSREP_SST: [INFO] Using socat as streamer (20141006 23:51:52.292)
WSREP_SST: [INFO] Using openssl based encryption with socat: with key and crt (20141006 23:51:52.303)
WSREP_SST: [INFO] Encrypting with certificate /etc/mysql/
WSREP_SST: [INFO] Using /tmp/tmp.kXWQezJ8L5 as innobackupex temporary directory (20141006 23:51:52.518)
WSREP_SST: [INFO] Streaming GTID file before SST (20141006 23:51:52.523)
WSREP_SST: [INFO] Evaluating xbstream -c ${INFO_FILE} | socat -u stdio openssl-
WSREP_SST: [INFO] Sleeping before data transfer for SST (20141006 23:51:52.586)
2014-10-06 23:51:53 30411 [Note] WSREP: (51a665e9, 'ssl://
WSREP_SST: [INFO] Streaming the backup to joiner at 172.31.12.65 4444 (20141006 23:52:02.592)
WSREP_SST: [INFO] Evaluating innobackupex --defaults-
2014-10-06 23:52:04 30411 [Note] WSREP: Provider paused at 51a6f655-
2014-10-06 23:52:07 30411 [Note] WSREP: resuming provider at 92
2014-10-06 23:52:07 30411 [Note] WSREP: Provider resumed.
2014-10-06 23:52:07 30411 [Note] WSREP: 0.0 (172.31.35.68): State transfer to 1.0 (172.31.12.65) complete.
2014-10-06 23:52:07 30411 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 34)
2014-10-06 23:52:07 30411 [Note] WSREP: Member 0.0 (172.31.35.68) synced with group.
2014-10-06 23:52:07 30411 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 34)
2014-10-06 23:52:07 30411 [Note] WSREP: Synchronized with group, ready for connections
2014-10-06 23:52:07 30411 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
WSREP_SST: [INFO] Total time on donor: 0 seconds (20141006 23:52:07.377)
WSREP_SST: [INFO] Cleaning up temporary directories (20141006 23:52:07.383)
2014-10-06 23:52:15 30411 [Note] WSREP: 1.0 (172.31.12.65): State transfer from 0.0 (172.31.35.68) complete.
2014-10-06 23:52:15 30411 [Note] WSREP: Member 1.0 (172.31.12.65) synced with group.
@Marc,
Thanks for reporting the bug. Can you please share your cnf files?