SST fails if donor has to send keyring file
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC | Fix Committed | Undecided | Unassigned |
5.7 | Fix Committed | Undecided | Kenn Takara |
Bug Description
SST fails if the donor has to send the keyring file. It looks like the donor tries to send the file while socat is still opening port 4444 on the joiner:
20170606 09:00:15.294 WSREP_SST: [INFO] Streaming GTID file before SST
20170606 09:00:18.368 WSREP_SST: [INFO] Streaming donor-keyring file before SST
2017/06/06 09:00:18 socat[15464] E connect(4, AF=2 10.131.17.240:4444, 16): Connection refused
20170606 09:00:18.376 WSREP_SST: [ERROR] ******************* FATAL ERROR *******
20170606 09:00:18.377 WSREP_SST: [ERROR] Error while sending data to joiner node: exit codes: 0 1
20170606 09:00:18.379 WSREP_SST: [ERROR] *******
20170606 09:00:18.380 WSREP_SST: [ERROR] Cleanup after exit with status:32
The donor reports "Connection refused", but port 4444 is open on the joiner.
How to repeat:
1) Create a PXC cluster with 2 nodes using 5.7
2) Stop node1 and add the config below to my.cnf:
in [mysqld] section:
ssl-ca=
ssl-cert=
ssl-key=
early-plugin-
keyring_
in [sst] section:
streamfmt = xbstream
encrypt=4
ssl-ca=
ssl-cert=
ssl-key=
in [xtrabackup] section:
keyring-
3) move *.pem files from /var/lib/mysql to /var/lib/
4) scp /var/lib/
5) Start node1 and wait until it joins the cluster
6) repeat step 2 on node2
7) Force an SST from node2; it fails with the error above
How to fix:
I was able to fix it by increasing the time the donor waits for the joiner to be ready to receive the file.
Edit /usr/bin/
+++++++
1285 # joiner need to wait to receive the file.
1286 sleep 3
1287
1288 cp $keyring $KEYRING_
1289
1290 wsrep_log_info "Streaming donor-keyring file before SST"
1291 keyringbackupopt=" --keyring-
1292 FILE_TO_
1293 send_data_
+++++++
Change "sleep 3" to "sleep 10".
Tested on 5.7.18-15-57 Percona XtraDB Cluster (GPL), Release rel15, Revision 7693d6e, WSREP version 29.20, wsrep_29.20
tags: added: i192112
description: updated
Changed in percona-xtradb-cluster:
status: New → Confirmed
This will happen any time we are sending data, not just for the keyring. Whenever the joiner is slow to start listening on the server socket, the donor fails, because it only tries the connection once.
I am changing the code so that the donor retries the connection several times before returning an error.
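For illustration only, the retry idea can be sketched in bash as below. This is not the code that went into the SST script: the function name retry_send, the attempt count, and the delay are all hypothetical, and the sketch assumes the send command can safely be re-run from the start (true when streaming a file from disk, not when reading from a one-shot pipe).

```shell
#!/bin/bash
# Hypothetical helper, NOT taken from the PXC SST script.
# Usage: retry_send <max_attempts> <delay_seconds> <command> [args...]
# Runs the command until it succeeds or max_attempts is reached.
retry_send() {
    local max_attempts=$1
    local delay=$2
    shift 2
    local attempt=1

    while true; do
        # Success: stop retrying and report it to the caller.
        if "$@"; then
            return 0
        fi
        # Out of attempts: give up and return an error.
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts" >&2
            return 1
        fi
        echo "attempt $attempt failed, retrying in ${delay}s" >&2
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}
```

A donor-side call might look like "retry_send 10 1 send_keyring_once", where send_keyring_once stands in for whatever command streams the file to the joiner. Compared with a longer fixed sleep, a bounded retry returns as soon as the joiner is listening and still fails cleanly if it never comes up.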