Having wsrep_causal_reads enabled globally on the donor node potentially breaks SST
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC | New | Undecided | Unassigned |
Bug Description
We (PagerDuty) have wsrep_causal_reads enabled globally on all our cluster nodes. As we have made changes to the cluster (adding new nodes to scale it vertically), we have noticed that xtrabackup-based SST tends to fail near the end of the backup process with the following error:
DBD::mysql::db selectall_hashref failed: Lock wait timeout exceeded; try restarting transaction at /usr//bin/
innobackupex: Error:
Error executing 'SHOW STATUS': DBD::mysql::db selectall_hashref failed: Lock wait timeout exceeded; try restarting transaction at /usr//bin/
140605 22:35:15 innobackupex: Waiting for ibbackup (pid=6698) to finish
I originally posted about this issue on the mailing list: https:/
We were able to trace the problem back to having wsrep_causal_reads enabled on the donor node. Disabling that setting on the donor node just before kicking off SST allows the xtrabackup SST process to complete successfully. We have followed this procedure for several SSTs (probably about 5) and have not hit the issue since.
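For reference, the workaround amounts to something like the following, run on the donor node shortly before the joiner requests SST. This is a minimal sketch rather than our exact runbook: the ordering and the verification step are assumptions, and it relies on innobackupex opening a new connection that picks up the global value (a changed global default does not affect sessions that are already open).

```sql
-- On the donor node, before SST starts: turn off causal reads globally
-- so that new connections (such as the backup tool's) do not wait on them.
SET GLOBAL wsrep_causal_reads = OFF;

-- Confirm the global value that new connections will inherit.
SHOW GLOBAL VARIABLES LIKE 'wsrep_causal_reads';

-- After SST has completed, restore the previous setting if it is still wanted.
SET GLOBAL wsrep_causal_reads = ON;
```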
We were also able to determine that this issue only ever shows up during SST on a cluster under load. Our load test cluster performs SSTs on a weekly basis and has never experienced this issue. The production cluster is frequently affected.
We are in the process of removing the global setting of wsrep_causal_reads in our my.cnf file, but perhaps the SST script should explicitly disable causal reads on the donor node before taking the backup to avoid this issue altogether.
@Doug,
This shouldn't happen with the latest PXC/PXB combination, where backup locks are used. Can you provide the versions of the packages installed?