Percona cluster with pc.recovery=true fails to automatically recover
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
percona-xtradb-cluster-5.7 (Ubuntu) | New | Undecided | Unassigned |
Bug Description
Starting this bug as a point of discussion.
Per [0], when pc.recovery = true (the default), the cluster should be able to recover automatically after a power outage. There may be a discrepancy between that expectation and reality; this bug is to determine what we can actually expect from automatic recovery.
When re-creating a power outage scenario, Percona fails to restore the primary component from disk:
[Warning] WSREP: Fail to access the file (/var/lib/
[Note] WSREP: Restoring primary-component from disk failed. Either node is booting for first time or re-booting after a graceful shutdown
Furthermore, the cluster appears to time out when attempting to talk to each of its nodes:
[ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view (pc.wait_
at gcomm/src/
[ERROR] WSREP: gcs/src/
[ERROR] WSREP: gcs/src/
[ERROR] WSREP: gcs connect failed: Connection timed out
[ERROR] WSREP: Provider/Node (gcomm:
[ERROR] Aborting
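For anyone comparing logs across the three nodes, the failure signatures quoted above can be checked for mechanically. The following is only a minimal sketch: the function name `triage` and the short descriptions are hypothetical, and the matched substrings are taken verbatim from the log lines in this report, not from any documented list of Galera messages.

```python
from typing import List

# Substrings copied from the log excerpts in this report, mapped to a
# short, informal description (the descriptions are this sketch's own wording).
SIGNATURES = {
    "Restoring primary-component from disk failed":
        "primary component state was not restored from disk",
    "failed to reach primary view":
        "node timed out waiting for a primary view",
    "gcs connect failed":
        "group communication connect failed",
}


def triage(log_text: str) -> List[str]:
    """Return the descriptions of every known signature found in log_text,
    in order of first appearance and without duplicates."""
    findings: List[str] = []
    for line in log_text.splitlines():
        for needle, meaning in SIGNATURES.items():
            if needle in line and meaning not in findings:
                findings.append(meaning)
    return findings


if __name__ == "__main__":
    sample = (
        "[Note] WSREP: Restoring primary-component from disk failed.\n"
        "[ERROR] WSREP: gcs connect failed: Connection timed out\n"
    )
    for finding in triage(sample):
        print(finding)
```

Running it against each node's error log would show whether all three nodes hit both the failed restore and the connection timeout, or only some of them.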
For Ubuntu devs:
dpkg -l |grep percona
ii percona-xtrabackup 2.4.9-0ubuntu2 amd64 Open source backup tool for InnoDB and XtraDB
ii percona-
ii percona-
root@juju-
Description: Ubuntu 18.04.2 LTS
Release: 18.04
apt-cache policy percona-
percona-
Installed: 5.7.20-
Candidate: 5.7.20-
Version table:
*** 5.7.20-
500 http://
100 /var/lib/
5.
500 http://
Attached are logs from a 3-node cluster, including the etc config, grastate.dat, and logs for each node.
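As a rough aid for reading the attached grastate.dat files, the sketch below parses the file's `key: value` lines and reports which node, if any, looks like a bootstrap candidate. It assumes the usual grastate.dat fields (`uuid`, `seqno`, `safe_to_bootstrap`). The function names and node names are hypothetical, and the selection rule (prefer `safe_to_bootstrap: 1`, otherwise the highest non-negative `seqno`) is a simplification, not Percona's recovery procedure.

```python
from typing import Dict, Optional


def parse_grastate(text: str) -> Dict[str, str]:
    """Parse grastate.dat content into a dict, skipping comments and blanks."""
    state: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        state[key.strip()] = value.strip()
    return state


def pick_bootstrap_candidate(states: Dict[str, Dict[str, str]]) -> Optional[str]:
    """Return a node marked safe_to_bootstrap, else the node with the highest
    seqno >= 0, else None (e.g. every node reports seqno -1)."""
    for name, state in states.items():
        if state.get("safe_to_bootstrap") == "1":
            return name
    best_name, best_seqno = None, -1
    for name, state in states.items():
        try:
            seqno = int(state.get("seqno", "-1"))
        except ValueError:
            continue  # unreadable seqno: ignore this node
        if seqno > best_seqno:
            best_name, best_seqno = name, seqno
    return best_name


if __name__ == "__main__":
    # Illustrative content only; the uuid is a placeholder, not from the logs.
    sample = """# GALERA saved state
version: 2.1
uuid:    00000000-0000-0000-0000-000000000001
seqno:   -1
safe_to_bootstrap: 0
"""
    states = {"node-a": parse_grastate(sample)}
    print(pick_bootstrap_candidate(states))
```

If every node reports `seqno: -1` and `safe_to_bootstrap: 0`, which is what one would expect after an unclean shutdown, the function returns `None` and the last committed position has to be determined some other way.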
[0] https:/
"If you starting cluster nodes directly (w/o mysqld_safe) or through systemd (which seems to
have some limitation with the invocation of --wsrep_recover) this feature will not work."
https://jira.percona.com/browse/PXC-881?focusedCommentId=224039&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-224039