Percona cluster with pc.recovery=true fails to automatically recover

Bug #1830950 reported by David Ames
This bug affects 1 person
Affects: percona-xtradb-cluster-5.7 (Ubuntu)
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

Starting this bug as a point of discussion.

Per [0], when pc.recovery = true (the default), the cluster should be able to recover automatically after a power outage. There may be a discrepancy between expectation and reality; this bug is meant to determine what we can actually expect from automatic recovery.

When re-creating a power outage scenario, Percona fails to restore the primary component from disk:

[Warning] WSREP: Fail to access the file (/var/lib/percona-xtradb-cluster//gvwstate.dat) error (No such file or directory). It is possible if node is booting for first time or re-booting after a graceful shutdown
[Note] WSREP: Restoring primary-component from disk failed. Either node is booting for first time or re-booting after a graceful shutdown
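
The warning above hinges on whether gvwstate.dat survived the outage, so it may help to check for the file on each node before starting mysqld. A minimal sketch, assuming the data directory shown in the log; the function name check_pc_state is hypothetical and not part of any Percona tooling:

```shell
#!/bin/sh
# Hypothetical helper: report whether Galera's saved primary-component
# view (gvwstate.dat) is present in the given data directory.
# The default path is taken from the log message above.
check_pc_state() {
    datadir="${1:-/var/lib/percona-xtradb-cluster}"
    if [ -f "$datadir/gvwstate.dat" ]; then
        echo "present: $datadir/gvwstate.dat"
    else
        echo "missing: $datadir/gvwstate.dat"
        return 1
    fi
}
```

Running check_pc_state on every node right after the simulated outage, before any restart attempt, would show whether the file was never written or was lost along the way.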

Furthermore, the cluster appears to time out while trying to reach each of its nodes:

[ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view (pc.wait_prim_timeout): 110 (Connection timed out)
         at gcomm/src/pc.cpp:connect():159
[ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():208: Failed to open backend connection: -110 (Connection timed out)
[ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1514: Failed to open channel 'juju_cluster' at 'gcomm://10.5.0.49,10.5.0.9': -110 (Connection timed out)
[ERROR] WSREP: gcs connect failed: Connection timed out
[ERROR] WSREP: Provider/Node (gcomm://10.5.0.49,10.5.0.9) failed to establish connection with cluster (reason: 7)
[ERROR] Aborting

For Ubuntu devs:
 dpkg -l |grep percona
ii percona-xtrabackup 2.4.9-0ubuntu2 amd64 Open source backup tool for InnoDB and XtraDB
ii percona-xtradb-cluster-server 5.7.20-29.24-0ubuntu2.1 all Percona XtraDB Cluster database server
ii percona-xtradb-cluster-server-5.7 5.7.20-29.24-0ubuntu2.1 amd64 Percona XtraDB Cluster database server binaries
root@juju-fa2938-zaza-eeda2892d6b4-1:/var/lib/percona-xtradb-cluster# lsb_release -rd
Description: Ubuntu 18.04.2 LTS
Release: 18.04

apt-cache policy percona-xtradb-cluster-server
percona-xtradb-cluster-server:
  Installed: 5.7.20-29.24-0ubuntu2.1
  Candidate: 5.7.20-29.24-0ubuntu2.1
  Version table:
 *** 5.7.20-29.24-0ubuntu2.1 500
        500 http://nova.clouds.archive.ubuntu.com/ubuntu bionic-updates/universe amd64 Packages
        100 /var/lib/dpkg/status
     5.7.20-29.24-0ubuntu2 500
        500 http://nova.clouds.archive.ubuntu.com/ubuntu bionic/universe amd64 Packages

Attached are logs from a 3-node cluster, including the etc configuration, grastate.dat, and the logs for each node.

[0] https://www.percona.com/blog/2014/09/01/galera-replication-how-to-recover-a-pxc-cluster/

David Ames (thedac) wrote :

"If you starting cluster nodes directly (w/o mysqld_safe) or through systemd (which seems to
have some limitation with the invocation of --wsrep_recover) this feature will not work."

https://jira.percona.com/browse/PXC-881?focusedCommentId=224039&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-224039
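
If the automatic path is unavailable under systemd, the manual procedure described in [0] falls back on comparing each node's saved state. A minimal sketch for reading fields out of grastate.dat, assuming the usual "key: value" layout of that file; the function name grastate_field is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print the value of one field (e.g. seqno or
# safe_to_bootstrap) from a grastate.dat file.
# Assumes lines of the form "key:   value".
grastate_field() {
    file="$1"
    key="$2"
    sed -n "s/^$key:[[:space:]]*//p" "$file"
}
```

For example, grastate_field /var/lib/percona-xtradb-cluster/grastate.dat seqno prints the saved sequence number. After an unclean shutdown this is typically -1, in which case the position would have to be recovered with --wsrep-recover before choosing a node to bootstrap from.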
