Percona cluster with pc.recovery=true fails to automatically recover

Bug #1830950 reported by David Ames on 2019-05-29

Affects                               Status  Importance  Assigned to  Milestone
percona-xtradb-cluster-5.7 (Ubuntu)   New     Undecided   Unassigned

Bug Description

Starting this bug as a point of discussion.

Per [0], when pc.recovery = true (the default), the cluster should be able to recover automatically after a power outage. Expectation and reality may differ here; this bug is to determine what we can actually expect from automatic recovery.

When re-creating a power outage scenario, Percona fails to restore the primary component from disk:

[Warning] WSREP: Fail to access the file (/var/lib/percona-xtradb-cluster//gvwstate.dat) error (No such file or directory). It is possible if node is booting for first time or re-booting after a graceful shutdown
[Note] WSREP: Restoring primary-component from disk failed. Either node is booting for first time or re-booting after a graceful shutdown
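
As the warning above indicates, restoring the primary component from disk depends on gvwstate.dat being present in the data directory. A minimal sketch for checking this on each node before starting mysqld; the function name saved_view_status is hypothetical, and the default path is the one taken from the log line above:

```shell
#!/bin/sh
# Hypothetical helper: report whether a saved primary-component view file
# (gvwstate.dat) exists and is non-empty in the given data directory.
saved_view_status() {
    datadir="${1:-/var/lib/percona-xtradb-cluster}"
    if [ -s "$datadir/gvwstate.dat" ]; then
        echo "present"
    else
        echo "missing"
    fi
}
```

Running saved_view_status on every node right after the simulated outage would show whether the file survived the power loss at all, or was never written.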

Furthermore, each node appears to time out while attempting to reach the other nodes of the cluster:

[ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view (pc.wait_prim_timeout): 110 (Connection timed out)
at gcomm/src/pc.cpp:connect():159
[ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():208: Failed to open backend connection: -110 (Connection timed out)
[ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1514: Failed to open channel 'juju_cluster' at 'gcomm://10.5.0.49,10.5.0.9': -110 (Connection timed out)
[ERROR] WSREP: gcs connect failed: Connection timed out
[ERROR] WSREP: Provider/Node (gcomm://10.5.0.49,10.5.0.9) failed to establish connection with cluster (reason: 7)
[ERROR] Aborting
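
To rule out a plain network failure behind the timeout above, the peers in the gcomm:// address can be probed directly. This is a rough sketch, not part of the original report: the helper names gcomm_peers and probe_peers are hypothetical, it assumes peers are listed without an explicit port (as in the log), and it assumes the Galera replication port is the default 4567 and that nc is installed.

```shell
#!/bin/sh
# Hypothetical helper: split a gcomm:// address into one peer per line.
gcomm_peers() {
    addr="${1#gcomm://}"     # drop the scheme
    addr="${addr%%\?*}"      # drop any ?option=value suffix
    printf '%s\n' "$addr" | tr ',' '\n' | sed '/^$/d'
}

# Hypothetical helper: try a TCP connection to each peer on port 4567
# (assumed default Galera replication port) with a 3 second timeout.
probe_peers() {
    for peer in $(gcomm_peers "$1"); do
        if nc -z -w 3 "$peer" 4567 2>/dev/null; then
            echo "$peer reachable"
        else
            echo "$peer unreachable"
        fi
    done
}
```

For example, probe_peers 'gcomm://10.5.0.49,10.5.0.9' prints one line per peer. A reachable port only excludes a network problem; the nodes can still fail to reach a primary view if none of them is able to form one.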

For Ubuntu devs:
dpkg -l |grep percona
ii percona-xtrabackup 2.4.9-0ubuntu2 amd64 Open source backup tool for InnoDB and XtraDB
ii percona-xtradb-cluster-server 5.7.20-29.24-0ubuntu2.1 all Percona XtraDB Cluster database server
ii percona-xtradb-cluster-server-5.7 5.7.20-29.24-0ubuntu2.1 amd64 Percona XtraDB Cluster database server binaries
root@juju-fa2938-zaza-eeda2892d6b4-1:/var/lib/percona-xtradb-cluster# lsb_release -rd
Description: Ubuntu 18.04.2 LTS
Release: 18.04

apt-cache policy percona-xtradb-cluster-server
percona-xtradb-cluster-server:
  Installed: 5.7.20-29.24-0ubuntu2.1
  Candidate: 5.7.20-29.24-0ubuntu2.1
  Version table:
*** 5.7.20-29.24-0ubuntu2.1 500
        500 http://nova.clouds.archive.ubuntu.com/ubuntu bionic-updates/universe amd64 Packages
        100 /var/lib/dpkg/status
     5.7.20-29.24-0ubuntu2 500
        500 http://nova.clouds.archive.ubuntu.com/ubuntu bionic/universe amd64 Packages

Find attached logs from a 3 node cluster including etc config, grastate.dat and logs for each node.
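
When comparing the attached grastate.dat files, the seqno field is the value of interest: it records the last committed position, and after an unclean shutdown it is typically -1. A small sketch for pulling it out of each file; the function name grastate_seqno is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print the seqno recorded in a grastate.dat file.
# Lines in that file look like "seqno:   -1".
grastate_seqno() {
    awk '$1 == "seqno:" { print $2 }' "$1"
}
```

Usage would be along the lines of grastate_seqno /var/lib/percona-xtradb-cluster/grastate.dat on each node.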

[0] https://www.percona.com/blog/2014/09/01/galera-replication-how-to-recover-a-pxc-cluster/

David Ames (thedac) wrote on 2019-05-29:

Node logs and files (75.5 KiB, application/x-tar)

David Ames (thedac) wrote on 2019-06-03:

"If you starting cluster nodes directly (w/o mysqld_safe) or through systemd (which seems to
have some limitation with the invocation of --wsrep_recover) this feature will not work."

https://jira.percona.com/browse/PXC-881?focusedCommentId=224039&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-224039
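
For context on the quote above: mysqld_safe runs mysqld with --wsrep_recover first and reads the "Recovered position" line from the resulting log before the real start. A minimal sketch of doing that step by hand follows; the function name recovered_position and the log path in the comment are hypothetical, and the exact invocation may need adjusting for this package.

```shell
#!/bin/sh
# Assumed manual recovery run (log path is a placeholder):
#   mysqld --user=mysql --wsrep_recover --log-error=/tmp/wsrep_recover.log
#
# Hypothetical helper: print the last "uuid:seqno" position that such a
# run wrote to the given log file.
recovered_position() {
    sed -n 's/.*WSREP: Recovered position: *//p' "$1" | tail -n 1
}
```

Comparing the output of recovered_position across the three nodes would show which node holds the most advanced state, which is the information automatic recovery needs and which, per the quoted comment, may not be gathered when the service is started through systemd.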