Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC

Resume mysqld startup in case wsrep-recover fails

Bug #1378578 reported by Raghavendra D Prabhu on 2014-10-07

This bug affects 1 person

Affects	Status	Importance	Assigned to	Milestone
Percona XtraDB Cluster	Status tracked in 5.6
5.5	Fix Released	Undecided	Unassigned	5.5.41-25.11
5.6	Fix Released	Undecided	Unassigned	5.6.21-25.8

Bug Description

I noticed that when mysqld is killed (i.e. with a process-group kill) during SST at

WSREP_SST: [INFO] Evaluating socat -u TCP-LISTEN:4444,reuseaddr stdio | xbstream -x; RC=( ${PIPESTATUS[@]} ) (20141007 18:24:09.900)
2014-10-07 18:24:11 21468 [Note] WSREP: (a73fdb95, 'tcp://0.0.0.0:4567') turning message relay requesting off

it leaves an unclean data directory behind. This is expected, since the streaming process is killed while it is still writing.

d) Next, when you start mysqld again, mysqld --wsrep-recover fails with some indeterminate error because the data directory is corrupt. Since wsrep-recover fails, startup bails out at this point.

How to fix this
===============

As mentioned above, startup currently aborts at the wsrep-recover step if the recovery fails. Instead, startup should proceed to the next stage and let mysqld/Galera decide what to do next. At that point, since grastate.dat contains 00000000-0000-0000-0000-000000000000:-1, the node will go for a full SST, which cleans up the data directory and fixes things.
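
A rough sketch of the control flow I have in mind, not the actual mysqld_safe code. The function name recover_or_continue and the convention that the recovery command prints "uuid:seqno" on success are assumptions made for illustration; only --wsrep_start_position is an existing mysqld option.

```shell
#!/bin/sh
# HYPOTHETICAL sketch: recover_or_continue is an illustrative name,
# not a function from the real mysqld_safe script.

# Run the recovery command given as arguments.
# Assumption: on success it prints the recovered position as "uuid:seqno".
# Prints the extra option (if any) to pass to the real mysqld start.
recover_or_continue() {
    if pos=$("$@" 2>/dev/null) && [ -n "$pos" ]; then
        # Recovery worked: start from the recovered position.
        echo "--wsrep_start_position=$pos"
    else
        # Recovery failed: log it, but do NOT abort startup.
        # With no start position given, mysqld/galera decides what to do next.
        echo "WSREP: recovery failed, continuing startup" >&2
        echo ""
    fi
    return 0
}
```

For example, `recover_or_continue false` prints nothing on stdout and still returns 0, so the caller goes on to start mysqld instead of exiting.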

Optional:
=========
Add an option to [mysqld_safe] that restores the older behavior (abort when wsrep-recover fails), but default to the newer one.
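
A possible shape for such a switch, as a sketch only. The option name --wsrep-recover-abort-on-failure is made up here and does not exist in mysqld_safe; the argument loop just imitates the usual "case on --option=value" style of shell option parsing.

```shell
#!/bin/sh
# HYPOTHETICAL sketch: the option name below is invented for illustration.
abort_on_recover_failure=0   # default: newer behavior (keep starting)

# Scan the arguments for the hypothetical switch; ignore everything else.
parse_arguments() {
    for arg in "$@"; do
        case "$arg" in
            --wsrep-recover-abort-on-failure|--wsrep-recover-abort-on-failure=1)
                abort_on_recover_failure=1 ;;
            --wsrep-recover-abort-on-failure=0)
                abort_on_recover_failure=0 ;;
        esac
    done
}

# Returns 0 if startup should continue after a failed recovery,
# 1 if it should stop (the older behavior).
continue_after_failed_recover() {
    [ "$abort_on_recover_failure" -eq 0 ]
}
```

With no option given, continue_after_failed_recover succeeds, so the newer behavior stays the default; passing the hypothetical option makes it fail, and the caller can then exit as before.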

affects: percona-xtradb-cluster
