Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC

mysqld_safe --wsrep_recover not working properly

Bug #1266837 reported by Jay Janssen on 2014-01-07

This bug affects 1 person

Affects	Status	Importance	Assigned to	Milestone
MySQL patches by Codership	Invalid	Undecided	Unassigned
Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC	Status tracked in 5.6
5.5	Confirmed	Undecided	Unassigned	Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC future-5.5
5.6	Confirmed	Undecided	Unassigned	Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC future-5.6

Bug Description

This is 5.6.15:

I cannot get --wsrep-recover to work properly:

[root@node3 ~]# mysqld_safe --wsrep-recover
140107 16:09:56 mysqld_safe Logging to '/var/lib/mysql/error.log'.
140107 16:09:56 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
140107 16:09:56 mysqld_safe Skipping wsrep-recover for 7206c8e4-7705-11e3-b175-922feecc92a0:803683 pair
140107 16:09:56 mysqld_safe Assigning 7206c8e4-7705-11e3-b175-922feecc92a0:803683 to wsrep_start_position
140107 16:09:58 mysqld_safe mysqld from pid file /var/lib/mysql/node3.pid ended

^^ This is pulling from the grastate.dat, so no actual recovery is done here. I kind of think wsrep_recover should always get the recovered position (regardless of the grastate.dat), at least this is how it worked before.

If I remove the grastate.dat, I get this instead:

[root@node3 ~]# mysqld_safe --wsrep-recover
140107 16:05:24 mysqld_safe Logging to '/var/lib/mysql/error.log'.
140107 16:05:24 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
140107 16:05:24 mysqld_safe Skipping wsrep-recover for empty datadir: /var/lib/mysql
140107 16:05:24 mysqld_safe Assigning 00000000-0000-0000-0000-000000000000:-1 to wsrep_start_position
140107 16:05:26 mysqld_safe mysqld from pid file /var/lib/mysql/node3.pid ended

[root@node3 ~]# ls /var/lib/mysql/
auto.cnf GRA_4_237147.log innobackup.prepare.log test
backup-my.cnf GRA_5_237146.log mysql xtrabackup_binary
error.log GRA_5_4.log mysql.sock xtrabackup_binlog_pos_innodb
galera.cache ibdata1 performance_schema xtrabackup_checkpoints
GRA_1_237148.log ib_logfile0 RPM_UPGRADE_HISTORY xtrabackup_galera_info
GRA_3_237145.log ib_logfile1 RPM_UPGRADE_MARKER-LAST xtrabackup_logfile

So in this case a grastate.dat must be present to do a recovery, which is non-obvious.

I found that --wsrep_recover only actually pulls the GTID from InnoDB if grastate.dat is non-parseable or zeroed.
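
The three outcomes described above can be summarized in a small sketch. This is only a model of the behavior seen in the transcripts, not the mysqld_safe source. The function name and the result labels are hypothetical, and treating an all-zero uuid as "needs recovery" is an assumption based on the "zeroed grastate" remark.

```shell
#!/bin/sh
# Model of the behavior seen in this report; NOT the mysqld_safe source.
# wsrep_recover_decision and its output labels are hypothetical names.
wsrep_recover_decision() {
    grastate="$1/grastate.dat"

    # No grastate.dat: the runs above logged
    # "Skipping wsrep-recover for empty datadir".
    if [ ! -f "$grastate" ]; then
        echo "skip-empty-datadir"
        return 0
    fi

    # Read the uuid and seqno fields (first match of each).
    uuid=$(sed -n 's/^uuid:[[:space:]]*//p' "$grastate" | head -n 1)
    seqno=$(sed -n 's/^seqno:[[:space:]]*//p' "$grastate" | head -n 1)

    # Empty, non-numeric, or negative seqno (such as -1):
    # the position is recovered from InnoDB.
    case "$seqno" in
        ''|*[!0-9]*) echo "run-recovery"; return 0 ;;
    esac

    # Missing or all-zero uuid (assumption): also treated as unknown.
    case "$uuid" in
        ''|00000000-0000-0000-0000-000000000000)
            echo "run-recovery"; return 0 ;;
    esac

    # Valid pair: recovery is skipped and the pair is assigned
    # to wsrep_start_position.
    echo "skip-use-grastate $uuid:$seqno"
}
```

Calling `wsrep_recover_decision /var/lib/mysql` on a stopped node prints which of the three paths the datadir would fall into under this model.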

Raghavendra D Prabhu (raghavendra-prabhu) wrote on 2014-01-07:

@Jay,

In an older bug (for 5.5), it was decided to skip wsrep-recover
when its results were not going to be used (i.e. not going to be
used by galera), and instead to pass the position directly from
grastate.dat to wsrep-start-position. So, what you are observing
is what was implemented then.

In addition, do you want wsrep-recover to be forced when
--wsrep_recover is provided to mysqld_safe? If so, that can be
done. It was not considered because wsrep_recover is usually
run automatically.

For now, to force recover without removing grastate.dat, you can
set seqno to -1 in grastate.dat and then start. (but this may
have other side effects).
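
A minimal sketch of that workaround, assuming mysqld is stopped and that grastate.dat has the `seqno:` line format shown elsewhere in this report. The function name is hypothetical, and the possible side effects mentioned above still apply.

```shell
#!/bin/sh
# Hypothetical helper; run only while mysqld is stopped.
# Marks the saved seqno as unknown (-1) so that position recovery is run.
force_seqno_unknown() {
    grastate=$1

    # Keep a copy so the original state can be restored.
    cp -p "$grastate" "$grastate.bak" || return 1

    # Rewrite only the seqno line; all other lines pass through unchanged.
    sed 's/^seqno:.*/seqno:   -1/' "$grastate.bak" > "$grastate"
}
```

For example, `force_seqno_unknown /var/lib/mysql/grastate.dat` leaves the previous contents in `grastate.dat.bak` next to the edited file.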

Alex Yurchenko (ayurchen) wrote on 2014-01-07:

This is Percona-specific. Marking it not applicable to codership-mysql tree.

Changed in codership-mysql:
status:	New → Invalid

Jay Janssen (jay-janssen) wrote on 2014-01-07: Re: [Bug 1266837] mysqld_safe --wsrep_recover not working properly

On Jan 7, 2014, at 11:46 AM, Raghavendra D Prabhu <email address hidden> wrote:

> In an older bug (for 5.5), it was decided to skip wsrep-recover
> when its results were not going to be used (ie. not going to be
> used by galera), instead get them directly from grastate.dat to
> wsrep-start-position. So, what you are observing is what was
> implemented then.

Ack.

>
> In addition, do you want wsrep-recover to run forced when
> --wsrep_recover is provided to mysqld_safe? If so, that can be
> done. That was not considered since wsrep_recover is run
> automatically usually.

Probably, at least in the case where the grastate is missing, because that’s the only way to get a GTID in that case (unless it’s an xtrabackup with --galera-info).

>
> For now, to force recover without removing grastate.dat, you can
> set seqno to -1 in grastate.dat and then start. (but this may
> have other side effects).

I would have preferred that a node could start, recover a GTID, *and* that --wsrep-start-position would actually *accept* it when there was no grastate.dat. Otherwise, I think this is fine.

Jay Janssen, MySQL Consulting Lead, Percona
http://about.me/jay.janssen

Raghavendra D Prabhu (raghavendra-prabhu) wrote on 2014-01-14:

>Probably, at least in the case where the grastate is missing, because that’s the only way to get a GTID in that case (unless it’s an xtrabackup with --galera-info).

Yes, though you can also do mysqld --user=mysql --wsrep-recover to achieve the same. Nevertheless, mysqld_safe can also be made to do this.

It is only when wsrep-recover is not given explicitly that the heuristic is used to skip or run it as needed.

>I would have preferred that a node could start, recover a GTID, *and* that --wsrep-start-position would actually *accept* it when there was no grastate.dat. Otherwise, I think this is fine.

The problem here is that even if you assign anything to wsrep-start-position without a grastate it is not used. So, there is no point in recovering it here. The issue with wsrep-start-position without grastate is a galera one (and there is a separate bug for this already).

Raghavendra D Prabhu (raghavendra-prabhu) wrote on 2014-01-14:

One more thing,

>This is pulling from the grastate.dat, so no actual recovery is done here. I kind of think wsrep_recover should always get the recovered position (regardless of the grastate.dat), at least this is how it worked before.

mysqld_safe itself never had the --wsrep-recover option. It passes that option to mysqld which recovers and prints it.

So, by passing wsrep-recover what happens is:

a) mysqld_safe runs wsrep-recover automatically (or skips it based on certain conditions). This is expected since wsrep-recover is not meant to be run manually.

b) mysqld is passed wsrep-recover which runs it and writes to its error log.

So, what you saw was expected.

If you need the GTID on stderr (in this case, the terminal), it needs to be run as mysqld --user=mysql --wsrep-recover.

A new option could be added to mysqld_safe to take wsrep-recover and print the result on mysqld_safe's stderr, but 'mysqld --user=mysql --wsrep-recover' is always available. Does this suffice?

Przemek (pmalkowski) wrote on 2014-06-18:

The 'mysqld_safe --wsrep-recover' command not only passes this option to mysqld but also makes it easy to check the position of each failed node and shows the result in a user-friendly way (no need to view or parse the error log).
IMHO this command should always force GTID recovery. If grastate.dat is OK and has the information (as after a graceful shutdown), you can just read it, with no need to involve mysqld_safe.
Or maybe add another option to mysqld_safe, like --force-wsrep-recover, which would do the position recovery regardless of whether grastate.dat is present.

This is how it works for me in PXC 5.6.19:

percona20 mysql> select @@version,@@version_comment;
+--------------------+---------------------------------------------------------------------------------------------------+
| @@version          | @@version_comment                                                                                 |
+--------------------+---------------------------------------------------------------------------------------------------+
| 5.6.19-67.0-56-log | Percona XtraDB Cluster (GPL), Release rel67.0, Revision 796, WSREP version 25.6, wsrep_25.6.r4096 |
+--------------------+---------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

[root@percona20 ~]# killall -9 mysqld
[root@percona20 ~]# mysql
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (111)
[root@percona20 ~]# cat /var/lib/mysql/grastate.dat
# GALERA saved state
version: 2.1
uuid: 33fbfa96-f619-11e3-abd0-faf333aa6b5f
seqno: -1
cert_index:
[root@percona20 ~]# mysqld_safe --wsrep-recover
140618 13:12:34 mysqld_safe Logging to '/var/lib/mysql/percona20_error.log'.
140618 13:12:34 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
140618 13:12:34 mysqld_safe WSREP: Running position recovery with --log_error='/var/lib/mysql/wsrep_recovery.1LJ4Xu' --pid-file='/var/lib/mysql/percona20-recover.pid'
140618 13:12:38 mysqld_safe WSREP: Recovered position 33fbfa96-f619-11e3-abd0-faf333aa6b5f:6
140618 13:12:42 mysqld_safe mysqld from pid file /var/lib/mysql/percona20.pid ended

And when gracefully stopped:

[root@percona20 ~]# /etc/init.d/mysql stop 
Shutting down MySQL (Percona XtraDB Cluster)...... SUCCESS! 
[root@percona20 ~]# cat /var/lib/mysql/grastate.dat 
# GALERA saved state
version: 2.1
uuid:    33fbfa96-f619-11e3-abd0-faf333aa6b5f
seqno:   6
cert_index:
[root@percona20 ~]# mysqld_safe --wsrep-recover
140618 13:24:13 mysqld_safe Logging to '/var/lib/mysql/percona20_error.log'.
140618 13:24:13 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
140618 13:24:13 mysqld_safe Skipping wsrep-recover for 33fbfa96-f619-11e3-abd0-faf333aa6b5f:6 pair
140618 13:24:13 mysqld_safe Assigning 33fbfa96-f619-11e3-abd0-faf333aa6b5f:6 to wsrep_start_position
140618 13:24:17 mysqld_safe mysqld from pid file /var/lib/mysql/percona20.pid ended
[root@percona20 ~]# rm -f /var/lib/mysql/grastate.dat
[root@percona20 ~]# 
[root@percona20 ~]# mysqld_safe --wsrep-recover
140618 13:24:49 mysqld_safe Logging to '/var/lib/mysql/percona20_error.log'.
140618 13:24:49 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
140618 13:24:49 mysqld_safe Skipping wsrep-recover for empty datadir: /var/lib/mysql
140618 13:24:49 mysqld_safe Assigning 00000000-0000-0000-0000-000000000000:-1 to wsrep_start_position
140618 13:24:53 mysqld_safe mysqld from pid file /var/lib/mysql/percona20.pid ended

[root@percona20 ~]# touch /var/lib/mysql/grastate.dat
[root@percona20 ~]# mysqld_safe --wsrep-recover
140618 13:35:14 mysqld_safe Logging to '/var/lib/mysql/percona20_error.log'.
140618 13:35:14 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
/usr/bin/mysqld_safe: line 228: [: -ne: unary operator expected
140618 13:35:14 mysqld_safe WSREP: Running position recovery with --log_error='/var/lib/mysql/wsrep_recovery.3Q8KWm' --pid-file='/var/lib/mysql/percona20-recover.pid'
140618 13:35:17 mysqld_safe WSREP: Recovered position 33fbfa96-f619-11e3-abd0-faf333aa6b5f:6
140618 13:35:21 mysqld_safe mysqld from pid file /var/lib/mysql/percona20.pid ended
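
The "unary operator expected" message in the run above is what bash prints when an unquoted, empty variable sits on the left of `-ne` inside `[ ]`. A sketch of the cause and one defensive form follows. The actual line 228 of mysqld_safe is not shown in this report, so the variable and function names here are hypothetical.

```shell
#!/bin/sh
# Hypothetical names; the real mysqld_safe code is not shown in this report.
#
# With an empty grastate.dat the parsed seqno is an empty string, so an
# unquoted test such as
#     if [ $seqno -ne -1 ]; then ...
# expands to `[ -ne -1 ]`, which bash rejects with
#     [: -ne: unary operator expected
#
# Quoting alone is not enough, because `[ "" -ne -1 ]` is still not an
# integer comparison. Validate the value first, then compare.
seqno_is_known() {
    case "$1" in
        # Reject empty strings, a lone "-", any character that is not a
        # digit or "-", and a "-" anywhere except the first position.
        ''|-|*[!0-9-]*|?*-*) return 1 ;;
    esac
    [ "$1" -ne -1 ]
}
```

With this guard, an empty or malformed seqno is treated the same as -1 (unknown), so recovery still runs but without the shell error.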

Btw, for me 'mysqld --user=mysql --wsrep-recover' does not print anything to stderr or stdout; the position has to be read from the error log (CentOS 6.5, PXC 5.5.37 and PXC 5.6.19).
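
Reading the position out of the log can be scripted. The sketch below assumes the log contains a "WSREP: Recovered position" line followed by a uuid:seqno pair, as in the mysqld_safe output above. The optional colon after "position" is an assumption about how mysqld itself writes the line, and the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the last recovered uuid:seqno found in a log.
# Matches "WSREP: Recovered position", an optional ":", then the pair.
recovered_position() {
    sed -n 's/.*WSREP: Recovered position:\{0,1\}[[:space:]]*\([0-9a-fA-F-]\{36\}:-\{0,1\}[0-9][0-9]*\).*/\1/p' "$1" |
        tail -n 1
}
```

For example, `recovered_position /var/lib/mysql/percona20_error.log` prints nothing if no recovery line is present, which makes it easy to test for in a script.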

Shahriyar Rzayev (rzayev-sehriyar) wrote on 2018-01-18:

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PXC-1569
