wsrep_start_position does not work unless grastate.dat is parseable

Reported by Jay Janssen on 2013-02-01
Affects: Galera. Importance: Medium. Assigned to: Unassigned.
Affects: Percona XtraDB Cluster (status tracked in Trunk).
    5.6: Importance: Undecided. Assigned to: Unassigned.
    Trunk: Importance: Undecided. Assigned to: Unassigned.

Bug Description

I would expect --wsrep_start_position to always apply and override whatever state grastate.dat is in. My use case is a datadir that was recovered from an xtrabackup which doesn't save the grastate.dat. I recovered the position with --wsrep-recover.

Submitting a --wsrep_start_position when the datadir does not have a grastate.dat does not work -- it forces a zero state and SSTs instead.

130201 12:36:21 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
130201 12:36:21 mysqld_safe WSREP: Running position recovery with --log_error=/tmp/tmp.GQzLuIeAwV
130201 12:36:26 mysqld_safe WSREP: Recovered position 8d211006-5bf5-11e2-0800-067f71542765:426885
130201 12:36:26 [Note] WSREP: wsrep_start_position var submitted: '8d211006-5bf5-11e2-0800-067f71542765:426885'
130201 12:36:26 [Note] WSREP: wsrep_start_position var submitted: '8d211006-5bf5-11e2-0800-067f71542765:426885'
130201 12:36:26 [Note] WSREP: Read nil XID from storage engines, skipping position init
130201 12:36:26 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/libgalera_smm.so'
130201 12:36:26 [Note] WSREP: wsrep_load(): Galera 2.3(r143) by Codership Oy <email address hidden> loaded succesfully.
130201 12:36:26 [Warning] WSREP: Could not open saved state file for reading: /var/lib/mysql//grastate.dat
130201 12:36:26 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1

Further, if I create an empty grastate.dat, it also fails:

[root@node3 lib]# ls -lah mysql/grastate.dat
-rw-r--r--. 1 mysql mysql 0 Feb 1 12:42 mysql/grastate.dat

130201 12:43:41 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
130201 12:43:41 mysqld_safe WSREP: Running position recovery with --log_error=/tmp/tmp.pnE4L8ze9L
130201 12:43:46 mysqld_safe WSREP: Recovered position 8d211006-5bf5-11e2-0800-067f71542765:431680
130201 12:43:46 [Note] WSREP: wsrep_start_position var submitted: '8d211006-5bf5-11e2-0800-067f71542765:431680'
130201 12:43:46 [Note] WSREP: wsrep_start_position var submitted: '8d211006-5bf5-11e2-0800-067f71542765:431680'
130201 12:43:46 [Note] WSREP: Read nil XID from storage engines, skipping position init
130201 12:43:46 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/libgalera_smm.so'
130201 12:43:46 [Note] WSREP: wsrep_load(): Galera 2.3(r143) by Codership Oy <email address hidden> loaded succesfully.
130201 12:43:46 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1

From what I can tell, wsrep_start_position only works if grastate.dat is present and has a parseable format:

[root@node3 lib]# cat mysql/grastate.dat
# GALERA saved state
version: 2.1
uuid: 8d211006-5bf5-11e2-0800-067f71542765
seqno: -1
cert_index:

[root@node3 lib]# service mysql start --wsrep_start_position=8d211006-5bf5-11e2-0800-067f71542765:434636

130201 12:49:09 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
130201 12:49:09 mysqld_safe WSREP: Running position recovery with --log_error=/tmp/tmp.TbDFDzLI6T
130201 12:49:14 mysqld_safe WSREP: Recovered position 8d211006-5bf5-11e2-0800-067f71542765:434636
130201 12:49:14 [Note] WSREP: wsrep_start_position var submitted: '8d211006-5bf5-11e2-0800-067f71542765:434636'
130201 12:49:14 [Note] WSREP: wsrep_start_position var submitted: '8d211006-5bf5-11e2-0800-067f71542765:434636'
130201 12:49:14 [Note] WSREP: Read nil XID from storage engines, skipping position init
130201 12:49:14 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/libgalera_smm.so'
130201 12:49:14 [Note] WSREP: wsrep_load(): Galera 2.3(r143) by Codership Oy <email address hidden> loaded succesfully.
130201 12:49:15 [Note] WSREP: Found saved state: 8d211006-5bf5-11e2-0800-067f71542765:-1
...
130201 12:49:15 [Note] WSREP: State transfer required:
 Group state: 8d211006-5bf5-11e2-0800-067f71542765:436317
 Local state: 8d211006-5bf5-11e2-0800-067f71542765:434636

In all other cases it does a zero state reset and forces SST. This will lead to unexpected results.
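
Based only on the third experiment above, one way to approximate the working case is to pull the recovered position out of the mysqld_safe log and write a grastate.dat stub with the matching UUID and seqno -1. The following is a minimal sketch, not an officially documented procedure: the helper names are hypothetical, the file layout is copied from the version 2.1 file shown in this report, and other Galera versions may expect a different layout.

```python
import re
from pathlib import Path

# Matches the mysqld_safe line seen in the logs above, e.g.
# "... mysqld_safe WSREP: Recovered position <uuid>:<seqno>"
RECOVERED_RE = re.compile(r"WSREP: Recovered position ([0-9a-f-]{36}):(-?\d+)")


def recovered_position(log_text):
    """Return (uuid, seqno) from the last 'Recovered position' line, or None."""
    matches = RECOVERED_RE.findall(log_text)
    if not matches:
        return None
    uuid, seqno = matches[-1]
    return uuid, int(seqno)


def grastate_stub(uuid):
    """Build grastate.dat content in the 2.1 layout from this report.

    seqno is deliberately -1: in the experiments above, that was the only
    state in which the submitted start position was used.
    """
    return (
        "# GALERA saved state\n"
        "version: 2.1\n"
        f"uuid: {uuid}\n"
        "seqno: -1\n"
        "cert_index:\n"
    )


def write_grastate_stub(datadir, uuid):
    """Write the stub into datadir (hypothetical helper; check ownership afterwards)."""
    path = Path(datadir) / "grastate.dat"
    path.write_text(grastate_stub(uuid))
    return path
```

The file would still need to be owned by the mysql user, as in the `ls -lah` output above, and the node may still require a state transfer if the group has moved past the recovered seqno.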

Jay Janssen (jay-janssen) wrote :

Forgot to mention:

Centos 6.3
Server version: 5.5.29 Percona XtraDB Cluster (GPL), wsrep_23.7.1.r3843
| wsrep_provider_version | 2.3(r143) |

Jay Janssen (jay-janssen) wrote :

To be more concise:

130302 06:52:32 mysqld_safe WSREP: Running position recovery with --log_error=/tmp/tmp.WOzvdidmIe
130302 06:52:37 mysqld_safe WSREP: Recovered position 8797f811-7f73-11e2-0800-8b513b3819c1:314673
130302 6:52:37 [Note] WSREP: wsrep_start_position var submitted: '8797f811-7f73-11e2-0800-8b513b3819c1:314673'

130302 6:52:37 [Warning] WSREP: Could not open saved state file for reading: /var/lib/mysql//grastate.dat
130302 6:52:37 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1

130302 6:52:37 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1

Shouldn't the wsrep_start_position submitted here overrule a missing grastate.dat?
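
Until the server logs this explicitly, the mismatch can be spotted by comparing the two log lines quoted above. This is a rough sketch that assumes the message texts are exactly as they appear in this report ("wsrep_start_position var submitted" and "Setting initial position to"); the function name is made up for illustration.

```python
import re

POSITION = r"([0-9a-f-]{36}:-?\d+)"
SUBMITTED_RE = re.compile(r"wsrep_start_position var submitted: '" + POSITION + "'")
INITIAL_RE = re.compile(r"Setting initial position to " + POSITION)


def start_position_ignored(log_text):
    """Compare the submitted position with the one the provider starts from.

    Returns True if they differ, False if they match, and None if either
    log line is missing from log_text.
    """
    submitted = SUBMITTED_RE.findall(log_text)
    initial = INITIAL_RE.findall(log_text)
    if not submitted or not initial:
        return None
    # Use the last occurrence of each line, i.e. the most recent startup.
    return submitted[-1] != initial[-1]
```

Run against the excerpt above, the check reports that the submitted position was not the one used.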

Fixing this would probably solve: https://bugs.launchpad.net/codership-mysql/+bug/1111706

Alex Yurchenko (ayurchen) wrote :

Jay,

traditionally, yes, a command line parameter should take precedence over any defaults, configs, etc.

The problem here is that this option is used in _automatic_ (read: unattended) node recovery: to pass a GTID value found via --wsrep-recover option from InnoDB table space. So it is not always user-supplied, and hence grastate.dat takes precedence, since InnoDB does not store DDL and other non-transactional GTIDs.

And I don't think that lp:1111706 can ever be fixed without a risk of inconsistency.

Jay Janssen (jay-janssen) wrote :

On Mar 2, 2013, at 11:52 AM, Alex Yurchenko <email address hidden> wrote:

> The problem here is that this option is used in _automatic_ (read:
> unattended) node recovery: to pass a GTID value found via
> --wsrep-recover option from InnoDB table space. So it is not always
> user-supplied, and hence grastate.dat takes precedence, since InnoDB
> does not store DDL and other non-transactional GTIDs.

It's really not clear at what points grastate takes precedence over --wsrep_start_position.

AFAIK, there are three (maybe 4) possible grastate.dat states:

1) UUID set, seqno is >= 0, indicating either a clean shutdown, or someone manually tinkering with the file
2) UUID set, seqno is -1: indicating an unclean shutdown/crash
3) UUID zeroed: wsrep abort (?) (like lp:1111706?, and RBR errors?)
4) grastate.dat missing or unparseable: someone trying to build a node from a backup, someone manually tinkering with the file, or something horrible (filesystem corruption)

AFAICT, --wsrep_start_position only works in case #2, am I right?

I can accept that #3 would not accept wsrep_start_position, since RBR errors should trigger SST. However, there should be a clear log entry explaining why wsrep_start_position is getting ignored (in any and every case that it is ignored, BTW).

However, I think --wsrep_start_position should apply in #4 -- this would make manual node recovery (say from a backup) much easier, and if you wanted to try an auto-recovery in case of #3, all you would need to do is delete the grastate and let it try to recover.
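
To make the grastate.dat states discussed in this comment concrete (with the missing or unparseable file as the fourth), here is a small sketch that classifies a file's contents. The enum and function names are hypothetical, the parsing assumes the simple "key: value" layout of the file shown earlier in this report, and the UUID is not validated beyond the all-zero check.

```python
from enum import Enum

ZERO_UUID = "00000000-0000-0000-0000-000000000000"


class GrastateCase(Enum):
    CLEAN = 1      # UUID set, seqno >= 0
    UNCLEAN = 2    # UUID set, seqno negative (-1 in practice)
    ZEROED = 3     # UUID all zeros
    MISSING = 4    # file missing or unparseable


def parse_grastate(text):
    """Collect 'key: value' pairs, skipping blank lines and '#' comments."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def classify(text):
    """Classify grastate.dat contents; pass None when the file does not exist."""
    if text is None:
        return GrastateCase.MISSING
    fields = parse_grastate(text)
    uuid = fields.get("uuid")
    try:
        seqno = int(fields.get("seqno", ""))
    except ValueError:
        return GrastateCase.MISSING
    if not uuid:
        return GrastateCase.MISSING
    if uuid == ZERO_UUID:
        return GrastateCase.ZEROED
    return GrastateCase.CLEAN if seqno >= 0 else GrastateCase.UNCLEAN
```

Under this classification, the empty file from the original report falls into the fourth case, and the hand-written file with seqno -1 falls into the second.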

Jay Janssen, MySQL Consulting Lead, Percona
http://about.me/jay.janssen
Percona Live in Santa Clara, CA April 22nd-25th 2013
http://www.percona.com/live/mysql-conference-2013/

Alex Yurchenko (ayurchen) wrote :

Jay,

Now that you put it this way, I can't find any more excuses except that we have a pile of other issues with higher priorities ATM :)

affects: codership-mysql → galera
Changed in galera:
importance: Undecided → Low
milestone: none → 3.0beta
status: New → Confirmed
Changed in galera:
importance: Low → Medium

I was looking at this from the POV of xtrabackup and SST, and found this code:

=======================

    st_.get (uuid, seqno);

    if (0 != args->state_uuid &&
        *args->state_uuid != WSREP_UUID_UNDEFINED &&
        *args->state_uuid == uuid &&
        seqno == WSREP_SEQNO_UNDEFINED)
    {
        /* non-trivial recovery information provided on startup, and db is safe
         * so use recovered seqno value */
        seqno = args->state_seqno;
    }
    log_debug << "End state: " << uuid << ':' << seqno << " #################";
    update_state_uuid (uuid);

    cc_seqno_ = seqno; // is it needed here?
    apply_monitor_.set_initial_position(seqno);
    if (co_mode_ != CommitOrder::BYPASS)
        commit_monitor_.set_initial_position(seqno);
    cert_.assign_initial_position(seqno, trx_proto_ver_);

=======================

It looks like the position provided with wsrep-start-position
is used only when its UUID matches the one in grastate.dat and
the sequence number in grastate.dat is -1.
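
The condition in the excerpt can be modeled in a few lines to check it against the cases reported in this bug. This is a Python paraphrase for illustration, not the Galera code itself; it assumes WSREP_UUID_UNDEFINED is the all-zero UUID and WSREP_SEQNO_UNDEFINED is -1, which matches the "Found saved state" lines in the logs, and the function name is made up.

```python
WSREP_UUID_UNDEFINED = "00000000-0000-0000-0000-000000000000"  # assumed value
WSREP_SEQNO_UNDEFINED = -1                                      # assumed value


def initial_position(saved_uuid, saved_seqno, arg_uuid=None,
                     arg_seqno=WSREP_SEQNO_UNDEFINED):
    """Return the (uuid, seqno) the node would start from.

    saved_* stand for what was read from grastate.dat (undefined values
    if the file is missing or unparseable); arg_* stand for the values
    passed via wsrep_start_position.
    """
    seqno = saved_seqno
    if (arg_uuid is not None
            and arg_uuid != WSREP_UUID_UNDEFINED
            and arg_uuid == saved_uuid
            and saved_seqno == WSREP_SEQNO_UNDEFINED):
        # Same history and no saved seqno: take the recovered seqno.
        seqno = arg_seqno
    # The UUID always comes from the saved state, never from the argument.
    return saved_uuid, seqno


uuid = "8d211006-5bf5-11e2-0800-067f71542765"
# Missing grastate.dat: the saved UUID is undefined, so the argument is dropped.
print(initial_position(WSREP_UUID_UNDEFINED, -1, uuid, 426885))
# Matching UUID with seqno -1: the submitted seqno is used.
print(initial_position(uuid, -1, uuid, 434636))
```

In this model the submitted UUID is never adopted on its own, which is consistent with the zero-state startups shown in the original report.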

Now, from Xtrabackup's perspective, --no-lock is only used when DDL and
non-transactional tables are not in effect (at the moment this needs
to be checked manually). So, doesn't that mean that if this is taken care of
(automatically, since SST runs unattended on the donor), then grastate.dat
won't be needed? Regarding DDL, is it not possible for the SST code in WSREP
to acquire a shared MDL lock (MDL_SHARED_READ or MDL_SHARED_HIGH_PRIO), which should block DDL?

Changed in galera:
milestone: 3.0beta → 3.0
Changed in galera:
milestone: 3.0-beta → 3.1
Changed in galera:
milestone: 25.3.1 → 25.3.2
Changed in galera:
milestone: 25.3.2 → 25.3.3
Changed in galera:
milestone: 25.3.3 → 25.3.4
no longer affects: galera/2.x
Changed in galera:
milestone: 25.3.4 → 25.3.5
Changed in galera:
milestone: 25.3.5 → 25.3.6