Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC

Grastate should be zeroed only on Replication errors

Bug #1197898 reported by Jay Janssen on 2013-07-04

This bug affects 3 people

Affects: Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone:

Bug Description

The grastate is zeroed (forcing SST without manual intervention) on almost any MySQL error. I submit that this is bad default behavior because any ungraceful mysqld exit will trigger SST. My understanding is that the zeroed grastate's primary use case is to auto-repair cluster nodes that have reached a clearly inconsistent state. As such, I think the *only* time the grastate should be zeroed is when an actual replication error (i.e., RBR apply error) happens.

To summarize:

SST-worthy:
- RBR apply error
- InnoDB checksum error
- Anything that clearly indicates a node is inconsistent with the cluster

Not worthy of an SST:
- my.cnf configuration error
- mysqld crashes for some unrelated wsrep error (we should at least *try* an IST -- why assume it won't work?)
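
The split proposed above can be sketched as a small decision function. This is only an illustration of the proposed policy, not how Galera or PXC actually decides; the failure names and the function should_zero_grastate are hypothetical and were made up for this example.

```python
from enum import Enum, auto


class Failure(Enum):
    """Hypothetical failure categories taken from the lists above."""
    RBR_APPLY_ERROR = auto()        # replication (row-based) apply error
    INNODB_CHECKSUM_ERROR = auto()  # on-disk corruption
    CONFIG_ERROR = auto()           # e.g. a bad my.cnf setting
    UNRELATED_CRASH = auto()        # ungraceful mysqld exit for another reason


# Assumption: only failures that clearly indicate the node is inconsistent
# with the cluster justify zeroing the grastate (and thus forcing SST).
SST_WORTHY = {Failure.RBR_APPLY_ERROR, Failure.INNODB_CHECKSUM_ERROR}


def should_zero_grastate(failure: Failure) -> bool:
    """Return True if the proposed policy would zero the grastate."""
    return failure in SST_WORTHY


if __name__ == "__main__":
    for failure in Failure:
        action = "zero grastate (SST)" if should_zero_grastate(failure) else "keep grastate (try IST)"
        print(f"{failure.name}: {action}")
```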

Revision history for this message

Raghavendra D Prabhu (raghavendra-prabhu) wrote on 2013-07-09:

This is likely a duplicate of lp:1111706

However, before doing so, there is this question:

Would it be fine to have an option that forces Galera to ignore
grastate.dat for that particular startup and rely only on the
coordinates from wsrep-recover? (grastate.dat would then be
created subsequently by Galera.)

It could be something like --skip-grastate, and providing that option
would imply that the DBA knows what he is doing.

This would mean that even if grastate.dat is corrupted, zeroed, lost,
or deleted, a full SST can be avoided for that node once it has been
examined to ensure that no other problems, such as an RBR error,
are present.

Tags added: grastate

Jay Janssen (jay-janssen) wrote on 2013-07-10: Re: [Bug 1197898] Grastate should be zeroed only on Replication errors

On Jul 9, 2013, at 1:46 PM, Raghavendra D Prabhu <email address hidden> wrote:

> This is likely a duplicate of lp:1111706
>

This is a super-set of lp:1111706.

> However, before doing so, there is this question:
>
> Would it be fine if there is an option which can be used to force
> galera to not consider grastate.dat for that particular startup,

The option already exists to ignore the grastate.dat: rm -f /var/lib/mysql/grastate.dat

My issue is that wsrep abort zeroes the grastate and wsrep aborts are way too common -- I'm not sure what an option to ignore the grastate has to do with that.

> thus
> relying on co-ordinates of wsrep-recover only. (and grastate.dat
> will be created subsequently by galera).
>
grastate.dat is not overwritten by wsrep-recover currently unless the grastate is non-zeroed and the seqno is -1. If you fix that as well, then this may be a better option.
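
To make the condition above concrete, here is a rough sketch that parses a grastate.dat-style file and checks whether it is in the state where, as described above, wsrep-recover would currently update it (non-zeroed uuid, seqno of -1). It assumes the usual "key: value" layout of the file; the function names and the sample uuid are made up for illustration, and this is not Galera's actual code.

```python
ZERO_UUID = "00000000-0000-0000-0000-000000000000"


def parse_grastate(text: str) -> dict:
    """Parse 'key: value' lines, skipping blank lines and '#' comments."""
    state = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        state[key.strip()] = value.strip()
    return state


def is_zeroed(state: dict) -> bool:
    """A zeroed grastate carries the all-zero uuid."""
    return state.get("uuid") == ZERO_UUID


def recover_would_update(state: dict) -> bool:
    """Condition described above: uuid is non-zeroed and seqno is -1."""
    uuid = state.get("uuid")
    return uuid is not None and uuid != ZERO_UUID and state.get("seqno") == "-1"


# Sample contents; the uuid is an invented placeholder.
SAMPLE = """\
# GALERA saved state
version: 2.1
uuid:    6f1d2c3a-e4b5-11e2-0800-aabbccddeeff
seqno:   -1
"""

if __name__ == "__main__":
    state = parse_grastate(SAMPLE)
    print("zeroed:", is_zeroed(state))
    print("wsrep-recover would update:", recover_would_update(state))
```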

> It can be like --skip-grastate and providing that option would
> imply that DBA knows what he is doing when providing it.
>

> So, this will mean that even if grastate.dat is corrupted, zeroed,
> lost/deleted etc. a full SST can be avoided for that node upon
> examination (examination to ensure that other things like RBR error etc.
> are not there).

This would still be a manual step to override a zeroed grastate. I guess that's my main concern.

Jay Janssen, MySQL Consulting Lead, Percona
http://about.me/jay.janssen

Raghavendra D Prabhu (raghavendra-prabhu) wrote on 2013-07-10:

As per the earlier discussion, can you provide the log of the wsrep
error that may have caused this?

Also, I will try to reproduce this with SIGSEGV.

Shahriyar Rzayev (rzayev-sehriyar) wrote on 2018-01-18:

Percona now uses JIRA for bug reports, so this bug report has been migrated to: https://jira.percona.com/browse/PXC-1391
