grastate.dat zeroed on misconfiguration in my.cnf

Bug #1111706 reported by Jay Janssen
This bug affects 3 people
Affects: MySQL patches by Codership
  Status: New, Importance: Undecided, Assigned to: Unassigned
Affects: Percona XtraDB Cluster (moved to https://jira.percona.com/projects/PXC)
  Status: Confirmed, Importance: Medium, Assigned to: Unassigned

Bug Description

Here is the flow:

[root@node3 mysql]# cat /var/lib/mysql/grastate.dat
# GALERA saved state
version: 2.1
uuid: 8d211006-5bf5-11e2-0800-067f71542765
seqno: -1
cert_index:

[root@node3 mysql]# service mysql stop
Shutting down MySQL (Percona XtraDB Cluster)........ SUCCESS!

[root@node3 mysql]# cat /var/lib/mysql/grastate.dat
# GALERA saved state
version: 2.1
uuid: 8d211006-5bf5-11e2-0800-067f71542765
seqno: 334193
cert_index:

Now I introduce some misconfiguration into my.cnf:

[mysqld]
...
fooey
...

Now try to restart mysql:

[root@node3 mysql]# mysqld_safe &
[1] 14755
[root@node3 mysql]# 130131 13:14:56 mysqld_safe Logging to '/var/lib/mysql/error.log'.
130131 13:14:56 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
130131 13:14:56 mysqld_safe WSREP: Running position recovery with --log_error=/tmp/tmp.dCce2nZGL4
nohup: ignoring input and appending output to `nohup.out'
130131 13:15:01 mysqld_safe WSREP: Failed to recover position:
130131 13:14:56 [Note] Plugin 'FEDERATED' is disabled.
130131 13:14:56 InnoDB: The InnoDB memory heap is disabled
130131 13:14:56 InnoDB: Mutexes and rw_locks use GCC atomic builtins
130131 13:14:56 InnoDB: Compressed tables use zlib 1.2.3
130131 13:14:56 InnoDB: Using Linux native AIO
130131 13:14:56 InnoDB: Initializing buffer pool, size = 128.0M
130131 13:14:56 InnoDB: Completed initialization of buffer pool
130131 13:14:56 InnoDB: highest supported file format is Barracuda.
130131 13:14:56 InnoDB: Waiting for the background threads to start
130131 13:14:57 Percona XtraDB (http://www.percona.com) 1.1.8-rel29.1 started; log sequence number 1337199429
130131 13:14:57 [ERROR] /usr/sbin/mysqld: unknown option '--fooey'
130131 13:14:57 [ERROR] Aborting
130131 13:14:57 InnoDB: Starting shutdown...
130131 13:15:01 InnoDB: Shutdown completed; log sequence number 1337199429
130131 13:15:01 [Note] /usr/sbin/mysqld: Shutdown complete

[root@node3 mysql]# cat /var/lib/mysql/grastate.dat
# GALERA saved state
version: 2.1
uuid: 00000000-0000-0000-0000-000000000000
seqno: -1
cert_index:
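For reference (not part of the original flow): once the offending option is removed from my.cnf, the lost position can usually be read back from InnoDB with the standard wsrep recovery invocation rather than paying for a full SST; the output shown below is approximate:

mysqld --user=mysql --wsrep-recover
# prints a line like: WSREP: Recovered position: <uuid>:<seqno>
# which can then be passed as --wsrep_start_position=<uuid>:<seqno>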

According to Codership, this should not happen:

"[1/31/13 1:21:22 PM] Alexey Yurchenko / Codership: I'm seeing the same logs here, the same failure to recover, but grastate.dat stays intact."

Therefore this must be something specific to PXC.

[root@node3 mysql]# rpm -qa | grep -i percona
Percona-XtraDB-Cluster-galera-2.0-1.143.rhel6.i686
Percona-XtraDB-Cluster-shared-5.5.29-23.7.1.387.rhel6.i686
percona-xtrabackup-2.0.5-499.rhel6.i686
Percona-XtraDB-Cluster-client-5.5.29-23.7.1.387.rhel6.i686
Percona-XtraDB-Cluster-server-5.5.29-23.7.1.387.rhel6.i686

| wsrep_provider_version | 2.3(r143) |

[root@node3 mysql]# uname -a
Linux node3 2.6.32-279.19.1.el6.i686 #1 SMP Wed Dec 19 04:30:58 UTC 2012 i686 i686 i386 GNU/Linux
[root@node3 mysql]# cat /etc/issue
CentOS release 6.3 (Final)
Kernel \r on an \m

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

I can repeat this.

Changed in percona-xtradb-cluster:
status: New → Confirmed
Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

I think this should be repeatable in vanilla galera/wsrep as well.

This requires a node A which:

a) Has a state preserved in grastate.dat.
b) Is shut down, after which some junk is added to its my.cnf.
c) Falls behind while the other node receives some additional data.
d) On startup, A will require IST, but due to a race condition between the variable verifier/cnf parser and the SST mechanism, its grastate.dat file gets zeroed (see the shell sketch below).
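A rough shell sketch of these steps (node names, paths, and the sample statement are illustrative, not taken from an actual run):

# on node A, which already holds a valid grastate.dat:
service mysql stop
# add an unknown option (e.g. "fooey") under [mysqld] in /etc/my.cnf
# on node B, write some additional data so that A would need IST to rejoin:
mysql -e "CREATE TABLE IF NOT EXISTS test.t (i INT); INSERT INTO test.t VALUES (1)"
# back on node A:
service mysql start                 # startup aborts on the unknown option
cat /var/lib/mysql/grastate.dat     # uuid is zeroed, seqno is -1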

I will also attach the debug trace (obtained with --debug)

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

Debug trace obtained with

mysqld-debug --debug --user=mysql

for the grastate.dat zeroing.
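For reference, a debug build's --debug option writes a DBUG trace to /tmp/mysqld.trace by default; a specific trace file and flag set can also be requested (the path below is illustrative):

mysqld-debug --debug=d:t:i:o,/tmp/grastate.trace --user=mysql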

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

a)

The fact that a variable is unknown is known only after plugin
initialization in init_server_components(). However, wsrep_init is
done before plugin initialization in the same function. I don't think
it is possible to do it after plugin initialization, so this
limitation remains.

b) Having said that, it still shouldn't overwrite
grastate.dat, particularly when wsrep-start-position has been
provided.

c) http://sprunge.us/TeSh has the backtrace. mark_unsafe is what
zeroes the file.

d) Also note, this applies not only to an incorrect variable; any other error during IST can produce similar results.
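For reference on point (b), an explicit start position can be supplied in my.cnf; the uuid:seqno pair below is the one recorded in the grastate.dat shown earlier in this report:

[mysqld]
wsrep_start_position=8d211006-5bf5-11e2-0800-067f71542765:334193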

Revision history for this message
Alex Yurchenko (ayurchen) wrote :

Raghavendra,

The logic is this: we start a state transfer, and we may crash before we know how it ended. In that case we have no idea what state the datadir is in, so technically we cannot leave grastate.dat valid. That's why mark_unsafe() was strategically (and deliberately) placed so that any state transfer error leaves the grastate file invalidated. It also gets deliberately invalidated if any inconsistency is detected (e.g. a duplicate key) while applying writesets, including during IST.
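In practice the invalidation rewrites grastate.dat with the zero UUID and seqno -1, which is what forces a full SST on the next start. The end state is the same zeroed file shown earlier in this report, roughly (an illustration of the resulting file only, not the actual galera code path):

cat > /var/lib/mysql/grastate.dat <<EOF
# GALERA saved state
version: 2.1
uuid:    00000000-0000-0000-0000-000000000000
seqno:   -1
cert_index:
EOF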

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

@Alexey,

Got it. But in this case I presume the IST hasn't started applying
yet, so the data shouldn't have been touched. I wonder if it is
possible to call mark_unsafe just before any changes are made, or to
run it transactionally, i.e. if something goes wrong, revert
grastate/gcache to the earlier state.

Revision history for this message
Alex Yurchenko (ayurchen) wrote :

That is the problem: SST is not transactional, and until we initialize the storage engines we cannot know how it went. We could notify the wsrep provider before initializing the storage engines, but that means a whole new call in the wsrep API just to gracefully handle a MySQL-specific misconfiguration. Sounds like overkill to me.

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

@Alexey,

Yes, in this case it may be overkill, since misconfiguration is
not a frequent event. However, I wonder whether this can happen in
other contexts (while testing this I may have stumbled onto one or
two), where a failure during IST (due to the network etc.) also
leaves the node in the same state.

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

I wonder if lp:1147066 will require a similar fix.

Revision history for this message
Alex Yurchenko (ayurchen) wrote :

@Raghavendra,

this should not happen on IST failure, UNLESS the failure is due to an inability to apply a writeset. Aside from possible bugs in parallel applying, that is a sure sign of DB corruption. So a plain network failure during IST should not result in zeroing grastate.dat.

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

@Alex,

Ack, I will try to reproduce the other IST issues in a separate
issue.

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

While testing crash safety for another issue, I noticed that if a
node crashes during IST, its grastate is zeroed and it then needs a
full SST. Granted, it is not often that a crash happens at precisely
that moment; however, it does make IST vulnerable to crashes. Since
the database itself is ACID at the IST stage, would it be better to
keep a backup of grastate.dat, so that after a crash it can be
restored and IST attempted?
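A minimal operator-side sketch of that backup idea (my own illustration; only safe if the failure happened before any writesets were applied, per the next comment):

cp -a /var/lib/mysql/grastate.dat /var/lib/mysql/grastate.dat.bak
# ... the restart/IST attempt fails before anything was applied ...
cp -a /var/lib/mysql/grastate.dat.bak /var/lib/mysql/grastate.dat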

Revision history for this message
Alex Yurchenko (ayurchen) wrote :

The thing is that IST failure may also mean database inconsistency. In this case SST is the safest choice for automatic recovery.

tags: added: grastate
Revision history for this message
Przemek (pmalkowski) wrote :

I can see the same problem when, for example, the SST configuration is incompatible between the donor and the joiner, even in a situation where IST would be completely sufficient to join the cluster (tested on PXC 5.6.26 and MariaDB Cluster 10.0.21):

[root@percona2 ~]# /etc/init.d/mysql stop
Shutting down MySQL (Percona XtraDB Cluster)..... SUCCESS!

[root@percona2 ~]# cat /data/myisam/grastate.dat
# GALERA saved state
version: 2.1
uuid: fadf885a-490b-11e4-9e13-27f574d828b6
seqno: 449486
cert_index:

-- edit my.cnf on joiner only:
[sst]
#streamfmt=xbstream
streamfmt=tar

error log fragment on restart:

2015-10-24 14:43:38 19271 [Note] WSREP: Requesting state transfer: success, donor: 0
tar: This does not look like a tar archive
tar: Exiting with failure status due to previous errors
2015-10-24 14:43:40 19271 [Note] WSREP: 0.0 (percona1): State transfer to 1.0 (percona2) complete.
2015-10-24 14:43:40 19271 [Note] WSREP: Member 0.0 (percona1) synced with group.
WSREP_SST: [ERROR] Error while getting data from donor node: exit codes: 0 2 (20151024 14:43:40.673)
WSREP_SST: [ERROR] Cleanup after exit with status:32 (20151024 14:43:40.679)
2015-10-24 14:43:40 19271 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '192.168.99.3:4444' --datadir '/data/myisam/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --parent '19271' '' : 32 (Broken pipe)
2015-10-24 14:43:40 19271 [ERROR] WSREP: Failed to read uuid:seqno from joiner script.
2015-10-24 14:43:40 19271 [ERROR] WSREP: SST script aborted with error 32 (Broken pipe)
2015-10-24 14:43:40 19271 [ERROR] WSREP: SST failed: 32 (Broken pipe)
2015-10-24 14:43:40 19271 [ERROR] Aborting
...

[root@percona2 ~]# cat /data/myisam/grastate.dat
# GALERA saved state
version: 2.1
uuid: 00000000-0000-0000-0000-000000000000
seqno: -1
cert_index:

Should an SST misconfiguration lead to IST failure and a grastate.dat reset? IMHO not.
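For reference, the SST stream format has to match on donor and joiner, so the fix here is either keeping the default on both sides or setting the same value explicitly on both, e.g.:

[sst]
streamfmt=xbstream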

Przemek (pmalkowski)
tags: added: i61146
Changed in percona-xtradb-cluster:
importance: Undecided → Medium
Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PXC-1049
