SST Resumes Even When Donor Was Already Detected as SYNCED State | wsrep_desync conflict

Bug #1288528 reported by Jervin R on 2014-03-06
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Galera
Status tracked in 3.x
2.x
Undecided
Unassigned
3.x
Undecided
Unassigned
MySQL patches by Codership
Status tracked in 5.6
5.5
Undecided
Unassigned
5.6
Undecided
Unassigned
Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC
Status tracked in 5.6
5.5
New
Undecided
Unassigned
5.6
Fix Released
Undecided
Unassigned

Bug Description

1. donor> SET GLOBAL wsrep_desync=ON;
2. joiner> -- start SST, wait for it to complain on error log that donor is not available
3. donor> SET GLOBAL wsrep_desync=OFF;
4. joiner> -- wait until it startes clobbering its files
5. donor> SET GLOBAL wsrep_desync=OFF;

[root@pxc02 ~]# rpm -qa|grep -i percona
Percona-XtraDB-Cluster-client-56-5.6.15-25.4.731.rhel6.x86_64
percona-xtrabackup-debuginfo-2.1.7-721.rhel6.x86_64
Percona-XtraDB-Cluster-galera-3-debuginfo-3.3-1.207.rhel6.x86_64
percona-xtrabackup-2.1.7-721.rhel6.x86_64
Percona-XtraDB-Cluster-devel-56-5.6.15-25.4.731.rhel6.x86_64
Percona-XtraDB-Cluster-galera-3-3.3-1.207.rhel6.x86_64
Percona-XtraDB-Cluster-server-56-5.6.15-25.4.731.rhel6.x86_64
Percona-XtraDB-Cluster-56-debuginfo-5.6.15-25.4.731.rhel6.x86_64
Percona-XtraDB-Cluster-56-5.6.15-25.4.731.rhel6.x86_64
Percona-XtraDB-Cluster-shared-56-5.6.15-25.4.731.rhel6.x86_64
Percona-Server-shared-compat-5.5.35-rel33.0.611.rhel6.x86_64

Jervin R (revin) wrote :
Download full text (3.4 KiB)

> Step #1
2014-03-05 20:34:30 6855 [Note] WSREP: Node 2.0 (pxc02) resyncs itself to group
2014-03-05 20:34:30 6855 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 5957732)
2014-03-05 20:34:30 6855 [Note] WSREP: Member 2 (pxc02) synced with group.

> Step #2
2014-03-05 20:34:30 6855 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 5957732)
2014-03-05 20:34:30 6855 [Note] WSREP: Synchronized with group, ready for connections
2014-03-05 20:34:31 6855 [Note] WSREP: Node 0.0 (pxc03) requested state transfer from 'pxc02'. Selected 2.0 (pxc02)(SYNCED) as donor.
2014-03-05 20:34:31 6855 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 5957732)
2014-03-05 20:34:31 6855 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2014-03-05 20:34:31 6855 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'donor' --address '192.168.56.44:4444/xtrabackup_sst' --auth '(null)' --socket '/var/lib/mysql/mysqld.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' '' --gtid 'de9d6a8d-a105-11e3-9d36-7789b5225560:5957732''
2014-03-05 20:34:31 6855 [Note] WSREP: sst_donor_thread signaled with 0
WSREP_SST: [INFO] Streaming with xbstream (20140305 20:34:31.664)
WSREP_SST: [INFO] Using socat as streamer (20140305 20:34:31.665)
WSREP_SST: [INFO] Streaming GTID file before SST (20140305 20:34:31.760)
WSREP_SST: [INFO] Evaluating xbstream -c ${INFO_FILE} | socat -u stdio TCP:192.168.56.44:4444; RC=( ${PIPESTATUS[@]} ) (20140305 20:34:31.762)
WSREP_SST: [INFO] Sleeping before data transfer for SST (20140305 20:34:31.766)
2014-03-05 20:34:32 6855 [Note] WSREP: 2.0 (pxc02): State transfer to 0.0 (pxc03) complete.
2014-03-05 20:34:32 6855 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 5957732)

> Step #5
2014-03-05 20:34:32 6855 [Note] WSREP: Member 2 (pxc02) synced with group.
2014-03-05 20:34:32 6855 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 5957732)
2014-03-05 20:34:32 6855 [Note] WSREP: Synchronized with group, ready for connections
2014-03-05 20:34:32 6855 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2014-03-05 20:34:35 6855 [Warning] WSREP: Protocol violation. JOIN message sender 2.0 (pxc02) is not in state transfer (SYNCED). Message ignored.
WSREP_SST: [INFO] Streaming the backup to joiner at 192.168.56.44 4444 (20140305 20:34:41.768)
WSREP_SST: [INFO] Evaluating innobackupex --defaults-file=/etc/my.cnf --no-version-check $INNOEXTRA --galera-info --stream=$sfmt ${TMPDIR} 2>${DATA}/innobackup.backup.log | socat -u stdio TCP:192.168.56.44:4444; RC=( ${PIPESTATUS[@]} ) (20140305 20:34:41.770)
2014-03-05 20:34:46 6855 [Note] WSREP: Provider paused at de9d6a8d-a105-11e3-9d36-7789b5225560:5957732 (3195887)
2014-03-05 20:34:48 6855 [Note] WSREP: resuming provider at 3195887
2014-03-05 20:34:48 6855 [Note] WSREP: Provider resumed.

> ERROR Indicated here, jowever the joiner (pxc03) still completed SST
2014-03-05 20:34:48 6855 [ERROR] WSREP: sst sent called when not SST donor, state SYNCED
WSREP_SST: [INFO] Total time on donor: 0 seconds (20140305 20:34:48.152)
2014-03-05 20:34:52 6855 [Note] WSREP: 0.0 (pxc03): State transfer from 2.0 (pxc02) complete.
2014-03-05 20:34:52 6855 [Note] WSREP: Member 0 (pxc03) s...

Read more...

Alex Yurchenko (ayurchen) wrote :

From what I understand, the bug is essentially about SET GLOBAL wsrep_desync being "thread unsafe".

Alex Yurchenko (ayurchen) wrote :

The solution would be: a session can't set wsrep_desync=OFF before it first sets it ON.

Download full text (9.2 KiB)

a) Why is wsrep_desync used on donor since it goes to DONOR/Desynced anyways? If it is to reproduce the issue then fine.

b) I tried to reproduce this.

Yes, if desync=ON, the joiner hangs, but after it is set to OFF,
it goes forward fine.

Also, the second "wsrep_desync=OFF;" should be idempotent now
where it gives:

=====================
MySQL [test]> set global wsrep_desync=OFF;
ERROR 1231 (42000): Variable 'wsrep_desync' can't be set to the value of 'OFF'
=======================

or should it be " 5. donor> SET GLOBAL wsrep_desync=ON;" ?

Here are my logs:

Donor:

140308 20:47:35 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 92107)
140308 20:47:50 [Note] WSREP: declaring cfb2ec81-a6d4-11e3-98d8-1ef20a3c488f stable
140308 20:47:50 [Note] WSREP: Node 049f90ce-a6d1-11e3-bb55-3bbb14bffdcd state prim
140308 20:47:50 [Note] WSREP: view(view_id(PRIM,049f90ce-a6d1-11e3-bb55-3bbb14bffdcd,6) memb {
        049f90ce-a6d1-11e3-bb55-3bbb14bffdcd,
        cfb2ec81-a6d4-11e3-98d8-1ef20a3c488f,
} joined {
} left {
} partitioned {
})
140308 20:47:50 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 2
140308 20:47:50 [Note] WSREP: STATE_EXCHANGE: sent state UUID: cffff748-a6d4-11e3-96a3-bbee5342f5db
140308 20:47:50 [Note] WSREP: STATE EXCHANGE: sent state msg: cffff748-a6d4-11e3-96a3-bbee5342f5db
140308 20:47:50 [Note] WSREP: STATE EXCHANGE: got state msg: cffff748-a6d4-11e3-96a3-bbee5342f5db from 0 (Arch1)
140308 20:47:50 [Note] WSREP: STATE EXCHANGE: got state msg: cffff748-a6d4-11e3-96a3-bbee5342f5db from 1 (Arch2)
140308 20:47:50 [Note] WSREP: Quorum results:
        version = 2,
        component = PRIMARY,
        conf_id = 5,
        members = 1/2 (joined/total),
        act_id = 92107,
        last_appl. = 0,
        protocols = 0/4/2 (gcs/repl/appl),
        group UUID = d4bea759-94bb-11e3-b9c3-be3eb9eca9c6
140308 20:47:50 [Note] WSREP: Flow-control interval: [23, 23]
140308 20:47:50 [Note] WSREP: New cluster view: global state: d4bea759-94bb-11e3-b9c3-be3eb9eca9c6:92107, view# 6: Primary, number of nodes: 2, my index: 0, protocol version 2
140308 20:47:50 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
140308 20:47:50 [Note] WSREP: Assign initial position for certification: 92107, protocol version: 2
140308 20:48:25 [Note] WSREP: Node 0 (Arch1) resyncs itself to group
140308 20:48:25 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 92107)
140308 20:48:25 [Note] WSREP: Member 0 (Arch1) synced with group.
140308 20:48:25 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 92107)
140308 20:48:25 [Note] WSREP: Synchronized with group, ready for connections
140308 20:48:26 [Note] WSREP: Node 1 (Arch2) requested state transfer from '*any*'. Selected 0 (Arch1)(SYNCED) as donor.
140308 20:48:26 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 92107)
140308 20:48:26 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
140308 20:48:26 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'donor' --address '127.0.0.1:15001/xtrabackup_sst' --auth '(null)' --socket '/pxc/datadir/pxc.sock' --datadir '/pxc/datadir/' --defaults-file '/pxc/etc/my.cnf.local' --g...

Read more...

I tested above on 5.5. Please check if 5.5 is also reproduceable.

> The solution would be: a session can't set wsrep_desync=OFF before it first sets it ON.

Currently, it is a globla variable.

So, wrt. wsrep_desync and state transfer there are two ways it
can go:

a) wsrep_desync=ON and then a SST request comes to this node -
this works fine since this won't be chosen as a donor then

b) SST request is sent to this node and then wsrep_desync is set
to ON and OFF (probably by a backup application which is unaware
of SST going on).

It is b) which can be problematic, particularly, the setting
wsrep_desync to OFF which can potentially break SST.

wsrep->resync() and in turn, resync in galera needs to check
state transfer into consideration.

Alex Yurchenko (ayurchen) wrote :

Ok, good point about that wsrep_desync is global. The problem is now, if wsrep_desync is already ON, should we allow SET GLOBAL wsrep_desync=ON again (since as you said it should be idempotent)? Or should we block? Or return error?
(of course when SST is going on any wsrep_desync modification must be forbidden, but should it be forbidden when just two clients request wsrep_desync=ON independently? I think it should, but some may argue.)

> Ok, good point about that wsrep_desync is global. The problem is now, if wsrep_desync is already ON, should we allow SET GLOBAL wsrep_desync=ON again (since as you said it should be idempotent)? Or should we block? Or return error?

There is/was a bug in wsrep_desync.

I have fixed it in https://bugs.launchpad.net/percona-xtradb-cluster/+bug/1281696 as follows:

  Bug#1281696 Fix the wsrep_desync warnings; also return true in wsrep_desync_check to avoid error in wsrep_desync_update
diff:
=== modified file 'sql/wsrep_var.cc'
--- sql/wsrep_var.cc 2014-01-14 15:59:04 +0000
+++ sql/wsrep_var.cc 2014-02-27 11:31:26 +0000
@@ -494,16 +494,20 @@

 bool wsrep_desync_check (sys_var *self, THD* thd, set_var* var)
 {
- bool new_wsrep_desync = var->value->val_bool();
+ bool new_wsrep_desync = var->save_result.ulonglong_value;
   if (wsrep_desync == new_wsrep_desync) {
     if (new_wsrep_desync) {
+ WSREP_DEBUG("wsrep_desync is already ON.");
       push_warning (thd, MYSQL_ERROR::WARN_LEVEL_WARN,
                    ER_WRONG_VALUE_FOR_VAR,
                    "'wsrep_desync' is already ON.");
+ return true;
     } else {
+ WSREP_DEBUG("wsrep_desync is already OFF.");
       push_warning (thd, MYSQL_ERROR::WARN_LEVEL_WARN,
                    ER_WRONG_VALUE_FOR_VAR,
                    "'wsrep_desync' is already OFF.");
+ return true;
     }
   }
   return 0;

The problem, reported in this bug, however is not due to
wsrep_desync. This is how it probably goes:

a) Node receives State transfer request.
b) Node becomes Donor and enters Donor/Desynced state.
c) During this, a mysql client does wsrep_desync=ON - this will
return 'wsrep_desync is already ON'.
d) Now, this client doesn wsrep_desync=OFF while state transfer
is in progress. This is not prevented.
e) SST from this donor node can probably break due to this.

So, steps a) - c) are fine, it is the step d) which is
problematic.

I think this can be handled in mysqld itself, probably with
LOCK_wsrep_sst, in wsrep_desync_update.

Alex Yurchenko (ayurchen) wrote :

No, handling it in mysqld is prone to race. It must be handled in Galera. Not yet sure how.

Actually, doing this in galera would be better since, donor/desynced are achieved in similar ways in mysqld, so can be confusing and subject to race conditions. In galera ::resync shouldn't break a current donor involved in SST.

@Alex,

Agree, doing it in galera is better.

tags: added: desync
summary: - SST Resumes Even When Donor Was Already Detected as SYNCED State
+ SST Resumes Even When Donor Was Already Detected as SYNCED State |
+ wsrep_desync conflict

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PXC-1638

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers