wsrep_OSU_method=RSU and wsrep_desync=on conflict

Bug #1330941 reported by Kenny Gryp on 2014-06-17
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
Galera | | Undecided | Unassigned |
Percona XtraDB Cluster (moved to https://jira.percona.com/projects/PXC) | Invalid | High | Unassigned |

Bug Description

Version: 5.6.15-56 Percona XtraDB Cluster (GPL), Release 25.5, Revision 759, wsrep_25.5.r4061

node1 mysql> set global wsrep_osu_method=rsu;
Query OK, 0 rows affected (0.00 sec)

We switch to the RSU method.

node1 mysql> set global wsrep_desync=off;
Query OK, 0 rows affected (0.00 sec)
2014-06-17 10:14:26 3875 [Note] WSREP: Member 0.0 (node1) resyncs itself to group
2014-06-17 10:14:26 3875 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 33)
2014-06-17 10:14:26 3875 [Note] WSREP: Member 0.0 (node1) synced with group.
2014-06-17 10:14:26 3875 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 33)
2014-06-17 10:14:26 3875 [Note] WSREP: Synchronized with group, ready for connections

The node is primary and synced, and wsrep_desync is explicitly set to off.

node1 mysql> alter table test add index (a);
2014-06-17 10:14:28 3875 [Note] WSREP: Member 0.0 (node1) desyncs itself from group
2014-06-17 10:14:28 3875 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 33)
2014-06-17 10:14:28 3875 [Note] WSREP: Provider paused at 62eb8c72-f601-11e3-b42c-ab6847529d86:33 (54)
2014-06-17 10:14:30 3875 [Note] WSREP: resuming provider at 54
2014-06-17 10:14:30 3875 [Note] WSREP: Provider resumed.
2014-06-17 10:14:30 3875 [Note] WSREP: Member 0.0 (node1) resyncs itself to group
2014-06-17 10:14:30 3875 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 33)
2014-06-17 10:14:30 3875 [Note] WSREP: Member 0.0 (node1) synced with group.
2014-06-17 10:14:30 3875 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 33)
2014-06-17 10:14:30 3875 [Note] WSREP: Synchronized with group, ready for connections
Query OK, 0 rows affected, 1 warning (2.21 sec)
Records: 0 Duplicates: 0 Warnings: 1

The index is added (the warning is because it duplicates an existing index).
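The WSREP log lines above trace the node through the Galera state machine during an RSU ALTER: SYNCED -> DONOR/DESYNCED -> JOINED -> SYNCED. A minimal Python sketch, assuming only the "Shifting X -> Y (TO: n)" line format shown in this report, that pulls those transitions out of an error log so they can be inspected; the function name is hypothetical.

```python
import re

# Matches Galera state-transition lines in the format seen in this report, e.g.
#   "... WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 33)"
SHIFT_RE = re.compile(r"WSREP: Shifting (\S+) -> (\S+) \(TO: (\d+)\)")


def state_transitions(log_text):
    """Return (from_state, to_state, seqno) tuples in the order they appear."""
    return [(m.group(1), m.group(2), int(m.group(3)))
            for m in SHIFT_RE.finditer(log_text)]


# Excerpt of the log produced by the RSU ALTER TABLE above.
LOG = """\
2014-06-17 10:14:28 3875 [Note] WSREP: Member 0.0 (node1) desyncs itself from group
2014-06-17 10:14:28 3875 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 33)
2014-06-17 10:14:30 3875 [Note] WSREP: Member 0.0 (node1) resyncs itself to group
2014-06-17 10:14:30 3875 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 33)
2014-06-17 10:14:30 3875 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 33)
"""

if __name__ == "__main__":
    for src, dst, seqno in state_transitions(LOG):
        print(f"{src} -> {dst} at {seqno}")
```

Lines that are not state shifts (such as "desyncs itself from group") are ignored, so the same function can be pointed at a full error log.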

node1 mysql> set global wsrep_desync=on;
2014-06-17 10:14:34 3875 [Note] WSREP: Member 0.0 (node1) desyncs itself from group
2014-06-17 10:14:34 3875 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 33)
Query OK, 0 rows affected (0.02 sec)

Let's desync the node.

node1 mysql> alter table test add index (a);
2014-06-17 10:14:35 3875 [ERROR] WSREP: Node desync failed.: 11 (Resource temporarily unavailable)
     at galera/src/replicator_smm.cpp:desync():1623
2014-06-17 10:14:35 3875 [Warning] WSREP: RSU desync failed 3 for alter table test add index (a)
2014-06-17 10:14:35 3875 [Warning] WSREP: ALTER TABLE isolation failure
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction

If we then try to do an ALTER TABLE, it fails immediately.

I believe that either it should become possible to do RSU on a desynced node, or a more appropriate error should be shown (we have had customers use such scenarios, and it got them stuck).
This could be useful as part of a migration, where the node needs to be desynced during the process and a schema change is necessary.

Kenny Gryp (gryp) wrote :

Note: with wsrep_osu_method=TOI, ALTER TABLE does work while wsrep_desync is on.

This is related to desync not working in tandem with manual external changes and state changes of the Galera FSM. A similar case, a conflict of wsrep_desync with SST, is described in https://bugs.launchpad.net/percona-xtradb-cluster/+bug/1288528.

This needs to be fixed in Galera.

summary: - combining wsrep_OSU_method=RSU and wsrep_desync=on gives strange error
- #Usability
+ wsrep_OSU_method=RSU and wsrep_desync=on conflict

Specifically, here:

node1 mysql> alter table test add index (a);
2014-06-17 10:14:35 3875 [ERROR] WSREP: Node desync failed.: 11 (Resource temporarily unavailable)
     at galera/src/replicator_smm.cpp:desync():1623
2014-06-17 10:14:35 3875 [Warning] WSREP: RSU desync failed 3 for alter table test add index (a)
2014-06-17 10:14:35 3875 [Warning] WSREP: ALTER TABLE isolation failure
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction

RSU calls wsrep->desync(), which fails because the node was already desynced earlier via wsrep_desync.
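The conflict can be illustrated with a toy Python model. This is not Galera's code: the class names are hypothetical, and the only grounded detail is that a second desync request fails with error 11, which is EAGAIN ("Resource temporarily unavailable") on Linux. The second class sketches one possible way to make "RSU on a desynced node" work, as the description suggests: count desync requests so that RSU can nest inside wsrep_desync=ON.

```python
import errno


class NonCountingDesync:
    """Hypothetical model of the reported behavior: a desync request while
    the node is already desynced fails with -EAGAIN."""

    def __init__(self):
        self.desynced = False

    def desync(self):
        if self.desynced:
            return -errno.EAGAIN  # "Resource temporarily unavailable"
        self.desynced = True
        return 0

    def resync(self):
        self.desynced = False


class CountingDesync:
    """Hypothetical alternative: desync requests are reference-counted, and
    the node only resyncs when the last holder releases it."""

    def __init__(self):
        self.count = 0

    def desync(self):
        self.count += 1
        return 0

    def resync(self):
        if self.count == 0:
            raise RuntimeError("resync without a matching desync")
        self.count -= 1

    @property
    def desynced(self):
        return self.count > 0


def run_rsu_alter(node):
    """Simulate an RSU DDL: desync, apply the DDL locally, resync.
    Returns 0 on success or the negative error code from desync()."""
    rc = node.desync()
    if rc != 0:
        return rc  # corresponds to "RSU desync failed" in the log
    # The DDL would run here, isolated from the rest of the cluster.
    node.resync()
    return 0


if __name__ == "__main__":
    for cls in (NonCountingDesync, CountingDesync):
        node = cls()
        node.desync()  # SET GLOBAL wsrep_desync=ON
        print(cls.__name__, run_rsu_alter(node), node.desynced)
```

In the non-counting model the second request is rejected outright, mirroring the error in the report; in the counting model the ALTER succeeds and the node stays desynced afterwards, because the wsrep_desync=ON request still holds it.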

tags: added: desync
Muhammad Irfan (muhammad-irfan) wrote :

Verified as described.

mysql> show variables like '%version%';
+-------------------------+----------------------------------------------------------------------------+
| Variable_name | Value |
+-------------------------+----------------------------------------------------------------------------+
| innodb_version | 5.6.15-rel63.0 |
| protocol_version | 10 |
| slave_type_conversions | |
| version | 5.6.15-56 |
| version_comment | Percona XtraDB Cluster (GPL), Release 25.5, Revision 759, wsrep_25.5.r4061 |
| version_compile_machine | x86_64 |
| version_compile_os | Linux |
+-------------------------+----------------------------------------------------------------------------+

Changed in percona-xtradb-cluster:
status: New → Confirmed
Changed in percona-xtradb-cluster:
importance: Undecided → High
Krunal Bauskar (krunal-bauskar) wrote :

I tried this with the latest version from trunk, and here is what I see.

1. Started a 2-node cluster (node-1 and node-2).

2. Ran the workload on node-1:

mysql> create table test (a int);
Query OK, 0 rows affected (0.04 sec)

mysql> set global wsrep_osu_method=rsu;
Query OK, 0 rows affected (0.00 sec)

mysql> set global wsrep_desync=off;
ERROR 1231 (42000): Variable 'wsrep_desync' can't be set to the value of 'OFF'
mysql> alter table test add index (a);
Query OK, 0 rows affected (0.05 sec)
Records: 0 Duplicates: 0 Warnings: 0

mysql> set global wsrep_desync=on;
Query OK, 0 rows affected (0.01 sec)

mysql> alter table test add index (a);
Query OK, 0 rows affected, 1 warning (0.07 sec)
Records: 0 Duplicates: 0 Warnings: 1

mysql> show warnings;
+-------+------+------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+------------------------------------------------------------------------------------------------------------------------+
| Note | 1831 | Duplicate index 'a_2' defined on the table 'test.test'. This is deprecated and will be disallowed in a future release. |
+-------+------+------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

------------------------

a. I didn't observe the reported error. Either the bug's test case is missing some initial steps or settings, or the behavior has changed since the bug was logged.

----------------------------------------------------------------------------------------------------------------------------------

That said, RSU is a special mode provided to facilitate the execution of major DDL workloads.
There is a set process for executing such workloads under RSU (here is a good example:
http://severalnines.com/blog/online-schema-upgrade-mysql-galera-cluster-using-rsu-method).
If that process is followed, we can expect proper behavior.
Anything beyond it can cause undefined behavior, which I don't consider a bug
(of course, it shouldn't crash the cluster).
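The report does not spell out that process. As a rough sketch, assuming the common one-node-at-a-time RSU pattern (switch the node to RSU, run the DDL, switch back to TOI, confirm the node is Synced before moving on), the following Python builds the statement plan without connecting to anything. The function name and the `desync_state` input are hypothetical; the caller would fill `desync_state` from something like SHOW VARIABLES LIKE 'wsrep_desync' on each node. The guard refuses nodes with wsrep_desync ON, since that combination is what triggers the failure in this report.

```python
from typing import Dict, List, Tuple


def rsu_rollout_plan(nodes: List[str], ddl: str,
                     desync_state: Dict[str, bool]) -> List[Tuple[str, str]]:
    """Return (node, statement) pairs for applying `ddl` under RSU,
    one node at a time. Raises ValueError if any node has wsrep_desync ON."""
    # Check every node up front so the rollout never starts half-way.
    for node in nodes:
        if desync_state.get(node, False):
            raise ValueError(f"{node}: wsrep_desync is ON; set it OFF before RSU")

    plan: List[Tuple[str, str]] = []
    for node in nodes:
        plan.append((node, "SET GLOBAL wsrep_OSU_method='RSU'"))
        plan.append((node, ddl))
        plan.append((node, "SET GLOBAL wsrep_OSU_method='TOI'"))
        # The operator should see 'Synced' here before moving to the next node.
        plan.append((node, "SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment'"))
    return plan


if __name__ == "__main__":
    for node, stmt in rsu_rollout_plan(["node1", "node2"],
                                       "ALTER TABLE test ADD INDEX (a)", {}):
        print(f"{node}: {stmt};")
```

Checking all nodes before emitting any statement is a deliberate choice: it avoids leaving the cluster with the schema change applied on only some nodes because a later node turned out to be desynced.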

If we expect different behavior, then logging a FEATURE REQUEST may help, but at least for cases like these I don't see why that is needed.

Changed in percona-xtradb-cluster:
status: Confirmed → Invalid

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PXC-1009
