wsrep_osu_method=RSU allows only one ALTER TABLE to run concurrently

Bug #1330944 reported by Kenny Gryp on 2014-06-17
This bug affects 1 person
Affects: Percona XtraDB Cluster (moved to https://jira.percona.com/projects/PXC)
Status: Invalid
Importance: Undecided
Assigned to: Unassigned

Bug Description

Version: 5.6.15-56 Percona XtraDB Cluster (GPL), Release 25.5, Revision 759, wsrep_25.5.r4061

Running schema changes in parallel is useful to speed them up, because more CPU cores are used. It also lets nodes spend less time in 'maintenance mode'.
This currently fails with RSU.

The same error occurs as in bug #1330941.

Example

On a single node, with 2 sessions, do...

First create 2 tables and put some data in them, so that each ALTER TABLE takes a few seconds: long enough to start another ALTER TABLE on the other table in another session.

session1 mysql> set global wsrep_osu_method=rsu;
Query OK, 0 rows affected (0.00 sec)

RSU

session1 mysql> alter table test add key (a);

While the index is being added, immediately run the other ALTER on the second table in the second session:

session2 mysql> alter table test2 add key (a);
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction

The second ALTER fails immediately with a deadlock error.

Log output:

2014-06-17 10:26:04 3875 [Note] WSREP: Member 0.0 (node1) desyncs itself from group
2014-06-17 10:26:04 3875 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 33)
2014-06-17 10:26:04 3875 [Note] WSREP: Provider paused at 62eb8c72-f601-11e3-b42c-ab6847529d86:33 (70)
2014-06-17 10:26:05 3875 [ERROR] WSREP: Node desync failed.: 11 (Resource temporarily unavailable)
     at galera/src/replicator_smm.cpp:desync():1623
2014-06-17 10:26:05 3875 [Warning] WSREP: RSU desync failed 3 for alter table test2 add key (a)
2014-06-17 10:26:05 3875 [Warning] WSREP: ALTER TABLE isolation failure
2014-06-17 10:26:06 3875 [Note] WSREP: resuming provider at 70
2014-06-17 10:26:06 3875 [Note] WSREP: Provider resumed.
2014-06-17 10:26:06 3875 [Note] WSREP: Member 0.0 (node1) resyncs itself to group
2014-06-17 10:26:06 3875 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 33)
2014-06-17 10:26:06 3875 [Note] WSREP: Member 0.0 (node1) synced with group.
2014-06-17 10:26:06 3875 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 33)

Yes, this must be one of the limitations of RSU. You can run only one
RSU at a time.

On the Galera side, an RSU operation desyncs the node. So the second RSU fails,
because the second desync call returns an error.

This would need to be fixed on the Galera side, by making desync calls after
the first one idempotent(?).
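The desync behaviour described above can be illustrated with a toy Python model. This is not Galera code: the class and function names, the state strings (borrowed from the log output above) and the reference-counting variant are assumptions made for illustration, and provider pause/resume is ignored. Error code 11 in the log is `EAGAIN` on Linux.

```python
"""Toy model (NOT Galera code) of why a second overlapping RSU ALTER fails.

Assumption: desync() on a node that is already desynced returns -EAGAIN,
mirroring "Node desync failed.: 11 (Resource temporarily unavailable)".
All names here are hypothetical.
"""
import errno


class StrictDesyncNode:
    """Behaviour as described in the report: only one desync at a time."""

    def __init__(self):
        self.state = "SYNCED"

    def desync(self):
        if self.state != "SYNCED":
            # Already desynced by another RSU operation: reject the request.
            return -errno.EAGAIN
        self.state = "DONOR/DESYNCED"
        return 0

    def resync(self):
        self.state = "SYNCED"


class RefCountedDesyncNode:
    """Hypothetical variant: count desync requests, resync only on the last one."""

    def __init__(self):
        self.state = "SYNCED"
        self.desync_count = 0

    def desync(self):
        self.desync_count += 1
        self.state = "DONOR/DESYNCED"
        return 0

    def resync(self):
        if self.desync_count == 0:
            raise RuntimeError("resync without a matching desync")
        self.desync_count -= 1
        if self.desync_count == 0:
            self.state = "SYNCED"


def run_two_overlapping_alters(node):
    """Start two RSU ALTERs that overlap in time, then let both finish."""
    first = node.desync()   # session1: alter table test add key (a)
    second = node.desync()  # session2: alter table test2 add key (a)
    if second == 0:
        node.resync()       # session2's ALTER finishes first
    state_while_first_runs = node.state
    node.resync()           # session1's ALTER finishes
    return first, second, state_while_first_runs, node.state


if __name__ == "__main__":
    print(run_two_overlapping_alters(StrictDesyncNode()))
    print(run_two_overlapping_alters(RefCountedDesyncNode()))
```

With the strict model the second `desync()` returns `-EAGAIN`, matching the "RSU desync failed" warning in the log. With the counting model both ALTERs proceed and the node stays DONOR/DESYNCED until the last one finishes. A plain idempotent flag (a second desync that is simply a no-op) would let whichever ALTER finishes first put the node back to SYNCED while the other is still running, which is why the sketch counts requests instead.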

Changed in percona-xtradb-cluster:
status: New → Confirmed
Krunal Bauskar (krunal-bauskar) wrote:

So this is operating as designed; I don't see any bug in this case.

Changed in percona-xtradb-cluster:
status: Confirmed → Invalid

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PXC-1690
