- Tried the following use case:
- Started a 2-node cluster: node-1 and node-2
- node-1:
create table t (i int) engine=innodb;
insert into t values (1);
- node-2:
select * from t; .... got 1 as expected
- node-1:
flush table with read lock;
set global wsrep_desync=1; .... this halted the cluster: both are local actions that need local commit ordering, but the latch is held by FTWRL.
----------------
Let's understand what FTWRL does.
- It pauses the cluster node. The node continues to receive write-sets, but they are not applied because the applier is paused.
Once unlocked, all such queued events are applied.
Let's now understand what wsrep_desync does.
- It simply indicates that this node shouldn't be considered for flow control, but the node still continues to receive events.
If FTWRL has paused the cluster node without wsrep_desync, then within a short period of time (depending on configuration and workload) the node holding FTWRL will start to emit flow control, which will completely pause the cluster.
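The interaction described above can be illustrated with a small toy model. This is only a sketch, not actual Galera/wsrep code: the class `QueueingNode`, its fields, and the `fc_limit` threshold are hypothetical names chosen for illustration. It assumes that a paused node queues incoming write-sets, and that flow control is emitted once the queue grows past the threshold unless the node is desynced.

```python
from dataclasses import dataclass, field


@dataclass
class QueueingNode:
    """Toy model of one cluster node (hypothetical, not real wsrep code)."""
    fc_limit: int = 3        # assumed queue threshold that triggers flow control
    paused: bool = False     # True while FTWRL is held
    desynced: bool = False   # mirrors wsrep_desync
    recv_queue: list = field(default_factory=list)
    applied: list = field(default_factory=list)

    def emits_flow_control(self):
        # A desynced node is not considered for flow control.
        return (not self.desynced) and len(self.recv_queue) > self.fc_limit

    def receive(self, write_set):
        """Receive a write-set; return True if the node emits flow control."""
        if self.paused:
            self.recv_queue.append(write_set)  # received but not applied
        else:
            self.applied.append(write_set)
        return self.emits_flow_control()

    def unlock(self):
        """Release the pause and apply everything that was queued."""
        self.paused = False
        self.applied.extend(self.recv_queue)
        self.recv_queue.clear()
```

In this model, a paused node that is not desynced starts signalling flow control as soon as its queue exceeds `fc_limit`, while a paused and desynced node keeps queueing silently until `unlock()` drains the queue.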
-----------
When would a user normally use FTWRL? While taking a backup. And of course, if the user doesn't want the node to send flow control during that time, the user can set wsrep_desync before issuing FTWRL, so the sequence would be:
- wsrep_desync = 1
- FTWRL
- take backup
- unlock
- wsrep_desync = 0
-----------
With that flow clarified, I would propose blocking wsrep_desync toggling if the node is already in the paused state.
A node can be desynced only when it is unpaused.
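
The proposed rule can be sketched as a simple state check. This is a toy model only, not the server implementation: `GuardedNode`, `DesyncBlockedError`, and the method names are hypothetical, and the real change would live in the handling of the wsrep_desync variable.

```python
class DesyncBlockedError(Exception):
    """Hypothetical error: wsrep_desync was toggled while the node is paused."""


class GuardedNode:
    """Toy node that rejects wsrep_desync changes while FTWRL is held."""

    def __init__(self):
        self.paused = False    # True while FTWRL is held
        self.desynced = False  # mirrors wsrep_desync

    def ftwrl(self):
        self.paused = True

    def unlock(self):
        self.paused = False

    def set_desync(self, value):
        # Proposed guard: refuse to toggle instead of halting the cluster.
        if self.paused:
            raise DesyncBlockedError("cannot toggle wsrep_desync while paused")
        self.desynced = bool(value)


def backup_sequence(node, take_backup):
    """Run the sequence: desync, FTWRL, backup, unlock, resync."""
    node.set_desync(1)
    node.ftwrl()
    try:
        take_backup()
    finally:
        node.unlock()      # always release the lock before resyncing
        node.set_desync(0)
```

With this guard, the sequence listed earlier still works unchanged, while the problematic order (FTWRL first, then wsrep_desync=1) fails immediately with an error rather than blocking.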