Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC

Bug #1528020
Comment #1

Krunal Bauskar (krunal-bauskar) wrote on 2016-01-29:

commit fe9f72fccfd70ab75fe536e8363a8613f33f7eda
Author: Krunal Bauskar <email address hidden>
Date: Thu Dec 31 08:51:01 2015 +0530
PXC#492: PXC updates mysql.event table with same value which can break async
slave with replication filters
Issue:
To understand the issue, consider the following topology:
- a Galera cluster with 2 nodes (each node having a unique server-id)
- an async slave that is replicating from one of the Galera cluster nodes
  (the async slave has a replication filter configured to avoid replication
  of events created on the master)
Now let's say an event is created on node#1, which is also acting as master.
This event is not replicated to the slave due to the replication filter,
but it is replicated to galera-node#2 as per the normal Galera replication
protocol.
Suddenly node#1 goes down and the load balancer makes node#2 the master,
causing the async slave to switch to the new master.
Now suppose node#2 is restarted for some reason. On restart, node#2 will
try to read events from its local event table, which already have
status = SLAVESIDE_DISABLED.
Due to a bug in the code, this status was re-updated to the same value, and
this action also generated an UPDATE binlog statement, which was then
replicated to the async slave. The async slave doesn't have this event table
entry, so replication fails.
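The failure sequence above can be modelled with a small Python sketch. This is a toy simulation, not the server code: the dictionaries standing in for mysql.event, the list standing in for the binlog, and every function name here (buggy_load_events, apply_update_on_slave, simulate) are hypothetical names chosen for illustration.

```python
class MissingRowError(Exception):
    """Raised when a replicated UPDATE targets a row the slave does not have."""


def buggy_load_events(event_table, binlog):
    """Model of the buggy restart path on node#2 (hypothetical name).

    Every event that is already SLAVESIDE_DISABLED is rewritten with the
    same status, and that redundant write is recorded in the binlog.
    """
    for name in event_table:
        if event_table[name] == "SLAVESIDE_DISABLED":
            event_table[name] = "SLAVESIDE_DISABLED"  # null update
            binlog.append({"event_name": name, "status": "SLAVESIDE_DISABLED"})


def apply_update_on_slave(slave_event_table, entry):
    """Apply one replicated UPDATE on the async slave (hypothetical name)."""
    name = entry["event_name"]
    if name not in slave_event_table:
        raise MissingRowError("no row for event %r on the slave" % name)
    slave_event_table[name] = entry["status"]


def simulate():
    # node#2 received the event through Galera replication.
    node2_event_table = {"ev1": "SLAVESIDE_DISABLED"}
    # The replication filter kept the event away from the async slave.
    slave_event_table = {}
    binlog = []

    buggy_load_events(node2_event_table, binlog)  # node#2 restarts

    replication_failed = False
    for entry in binlog:
        try:
            apply_update_on_slave(slave_event_table, entry)
        except MissingRowError:
            replication_failed = True
            break
    return binlog, replication_failed
```

Calling simulate() shows the redundant write reaching the binlog and the slave failing to apply it because the row is absent.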
What are the issues?
a. Re-updating the status to the same value is a redundant action and
should be avoided.
b. Even if the action is allowed, it shouldn't generate a binlog entry.
In fact, the complete semantics around mysql.* replication
should be rethought, but let's limit this issue to the stated problem
for now.
Solution:
--------
- Avoid null updates, as in UPDATE ... SET a = x WHERE a = x.
- Avoid writing such actions to the binlog.
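A minimal Python sketch of the null-update guard described here, under stated assumptions: the event table is modelled as a dictionary, the binlog as a list, and set_status is a hypothetical helper, not a function from the server source. Logging genuine status changes to the binlog is also an assumption of this sketch; the fix itself only concerns the case where the value does not change.

```python
SLAVESIDE_DISABLED = "SLAVESIDE_DISABLED"


def set_status(event_table, binlog, name, new_status):
    """Update an event's status, skipping null updates (hypothetical helper).

    Returns True if the row was changed, False if the write was skipped.
    """
    if event_table[name] == new_status:
        # Null update: the stored value already matches, so neither the
        # table nor the binlog is touched.
        return False
    event_table[name] = new_status
    # Assumption of this sketch: a genuine change is logged as usual.
    binlog.append({"event_name": name, "status": new_status})
    return True


# Restart scenario from the issue: the event is already disabled on node#2.
event_table = {"ev1": SLAVESIDE_DISABLED}
binlog = []
changed = set_status(event_table, binlog, "ev1", SLAVESIDE_DISABLED)
```

With the guard in place, the restart path leaves the binlog empty, so nothing is sent to the async slave for an event it never received.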
This issue doesn't occur if the server-ids of the Galera nodes
are the same. Ideally this should be the case, as a Galera cluster
is a single atomic entity from the perspective of the larger ecosystem,
but to work around other issues users tend to set
unique server-ids for each Galera node.
---------------------------------------------------------------
There is still another leftover issue, to be fixed as part of
a different tracking issue.
What happens when the event is created on node#1 before node#2 is
allowed to boot up, and both nodes have the same server-id?
The existing logic will enable events on both nodes.
So isn't it better to have a different server-id for each node?
I would still say no, given that Galera is a single atomic system.
