Percona Server moved to https://jira.percona.com/projects/PS

Events are passed back and forth between dual masters if they originate from some other mysqld

Series 5.1

Bug #940404 reported by Hui Liu on 2012-02-24

This bug affects 1 person

Affects	Status	Importance	Assigned to
Percona Server moved to https://jira.percona.com/projects/PS	Status tracked in 5.7
5.1	Won't Fix	Medium	Unassigned
5.5	Triaged	Medium	Unassigned
5.6	Triaged	Medium	Unassigned
5.7	Triaged	Medium	Unassigned

Bug Description

In a master fail-over scenario, we suffered the unnecessary cost of
events being passed back and forth between the dual masters.

Take an example:
MySQL1 <---(dual master) ---> MySQL2 ---(rep)---> MySQL3

In the topology above, MySQL1 is readable and writable, but MySQL2
is read-only. If MySQL1 goes down, MySQL2 takes over writes
and makes MySQL3 its dual master. If some relay log events had not yet been
applied on MySQL2 when MySQL1 went down (with log_slave_updates=1), these
events are then passed back and forth between MySQL2 and MySQL3.

This is easy to solve once the problem is detected:
1. Break the replication on MySQL2, which receives changes from MySQL3.
2. Wait until the events from MySQL1 have been applied on MySQL3.
3. Change master to the new binary log position.

However, if the events are applied very quickly on a heavily loaded MySQL
server, these unnecessary events are hard to detect, and they lower
master/slave performance. So we try to detect these scenarios inside MySQL
and alert the DBA.

If an event is detected for which:
1) the event's server_id is not the same as the local server_id,
2) the event's server_id is not in the list of the slaves' server_ids, and
3) there is a cycle in the replication topology,
then an error message is printed to error.log.
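
The detection rule above can be sketched as a single predicate. This is only an illustration, not the code from the attached patch: the function name and parameters are hypothetical, it assumes all three conditions must hold together, and it takes "a cycle exists" as an input rather than working it out.

```python
def is_suspicious_event(event_server_id, local_server_id,
                        registered_slave_ids, in_replication_cycle):
    """Return True when all three conditions from the report hold.

    All names here are hypothetical; the real patch works inside mysqld.
    """
    return (
        event_server_id != local_server_id               # condition 1
        and event_server_id not in registered_slave_ids  # condition 2
        and in_replication_cycle                         # condition 3
    )


# After fail-over: MySQL2 (server_id 2) and MySQL3 (server_id 3) form a cycle,
# and a leftover event from MySQL1 (server_id 1) arrives on MySQL2.
flagged = is_suspicious_event(1, 2, {3}, True)
```

In this sketch an event that originated on the registered slave (server_id 3) is not flagged, because in a healthy dual-master pair such events are expected to come back once and be dropped by their originator.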

A patch is attached.

Hui Liu (hickey) wrote on 2012-02-24:

replication circular event detect with warning info (7.9 KiB, text/plain)

Hui Liu (hickey) wrote on 2012-03-07:

rpl_circular_event_detect.diff (9.6 KiB, text/plain)

Tweaked the test case to make it easier to understand.

Stewart Smith (stewart) on 2012-06-15

Changed in percona-server:
importance:	Undecided → Medium

Raghavendra D Prabhu (raghavendra-prabhu) wrote on 2012-09-11:

test-log.tar.gz (2.7 KiB, application/x-tar)

@hickey, I tried testing with rpl_circular_event_detect.result, rpl_circular_event_detect.test, and rpl_circular_event_detect.cnf, and the tests passed without the patch. Here is the log: http://sprunge.us/PcjV

I have also attached the full log.

Can you let us know?

Raghavendra D Prabhu (raghavendra-prabhu) wrote on 2012-09-11:

I have seen this happen in an older case (in an RBR scenario), where the following occurred:

M1 <----> M2 in a dual replication topology.

However, a CHANGE MASTER was somehow executed that pointed M1 to M3, resulting in:

M3 ---> M1 ----> M2 ('--->' is master-of relation).

After some time this was rectified; however, the binlog on M1 had recorded events with the server_id of M3 (let's call it 3).

After rectification, it again became M1 <------> M2.

So, binlog events with server_id 3 ping-ponged between M1 and M2.
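
A toy model can show why such events never stop. It is a minimal sketch that assumes the default behaviour in which a server discards only events carrying its own server_id, and that log_slave_updates is enabled so applied events are written on for the next server. The function name `simulate_ring` and its parameters are hypothetical, and the hop limit exists only so the loop ends.

```python
def simulate_ring(ring, origin_server_id, start_index, max_hops=10):
    """Forward one event around a replication ring.

    ring: list of server_ids, each replicating to the next (last wraps to first).
    A server drops the event only when the event's server_id equals its own;
    otherwise it applies the event and passes it on.
    Returns the server_ids that applied the event, in order.
    """
    applied = []
    i = start_index
    for _ in range(max_hops):
        server = ring[i]
        if server == origin_server_id:
            break  # the originator recognises its own event and stops it
        applied.append(server)
        i = (i + 1) % len(ring)
    return applied


# Event written on M1 itself: M2 applies it, M1 drops it on return.
normal = simulate_ring([1, 2], origin_server_id=1, start_index=1)

# Event carrying server_id 3 in M1's binlog: neither M1 nor M2 owns it,
# so it is applied again and again until the hop limit is reached.
stray = simulate_ring([1, 2], origin_server_id=3, start_index=1, max_hops=6)
```

Here `normal` is `[2]`, while `stray` is `[2, 1, 2, 1, 2, 1]`: the event is only ever stopped by the server that wrote it, and M3 is no longer part of the ring.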

tags:

added: i24444

Raghavendra D Prabhu (raghavendra-prabhu) wrote on 2012-09-19:

@hickey, can you provide more details regarding #3?

Marking this as incomplete till then.

Changed in percona-server:
status:	New → Incomplete

Hui Liu (hickey) wrote on 2012-09-19:

bug940404.patch (6.4 KiB, text/plain)

The test case passed even without the source code patch :)

$./mtr suite/rpl/t/rpl_circular_event_detect.test

==============================================================================

TEST RESULT TIME (ms) or COMMENT
--------------------------------------------------------------------------

worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 13000..13009
rpl.rpl_circular_event_detect 'mix' [ pass ] 1461
rpl.rpl_circular_event_detect 'row' [ pass ] 1366
rpl.rpl_circular_event_detect 'stmt' [ pass ] 1458
--------------------------------------------------------------------------

What we do about events ping-ponging between dual master instances is simply detect the case and print a warning/error to error.log, so that the DBA can handle it manually.

The main purpose is only to DETECT, not to FIX. We could simply drop the event before writing it to the relay log, but that is not what we want in a production environment.

We updated the patch for online MySQL instances to make it clearer:

1. The rpl_circular_warning switch is turned on automatically once a slave is registered.
2. The rpl_circular_warning switch is turned off automatically once an abnormal event is detected.
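
The on/off behaviour of the switch can be pictured as a small latch that warns once and then stays quiet until it is re-armed. This is a sketch only: the class, its methods, and the message text are hypothetical, and only the name rpl_circular_warning comes from the patch description.

```python
class CircularWarningSwitch:
    """Toy model of the rpl_circular_warning switch (hypothetical API)."""

    def __init__(self):
        self.enabled = False   # off until a slave registers
        self.warnings = []     # stands in for lines written to error.log

    def on_slave_registered(self):
        # Step 1: the switch is turned on when a slave registers.
        self.enabled = True

    def on_event(self, event_server_id, is_abnormal):
        # Step 2: warn on the first abnormal event, then turn the switch off
        # so a ping-ponging event does not flood the log.
        if self.enabled and is_abnormal:
            self.warnings.append(
                f"possible circular event, server_id={event_server_id}"
            )
            self.enabled = False


switch = CircularWarningSwitch()
switch.on_slave_registered()
switch.on_event(3, is_abnormal=True)
switch.on_event(3, is_abnormal=True)   # ignored: switch is already off
```

With this model the second abnormal event produces no further log line, which matches the stated goal of alerting the DBA rather than fixing or filtering the events.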

Laurynas Biveinis (laurynas-biveinis) on 2012-09-19

tags:

added: contribution

Valerii Kravchuk (valerii-kravchuk) wrote on 2014-03-27:

I see that the patch is still not applied, but IMHO it should be considered for all versions.

Shahriyar Rzayev (rzayev-sehriyar) wrote on 2018-01-25:

Percona now uses JIRA for bug reports, so this bug report has been migrated to: https://jira.percona.com/browse/PS-1241