heartbeats/fakerotate cause a forced sync_master_info

Bug #1708033 reported by Sveta Smirnova on 2017-08-01
Affects          Status        Importance  Assigned to
MySQL Server     Unknown       Unknown     -
Percona Server   (moved to https://jira.percona.com/projects/PS; status tracked in 5.7)
  5.5            Invalid       Undecided   Unassigned
  5.6            Invalid       Undecided   Unassigned
  5.7            Fix Released  Medium      Vlad Lesin

Bug Description

Originally reported at https://bugs.mysql.com/bug.php?id=85158

Description:
With master_info_repository=TABLE and sync_master_info=0, the mysql.slave_master_info table should only be updated on log file rotation. That is the documented and intended design, and it can significantly reduce load on slaves, allowing much greater replication throughput. A similar issue presumably comes up with the FILE repository, where it would show up as fsyncs of the file rather than writes and commits to the table; TABLE is used for context here.
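For reference, the configuration described above corresponds to a my.cnf fragment along these lines (both options are documented 5.7 settings):

```ini
[mysqld]
# Keep master metadata in the mysql.slave_master_info table
master_info_repository = TABLE
# 0 = sync the repository only on log rotation, never per event
sync_master_info       = 0
```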

However, there's a case where two code paths cross. The code handling skips of [already-executed] gtid events uses a dummy heartbeat (treated as a fake rotate event) to signal the end of a skip period. Sounds great, until you get to the slave handling code, which takes that event and forces a master info sync on every heartbeat - even these dummy ones.

This can, even in a fairly simple setup, lead to a ~5X increase in write load on a slave - a huge amount of wasted resources, and very limiting to replication throughput.

fake hb handling:
https://github.com/mysql/mysql-server/blob/5.7/sql/rpl_slave.cc#L8264-L8295

which in turn calls:
https://github.com/mysql/mysql-server/blob/5.7/sql/rpl_slave.cc#L3345-L3378

Note "flush_master_info(mi, TRUE)" - TRUE here means force flush, regardless of sync settings. Force is only intended to be used for things like repository init, STOP SLAVE, and *real* log rotates.

How to repeat:
Generate a lot of nonsequential skips to demonstrate. A very basic msr setup will do that under load:

gtid on, of course
- Server A, writable master. msr replication from B and C.
- Servers B and C, backup masters. msr replication from A,C and A,B respectively.
- Server D, msr slave of all three, no log_bin (keep it fast).
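The channels on D would be set up along these lines (a sketch using 5.7's CHANGE MASTER ... FOR CHANNEL syntax; hostnames, user, and password are placeholders):

```sql
-- On server D: one channel per upstream master
CHANGE MASTER TO MASTER_HOST='server_a', MASTER_USER='repl',
  MASTER_PASSWORD='...', MASTER_AUTO_POSITION=1 FOR CHANNEL 'channel_1';
CHANGE MASTER TO MASTER_HOST='server_b', MASTER_USER='repl',
  MASTER_PASSWORD='...', MASTER_AUTO_POSITION=1 FOR CHANNEL 'channel_2';
CHANGE MASTER TO MASTER_HOST='server_c', MASTER_USER='repl',
  MASTER_PASSWORD='...', MASTER_AUTO_POSITION=1 FOR CHANNEL 'channel_3';
START SLAVE;
```

Servers B and C would each get two analogous channels pointing at the other two masters.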

Now, apply a significant write load to A (sysbench works fine). I found it nice to organically lag B and C with load to A, but you can stop their sql threads for a period to simulate lag; if you do that, restart them both after a while.
I recommend stopping the writes to A at this point, so that D is caught up and not executing any row changes in the data - this helps isolate the load caused by these heartbeats.

Now, they'll be chugging through a lot of dupes and sending them down to D as heartbeats. In a pretty short test, I saw ~60 million heartbeats received on D (from p_s.replication_connection_status) - ~30M from each of B and C, only a few from A. Checking the same values on B and C, still just a few on each channel.
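The heartbeat counts mentioned above come straight out of performance_schema (this table and column are also visible in the test output below):

```sql
SELECT CHANNEL_NAME, COUNT_RECEIVED_HEARTBEATS
  FROM performance_schema.replication_connection_status;
```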

Now there's a lot of IO happening, considering D isn't actually executing any changes. Checking p_s.table_io_waits_summary_by_table, the vast majority of load is on slave_master_info. The wait count was 4X as much as for the actual data table being written to. The total iowait time was 6X as much.

Suggested fix:
Two ways to go about this, depending on the intentions of this process.

First, is it really necessary to send so many heartbeats just for gtid skips? I don't see that adding value, even if you do sync master info every trx. They're skips; they're specifically intended to do nothing. Eliminating them (on the rpl_master.cc side) would be the best solution - but that code was written for a purpose, so someone must want them, and a setting to control the behavior would be good. Or it might be very important for a reason I don't see in the code!

Second, if it's actually important, and even turning it off via a setting is impractical: take away the force sync. Change that "true" in rpl_slave.cc to "false" so these events obey the same rules other events do (the documented rules!).

A test case for MTR will be attached.

Sveta Smirnova (svetasmirnova) wrote:

Attached test case can be used to repeat the issue.

In the end it will print something like:

[connection server_1]
select count(*) from t1;
count(*)
1325 <<<<<<<< ==== Number of rows on master
[connection server_4]
start slave;
select OBJECT_NAME, count_star, AVG_TIMER_WAIT/1024/1024/1024 from performance_schema.table_io_waits_summary_by_table where OBJECT_SCHEMA='mysql' and OBJECT_NAME ='slave_master_info';
OBJECT_NAME count_star AVG_TIMER_WAIT/1024/1024/1024
slave_master_info 36 11.149680108763 <<<< Remember IO numbers
select table_name, io_write_requests, io_write from sys.schema_table_statistics where table_name='slave_master_info';
table_name io_write_requests io_write
slave_master_info 11 176.00 KiB <<<< Remember IO numbers
select * from performance_schema.replication_connection_status;
CHANNEL_NAME GROUP_NAME SOURCE_UUID THREAD_ID SERVICE_STATE COUNT_RECEIVED_HEARTBEATS LAST_HEARTBEAT_TIMESTAMP RECEIVED_TRANSACTION_SET LAST_ERROR_NUMBER LAST_ERROR_MESSAGE LAST_ERROR_TIMESTAMP
channel_1 922a4d56-7701-11e7-8781-30b5c2208a0f 64 ON 4 2017-08-02 00:43:22 922a4d56-7701-11e7-8781-30b5c2208a0f:1-357 0 0000-00-00 00:00:00
channel_2 925a4dbc-7701-11e7-8691-30b5c2208a0f 70 ON 4 2017-08-02 00:43:24 922a4d56-7701-11e7-8781-30b5c2208a0f:1-53 0 0000-00-00 00:00:00
channel_3 92a8be79-7701-11e7-86b1-30b5c2208a0f 76 ON 4 2017-08-02 00:43:26 922a4d56-7701-11e7-8781-30b5c2208a0f:1-57 0 0000-00-00 00:00:00
select count(*) from t1;
count(*)
1325 <<<<<<<< ==== Slave D received all changes from A
[connection server_2]
start slave;
[connection server_3]
start slave;
[connection server_4]
select OBJECT_NAME, count_star, AVG_TIMER_WAIT/1024/1024/1024 from performance_schema.table_io_waits_summary_by_table where OBJECT_SCHEMA='mysql' and OBJECT_NAME ='slave_master_info';
OBJECT_NAME count_star AVG_TIMER_WAIT/1024/1024/1024
slave_master_info 46 22.199227280915 <<<<< 46 > 36
select table_name, io_write_requests, io_write from sys.schema_table_statistics where table_name='slave_master_info';
table_name io_write_requests io_write
slave_master_info 15 240.00 KiB <<<<< 15 > 11
select * from performance_schema.replication_connection_status;
CHANNEL_NAME GROUP_NAME SOURCE_UUID THREAD_ID SERVICE_STATE COUNT_RECEIVED_HEARTBEATS LAST_HEARTBEAT_TIMESTAMP RECEIVED_TRANSACTION_SET LAST_ERROR_NUMBER LAST_ERROR_MESSAGE LAST_ERROR_TIMESTAMP
channel_1 922a4d56-7701-11e7-8781-30b5c2208a0f 64 ON 7 2017-08-02 00:46:22 922a4d56-7701-11e7-8781-30b5c2208a0f:1-357 0 0000-00-00 00:00:00
channel_2 925a4dbc-7701-11e7-8691-30b5c2208a0f 70 ON 10 2017-08-02 00:45:29 922a4d56-7701-11e7-8781-30b5c2208a0f:1-53:58-81 0 0000-00-00 00:00:00
channel_3 92a8be79-7701-11e7-86b1-30b5c2208a0f 76 ON 7 2017-08-02 00:45:33 922a4d56-7701-11e7-8781-30b5c2208a0f:1-57:59-80 0 0000-00-00 00:00:00
select count(*) from t1;
count(*)
1325 <<<<<<<< ==== No new rows added

tags: added: performance

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-1812
