Percona Server moved to https://jira.percona.com/projects/PS

Reducing MTS checkpointing causes high IO load

Bug #1670826 reported by Sveta Smirnova on 2017-03-07

This bug affects 1 person

	Status	Importance	Assigned to
MySQL Server	Unknown	Unknown	mysql-bugs #85142
Percona Server moved to https://jira.percona.com/projects/PS	Triaged	Medium	Unassigned
5.5	Invalid	Undecided	Unassigned
5.6	Won't Fix	Medium	Unassigned
5.7	Won't Fix	Medium	Unassigned

Bug Description

Originally reported at https://bugs.mysql.com/bug.php?id=85142

Description:
The implementation of slave_worker_info.Checkpoint_group_bitmap is inefficient when slave_checkpoint_group is large. Raising slave_checkpoint_group so the system checkpoints less often causes an enormous amount of extra I/O.
That column can grow to roughly 64 KB (one bit per transaction in the checkpoint group, so eight transactions per byte). It is written out with every GTID processed (in theory with every commit, but that is another issue). Setting slave_checkpoint_group to its maximum (512K) raised a test server's write rate from 12-20 MB/s to 100-150 MB/s. table_io_waits_summary_by_table makes it clear that all the I/O is on slave_worker_info.
Granted, that is the maximum value, but it shows that the slave's internal bookkeeping can generate almost 10x the I/O load of the application's own writes to the database. This is expected given how that column works. However, because it makes reduced checkpointing impossible, it is a blocker for a better MTS (binlog file rollover is another).
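The following small Python model makes the size arithmetic above concrete. It models a per-worker bitmap with one bit per transaction in the checkpoint group. It is a hypothetical illustration, not MySQL's code, and the class and function names are made up. The limits (32 to 524280, a multiple of 8) are those of the 5.7 slave_checkpoint_group variable. Following the description above, the model assumes the full bitmap is rewritten for every processed transaction.

```python
# Hypothetical model of a per-worker checkpoint bitmap. This is NOT MySQL's
# implementation; it only illustrates the size arithmetic of
# Checkpoint_group_bitmap (one bit per transaction in the checkpoint group).

MIN_GROUP = 32
MAX_GROUP = 524280  # 512K - 8, the maximum of slave_checkpoint_group in 5.7


def bitmap_bytes(checkpoint_group: int) -> int:
    """Bytes needed to hold one bit per transaction in the group."""
    if not MIN_GROUP <= checkpoint_group <= MAX_GROUP or checkpoint_group % 8:
        raise ValueError("checkpoint_group must be a multiple of 8 in [32, 524280]")
    return (checkpoint_group + 7) // 8


def bitmap_write_rate(checkpoint_group: int, tx_per_sec: int) -> int:
    """Bitmap bytes written per second, assuming one full rewrite per transaction."""
    return bitmap_bytes(checkpoint_group) * tx_per_sec


class CheckpointBitmap:
    """One bit per transaction position since the last checkpoint."""

    def __init__(self, checkpoint_group: int) -> None:
        self.bits = bytearray(bitmap_bytes(checkpoint_group))

    def mark_done(self, position: int) -> None:
        # Record that the transaction at `position` in the group has been applied.
        self.bits[position // 8] |= 1 << (position % 8)

    def serialize(self) -> bytes:
        # The full bitmap is emitted no matter how few bits are set.
        return bytes(self.bits)


if __name__ == "__main__":
    for group in (512, MAX_GROUP):
        bm = CheckpointBitmap(group)
        bm.mark_done(0)  # a single applied transaction
        print(group, len(bm.serialize()), "bytes per write")
    # 1000 transactions/s is a hypothetical workload, not a measured figure.
    print(bitmap_write_rate(MAX_GROUP, 1000), "bitmap bytes/s at 1000 tx/s")
```

At the default group size of 512, the serialized bitmap is 64 bytes. At 524280, it is 65,535 bytes even when only one bit is set. The transaction rate in the example is hypothetical. Actual InnoDB I/O is likely higher than this estimate because of page writes, redo logging, and the doublewrite buffer.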

How to repeat:
This simple test used 5.7.17.
Set up two servers, master A and slave B. On B, set slave_parallel_workers to a nonzero value, say 8. Start writes to A and monitor I/O. Check performance_schema.table_io_waits_summary_by_table (with the mysql schema enabled in setup_objects) and look at the average and total wait on slave_worker_info. Then stop the slave and set slave_checkpoint_group to a high value, such as its maximum of 524280 (512K - 8). The goal is to make the blob field large enough that it has to be stored in overflow pages. In this case slave_worker_info uses the DYNAMIC row format. It might be even worse with the older COMPACT format, but I doubt the difference is large. Start the slave again and look at the slave_worker_info table: its rows are much larger. Now check the I/O and the performance_schema table again.
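The repro steps above map onto a handful of SQL statements. The Python helper below assembles them in order so they can be pasted into a client on slave B. It is a sketch, not the reporter's script. Note the following assumptions:

- The function name and step labels are hypothetical.
- It assumes MySQL 5.7 with relay_log_info_repository=TABLE; otherwise worker state is kept in files rather than in mysql.slave_worker_info.
- The TRUNCATE that resets the performance_schema counters between runs is an added convenience, not part of the original steps.

The write load on master A still has to be started separately.

```python
from __future__ import annotations

# Hypothetical helper that assembles the SQL for the repro steps.
# Assumes MySQL 5.7 on slave B with relay_log_info_repository=TABLE, so worker
# state is persisted in mysql.slave_worker_info. Run the output with any client.


def repro_statements(workers: int = 8, checkpoint_group: int = 524280) -> dict[str, list[str]]:
    """Return the repro SQL grouped by step, in execution order."""
    measure = [
        # Average and total I/O wait on the worker info table.
        "SELECT OBJECT_NAME, COUNT_STAR, SUM_TIMER_WAIT, AVG_TIMER_WAIT "
        "FROM performance_schema.table_io_waits_summary_by_table "
        "WHERE OBJECT_SCHEMA = 'mysql' AND OBJECT_NAME = 'slave_worker_info'",
        # Size of each worker's stored bitmap.
        "SELECT Id, Checkpoint_group_size, LENGTH(Checkpoint_group_bitmap) "
        "FROM mysql.slave_worker_info",
    ]
    return {
        "enable_parallel_workers": [
            "STOP SLAVE",
            f"SET GLOBAL slave_parallel_workers = {workers}",
            "START SLAVE",
        ],
        # setup_objects disables instrumentation of mysql schema tables by default.
        "instrument_mysql_schema": [
            "UPDATE performance_schema.setup_objects SET ENABLED = 'YES', TIMED = 'YES' "
            "WHERE OBJECT_TYPE = 'TABLE' AND OBJECT_SCHEMA = 'mysql'",
        ],
        "measure_baseline": list(measure),
        "raise_checkpoint_group": [
            "STOP SLAVE",
            f"SET GLOBAL slave_checkpoint_group = {checkpoint_group}",
            # Added step (assumption): reset the summary counters so the
            # second measurement starts from zero.
            "TRUNCATE TABLE performance_schema.table_io_waits_summary_by_table",
            "START SLAVE",
        ],
        "measure_again": list(measure),
    }


if __name__ == "__main__":
    for step, statements in repro_statements().items():
        print(f"-- {step}")
        for sql in statements:
            print(sql + ";")
```

Keeping the write load on A constant between the two measurement steps is what makes the I/O comparison meaningful.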

Suggested fix:
Find a better way to do this that does not require writing the whole max-size bitmap with every commit. I'll review the code at some point, and if I think of anything easy, I'll gladly suggest it. However, this is a very expensive operation whose cost, however valuable the information, is far out of proportion.
