TokuDB heavy syncing to disk when log_bin enabled, tokudb_commit_sync=0 ineffective

Bug #1718949 reported by Przemek
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Percona Server moved to https://jira.percona.com/projects/PS
Status tracked in 5.7
5.7
Fix Released
Undecided
George Ormond Lorch III

Bug Description

When log_bin is enabled, TokuDB does twice as many disk writes then normally when tokudb_commit_sync = 1.
Setting tokudb_commit_sync = 0 seems completely ineffective when log_bin is enabled.

Tested with simple OLTP sysbench test against Percona Server 5.7.18. IOPS checked as follows (check syscw in particular):

tokudb_commit_sync = 1
log_bin = OFF

before sysbench

$ sudo cat /proc/36361/io
rchar: 6292581
wchar: 16578195
syscr: 1380
syscw: 67
read_bytes: 8704
write_bytes: 17227776
cancelled_write_bytes: 16384

after sysbench (10k trx)

$ sudo cat /proc/36361/io
rchar: 74137694
wchar: 58352602
syscr: 3540
syscw: 10103
read_bytes: 67845120
write_bytes: 181890560
cancelled_write_bytes: 16384

tokudb_commit_sync = 0
log_bin = ON
sync_binlog=0

before sysbench:

$ sudo cat /proc/36704/io
rchar: 49890268
wchar: 16554177
syscr: 3235
syscw: 67
read_bytes: 8704
write_bytes: 17219584
cancelled_write_bytes: 16384

after sysbench:

$ sudo cat /proc/36704/io
rchar: 120845781
wchar: 64862148
syscr: 5459
syscw: 20096
read_bytes: 70959616
write_bytes: 106491392
cancelled_write_bytes: 16384

With:
tokudb_commit_sync = 0
log_bin = OFF

$ sudo cat /proc/11180/io
rchar: 80837522
wchar: 16578195
syscr: 2849
syscw: 67
read_bytes: 1077248
write_bytes: 25927680
cancelled_write_bytes: 16384

After sysbench

$ sudo cat /proc/11180/io
rchar: 106204552
wchar: 132469377
syscr: 3665
syscw: 168
read_bytes: 1101824
write_bytes: 205230080
cancelled_write_bytes: 16384

Tags: tokudb i205721
Revision history for this message
George Ormond Lorch III (gl-az) wrote :

This is likely a result of https://jira.percona.com/browse/TDB-70 and https://github.com/percona/percona-server/pull/1882/files which fixes TokuDB interaction with binlog group commit algorithm and copies InnoDB logic. Disabling durability within the binlog group commit (and 2pc since multiple engines are in use, particularly when GTID is involved). Prior to this fix in 5.7, tokudb_commit_sync = 1 did not do much, but then, TokuDB was also no properly following the binlog group commit protocol correctly so was more or less always operating in reduced durability.

You do not seem to be setting tokudb_fsync_log_period > 0 which is required now when tokudb_commit_sync == 0. tokudb_fsync_log_period == 0 basically means to TokuDB to fsync the log anytime it is required for maximum durability.

See the code in the pull request for the logic, specifically this bit in tokudb_flush_logs :
    else if (!binlog_group_commit ||
             (tokudb::sysvars::fsync_log_period == 0 &&
              tokudb::sysvars::commit_sync(NULL))) {
        error = db_env->log_flush(db_env, NULL);
        assert_always(error == 0);
    }

Revision history for this message
Przemek (pmalkowski) wrote :

When I test with high tokudb_fsync_log_period values - it doesn't seem to change much.

With minimal value - more disk writes are done:
+-------------------------+-------+
| Variable_name | Value |
+-------------------------+-------+
| tokudb_commit_sync | OFF |
| tokudb_fsync_log_period | 1 |
+-------------------------+-------+

before sysbench: syscw: 67
after sysbench: syscw: 40441

With 10k or higher, results are the same as with 0:
+-------------------------+-------+
| Variable_name | Value |
+-------------------------+-------+
| tokudb_commit_sync | OFF |
| tokudb_fsync_log_period | 10000 |
+-------------------------+-------+
before sysbench: syscw: 72
after sysbench: syscw: 20174

+-------------------------+---------+
| Variable_name | Value |
+-------------------------+---------+
| tokudb_commit_sync | OFF |
| tokudb_fsync_log_period | 1000000 |
+-------------------------+---------+

before sysbench: syscw: 67
after sysbench: syscw: 20165

I am unable to get down below 20k write IOPS when log_bin is enabled.

The same sysbench for same size InnoDB table, and innodb_flush_log_at_trx_commit=0 (log_bin=1) gives around 10k write IOPS:
syscw: 67
syscw: 10309

Revision history for this message
George Ormond Lorch III (gl-az) wrote :

Interesting. The binlog group commit fsync logic is now close to identical with InnoDB with the exception that InnoDB has a fixed 1 second fsync timer and TOkuDB timer is variable via fsync_log_period. When you tested with InnoDB, was TokuDB engine also enabled and just not used?

Revision history for this message
Przemek (pmalkowski) wrote :

Yes, TokuDB was enabled but I only benchmarked InnoDB table.

Revision history for this message
Przemek (pmalkowski) wrote :

The same tests on InnoDB only instance (no TokuDB plugin) have similar results.

I guess you can reduce the problem to the fact that when binary log is enabled, relaxing flushing in InnoDB works, while in TokuDB doesn't seem so:

tokudb_commit_sync = 1
tokudb_fsync_log_period = 0
log_bin=on
sync_binlog=0
-> syscw: 20130

tokudb_commit_sync = 0
tokudb_fsync_log_period = 10000
log_bin=on
sync_binlog=0
-> syscw: 20170

innodb_flush_log_at_trx_commit=1
log_bin=on
sync_binlog=0
-> syscw: 20303

innodb_flush_log_at_trx_commit=0
log_bin=on
sync_binlog=0
-> syscw: 10332

Revision history for this message
George Ormond Lorch III (gl-az) wrote :
Revision history for this message
George Ormond Lorch III (gl-az) wrote :

This was fixed in 5.7.19-17 and re-verified as fixed on the trunk (5.7.20).

Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-3742

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.