TokuDB Hot Backup inconsistency with tokudb_commit_sync disabled

Bug #1533174 reported by Igor Shevtsov
Affects: Percona Server (moved to https://jira.percona.com/projects/PS)
Status tracked in 5.7

Series  Status        Importance  Assigned to
5.6     Fix Released  High        Vlad Lesin
5.7     Fix Released  High        Vlad Lesin

Bug Description

We use the following as a Slave with RBR:
mysql Ver 14.14 Distrib 5.6.27-75.0, for Linux (x86_64) using 6.0

replicating from Master:
mysql Ver 14.14 Distrib 5.5.30, for Linux (x86_64) using readline 5.1

For backup purposes we use the tokudb_backup plugin on the Slave box.

If tokudb_commit_sync is configured as:
tokudb_commit_sync = 1
we have no issues creating a backup, restoring it, and reconnecting it back to the Master using the replication position saved in the relay-log.info file.
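
For illustration, re-pointing the restored slave usually amounts to something like the sketch below; the host name, binlog file, and position are placeholders, to be taken from the backed-up relay-log.info (which records the master binlog coordinates the SQL thread had executed up to):

CHANGE MASTER TO
  MASTER_HOST = 'master-host',              -- placeholder, not from this report
  MASTER_LOG_FILE = 'master-binlog.000001', -- master binlog file from relay-log.info
  MASTER_LOG_POS = 4;                       -- master binlog position from relay-log.info
START SLAVE;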

If tokudb_commit_sync is configured as:
tokudb_commit_sync = 0
tokudb_fsync_log_period = 1000
we can create a backup and restore it, but replication FAILS, reporting missing records.

ERROR LOG:
2016-01-11 13:46:56 6043 [ERROR] Slave SQL: Could not execute Update_rows_v1 event on table DB1.RunID; Can't find record in 'RunID', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log s2db2-binlog.060573, end_log_pos 17004133, Error_code: 1032
2016-01-11 13:46:56 6043 [Warning] Slave: Can't find record in 'RunID' Error_code: 1032

2016-01-11 13:51:14 6987 [ERROR] Slave SQL: Could not execute Delete_rows_v1 event on table DB1.WebPageResultData; Can't find record in 'WebPageResultData', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log s2db2-binlog.060573, end_log_pos 17398962, Error_code: 1032
2016-01-11 13:51:14 6987 [Warning] Slave: Can't find record in 'WebPageResultData' Error_code: 1032

tags: added: i64508
description: updated
Revision history for this message
Przemek (pmalkowski) wrote :

I was able to reproduce this in a simplified test, using a slave with these parameters:
tokudb_commit_sync=0
tokudb_fsync_log_period = 1000

Ran this simple write stress on the master:

$ for i in {1..10000}; do
    echo "insert into test.toku2 values ($i,'AAA')" | rsandbox_Percona-Server-5_6_28/m
    echo "update test.toku2 set a='bbb' where id=$i" | rsandbox_Percona-Server-5_6_28/m
    echo "delete from test.toku2 where id=$i" | rsandbox_Percona-Server-5_6_28/m
  done
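
For completeness, a hypothetical definition of the test table (the report does not show it; the column names and types are inferred from the statements in the loop, and the storage engine is assumed to be TokuDB since that is what the bug concerns):

CREATE TABLE test.toku2 (
  id INT NOT NULL PRIMARY KEY,
  a VARCHAR(10)
) ENGINE=TokuDB;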

During the test, I triggered a backup on the slave:

set tokudb_backup_dir='/home/przemyslaw.malkowski/sandboxes/rsandbox_Percona-Server-5_6_28/node2/backup/1';

Once the backup was done, I stopped the slave, replaced its data with the backup copy, and then resumed replication. After the slave caught up, there was a data consistency issue:

master [localhost] {msandbox} (percona) > select * from test.toku2;
Empty set (0.01 sec)

slave2 [localhost] {msandbox} (test) > select * from test.toku2;
+-----+------+
| id  | a    |
+-----+------+
| 679 | bbb  |
+-----+------+
1 row in set (0.00 sec)
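
As a side note (not part of the original test), the same divergence can be confirmed on larger tables by running the same checksum on both servers and comparing the results:

CHECKSUM TABLE test.toku2;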

tags: added: tokubackup tokudb
Revision history for this message
George Ormond Lorch III (gl-az) wrote :

This actually makes sense: by setting tokudb_commit_sync=0 and tokudb_fsync_log_period=1000, you have effectively downgraded the durability of the storage engine. If you were to have a crash within that same window, you would lose data. The same issue would also appear if you were using some kind of volume snapshot for backups.

XtraBackup has the same kind of issue with InnoDB and downgraded durability, but it works with the server to force flushing/fsyncing everything while it is still copying the InnoDB redo logs and before finalizing the backup. This is something that has been discussed before w.r.t. TokuDB Hot Backup, i.e. implementing some kind of backup library callback that tells the backup plugin to sync everything to disk while the backup library is still copying/monitoring the filesystem for changes.
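
Until such a callback exists, one possible mitigation in the spirit of this comment (my assumption, not something verified in this report) is to restore durable commits on the slave only for the duration of the backup window. The replication threads are restarted so they pick up the new global value, and the backup directory path below is just an example:

STOP SLAVE;
SET GLOBAL tokudb_commit_sync = 1;          -- durable commits while the backup runs
START SLAVE;
SET tokudb_backup_dir = '/backups/toku/1';  -- triggers the hot backup, as in the test above
-- after the backup completes, revert to the original setting
STOP SLAVE;
SET GLOBAL tokudb_commit_sync = 0;
START SLAVE;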

Revision history for this message
George Ormond Lorch III (gl-az) wrote :
tags: removed: tokudb
Revision history for this message
George Ormond Lorch III (gl-az) wrote :

This was fixed in PS 5.6.35-81.0 and 5.7.17-12.

Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports, so this bug report has been migrated to: https://jira.percona.com/browse/PS-955
