TokuDB Hot Backup inconsistency with tokudb_commit_sync disabled

Bug #1533174 reported by Igor Shevtsov
Affects: Percona Server (moved to https://jira.percona.com/projects/PS)
Status tracked in 5.7

Series  Status        Importance  Assigned to
5.6     Fix Released  High        Vlad Lesin
5.7     Fix Released  High        Vlad Lesin

Bug Description

We use the following as a Slave with RBR:
mysql Ver 14.14 Distrib 5.6.27-75.0, for Linux (x86_64) using 6.0

replicating from Master:
mysql Ver 14.14 Distrib 5.5.30, for Linux (x86_64) using readline 5.1

For backup purposes we use the tokudb_backup plugin on the Slave box.

If tokudb_commit_sync is configured as:
tokudb_commit_sync = 1
we have no issues creating a backup, restoring it, and reconnecting it back to the Master using the replication position saved in the relay-log.info file.
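
For illustration, re-pointing the restored slave usually amounts to something like the sketch below; the host name, binlog file, and position are placeholders, to be taken from the backed-up relay-log.info (which records the master binlog coordinates the SQL thread had executed up to):

CHANGE MASTER TO
  MASTER_HOST = 'master-host',              -- placeholder, not from this report
  MASTER_LOG_FILE = 'master-binlog.000001', -- master binlog file from relay-log.info
  MASTER_LOG_POS = 4;                       -- master binlog position from relay-log.info
START SLAVE;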

If tokudb_commit_sync is configured as:
tokudb_commit_sync = 0
tokudb_fsync_log_period = 1000
we can create a backup and restore it, but replication FAILS, reporting missing records.

ERROR LOG:
2016-01-11 13:46:56 6043 [ERROR] Slave SQL: Could not execute Update_rows_v1 event on table DB1.RunID; Can't find record in 'RunID', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log s2db2-binlog.060573, end_log_pos 17004133, Error_code: 1032
2016-01-11 13:46:56 6043 [Warning] Slave: Can't find record in 'RunID' Error_code: 1032

2016-01-11 13:51:14 6987 [ERROR] Slave SQL: Could not execute Delete_rows_v1 event on table DB1.WebPageResultData; Can't find record in 'WebPageResultData', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log s2db2-binlog.060573, end_log_pos 17398962, Error_code: 1032
2016-01-11 13:51:14 6987 [Warning] Slave: Can't find record in 'WebPageResultData' Error_code: 1032

tags: added: i64508
description: updated
Revision history for this message
Przemek (pmalkowski) wrote :

I was able to reproduce this in a simplified test, using a slave with these parameters:
tokudb_commit_sync=0
tokudb_fsync_log_period = 1000

Ran this simple write stress on the master:

$ for i in {1..10000}; do
    echo "insert into test.toku2 values ($i,'AAA')" | rsandbox_Percona-Server-5_6_28/m
    echo "update test.toku2 set a='bbb' where id=$i" | rsandbox_Percona-Server-5_6_28/m
    echo "delete from test.toku2 where id=$i" | rsandbox_Percona-Server-5_6_28/m
  done
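
For completeness, a hypothetical definition of the test table (the report does not show it; the column names and types are inferred from the statements in the loop, and the storage engine is assumed to be TokuDB since that is what the bug concerns):

CREATE TABLE test.toku2 (
  id INT NOT NULL PRIMARY KEY,
  a VARCHAR(10)
) ENGINE=TokuDB;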

During the test, I triggered a backup on the slave:

set tokudb_backup_dir='/home/przemyslaw.malkowski/sandboxes/rsandbox_Percona-Server-5_6_28/node2/backup/1';

Once the backup was done, I stopped the slave, replaced its data with the backup copy, and then resumed replication. After the slave caught up, there was a data consistency issue:

master [localhost] {msandbox} (percona) > select * from test.toku2;
Empty set (0.01 sec)

slave2 [localhost] {msandbox} (test) > select * from test.toku2;
+-----+------+
| id  | a    |
+-----+------+
| 679 | bbb  |
+-----+------+
1 row in set (0.00 sec)
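
As a side note (not part of the original test), the same divergence can be confirmed on larger tables by running the same checksum on both servers and comparing the results:

CHECKSUM TABLE test.toku2;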

tags: added: tokubackup tokudb
Revision history for this message
George Ormond Lorch III (gl-az) wrote :

This actually makes sense: by setting tokudb_commit_sync=0 and tokudb_fsync_log_period=1000, you have effectively downgraded the durability of the storage engine. If you were to have a crash within that same window, you would lose data. The same issue would also appear if you were using some kind of volume snapshot for backups.

XtraBackup has the same kind of issue with InnoDB and downgraded durability, but it works with the server to force flushing/fsyncing everything while it is still copying the InnoDB redo logs and before finalizing the backup. This is something that has been discussed before w.r.t. TokuDB Hot Backup, i.e. implementing some kind of backup library callback that tells the backup plugin to sync everything to disk while the backup library is still copying/monitoring the filesystem for changes.
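
Until such a callback exists, one possible mitigation in the spirit of this comment (my assumption, not something verified in this report) is to restore durable commits on the slave only for the duration of the backup window. The replication threads are restarted so they pick up the new global value, and the backup directory path below is just an example:

STOP SLAVE;
SET GLOBAL tokudb_commit_sync = 1;          -- durable commits while the backup runs
START SLAVE;
SET tokudb_backup_dir = '/backups/toku/1';  -- triggers the hot backup, as in the test above
-- after the backup completes, revert to the original setting
STOP SLAVE;
SET GLOBAL tokudb_commit_sync = 0;
START SLAVE;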

Revision history for this message
George Ormond Lorch III (gl-az) wrote :
tags: removed: tokudb
Revision history for this message
George Ormond Lorch III (gl-az) wrote :

This was fixed in PS 5.6.35-81.0 and 5.7.17-12.

Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports, so this bug report has been migrated to: https://jira.percona.com/browse/PS-955
