TokuDB Read Free Replication will not propagate reliably if table PK is missing

Bug #1536663 reported by Igor Shevtsov
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Percona Server moved to https://jira.percona.com/projects/PS
Status tracked in 5.7
5.6
Fix Released
High
Vlad Lesin
5.7
Fix Released
High
Vlad Lesin

Bug Description

If TokuDB table is lacking PK, update, delete events will not propagate reliably and eventually will create a data inconsistency.
I use 5.6.27-76.0-log Percona Server (GPL), Release 76.0 1(Master) -> 2(Slaves)
Slaves: read-only, RBR, tokudb_rpl_unique_checks = 0, tokudb_rpl_lookup_rows = 0

I drop id column with PK in world.City table.
on Master and Slaves:
☭ [world]> select * from City where Name = 'Amsterdam';
+-----------+-------------+---------------+------------+------------+
| Name | CountryCode | District | Population | RollupDate |
+-----------+-------------+---------------+------------+------------+
| Amsterdam | NLD | Noord-Holland | 731200 | 2015-02-10 |

Master:
update City set Population = Population + 10 where Name = 'Amsterdam';

After the update:
Master:
☭ [world]> select * from City where Name = 'Amsterdam';
+-----------+-------------+---------------+------------+------------+
| Name | CountryCode | District | Population | RollupDate |
+-----------+-------------+---------------+------------+------------+
| Amsterdam | NLD | Noord-Holland | 731210 | 2015-02-10 |
+-----------+-------------+---------------+------------+------------+

Slave1:
☭ [world]> select * from City where Name = 'Amsterdam';
+-----------+-------------+---------------+------------+------------+
| Name | CountryCode | District | Population | RollupDate |
+-----------+-------------+---------------+------------+------------+
| Amsterdam | NLD | Noord-Holland | 731210 | 2015-02-10 |
+-----------+-------------+---------------+------------+------------+

Slave2 extra row:
☭ [world]> select * from City where Name = 'Amsterdam';
+-----------+-------------+---------------+------------+------------+
| Name | CountryCode | District | Population | RollupDate |
+-----------+-------------+---------------+------------+------------+
| Amsterdam | NLD | Noord-Holland | 731200 | 2015-02-10 |
| Amsterdam | NLD | Noord-Holland | 731210 | 2015-02-10 |
+-----------+-------------+---------------+------------+------------+

Now after I run delete on Master:
☭ [world]> delete from City where Name = 'Amsterdam';
Query OK, 1 row affected (0.00 sec)

on Master:
☭ [world]> select * from City where Name = 'Amsterdam';
Empty set (0.00 sec)

Slave1:
☭ [world]> select * from City where Name = 'Amsterdam';
Empty set (0.00 sec)

Slave2:
☭ [world]> select * from City where Name = 'Amsterdam';
+-----------+-------------+---------------+------------+------------+
| Name | CountryCode | District | Population | RollupDate |
+-----------+-------------+---------------+------------+------------+
| Amsterdam | NLD | Noord-Holland | 731200 | 2015-02-10 |
+-----------+-------------+---------------+------------+------------+

As we see neither updates no deletes are reliable with RFR on the table lacking PK creating a data inconsistency.
This needs to be emphasised in the Documentation or fixed.

Tags: tokudb i65945
tags: added: tokudb
Revision history for this message
Przemek (pmalkowski) wrote :

Verified. Looks like the tokudb_rpl_lookup_rows disabled alone leads to inconsistency in a table without PK on a slave.

master [localhost] {msandbox} (world) > show create table City\G
*************************** 1. row ***************************
       Table: City
Create Table: CREATE TABLE `City` (
  `Name` char(35) NOT NULL DEFAULT '',
  `CountryCode` char(3) NOT NULL DEFAULT '',
  `District` char(20) NOT NULL DEFAULT '',
  `Population` int(11) NOT NULL DEFAULT '0',
  KEY `CountryCode` (`CountryCode`)
) ENGINE=TokuDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

master [localhost] {msandbox} (world) > update City set Population = Population + 10 where Name = 'Poznan';
Query OK, 1 row affected (0.03 sec)
Rows matched: 1 Changed: 1 Warnings: 0

slave1 [localhost] {msandbox} (world) > show global variables like 'tokudb_rpl_%';
+--------------------------------+-------+
| Variable_name | Value |
+--------------------------------+-------+
| tokudb_rpl_check_readonly | ON |
| tokudb_rpl_lookup_rows | ON |
| tokudb_rpl_lookup_rows_delay | 0 |
| tokudb_rpl_unique_checks | ON |
| tokudb_rpl_unique_checks_delay | 0 |
+--------------------------------+-------+
5 rows in set (0.00 sec)

slave1 [localhost] {msandbox} (world) > select * from City where Name = 'Poznan';
+--------+-------------+---------------+------------+
| Name | CountryCode | District | Population |
+--------+-------------+---------------+------------+
| Poznan | POL | Wielkopolskie | 576909 |
+--------+-------------+---------------+------------+
1 row in set (0.00 sec)

slave2 [localhost] {msandbox} (world) > show global variables like 'tokudb_rpl_%';
+--------------------------------+-------+
| Variable_name | Value |
+--------------------------------+-------+
| tokudb_rpl_check_readonly | ON |
| tokudb_rpl_lookup_rows | OFF |
| tokudb_rpl_lookup_rows_delay | 0 |
| tokudb_rpl_unique_checks | ON |
| tokudb_rpl_unique_checks_delay | 0 |
+--------------------------------+-------+
5 rows in set (0.00 sec)

slave2 [localhost] {msandbox} (world) > select * from City where Name = 'Poznan';
+--------+-------------+---------------+------------+
| Name | CountryCode | District | Population |
+--------+-------------+---------------+------------+
| Poznan | POL | Wielkopolskie | 576899 |
| Poznan | POL | Wielkopolskie | 576909 |
+--------+-------------+---------------+------------+
2 rows in set (0.00 sec)

slave2 [localhost] {msandbox} (world) > select @@version,@@version_comment;
+-------------+------------------------------------------------------+
| @@version | @@version_comment |
+-------------+------------------------------------------------------+
| 5.6.28-76.1 | Percona Server (GPL), Release 76.1, Revision 5759e76 |
+-------------+------------------------------------------------------+
1 row in set (0.00 sec)

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

Perhaps this could be fixed on a slave by stopping RFR (and breaking replication) if a row event for a PK-less table is seen in the RFR configuration.

tags: added: i65945
Revision history for this message
George Ormond Lorch III (gl-az) wrote :

Another approach would be to limit the RFR functionality specifically to tables that have a PK. The decision to lookup/not lookup is done in sql/log_event.cc::decide_row_lookup_algorithm_and_key where it checks to see if an event is WRITE/DELETE/UPDATE ROWS and handler->rpl_lookup_rows(). I think adding another predicate there to test for a PK is a reasonable preventative along with clear documentation on the behavior.

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

+1

Revision history for this message
George Ormond Lorch III (gl-az) wrote :

For reference, this commit to the old Tokutek MariaDB server https://github.com/Tokutek/mariadb-server/commit/0e5f4a2bd740502814bdfc652bdd3d4ec08112ef did pretty much exactly what I suggest, skips RFR for tables w/o a PK. See the context around line 11115

Revision history for this message
George Ormond Lorch III (gl-az) wrote :
Revision history for this message
George Ormond Lorch III (gl-az) wrote :

When read-free-replication is enabled for tokudb and there is no
explicit pk for replicated table there can be dublicated records in the table
on update operation.

Consider this update operation:
UPDATE t SET a = a + 10 WHERE b = 'b';
The master does rows lookup and updates the rows which values
correspond to the condition. The update events are written to binary log with
rows values from the master. As rows lookup is forbidden for slave
the new rows are added instead of updating corresponding rows.

Without the fix there will be several rows with b = 'b' in the table
on slave instead of one updated row.

The fix disables RFR for tables without explicit pk and do rows lookup
for update and delete binlog events and issues warning.

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :
Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-959

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.