RBR + no PK => High load on slave (table scan/cpu) => slave failure

Bug #903506 reported by Oleg Tsarev
14
This bug affects 3 people
Affects Status Importance Assigned to Milestone
MySQL Server
Unknown
Unknown
Percona Server moved to https://jira.percona.com/projects/PS
Invalid
Medium
Unassigned
5.1
Won't Fix
Undecided
Unassigned
5.5
Triaged
Medium
Unassigned
5.6
Invalid
Medium
Unassigned

Bug Description

if one runs DML on a table that has no indexes, a full table scan is done.
with RBR, the slave might need to scan the full table for *each* row changed.

consider this on the master with --binlog-format=row :
delete from t1 order by rand();

when t1 has N rows, and no primary/unique key,
the slave must read N² rows to process the delete.
for larger datasets this is unrealistic and will be doomed to fail.

on master:
mysql> delete from t1 order by rand();
Query OK, 78130 rows affected (2.61 sec)

on slave it takes 78130*78130 row reads to process (still running):
---TRANSACTION 0 1799, ACTIVE 731 sec, OS thread id 3672 fetching rows
mysql tables in use 1, locked 1
153 lock struct(s), heap size 30704, 78278 row lock(s), undo log entries 10045
MySQL thread id 5, query id 16 Reading event from the relay log

Number of rows inserted 78130, updated 0, deleted 10045, read 370899004
0.00 inserts/s, 0.00 updates/s, 16.20 deletes/s, 600754.85 reads/s

How to repeat:
#setup rbr master and slave.
#run on master:

drop table if exists t1;
create table t1(a int)engine=innodb;
insert t1 values (1),(2),(3),(4),(5);
set @a=5;
insert into t1 select (@a:=@a+1) from
t1,t1 t2,t1 t3,t1 t4,t1 t5,t1 t6,t1 t7;
delete from t1 order by rand();

#watch as slave hits 100% of 1 cpu core and reads billions or rows.

Oleg Tsarev (tsarev)
Changed in percona-server:
assignee: nobody → Oleg Tsarev (tsarev)
Oleg Tsarev (tsarev)
tags: added: cr i20448
Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

Upstream bug is claimed to be fixed in 5.5.19 and 5.6.4. Needs re-verification.

tags: added: upstream
tags: added: rdba
Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :
Revision history for this message
yinfeng (yinfeng-zwx) wrote :

Can we define such scene as ROW-unsafe ?
These queries can be forced to log in statement format (if statement safe) to workaround this bug.

Revision history for this message
Valerii Kravchuk (valerii-kravchuk) wrote :

Re-verification for 5.5 is NOT needed, as it was not a real fix, just this:

"The current fix helps diagnose this issue by printing a message to the error log if the execution time for a given statement replicated using row-based replication takes more than 60 seconds. log_warnings must be greater than 1 for this message to be printed to the error log."

This is the real fix: http://binlogtorelaylog.blogspot.in/2012/08/Batch-operations-in-RBR.html

Variable is there since 5.6.6, see mysql.com/doc/refman/5.6/en/replication-options-slave.html#sysvar_slave_rows_search_algorithms

Moreover:

[openxs@centos Percona-Server-5.6.11-rc60.3-375.Linux.x86_64]$ strings bin/mysqld | grep slave_rows
_ZN15Hash_slave_rows4sizeEv
_ZN15Hash_slave_rows13make_hash_keyEP5TABLEP9st_bitmap
_ZN15Hash_slave_rows3getEP5TABLEP9st_bitmap
_ZN15Hash_slave_rows4initEv
_ZN15Hash_slave_rows3putEP5TABLEP9st_bitmapP17hash_row_entry_st
slave_rows_search_algorithms_options
_ZN15Hash_slave_rows6deinitEv
_ZN15Hash_slave_rows8is_emptyEv
_ZN15Hash_slave_rows3delEP17hash_row_entry_st
_ZN15Hash_slave_rows4nextEPP17hash_row_entry_st
_ZN15Hash_slave_rows10make_entryEPKhS1_S1_S1_
slave_rows_search_algorithms

So. PS 5.6.11 is NOT affected.

Revision history for this message
Hui Liu (hickey) wrote :

There exists two issues:
1. no pk but exists index => no index would be used
it's solved by finding suitable index, ref https://kb.askmonty.org/en/row-based-replication-with-no-primary-key/

2. no pk and index => each table is needed for SQL that effects many rows.
it's solved by hash scan used in 5.6, ref http://binlogtorelaylog.blogspot.in/2012/08/Batch-operations-in-RBR.html

Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-1231

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.