Comment 0 for bug 1018685

yinfeng (yinfeng-zwx) wrote :

Replication is often interrupted by errors. The most common ones we have encountered recently are "HA_ERR_KEY_NOT_FOUND" and "HA_ERR_FOUND_DUPP_KEY".

There are two ways to handle these errors:
1) Skip the failing event by setting "sql_slave_skip_counter".
2) Set slave_exec_mode = "idempotent", which handles "HA_ERR_FOUND_DUPP_KEY" by overwriting the record and "HA_ERR_KEY_NOT_FOUND" by just ignoring the error.

Both methods may lead to inconsistencies between master and slave.
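
To illustrate how ignoring an error leaves the two sides out of sync, here is a toy model (not MySQL code). Tables are plain Python dicts keyed by primary key, and the helper name apply_update_idempotent is hypothetical; it only mimics the "just ignore the error" behaviour described above.

```python
# Toy model only: each "table" is a dict mapping primary key -> value.
master = {1: "a", 2: "b"}
slave = {1: "a"}  # row 2 is already missing on the slave


def apply_update_idempotent(table, pk, after_image):
    """Hypothetical helper: skip the update when the row is missing,
    the way the idempotent mode is described as ignoring HA_ERR_KEY_NOT_FOUND."""
    if pk not in table:
        return False  # error ignored, nothing is applied
    table[pk] = after_image
    return True


# The master updates row 2, and the row event is replayed on the slave.
master[2] = "b2"
applied = apply_update_idempotent(slave, 2, "b2")

print(applied)          # False: the event was silently dropped
print(master == slave)  # False: the slave still has no row 2
```

Replication keeps running, but the slave never receives row 2.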

If we are using row-based replication with the InnoDB storage engine, why not fix these errors instead of simply ignoring them?

So I introduced a new value for slave_exec_mode: SMART

The basic idea is:

1) HA_ERR_KEY_NOT_FOUND
       UPDATE_ROWS_EVENT: write the 'Before Image' of the record, and then update it
       DELETE_ROWS_EVENT: write the record and then delete it, or just ignore the error

2) HA_ERR_FOUND_DUPP_KEY
       WRITE_ROWS_EVENT: just overwrite the record
       UPDATE_ROWS_EVENT: delete the duplicate record and then update (if the error is caused by a duplicate unique key and the table also has a primary key, the fix may fail and all changes can be rolled back)
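
The repair rules above can be sketched as a small simulation. This is not the patch itself: the table is a Python dict keyed by a primary key column assumed to be called "id", the function names smart_write, smart_update and smart_delete are hypothetical, and secondary unique keys (the case where the fix may fail) are not modelled. The missing-row removal case is treated as a delete event, which is an assumption.

```python
# Toy model: a table is a dict mapping primary key -> row (a dict with an "id").

def smart_write(table, row):
    # HA_ERR_FOUND_DUPP_KEY on a write: just overwrite the existing record.
    table[row["id"]] = dict(row)


def smart_delete(table, before):
    # HA_ERR_KEY_NOT_FOUND on a delete: the row is already gone, so ignore it.
    table.pop(before["id"], None)


def smart_update(table, before, after):
    old_pk, new_pk = before["id"], after["id"]
    # HA_ERR_KEY_NOT_FOUND: write the before image first so the update can proceed.
    if old_pk not in table:
        table[old_pk] = dict(before)
    # HA_ERR_FOUND_DUPP_KEY: the new key collides with another row; delete that row.
    if new_pk != old_pk and new_pk in table:
        del table[new_pk]
    # Apply the update: the old row is replaced by the after image.
    del table[old_pk]
    table[new_pk] = dict(after)


# A slave that has drifted: row 1 is stale, row 2 is missing, row 3 is a stray.
slave = {1: {"id": 1, "v": "stale"}, 3: {"id": 3, "v": "x"}}

smart_write(slave, {"id": 1, "v": "a"})                          # duplicate key on insert
smart_update(slave, {"id": 2, "v": "b"}, {"id": 3, "v": "b2"})   # missing row, then duplicate
smart_delete(slave, {"id": 9, "v": "z"})                         # missing row on delete

print(slave)
```

After the three events the slave holds exactly the rows the events describe (ids 1 and 3), rather than stopping replication or silently skipping the changes.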

The attached file is a simple patch based on Percona-Server-5.5.24-rel26.0; it is still being tested.