pt-table-checksum doesn't warn if binlog_format=row or mixed on slaves

Bug #938068 reported by Max Bowsher
This bug affects 5 people
Affects: Percona Toolkit (moved to https://jira.percona.com/projects/PT)
Status: Fix Released
Importance: Medium
Assigned to: Brian Fraser

Bug Description

If you use pt-table-checksum in --replicate mode on a replication topology like A --> B --> C, then pt-table-checksum will correctly change the binlog_format on A, but B will binlog the checksum queries in row format. This can lead either to false confidence that C is checksummed correctly or to a break in replication between B and C.

Perhaps pt-table-checksum should, if changing the binlog_format, also check whether any path-length-1 slaves have path-length-2 slaves, and error if this is the case, unless the user passes an option to say they acknowledge this problem exists.
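
The check proposed above can be sketched as follows. This is a minimal Python illustration, not pt-table-checksum's own Perl code. The `topology` mapping and the function name are hypothetical, and the sketch assumes the replication tree has already been discovered and stored as a dictionary from each host to its direct replicas.

```python
def replicas_with_own_replicas(topology, master):
    """Return the direct replicas of `master` that have replicas of their own.

    `topology` is a hypothetical mapping: host name -> list of direct replicas.
    """
    return [r for r in topology.get(master, []) if topology.get(r)]


# The A --> B --> C chain from the description.
topology = {"A": ["B"], "B": ["C"], "C": []}
chained = replicas_with_own_replicas(topology, "A")
if chained:
    print("Replicas with their own replicas: " + ", ".join(chained))
```

A tool following this idea would stop with an error when the list is non-empty, unless the user passed an option acknowledging the problem.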

Revision history for this message
Baron Schwartz (baron-xaprb) wrote :

There's some debate with Oracle, which is not yet resolved, over whether replication is doing the Right Thing with the binlog_format setting through multiple levels of replication. I agree that the solution to this is to make pt-table-checksum abort and warn if any replica has binlog_format = row.
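
As a rough illustration of the abort-and-warn check described above, the following Python sketch flags replicas whose binlog_format is not statement-based. It is not the tool's actual code. The `formats` mapping and the function name are hypothetical, and the sketch assumes, following the bug title, that MIXED should be flagged along with ROW.

```python
# Formats treated as unsafe for --replicate checksums (an assumption of this sketch).
UNSAFE_FORMATS = {"ROW", "MIXED"}


def unsafe_replicas(formats):
    """Return the sorted names of replicas whose binlog_format is ROW or MIXED.

    `formats` is a hypothetical mapping: replica name -> its binlog_format value.
    """
    return sorted(host for host, fmt in formats.items()
                  if fmt.upper() in UNSAFE_FORMATS)


formats = {"B": "ROW", "C": "STATEMENT", "D": "mixed"}
bad = unsafe_replicas(formats)
if bad:
    print("Refusing to checksum; non-STATEMENT binlog_format on: " + ", ".join(bad))
```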

tags: added: binlog-format breaks-replication pt-table-checksum
summary: - pt-table-checksum --replicate will silently do the wrong thing with non-
- STATEMENT binlog_format and slaves-of-slaves
+ pt-table-checksum doesn't set or check binlog_format on slaves
tags: added: mysql-bug
Changed in percona-toolkit:
status: New → Triaged
Brian Fraser (fraserbn)
Changed in percona-toolkit:
assignee: nobody → Brian Fraser (fraserbn)
Revision history for this message
Daniel Nichter (daniel-nichter) wrote : Re: pt-table-checksum doesn't warn if binlog_format=row on slaves

Until we implement this, this limitation is being documented in response to related bug 899415.

summary: - pt-table-checksum doesn't set or check binlog_format on slaves
+ pt-table-checksum doesn't warn if binlog_format=row on slaves
Max Bowsher (maxb)
summary: - pt-table-checksum doesn't warn if binlog_format=row on slaves
+ pt-table-checksum doesn't warn if binlog_format=row or mixed on slaves
Brian Fraser (fraserbn)
Changed in percona-toolkit:
milestone: none → 2.1.5
Changed in percona-toolkit:
importance: Undecided → Medium
Brian Fraser (fraserbn)
Changed in percona-toolkit:
status: Triaged → In Progress
Changed in percona-toolkit:
status: In Progress → Fix Committed
Revision history for this message
Daniel Nichter (daniel-nichter) wrote :

I missed something in my review:

      if ( $o->get('check-binlog-format') ) {
         my ($master_binlog) = $master_dbh->selectrow_array(
            'SELECT @@binlog_format');

That fails on 5.0 because binlog_format was introduced in 5.1. I'll fix it.
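
One way to avoid that failure is to guard the query with a server version check. The Python sketch below only illustrates the idea and is not the fix that was committed. The function name is hypothetical, and the 5.1 cutoff is taken from the comment above.

```python
import re


def supports_binlog_format(version):
    """Return True if a MySQL version string such as '5.1.73-log' is 5.1 or later."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (5, 1)


# Only query @@binlog_format when the server version allows it.
for version in ("5.0.95-log", "5.1.73", "5.5.28-29.1-log"):
    action = "check binlog_format" if supports_binlog_format(version) else "skip check"
    print(version + ": " + action)
```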

Changed in percona-toolkit:
status: Fix Committed → In Progress
Changed in percona-toolkit:
status: In Progress → Fix Committed
Brian Fraser (fraserbn)
Changed in percona-toolkit:
status: Fix Committed → Fix Released
Revision history for this message
Daniel Nichter (daniel-nichter) wrote :

This caused bug 1080385.

Revision history for this message
Keith Murphy (bmurphy) wrote :

Why would it not be possible to fix this from the server end? If I have read this correctly, the problem is that MySQL Server doesn't honor the binlog_format setting through multiple levels of slaves. This causes the very real possibility that you could have a slave two levels deep that checksums correctly even though the tables aren't the same because of a setting of row-based replication.

So, at least for Percona Server, why can't this be fixed? In my opinion I should be able to make this change (binlog_format) at the session level and have it replicate through. I know this is making a modest incompatibility with Oracle, but I think it's a good thing. This is utter speculation, but I would guess the majority of the percona-toolkit packages would be deployed against Percona Server packages and not Oracle anyways.

Many of us have moved to row or mixed based replication for performance or data drift issues. It becomes very difficult to use and recommend this tool when it doesn't work in those environments. And fixing the problem in Percona Server would give me one more reason to recommend Percona Server over Oracle or any other variant.

Just my thoughts.

Keith

Revision history for this message
Rodrigo (rodri-bernardo) wrote :

+1

Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PT-481
