pt-table-checksum doesn't wait for checksum table to replicate

Bug #1008778 reported by Baron Schwartz
This bug affects 1 person
Affects: Percona Toolkit (moved to https://jira.percona.com/projects/PT)
Status: Fix Released
Importance: High
Assigned to: Daniel Nichter

Bug Description

I've observed nondeterministic behavior in this test:

[baron@localhost stabilize-test-suite]$ perl t/pt-table-checksum/skip_innodb.t
# Stopping/reconfiguring/restarting sandboxes 12348 and 12349
1..3
ok 1 - Ran without InnoDB (bug 996110)
ok 2 - 0 exit status (bug 996110)
# Shutting down sandboxes
ok 3 - Sandbox servers

[baron@localhost stabilize-test-suite]$ perl t/pt-table-checksum/skip_innodb.t
# Stopping/reconfiguring/restarting sandboxes 12348 and 12349
1..3
ok 1 - Ran without InnoDB (bug 996110)
not ok 2 - 0 exit status (bug 996110)
# Failed test '0 exit status (bug 996110)'
# in t/pt-table-checksum/skip_innodb.t at line 61.
# got: '1'
# expected: '0'
# Shutting down sandboxes
ok 3 - Sandbox servers
# Looks like you failed 1 test of 3.

[baron@localhost stabilize-test-suite]$ perl t/pt-table-checksum/skip_innodb.t
# Stopping/reconfiguring/restarting sandboxes 12348 and 12349
1..3
ok 1 - Ran without InnoDB (bug 996110)
ok 2 - 0 exit status (bug 996110)
# Shutting down sandboxes
ok 3 - Sandbox servers

When I saved the output of test 2 to a file in /tmp, I found many messages like these:

06-04T09:01:26 2 0 0 1 0 0.008 mysql.time_zone_name
06-04T09:01:26 Error waiting for the last checksum of table mysql.time_zone_transition to replicate to replica localhost.localdomain: DBD::mysql::db selectrow_array failed: Table 'percona.checksums' doesn't exist [for Statement "SELECT MAX(chunk) FROM `percona`.`checksums` WHERE db='mysql' AND tbl='time_zone_transition' AND master_crc IS NOT NULL"] at /home/baron/stabilize-test-suite//bin/pt-table-checksum line 7513.

Check that the replica is running and has the replicate table `percona`.`checksums`. Checking the replica for checksum differences will probably cause another error.
06-04T09:01:26 Error checksumming table mysql.time_zone_transition: Use of uninitialized value in numeric lt (<) at /home/baron/stabilize-test-suite//bin/pt-table-checksum line 7533.
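The second error looks like a knock-on effect of the first. When the replica has no `percona`.`checksums` table, the `SELECT MAX(chunk)` lookup fails, and the value later compared with `<` is undefined. The exact expression at line 7533 is not shown here. The sketch below assumes it compares the replica's `MAX(chunk)` against the master's last chunk number, and it treats a missing value as "not replicated yet" instead of comparing it. It is written in Python for illustration only; the function name is hypothetical and is not pt-table-checksum's actual code.

```python
from typing import Optional


def replica_caught_up(replica_max_chunk: Optional[int], master_max_chunk: int) -> bool:
    """Return True once the replica has the master's last chunk checksum.

    replica_max_chunk is the (hypothetical) result of
    SELECT MAX(chunk) ... AND master_crc IS NOT NULL on the replica.
    It is None when the query returns NULL (no matching rows yet) or when
    the lookup failed, for example because the checksum table does not
    exist on the replica yet.
    """
    if replica_max_chunk is None:
        # Nothing usable has replicated yet: keep waiting instead of
        # comparing an undefined value (the cause of the Perl warning).
        return False
    return replica_max_chunk >= master_max_chunk
```

With this guard, a replica that is missing the table is simply "not caught up yet" rather than triggering an uninitialized-value warning for every table.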

This suggests that pt-table-checksum needs to be smarter: after creating the checksum table, it should wait until that table exists on every replica it has detected before it starts checksumming. We also need to fix the "Use of uninitialized value" error at line 7533.
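A minimal sketch of the proposed wait, in Python for illustration: poll each detected replica until the checksum table exists there, and give up after a timeout. The function name, the `table_exists` callback, and the timeout and interval defaults are all hypothetical. In practice the callback might query `information_schema.TABLES` on the replica for `percona`.`checksums`. This sketches the idea in the report and is not necessarily how the released fix works.

```python
import time
from typing import Callable, Iterable, List


def wait_for_table_on_replicas(
    replicas: Iterable[str],
    table_exists: Callable[[str], bool],
    timeout: float = 30.0,
    interval: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Poll until table_exists(replica) is True for every replica.

    Returns the replicas still missing the table when the timeout expires;
    an empty list means the table has replicated everywhere.
    clock and sleep are injectable so the loop can be tested without waiting.
    """
    pending = list(replicas)
    deadline = clock() + timeout
    while pending:
        # Re-check only the replicas that did not have the table last time.
        pending = [r for r in pending if not table_exists(r)]
        if not pending or clock() >= deadline:
            break
        sleep(interval)
    return pending
```

Returning the list of replicas that never received the table lets the caller decide whether to warn and skip them or to abort, rather than failing later with "Table 'percona.checksums' doesn't exist" on every chunk.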


Changed in percona-toolkit:
status: New → In Progress
importance: Undecided → High
assignee: nobody → Daniel Nichter (daniel-nichter)
milestone: none → 2.1.2
tags: added: breaks-replication pt-table-checksum risk
summary: - pt-query-digest doesn't wait for checksum table to replicate
+ pt-table-checksum doesn't wait for checksum table to replicate
tags: added: replication-wait
removed: breaks-replication risk
tags: added: breaks-replication replication-lag risk
removed: replication-wait
Daniel Nichter (daniel-nichter) wrote:
Changed in percona-toolkit:
status: In Progress → Fix Committed
Changed in percona-toolkit:
status: Fix Committed → Fix Released
Shahriyar Rzayev (rzayev-sehriyar) wrote:

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PT-316
