pt-table-checksum chunk-size-limit of 0 does not disable chunk size limit checking

Bug #938660 reported by Mrten
This bug affects 8 people
Affects: Percona Toolkit (moved to https://jira.percona.com/projects/PT)
Status: Invalid
Importance: Undecided
Assigned to: Daniel Nichter
Milestone: none

Bug Description

pt-table-checksum 2.0.3 does not seem to treat a --chunk-size-limit of 0 as special:

I get lots of

02-22T03:50:09 Aborting table afval.hs_file at chunk 1 because it is not safe to chunk. Chunking should use the PRIMARY index, but MySQL EXPLAIN reports that no index will be used.
02-22T03:50:09 Aborting table afval.hs_itemfile at chunk 1 because it is not safe to chunk. Chunking should use the PRIMARY index, but MySQL EXPLAIN reports that no index will be used.

which I did not have yesterday with chunk-size-limit set to 5. Most tellingly:

02-22T03:50:42 Skipping table afval.mg_newsletter_section because on the master it would be checksummed in one chunk but on these replicas it has too many rows:
  1 rows on erika.ii.nl
  1 rows on piro.ii.nl
The current chunk size limit is 0 rows (chunk size=2805 * chunk size limit=0).

My invocation (bash script)

ignore_tables=$(mysql -e 'SELECT CONCAT(`database`,".",`table`) FROM `ignore_tables`' --skip-column-names maatkit | tr '\n' ',' | sed 's/,$//')
ignore_databases=$(mysql --skip-column-names -e "show master status"|cut -f4)

eval pt-table-checksum --recursion-method dsn=h=127.0.0.1,P=3306,D=maatkit,t=pt_check_slave_delay_dsns --function MURMUR_HASH --replicate maatkit.pt_checksum --ignore-tables "$ignore_tables,security_log" --ignore-databases "$ignore_databases,mysql,maatkit" --no-check-replication-filters --chunk-size-limit 0 --quiet --max-lag 600 h=127.0.0.1,P=3306

A chunk-size-limit of 5 was not high enough (I think I even tried 10) to guarantee that all tables would be checksummed. I think the 'too many rows on the slave' test is too sensitive: InnoDB's estimated row count varies too much (I ran ANALYZE TABLE on all tables the day before this).

Baron Schwartz (baron-xaprb) wrote :

So if I understand correctly, you're trying to use a value of 0 to disable the check, right?

Mrten (bugzilla-ii) wrote : Re: pt-table-checksum chunk-size-limit of 0 does not disable chunk size limit checkinga

This is my intention, yes.

I thought it was documented as such, too:

http://www.percona.com/doc/percona-toolkit/2.0/pt-table-checksum.html

"You can disable oversized chunk checking by specifying a value of 0."

Baron Schwartz (baron-xaprb) wrote :

Right. Thanks for clarifying!

Changed in percona-toolkit:
status: New → Confirmed
importance: Undecided → Medium
tags: added: chunking pt-table-checksum
Brian Fraser (fraserbn)
summary: pt-table-checksum chunk-size-limit of 0 does not disable chunk size
- limit checkinga
+ limit checking
Changed in percona-toolkit:
milestone: none → 2.1.6
Sheeri K. Cabral (awfief) wrote :

I'm having a similar problem, but interestingly the behavior of percona-toolkit is a bit different - it says the chunk size limit is 1, even though I set it to 0:

# /usr/bin/pt-table-checksum --quiet --ignore-databases=mysql,percona,information_schema,performance_schema --user checksum --password 'ELIDED' --lock-wait-time=50 --chunk-size-limit=0 --no-check-plan --replicate percona.checksums h=ELIDED
10-22T09:06:06 Skipping table addons_mozilla_org.collection_subscriptions because on the master it would be checksummed in one chunk but on these replicas it has too many rows:
  207736 rows on addons6.db.phx1.mozilla.com
  207861 rows on addons2.db.phx1.mozilla.com
  211173 rows on addons1.db.phx1.mozilla.com
The current chunk size limit is 204696 rows (chunk size=204696 * chunk size limit=1).

Sheeri K. Cabral (awfief) wrote :

FWIW this is in:

# /usr/bin/pt-table-checksum --version
pt-table-checksum 2.1.5

Changed in percona-toolkit:
assignee: nobody → Daniel Nichter (daniel-nichter)
assignee: Daniel Nichter (daniel-nichter) → Brian Fraser (fraserbn)
Brian Fraser (fraserbn)
Changed in percona-toolkit:
status: Confirmed → In Progress
Brian Fraser (fraserbn) wrote :

Sheeri, thanks! Your feedback helped me track this down. We were using chunk-size-limit for two different things internally; one of them relied on the value never being zero, so it clobbered the original. I've pushed a branch that should have this fixed.

Daniel Nichter (daniel-nichter) wrote :

Let me clarify what's happening because currently everything is tangled.

The original report had two errors:

1 = 02-22T03:50:09 Aborting table afval.hs_file at chunk 1 because it is not safe to chunk. Chunking should use the PRIMARY index, but MySQL EXPLAIN reports that no index will be used.

2 = 02-22T03:50:42 Skipping table afval.mg_newsletter_section because on the master it would be checksummed in one chunk but on these replicas it has too many rows:

Firstly, error 2 was fixed in 2.0.4 (yet strangely not mentioned in the Changelog--I think it got fixed by accident while refactoring code), so using that version or newer will prevent this false-positive. For all the following, I'm speaking about release 2.1.x.

Disabling --chunk-size-limit cannot bypass either error. The first check guards against the tool chunking on index X (which MySQL originally told the tool it would use) and MySQL then deciding "no, I don't want to use index X, let's table scan instead". It cannot currently be disabled. There's no single or short reason why this may happen, so we'll need to see PTDEBUG output. In general, it may happen with very small tables (fewer rows than the current chunk size).

The second check cannot currently be disabled either, though bug 987495 requests an option for disabling it. It's meant to handle single-chunking a table on the master, then making sure the table on the slave can also be single-chunked. So it's a "special" use of --chunk-size-limit wherein --chunk-size-limit is >= 1, i.e. it can't be 0 because this check can't be disabled (unless we implement bug 987495); otherwise we may single-chunk a huge table on a slave that's empty on the master, which happens in the real world. So...

Sheeri's report is the same as the original #2 error, though she notes the oddity of --chunk-size-limit being 1 even when set to 0 in some cases, which is related to the special use described above. Being set to 1 has the effect of limiting the single-chunk size of the slave's table to the current chunk-size (because 1 * any chunk-size = chunk-size). --chunk-size-limit probably shouldn't be co-opted for this purpose, which is something we'll probably fix in 2.2 (even though it's just an internal code fix).

So in short, there's no bug here when using 2.0.4 or newer, and see bug 987495 if one wants to disable the slave table size check.
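
For readers trying to follow the arithmetic behind the slave table size check, here is a minimal sketch of the behavior as described in this comment. It is not the tool's actual code (pt-table-checksum is written in Perl). The function names, the `max(..., 1)` clamp, and the strict `>` comparison are assumptions chosen to match the messages quoted in this report.

```python
def effective_limit(chunk_size_limit):
    """Assumed clamp: the single-chunk check never sees a limit below 1,
    so a user-supplied 0 behaves like 1 for this check only."""
    return max(chunk_size_limit, 1)


def oversized_replicas(chunk_size, chunk_size_limit, replica_rows):
    """Return the replicas whose estimated row count exceeds
    chunk_size * effective limit (these would cause the table to be skipped)."""
    max_rows = chunk_size * effective_limit(chunk_size_limit)
    return {host: rows for host, rows in replica_rows.items() if rows > max_rows}


# Row estimates taken from the 2.1.5 report earlier in this thread.
replicas = {
    "addons6.db.phx1.mozilla.com": 207736,
    "addons2.db.phx1.mozilla.com": 207861,
    "addons1.db.phx1.mozilla.com": 211173,
}

# --chunk-size-limit=0 with chunk size 204696: the limit becomes 204696 * 1.
too_big = oversized_replicas(204696, 0, replicas)
print(sorted(too_big))
```

Under these assumptions all three replicas exceed 204696 rows, so the table is skipped even though the limit was given as 0, which matches the "chunk size limit=1" message in the report.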

Changed in percona-toolkit:
status: In Progress → Invalid
assignee: Brian Fraser (fraserbn) → Daniel Nichter (daniel-nichter)
milestone: 2.1.6 → none
Changed in percona-toolkit:
importance: Medium → Undecided
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PT-938
