Chunking handles multi-column indexes poorly

Bug #1130498 reported by Jeremy Tillman on 2013-02-20
This bug affects 9 people
Affects Status Importance Assigned to Milestone
Percona Toolkit

Bug Description

I have a (InnoDB) table , with around 10 Millions rows, that uses a multicolumn primary key (4 INTs). When adding a column using pt-online-schema-change with no --chunk-index specified, it tries to use the PrimaryKey Constraint.

This fails during the 1st chunk transfer because:

Error copying rows from [table] to [table]_new: Error copying rows at chunk 1 of [table] because MySQL used only 8 bytes of the PRIMARY index instead of 16. See the --[no]check-plan documentation for more information.

The query that is generated for chunking is as such (for the select portion):

SELECT blah, blah, blah FROM `[table]` FORCE INDEX(`PRIMARY`) WHERE ((`pk_1` > ?) OR (`pk_1` = ? AND `pk_2` > ?) OR (`pk_1` = ? AND `pk_2` = ? AND `pk_3` > ?) OR (`pk_1` = ? AND `pk_2` = ? AND `pk_3` = ? AND `pk_4` >= ?)) AND ((`pk_1` < ?) OR (`pk_1` = ? AND `pk_2` < ?) OR (`pk_1` = ? AND `pk_2` = ? AND `pk_3` < ?) OR (`pk_1` = ? AND `pk_2` = ? AND `pk_3` = ? AND `pk_4` <= ?)) LOCK IN SHARE MODE /*pt-online-schema-change 20683 copy nibble*/

When give the --no-check-plan option is used to continue past this but starts soon receive a message as such:

Rows are copying very slowly. --chunk-size has been automatically reduced to 1. Check that the server is not being overloaded, or increase --chunk-time. The last chunk selected 1 rows and took 33.835 seconds to execute.

When using an index with a single column, the tool operates quickly (30 mins) on the same table.

I would suggest a fix of not trying retrieve the full chunk size every time but reduce WHERE clause to only have equality condition up to the inner most column of the index. (pk_1 = ? AND pk_2 = ? AND pk_3 = ? AND pk_4 >= ?)

Of course this wouldn't guarantee the chunk of each write be the chunk size but it would guarantee fast queries and the index being used to its max length ever time.

description: updated
tags: added: chunking pt-online-schema-change pt-table-checksum
summary: - pt-online-schema-change queries render tool unusable when multicolumn
- key is used
+ Chunking handles multi-column indexes poorly
Daniel Nichter (daniel-nichter) wrote :

This affects pt-osc and ptc currently because these tools use the NibbleIterator module. I discovered this problem too awhile ago, but haven't gotten to fixing it because it's a weird and complex problem, and nibbling/chunking is not easy to change in the tools.

Three things are happening: first, NibbleIterator does make sub-optimal WHERE clauses with multi-column indexes. With 3+ column indexes, the tool makes a WHERE clause with mostly superfluous predicates. Second, the MySQL query planner/optimizer should cancel out these superfluous parts, but instead they confuse MySQL and make it use only part of the index. Using part of the index is where the "MySQL used only 8 bytes of the PRIMARY index instead of 16" comes from. We added --[no]check-plan this to detect when this happen in the wild because--third thing--this doesn't happen all the time. Sometimes MySQL isn't confused, and sometimes it is. When it is and uses part of the index, the result is an index scan. That's not too bad unless the table is large (e.g. millions of rows), and then even an index scan become expensive; and that's why --no-check-plan results in "Rows are copying very slowly".

So yes, we need to fix NibbleIterator, but I don't see this happening for awhile. I would like it to happen for PT 3.0, tentatively schedule for this summer, because we plan to rewrite pt-table-sync to use NibbleIterator like pt-table-checksum.

So the only workaround is currently to use a simpler index, i.e. 1 or 2 columns.

Changed in percona-toolkit:
status: New → Triaged
Changed in percona-toolkit:
milestone: none → 3.0.1
importance: Undecided → High
Changed in percona-toolkit:
importance: High → Medium
Jeremy Tillman (jeremy-tillman) wrote :

I see that the importance of this issue has been reduced. Is there any information as to the current plans for this issue? Thanks

Michael Titus (miket-r) wrote :

We have some large tables with multi-column primary keys that we cannot use pt-online-schema-change on. It would be a big help to have this bug fixed. Thanks.

Adam DePue (adepue) wrote :

We have a large portion of our schema that prevents the usage of pt-online-schema-change because of this bug as well. It would be extremely handy to have this fixed.

tags: added: rdba
tags: removed: rdba
Mrten (bugzilla-ii) wrote :

I hit this with a 4-column primary key but was able to run pt-osc with --chunk-index-columns 2 successfully.

Timur Bakeyev (timur.bakeyev) wrote :

End of 2015, pt-osc 2.2.3, still the same issue... MySQL table with 228mln rows and 4 columns primary key.

Error copying rows at chunk 1 of [table] because MySQL used only 8 bytes of the PRIMARY index instead of 15

Any progress and estimations?

Jeremy Tillman (jeremy-tillman) wrote :

We've had this ticket opened for some time now. I understand how having greater than 2 column keys create a larger problem for the algorithm. Are we at a block in seeking solutions to solve this issue or is it just getting any attention?

Daniël van Eeden (dveeden) wrote :

Maybe it could use mysql.innodb_index_stats to find a suitable column to use or suggest for --chunk-index=<some_index>

Dave Gregg (dgregg) wrote :

This issue still exists is 2.2.16 !!!!!!!!

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers