pt-online-schema-change find_child_tables slow

Bug #1690122 reported by Matthias Van Woensel
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Percona Toolkit moved to https://jira.percona.com/projects/PT
Fix Released
Medium
Carlos Salguero

Bug Description

tested on
pt-online-schema-change 3.0.2
mysql Ver 14.14 Distrib 5.6.35-80.0, for debian-linux-gnu (x86_64) using readline 6.3
pt-online-schema-change D=himalaya_novocar_brainstorm,t=Estimation,u=root,p=***** --alter-foreign-keys-method=auto --alter="MODIFY type set('INVOICE','PLANNING','ATR')" --dry-run

There are a couple of tickets referencing the inherit slowness of information schema. Most are closed because you can set --alter-foreign-keys-method to none, so it doesn't do the check.

But, this does not solve the problem. It simply avoids it. What if you actually do want the foreign-keys ?

The real problem is that the query in pt-online-schema-change that is used to find the child tables in sub find_child_tables is slow:

Currently:
my $sql = "SELECT table_schema, table_name "
           . "FROM information_schema.key_column_usage "
           . "WHERE referenced_table_schema='$tbl->{db}' "
           . "AND referenced_table_name='$tbl->{tbl}'";

But... simply adding "table_schema='$tbl->{db}'" will speed this up.
I propose:
my $sql = "SELECT table_schema, table_name "
           . "FROM information_schema.key_column_usage "
           . "WHERE table_schema='$tbl->{db}' AND referenced_table_schema='$tbl->{db}' "
           . "AND referenced_table_name='$tbl->{tbl}'";

In the current version, foreign keys that are made across databases will be included. I my proposal, only foreign keys from the same database will be checked.
I can't find a reason why anyone would make a foreign key cross databases, but I'm sure there are some that do have a valid use case for this.

I believe the default way should only check the same table_schema. And if people want to check cross database they can do this using a extra flag?

I'm reporting this as a bug and not a blueprint, because the current code is simply not usable when working with a lot of databases. I hope this is ok.

Tags: pt142
Changed in percona-toolkit:
importance: Undecided → Medium
Revision history for this message
Carlos Salguero (carlos-salguero) wrote :

Hi,

Do you have any benchmark with those changes?
I would like to know how faster would it run.

Revision history for this message
Matthias Van Woensel (matthiaz) wrote :

Hi Carlos,

On my local development machine the difference is a query running in 3 seconds before the modification or in 0.03 seconds with the table_schema statement. So I would say a factor 100 difference. But this is not really a very good example.

On our production environment (where there are a lot more databases) I never allowed the query to fully finish. I killed it after several minutes.

For more information on how to improve performance of the information_schema, you might also want to look at:
https://dev.mysql.com/doc/refman/5.7/en/information-schema-optimization.html

They mention TABLE_SCHEMA and TABLE_NAME specifically to avoid a full directory scan.

tags: added: pt142
Revision history for this message
Carlos Salguero (carlos-salguero) wrote :
Changed in percona-toolkit:
status: New → Fix Committed
assignee: nobody → Carlos Salguero (carlos-salguero)
milestone: none → 3.0.4
Revision history for this message
Carlos Salguero (carlos-salguero) wrote :

Added --only-same-schema-fks parameter

Revision history for this message
Matthias Van Woensel (matthiaz) wrote : Re: [Bug 1690122] Re: pt-online-schema-change find_child_tables slow

Carlos,

Great, Thanks!

Sorry, I didn't realize you guys had a github. If I have something like
this in the future I'll add a PR myself ;-)

Matthias

2017-05-16 13:50 GMT+02:00 Carlos Salguero <email address hidden>:

> Added --only-same-schema-fks parameter
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1690122
>
> Title:
> pt-online-schema-change find_child_tables slow
>
> Status in Percona Toolkit:
> Fix Committed
>
> Bug description:
> tested on
> pt-online-schema-change 3.0.2
> mysql Ver 14.14 Distrib 5.6.35-80.0, for debian-linux-gnu (x86_64)
> using readline 6.3
> pt-online-schema-change D=himalaya_novocar_brainstorm,t=Estimation,u=root,p=*****
> --alter-foreign-keys-method=auto --alter="MODIFY type
> set('INVOICE','PLANNING','ATR')" --dry-run
>
> There are a couple of tickets referencing the inherit slowness of
> information schema. Most are closed because you can set --alter-
> foreign-keys-method to none, so it doesn't do the check.
>
> But, this does not solve the problem. It simply avoids it. What if you
> actually do want the foreign-keys ?
>
> The real problem is that the query in pt-online-schema-change that is
> used to find the child tables in sub find_child_tables is slow:
>
> Currently:
> my $sql = "SELECT table_schema, table_name "
> . "FROM information_schema.key_column_usage "
> . "WHERE referenced_table_schema='$tbl->{db}' "
> . "AND referenced_table_name='$tbl->{tbl}'";
>
> But... simply adding "table_schema='$tbl->{db}'" will speed this up.
> I propose:
> my $sql = "SELECT table_schema, table_name "
> . "FROM information_schema.key_column_usage "
> . "WHERE table_schema='$tbl->{db}' AND
> referenced_table_schema='$tbl->{db}' "
> . "AND referenced_table_name='$tbl->{tbl}'";
>
> In the current version, foreign keys that are made across databases will
> be included. I my proposal, only foreign keys from the same database will
> be checked.
> I can't find a reason why anyone would make a foreign key cross
> databases, but I'm sure there are some that do have a valid use case for
> this.
>
> I believe the default way should only check the same table_schema. And
> if people want to check cross database they can do this using a extra
> flag?
>
> I'm reporting this as a bug and not a blueprint, because the current
> code is simply not usable when working with a lot of databases. I hope
> this is ok.
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/percona-toolkit/+bug/1690122/+subscriptions
>

Changed in percona-toolkit:
status: Fix Committed → Fix Released
Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PT-733

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.