pt-online-schema-change, MySQL 5.6, and InnoDB optimizer stats can cause downtime

Bug #1491261 reported by Miguel Angel Nieto on 2015-09-02
14
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Percona Toolkit moved to https://jira.percona.com/projects/PT
Fix Released
High
Frank Cizmich

Bug Description

In 5.6 there is a bug (or bad design) about when the statistics are calculated with persistent innodb_stats_persistent enabled (it is by default).

"In PS 5.6 with innodb_stats_persistent using pt-osc to add a column will change the execution path to "Full scan" on the new table."

So, on a very loaded server that could cause that several hundred of queries that run without problems (under ms of execution time) switches to full table scan for some time, causing million of row scans and most probably a server down problem. It is clear that Oracle doesn't want to investigate/fix it, both reports are marked as "not bug":

http://bugs.mysql.com/bug.php?id=78289
https://bugs.mysql.com/bug.php?id=70617

so we need a workaround so pt-osc doesn't cause server down problems. And it seems to be easy. We just need to run analyze on the new table just before the rename. Consider this a proof of concept, not a patch :)

# diff /usr/bin/pt-online-schema-change /usr/bin/pt-online-schema-change.old
9950d9949
<
9965d9963
<
9974,9975d9971
< my $sqlanalyze = "ANALYZE TABLE $new_tbl->{name}";
< $cxn->dbh()->do($sqlanalyze);

So, we run analyze on the new table and then we rename it as usual.

PTDEBUG && _d($sql);
                  my $sqlanalyze = "ANALYZE TABLE $new_tbl->{name}";
                  $cxn->dbh()->do($sqlanalyze);
                  $cxn->dbh()->do($sql);

After that small change, the statistics are correct after rename and queries continue to use the correct execution plan.

The patch should be something like:

- we check if this is 5.6 with innodb_stats_persistent enabled.
-> NO, we do nothing.
-> YES, we analyze before rename.

There is a test case that can be used to check if the patch works here:

https://bugs.launchpad.net/percona-server/+bug/1490548

Changed in percona-toolkit:
status: New → Confirmed
tags: added: i58783
Changed in percona-toolkit:
assignee: nobody → Frank Cizmich (frank-cizmich)
Changed in percona-toolkit:
status: Confirmed → In Progress
milestone: none → 2.3.1
importance: Undecided → High
Michael Wang (xw73) wrote :

I have heard what was proposed to implement the change, I would like to request that by default the pt-osc tool will leave the environment the same as much as possible. To this end, I propose:

if -analyze-before-rename is NOT specified and the old table has stats
then analyze the new table
else
then do what as specified
end if

Please second my proposal.

tags: added: pt-online-schema-change
Frank Cizmich (frank-cizmich) wrote :

Hi Michael,

There are some cases where ANALYZE might block a table for a long time.
See: https://www.percona.com/blog/2008/09/02/beware-of-running-analyze-in-production/

This might be an argument against setting the option on by default.

Michael Wang (xw73) wrote :

Hello Frank,

You are citing a blog 7 years ago. Nothing has changed in this aspect since 7 years ago?

Frank Cizmich (frank-cizmich) wrote :

Indeed, I was aware of the age of the blog post. So I consulted with my colleagues before citing it. They confirmed that the problem has not gone away.

Michael Wang (xw73) wrote :

Hi Frank, I would like to emphasize that I was not suggesting to make analyze a default option, but rather to leave the table with the same or similar statistics as it was before as much as possible. If a table had recent statistics, but pt-online-schema-change leaves it with no statistics, it will do more harm not to analyze it than to analyze it.

Frank Cizmich (frank-cizmich) wrote :

Hi Michael,

I'm uploading a version of pt-online-schema-change with the --analyze-before-rename option.
As requested, it is automatically turned on if the necessary conditions are met (version >=5.6 + innodb_stats_persistent=ON)
You can force it off with --no-analyze-before-rename.
If you could check it out and test it that would be great.
Will be available in next release. But the default ON or OFF status might vary depending on feedback.

Regards!

Changed in percona-toolkit:
status: In Progress → Fix Committed
Michael Wang (xw73) wrote :

I tested the fix and it works fine. Thanks.

summary: - pt-online-schema-change needs a workaround for a bug on 5.6
+ pt-online-schema-change, MySQL 5.6, and InnoDB optimizer stats can cause
+ downtime
Daniel Nichter (daniel-nichter) wrote :

Renamed option to --analyze-before-swap because "swap" is the usual term.

Changed in percona-toolkit:
milestone: 2.3.1 → 2.2.16
Changed in percona-toolkit:
status: Fix Committed → Fix Released

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PT-393

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers