pt-online-schema-change gets stuck looking for its own _new table

Reported by Elton M. Labajo on 2013-06-28
This bug affects 6 people
Affects: Percona Toolkit
Importance: High
Assigned to: Daniel Nichter

Bug Description

When I run this command:

PTDEBUG=1 pt-online-schema-change --max-load Threads_running=100 --critical-load Threads_running=450 --nocheck-replication-filters --execute --alter 'engine=innodb' t=orders_status_history,D=xxxx,h=x.x.x.x > pt-osc.log 2>&1

the tool gets stuck, and the debug file shows these lines:

 tail -f pt-osc.log
# TableParser:3293 13801 Table does not exist
# TableParser:3279 13801 Checking `xxxx`.`_orders_status_history_new`
# TableParser:3283 13801 SHOW TABLES FROM `xxxx` LIKE '\_orders\_status\_history\_new'
# TableParser:3293 13801 Table does not exist
# TableParser:3279 13801 Checking `xxxx`.`_orders_status_history_new`
# TableParser:3283 13801 SHOW TABLES FROM `xxxx` LIKE '\_orders\_status\_history\_new'
# TableParser:3293 13801 Table does not exist
# TableParser:3279 13801 Checking `xxx`.`_orders_status_history_new`
# TableParser:3283 13801 SHOW TABLES FROM `xxxx` LIKE '\_orders\_status\_history\_new'
# TableParser:3293 13801 Table does not exist
# TableParser:3279 13801 Checking `xxxx`.`_orders_status_history_new`
# TableParser:3283 13801 SHOW TABLES FROM `xxxx` LIKE '\_orders\_status\_history\_new

However, the table does exist:

 SHOW TABLES FROM `xxxx` LIKE '\_orders\_status\_history\_new';
+-----------------------------------------------------+
| Tables_in_xxxx (\_orders\_status\_history\_new) |
+-----------------------------------------------------+
| _orders_status_history_new |
+-----------------------------------------------------+
1 row in set (0.01 sec)
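In the log, the tool's query escapes every underscore because `_` is a single-character wildcard in a LIKE pattern (and `%` matches any run of characters); without the escapes, the pattern could also match a differently named table. A minimal Python sketch that builds the same query text, assuming only `_` and `%` need escaping and that the table name contains no backslashes. `like_literal` and `show_tables_like` are hypothetical helpers, not the tool's code. The sketch reproduces the query from the log, so the pattern itself does match `_orders_status_history_new`.

```python
def like_literal(name: str) -> str:
    """Quote a name as a LIKE literal that matches only that exact name.

    Assumes the name has no backslashes: '_' and '%' are LIKE wildcards, so
    each is backslash-escaped, and single quotes are doubled for the literal.
    """
    escaped = name.replace("_", "\\_").replace("%", "\\%")
    return "'" + escaped.replace("'", "''") + "'"


def show_tables_like(db: str, table: str) -> str:
    """Build SHOW TABLES ... LIKE for one table (hypothetical helper)."""
    quoted_db = "`" + db.replace("`", "``") + "`"  # backtick-quote the database name
    return f"SHOW TABLES FROM {quoted_db} LIKE {like_literal(table)}"


print(show_tables_like("xxxx", "_orders_status_history_new"))
# SHOW TABLES FROM `xxxx` LIKE '\_orders\_status\_history\_new'
```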

See the attached log file pt-osc.log for further details.

Elton M. Labajo (elton-labajo) wrote :
description: updated
Daniel Nichter (daniel-nichter) wrote :

Interesting, thanks for the report and log. We'll look into it.

tags: added: pt-online-schema-change
Changed in percona-toolkit:
status: New → Confirmed
Changed in percona-toolkit:
milestone: none → 2.2.5
summary: - running pt-online-schema-change gets stuck and the temp file created
- _table_name_new the size doesn't grow
+ pt-online-schema-change gets stuck looking for its own _table_new table
summary: - pt-online-schema-change gets stuck looking for its own _table_new table
+ pt-online-schema-change gets stuck looking for its own _new table
Changed in percona-toolkit:
importance: Undecided → Medium
Changed in percona-toolkit:
milestone: 2.2.5 → none
Changed in percona-toolkit:
milestone: none → 2.2.6
Changed in percona-toolkit:
importance: Medium → High
Daniel Nichter (daniel-nichter) wrote :

I think this is not a bug but poor feedback from the tool. After the tool creates the new table, it waits for the new table to appear on all slaves. So either 1) there's a slave that's severely lagged, or 2) there are replication filters preventing the CREATE TABLE for the _new table from replicating to one or more slaves.
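Reason 2 can be illustrated with a simplified model of MySQL's database-level replication filters. If a replica has a Replicate_Do_DB list that omits the database being altered, the CREATE TABLE for the _new table never reaches it, and a wait for that table cannot finish. (The command in the report passes --nocheck-replication-filters, which turns off the tool's check that aborts when filters are set.) A minimal Python sketch; `ReplicaFilters` and `db_replicates` are hypothetical names, and the rules are reduced to Replicate_Do_DB / Replicate_Ignore_DB, ignoring table-level filters and the statement-based default-database rule.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicaFilters:
    """Database-level filters as reported by SHOW SLAVE STATUS (hypothetical model)."""
    do_db: set = field(default_factory=set)      # Replicate_Do_DB
    ignore_db: set = field(default_factory=set)  # Replicate_Ignore_DB


def db_replicates(filters: ReplicaFilters, db: str) -> bool:
    """Simplified check: would changes in `db` reach this replica?

    If any do-db filter exists, only listed databases replicate; otherwise
    databases in the ignore list are skipped and everything else replicates.
    """
    if filters.do_db:
        return db in filters.do_db
    return db not in filters.ignore_db


if __name__ == "__main__":
    replica = ReplicaFilters(do_db={"reporting"})
    print(db_replicates(replica, "xxxx"))  # False: the _new table never arrives
```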

A workaround is --recursion-method none, which prevents the tool from doing anything with slaves.

The fix here is making the tool report what it's doing so users aren't left wondering.
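A wait loop that reports what it is doing, and gives up after a deadline, might look like the minimal Python sketch below. It assumes a caller-supplied `table_exists(replica)` check (for example, running the SHOW TABLES ... LIKE query from the log on that replica). `wait_for_table`, its parameters, and the timeout are hypothetical and are not the tool's actual options or code.

```python
import time
from typing import Callable, Iterable, List


def wait_for_table(
    replicas: Iterable[str],
    table_exists: Callable[[str], bool],
    timeout: float,
    interval: float = 1.0,
    report: Callable[[str], None] = print,
) -> List[str]:
    """Poll each replica until the table exists there or the timeout expires.

    Returns the replicas that still lack the table (empty list on success),
    reporting which replicas are pending instead of looping silently.
    """
    pending = list(replicas)
    deadline = time.monotonic() + timeout
    while pending:
        # Drop replicas where the table has appeared.
        pending = [r for r in pending if not table_exists(r)]
        if not pending:
            break
        if time.monotonic() >= deadline:
            report(f"Gave up waiting for table on: {', '.join(pending)}")
            break
        report(f"Waiting for table on: {', '.join(pending)}")
        time.sleep(interval)
    return pending


if __name__ == "__main__":
    present = {"replica1"}
    missing = wait_for_table(["replica1", "replica2"], lambda r: r in present, timeout=0.0)
    print(missing)  # ['replica2']
```

Returning the pending list lets the caller decide whether to abort or continue, rather than hanging on a replica that will never receive the table.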

tags: added: percona-37252
Jacky Leung (jacky-5) wrote :

That doesn't sound right. I have attempted a few times to run this tool on a DB with no slave lagging behind, but the tool just got stuck, not doing or printing anything for about 2 hours (whereas running the ALTER TABLE manually locks the table for about 30 minutes).

A proper first step may be adding more logging around it, as I am not sure how to reproduce it.

Daniel Nichter (daniel-nichter) wrote :

Jacky, running with PTDEBUG=1 will confirm if the tool was waiting for a slave, as in the case provided by Elton.

Changed in percona-toolkit:
assignee: nobody → Daniel Nichter (daniel-nichter)
status: Confirmed → In Progress
Jacky Leung (jacky-5) wrote :

Daniel, is that an environment variable?

Daniel Nichter (daniel-nichter) wrote :

Jacky, it's PTDEBUG, so run the tool like:

PTDEBUG=1 pt-online-schema-change ... > dbg 2>&1

If it gets stuck, press CTRL-C to kill it. Then dbg will contain a lot of debug output. All debug output is printed to STDERR.
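When the debug output is large, one quick way to spot the stuck loop that Elton's log shows is to count how often each table is re-checked. A minimal Python sketch, assuming the "# TableParser:<line> <pid> Checking `db`.`table`" format visible in the log above; `count_table_checks` is a hypothetical helper, and the file name dbg comes from the command above.

```python
import re
import sys
from collections import Counter
from typing import Iterable

# Matches debug lines such as:
# # TableParser:3279 13801 Checking `xxxx`.`_orders_status_history_new`
CHECK_RE = re.compile(r"^# TableParser:\d+ \d+ Checking (`[^`]+`\.`[^`]+`)")


def count_table_checks(lines: Iterable[str]) -> Counter:
    """Count how many times each `db`.`table` was checked in PTDEBUG output."""
    counts: Counter = Counter()
    for line in lines:
        m = CHECK_RE.match(line)
        if m:
            counts[m.group(1)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_checks.py dbg
    with open(sys.argv[1]) as fh:
        for table, n in count_table_checks(fh).most_common():
            print(f"{n:6d} {table}")
```

A single _new table with a very large count, each check followed by "Table does not exist", points at the wait-for-table loop rather than at the copy itself.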

Changed in percona-toolkit:
status: In Progress → Fix Committed
Changed in percona-toolkit:
status: Fix Committed → Fix Released
Jacky Leung (jacky-5) wrote :

Daniel, thanks. Now I can see the debug log, and I found the problem.

I have a server that is somewhat multi-purpose. It runs a standalone MySQL instance for Solr (not a slave of the master), and the same server also runs Open Replicator, which reads the master's binlog to feed incremental updates to Elasticsearch.

Now here is the problem: pt-online-schema-change mistook my standalone MySQL server for a slave of the master and didn't realise that it is in fact the Java Open Replicator that is replicating. So the standalone server will never have the new table, and the tool is stuck forever waiting for that server to replicate it.

From our setup's point of view, we don't need a separate server, and adding a new server would of course add cost, so we will not split them (not to mention it would require us to change a lot of configuration).

For now I will bypass the slave check with --recursion-method none, but I think it would be better for pt-online-schema-change to check that the server is actually a slave.
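Jacky's suggestion can be sketched as a two-step discovery: collect candidate hosts from connections that are streaming the binlog, then keep only hosts whose MySQL server actually reports replication status. This assumes (not confirmed in this report) that the default processlist recursion method treats every binlog-dump connection in SHOW PROCESSLIST as a replica, which would explain how an Open Replicator client made its host's standalone MySQL look like a slave. `binlog_dump_hosts`, `confirmed_replicas`, the `slave_status` callback, and the sample hosts are hypothetical.

```python
from typing import Callable, Iterable, List, Mapping, Optional


def binlog_dump_hosts(processlist: Iterable[Mapping[str, str]]) -> List[str]:
    """Candidate replica hosts: connections whose Command is a binlog dump.

    Any binlog client (a real replica, mysqlbinlog, Open Replicator) shows up here.
    """
    hosts: List[str] = []
    for row in processlist:
        if row.get("Command", "").startswith("Binlog Dump"):
            host = row.get("Host", "").rsplit(":", 1)[0]  # strip the client port
            if host and host not in hosts:
                hosts.append(host)
    return hosts


def confirmed_replicas(
    hosts: Iterable[str],
    slave_status: Callable[[str], Optional[Mapping[str, str]]],
) -> List[str]:
    """Keep only hosts whose MySQL server reports replication status.

    `slave_status(host)` stands in for running SHOW SLAVE STATUS on that
    host; None means the statement returned no row (not a replica).
    """
    return [h for h in hosts if slave_status(h) is not None]


if __name__ == "__main__":
    processlist = [
        {"Host": "10.0.0.5:41234", "Command": "Binlog Dump"},  # Open Replicator's host
        {"Host": "10.0.0.6:50111", "Command": "Binlog Dump"},  # a real replica
        {"Host": "10.0.0.9:33000", "Command": "Query"},
    ]
    status = {"10.0.0.6": {"Slave_IO_Running": "Yes"}}
    print(confirmed_replicas(binlog_dump_hosts(processlist), status.get))  # ['10.0.0.6']
```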
