pt-online-schema-change prints scary/misleading message while pausing for slave lag

Bug #1089173 reported by rcoli on 2012-12-12
This bug affects 2 people
Affects: Percona Toolkit

Bug Description

pt-online-schema-change version 2.1.4

How to reproduce:

1) use default values of max-lag=1, check-interval=1, progress=time,30
2) pt-osc ALTER a master with a slave
4) get a message like "Replica lag is 32 seconds on Waiting."
5) become worried that max-lag and check-interval are malfunctioning somehow
6) set progress=time,1
7) repeat 2-3
8) get a message like "Replica lag is 2 seconds on Waiting."

In reality, the max-lag and check-interval options are working as intended. There is occasionally a lag of up to 1 second (which I don't *think* is caused by an off-by-one error, but theoretically could be), but this lag is much less than the 30+ seconds implied by the message printed in 4) above.

People who don't have the time or skill to set PTDEBUG=1 and then use the resulting debug output to verify the behavior of the tool might quite reasonably be freaked out by this message. It would be ideal if the "Replica lag is ... Waiting" messages were printed without regard for the --progress setting.
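To make the interaction between the two intervals concrete, here is a toy Python model of a wait loop that checks lag every check-interval seconds but only prints a message once per --progress interval. It is a sketch under assumptions, not the tool's actual Perl code: the function name simulate_wait, its parameters, and the lag samples are all hypothetical.

```python
def simulate_wait(lag_samples, max_lag=1, check_interval=1, progress_interval=30):
    """Toy model of a lag-wait loop with time-throttled messages.

    lag_samples: hypothetical lag values (seconds), one per check.
    Returns (number_of_checks, [(elapsed_seconds, lag), ...]) where the
    list holds only the checks that would have produced a message.
    """
    printed = []
    now = 0          # simulated seconds since the wait began
    last_report = 0  # simulated time of the last printed message
    checks = 0
    for lag in lag_samples:
        checks += 1
        if lag <= max_lag:
            break                  # replica caught up, stop waiting
        now += check_interval      # "sleep" until the next check
        if now - last_report >= progress_interval:
            printed.append((now, lag))
            last_report = now
    return checks, printed


# Hypothetical scenario: lag grows from 2 to 39 seconds, then drops to 0.
samples = list(range(2, 40)) + [0]

# With progress=time,30 the loop checks 39 times but prints one message.
print(simulate_wait(samples, progress_interval=30))  # (39, [(30, 31)])

# With progress=time,1 every over-limit check prints a message.
checks, printed = simulate_wait(samples, progress_interval=1)
print(checks, len(printed))  # 39 38
```

In this model the check itself runs every second in both cases; only the number of messages changes, which is why a single large number printed after 30 silent seconds can look like a malfunction.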

Of course, given that the underlying feature should prevent slave lag values larger than 1 or 2 seconds outside a test case (that is, without FLUSH TABLES WITH READ LOCK on the slave), this bug will only bite people who do have the skill and time to design such a test case.

tags: added: ambiguity progress pt-online-schema-change
Changed in percona-toolkit:
status: New → Triaged