pt-stalk --no-stalk and --iterations 1 don't wait for the collect

Reported by Daniel Nichter on 2012-10-23
This bug affects 1 person
Affects: Percona Toolkit
Importance: Medium
Assigned to: Daniel Nichter

Bug Description

A pt-stalk test does:

$trunk/bin/pt-stalk --iterations 1 --dest $dest --variable Uptime --threshold $threshold --cycles 2 --run-time 2 --pid $pid_file -- --defaults-file=$cnf >$log_file 2>&1

It is meant to test --run-time, but it was failing sporadically. Turns out, on _fast_ systems (a rare case where being slow actually makes the test work) pt-stalk runs, triggers, and starts the collect subprocess, then pt-stalk exits because there are no more iterations. When the tool exits, it kills the collect subprocess, so nothing is collected.

Daniel Nichter (daniel-nichter) wrote :

Correction: it doesn't kill the collect subprocess, it just messes up testing because the tool finishes yet there are still collector subprocesses running.
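The corrected picture can be reproduced with a toy script. This is only a sketch of the race, not pt-stalk code: run_once and collect.out are hypothetical names, and the 2-second sleep stands in for a collector that takes --run-time seconds.

```shell
# Hypothetical stand-in for a tool that runs a single iteration:
# it starts a "collector" in the background and returns right away.
run_once() {
   dest="$1"
   # The collector needs about 2 seconds before it writes its output.
   ( sleep 2; echo "collected" > "$dest/collect.out" ) &
   # No wait here, so the caller regains control while the
   # collector is still running.
}
```

A test that checks "$dest" immediately after run_once returns finds nothing, even though the collector is alive and will write its file a moment later.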

Daniel Nichter (daniel-nichter) wrote :

t/pt-stalk passes on all boxes with all the new waiting, though the Ubuntu 12 box is pretty damn slow, slower than we can even reasonably wait for.

summary: - pt-stalk --iterations 1 may not collect
+ pt-stalk --no-stalk or --iterations 1 may not collect
Changed in percona-toolkit:
status: In Progress → Fix Committed

So the "fix" was to wait --run-time * 3 before exiting. As the new docs say, this usually won't happen because the tool runs forever by default; otherwise, if running --no-stalk or --iterations 1, then unless the system is *really* slow, the wait in collect() (bug 1047701) should have already killed anything. At worst it just means the tool takes some more time to exit, but then again, if a process is hung or spinning out of control, this will kill it explicitly (else it may continue to run, then zombie, etc.).
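The wait-then-kill idea described above might look roughly like the following. It is a minimal sketch under assumptions, not the code that was committed: wait_for_collectors is a hypothetical helper, and the caller is assumed to have kept the PIDs of the collector subprocesses it started.

```shell
# Hypothetical helper (not pt-stalk's actual implementation).
# Usage: wait_for_collectors MAX_WAIT_SECONDS PID [PID...]
# Polls once per second until every PID has exited or MAX_WAIT_SECONDS
# has elapsed, then kills whatever is still running.
# Returns 0 if all PIDs exited on their own, 1 if any had to be killed.
wait_for_collectors() {
   local max_wait="$1"; shift
   local slept=0
   local pid still_running
   while [ "$slept" -lt "$max_wait" ]; do
      still_running=""
      for pid in "$@"; do
         # kill -0 sends no signal; it only tests that the PID exists.
         if kill -0 "$pid" 2>/dev/null; then
            still_running="$still_running $pid"
         fi
      done
      if [ -z "$still_running" ]; then
         return 0
      fi
      sleep 1
      slept=$((slept + 1))
   done
   # Timed out: explicitly kill hung or runaway collectors.
   for pid in "$@"; do
      if kill -0 "$pid" 2>/dev/null; then
         kill "$pid" 2>/dev/null
      fi
   done
   return 1
}
```

With --run-time 2, a caller would pass $((2 * 3)) as the first argument, so the tool exits at most about 6 seconds later than before, and any collector still alive at that point is killed rather than left behind.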

summary: - pt-stalk --no-stalk or --iterations 1 may not collect
+ pt-stalk --no-stalk and --iterations 1 don't wait for the collect
Brian Fraser (fraserbn) on 2012-11-16
Changed in percona-toolkit:
status: Fix Committed → Fix Released