Comment 1 for bug 1133249

Revision history for this message
Daniel Nichter (daniel-nichter) wrote :

This doesn't seem to be a bug in the tool. It sounds like a race condition, i.e. that your script is reading the file at the exact time pt-heartbeat has truncated the file but before it has written the new info and/or that new info has been flushed. The evidence seems to be the 2.7s offset which reduced the frequency of the issue.

One solution would be locking the file. We could do other more exotic things like writing to tmp file the moving the tmp to the real file. But a 1s, that's a lot of tmp files and moves.

This is is not really a bug in the tool, I'm not sure when or if we'll have time to implement a solution. In your script, can you just retry reading the file if it's empty? Or just ignore if empty because at 1s interval, missing a few seconds here and there shouldn't matter much?