pt-kill kills prepared statements without checking busy-time

Bug #1016272 reported by Gavin Towey
24
This bug affects 4 people
Affects Status Importance Assigned to Milestone
Percona Toolkit moved to https://jira.percona.com/projects/PT
Fix Committed
Medium
Carlos Salguero

Bug Description

We're using pt-kill with the following options:
pt-kill --busy-time 120s --print --match-info "^(select|SELECT)" --interval 10

The idea is to only kill select queries running longer than 120 seconds.

However there's a problem with the logic that pt-kill uses to kill queries. If the Command is not equal to Query, then busy time is ignored, and other match parameters are free to take effect.

In this case, a prepared statement is in the "Execute" Command state, and running a SELECT in the Info part. It gets killed immediately by pt-kill every time.

The current assumption that only items in the processlist with Command=Query are "busy" But that isn't the case.

I think the right fix is to say that any query which is *not idle* is busy.

That is to change the current busy time check from:
if ( $find_spec{busy_time} && ($query->{Command} || '') eq 'Query' ) {

to
if ( $find_spec{busy_time} && ($query->{Command} || '') ne 'Sleep' ) {

However, I understand that this isn't a trivial change. The smallest change which would fix this issue is to add Execute to the if statement as well as Query, but that seems like it could leave the script vulnerable to similar issues.

The MySQL documentation in this case seems to agree with my case: that many queries not having Command=Query should actually be considered busy: http://dev.mysql.com/doc/refman/5.1/en/thread-commands.html

Revision history for this message
Gavin Towey (gtowey) wrote :

By the way, we can work around this by adding --ignore-command Execute, but it seems the default behavior does not work as intended.

affects: perconatools → percona-toolkit
tags: added: pt-kill
Changed in percona-toolkit:
status: New → Triaged
Revision history for this message
Baron Schwartz (baron-xaprb) wrote :

I believe you. I think the tool doesn't work intuitively in any case. I'm not sure whether we will fix this (since you have a workaround) or wait until we redesign/rebuild the tool entirely.

Revision history for this message
Max Bubenick (max-bubenick) wrote :

Hi, can we resume this bug?

Is still present, and with Gavin's workaround on comment #1 you can't kill queries on Execute state which is needed in several cases.

tags: added: rdba
Revision history for this message
Sergei Turchanov (turchanov) wrote :

This bug is still present and bites us very hard. It kills prepared statements with busy time less than was configured.

I made my own fix. To be on a safe side I added 6 additional states which correspond to prepared statements (COM_STMT_PREPARE, COM_STMT_EXECUTE, COM_STMT_SEND_LONG_DATA, COM_STMT_CLOSE, COM_STMT_RESET, COM_STMT_FETCH).

--- pt-kill.orig 2017-06-21 11:10:30.000000000 +1000
+++ pt-kill.new 2017-06-22 11:34:32.000000000 +1000
@@ -3520,7 +3520,16 @@ sub find {
          next QUERY;
       }

- if ( $find_spec{busy_time} && ($query->{Command} || '') eq 'Query' ) {
+ my %busy_commands = (
+ "Query" => 1,
+ "Execute" => 1,
+ "Prepare" => 1,
+ "Fetch" => 1,
+ "Long Data" => 1,
+ "Reset stmt" => 1,
+ "Close stmt" => 1,
+ );
+ if ( $find_spec{busy_time} && exists($busy_commands{$query->{Command} || ''}) ) {
          next QUERY unless defined($query->{Time});
          if ( $query->{Time} < $find_spec{busy_time} ) {
             PTDEBUG && _d("Query isn't running long enough");

I personally like "if ( $find_spec{busy_time} && ($query->{Command} || '') ne 'Sleep' ) {" variant as it looks much shorter.

But this is a very serious bug which must be addressed ASAP.

Changed in percona-toolkit:
importance: Undecided → Medium
assignee: nobody → Carlos Salguero (carlos-salguero)
tags: added: pt167
Revision history for this message
Taavi Burns (taavi-burns) wrote :

We tried to use pt-kill at work to kill off queries still running after the app server had gone away, and because our app server uses prepared statements pt-kill wasn't able to do the one thing it's named for (in fact, it wasn't killing queries in Execute at all; but when configured manually to do so it kills them immediately as per this bug). +1 for getting this fix out.

Thanks!

Changed in percona-toolkit:
status: Triaged → In Progress
Revision history for this message
Carlos Salguero (carlos-salguero) wrote :

Hi,

I want to introduce the changes but I would need a proper way to reproduce the issue.
Could you provide the steps to reproduce this?

Thanks

Revision history for this message
Carlos Salguero (carlos-salguero) wrote :

Also, could you provide env information like MySQL version, Perl version and OS?

Revision history for this message
Taavi Burns (taavi-burns) wrote :

Linux:
* mysql Ver 14.14 Distrib 5.5.41-37.0, for debian-linux-gnu (x86_64) using EditLine wrapper
* This is perl 5, version 18, subversion 2 (v5.18.2) built for x86_64-linux-gnu-thread-multi
* pt-kill 2.2.15

OSX:
* mysql Ver 14.14 Distrib 5.5.56, for osx10.11 (x86_64) using EditLine wrapper
* This is perl 5, version 18, subversion 2 (v5.18.2) built for darwin-thread-multi-2level
* pt-kill 3.0.2

Prep:
(echo 'create table foo (id integer);'; for x in {1..20000}; do echo 'INSERT INTO foo VALUES ('$x');'; done) | mysql $DBNAME

Then issue a prepared statement via your preferred method for doing that, e.g.:
SELECT a_id, b_id from (select a.id a_id, b.id b_id from foo a, foo b) c

The query takes a minute or more to run on my laptop (400M rows will do that!), which gives plenty of time for:
* This command to NOT find anything to kill: pt-kill --verbose --print --victims=all --busy-time 30
* This command to kill 'Execute'ing prepared statements on-sight, ignoring busy-time: pt-kill --verbose --print --victims=all --busy-time 30 --match-command 'Query|Execute'

In production we are getting this due to an Elixir app (using Ecto, using MariaEx) which very strongly prefers to use server-prepared statements. It looks like it's not hard to reproduce the 'Execute' state using Perl though (first time I've written perl in YEARS!):

#!/usr/bin/perl

use DBI;
my $dbh = DBI->connect('DBI:mysql:database=YOURDATABASE;host=127.0.0.1;mysql_server_prepare=1', 'USERNAME', 'PASSWORD')
          or die "Couldn't connect to database: " . DBI->errstr;

my $sth = $dbh->prepare('SELECT COUNT(*) FROM (SELECT a.id AS a_id, b.id AS b_id FROM foo AS a, foo AS b) AS c')
          or die "Couldn't prepare statement: " . $dbh->errstr;

$sth->execute()
      or die "Couldn't execute statement: " . $sth->errstr;

Revision history for this message
Carlos Salguero (carlos-salguero) wrote :
Changed in percona-toolkit:
status: In Progress → Fix Committed
milestone: none → 3.0.5
Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PT-548

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.