broken fingerprinting of specific INSERT/REPLACE queries

Bug #1289516 reported by Martin Ringehahn on 2014-03-07
This bug affects 1 person
Affects: Percona Toolkit
Status: Won't Fix
Importance: Medium
Assigned to: Frank Cizmich
Milestone: none

Bug Description

When fingerprinting an INSERT/REPLACE query that contains some specific SQL fragments within a string parameter, the resulting fingerprint is invalid.

$ pt-fingerprint --version
=> pt-fingerprint 2.2.6

$ pt-fingerprint --query "REPLACE INTO a VALUES('INSERT INTO foo VALUES (1),(2)')"
=> replace into a values('insert into foo values(?+)

Notice the unbalanced opening quote and the improperly normalized string value. The INSERT statement in this context is only data inside a string literal.

Expected output:
=> replace into a values(?+)

More context:

 * If the string only contains a single VALUES tuple, the normalization appears to be fine:

$ pt-fingerprint --query "REPLACE INTO a VALUES('INSERT INTO foo VALUES (1)')"
=> replace into a values(?+)
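One way to avoid this class of problem is to neutralize quoted string literals before collapsing VALUES tuples, so parentheses inside a string can never be mistaken for tuple delimiters. A minimal Python sketch of that ordering (this is not pt-fingerprint's actual implementation; the function name and regexes are illustrative assumptions):

```python
import re

def fingerprint_values(query: str) -> str:
    # Hypothetical sketch, NOT pt-fingerprint's code: replace string
    # literals and numbers with ? first, then collapse VALUES tuples,
    # so parentheses inside a quoted string cannot confuse the
    # collapsing step.
    q = query.lower()
    q = re.sub(r"'(?:[^'\\]|\\.|'')*'", "?", q)   # 'string' literals
    q = re.sub(r'"(?:[^"\\]|\\.|"")*"', "?", q)   # "string" literals
    q = re.sub(r"\b\d+\b", "?", q)                # bare numeric literals
    # collapse one or more (?, ?, ...) tuples into a single (?+)
    q = re.sub(
        r"\s*\(\s*\?(?:\s*,\s*\?)*\s*\)"
        r"(?:\s*,\s*\(\s*\?(?:\s*,\s*\?)*\s*\))*",
        "(?+)", q)
    return q
```

With this ordering, both of the reported inputs normalize to the expected fingerprints: the string parameter becomes a single `?` before any tuple collapsing happens.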

Some operational context on how this bug bubbled up:

We were running pt-query-digest on a MySQL server, tailing its slowlog (with long_query_time=0). pt-query-digest was logging query statistics into a mysql database on that same server. As a result, all queries run by pt-query-digest were also fed back into pt-query-digest.

Now, when a client ran a query "INSERT INTO foo VALUES (1),(2)", the query would be logged by pt-query-digest into the query history table, along with a sample of said query. ("replace into `percona`.`query_history`(`checksum`, `sample`, [...] VALUES ([..] 'INSERT INTO foo VALUES (1),(2)' ").

This led to a self-reinforcing feedback loop where new "unique" fingerprints were created recursively. pt-query-digest ballooned to consume 27G within one hour. The slow query log -- normally growing at 1G/hr -- grew at a rate of 40G/hr. The outbound network interface was at a 1Gbit/s ceiling, streaming all the new unique queries to all connected slaves.

Obviously, it is bad practice to save monitoring data onto the same cluster as production data, but this is what happened.

Martin Ringehahn (martwn) wrote :

Some triaging:

The diff at the end of this comment adds a test case that reproduces the issue. It appears that the following optimization in QueryRewriter->fingerprint() is a little too greedy and truncates the VALUES list inside the quoted string. When this block is removed, the new test succeeds (though some other tests then fail, of course).

 # mysqldump's INSERT statements will have long values() lists, don't waste
 # time on them... they also tend to segfault Perl on some machines when you
 # get to the "# Collapse IN() and VALUES() lists" regex below!
 if ( my ($beginning) = $query =~ m/\A((?:INSERT|REPLACE)(?: IGNORE)?\s+INTO.+?VALUES\s*\(.*?\))\s*,\s*\(/is ) {
    $query = $beginning; # Shorten multi-value INSERT statements ASAP
 }
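To see why this block is too greedy, the same pattern can be translated into Python regex syntax (a sketch for illustration; Python's re engine and Perl agree on non-greedy semantics for this pattern) and applied to the query from the bug report:

```python
import re

# The "shorten multi-value INSERT statements ASAP" optimization above,
# translated from Perl into Python regex syntax.
SHORTEN = re.compile(
    r"\A((?:INSERT|REPLACE)(?: IGNORE)?\s+INTO.+?VALUES\s*\(.*?\))"
    r"\s*,\s*\(",
    re.I | re.S,
)

query = "REPLACE INTO a VALUES('INSERT INTO foo VALUES (1),(2)')"
m = SHORTEN.match(query)
if m:
    query = m.group(1)

# The non-greedy .*? stops at the first ')' that is followed by ',(',
# and here that ')' sits *inside* the string parameter, so the query
# is truncated mid-literal:
print(query)  # REPLACE INTO a VALUES('INSERT INTO foo VALUES (1)
```

The regex has no notion of quoting, so the `)` after `(1` inside the string literal satisfies the pattern, and everything after it, including the closing quote, is discarded before the later normalization steps run.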

=== modified file 't/lib/QueryRewriter.t'
--- t/lib/QueryRewriter.t 2013-09-27 02:16:19 +0000
+++ t/lib/QueryRewriter.t 2014-03-09 00:15:46 +0000
@@ -247,6 +247,12 @@
 );

 is(
+ $qr->fingerprint('insert into foo values("(1),(2)")'),
+ 'insert into foo values(?+)',
+ 'not confused by parentheses inside a quoted string',
+);
+
+is(
    $qr->fingerprint('select * from foo limit 5'),
    'select * from foo limit ?',
    'limit alone',

summary: - broken fingerprinting of specific REPLACE queries
+ broken fingerprinting of specific INSERT/REPLACE queries
description: updated
tags: added: pt-fingerprint
Changed in percona-toolkit:
importance: Undecided → Medium
status: New → In Progress
milestone: none → 2.2.8
assignee: nobody → Frank Cizmich (frank-cizmich)
Daniel Nichter (daniel-nichter) wrote :

Frank,

Since we can't fix the issue as reported, let's hold off on this branch. Maybe we can create a new bug for the fix that the branch does contain. Ideally, we'd profile this, because fingerprinting is the slow spot in pt-query-digest, and the fix changes a regex, which means we could make performance worse. If that were the case, we'd have to determine whether the fix is worth it, since technically no one has asked for the specific (subset) fix.
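The performance concern can be sanity-checked with a micro-benchmark. A hypothetical Python sketch (the names and the collapse regex are illustrative assumptions, not Percona's code) that compares collapsing a long mysqldump-style VALUES list with and without the shorten-ASAP optimization:

```python
import re
import timeit

# A long single-statement INSERT of the kind mysqldump emits.
long_insert = "INSERT INTO t VALUES " + ",".join(
    "(%d)" % i for i in range(10000))

# Shorten-ASAP optimization, translated into Python regex syntax.
shorten = re.compile(
    r"\A((?:INSERT|REPLACE)(?: IGNORE)?\s+INTO.+?VALUES\s*\(.*?\))"
    r"\s*,\s*\(",
    re.I | re.S,
)
# Simplified stand-in for the "Collapse IN() and VALUES() lists" step.
collapse = re.compile(r"\(\d+\)(?:\s*,\s*\(\d+\))*")

def with_shorten():
    q = long_insert
    m = shorten.match(q)
    if m:
        q = m.group(1)       # keep only the first VALUES tuple
    return collapse.sub("(?+)", q)

def without_shorten():
    return collapse.sub("(?+)", long_insert)

# Both paths produce the same fingerprint fragment; only the time
# spent in the collapse regex differs.
print("shortened:  ", timeit.timeit(with_shorten, number=100))
print("unshortened:", timeit.timeit(without_shorten, number=100))
```

Numbers will vary by machine and regex engine, so this only indicates whether a quote-aware replacement regex pays an observable cost on realistic mysqldump input; it does not settle the question by itself.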

Changed in percona-toolkit:
milestone: 2.2.8 → none
Frank Cizmich (frank-cizmich) wrote :

It seems that a definitive, complete solution to this issue would necessarily come with a significant performance hit.

Changed in percona-toolkit:
status: In Progress → Won't Fix