Quoter::serialize_list() doesn't handle multiple NULL values
Bug #1087319 reported by
Daniel Nichter
This bug affects 2 people
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Percona Toolkit moved to https://jira.percona.com/projects/PT |
Fix Released
|
Medium
|
Brian Fraser |
Bug Description
Quoter:
Related branches
lp:~percona-toolkit-dev/percona-toolkit/fix-1087319-quoter-multiple-nulls
- Daniel Nichter: Approve
- Brian Fraser (community): Approve
-
Diff: 2745 lines (+1288/-728) (has conflicts)36 files modifiedbin/pt-archiver (+57/-32)
bin/pt-deadlock-logger (+57/-32)
bin/pt-duplicate-key-checker (+57/-32)
bin/pt-find (+57/-32)
bin/pt-fk-error-logger (+57/-32)
bin/pt-heartbeat (+57/-32)
bin/pt-index-usage (+57/-32)
bin/pt-kill (+57/-32)
bin/pt-online-schema-change (+57/-32)
bin/pt-query-advisor (+57/-32)
bin/pt-query-digest (+57/-32)
bin/pt-slave-restart (+57/-32)
bin/pt-table-checksum (+57/-32)
bin/pt-table-sync (+57/-32)
bin/pt-table-usage (+76/-45)
bin/pt-upgrade (+57/-32)
bin/pt-visual-explain (+12/-12)
lib/DSNParser.pm (+5/-0)
lib/Quoter.pm (+58/-56)
lib/Sandbox.pm (+2/-1)
sandbox/load-sakila-db (+5/-1)
t/lib/Quoter.t (+137/-114)
t/pt-online-schema-change/skip_innodb.t (+5/-4)
t/pt-table-checksum/basics.t (+18/-5)
t/pt-table-checksum/bugs.t (+2/-3)
t/pt-table-checksum/char_chunking.t (+1/-1)
t/pt-table-checksum/error_handling.t (+9/-0)
t/pt-table-checksum/samples/default-results-5.1.txt (+1/-1)
t/pt-table-checksum/samples/default-results-5.5.txt (+1/-1)
t/pt-table-checksum/samples/default-results-5.6.txt (+41/-0)
t/pt-table-checksum/samples/static-chunk-size-results-5.1.txt (+1/-1)
t/pt-table-checksum/samples/static-chunk-size-results-5.5.txt (+1/-1)
t/pt-table-checksum/samples/static-chunk-size-results-5.6.txt (+41/-0)
t/pt-table-checksum/skip_innodb.t (+13/-0)
t/pt-table-sync/traces.t (+2/-1)
util/checksum-test-dataset (+2/-1)
description: | updated |
description: | updated |
Changed in percona-toolkit: | |
assignee: | nobody → Brian Fraser (fraserbn) |
Changed in percona-toolkit: | |
status: | Triaged → In Progress |
Changed in percona-toolkit: | |
importance: | Undecided → Medium |
status: | In Progress → Fix Committed |
tags: |
added: pt-online-schema-change pt-table-checksum pt-table-sync value-quoting removed: all-tools |
Changed in percona-toolkit: | |
status: | Fix Committed → Fix Released |
To post a comment you must log in.
This affects pt-online- schema- change pt-table-checksum, and pt-table-sync. The first two use NibbleIterator; pt-table-sync uses the --replicate table, hence the serialized boundary values, from pt-table-checksum.
In 2.1, the relevant subs (Quoter: :serialize_ list() and Quoter: :deserialize_ list()) acted like:
"""
die "Cannot serialize multiple values with undef/NULL"
if grep { !defined $_ } @args;
return join ',', map { quotemeta } @args;
"""
then,
"""
[^\\,] * # Same as above.
my @escaped_parts = $string =~ /
\G # Start of string, or end of previous match.
( # Each of these is an element in the original list.
[^\\,]* # Anything not a backslash or a comma
(?: # When we get here, we found one of the above.
\\. # A backslash followed by something so we can continue
)* # Repeat zero of more times.
)
, # Comma dividing elements
/sxgc;
push @escaped_parts, pos($string) ? substr( $string, pos($string) ) : $string;
my @unescaped_parts = map {
my $part = $_;
# Here be weirdness. Unfortunately quotemeta() is broken, and exposes
# the internal representation of scalars. Namely, the latin-1 range,
# \128-\377 (\p{Latin1} in newer Perls) is all escaped in downgraded
# strings, but left alone in UTF-8 strings. Thus, this.
# TODO: quotemeta() might change in 5.16 to mean p{ASCII} )\W|\p{ Pattern_ Syntax} / utf8($part) # If it's a UTF-8 string,
? qr/(?=\p{ASCII})\W/ # We only care about non-word
# characters in the ASCII range
: qr/(?=\ p{ASCII} )\W|[\x{ 80}-\x{ FF}]/; # Otherwise,
# same as above, but also
# unescape the latin-1 range. class)/ $1/g;
# qr/(?=\
# And also fix this whole weird behavior under
# use feature 'unicode_strings' -- If/once that's
# implemented, this will have to change.
my $char_class = utf8::is_
$part =~ s/\\($char_
$part;
} @escaped_parts;
return @unescaped_parts;
"""
So rather complex. The UTF-8 stuff was motivated by a bug in DBD::mysql 3.0007 which CentOS 5 uses: it doesn't flag UTF-8 data as UTF-8. Brian says stuff was quoted because "so that we didn't run afoul of any perl / MYsql INTERACTIONS like inserting '\n' and getting back "\n"".
As for the UTF-8 bug, we tested and discuseed and decided that it should be ok, i.e. it won't affect anything. I'll blog about this later.
As for quoting, I don't think it's necessary because boundary values are straight from the table, so we save exactly whatever we got.
Consequently, this branch, which will be the new standard in 2.2, works like:
"""
my @parts;
foreach my $arg ( @args ) {
if ( defined $arg ) {
$arg =~ s/,/\\,/g; # escape commas
$arg =~ s/\\N/\\\\N/g; # escape literal \N
push @parts, $arg;
}
else {
push @parts, '\N';
}
}
my $string = join(',',...