Any logging code in MySQL.pm (--sqltrace or otherwise) does not log all transformer queries to the RQG log

Bug #1220975 reported by Roel Van de Paar
10
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Random Query Generator
Confirmed
Critical
John H. Embretsen

Bug Description

The --sqltrace function does not correctly log all queries which were executed against mysqld to the RQG log. The difference seems to be in the transformed queries. Some are logged, some are not. Issue is easy to reproduce as described below.

In the example below, sqltrace output goes to /tmp/test1.log, log file output goes to /tmp/test1.txt

[roel@qaserver rqg]$ cat .bzr/branch/last-revision
893 <email address hidden>
[roel@qaserver rqg]$ cat test1
ps -ef | grep 'test1' | grep -v grep | awk '{print $2}' | xargs sudo kill -9 > /dev/null 2>&1
rm -Rf /tmp/test1
mkdir /tmp/test1
cd /ssd/rqg
bash -c "set -o pipefail; perl /ssd/rqg/runall.pl --queries=100000000 --seed=24433 --duration=900 --querytimeout=60 --reporter=Shutdown,Backtrace,QueryTimeout,ErrorLog,ErrorLogAlarm --mysqld=--log-output=FILE --mysqld=--general_log_file=/tmp/test1.txt --sqltrace --threads=25 --mysqld=--innodb_file_per_table=1 --validator=Transformer --mysqld=--innodb_adaptive_hash_index_partitions=1 --basedir=/ssd/Percona-Server-5.5.32-rel31.0-564-debug.Linux.x86_64 --grammar=test1.yy --gendata=conf/examples/example.zz --mtr-build-thread=786 --mask=48291 --vardir1=/tmp/test1 > /tmp/test1.log 2>&1"
tail -n1 /tmp/test1.log
[roel@qaserver rqg]$ ./test1
2013-09-05T11:36:25 [21726] /ssd/rqg/runall.pl will exit with exit status STATUS_SERVER_CRASHED (101) <--- status outcome is not important I think (it also happens with STATUS_ENVIRONMENT_FAILURE for example)
[roel@qaserver rqg]$ grep "transforms" /tmp/test1.txt | wc -l
4607
[roel@qaserver rqg]$ grep "transforms" /tmp/test1.log | wc -l
1054
[roel@qaserver rqg]$ grep "transforms.insert_select" /tmp/test1.txt | wc -l
1925
[roel@qaserver rqg]$ grep "transforms.insert_select" /tmp/test1.log | wc -l
457
[roel@qaserver rqg]$ grep "transforms.insert_select_15171" /tmp/test1.txt
                   10 Query DROP TABLE IF EXISTS transforms.insert_select_15171
                   10 Query CREATE TABLE transforms.insert_select_15171 SELECT GET_LOCK('LOCK', 65535)
                   10 Query SELECT * FROM transforms.insert_select_15171 /* TRANSFORM_OUTCOME_UNORDERED_MATCH */
                   10 Query DELETE FROM transforms.insert_select_15171
                   10 Query INSERT INTO transforms.insert_select_15171 SELECT GET_LOCK('LOCK', 65535)
                   10 Query SELECT * FROM transforms.insert_select_15171 /* TRANSFORM_OUTCOME_UNORDERED_MATCH */
                   10 Query DELETE FROM transforms.insert_select_15171
                   10 Query REPLACE INTO transforms.insert_select_15171 SELECT GET_LOCK('LOCK', 65535)
                   10 Query SELECT * FROM transforms.insert_select_15171 /* TRANSFORM_OUTCOME_UNORDERED_MATCH */
                   10 Query DROP TABLE transforms.insert_select_15171
[roel@qaserver rqg]$ grep "transforms.insert_select_15171" /tmp/test1.log
[roel@qaserver rqg]$

This is likely reproducible on any MySQL version, but in this case I used Percona Server (lp:percona-server/5.5) at revid 564 <email address hidden> - the issue is consistently reproducible on it. If needbe, this server can easily be build by using the lp:percona-qa/build_percona.sh script and using the debug version. (i.e. bzr branch lp:percona-server/5.5 -> bzr branch lp:percona-qa -> go to 5.5 directory -> call percona-qa/build_percona.sh script with no options -> move resulting debug build to location of choice after doing cd ..)

Revision history for this message
Roel Van de Paar (roel11) wrote :
Revision history for this message
Roel Van de Paar (roel11) wrote :
Revision history for this message
Roel Van de Paar (roel11) wrote :
Revision history for this message
Roel Van de Paar (roel11) wrote :
Revision history for this message
John H. Embretsen (johnemb) wrote :

Do you see this with, say, --threads=1?

Perl's printing functions are not 100% thread-safe, and we have seen before some issues with printing (and other things) in highly concurrent scenarios.

Revision history for this message
Roel Van de Paar (roel11) wrote :

With this patch;

[roel@qaserver rqg]$ bzr diff
=== modified file 'lib/GenTest/Executor/MySQL.pm'
--- lib/GenTest/Executor/MySQL.pm 2013-09-05 01:15:49 +0000
+++ lib/GenTest/Executor/MySQL.pm 2013-09-05 12:04:19 +0000
@@ -591,6 +591,7 @@
        my $trace_query;
        my $trace_me = 0;

+print "#TEST1# $query;\n";

        # Write query to log before execution so it's sure to get there
        if ($executor->sqltrace) {

and --sqltrace turned off, the difference is also highly visible.

[roel@qaserver rqg]$ grep "transforms" /tmp/test1.txt | wc -l
8580
[roel@qaserver rqg]$ grep "transforms" /tmp/test1.log | wc -l
1004

Testing with --threads=1 now.

Revision history for this message
Roel Van de Paar (roel11) wrote :

Yes, the issue is also visible with --threads=1

[roel@qaserver rqg]$ ./test1
2013-09-06T10:21:54 [32320] /ssd/rqg/runall.pl will exit with exit status STATUS_ENVIRONMENT_FAILURE (110)
[roel@qaserver rqg]$ grep "transforms" /tmp/test1.log | wc -l
1067
[roel@qaserver rqg]$ grep "transforms" /tmp/test1.txt | wc -l
10726

Revision history for this message
Roel Van de Paar (roel11) wrote :

To me it looks like some "hierarchy in the code" related problems; for example, something not being available in a called procedure in another script or similar.

Revision history for this message
Roel Van de Paar (roel11) wrote :

#6 seems to exclude --sqltrace involvement altogether. Any ideas on the main issue John?

summary: - --sqltrace does not log all used transformer queries to the log
+ Any logging code in MySQL.pm (--sqltrace or otherwise) does not log all
+ transformer queries to the RQG log
Revision history for this message
John H. Embretsen (johnemb) wrote :
Download full text (5.0 KiB)

Roel,

I have looked briefly into the issue. As in most cases, reducing the test as much as possible helps to understand the issue better. In this case I tried running only a single query with all transformers enabled, and there was a difference of 2 statements (I used the grammar optimizer_no_subquery.yy):

$ grep "transforms" /tmp/test1.log | wc -l
26
$ grep "transforms" /tmp/test1.txt | wc -l
28

Then I did some extra checks with grep, e.g. "grep SELECT", "grep INSERT", "grep CREATE" added to the command lines above, before the "wc -l". After quickly inspecting the output with this kind of filtering, it seemed that something related to triggers was missing, so I re-ran the test using only the ExecuteAsTrigger transformer (--transformer=ExecuteAsTrigger). Then I got:

$ grep "transforms" /tmp/test1.log | wc -l
6
$ grep "transforms" /tmp/test1.txt | wc -l
8

With this few statements, manual inspection is feasible. Two statements were in the general log, but not in the trace:

1) CREATE DATABASE IF NOT EXISTS transforms

This was the very first statement in the general log which contained the string "transforms". The trace did not include this, but it had:

/* GenTest::Transform::ExecuteAsTrigger */ CREATE DATABASE IF NOT EXISTS transforms;
which also the general log had.

It turns out that the latter is from the ExecuteAsTrigger transformer itself, while the fomer (the statement that was missing from the trace) is from GenTest/Validator/Transformer.pm::validate():

 if ($database_created == 0) {
  $executor->dbh()->do("CREATE DATABASE IF NOT EXISTS transforms");
  $database_created = 1;
 }

Here, the code bypasses the executor code by doing $executor->dbh()->do() instead of $executor->execute(). So this explains why it is not included in the trace, as the trace functionality is in the execute() sub. I do not know why the executor is bypassed like this, or if it can/should be changed.

I suspect that the code shown above is only executed once per test run, so it should not contribute that much to a huge difference in counts as shown in the bug description.

2) INSERT INTO transforms.trigger213329 SELECT alias2 . `col_time_key` AS field1 , alias1 . `col_varchar_key` AS field2 , alias2 . `col_datetime_key` AS field3 , alias1 . `col_varchar_key` AS field4 , alias2 . `col_int_key` AS field5 FROM ( CC AS alias1 , CC AS alias2 ) WHERE ( alias1 . `pk` > 122 AND alias1 . `pk` < ( 122 + 103 ) OR alias1 . `col_varchar_key` IN ('USA' , 'm') ) AND ( alias1 . `col_int_nokey` <> alias1 . `col_varchar_nokey` OR alias1 . `col_int_nokey` IS NULL ) GROUP BY field1, field2, field3, field4, field5 ORDER BY field2 DESC

This statement is also missing from the trace. This statement turns out to be the action caused by the trigger itself. The trigger definition looks like this:

CREATE TRIGGER trigger113329 BEFORE INSERT ON trigger113329 FOR EACH ROW INSERT INTO transforms.trigger213329 SELECT alias2 . `col_time_key` AS field1 , alias1 . `col_varchar_key` AS field2 , alias2 . `col_datetime_key` AS field3 , alias1 . `col_varchar_key` AS field4 , alias2 . `col_int_key` AS field5 FROM ( CC AS alias1 , CC AS alias2 ) WHERE ( alias1 . `pk` > 122 AND alias...

Read more...

Revision history for this message
Roel Van de Paar (roel11) wrote :

John, for the 1000 queries run, did the transformer already get into "issue found, simplifying" mode? Likely not; I expect none of ongoing transformed query simplification are logged, but 1000 queries may not have been enough to get into that yet. And assume version used was the same one as reported?

Revision history for this message
John H. Embretsen (johnemb) wrote :

Roel,

I no longer have the earlier results, but I did a re-run using MySQL 5.6 (I do not have time to set up a completely new environment just for this bug). It seems there were no "simplifications" during that run (grep "implif" /tmp/test1.log).

I did another run with --queries=10000 and --duration=1800 and --grammar=conf/optimizer/optimizer_subquery_semijoin.yy. Results are as follows:

$ grep "Simplified transformed query" /tmp/test1.log | wc -l
1

$ grep "transforms" /tmp/test1.txt | wc -l
10375

$ grep "transforms" /tmp/test1.log | wc -l
10078

$ grep "transforms" /tmp/test1.txt | grep -v "trigger" | wc -l
8599

$ grep "transforms" /tmp/test1.log | grep -v "trigger" | wc -l
8598

So, it seems that all discrepancies that I see are casued by reasons mentioned in my previous comment (#10) to this bug report.
I do recognize that the difference in numbers that you see seem proportionally higher, but I cannot say if this has a different root cause or not.

I suggest trying to perform some kind of "divide and conquer" approach (as indicated earlier) to get to the bottom of this. For example:
 - try with different grammars
 - try filtering out all trigger queries
 - try using a reduced set of Transformers
 - try with fewer queries
 - eliminate unnecessary complexities, such as reporters, multiple threads, server options, grammar masking, etc.

A quick look in the code suggests that the query executions done by the simplifier for failing transforms should also count in the sqltrace, as the executors are used directly, but I have not verified this 100%.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.