percona_log_slow_query_plan.test fails randomly

Bug #1287299 reported by George Ormond Lorch III on 2014-03-03
10
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Percona Server moved to https://jira.percona.com/projects/PS
Fix Released
Medium
Yura Sorokin
5.5
Invalid
Medium
Yura Sorokin
5.6
Fix Released
Medium
Yura Sorokin

Bug Description

While testing fix for bug 1016991 I encountered several other random slow query log test failures. Failures seen in jenkins and reproduced locally. After reverting changes I was able to reproduce these same failures easily:

> ./mtr --parallel=12 ./t/percona_log_slow_query_plan.test ./t/percona_log_slow_verbosity.test ./t/percona_slow_extended_error_on_quit.test ./t/percona_slow_extended_log_error.test ./t/percona_log_slow_slave_statements.test ./t/percona_log_slow_query_plan.test ./t/percona_log_slow_verbosity.test ./t/percona_slow_extended_error_on_quit.test ./t/percona_slow_extended_log_error.test ./t/percona_log_slow_slave_statements.test ./t/percona_log_slow_query_plan.test ./t/percona_log_slow_verbosity.test ./t/percona_slow_extended_error_on_quit.test ./t/percona_slow_extended_log_error.test ./t/percona_log_slow_slave_statements.test ./t/percona_log_slow_query_plan.test ./t/percona_log_slow_verbosity.test ./t/percona_slow_extended_error_on_quit.test ./t/percona_slow_extended_log_error.test ./t/percona_log_slow_slave_statements.test ./t/percona_log_slow_query_plan.test ./t/percona_log_slow_verbosity.test ./t/percona_slow_extended_error_on_quit.test ./t/percona_slow_extended_log_error.test ./t/percona_log_slow_slave_statements.test ./t/percona_log_slow_query_plan.test ./t/percona_log_slow_verbosity.test ./t/percona_slow_extended_error_on_quit.test ./t/percona_slow_extended_log_error.test ./t/percona_log_slow_slave_statements.test ./t/percona_log_slow_query_plan.test ./t/percona_log_slow_verbosity.test ./t/percona_slow_extended_error_on_quit.test ./t/percona_slow_extended_log_error.test ./t/percona_log_slow_slave_statements.test ./t/percona_log_slow_query_plan.test ./t/percona_log_slow_verbosity.test ./t/percona_slow_extended_error_on_quit.test ./t/percona_slow_extended_log_error.test ./t/percona_log_slow_slave_statements.test ./t/percona_log_slow_query_plan.test ./t/percona_log_slow_verbosity.test ./t/percona_slow_extended_error_on_quit.test ./t/percona_slow_extended_log_error.test ./t/percona_log_slow_slave_statements.test

[...]

CURRENT_TEST: main.percona_log_slow_query_plan
--- /home/glorch/dev/bug1016991/5.6-548/mysql-test/r/percona_log_slow_query_plan.result 2014-03-01 07:26:42.817727000 +0300
+++ /home/glorch/dev/bug1016991/5.6-548/mysql-test/var/3/log/percona_log_slow_query_plan.reject 2014-03-03 21:09:59.884963618 +0300
@@ -83,10 +83,10 @@
 SELECT * FROM t2 ORDER BY a;
 [log_stop.inc] percona_slow_query_log.query_plan_3
 [log_grep.inc] file: percona_slow_query_log.query_plan_3 pattern: ^# Filesort: Yes Filesort_on_disk: Yes Merge_passes: 6$
-[log_grep.inc] lines: 1
+[log_grep.inc] lines: 0
 SHOW SESSION STATUS LIKE 'Sort_merge_passes';
 Variable_name Value
-Sort_merge_passes 8
+Sort_merge_passes 6
 SET SESSION sort_buffer_size=default;
 SHOW SESSION STATUS LIKE 'Select_scan';
 Variable_name Value

mysqltest: Result content mismatch

George Ormond Lorch III (gl-az) wrote :

I did not test 5.1 but reproduced against both 5.5 and 5.6 using debug builds.

George Ormond Lorch III (gl-az) wrote :

Some of these failures can be seen in the following jenkins runs. The bug fix mentioned above in place but as I said, I reproduced these without the bug fix in place as well locally.

http://jenkins.percona.com/view/PS%205.5/job/percona-server-5.5-param/963/
http://jenkins.percona.com/view/PS%205.6/job/percona-server-5.6-param/535/

This is a preexisting issue: bug 1163680, please merge the bugs and close one as a duplicate of the other.
See also bug 1098857.

Easy to reproduce on recent 5.6.16 non-debug build:

main.percona_log_slow_query_plan w6 [ fail ]
        Test ended at 2014-03-04 10:26:59

CURRENT_TEST: main.percona_log_slow_query_plan
--- /home/openxs/bzr2/percona-5.6/mysql-test/r/percona_log_slow_query_plan.result 2014-01-28 14:08:51.545140000 +0300
+++ /home/openxs/bzr2/percona-5.6/mysql-test/var/6/log/percona_log_slow_query_plan.reject 2014-03-04 11:26:58.937385033 +0300
@@ -83,10 +83,10 @@
 SELECT * FROM t2 ORDER BY a;
 [log_stop.inc] percona_slow_query_log.query_plan_3
 [log_grep.inc] file: percona_slow_query_log.query_plan_3 pattern: ^# Filesort: Yes Filesort_on_disk: Yes Merge_passes: 6$
-[log_grep.inc] lines: 1
+[log_grep.inc] lines: 0
 SHOW SESSION STATUS LIKE 'Sort_merge_passes';
 Variable_name Value
-Sort_merge_passes 8
+Sort_merge_passes 6
 SET SESSION sort_buffer_size=default;
 SHOW SESSION STATUS LIKE 'Select_scan';
 Variable_name Value

mysqltest: Result content mismatch

 - saving '/home/openxs/bzr2/percona-5.6/mysql-test/var/6/log/main.percona_log_slow_query_plan/' to '/home/openxs/bzr2/percona-5.6/mysql-test/var/log/main.percona_log_slow_query_plan/'

Only 19 of 45 completed.
mysql-test-run: *** ERROR: Not all tests completed

tags: added: ci slow-extended
Yura Sorokin (yura-sorokin) wrote :

An easier way to reproduce this problem (with only one test case included):

./mtr \
  --parallel=8 \
  percona_log_slow_query_plan \
  percona_log_slow_query_plan \
  percona_log_slow_query_plan \
  percona_log_slow_query_plan \
  percona_log_slow_query_plan \
  percona_log_slow_query_plan \
  percona_log_slow_query_plan \
  percona_log_slow_query_plan

Reproduced on VMware Workstation in the following environments:
 1. CentOS 6.7 x86_64 (devtoolset-2 gcc 4.8.2) Percona Server 5.6.25 (debug)
 2. CentOS 7 x86_64 (native gcc 4.8.3) Percona Server 5.6.25 (debug)

Cannot be reproduced with Percona Server 5.5.44.

The parallel factor (8) along with the number of tests in the command line ("percona_log_slow_query_plan") can be increased on more powerful hardware.

Here is MTR log.

main.percona_log_slow_query_plan w7 [ fail ]
        Test ended at 2015-08-27 17:33:34

CURRENT_TEST: main.percona_log_slow_query_plan
--- /home/yura/ws/percona-server/mysql-test/r/percona_log_slow_query_plan.result 2015-08-27 17:28:10.980255266 +0300
+++ /home/yura/ws/percona-build/mysql-test/var/7/log/percona_log_slow_query_plan.reject 2015-08-27 17:33:33.610449008 +0300
@@ -83,10 +83,10 @@
 SELECT * FROM t2 ORDER BY a;
 [log_stop.inc] percona_slow_query_log.query_plan_3
 [log_grep.inc] file: percona_slow_query_log.query_plan_3 pattern: ^# Filesort: Yes Filesort_on_disk: Yes Merge_passes: 6$
-[log_grep.inc] lines: 1
+[log_grep.inc] lines: 0
 SHOW SESSION STATUS LIKE 'Sort_merge_passes';
 Variable_name Value
-Sort_merge_passes 8
+Sort_merge_passes 6
 SET SESSION sort_buffer_size=default;
 SHOW SESSION STATUS LIKE 'Select_scan';
 Variable_name Value

mysqltest: Result content mismatch

 - saving '/home/yura/ws/percona-build/mysql-test/var/7/log/main.percona_log_slow_query_plan/' to '/home/yura/ws/percona-build/mysql-test/var/log/main.percona_log_slow_query_plan/'

Only 1 of 8 completed.
mysql-test-run: *** ERROR: Not all tests completed

Yura Sorokin (yura-sorokin) wrote :
Download full text (8.3 KiB)

First of all, the easiest way to reproduce the problem (to receive unexpected value of "Merge_passes: xxx") is to run "abc.test" with "--debug" option.
"./mtr --debug abc"

"abc.test" is a simplified version of the "percona_log_slow_query_plan.test" with only 1 table involved.
It can be downloaded from https://github.com/percona-ysorokin/percona-server/commit/baaf911d2d2d15077d9474518f795bebe089bc04 or taken from the attachment.

The problem is in race condition between "dict_stats" thread and "sql select" thread.

In particular "SELECT * FROM t2 ORDER BY a" at some point executes "filesort()" which in turn tries to estimate the number of rows in the table
"num_rows= table->file->estimate_rows_upper_bound()". Then, the number of the "Merge_passes" directly depends on the "num_rows".

The result of the "ha_innobase::estimate_rows_upper_bound()" depends on the approximate number of leaf pages in the index tree "stat_n_leaf_pages"
which (omitting all the intermediate checkings) is roughly "prebuilt->table->indexes.start->stat_n_leaf_pages".
However, the value of this "stat_n_leaf_pages" in the middle of "estimate_rows_upper_bound()" execution can be modified by "dict_stats" thread.
In particular, when this thread is executing "dict_stats_analyze_index()", the value of the "stat_n_leaf_pages" can be in 3 different states:
1. old value.
2. intermediate value of 1 (is set in "dict_stats_empty_index(index)" ouside of any locks)
3. new value (assigned in "index->stat_n_leaf_pages = size" outside of any locks).

Depending how fast each thread runs, "num_rows" can be calculated differently and therefore we encounter different number of "Merge_passes: xxx".

Just in case, here is two backtraces of the corresponding threads

#0 dict_stats_empty_index (index=0x7fffcc039dd8)
    at /home/yura/ws/percona-server/storage/innobase/dict/dict0stats.cc:541
#1 0x0000000000d363eb in dict_stats_analyze_index (index=0x7fffcc039dd8)
    at /home/yura/ws/percona-server/storage/innobase/dict/dict0stats.cc:1904
#2 0x0000000000d36ed5 in dict_stats_update_persistent (table=0x7fffcc00c5a8)
    at /home/yura/ws/percona-server/storage/innobase/dict/dict0stats.cc:2209
#3 0x0000000000d38a43 in dict_stats_update (table=0x7fffcc00c5a8, stats_upd_option=DICT_STATS_RECALC_PERSISTENT)
    at /home/yura/ws/percona-server/storage/innobase/dict/dict0stats.cc:3127
#4 0x0000000000d3cbf7 in dict_stats_process_entry_from_recalc_pool ()
    at /home/yura/ws/percona-server/storage/innobase/dict/dict0stats_bg.cc:313
#5 0x0000000000d3ccfa in dict_stats_thread (arg=0x0)
    at /home/yura/ws/percona-server/storage/innobase/dict/dict0stats_bg.cc:363
#6 0x00007ffff7bc6df5 in start_thread () from /lib64/libpthread.so.0
#7 0x00007ffff623c1ad in clone () from /lib64/libc.so.6

#0 0x0000000000b01bec in ha_innobase::estimate_rows_upper_bound (this=0x7fffcc009810)
    at /home/yura/ws/percona-server/storage/innobase/handler/ha_innodb.cc:11299
#1 0x000000000093aa9e in filesort (thd=0x1b71150, table=0x7fffcc00ba40, filesort=0x7fffcc006f48,
    sort_positions=false, examined_rows=0x7fffec2dbe28, found_rows=0x7fffec2dbe20)
    at /home/yura/ws/percona-server/sql/filesort.cc:259
#2 0x00000000007bc...

Read more...

Yura Sorokin (yura-sorokin) wrote :

A simplified version of the "percona_log_slow_query_plan.test".

Thes scenario looks plausible and should be fixed, but the background stat recalculation is a 5.6-specific feature, while we have seen this bug on 5.5 and lower versions. Thus there could be something else. One guess would be that table stats are unstable without their background recalculation, then an ANALYZE TABLE between inserts and selects would help.

summary: - percona_log_slow_query_plan.test fails radomly
+ percona_log_slow_query_plan.test fails randomly

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-1483

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers

Bug attachments