Invalid plan and wrong result set for Q20 from DBT3 benchmark set

Bug #822134 reported by Igor Babaev
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
MariaDB
Fix Released
Critical
Sergey Petrunia

Bug Description

The query Q20 from DBT3 query set returns an empty result set for an InnoDB database of scale 10 in mariadb5.3.0.

The problem can be reproduced on a debug build (and a release build as well) with the following commands:
use dbt3sf10 [DBT3 of scale 10 created for InnoDB]
set optimizer_switch='semijoin=on';
set optimizer_switch='materialization=on';
set optimizer_switch='in_to_exists=off';
select sql_calc_found_rows
       s_name, s_address
from supplier, nation
where s_suppkey in (select ps_suppkey from partsupp
                    where ps_partkey in (select p_partkey from part
                                         where p_name like 'forest%')
                          and ps_availqty >
                              (select 0.5 * sum(l_quantity)
                               from lineitem
                               where l_partkey = ps_partkey
                                     and l_suppkey = ps_suppkey
                                     and l_shipdate >= date('1994-01-01')
                                     and l_shipdate < date('1994-01-01') +
                                         interval '1' year ))
and s_nationkey = n_nationkey
and n_name = 'CANADA'
order by s_name
limit 10;

EXPLAIN shows that the execution plan is invalid, because it uses an outer reference when building a
materialized table.
+----+--------------------+-------------+--------+----------------------------------------------------------+---------------------+---------+-----------------------------------------------------------+---------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------------+--------+----------------------------------------------------------+---------------------+---------+-----------------------------------------------------------+---------+----------------------------------------------+
| 1 | PRIMARY | supplier | ALL | PRIMARY,i_s_nationkey | NULL | NULL | NULL | 99880 | Using where; Using temporary; Using filesort |
| 1 | PRIMARY | nation | eq_ref | PRIMARY | PRIMARY | 4 | dbt3sf10.supplier.s_nationkey | 1 | Using where |
| 1 | PRIMARY | <subquery2> | eq_ref | distinct_key | distinct_key | 4 | func | 1 | |
| 2 | SUBQUERY | part | ALL | PRIMARY | NULL | NULL | NULL | 1996969 | Using where |
| 2 | SUBQUERY | partsupp | eq_ref | PRIMARY,i_ps_partkey,i_ps_suppkey | PRIMARY | 8 | dbt3sf10.part.p_partkey,dbt3sf10.supplier.s_suppkey | 2 | Using where |
| 4 | DEPENDENT SUBQUERY | lineitem | ref | i_l_shipdate,i_l_suppkey_partkey,i_l_partkey,i_l_suppkey | i_l_suppkey_partkey | 10 | dbt3sf10.partsupp.ps_partkey,dbt3sf10.partsupp.ps_suppkey | 3 | Using where |
+----+--------------------+-------------+--------+----------------------------------------------------------+---------------------+---------+-----------------------------------------------------------+---------+----------------------------------------------+

On my computer the bug is not reproducible at every execution. Sometimes the optimizer produces a valid plan that uses index i_ps_partkey and returns a correct result.

+----+--------------------+-------------+--------+----------------------------------------------------------+---------------------+---------+-----------------------------------------------------------+---------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------------+--------+----------------------------------------------------------+---------------------+---------+-----------------------------------------------------------+---------+----------------------------------------------+
| 1 | PRIMARY | nation | ALL | PRIMARY | NULL | NULL | NULL | 25 | Using where; Using temporary; Using filesort |
| 1 | PRIMARY | supplier | ref | PRIMARY,i_s_nationkey | i_s_nationkey | 5 | dbt3sf10.nation.n_nationkey | 3121 | |
| 1 | PRIMARY | <subquery2> | eq_ref | distinct_key | distinct_key | 4 | func | 1 | |
| 2 | SUBQUERY | part | ALL | PRIMARY | NULL | NULL | NULL | 2001943 | Using where |
| 2 | SUBQUERY | partsupp | ref | PRIMARY,i_ps_partkey,i_ps_suppkey | i_ps_partkey | 4 | dbt3sf10.part.p_partkey | 1 | Using where |
| 4 | DEPENDENT SUBQUERY | lineitem | ref | i_l_shipdate,i_l_suppkey_partkey,i_l_partkey,i_l_suppkey | i_l_suppkey_partkey | 10 | dbt3sf10.partsupp.ps_partkey,dbt3sf10.partsupp.ps_suppkey | 3 | Using where |
+----+--------------------+-------------+--------+----------------------------------------------------------+---------------------+---------+-----------------------------------------------------------+---------+----------------------------------------------+

Changed in maria:
importance: Undecided → Critical
assignee: nobody → Sergey Petrunia (sergefp)
milestone: none → 5.3
summary: - Invalid plan abd wrong result set for Q20 from DBT3 benchmark set
+ Invalid plan and wrong result set for Q20 from DBT3 benchmark set
Revision history for this message
Sergey Petrunia (sergefp) wrote :

Here's a way to reliably get the wrong query plan:

Use this query (note the added IGNORE INDEX clause):

explain select sql_calc_found_rows s_name, s_address
from
  supplier, nation
where
  s_suppkey in (select ps_suppkey from partsupp ignore index (i_ps_partkey)
                where ps_partkey in (select p_partkey from part
                                     where p_name like 'forest%')
                      and ps_availqty > (select 0.5 * sum(l_quantity)
                                         from lineitem
                                         where l_partkey = ps_partkey and
                                               l_suppkey = ps_suppkey and
                                               l_shipdate >= date('1994-01-01')
                                               and l_shipdate < date('1994-01-01') + interval '1' year
                                        )
               )
  and s_nationkey = n_nationkey and n_name = 'CANADA'
order by s_name limit 10;

In debugger: when best_access_path() is called with idx == 0 && !strcmp(s->table->alias->Ptr, "nation")
then at the end of the function do:
  set best=100000
  set records=25000

Changed in maria:
status: New → Fix Committed
Changed in maria:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.