Optimizer chooses a suboptimal excution plan for Q18 from DBT-3

Bug #913030 reported by Igor Babaev
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
MariaDB
Fix Released
High
Sergey Petrunia

Bug Description

When working with DBT-3 database (scale factor 10) created for MyISAM the optimizer of 5.3.4 chooses a suboptimal execution plan if all optimizer flags are set by default:

MariaDB [dbt3x10_myisam]> explain
    -> select sql_calc_found_rows
    -> c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity)
    -> from customer, orders, lineitem
    -> where o_orderkey in (select l_orderkey from lineitem
    -> group by l_orderkey having sum(l_quantity) > 300)
    -> and c_custkey = o_custkey and o_orderkey = l_orderkey
    -> group by c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice
    -> order by o_totalprice desc, o_orderdate
    -> limit 10;
+----+--------------+-------------+--------+--------------------------------------------+-----------------------+---------+----------------------------------+----------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+-------------+--------+--------------------------------------------+-----------------------+---------+----------------------------------+----------+----------------------------------------------+
| 1 | PRIMARY | orders | ALL | PRIMARY,i_o_custkey | NULL | NULL | NULL | 15000000 | Using where; Using temporary; Using filesort |
| 1 | PRIMARY | customer | eq_ref | PRIMARY | PRIMARY | 4 | dbt3x10_myisam.orders.o_custkey | 1 | |
| 1 | PRIMARY | <subquery2> | eq_ref | distinct_key | distinct_key | 4 | dbt3x10_myisam.orders.o_orderkey | 1 | |
| 1 | PRIMARY | lineitem | ref | PRIMARY,i_l_orderkey,i_l_orderkey_quantity | i_l_orderkey_quantity | 4 | dbt3x10_myisam.orders.o_orderkey | 4 | Using index |
| 2 | MATERIALIZED | lineitem | index | NULL | i_l_orderkey_quantity | 13 | NULL | 59986052 | Using index |
+----+--------------+-------------+--------+--------------------------------------------+-----------------------+---------+----------------------------------+----------+----------------------------------------------+
5 rows in set (0.00 sec)

MariaDB [dbt3x10_myisam]> select sql_calc_found_rows
    -> c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity)
    -> from customer, orders, lineitem
    -> where o_orderkey in (select l_orderkey from lineitem
    -> group by l_orderkey having sum(l_quantity) > 300)
    -> and c_custkey = o_custkey and o_orderkey = l_orderkey
    -> group by c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice
    -> order by o_totalprice desc, o_orderdate
    -> limit 10;
+--------------------+-----------+------------+-------------+--------------+-----------------+
| c_name | c_custkey | o_orderkey | o_orderdate | o_totalprice | sum(l_quantity) |
+--------------------+-----------+------------+-------------+--------------+-----------------+
| Customer#001287812 | 1287812 | 42290181 | 1997-11-26 | 558289.17 | 318 |
| Customer#000644812 | 644812 | 2745894 | 1996-07-04 | 557664.53 | 304 |
| Customer#001172513 | 1172513 | 36667107 | 1997-06-06 | 550142.18 | 322 |
| Customer#000399481 | 399481 | 43906817 | 1995-04-06 | 549431.65 | 312 |
| Customer#000571654 | 571654 | 21213895 | 1992-01-03 | 549380.08 | 327 |
| Customer#000667882 | 667882 | 2199712 | 1996-09-30 | 542154.01 | 327 |
| Customer#001492954 | 1492954 | 30332516 | 1996-03-10 | 541181.8 | 310 |
| Customer#001471966 | 1471966 | 1263015 | 1997-02-02 | 540476.8 | 320 |
| Customer#001082018 | 1082018 | 31018979 | 1995-12-06 | 537993.05 | 304 |
| Customer#001114039 | 1114039 | 30417318 | 1995-10-31 | 536420.39 | 305 |
+--------------------+-----------+------------+-------------+--------------+-----------------+
10 rows in set (11 min 22.22 sec)

If the 'semijoin' flag of the optimizer switch is set to 'off' the optimizer chooses a better execution plan:

MariaDB [dbt3x10_myisam]> set optimizer_switch='semijoin=off';
Query OK, 0 rows affected (0.00 sec)

MariaDB [dbt3x10_myisam]> explain
    -> select sql_calc_found_rows
    -> c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity)
    -> from customer, orders, lineitem
    -> where o_orderkey in (select l_orderkey from lineitem
    -> group by l_orderkey having sum(l_quantity) > 300)
    -> and c_custkey = o_custkey and o_orderkey = l_orderkey
    -> group by c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice
    -> order by o_totalprice desc, o_orderdate
    -> limit 10;
+----+--------------+----------+--------+--------------------------------------------+-----------------------+---------+----------------------------------+----------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+----------+--------+--------------------------------------------+-----------------------+---------+----------------------------------+----------+----------------------------------------------+
| 1 | PRIMARY | orders | ALL | PRIMARY,i_o_custkey | NULL | NULL | NULL | 15000000 | Using where; Using temporary; Using filesort |
| 1 | PRIMARY | customer | eq_ref | PRIMARY | PRIMARY | 4 | dbt3x10_myisam.orders.o_custkey | 1 | |
| 1 | PRIMARY | lineitem | ref | PRIMARY,i_l_orderkey,i_l_orderkey_quantity | i_l_orderkey_quantity | 4 | dbt3x10_myisam.orders.o_orderkey | 4 | Using index |
| 2 | MATERIALIZED | lineitem | index | NULL | i_l_orderkey_quantity | 13 | NULL | 59986052 | Using index |
+----+--------------+----------+--------+--------------------------------------------+-----------------------+---------+----------------------------------+----------+----------------------------------------------+
4 rows in set (0.00 sec)

MariaDB [dbt3x10_myisam]> select sql_calc_found_rows
    -> c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity)
    -> from customer, orders, lineitem
    -> where o_orderkey in (select l_orderkey from lineitem
    -> group by l_orderkey having sum(l_quantity) > 300)
    -> and c_custkey = o_custkey and o_orderkey = l_orderkey
    -> group by c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice
    -> order by o_totalprice desc, o_orderdate
    -> limit 10;
+--------------------+-----------+------------+-------------+--------------+-----------------+
| c_name | c_custkey | o_orderkey | o_orderdate | o_totalprice | sum(l_quantity) |
+--------------------+-----------+------------+-------------+--------------+-----------------+
| Customer#001287812 | 1287812 | 42290181 | 1997-11-26 | 558289.17 | 318 |
| Customer#000644812 | 644812 | 2745894 | 1996-07-04 | 557664.53 | 304 |
| Customer#001172513 | 1172513 | 36667107 | 1997-06-06 | 550142.18 | 322 |
| Customer#000399481 | 399481 | 43906817 | 1995-04-06 | 549431.65 | 312 |
| Customer#000571654 | 571654 | 21213895 | 1992-01-03 | 549380.08 | 327 |
| Customer#000667882 | 667882 | 2199712 | 1996-09-30 | 542154.01 | 327 |
| Customer#001492954 | 1492954 | 30332516 | 1996-03-10 | 541181.8 | 310 |
| Customer#001471966 | 1471966 | 1263015 | 1997-02-02 | 540476.8 | 320 |
| Customer#001082018 | 1082018 | 31018979 | 1995-12-06 | 537993.05 | 304 |
| Customer#001114039 | 1114039 | 30417318 | 1995-10-31 | 536420.39 | 305 |
+--------------------+-----------+------------+-------------+--------------+-----------------+
10 rows in set (2 min 31.82 sec)

The second plan is better because the materialized table is joined right after the first table orders.

summary: - Optimizer chooses suboptimal excution plan for Q18 from DBT-3
+ Optimizer chooses a suboptimal excution plan for Q18 from DBT-3
Changed in maria:
status: New → Confirmed
importance: Undecided → Critical
assignee: nobody → Sergey Petrunia (sergefp)
Changed in maria:
milestone: none → 5.3
Revision history for this message
Sergey Petrunia (sergefp) wrote :

The first attempt was this patch:

http://lists.askmonty.org/pipermail/commits/2012-January/002893.html

  BUG#913030: Optimizer chooses a suboptimal excution plan for Q18 from DBT-3
  - Added SJ_MATERIALIZATION_LOOKUP_MATCH_RATIO=0.75, made the optimizer assume
    that this is the probability that a lookup in semi-join materialized table will
    produce a match.

.. decided not to push it.

Michael Widenius (monty)
Changed in maria:
importance: Critical → High
Revision history for this message
Sergey Petrunia (sergefp) wrote :

Commited a patch, requested testing.

Changed in maria:
status: Confirmed → Fix Committed
Revision history for this message
Elena Stepanova (elenst) wrote :

Fix released in 5.3.6

Changed in maria:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.