Optimizer chooses a suboptimal excution plan for Q18 from DBT-3

Bug #913030 reported by Igor Babaev on 2012-01-07
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
MariaDB
Fix Released
High
Sergey Petrunia

Bug Description

When working with DBT-3 database (scale factor 10) created for MyISAM the optimizer of 5.3.4 chooses a suboptimal execution plan if all optimizer flags are set by default:

MariaDB [dbt3x10_myisam]> explain
    -> select sql_calc_found_rows
    -> c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity)
    -> from customer, orders, lineitem
    -> where o_orderkey in (select l_orderkey from lineitem
    -> group by l_orderkey having sum(l_quantity) > 300)
    -> and c_custkey = o_custkey and o_orderkey = l_orderkey
    -> group by c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice
    -> order by o_totalprice desc, o_orderdate
    -> limit 10;
+----+--------------+-------------+--------+--------------------------------------------+-----------------------+---------+----------------------------------+----------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+-------------+--------+--------------------------------------------+-----------------------+---------+----------------------------------+----------+----------------------------------------------+
| 1 | PRIMARY | orders | ALL | PRIMARY,i_o_custkey | NULL | NULL | NULL | 15000000 | Using where; Using temporary; Using filesort |
| 1 | PRIMARY | customer | eq_ref | PRIMARY | PRIMARY | 4 | dbt3x10_myisam.orders.o_custkey | 1 | |
| 1 | PRIMARY | <subquery2> | eq_ref | distinct_key | distinct_key | 4 | dbt3x10_myisam.orders.o_orderkey | 1 | |
| 1 | PRIMARY | lineitem | ref | PRIMARY,i_l_orderkey,i_l_orderkey_quantity | i_l_orderkey_quantity | 4 | dbt3x10_myisam.orders.o_orderkey | 4 | Using index |
| 2 | MATERIALIZED | lineitem | index | NULL | i_l_orderkey_quantity | 13 | NULL | 59986052 | Using index |
+----+--------------+-------------+--------+--------------------------------------------+-----------------------+---------+----------------------------------+----------+----------------------------------------------+
5 rows in set (0.00 sec)

MariaDB [dbt3x10_myisam]> select sql_calc_found_rows
    -> c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity)
    -> from customer, orders, lineitem
    -> where o_orderkey in (select l_orderkey from lineitem
    -> group by l_orderkey having sum(l_quantity) > 300)
    -> and c_custkey = o_custkey and o_orderkey = l_orderkey
    -> group by c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice
    -> order by o_totalprice desc, o_orderdate
    -> limit 10;
+--------------------+-----------+------------+-------------+--------------+-----------------+
| c_name | c_custkey | o_orderkey | o_orderdate | o_totalprice | sum(l_quantity) |
+--------------------+-----------+------------+-------------+--------------+-----------------+
| Customer#001287812 | 1287812 | 42290181 | 1997-11-26 | 558289.17 | 318 |
| Customer#000644812 | 644812 | 2745894 | 1996-07-04 | 557664.53 | 304 |
| Customer#001172513 | 1172513 | 36667107 | 1997-06-06 | 550142.18 | 322 |
| Customer#000399481 | 399481 | 43906817 | 1995-04-06 | 549431.65 | 312 |
| Customer#000571654 | 571654 | 21213895 | 1992-01-03 | 549380.08 | 327 |
| Customer#000667882 | 667882 | 2199712 | 1996-09-30 | 542154.01 | 327 |
| Customer#001492954 | 1492954 | 30332516 | 1996-03-10 | 541181.8 | 310 |
| Customer#001471966 | 1471966 | 1263015 | 1997-02-02 | 540476.8 | 320 |
| Customer#001082018 | 1082018 | 31018979 | 1995-12-06 | 537993.05 | 304 |
| Customer#001114039 | 1114039 | 30417318 | 1995-10-31 | 536420.39 | 305 |
+--------------------+-----------+------------+-------------+--------------+-----------------+
10 rows in set (11 min 22.22 sec)

If the 'semijoin' flag of the optimizer switch is set to 'off' the optimizer chooses a better execution plan:

MariaDB [dbt3x10_myisam]> set optimizer_switch='semijoin=off';
Query OK, 0 rows affected (0.00 sec)

MariaDB [dbt3x10_myisam]> explain
    -> select sql_calc_found_rows
    -> c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity)
    -> from customer, orders, lineitem
    -> where o_orderkey in (select l_orderkey from lineitem
    -> group by l_orderkey having sum(l_quantity) > 300)
    -> and c_custkey = o_custkey and o_orderkey = l_orderkey
    -> group by c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice
    -> order by o_totalprice desc, o_orderdate
    -> limit 10;
+----+--------------+----------+--------+--------------------------------------------+-----------------------+---------+----------------------------------+----------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+----------+--------+--------------------------------------------+-----------------------+---------+----------------------------------+----------+----------------------------------------------+
| 1 | PRIMARY | orders | ALL | PRIMARY,i_o_custkey | NULL | NULL | NULL | 15000000 | Using where; Using temporary; Using filesort |
| 1 | PRIMARY | customer | eq_ref | PRIMARY | PRIMARY | 4 | dbt3x10_myisam.orders.o_custkey | 1 | |
| 1 | PRIMARY | lineitem | ref | PRIMARY,i_l_orderkey,i_l_orderkey_quantity | i_l_orderkey_quantity | 4 | dbt3x10_myisam.orders.o_orderkey | 4 | Using index |
| 2 | MATERIALIZED | lineitem | index | NULL | i_l_orderkey_quantity | 13 | NULL | 59986052 | Using index |
+----+--------------+----------+--------+--------------------------------------------+-----------------------+---------+----------------------------------+----------+----------------------------------------------+
4 rows in set (0.00 sec)

MariaDB [dbt3x10_myisam]> select sql_calc_found_rows
    -> c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity)
    -> from customer, orders, lineitem
    -> where o_orderkey in (select l_orderkey from lineitem
    -> group by l_orderkey having sum(l_quantity) > 300)
    -> and c_custkey = o_custkey and o_orderkey = l_orderkey
    -> group by c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice
    -> order by o_totalprice desc, o_orderdate
    -> limit 10;
+--------------------+-----------+------------+-------------+--------------+-----------------+
| c_name | c_custkey | o_orderkey | o_orderdate | o_totalprice | sum(l_quantity) |
+--------------------+-----------+------------+-------------+--------------+-----------------+
| Customer#001287812 | 1287812 | 42290181 | 1997-11-26 | 558289.17 | 318 |
| Customer#000644812 | 644812 | 2745894 | 1996-07-04 | 557664.53 | 304 |
| Customer#001172513 | 1172513 | 36667107 | 1997-06-06 | 550142.18 | 322 |
| Customer#000399481 | 399481 | 43906817 | 1995-04-06 | 549431.65 | 312 |
| Customer#000571654 | 571654 | 21213895 | 1992-01-03 | 549380.08 | 327 |
| Customer#000667882 | 667882 | 2199712 | 1996-09-30 | 542154.01 | 327 |
| Customer#001492954 | 1492954 | 30332516 | 1996-03-10 | 541181.8 | 310 |
| Customer#001471966 | 1471966 | 1263015 | 1997-02-02 | 540476.8 | 320 |
| Customer#001082018 | 1082018 | 31018979 | 1995-12-06 | 537993.05 | 304 |
| Customer#001114039 | 1114039 | 30417318 | 1995-10-31 | 536420.39 | 305 |
+--------------------+-----------+------------+-------------+--------------+-----------------+
10 rows in set (2 min 31.82 sec)

The second plan is better because the materialized table is joined right after the first table orders.

summary: - Optimizer chooses suboptimal excution plan for Q18 from DBT-3
+ Optimizer chooses a suboptimal excution plan for Q18 from DBT-3
Changed in maria:
status: New → Confirmed
importance: Undecided → Critical
assignee: nobody → Sergey Petrunia (sergefp)
Changed in maria:
milestone: none → 5.3
Sergey Petrunia (sergefp) wrote :

The first attempt was this patch:

http://lists.askmonty.org/pipermail/commits/2012-January/002893.html

  BUG#913030: Optimizer chooses a suboptimal excution plan for Q18 from DBT-3
  - Added SJ_MATERIALIZATION_LOOKUP_MATCH_RATIO=0.75, made the optimizer assume
    that this is the probability that a lookup in semi-join materialized table will
    produce a match.

.. decided not to push it.

Michael Widenius (monty) on 2012-03-20
Changed in maria:
importance: Critical → High
Sergey Petrunia (sergefp) wrote :

Commited a patch, requested testing.

Changed in maria:
status: Confirmed → Fix Committed
Elena Stepanova (elenst) wrote :

Fix released in 5.3.6

Changed in maria:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers