Optimizer pruning chooses very inefficient join plan when all joined tables are similar

Bug #1010395 reported by Timour Katchaounov
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
MariaDB
Confirmed
Medium
Timour Katchaounov

Bug Description

The following test case shows that if join optimization pruning heuristics
is ON (the default behavior in all MariaDB/MySQL) version, the optimizer
misses the optimal query plan and chooses a very inefficient query plan.

CREATE TABLE t1 (a INT, KEY(a));
INSERT INTO t1 VALUES (7),(5),(1),(204),(224),(9),(5),(0),(3);
INSERT INTO t1 SELECT al1.* FROM t1 al1, t1 al2;

-- default prune level is 1
set optimizer_prune_level=1;

EXPLAIN SELECT MAX(al1.a) FROM t1 AS al4, t1 AS al3, t1 AS al2, t1 AS al1 WHERE al4.a = al3.a;
+----+-------------+-------+-------+---------------+------+---------+-------------+------+--------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+------+---------+-------------+------+--------------------------------+
| 1 | SIMPLE | al4 | index | a | a | 5 | NULL | 90 | Using index |
| 1 | SIMPLE | al3 | ref | a | a | 5 | md321.al4.a | 10 | Using where; Using index |
| 1 | SIMPLE | al2 | index | NULL | a | 5 | NULL | 90 | Using index; Using join buffer |
| 1 | SIMPLE | al1 | index | NULL | a | 5 | NULL | 90 | Using index; Using join buffer |
+----+-------------+-------+-------+---------------+------+---------+-------------+------+--------------------------------+
SHOW SESSION STATUS LIKE 'Last_query_cost';
+-----------------+----------------+
| Variable_name | Value |
+-----------------+----------------+
| Last_query_cost | 1531029.266998 |
+-----------------+----------------+

Above, if we order the tables in the FROM clause according to their optimal order, the optimizer
finds the optimal plan.

Below, if the tables are in reverse order, then the optimizer misses the optimal order due to pruning:

EXPLAIN SELECT MAX(al1.a) FROM t1 AS al1, t1 AS al2, t1 AS al3, t1 AS al4 WHERE al4.a = al3.a;
+----+-------------+-------+-------+---------------+------+---------+-------------+------+--------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+------+---------+-------------+------+--------------------------------+
| 1 | SIMPLE | al1 | index | NULL | a | 5 | NULL | 90 | Using index |
| 1 | SIMPLE | al2 | index | NULL | a | 5 | NULL | 90 | Using index; Using join buffer |
| 1 | SIMPLE | al3 | index | a | a | 5 | NULL | 90 | Using index; Using join buffer |
| 1 | SIMPLE | al4 | ref | a | a | 5 | md321.al3.a | 10 | Using where; Using index |
+----+-------------+-------+-------+---------------+------+---------+-------------+------+--------------------------------+
SHOW SESSION STATUS LIKE 'Last_query_cost';
+-----------------+----------------+
| Variable_name | Value |
+-----------------+----------------+
| Last_query_cost | 2420964.599961 |
+-----------------+----------------+

As shown below, if we switch off the pruning heuristics, the join optimizer finds the optimal
plan in all cases.

-- switch off pruning heuristics
set optimizer_prune_level=0;

EXPLAIN SELECT MAX(al1.a) FROM t1 AS al4, t1 AS al3, t1 AS al2, t1 AS al1 WHERE al4.a = al3.a;
+----+-------------+-------+-------+---------------+------+---------+-------------+------+--------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+------+---------+-------------+------+--------------------------------+
| 1 | SIMPLE | al4 | index | a | a | 5 | NULL | 90 | Using index |
| 1 | SIMPLE | al3 | ref | a | a | 5 | md321.al4.a | 10 | Using where; Using index |
| 1 | SIMPLE | al2 | index | NULL | a | 5 | NULL | 90 | Using index; Using join buffer |
| 1 | SIMPLE | al1 | index | NULL | a | 5 | NULL | 90 | Using index; Using join buffer |
+----+-------------+-------+-------+---------------+------+---------+-------------+------+--------------------------------+
SHOW SESSION STATUS LIKE 'Last_query_cost';
+-----------------+----------------+
| Variable_name | Value |
+-----------------+----------------+
| Last_query_cost | 1531029.266998 |
+-----------------+----------------+

EXPLAIN SELECT MAX(al1.a) FROM t1 AS al1, t1 AS al2, t1 AS al3, t1 AS al4 WHERE al4.a = al3.a;
+----+-------------+-------+-------+---------------+------+---------+-------------+------+--------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+------+---------+-------------+------+--------------------------------+
| 1 | SIMPLE | al3 | index | a | a | 5 | NULL | 90 | Using index |
| 1 | SIMPLE | al4 | ref | a | a | 5 | md321.al3.a | 10 | Using where; Using index |
| 1 | SIMPLE | al2 | index | NULL | a | 5 | NULL | 90 | Using index; Using join buffer |
| 1 | SIMPLE | al1 | index | NULL | a | 5 | NULL | 90 | Using index; Using join buffer |
+----+-------------+-------+-------+---------------+------+---------+-------------+------+--------------------------------+
SHOW SESSION STATUS LIKE 'Last_query_cost';
+-----------------+----------------+
| Variable_name | Value |
+-----------------+----------------+
| Last_query_cost | 1531029.266998 |
+-----------------+----------------+

Notice that since al3 and al4 are exactly the same, the order <al3, al4> is the same as <al4, al3>.

Revision history for this message
Timour Katchaounov (timour) wrote :

Test case:

CREATE TABLE t1 (a INT, KEY(a));
INSERT INTO t1 VALUES (7),(5),(1),(204),(224),(9),(5),(0),(3);

INSERT INTO t1 SELECT al1.* FROM t1 al1, t1 al2;

-- default prune level is 1
set optimizer_prune_level=1;

EXPLAIN
SELECT MAX(al1.a) FROM t1 AS al4, t1 AS al3, t1 AS al2, t1 AS al1 WHERE al4.a = al3.a;
SHOW SESSION STATUS LIKE 'Last_query_cost';

EXPLAIN
SELECT MAX(al1.a) FROM t1 AS al1, t1 AS al2, t1 AS al3, t1 AS al4 WHERE al4.a = al3.a;
SHOW SESSION STATUS LIKE 'Last_query_cost';

-- switch off pruning heuristics
set optimizer_prune_level=0;

EXPLAIN
SELECT MAX(al1.a) FROM t1 AS al4, t1 AS al3, t1 AS al2, t1 AS al1 WHERE al4.a = al3.a;
SHOW SESSION STATUS LIKE 'Last_query_cost';

EXPLAIN
SELECT MAX(al1.a) FROM t1 AS al1, t1 AS al2, t1 AS al3, t1 AS al4 WHERE al4.a = al3.a;
SHOW SESSION STATUS LIKE 'Last_query_cost';

Changed in maria:
status: New → Confirmed
Revision history for this message
Timour Katchaounov (timour) wrote :

The bug is reproducible in any MariaDB/MySQL version since at least MariaDB 5.2.
This is a legacy bug that has existed since at least 2005, therefore I set its importance
to "medium".

Changed in maria:
importance: Undecided → Medium
assignee: nobody → Timour Katchaounov (timour)
Revision history for this message
Timour Katchaounov (timour) wrote :

According to Igor on IRC:
 1. when pruning we should take into account the cardinality of the partial join, not only its cost. this will require to change one condition in the code.
2. jorgen changed initial sorting to improve plans when pruning set to default (in MySQL)

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.