Query containing IN subquery with OR in the where clause returns a wrong result
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
MariaDB |
Fix Released
|
Critical
|
Sergey Petrunia |
Bug Description
A query with IN subquery that can be converted to a semi-join may return a wrong result in maridb-5.3 if the where clause of the subquery contains OR condition.
The following test case provides such a query.
create table t1 (a int, b int);
insert into t1 values (7,5), (3,3), (5,4), (9,3);
create table t2 (a int, b int, index i_a(a));
insert into t2 values
(4,2), (7,9), (7,4), (3,1), (5,3), (3,1), (9,4), (8,1);
set optimizer_
select * from t1 where t1.a in (select a from t2 where t2.a=7 or t2.b<=1);
The query in from the test case returns a wrong result if the optimizer switch flags 'semijoin' and 'materialization' are set to 'on', a it returns the correct answer if these flags are set to 'off'.
MariaDB [test]> set optimizer_
Query OK, 0 rows affected (0.00 sec)
MariaDB [test]> select * from t1 where t1.a in (select a from t2 where t2.a=7 or t2.b<=1);
+------+------+
| a | b |
+------+------+
| 7 | 5 |
| 3 | 3 |
| 5 | 4 |
| 9 | 3 |
+------+------+
4 rows in set (0.00 sec)
MariaDB [test]> set optimizer_
Query OK, 0 rows affected (0.00 sec)
MariaDB [test]> select * from t1 where t1.a in (select a from t2 where t2.a=7 or t2.b<=1);
+------+------+
| a | b |
+------+------+
| 7 | 5 |
| 3 | 3 |
+------+------+
2 rows in set (0.00 sec)
The warning returned by EXPLAIN EXTENDED executed for the query with
optimizer_switch set to 'semijoin=
shows that it happens because in this case the optimizer generates an invalid execution plan:
MariaDB [test]> set optimizer_
Query OK, 0 rows affected (0.00 sec)
MariaDB [test]> explain extended
-> select * from t1 where t1.a in (select a from t2 where t2.a=7 or t2.b<=1);
+----+-
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-
| 1 | PRIMARY | t1 | ALL | NULL | NULL | NULL | NULL | 4 | 100.00 | |
| 1 | PRIMARY | <subquery2> | eq_ref | distinct_key | distinct_key | 5 | func | 1 | 100.00 | |
| 2 | MATERIALIZED | t2 | ALL | i_a | NULL | NULL | NULL | 8 | 100.00 | |
+----+-
3 rows in set, 1 warning (0.00 sec)
MariaDB [test]> show warnings;
+------
| Level | Code | Message |
+------
| Note | 1003 | select `test`.`t1`.`a` AS `a`,`test`.`t1`.`b` AS `b` from `test`.`t1` semi join (`test`.`t2`) where (((`test`.`t1`.`a` = 7) or (`test`.`t2`.`b` <= 1))) |
+------
1 row in set (0.00 sec)
Changed in maria: | |
status: | New → Confirmed |
importance: | Undecided → Critical |
assignee: | nobody → Sergey Petrunia (sergefp) |
milestone: | none → 5.3 |
Changed in maria: | |
status: | Confirmed → In Progress |
Changed in maria: | |
status: | In Progress → Fix Committed |
Changed in maria: | |
status: | Fix Committed → Fix Released |
== Analysis ==
Equality propagation converts the WHERE clause into this:
(multiple equal(7, t1.a, t2.a) or (t2.b <= 1))
and
multiple equal(t1.a, t2.a)
This is ok.
Then, equality substitution produces this WHERE clause:
(t1.a = 7) or (t2.b <= 1)
we dont expect this kind of WHERE clauses to be produced when ion-Lookup strategy is used.
SJ-Materializat
With that strategy, we expect that the WHERE clause can be broken into two
AND-parts:
- Part#1. depend only on SJ-inner tables
- Part#2. depend only on SJ-outer tables
The only thing joining the two parts is the IN-equality. IN-equality is not
put into the WHERE condition, it is generated and checked inside the
SJ-Materialization code.
However, make_join_select() gets this clause:
(t1.a = 7) or (t2.b <= 1) (*)
which can only be checked when one has both t1.a and t2.b. This never happens,
so make_join_select() is unable to attach this condition anywhere, and it is
never checked.