Wrong result with outer join + subquery in ON clause +unique key in maria-5.3

Bug #817384 reported by Philip Stoev
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
MariaDB
Fix Released
High
Timour Katchaounov

Bug Description

Not repeatable in maria-5.2,maria-5.1,mysql-5.5

Even after the fix for 813447, the following query:

SELECT t2.b FROM t1
LEFT JOIN t2
ON t1.col_int_nokey = t2.a
AND ( t2.b , t1.b ) IN ( SELECT 'c' , 'd' );

returns "a" which is wrong since the second part of the ON predicate is FALSE for all rows.

explain:

| 1 | PRIMARY | t1 | system | NULL | NULL | NULL | NULL | 1 | |
| 1 | PRIMARY | t2 | const | PRIMARY | PRIMARY | 4 | const | 1 | |
| 2 | DEPENDENT SUBQUERY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used |

optimizer_switch:

 index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,index_condition_pushdown=off,derived_merge=off,derived_with_keys=off,firstmatch=off,loosescan=off,materialization=off,in_to_exists=on,semijoin=off,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=off,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=off,semijoin_with_cache=off,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on

 bzr version-info
revision-id: <email address hidden>
date: 2011-07-25 21:52:15 -0700
build-date: 2011-07-28 11:26:14 +0300
revno: 3005
branch-nick: maria-5.2

test case:

CREATE TABLE t1 ( c int NOT NULL , b varchar(32) NOT NULL ) ;
INSERT INTO t1 VALUES (1,'b');

CREATE TABLE t2 ( a int NOT NULL , b varchar(32) NOT NULL , PRIMARY KEY (a)) ;
INSERT INTO t2 VALUES (1,'a');

SELECT t2.b FROM t1
LEFT JOIN t2
ON t1.c = t2.a
AND ( t2.b , t1.b ) IN (SELECT 'c' , 'd' );

Related branches

Changed in maria:
milestone: none → 5.3
assignee: nobody → Timour Katchaounov (timour)
Revision history for this message
Timour Katchaounov (timour) wrote :

Slightly more general example:

CREATE TABLE t1 ( c int NOT NULL , b char(1) NOT NULL ) ;
INSERT INTO t1 VALUES (1,'b');

CREATE TABLE t2 ( a int NOT NULL , b char(1) NOT NULL , PRIMARY KEY (a)) ;
INSERT INTO t2 VALUES (1,'a');

create table t3 (c1 char(1), c2 char(2));
insert into t3 values ('c','d');
insert into t3 values ('c','d');

SELECT t2.b
FROM t1 LEFT JOIN t2 ON t1.c = t2.a AND ( t2.b , t1.b ) IN (SELECT * from t3);

Changed in maria:
status: New → Confirmed
importance: Undecided → High
Revision history for this message
Timour Katchaounov (timour) wrote :

Analysis:

The following simple queries demonstrates that the wrong result is a
consequence of constant optimization:

This query incorrectly produces a value:
SELECT t2.b FROM t1 LEFT JOIN t2 ON (t2.b) IN (SELECT c2 from t3) AND t2.a = 1;
+----+--------------------+-------+--------+---------------+---------+---------+-------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------------+-------+--------+---------------+---------+---------+-------+------+----------+-------------+
| 1 | PRIMARY | t1 | system | NULL | NULL | NULL | NULL | 1 | 100.00 | |
| 1 | PRIMARY | t2 | const | PRIMARY | PRIMARY | 4 | const | 1 | 100.00 | |
| 2 | DEPENDENT SUBQUERY | t3 | ALL | NULL | NULL | NULL | NULL | 2 | 100.00 | Using where |
+----+--------------------+-------+--------+---------------+---------+---------+-------+------+----------+-------------+

while this query works correctly:
SELECT t2.b FROM t1 LEFT JOIN t2 ON (t2.b) IN (SELECT c2 from t3) AND t2.b = 'a';
+----+--------------------+-------+--------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------------+-------+--------+---------------+------+---------+------+------+----------+-------------+
| 1 | PRIMARY | t1 | system | NULL | NULL | NULL | NULL | 1 | 100.00 | |
| 1 | PRIMARY | t2 | ALL | NULL | NULL | NULL | NULL | 1 | 100.00 | Using where |
| 2 | DEPENDENT SUBQUERY | t3 | ALL | NULL | NULL | NULL | NULL | 2 | 100.00 | Using where |
+----+--------------------+-------+--------+---------------+------+---------+------+------+----------+-------------+

In the wrong query constant optimization finds that the condition t2.a = 1 can be used
to access the primary key of table 't2'. As a result both outer table t1,t2 are considered
as constant when we reach the execution phase. At the same time, during constant
optimization, the IN predicate is not evaluated because it is expensive.

When execution of the outer query reaches do_select(), control flow enter the branch:
if (join->table_count == join->const_tables)
{ ... }
This branch checks only the WHERE and HAVING clauses, but doesn't check the ON
clauses of the query. Since the IN predicate was not evaluated during optimization,
it is not evaluated at all, thus execution doesn't detect that the ON clause is FALSE.

Consider the second query that works correctly. In this query the second table is
not constant. As a result do_select() executes the alternative IF branch, where
it calls sub_select(). This execution path ends up calling evaluate_join_record(),
which evaluates the ON conditions as well, where the IN predicate is evaluated
and the query produces the correct result.

Changed in maria:
status: Confirmed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.