Wrong result (missing rows) with join_cache_hashed+materialization+semijoin=on, join_cache_level > 2, JOIN, IN subquery

Bug #912513 reported by Elena Stepanova
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
MariaDB
Fix Released
Critical
Sergey Petrunia

Bug Description

The following query

SELECT alias1.* FROM
    t1 AS alias1 INNER JOIN t1 AS alias2
      ON alias2.a = alias1.b
WHERE alias1.b IN (
        SELECT a FROM t1, t2
    )

on the test data returns a single row with join_cache_level>2, and 7 rows with join_cache_level<=2. The latter is the correct result.

bzr version-info
revision-id: <email address hidden>
date: 2012-01-02 20:06:36 -0800
build-date: 2012-01-05 23:33:41 +0400
revno: 3376
branch-nick: maria-5.3

Reproducible on 5.3.2 and 5.3.3 releases, current 5.3, 5.5.

EXPLAIN with join_cache_level=3 (wrong result)

id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY alias1 ALL NULL NULL NULL NULL 7 100.00
1 PRIMARY <subquery2> eq_ref distinct_key distinct_key 5 func 1 100.00
1 PRIMARY alias2 hash_index a #hash#a:a 5:5 test.t1.a 7 28.57 Using where; Using join buffer (flat, BNLH join)
2 MATERIALIZED t1 index a a 5 NULL 7 100.00 Using where; Using index
2 MATERIALIZED t2 ALL NULL NULL NULL NULL 7 100.00
Warnings:
Note 1003 select `test`.`alias1`.`a` AS `a`,`test`.`alias1`.`b` AS `b` from `test`.`t1` `alias1` semi join (`test`.`t1` join `test`.`t2`) join `test`.`t1` `alias2` where (`test`.`alias2`.`a` = `test`.`alias1`.`b`)
SELECT alias1.* FROM
t1 AS alias1 INNER JOIN t1 AS alias2
ON alias2.a = alias1.b
WHERE alias1.b IN (
SELECT a FROM t1, t2
)

EXPLAIN with join_cache_level=2 (correct result):

id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY alias1 ALL NULL NULL NULL NULL 7 100.00 Using where
1 PRIMARY <subquery2> eq_ref distinct_key distinct_key 5 func 1 100.00
1 PRIMARY alias2 ref a a 5 test.alias1.b 2 100.00 Using index
2 MATERIALIZED t1 index a a 5 NULL 7 100.00 Using index
2 MATERIALIZED t2 ALL NULL NULL NULL NULL 7 100.00
Warnings:
Note 1003 select `test`.`alias1`.`a` AS `a`,`test`.`alias1`.`b` AS `b` from `test`.`t1` `alias1` semi join (`test`.`t1` join `test`.`t2`) join `test`.`t1` `alias2` where (`test`.`alias2`.`a` = `test`.`alias1`.`b`)
SELECT alias1.* FROM
t1 AS alias1 INNER JOIN t1 AS alias2
ON alias2.a = alias1.b
WHERE alias1.b IN (
SELECT a FROM t1, t2
)

Minimal optimizer_switch: semijoin=on,materialization=on,join_cache_hashed=on
Full optimizer_switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on

Test case:

SET optimizer_switch = 'semijoin=on,materialization=on,join_cache_hashed=on';
SET join_cache_level = 3;

CREATE TABLE t1 ( a INT, b INT, KEY(a) );
INSERT INTO t1 VALUES
  (1,1),(2,2),(3,3),(4,4),(5,5),(6,6),(7,7);

CREATE TABLE t2 ( c INT );
INSERT INTO t2 VALUES (1),(2),(3),(4),(5),(6),(7);

SELECT alias1.* FROM
    t1 AS alias1 INNER JOIN t1 AS alias2
      ON alias2.a = alias1.b
WHERE alias1.b IN (
        SELECT a FROM t1, t2
    );

Expected result:
a b
1 1
2 2
3 3
4 4
5 5
6 6
7 7

Result:
a b
7 7

Revision history for this message
Elena Stepanova (elenst) wrote :

This bug has also been filed in JIRA as MDEV-71.

Changed in maria:
status: New → Confirmed
Changed in maria:
importance: Undecided → Critical
assignee: Igor Babaev (igorb-seattle) → Sergey Petrunia (sergefp)
Revision history for this message
Sergey Petrunia (sergefp) wrote :

It's interesting that the reported query with INNER JOIN ... ON

SELECT alias1.* FROM t1 AS alias1 INNER JOIN t1 AS alias2 ON alias2.a = alias1.b WHERE alias1.b IN (SELECT a FROM t1, t2);

produces a correct result, while the query with WHERE:

SELECT alias1.* FROM t1 AS alias1, t1 AS alias2 WHERE alias1.b=alias2.a AND alias1.b IN ( SELECT a FROM t1, t2);

produces a wrong result.

EXPLAINs are the same.

Revision history for this message
Sergey Petrunia (sergefp) wrote :

.. and EXPLAIN EXTENDED are the same, too
: select j3.alias1.a AS a,j3.alias1.b AS b from j3.t1 alias1 semi join (j3.t1 join j3.t2) join j3.t1 alias2 where (j3.alias2.a = j3.alias1.b)
: select j3.alias1.a AS a,j3.alias1.b AS b from j3.t1 alias1 semi join (j3.t1 join j3.t2) join j3.t1 alias2 where (j3.alias2.a = j3.alias1.b)

Changed in maria:
status: Confirmed → Fix Committed
Changed in maria:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.