Wrong result (extra rows) with semijoin+materialization, IN subqueries, join_cache_level>0

Bug #1000269 reported by Elena Stepanova on 2012-05-16
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
MariaDB
Fix Released
High
Sergey Petrunia

Bug Description

The following query
SELECT * FROM A, B
WHERE b1 IN ( SELECT b2 FROM B WHERE b1 > 'o' )
AND (
  b1 < 'l' OR
  a1 IN ( SELECT c1 FROM C )
)

on the test data produces 6 rows when executed with semijoin+materialization, and 3 rows otherwise. 3 rows is the correct result.

bzr version-info
revision-id: <email address hidden>
date: 2012-05-15 08:31:07 +0300
revno: 3523

Also reproducible on maria/5.5 revno 3403.
With the provided test case the problem is reproducible with MyISAM and InnoDB, but not Aria, (with Aria the plan is slightly different).

Minimal optimizer_switch: materialization=on,semijoin=on
Full optimizer_switch (default): index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on

EXPLAIN (with minimal optimizer_switch and join_cache_level=2 which is current default):

id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY A ALL NULL NULL NULL NULL 2 100.00
1 PRIMARY <subquery2> ALL distinct_key NULL NULL NULL 4 100.00
1 PRIMARY B ALL b1 NULL NULL NULL 6 83.33 Using where; Using join buffer (flat, BNL join)
2 MATERIALIZED B ALL b1 NULL NULL NULL 6 66.67 Using where
3 MATERIALIZED C ALL NULL NULL NULL NULL 2 100.00
Warnings:
Note 1003 select `test`.`A`.`a1` AS `a1`,`test`.`A`.`a2` AS `a2`,`test`.`B`.`b1` AS `b1`,`test`.`B`.`b2` AS `b2` from `test`.`A` semi join (`test`.`B`) join `test`.`B` where ((`test`.`B`.`b1` = `test`.`B`.`b2`) and ((`test`.`B`.`b2` < 'l') or <in_optimizer>(`test`.`A`.`a1`,`test`.`A`.`a1` in ( <materialize> (select `test`.`C`.`c1` from `test`.`C` ), <primary_index_lookup>(`test`.`A`.`a1` in <temporary table> on distinct_key where ((`test`.`A`.`a1` = `<subquery3>`.`c1`)))))) and (`test`.`B`.`b1` > 'o'))

# Test case

SET optimizer_switch = 'materialization=on,semijoin=on';

CREATE TABLE A (a1 VARCHAR(1), a2 VARCHAR(1))
  ENGINE=MyISAM;
INSERT INTO A VALUES ('b','b'),('e','e');

CREATE TABLE B (b1 VARCHAR(1), b2 VARCHAR(1), KEY(b1))
  ENGINE=MyISAM;
INSERT INTO B VALUES
  ('v','v'),('s','s'),('l','l'),
  ('y','y'),('c','c'),('i','i');

CREATE TABLE C (c1 VARCHAR(1)) ENGINE=MyISAM;
INSERT INTO C VALUES ('b'),('c');

SELECT * FROM A, B
WHERE b1 IN ( SELECT b2 FROM B WHERE b1 > 'o' )
AND (
  b1 < 'l' OR
  a1 IN ( SELECT c1 FROM C )
);

# End of test case

# Expected result:
# a1 a2 b1 b2
# ----------------------
# b b v v
# b b s s
# b b y y

# Actual result:
# a1 a2 b1 b2
# ----------------------
# b b v v
# e e v v
# b b s s
# e e s s
# b b y y
# e e y y

Sergey Petrunia (sergefp) wrote :

The second subquery is redundant, one can replace it with list of constants

SELECT * FROM A, B WHERE b1 IN ( SELECT b2 FROM B WHERE b1 > 'o' ) AND ( b1 < 'l' OR a1 IN ('b','c') );

and the bug is still visible.

Changed in maria:
status: New → Confirmed
Changed in maria:
status: Confirmed → In Progress
Changed in maria:
status: In Progress → Fix Committed
Elena Stepanova (elenst) wrote :

Fix released in 5.5.24 and will be in 5.3.8 when it is out

Changed in maria:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers