Wrong result (extra rows) with semijoin+materialization, IN subqueries, join_cache_level>0

Bug #1000269 reported by Elena Stepanova
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
MariaDB
Fix Released
High
Sergey Petrunia

Bug Description

The following query
SELECT * FROM A, B
WHERE b1 IN ( SELECT b2 FROM B WHERE b1 > 'o' )
AND (
  b1 < 'l' OR
  a1 IN ( SELECT c1 FROM C )
)

on the test data produces 6 rows when executed with semijoin+materialization, and 3 rows otherwise. 3 rows is the correct result.

bzr version-info
revision-id: <email address hidden>
date: 2012-05-15 08:31:07 +0300
revno: 3523

Also reproducible on maria/5.5 revno 3403.
With the provided test case the problem is reproducible with MyISAM and InnoDB, but not Aria, (with Aria the plan is slightly different).

Minimal optimizer_switch: materialization=on,semijoin=on
Full optimizer_switch (default): index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on

EXPLAIN (with minimal optimizer_switch and join_cache_level=2 which is current default):

id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY A ALL NULL NULL NULL NULL 2 100.00
1 PRIMARY <subquery2> ALL distinct_key NULL NULL NULL 4 100.00
1 PRIMARY B ALL b1 NULL NULL NULL 6 83.33 Using where; Using join buffer (flat, BNL join)
2 MATERIALIZED B ALL b1 NULL NULL NULL 6 66.67 Using where
3 MATERIALIZED C ALL NULL NULL NULL NULL 2 100.00
Warnings:
Note 1003 select `test`.`A`.`a1` AS `a1`,`test`.`A`.`a2` AS `a2`,`test`.`B`.`b1` AS `b1`,`test`.`B`.`b2` AS `b2` from `test`.`A` semi join (`test`.`B`) join `test`.`B` where ((`test`.`B`.`b1` = `test`.`B`.`b2`) and ((`test`.`B`.`b2` < 'l') or <in_optimizer>(`test`.`A`.`a1`,`test`.`A`.`a1` in ( <materialize> (select `test`.`C`.`c1` from `test`.`C` ), <primary_index_lookup>(`test`.`A`.`a1` in <temporary table> on distinct_key where ((`test`.`A`.`a1` = `<subquery3>`.`c1`)))))) and (`test`.`B`.`b1` > 'o'))

# Test case

SET optimizer_switch = 'materialization=on,semijoin=on';

CREATE TABLE A (a1 VARCHAR(1), a2 VARCHAR(1))
  ENGINE=MyISAM;
INSERT INTO A VALUES ('b','b'),('e','e');

CREATE TABLE B (b1 VARCHAR(1), b2 VARCHAR(1), KEY(b1))
  ENGINE=MyISAM;
INSERT INTO B VALUES
  ('v','v'),('s','s'),('l','l'),
  ('y','y'),('c','c'),('i','i');

CREATE TABLE C (c1 VARCHAR(1)) ENGINE=MyISAM;
INSERT INTO C VALUES ('b'),('c');

SELECT * FROM A, B
WHERE b1 IN ( SELECT b2 FROM B WHERE b1 > 'o' )
AND (
  b1 < 'l' OR
  a1 IN ( SELECT c1 FROM C )
);

# End of test case

# Expected result:
# a1 a2 b1 b2
# ----------------------
# b b v v
# b b s s
# b b y y

# Actual result:
# a1 a2 b1 b2
# ----------------------
# b b v v
# e e v v
# b b s s
# e e s s
# b b y y
# e e y y

Revision history for this message
Sergey Petrunia (sergefp) wrote :

The second subquery is redundant, one can replace it with list of constants

SELECT * FROM A, B WHERE b1 IN ( SELECT b2 FROM B WHERE b1 > 'o' ) AND ( b1 < 'l' OR a1 IN ('b','c') );

and the bug is still visible.

Changed in maria:
status: New → Confirmed
Changed in maria:
status: Confirmed → In Progress
Changed in maria:
status: In Progress → Fix Committed
Revision history for this message
Elena Stepanova (elenst) wrote :

Fix released in 5.5.24 and will be in 5.3.8 when it is out

Changed in maria:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.