Wrong result (missing rows) with semijoin+firstmatch, IN/ANY subquery

Bug #951283 reported by Elena Stepanova
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
MariaDB
Fix Released
Critical
Sergey Petrunia

Bug Description

The following query

SELECT * FROM t1 AS alias1, t1 AS alias2, t2 AS alias3
WHERE alias3.d IN (
  SELECT alias4.c FROM t2 AS alias4, t2 AS alias5
  WHERE alias5.b = alias4.b
    AND ( alias5.b >= alias3.b OR alias5.c != alias3.c )
)

on test data returns 2940 rows if it is executed with semijoin=on, firstmatch=on, optimizer_prune_level=0, and 3724 rows otherwise. The latter is correct.
In the test case SELECT * is replaced with SELECT COUNT(*) for convenience, it does not change the result.

bzr version-info
revision-id: <email address hidden>
date: 2012-03-05 22:33:46 -0800
build-date: 2012-03-10 05:45:32 +0400
revno: 3455

Also reproducible on MariaDB 5.5 (revno 3316). Not reproducible on MySQL 5.6 (revno 3706).
Could not reproduce on MyISAM or Aria.

EXPLAIN with semijoin=on, firstmatch=on, optimizer_prune_level=0 (wrong result):

id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY alias3 ALL PRIMARY NULL NULL NULL 19 100.00 Using where
1 PRIMARY alias4 ref PRIMARY,c c 4 test.alias3.d 1 100.00 Using index; FirstMatch(alias3)
1 PRIMARY alias1 ALL NULL NULL NULL NULL 14 100.00
1 PRIMARY alias2 ALL NULL NULL NULL NULL 14 100.00
1 PRIMARY alias5 eq_ref PRIMARY PRIMARY 4 test.alias4.b 1 100.00 Using where; FirstMatch(alias2)
Warnings:
Note 1276 Field or reference 'test.alias3.b' of SELECT #2 was resolved in SELECT #1
Note 1276 Field or reference 'test.alias3.c' of SELECT #2 was resolved in SELECT #1
Note 1003 select count(0) AS `COUNT(*)` from `test`.`t1` `alias1` semi join (`test`.`t2` `alias4` join `test`.`t2` `alias5`) join `test`.`t1` `alias2` join `test`.`t2` `alias3` where ((`test`.`alias4`.`c` = `test`.`alias3`.`d`) and (`test`.`alias5`.`b` = `test`.`alias4`.`b`) and ((`test`.`alias4`.`b` >= `test`.`alias3`.`b`) or (`test`.`alias5`.`c` <> `test`.`alias3`.`c`)))

EXPLAIN with semijoin=on, firstmatch=on, optimizer_prune_level=1 (correct result):

id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY alias1 ALL NULL NULL NULL NULL 14 100.00
1 PRIMARY alias2 ALL NULL NULL NULL NULL 14 100.00 Using join buffer (flat, BNL join)
1 PRIMARY alias3 ALL PRIMARY NULL NULL NULL 19 100.00 Using where; Using join buffer (incremental, BNL join)
1 PRIMARY alias4 ref PRIMARY,c c 4 test.alias3.d 1 100.00 Using index
1 PRIMARY alias5 eq_ref PRIMARY PRIMARY 4 test.alias4.b 1 100.00 Using where; FirstMatch(alias3)
Warnings:
Note 1276 Field or reference 'test.alias3.b' of SELECT #2 was resolved in SELECT #1
Note 1276 Field or reference 'test.alias3.c' of SELECT #2 was resolved in SELECT #1
Note 1003 select count(0) AS `COUNT(*)` from `test`.`t1` `alias1` semi join (`test`.`t2` `alias4` join `test`.`t2` `alias5`) join `test`.`t1` `alias2` join `test`.`t2` `alias3` where ((`test`.`alias4`.`c` = `test`.`alias3`.`d`) and (`test`.`alias5`.`b` = `test`.`alias4`.`b`) and ((`test`.`alias4`.`b` >= `test`.`alias3`.`b`) or (`test`.`alias5`.`c` <> `test`.`alias3`.`c`)))

Minimal optimizer_switch: firstmatch=on,semijoin=on
Full optimizer_switch (default): index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on

Test case:

--source include/have_innodb.inc

SET optimizer_switch='firstmatch=on,semijoin=on';
SET optimizer_prune_level=0;

CREATE TABLE t1 ( a INT ) ENGINE=InnoDB;
INSERT INTO t1 VALUES
  (10),(11),(12),(13),(14),(15),(16),
  (17),(18),(19),(20),(21),(22),(23);

CREATE TABLE t2 (
  b INT PRIMARY KEY,
  c VARCHAR(1),
  d VARCHAR(1),
  KEY(c)
) ENGINE=InnoDB;

INSERT INTO t2 VALUES
  (1,'j','j'),(2,'v','v'),(3,'c','c'),(4,'m','m'),
  (5,'d','d'),(6,'d','d'),(7,'y','y'),(8,'t','t'),
  (9,'d','d'),(10,'s','s'),(11,'r','r'),(12,'m','m'),
  (13,'b','b'),(14,'x','x'),(15,'g','g'),(16,'p','p'),
  (17,'q','q'),(18,'w','w'),(19,'d','d');

SELECT COUNT(*) FROM t1 AS alias1, t1 AS alias2, t2 AS alias3
WHERE alias3.d IN (
  SELECT alias4.c FROM t2 AS alias4, t2 AS alias5
  WHERE alias5.b = alias4.b
    AND ( alias5.b >= alias3.b OR alias5.c != alias3.c )
);

# End of test case

# Expected result
# COUNT(*)
# 3724

# Actual result
# COUNT(*)
# 2940

Revision history for this message
Elena Stepanova (elenst) wrote :

While the provided test requires optimizer_prune_level=0 and InnoDB tables, later I had similar mismatches with MyISAM tables and default optimizer_prune_level, so I am removing these conditions from the bug summary.

summary: - Wrong result (missing rows) with semijoin+firstmatch,
- optimizer_prune_level=0, IN/ANY subquery, InnoDB
+ Wrong result (missing rows) with semijoin+firstmatch, IN/ANY subquery
Changed in maria:
assignee: Alexey Botchkov (holyfoot) → Sergey Petrunia (sergefp)
importance: Undecided → Critical
Changed in maria:
status: New → Confirmed
status: Confirmed → In Progress
Revision history for this message
Sergey Petrunia (sergefp) wrote :

The query plan seems to be valid.

I've compared the query output resultsets:

- semi-join's resultset is a proper subset of non-semijoin's resultset

- whether a row is missing is a function of value of alias3.b: rows with values of 6,9,12,19 are missing. All others are present.

Revision history for this message
Sergey Petrunia (sergefp) wrote :

The join order is:

O alias3
I alias4 FirstMatch(alias3)
n alias1
n alias2
I alias5 FirstMatch(alias2)

Here "O" means outer, "I" means inner, "n" - outer, not correlated with the subquery table.

When I debug to see why records with alias3.b=6 do not make it to the output, I see this scenario:
- the code arrives at sub_select(...,alias4, .. ) call
- it sets join->return_tab={alias3} // this what FirstMatch is about.
- then it continues execution. The code builds full record combinations, but neither of them matches the condition attached to table alias5.
- at some point, execution will return into sub_select(...,alias4, .. ) function.
At this point:

- It should search for next matching record, because no match have been found yet.

- It does not do that, though. Instead, it jumps out here:

      if (join->return_tab < join_tab)
        DBUG_RETURN(NESTED_LOOP_OK);

Apparently, FirstMatch code is malfunctioning when it is handling the case where inner tables are interleaved with outer tables.

Revision history for this message
Sergey Petrunia (sergefp) wrote :

Alternate proposal: unfinished patch and exploration

Changed in maria:
status: In Progress → Fix Committed
Changed in maria:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.