Wrong result (missing rows) with semijoin+firstmatch, IN/ANY subquery

Bug #951283 reported by Elena Stepanova on 2012-03-10
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
MariaDB
Fix Released
Critical
Sergey Petrunia

Bug Description

The following query

SELECT * FROM t1 AS alias1, t1 AS alias2, t2 AS alias3
WHERE alias3.d IN (
  SELECT alias4.c FROM t2 AS alias4, t2 AS alias5
  WHERE alias5.b = alias4.b
    AND ( alias5.b >= alias3.b OR alias5.c != alias3.c )
)

on test data returns 2940 rows if it is executed with semijoin=on, firstmatch=on, optimizer_prune_level=0, and 3724 rows otherwise. The latter is correct.
In the test case SELECT * is replaced with SELECT COUNT(*) for convenience, it does not change the result.

bzr version-info
revision-id: <email address hidden>
date: 2012-03-05 22:33:46 -0800
build-date: 2012-03-10 05:45:32 +0400
revno: 3455

Also reproducible on MariaDB 5.5 (revno 3316). Not reproducible on MySQL 5.6 (revno 3706).
Could not reproduce on MyISAM or Aria.

EXPLAIN with semijoin=on, firstmatch=on, optimizer_prune_level=0 (wrong result):

id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY alias3 ALL PRIMARY NULL NULL NULL 19 100.00 Using where
1 PRIMARY alias4 ref PRIMARY,c c 4 test.alias3.d 1 100.00 Using index; FirstMatch(alias3)
1 PRIMARY alias1 ALL NULL NULL NULL NULL 14 100.00
1 PRIMARY alias2 ALL NULL NULL NULL NULL 14 100.00
1 PRIMARY alias5 eq_ref PRIMARY PRIMARY 4 test.alias4.b 1 100.00 Using where; FirstMatch(alias2)
Warnings:
Note 1276 Field or reference 'test.alias3.b' of SELECT #2 was resolved in SELECT #1
Note 1276 Field or reference 'test.alias3.c' of SELECT #2 was resolved in SELECT #1
Note 1003 select count(0) AS `COUNT(*)` from `test`.`t1` `alias1` semi join (`test`.`t2` `alias4` join `test`.`t2` `alias5`) join `test`.`t1` `alias2` join `test`.`t2` `alias3` where ((`test`.`alias4`.`c` = `test`.`alias3`.`d`) and (`test`.`alias5`.`b` = `test`.`alias4`.`b`) and ((`test`.`alias4`.`b` >= `test`.`alias3`.`b`) or (`test`.`alias5`.`c` <> `test`.`alias3`.`c`)))

EXPLAIN with semijoin=on, firstmatch=on, optimizer_prune_level=1 (correct result):

id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY alias1 ALL NULL NULL NULL NULL 14 100.00
1 PRIMARY alias2 ALL NULL NULL NULL NULL 14 100.00 Using join buffer (flat, BNL join)
1 PRIMARY alias3 ALL PRIMARY NULL NULL NULL 19 100.00 Using where; Using join buffer (incremental, BNL join)
1 PRIMARY alias4 ref PRIMARY,c c 4 test.alias3.d 1 100.00 Using index
1 PRIMARY alias5 eq_ref PRIMARY PRIMARY 4 test.alias4.b 1 100.00 Using where; FirstMatch(alias3)
Warnings:
Note 1276 Field or reference 'test.alias3.b' of SELECT #2 was resolved in SELECT #1
Note 1276 Field or reference 'test.alias3.c' of SELECT #2 was resolved in SELECT #1
Note 1003 select count(0) AS `COUNT(*)` from `test`.`t1` `alias1` semi join (`test`.`t2` `alias4` join `test`.`t2` `alias5`) join `test`.`t1` `alias2` join `test`.`t2` `alias3` where ((`test`.`alias4`.`c` = `test`.`alias3`.`d`) and (`test`.`alias5`.`b` = `test`.`alias4`.`b`) and ((`test`.`alias4`.`b` >= `test`.`alias3`.`b`) or (`test`.`alias5`.`c` <> `test`.`alias3`.`c`)))

Minimal optimizer_switch: firstmatch=on,semijoin=on
Full optimizer_switch (default): index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on

Test case:

--source include/have_innodb.inc

SET optimizer_switch='firstmatch=on,semijoin=on';
SET optimizer_prune_level=0;

CREATE TABLE t1 ( a INT ) ENGINE=InnoDB;
INSERT INTO t1 VALUES
  (10),(11),(12),(13),(14),(15),(16),
  (17),(18),(19),(20),(21),(22),(23);

CREATE TABLE t2 (
  b INT PRIMARY KEY,
  c VARCHAR(1),
  d VARCHAR(1),
  KEY(c)
) ENGINE=InnoDB;

INSERT INTO t2 VALUES
  (1,'j','j'),(2,'v','v'),(3,'c','c'),(4,'m','m'),
  (5,'d','d'),(6,'d','d'),(7,'y','y'),(8,'t','t'),
  (9,'d','d'),(10,'s','s'),(11,'r','r'),(12,'m','m'),
  (13,'b','b'),(14,'x','x'),(15,'g','g'),(16,'p','p'),
  (17,'q','q'),(18,'w','w'),(19,'d','d');

SELECT COUNT(*) FROM t1 AS alias1, t1 AS alias2, t2 AS alias3
WHERE alias3.d IN (
  SELECT alias4.c FROM t2 AS alias4, t2 AS alias5
  WHERE alias5.b = alias4.b
    AND ( alias5.b >= alias3.b OR alias5.c != alias3.c )
);

# End of test case

# Expected result
# COUNT(*)
# 3724

# Actual result
# COUNT(*)
# 2940

Elena Stepanova (elenst) wrote :

While the provided test requires optimizer_prune_level=0 and InnoDB tables, later I had similar mismatches with MyISAM tables and default optimizer_prune_level, so I am removing these conditions from the bug summary.

summary: - Wrong result (missing rows) with semijoin+firstmatch,
- optimizer_prune_level=0, IN/ANY subquery, InnoDB
+ Wrong result (missing rows) with semijoin+firstmatch, IN/ANY subquery
Changed in maria:
assignee: Alexey Botchkov (holyfoot) → Sergey Petrunia (sergefp)
importance: Undecided → Critical
Changed in maria:
status: New → Confirmed
status: Confirmed → In Progress
Sergey Petrunia (sergefp) wrote :

The query plan seems to be valid.

I've compared the query output resultsets:

- semi-join's resultset is a proper subset of non-semijoin's resultset

- whether a row is missing is a function of value of alias3.b: rows with values of 6,9,12,19 are missing. All others are present.

Sergey Petrunia (sergefp) wrote :

The join order is:

O alias3
I alias4 FirstMatch(alias3)
n alias1
n alias2
I alias5 FirstMatch(alias2)

Here "O" means outer, "I" means inner, "n" - outer, not correlated with the subquery table.

When I debug to see why records with alias3.b=6 do not make it to the output, I see this scenario:
- the code arrives at sub_select(...,alias4, .. ) call
- it sets join->return_tab={alias3} // this what FirstMatch is about.
- then it continues execution. The code builds full record combinations, but neither of them matches the condition attached to table alias5.
- at some point, execution will return into sub_select(...,alias4, .. ) function.
At this point:

- It should search for next matching record, because no match have been found yet.

- It does not do that, though. Instead, it jumps out here:

      if (join->return_tab < join_tab)
        DBUG_RETURN(NESTED_LOOP_OK);

Apparently, FirstMatch code is malfunctioning when it is handling the case where inner tables are interleaved with outer tables.

Sergey Petrunia (sergefp) wrote :

Alternate proposal: unfinished patch and exploration

Changed in maria:
status: In Progress → Fix Committed
Changed in maria:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers

Bug attachments