Wrong result (missing rows) with semijoin+materialization, IN subquery, InnoDB, TEMPTABLE view or derived_merge=off

Bug #951937 reported by Elena Stepanova
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
MariaDB
Fix Released
Critical
Sergey Petrunia

Bug Description

The following query

SELECT * FROM v
WHERE ( a, a ) IN (
  SELECT alias2.b, alias2.a
  FROM t1 AS alias1, t1 AS alias2
  WHERE alias2.b = alias1.a
    AND ( alias1.b >= alias1.a OR alias2.b = 'z' )

on the test data returns 6 rows if it's executed with semijoin=on and materialization=on, and 19 rows otherwise. The latter is correct.
On a variation of the test data, where column b is nullable, the query returns no rows at all. This variation is added to the test case as ALTER TABLE followed by the same query.

bzr version-info
revision-id: <email address hidden>
date: 2012-03-05 22:33:46 -0800
build-date: 2012-03-11 05:27:06 +0400
revno: 3455

Also reproducible on 5.5 (revno 3316).
Not reproducible on MySQL 5.6 (revno 3706).

EXPLAIN with semijoin=on, materialization=on (wrong result):

id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY <subquery2> ALL distinct_key NULL NULL NULL 19 100.00
1 PRIMARY <derived3> ALL NULL NULL NULL NULL 19 100.00 Using where; Using join buffer (flat, BNL join)
2 MATERIALIZED alias1 ALL a NULL NULL NULL 19 100.00 Using where
2 MATERIALIZED alias2 ref a a 4 test.alias1.a 1 100.00 Using where
3 DERIVED t1 ALL NULL NULL NULL NULL 19 100.00
Warnings:
Note 1003 select `v`.`a` AS `a`,`v`.`b` AS `b` from `test`.`v` semi join (`test`.`t1` `alias1` join `test`.`t1` `alias2`) where ((`test`.`alias2`.`a` = `test`.`alias1`.`a`) and (`test`.`alias2`.`b` = `test`.`alias1`.`a`) and (`v`.`a` = `test`.`alias1`.`a`) and ((`test`.`alias1`.`b` >= `test`.`alias1`.`a`) or ((`test`.`alias1`.`a` = 'z') and (`v`.`a` = 'z'))))

Minimal optimizer_switch: materialization=on,semijoin=on
Full optimizer_switch (default): index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on

Test case (with the variation):

--source include/have_innodb.inc
SET SESSION optimizer_switch = 'materialization=on,semijoin=on';

CREATE TABLE t1 (
  a VARCHAR(1),
  b VARCHAR(1) NOT NULL,
  KEY(a)
) ENGINE=InnoDB;
INSERT INTO t1 VALUES
('j','j'),('v','v'),('c','c'),('m','m'),('d','d'),
('y','y'),('t','t'),('d','d'),('s','s'),('r','r'),
('m','m'),('b','b'),('x','x'),('g','g'),('p','p'),
('q','q'),('w','w'),('d','d'),('e','e');

CREATE ALGORITHM=TEMPTABLE VIEW v AS SELECT * FROM t1;

# This query returns 6 rows instead of 19

SELECT * FROM v
WHERE ( a, a ) IN (
  SELECT alias2.b, alias2.a
  FROM t1 AS alias1, t1 AS alias2
  WHERE alias2.b = alias1.a
    AND ( alias1.b >= alias1.a OR alias2.b = 'z' )
);

# End of the main part.
# The rest is the test case variation, where we make column b nullable

ALTER TABLE t1 MODIFY COLUMN b VARCHAR(1);

# This query returns an empty set

SELECT * FROM v
WHERE ( a, a ) IN (
  SELECT alias2.b, alias2.a
  FROM t1 AS alias1, t1 AS alias2
  WHERE alias2.b = alias1.a
    AND ( alias1.b >= alias1.a OR alias2.b = 'z' )
);

# End of test case

Changed in maria:
assignee: Oleksandr "Sanja" Byelkin (sanja-byelkin) → Sergey Petrunia (sergefp)
importance: Undecided → Critical
Revision history for this message
Sergey Petrunia (sergefp) wrote :

The part of where condition attached to to the materialized VIEW seems to be wrong:

  #0 evaluate_join_record (join=0x9f54a28, join_tab=0x9f5cdb8, error=0) at sql_select.cc:15531
  #1 0x08372131 in sub_select (join=0x9f54a28, join_tab=0x9f5cdb8, end_of_records=false) at sql_select.cc:15436
  #2 0x0836341b in evaluate_join_record (join=0x9f54a28, join_tab=0x9f5cbb0, error=0) at sql_select.cc:15633
  #3 0x08372131 in sub_select (join=0x9f54a28, join_tab=0x9f5cbb0, end_of_records=false) at sql_select.cc:15436
  #4 0x083733b2 in do_select (join=0x9f54a28, fields=0x9edb54c, table=0x0, procedure=0x0) at sql_select.cc:15097
  #5 0x0838c2b7 in JOIN::exec (this=0x9f54a28) at sql_select.cc:2731
(gdb) p join_tab->table->alias.Ptr
  $233 = 0x9f3f158 "v"
(gdb) p dbug_print_item(select_cond)
  $234 = 0x8b9f560 "((`j2`.`t1a`.`b` >= `j2`.`t1a`.`a`) or ((`j2`.`t1a`.`a` = 'z') and (`v`.`a` = 'z')))"

Changed in maria:
status: New → Confirmed
status: Confirmed → In Progress
Revision history for this message
Sergey Petrunia (sergefp) wrote :

No, I am not sure whether it's the WHERE clause is wrong or something else is.

Revision history for this message
Sergey Petrunia (sergefp) wrote :

the wrong result is caused by the following:

this part of WHERE condition:

 "(`j2`.`t1a`.`b` >= `j2`.`t1a`.`a`)"

is attached to table v (i.e. <derived3>), and it evaluates to FALSE. This is wrong because the dataset has t1.a=t1.b for all rows.

Revision history for this message
Sergey Petrunia (sergefp) wrote :

t1a.a has the right value, t1a.b has the wrong one.

Revision history for this message
Sergey Petrunia (sergefp) wrote :

Sure, one can expect that t1a.b will have the wrong value: it is not in
subquery's select list, and we're referring to it from outside the SJM nest.

The question is why we're doing that. The clause

  (t1a.b >= t1a.a OR t1b.b = 'z')

was originally inside the subquery, and has been attached to the subquery's
table. Why do we see parts of it attached to outside table...

Revision history for this message
Sergey Petrunia (sergefp) wrote :

WHERE condition after equality substitution:

  (t1b.a = t1a.a) and
  (t1b.b = t1a.a) and
  (v.a = t1a.a) and
  (t1a.b >= t1a.a or (t1a.a = 'z' and v.a = 'z'))"

so, t1b.b = 'z' got converted into (t1a.a = 'z' and v.a = 'z') (**)

We got it, because we've had a multi-equality of:

   t1a.a=t1b.a=t1b.b=v.a

and a join order of {v, t1a, t1b }. The first element is `v`, the first one inside the sjm nest is `t1a`. Substitution (**) is generally useful, but not when we get it inside an OR clause. When we do it inside an OR clause, that has other OR-part that can be evaluated only inside the subquery, we end up with un-evaluable clause.

Revision history for this message
Sergey Petrunia (sergefp) wrote :

Before the equality substitutiuon, the WHERE clause has this form (both in maria-5.3 and mysql-5.6):

"(((t1a.b >= t1a.a) or multiple equal('z', t1b.b, t1a.a, v.a, t1b.a)) and multiple equal(t1b.b, t1a.a, v.a, t1b.a))"

the multiple-equal inside the OR clause is the same as the outer one, except that it also includes constant 'z'.

After equality substitution for the inner multiple equal('z', t1b.b, t1a.a, v.a, t1b.a) , we get

In MariaDB:
  $537 = 0x8b9f560 "((t1a.a = 'z') and (v.a = 'z'))"

In MySQL:
  $117 = 0x8f7bea0 "((t1a.a = 'z') and (t1b.b = 'z') and (t1b.a = 'z'))"

Revision history for this message
Sergey Petrunia (sergefp) wrote :

The problem is related to fix for BUG#928048.

Revision history for this message
Sergey Petrunia (sergefp) wrote :

Consider an example:

select * from ot
where (ot.a, ot.b) in (select it.a, it.b
                        from it
                        where (it.a=111 and func1(it.c)) or (it.b=222 and func2(it.c))
                        )
       and (ot.a=333 or func3(ot.c))

convert_subq_to_sj(), followed by equality propagation, will produce this WHERE
clause:

01 multi-equal(ot.a, it.a) and
02 multi-equal(ot.b, it.b) and
03 (multi-equal(inherited(ot.a, it.a), 111) and func1(it.c) or
04 multi-equal(inherited(ot.b, it.b), 222) and func2(it.c)
05 ) and
06 (multi-equal(inherited(ot.a, it.a), 333) or func3(ot.c))

Suppose the chosen join order is
   ot, sj-materialization(it)
or the other way around, but using materialization.

Now, we do equality substitution:
- in line 03, we must not generate equalities that include ot.a, because the
  other part of OR uses func1(it.c) can only be checked when doing
  materialization.

- in line 06, we must not generate qualities that include it.a, because the
  other part of uses "func3(ot.c)", which cannot be checked when doing
  materialization.

This shows the problem. Suppose we're doing equality substitution for an OR
clause and its first child element is:

  multi-equal(inherited(ot.a, it.a), 111)

is this a case like in #03, or like in #06? There is no way to tell.

Revision history for this message
Sergey Petrunia (sergefp) wrote :

Patch committed, need review.

Changed in maria:
status: In Progress → Fix Committed
Changed in maria:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.