Wrong result with semijoin + "Impossible where"

Bug #849776 reported by Philip Stoev
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
MariaDB
Fix Released
High
Sergey Petrunia

Bug Description

The following query:

select * from t5 where (a) in (
SELECT t1.a FROM t1
LEFT JOIN t2 ON ( t1.a = t2.a ) WHERE t2.b NOT IN ( SELECT t4.b FROM t4 WHERE t4.b < t1.b ));

returns no rows when executed with semijoin, even though the correct result must be "8", since table t5 contains the result from the subquery.

mysql 5.5 also wrongly returns an empty result, though the plan there is different. Postgresql confirms that the correct result is "8".

explain:

| 1 | PRIMARY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Impossible WHERE noticed after reading const tables |
| 3 | DEPENDENT SUBQUERY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Impossible WHERE noticed after reading const tables |

minimal optimizer_switch: semijoin=on
full optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,index_condition_pushdown=off,derived_merge=off,derived_with_keys=off,firstmatch=off,loosescan=off,materialization=off,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=off,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=off,semijoin_with_cache=off,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on

bzr version-info:

revision-id: <email address hidden>
date: 2011-09-10 18:01:27 +0300
build-date: 2011-09-14 11:28:19 +0300
revno: 3183
branch-nick: maria-5.3

test case:

CREATE TABLE t1 ( b varchar(1), a integer) ;
INSERT INTO t1 VALUES ('z',8);

CREATE TABLE t2 ( a integer, b varchar(1)) ;

CREATE TABLE t4 ( a integer, b varchar(1)) ;

CREATE TABLE t5 ( a integer) ;
INSERT INTO t5 VALUES (8);

set session optimizer_switch='semijoin=on';
select * from t5 where (a) in (
SELECT t1.a FROM t1
LEFT JOIN t2 ON ( t1.a = t2.a ) WHERE t2.b NOT IN ( SELECT t4.b FROM t4 WHERE t4.b < t1.b ));

Changed in maria:
milestone: none → 5.3
assignee: nobody → Sergey Petrunia (sergefp)
Changed in maria:
status: New → Confirmed
importance: Undecided → High
Revision history for this message
Sergey Petrunia (sergefp) wrote :

The wrong result is produced because outer join is converted into inner, which is wrong. The rewritten subquery looks like this:

Message: select 8 AS `a` from `bug849776`.`t5` semi join (`bug849776`.`t1` join `bug849776`.`t2`) where 0

the outer-to-inner conversion is made by simplify_joins(). I don't yet understand why this happens.

Revision history for this message
Sergey Petrunia (sergefp) wrote :

(gdb) wher 4
  #0 simplify_joins (join=0xa679d00, join_list=0xa65f188, conds=0xa679000, top=true, in_sj=true) at sql_select.cc:11892
  #1 0x0835d5e8 in simplify_joins (join=0xa679d00, join_list=0xa678ce8, conds=0xa679000, top=true, in_sj=true) at sql_select.cc:11857
  #2 0x0835d5e8 in simplify_joins (join=0xa679d00, join_list=0xa5fdd2c, conds=0xa679000, top=true, in_sj=false) at sql_select.cc:11857
  #3 0x08371073 in JOIN::optimize (this=0xa679d00) at sql_select.cc:960
  (More stack frames follow...)

(gdb) p (Item*)conds
  $63 = (Item_cond_and *) 0xa679000
(gdb) p dbug_print_item(((Item*)conds)->list.nth_element(0))
  $64 = 0x8b78000 "1"
(gdb) p dbug_print_item(((Item*)conds)->list.nth_element(1))
  $65 = 0x8b78000 "(not(<in_optimizer>(`bug849776`.`t2`.`b`,<exists>(select `bug849776`.`t4`.`b` from `bug849776`.`t4` where (`bug849776`.`t4`.`b` < `bug849776`.`t1`.`b`)))))"
(gdb) p dbug_print_item(((Item*)conds)->list.nth_element(2))
  $66 = 0x8b78000 "(`bug849776`.`t5`.`a` = `bug849776`.`t1`.`a`)"
(gdb) p dbug_print_item(((Item*)conds)->list.nth_element(3))
  $67 = 0x88d9fc4 "(Item*)NULL"
(gdb) p/t conds->used_tables()
  $68 = 111

That is, the condition conds has form:

    1 AND t2.b NOT IN ( SELECT t4.b FROM t4 WHERE t4.b < t1.b ) AND t5.a=t1.a

not_null_tables() attribute seems to be wrong:

(gdb) p/t conds->not_null_tables()
  $69 = 111
## All three tables are NOT-NULL, this is what causes the left join be
## converted into inner join

(gdb) p ((Item*)conds)->list.nth_element(0)->not_null_tables()
  $71 = 0
(gdb) p ((Item*)conds)->list.nth_element(1)->not_null_tables()
  $72 = 0
(gdb) p ((Item*)conds)->list.nth_element(2)->not_null_tables()
  $73 = 3

# attributes of branches are correct. The problem is with Item_cond_and's
# attributes.

Revision history for this message
Sergey Petrunia (sergefp) wrote :

This is where the problem lies:

(gdb) wher 1
  #0 Item_func::fix_after_pullout (this=0xa678838, new_parent=0xa5fdc68, ref=0xa65fc90) at item_func.cc:262
  (More stack frames follow...)
(gdb) p this
  $188 = (Item_in_optimizer *) 0xa678838
(gdb) list
  257 (*arg)->fix_after_pullout(new_parent, arg);
  258 Item *item= *arg;
  259
  260 used_tables_cache|= item->used_tables();
  261 not_null_tables_cache|= item->not_null_tables();
  262 const_item_cache&= item->const_item();
  263 }
  264 }
  265 }
  266
(gdb) p *arg
  $189 = (Item_field *) 0xa65f200

Item_in_optimizer inherits fix_after_pullout() from Item_func::fix_after_pullout. The latter function assumes that NOT-null attributes of its arguments should be spread to itself, i.e.

  argument->not_null_tables() & BIT ==> this->not_null_tables() & BIT

which is not true, particularly for IN that's inside NOT.

Changed in maria:
status: Confirmed → Fix Committed
Changed in maria:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.