Wrong result (missing rows) with materialization+semijoin+join_cache_hashed, join_cache_level>2, HAVING

Bug #934348 reported by Elena Stepanova
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
MariaDB
Fix Released
Critical
Igor Babaev

Bug Description

The following query

SELECT a FROM t1
WHERE a IN (
  SELECT b FROM t1, t2
  WHERE b = a
)
GROUP BY a
HAVING a != 'z'

on the test data returns only one row if it's run with materialization, semijoin, join_cache_hashed and join_cache_level>2, and two rows otherwise. The latter is correct.

bzr version-info
revision-id: <email address hidden>
date: 2012-02-16 20:15:57 +0400
build-date: 2012-02-17 21:19:42 +0400
revno: 3424

EXPLAIN with join_cache_level = 3 (wrong result):

id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY <subquery2> ALL distinct_key NULL NULL NULL 2 100.00 Using temporary; Using filesort
1 PRIMARY t1 hash_index a #hash#a:a 4:4 test.t1.a 3 66.67 Using where; Using join buffer (flat, BNLH join)
2 MATERIALIZED t2 ALL NULL NULL NULL NULL 2 100.00 Using where
2 MATERIALIZED t1 hash_index a #hash#a:a 4:4 test.t2.b 3 66.67 Using where; Using join buffer (flat, BNLH join)
Warnings:
Note 1003 select `test`.`t1`.`a` AS `a` from `test`.`t1` semi join (`test`.`t1` join `test`.`t2`) where ((`test`.`t1`.`a` = `test`.`t2`.`b`) and (`test`.`t1`.`a` = `test`.`t2`.`b`)) group by `test`.`t1`.`a` having (`test`.`t1`.`a` <> 'z')

EXPLAIN with join_cache_level =2 (correct result):

id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY <subquery2> ALL distinct_key NULL NULL NULL 2 100.00 Using temporary; Using filesort
1 PRIMARY t1 ref a a 4 test.t2.b 2 100.00 Using index
2 MATERIALIZED t2 ALL NULL NULL NULL NULL 2 100.00 Using where
2 MATERIALIZED t1 ref a a 4 test.t2.b 2 100.00 Using index
Warnings:
Note 1003 select `test`.`t1`.`a` AS `a` from `test`.`t1` semi join (`test`.`t1` join `test`.`t2`) where ((`test`.`t1`.`a` = `test`.`t2`.`b`) and (`test`.`t1`.`a` = `test`.`t2`.`b`)) group by `test`.`t1`.`a` having (`test`.`t1`.`a` <> 'z')

Minimal optimizer_switch: materialization=on,semijoin=on,join_cache_hashed=on
(and join_cache_level>=3)
Full optimizer_switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on

# Test case:

SET optimizer_switch = 'materialization=on,semijoin=on,join_cache_hashed=on';
# join_cache_level >= 3
SET join_cache_level = 3;

# MyISAM or Aria
CREATE TABLE t1 ( a VARCHAR(1), KEY(a) ) ENGINE=MyISAM;
INSERT INTO t1 VALUES ('c'),('v'),('c');
CREATE TABLE t2 ( b VARCHAR(1) ) ENGINE=MyISAM;
INSERT INTO t2 VALUES ('v'),('c');

SELECT a FROM t1
WHERE a IN (
  SELECT b FROM t1, t2
  WHERE b = a
)
GROUP BY a
HAVING a != 'z';

# End of test case

# Expected result:
# a
# ---
# c
# v

# Actual result:
# a
# ---
# c

# There were 6 other similar cases during the test run.

Changed in maria:
status: New → Confirmed
importance: Undecided → Critical
Revision history for this message
Igor Babaev (igorb-seattle) wrote :

The problem is seen already in the execution plan:

MariaDB [test]> EXPLAIN SELECT a FROM t1 t WHERE t.a IN ( SELECT b FROM t1, t2 WHERE b = a ) GROUP BY t.a HAVING t.a != 'z';
+----+--------------+-------------+------+---------------+------+---------+-----------+------+----------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------+-------------+------+---------------+------+---------+-----------+------+----------+---------------------------------+
| 1 | PRIMARY | <subquery2> | ALL | distinct_key | NULL | NULL | NULL | 2 | 100.00 | Using temporary; Using filesort |
| 1 | PRIMARY | t | ref | a | a | 4 | test.t1.a | 2 | 100.00 | Using where; Using index |
| 2 | MATERIALIZED | t2 | ALL | NULL | NULL | NULL | NULL | 2 | 100.00 | Using where |
| 2 | MATERIALIZED | t1 | ref | a | a | 4 | test.t2.b | 2 | 100.00 | Using where; Using index |
+----+--------------+-------------+------+---------------+------+---------+-----------+------+----------+-------------------

The key to access t is built oveer column a of the materialized table. Yet there is no such column there.

Revision history for this message
Igor Babaev (igorb-seattle) wrote :

The previous plan was obtained with the following settings:

SET optimizer_switch = 'materialization=on,semijoin=on,join_cache_hashed=on';
SET join_cache_level = 6;

Changed in maria:
assignee: Sergey Petrunia (sergefp) → Igor Babaev (igorb-seattle)
Changed in maria:
status: Confirmed → Fix Committed
Revision history for this message
Elena Stepanova (elenst) wrote :

Fix released with 5.3.5-ga.

Changed in maria:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.