Wrong result (extra rows, not grouped) with JOIN, AND in ON condition, multi-part key, GROUP BY, OR in WHERE, MyISAM or Aria

Bug #967242 reported by Elena Stepanova
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
MariaDB
Fix Released
Medium
Oleksandr "Sanja" Byelkin

Bug Description

The following query

SELECT t2_1.b
FROM t1 JOIN t2 AS t2_1 JOIN t2 AS t2_2
ON (t2_2.b = t2_1.b ) AND (t2_2.c = t2_1.c )
WHERE
( SELECT COUNT(*) FROM t2 ) IS NOT NULL
OR a = t2_1.c
GROUP BY t2_1.b;

on the test data returns multiple rows for some values of t2_1.b, which should not be happening because of GROUP BY.

bzr version-info
revision-id: <email address hidden>
date: 2012-03-28 13:58:14 +0300
build-date: 2012-03-28 19:36:15 +0400
revno: 3481

Notes:
Could not move ON condition under WHERE, the problem goes away.
In the test data t1 contains one row, could not add more -- the problem goes away.
Reproducible with the provided test case on MyISAM or Aria tables, but not InnoDB.
Reproducible with standard optimizer_switch as well as with all OFFs (except for in_to_exists or materialization, as one of them has to be ON to get the query executed).
Reproducible on MariaDB 5.3.5, MariaDB 5.5 revno 3353.
Not reproducible on MariaDB 5.2 revno 3126.
Not reproducible on MySQL 5.1.60, MySQL 5.5 revno 3737, MySQL trunk revno 3706.

EXPLAIN with minimal optimizer switch -- all OFFs except for in_to_exists (wrong result):

id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY t1 system NULL NULL NULL NULL 1 100.00
1 PRIMARY t2_1 index c c 9 NULL 3 100.00 Using where; Using index
1 PRIMARY t2_2 ref c c 9 test.t2_1.c,test.t2_1.b 2 100.00Using index
2 SUBQUERY NULL NULL NULL NULL NULL NULL NULL NULL Select tables optimized away
Warnings:
Note 1003 select `test`.`t2_1`.`b` AS `b` from `test`.`t1` join `test`.`t2` `t2_1` join `test`.`t2` `t2_2` where ((`test`.`t2_2`.`c` = `test`.`t2_1`.`c`) and (`test`.`t2_2`.`b` = `test`.`t2_1`.`b`) and (((select count(0) from `test`.`t2`) is not null) or (`test`.`t2_1`.`c` = 'x'))) group by `test`.`t2_1`.`b`

Minimal optimizer_switch: in_to_exists=on
Full optimizer_switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on

Test case:

SET optimizer_switch = 'in_to_exists=on';

CREATE TABLE t1 ( a VARCHAR(1) ) ENGINE=MyISAM;
INSERT INTO t1 VALUES ('x');

CREATE TABLE t2 ( b INT, c VARCHAR(1), KEY (c, b) ) ENGINE=MyISAM;
INSERT INTO t2 VALUES
(4, 'd'),(8, 'g'),(3, 'x'),(3, 'f'),
(0, 'p'),(3, 'j'),(8, 'c');

SELECT t2_1.b
FROM t1 JOIN t2 AS t2_1 JOIN t2 AS t2_2
    ON (t2_2.b = t2_1.b ) AND (t2_2.c = t2_1.c )
WHERE
  ( SELECT COUNT(*) FROM t2 ) IS NOT NULL
  OR a = t2_1.c
GROUP BY t2_1.b;

# End of test case

# Expected result:
# b
# --
# 0
# 3
# 4
# 8

# Actual result:
# b
# --
# 8
# 4
# 3
# 8
# 3
# 0
# 3

Related branches

Revision history for this message
Elena Stepanova (elenst) wrote :

Setting importance to medium for now, due to t1 being 1-row table, looks like an edge case. Please adjust if further analysis shows otherwise.

Changed in maria:
assignee: nobody → Oleksandr "Sanja" Byelkin (sanja-byelkin)
importance: Undecided → Medium
Elena Stepanova (elenst)
tags: added: optimizer
Elena Stepanova (elenst)
tags: added: wrong-result
Changed in maria:
status: New → In Progress
Revision history for this message
Oleksandr "Sanja" Byelkin (sanja-byelkin) wrote :

Subquery is not important:

CREATE TABLE t1 ( a VARCHAR(1) ) ENGINE=MyISAM;
INSERT INTO t1 VALUES ('x');
CREATE TABLE t2 ( b INT, c VARCHAR(1), KEY (c, b) ) ENGINE=MyISAM;
INSERT INTO t2 VALUES
(4, 'd'),(8, 'g'),(3, 'x'),(3, 'f'),
(0, 'p'),(3, 'j'),(8, 'c');
SELECT t2_1.b as zzz
FROM t1 JOIN t2 AS t2_1 JOIN t2 AS t2_2
ON (t2_2.b = t2_1.b ) AND (t2_2.c = t2_1.c )
WHERE
rand() + 1 > 0 OR
a = t2_1.c
GROUP BY zzz;
zzz
8
4
3
8
3
0
3
drop table t1, t2;

summary: Wrong result (extra rows, not grouped) with JOIN, AND in ON condition,
- multi-part key, GROUP BY, subquery and OR in WHERE, MyISAM or Aria
+ multi-part key, GROUP BY, OR in WHERE, MyISAM or Aria
Revision history for this message
Oleksandr "Sanja" Byelkin (sanja-byelkin) wrote :

It is repeatable on 5.1

Changed in maria:
milestone: 5.3 → 5.1
Revision history for this message
Oleksandr "Sanja" Byelkin (sanja-byelkin) wrote :

fixed in maria 5.5

Revision history for this message
Oleksandr "Sanja" Byelkin (sanja-byelkin) wrote :

Ok, the original test suite still repeatable in 5.5 so fix of query with rand should be modified for subselects.

Revision history for this message
Oleksandr "Sanja" Byelkin (sanja-byelkin) wrote :

It even do not need random function:

SELECT t2_1.b as zzz
FROM t1 JOIN t2 AS t2_1 JOIN t2 AS t2_2
ON (t2_2.b = t2_1.b ) AND (t2_2.c = t2_1.c )
WHERE
t2_1.b + 1 > 0 OR
a = t2_1.c
GROUP BY zzz;
zzz
8
4
3
8
3
0
3

Revision history for this message
Oleksandr "Sanja" Byelkin (sanja-byelkin) wrote :

All incorrect results has key length in t2_1 is 3 in correct result it is 7 (possible key is the same - c)

Revision history for this message
Oleksandr "Sanja" Byelkin (sanja-byelkin) wrote :

oops 3 and 7 is estimation of rows

Revision history for this message
Oleksandr "Sanja" Byelkin (sanja-byelkin) wrote :

Correct execution uses temporary table.

Revision history for this message
Oleksandr "Sanja" Byelkin (sanja-byelkin) wrote :

It correct case test_if_skip_sort_order() decide to use filesort in incorrect one: "Keep current (ordered) select->quick" (which should not be true).

Revision history for this message
Oleksandr "Sanja" Byelkin (sanja-byelkin) wrote :

test_if_order_by_key() thinks that part 't2_1.c' (first part) of the index is constant so it could skip it and then part 't2_1.b' (second part) is appropriate for making sorting. But 't2_1.c' is not constant (because condition in WHERE connected with OR).

Revision history for this message
Oleksandr "Sanja" Byelkin (sanja-byelkin) wrote :

It is "multiple equal('x', `test`.`t2_2`.`c`, `test`.`t2_1`.`c`)" which trigger assigning key part constant 'x' is value of `test`.`t1`.`a`.

Here is visible how we get this triple equality:

WHERE:(original) ((((`test`.`t2_1`.`b` + 1) > 0) or (`test`.`t1`.`a` = `test`.`t2_1`.`c`)) and (`test`.`t2_2`.`b` = `test`.`t2_1`.`b`) and (`test`.`t2_2`.`c` = `test`.`t2_1`.`c`))

WHERE:(after equal_items) ((((`test`.`t2_1`.`b` + 1) > 0) or multiple equal(`test`.`t2_2`.`c`, `test`.`t2_1`.`c`, `test`.`t1`.`a`)) and multiple equal(`test`.`t2_2`.`b`, `test`.`t2_1`.`b`) and multiple equal(`test`.`t2_2`.`c`, `test`.`t2_1`.`c`))

WHERE:(after const change) ((((`test`.`t2_1`.`b` + 1) > 0) or multiple equal(`test`.`t2_2`.`c`, `test`.`t2_1`.`c`, `test`.`t1`.`a`)) and multiple equal(`test`.`t2_2`.`b`, `test`.`t2_1`.`b`) and multiple equal(`test`.`t2_2`.`c`, `test`.`t2_1`.`c`))

WHERE:(after remove) ((((`test`.`t2_1`.`b` + 1) > 0) or multiple equal(`test`.`t2_2`.`c`, `test`.`t2_1`.`c`, `test`.`t1`.`a`)) and multiple equal(`test`.`t2_2`.`b`, `test`.`t2_1`.`b`) and multiple equal(`test`.`t2_2`.`c`, `test`.`t2_1`.`c`))

Changed in maria:
status: In Progress → Fix Committed
Revision history for this message
Oleksandr "Sanja" Byelkin (sanja-byelkin) wrote :

fixed and merged to 5.3 with the original test suite.

Changed in maria:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.