Some aggregate functions (such as MIN MAX) work incorrectly in subqueries after getting NULL value

Bug #612894 reported by Philip Stoev
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
MariaDB
Fix Released
High
Oleksandr "Sanja" Byelkin

Bug Description

The following query

SELECT table1 .`col_varchar_nokey`
FROM CC table1 STRAIGHT_JOIN ( view_B table2 STRAIGHT_JOIN CC table3 ON table3 .`col_varchar_key` = table2 .`col_varchar_key` ) ON table3 .`col_varchar_nokey` = table2 .`col_varchar_key`
WHERE EXISTS (
SELECT DISTINCT `pk`
FROM CC
WHERE `pk` < table3 .`pk` ) OR table1 .`col_int_nokey` ;

Returns 1 extra row when executed with subquery cache, as compared to execution with no subquery cache, maria-5.2 or mysql 5.5.5-m3.

Revision history for this message
Philip Stoev (pstoev-askmonty) wrote :

Test case:

CREATE TABLE `B` (
  `pk` int(11) NOT NULL AUTO_INCREMENT,
  `col_int_nokey` int(11) NOT NULL,
  `col_int_key` int(11) NOT NULL,
  `col_date_key` date NOT NULL,
  `col_date_nokey` date NOT NULL,
  `col_time_key` time NOT NULL,
  `col_time_nokey` time NOT NULL,
  `col_datetime_key` datetime NOT NULL,
  `col_datetime_nokey` datetime NOT NULL,
  `col_varchar_key` varchar(1) NOT NULL,
  `col_varchar_nokey` varchar(1) NOT NULL,
  PRIMARY KEY (`pk`),
  KEY `col_int_key` (`col_int_key`),
  KEY `col_date_key` (`col_date_key`),
  KEY `col_time_key` (`col_time_key`),
  KEY `col_datetime_key` (`col_datetime_key`),
  KEY `col_varchar_key` (`col_varchar_key`,`col_int_key`)
);

INSERT INTO `B` VALUES (1,1,7,'1900-01-01','1900-01-01','01:13:38','01:13:38','2005-02-05 00:00:00','2005-02-05 00:00:00','p','p');

CREATE VIEW view_B AS SELECT * FROM B;

CREATE TABLE `CC` (
  `pk` int(11) NOT NULL AUTO_INCREMENT,
  `col_int_nokey` int(11) NOT NULL,
  `col_int_key` int(11) NOT NULL,
  `col_varchar_key` varchar(1) NOT NULL,
  `col_varchar_nokey` varchar(1) NOT NULL,
  PRIMARY KEY (`pk`),
  KEY `col_int_key` (`col_int_key`),
  KEY `col_varchar_key` (`col_varchar_key`,`col_int_key`)
) ENGINE=MyISAM AUTO_INCREMENT=30 DEFAULT CHARSET=latin1;
INSERT INTO `CC` VALUES (10,3,8,'v','v');
INSERT INTO `CC` VALUES (11,3,8,'f','f');
INSERT INTO `CC` VALUES (12,3,5,'v','v');
INSERT INTO `CC` VALUES (13,2,8,'s','s');
INSERT INTO `CC` VALUES (14,1,8,'a','a');
INSERT INTO `CC` VALUES (15,0,6,'p','p');
INSERT INTO `CC` VALUES (16,8,7,'z','z');
INSERT INTO `CC` VALUES (17,5,2,'a','a');
INSERT INTO `CC` VALUES (18,9,5,'h','h');
INSERT INTO `CC` VALUES (19,5,7,'h','h');
INSERT INTO `CC` VALUES (20,4,2,'v','v');
INSERT INTO `CC` VALUES (21,2,9,'v','v');
INSERT INTO `CC` VALUES (22,33,142,'b','b');
INSERT INTO `CC` VALUES (23,5,3,'y','y');
INSERT INTO `CC` VALUES (24,1,0,'v','v');
INSERT INTO `CC` VALUES (25,9,3,'m','m');
INSERT INTO `CC` VALUES (26,1,5,'z','z');
INSERT INTO `CC` VALUES (27,3,9,'n','n');
INSERT INTO `CC` VALUES (28,8,1,'d','d');
INSERT INTO `CC` VALUES (29,231,107,'a','a');

SET SESSION optimizer_switch = 'subquery_cache=off';

/* cache is ON */ SELECT COUNT(*)
FROM CC table1 STRAIGHT_JOIN ( view_B table2 STRAIGHT_JOIN CC table3 ON table3 .`col_varchar_key` = table2 .`col_varchar_key` ) ON table3 .`col_varchar_nokey` = table2 .`col_varchar_key`
WHERE EXISTS (
SELECT DISTINCT `pk`
FROM CC
WHERE `pk` < table3 .`pk` ) OR table1 .`col_int_nokey` ;

/* cache is OFF */ SELECT COUNT(*)
FROM CC table1 STRAIGHT_JOIN ( view_B table2 STRAIGHT_JOIN CC table3 ON table3 .`col_varchar_key` = table2 .`col_varchar_key` ) ON table3 .`col_varchar_nokey` = table2 .`col_varchar_key`
WHERE EXISTS (
SELECT DISTINCT `pk`
FROM CC
WHERE `pk` < table3 .`pk` ) OR table1 .`col_int_nokey` ;

Changed in maria:
importance: Undecided → High
assignee: nobody → Oleksandr "Sanja" Byelkin (sanja-byelkin)
milestone: none → 5.3
Revision history for this message
Philip Stoev (pstoev-askmonty) wrote : Re: Subquery cache returns extra rows on query with nested joins

Here is another example without views:

CREATE TABLE `CC` (
  `pk` int(11) NOT NULL AUTO_INCREMENT,
  `col_int_nokey` int(11) NOT NULL,
  `col_int_key` int(11) NOT NULL,
  `col_varchar_key` varchar(1) NOT NULL,
  `col_varchar_nokey` varchar(1) NOT NULL,
  PRIMARY KEY (`pk`),
  KEY `col_int_key` (`col_int_key`),
  KEY `col_varchar_key` (`col_varchar_key`,`col_int_key`)
) ENGINE=MyISAM AUTO_INCREMENT=30 DEFAULT CHARSET=latin1;
INSERT INTO `CC` VALUES (10,3,8,'v','v');
INSERT INTO `CC` VALUES (11,3,8,'f','f');
INSERT INTO `CC` VALUES (12,3,5,'v','v');
INSERT INTO `CC` VALUES (13,2,8,'s','s');
INSERT INTO `CC` VALUES (14,1,8,'a','a');
INSERT INTO `CC` VALUES (15,0,6,'p','p');
INSERT INTO `CC` VALUES (16,8,7,'z','z');
INSERT INTO `CC` VALUES (17,5,2,'a','a');
INSERT INTO `CC` VALUES (18,9,5,'h','h');
INSERT INTO `CC` VALUES (19,5,7,'h','h');
INSERT INTO `CC` VALUES (20,4,2,'v','v');
INSERT INTO `CC` VALUES (21,2,9,'v','v');
INSERT INTO `CC` VALUES (22,33,142,'b','b');
INSERT INTO `CC` VALUES (23,5,3,'y','y');
INSERT INTO `CC` VALUES (24,1,0,'v','v');
INSERT INTO `CC` VALUES (25,9,3,'m','m');
INSERT INTO `CC` VALUES (26,1,5,'z','z');
INSERT INTO `CC` VALUES (27,3,9,'n','n');
INSERT INTO `CC` VALUES (28,8,1,'d','d');
INSERT INTO `CC` VALUES (29,231,107,'a','a');
CREATE TABLE `C` (
  `pk` int(11) NOT NULL AUTO_INCREMENT,
  `col_int_nokey` int(11) NOT NULL,
  `col_int_key` int(11) NOT NULL,
  `col_varchar_key` varchar(1) NOT NULL,
  `col_varchar_nokey` varchar(1) NOT NULL,
  PRIMARY KEY (`pk`),
  KEY `col_int_key` (`col_int_key`),
  KEY `col_varchar_key` (`col_varchar_key`,`col_int_key`)
) ENGINE=MyISAM AUTO_INCREMENT=21 DEFAULT CHARSET=latin1;
INSERT INTO `C` VALUES (1,2,9,'x','x');
INSERT INTO `C` VALUES (2,9,5,'g','g');
INSERT INTO `C` VALUES (3,6,1,'o','o');
INSERT INTO `C` VALUES (4,7,0,'g','g');
INSERT INTO `C` VALUES (5,0,1,'v','v');
INSERT INTO `C` VALUES (6,97,190,'m','m');
INSERT INTO `C` VALUES (7,3,6,'x','x');
INSERT INTO `C` VALUES (8,3,3,'c','c');
INSERT INTO `C` VALUES (9,4,4,'z','z');
INSERT INTO `C` VALUES (10,9,3,'i','i');
INSERT INTO `C` VALUES (11,101,186,'x','x');
INSERT INTO `C` VALUES (12,0,1,'g','g');
INSERT INTO `C` VALUES (13,8,8,'q','q');
INSERT INTO `C` VALUES (14,194,226,'m','m');
INSERT INTO `C` VALUES (15,148,133,'p','p');
INSERT INTO `C` VALUES (16,9,6,'e','e');
INSERT INTO `C` VALUES (17,9,3,'t','t');
INSERT INTO `C` VALUES (18,1,8,'j','j');
INSERT INTO `C` VALUES (19,1,5,'h','h');
INSERT INTO `C` VALUES (20,5,7,'w','w');

SET SESSION optimizer_switch = 'subquery_cache=off';

/* cache is OFF */ SELECT (
SELECT MIN( SUBQUERY1_t1 .`col_int_key` )
FROM C SUBQUERY1_t1 JOIN ( C SUBQUERY1_t2 JOIN CC SUBQUERY1_t3 ON SUBQUERY1_t2 .`pk` ) ON SUBQUERY1_t3 .`col_varchar_nokey` = SUBQUERY1_t2 .`col_varchar_key`
WHERE table1 .`col_int_nokey` )
FROM CC table1 ;

SET SESSION optimizer_switch = 'subquery_cache=on';

/* cache is ON */ SELECT (
SELECT MIN( SUBQUERY1_t1 .`col_int_key` )
FROM C SUBQUERY1_t1 JOIN ( C SUBQUERY1_t2 JOIN CC SUBQUERY1_t3 ON SUBQUERY1_t2 .`pk` ) ON SUBQUERY1_t3 .`col_varchar_nokey` = SUBQUERY1_t2 .`col_varchar_key`
WHERE table1 .`col_int_nokey` )
FROM CC table1 ;

summary: - Subquery cache returns extra rows on query with view
+ Subquery cache returns extra rows on query with nested joins
Changed in maria:
status: New → In Progress
Revision history for this message
Oleksandr "Sanja" Byelkin (sanja-byelkin) wrote :

If change test suite a bit it (add subquery parameter to output) is clear that bug is bug of subquery (not subquery cache):

SET SESSION optimizer_switch = 'subquery_cache=off';
/* cache is OFF */ SELECT table1 .`col_int_nokey`,(
SELECT MIN( SUBQUERY1_t1 .`col_int_key` )
FROM C SUBQUERY1_t1 JOIN ( C SUBQUERY1_t2 JOIN CC SUBQUERY1_t3 ON SUBQUERY1_t2 .`pk` ) ON SUBQUERY1_t3 .`col_varchar_nokey` = SUBQUERY1_t2 .`col_varchar_key`
WHERE table1 .`col_int_nokey` )
FROM CC table1 ;
col_int_nokey (
SELECT MIN( SUBQUERY1_t1 .`col_int_key` )
FROM C SUBQUERY1_t1 JOIN ( C SUBQUERY1_t2 JOIN CC SUBQUERY1_t3 ON SUBQUERY1_t2 .`pk` ) ON SUBQUERY1_t3 .`col_varchar_nokey` = SUBQUERY1_t2 .`col_varchar_key`
WHERE table1 .`col_int_nokey` )
3 0
2 0
0 NULL
8 NULL
5 NULL
2 NULL
33 NULL
5 NULL
1 NULL
9 NULL
1 NULL
3 NULL
8 NULL
231 NULL
SET SESSION optimizer_switch = 'subquery_cache=on';
/* cache is ON */
SELECT table1 .`col_int_nokey`,(
SELECT
MIN( SUBQUERY1_t1 .`col_int_key` )
FROM C SUBQUERY1_t1 JOIN (
C SUBQUERY1_t2 JOIN CC SUBQUERY1_t3 ON SUBQUERY1_t2 .`pk` )
ON SUBQUERY1_t3 .`col_varchar_nokey` = SUBQUERY1_t2 .`col_varchar_key`
WHERE table1 .`col_int_nokey` )
FROM CC table1 ;
col_int_nokey (
SELECT
MIN( SUBQUERY1_t1 .`col_int_key` )
FROM C SUBQUERY1_t1 JOIN (
C SUBQUERY1_t2 JOIN CC SUBQUERY1_t3 ON SUBQUERY1_t2 .`pk` )
ON SUBQUERY1_t3 .`col_varchar_nokey` = SUBQUERY1_t2 .`col_varchar_key`
WHERE table1 .`col_int_nokey` )
3 0
2 0
0 NULL
8 NULL
5 NULL
2 0
33 NULL
5 NULL
1 NULL
9 NULL
1 NULL
3 0
8 NULL
231 NULL

Revision history for this message
Oleksandr "Sanja" Byelkin (sanja-byelkin) wrote :

The problem present in 5.1

Changed in maria:
milestone: 5.3 → 5.1
summary: - Subquery cache returns extra rows on query with nested joins
+ Some Aggregate functions (like MIN MAX) works incorrectly in subqueries
+ after getting NULL value\
summary: Some Aggregate functions (like MIN MAX) works incorrectly in subqueries
- after getting NULL value\
+ after getting NULL value
Revision history for this message
Philip Stoev (pstoev-askmonty) wrote :

maria-5.2 and mysql-5.5.5 agree that the result from the queries is as follows:

SELECT (
SELECT MIN( SUBQUERY1_t1 .`col_int_key` )
FROM C SUBQUERY1_t1 JOIN ( C SUBQUERY1_t2 JOIN CC SUBQUERY1_t3 ON SUBQUERY1_t2 .`pk` ) ON SUBQUERY1_t3 .`col_varchar_nokey` = SUBQUERY1_t2 .`col_varchar_key`
WHERE table1 .`col_int_nokey` )
FROM CC table1 ;
(
SELECT MIN( SUBQUERY1_t1 .`col_int_key` )
FROM C SUBQUERY1_t1 JOIN ( C SUBQUERY1_t2 JOIN CC SUBQUERY1_t3 ON SUBQUERY1_t2 .`pk` ) ON SUBQUERY1_t3 .`col_varchar_nokey` = SUBQUERY1_t2 .`col_varchar_key`
WHERE table1 .`col_int_nokey` )
0
0
0
0
0
NULL
NULL
NULL
NULL
NULL
NULL
NULL
NULL
NULL
NULL
NULL
NULL
NULL
NULL
NULL

SELECT table1 .`col_int_nokey`, (
SELECT MIN( SUBQUERY1_t1 .`col_int_key` )
FROM C SUBQUERY1_t1 JOIN ( C SUBQUERY1_t2 JOIN CC SUBQUERY1_t3 ON SUBQUERY1_t2 .`pk` ) ON SUBQUERY1_t3 .`col_varchar_nokey` = SUBQUERY1_t2 .`col_varchar_key`
WHERE table1 .`col_int_nokey` )
FROM CC table1 ;
col_int_nokey (
SELECT MIN( SUBQUERY1_t1 .`col_int_key` )
FROM C SUBQUERY1_t1 JOIN ( C SUBQUERY1_t2 JOIN CC SUBQUERY1_t3 ON SUBQUERY1_t2 .`pk` ) ON SUBQUERY1_t3 .`col_varchar_nokey` = SUBQUERY1_t2 .`col_varchar_key`
WHERE table1 .`col_int_nokey` )
3 0
3 0
3 0
2 0
1 0
0 NULL
8 NULL
5 NULL
9 NULL
5 NULL
4 NULL
2 NULL
33 NULL
5 NULL
1 NULL
9 NULL
1 NULL
3 NULL
8 NULL
231 NULL

summary: - Some Aggregate functions (like MIN MAX) works incorrectly in subqueries
- after getting NULL value
+ Some aggregate functions (such as MIN MAX) work incorrectly in
+ subqueries after getting NULL value
Revision history for this message
Philip Stoev (pstoev-askmonty) wrote :

Maria 5.1 also returns the result pasted in the above comment. In other words maria 5.1, maria 5.2 and mysql 5.5.5 appear to agree on what the correct result is for both queries.

Revision history for this message
Oleksandr "Sanja" Byelkin (sanja-byelkin) wrote :

The bug can be reproduced on all mariadb and mysql 5.1 and 5.5

The problem is in not reset null_value...

Revision history for this message
Oleksandr "Sanja" Byelkin (sanja-byelkin) wrote :

To be clear correct result should have NULL only for parameter 0.

Revision history for this message
Oleksandr "Sanja" Byelkin (sanja-byelkin) wrote :

For simple subqueries the problem hidden by the subquery evaluation mechanism which return NULL if theer was no resulte at all, because of thie queries like:

SELECT t1.col_int_nokey, (select max(t2.col_int_key2) from t2 where t1.col_int_nokey) from t1;

works correctly with min/max optimisation in opt_sum.cc. It do not works for outer join which alwats return result.

Also the problem visible without subqueries, lets take impossible where with random function in it (which should return NULL but retuirn 0):
+SELECT MIN( SUBQUERY1_t1 .`col_int_key` )
+FROM t2 SUBQUERY1_t1 JOIN ( t2 SUBQUERY1_t2 JOIN t1 SUBQUERY1_t3 ON SUBQUERY1_t2 .`pk` ) ON SUBQUERY1_t3 .`col_varchar_nokey` = SUBQUERY1_t2 .`col_varchar_key`
+WHERE rand()*0 != 0;
+MIN( SUBQUERY1_t1 .`col_int_key` )
+0
+select rand()*0 != 0;
+rand()*0 != 0
+0

the way to solve the problem I see in the fact that such MAX/MIN has only two possible results:
1)calculated by optimizer constant
2)NULL if the WHERE clause return FALSE

Revision history for this message
Oleksandr "Sanja" Byelkin (sanja-byelkin) wrote :

CREATE TABLE `t1` (
  `col_int_nokey` int(11) NOT NULL
) ENGINE=MyISAM AUTO_INCREMENT=30 DEFAULT CHARSET=latin1;
INSERT INTO `t1` VALUES (2);
INSERT INTO `t1` VALUES (0);
INSERT INTO `t1` VALUES (8);
INSERT INTO `t1` VALUES (5);
INSERT INTO `t1` VALUES (2);
INSERT INTO `t1` VALUES (33);
INSERT INTO `t1` VALUES (5);
INSERT INTO `t1` VALUES (1);
INSERT INTO `t1` VALUES (9);
INSERT INTO `t1` VALUES (1);
INSERT INTO `t1` VALUES (3);
INSERT INTO `t1` VALUES (8);
INSERT INTO `t1` VALUES (231);
CREATE TABLE `t2` (
  `pk` int(11) NOT NULL AUTO_INCREMENT,
  `col_int_nokey` int(11) NOT NULL,
  `col_int_key` int(11) NOT NULL,
  `col_int_key2` varchar(1) NOT NULL,
  PRIMARY KEY (`pk`),
  KEY `col_int_key` (`col_int_key`),
  KEY `col_int_key2` (`col_int_key2`)
) ENGINE=MyISAM AUTO_INCREMENT=21 DEFAULT CHARSET=latin1;
INSERT INTO `t2` VALUES (3,6,1,1);
INSERT INTO `t2` VALUES (4,7,0,20);
INSERT INTO `t2` VALUES (5,0,1,30);
INSERT INTO `t2` VALUES (6,97,190,40);

SELECT t1.col_int_nokey, (select max(t2.col_int_key2) from t2 where
t1.col_int_nokey) from t1;

drop table t1, t2;

Revision history for this message
Oleksandr "Sanja" Byelkin (sanja-byelkin) wrote :

CREATE TABLE `t1` (
  `pk` int(11) NOT NULL AUTO_INCREMENT,
  `col_int_nokey` int(11) NOT NULL,
  `col_int_key` int(11) NOT NULL,
  `col_varchar_key` varchar(1) NOT NULL,
  `col_varchar_nokey` varchar(1) NOT NULL,
  PRIMARY KEY (`pk`),
  KEY `col_int_key` (`col_int_key`),
  KEY `col_varchar_key` (`col_varchar_key`,`col_int_key`)
) ENGINE=MyISAM AUTO_INCREMENT=30 DEFAULT CHARSET=latin1;
INSERT INTO `t1` VALUES (13,2,8,'s','s');
INSERT INTO `t1` VALUES (15,0,6,'p','p');
INSERT INTO `t1` VALUES (16,8,7,'z','z');
INSERT INTO `t1` VALUES (19,5,7,'h','h');
INSERT INTO `t1` VALUES (21,2,9,'v','v');
INSERT INTO `t1` VALUES (22,33,142,'b','b');
INSERT INTO `t1` VALUES (23,5,3,'y','y');
INSERT INTO `t1` VALUES (24,1,0,'v','v');
INSERT INTO `t1` VALUES (25,9,3,'m','m');
INSERT INTO `t1` VALUES (26,1,5,'z','z');
INSERT INTO `t1` VALUES (27,3,9,'n','n');
INSERT INTO `t1` VALUES (28,8,1,'d','d');
INSERT INTO `t1` VALUES (29,231,107,'a','a');
CREATE TABLE `t2` (
  `pk` int(11) NOT NULL AUTO_INCREMENT,
  `col_int_nokey` int(11) NOT NULL,
  `col_int_key` int(11) NOT NULL,
  `col_varchar_key` varchar(1) NOT NULL,
  `col_varchar_nokey` varchar(1) NOT NULL,
  PRIMARY KEY (`pk`),
  KEY `col_int_key` (`col_int_key`),
  KEY `col_varchar_key` (`col_varchar_key`,`col_int_key`)
) ENGINE=MyISAM AUTO_INCREMENT=21 DEFAULT CHARSET=latin1;
INSERT INTO `t2` VALUES (3,6,1,'o','o');
INSERT INTO `t2` VALUES (4,7,0,'g','g');
INSERT INTO `t2` VALUES (5,0,1,'v','v');
INSERT INTO `t2` VALUES (6,97,190,'m','m');

SELECT table1 .`col_int_nokey`,(
SELECT MIN( SUBQUERY1_t1 .`col_int_key` )
FROM t2 SUBQUERY1_t1 JOIN ( t2 SUBQUERY1_t2 JOIN t1 SUBQUERY1_t3 ON SUBQUERY1_t2 .`pk` ) ON SUBQUERY1_t3 .`col_varchar_nokey` = SUBQUERY1_t2 .`col_varchar_key`
WHERE table1 .`col_int_nokey` )
FROM t1 table1 ;

SELECT MIN( SUBQUERY1_t1 .`col_int_key` )
FROM t2 SUBQUERY1_t1 JOIN ( t2 SUBQUERY1_t2 JOIN t1 SUBQUERY1_t3 ON SUBQUERY1_t2 .`pk` ) ON SUBQUERY1_t3 .`col_varchar_nokey` = SUBQUERY1_t2 .`col_varchar_key`
WHERE rand()*0 != 0;

select rand()*0 != 0;

drop table t1, t2;

Revision history for this message
Michael Widenius (monty) wrote :

The bug is related to how we mark items when there is no rows in a group.
The following query is related to this:

CREATE TABLE t1 (a int(11) NOT NULL);
INSERT INTO t1 VALUES (1),(2);
CREATE TABLE t2 (
 key_col int(11) NOT NULL,
 KEY (key_col)
);
INSERT INTO t2 VALUES (1),(2);

select min(t2.key_col) from t1,t2 where t1.a=1;
> 1
select min(t2.key_col) from t1,t2 where t1.a > 1000;
> NULL
select min(t2.key_col)+1 from t1,t2 where t1.a> 1000;
> 2

The last query should return NULL

Michael Widenius (monty)
Changed in maria:
status: In Progress → Fix Committed
Revision history for this message
Philip Stoev (pstoev-askmonty) wrote :

Sanja, what is the situation with the first query from this bug report and the first test case? They do not involve MIN and MAX and COUNT is only used for convenience in order to reduce the size of the result set.

Changed in maria:
status: Fix Committed → Fix Released
status: Fix Released → Fix Committed
Changed in maria:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.