A Lightweight SQL Database for Cloud Infrastructure and Web Applications

Adding STRAIGHT_JOIN to certain queries results in a large number of duplicate rows

Reported by Patrick Crews on 2010-05-25
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Drizzle
Medium
Prafulla Tekawade
Dexter
Medium
Prafulla Tekawade

Bug Description

Adding STRAIGHT_JOIN to certain queries can add a large number of duplicate rows to the result set.
This appears to be related to MySQL bug - http://bugs.mysql.com/bug.php?id=48103 Using STRAIGHT_JOIN with Innodb tables can produce duplicate results

From the test case:
SELECT COUNT(table1 .`pk`)
FROM m table1 RIGHT JOIN o JOIN o table3 ON table3 .`col_int_key` ON table1 .`col_int` = table3 .`col_int_key` ;
COUNT(table1 .`pk`)
187
SELECT STRAIGHT_JOIN COUNT(table1 .`pk`)
FROM m table1 RIGHT JOIN o JOIN o table3 ON table3 .`col_int_key` ON table1 .`col_int` = table3 .`col_int_key` /* TRANSFORM_OUTCOME_UNORDERED_MATCH */;
COUNT(table1 .`pk`)
5083

Query plan for regular query:
# |1|SIMPLE|o|index||PRIMARY|4||25|Using index|
# |1|SIMPLE|table3|index||PRIMARY|4||25|Using where; Using join buffer|
# |1|SIMPLE|table1|ALL|||||23||
# */

Query plan for query with STRAIGHT_JOIN:
=# |1|SIMPLE|table1|ALL|||||23||
# |1|SIMPLE|o|index||PRIMARY|4||25|Using index; Using join buffer|
# |1|SIMPLE|table3|index||PRIMARY|4||25|Using where; Using join buffer|
# */

Patrick Crews (patrick-crews) wrote :

Test suite test case:
#/* Server0: Drizzle 2010.05.1561 */

--disable_warnings
DROP TABLE IF EXISTS `m`;
CREATE TABLE `m` (
  `col_int` int DEFAULT NULL,
  `pk` int NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (`pk`)
) ENGINE=InnoDB;

ALTER TABLE `m` DISABLE KEYS;
INSERT INTO `m` VALUES (1,1),(NULL,2),(NULL,3),(-207224832,4),(714801152,5),(-1265106944,6),(2023948288,7),(865599488,8),(1,9),(-1895301120,10),(3,11),(9,12),(5,13),(NULL,14),(NULL,15),(1,16),(-224657408,17),(2,18),(-1219493888,19),(7,20),(1,21),(NULL,22),(NULL,23);
ALTER TABLE `m` ENABLE KEYS;

DROP TABLE IF EXISTS `o`;
CREATE TABLE `o` (
  `col_int_key` int DEFAULT NULL,
  `pk` int NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (`pk`),
  KEY `col_int_key` (`col_int_key`)
) ENGINE=InnoDB;

ALTER TABLE `o` DISABLE KEYS;
INSERT INTO `o` VALUES (4,1),(-867368960,2),(655032320,3),(1,4),(NULL,5),(-118751232,6),(7,7),(1,8),(3,9),(-518324224,10),(7,11),(1129709568,12),(-2081357824,13),(6,14),(NULL,15),(NULL,16),(NULL,17);
ALTER TABLE `o` ENABLE KEYS;

#/* Begin test case for query 0 */

SELECT COUNT(table1 .`pk`)
FROM m table1 RIGHT JOIN o JOIN o table3 ON table3 .`col_int_key` ON table1 .`col_int` = table3 .`col_int_key` ;

#/* Query plan Server 0:
# |1|SIMPLE|o|index||PRIMARY|4||25|Using index|
# |1|SIMPLE|table3|index||PRIMARY|4||25|Using where; Using join buffer|
# |1|SIMPLE|table1|ALL|||||23||
# */

#/* End of test case for query 0 */

#/* Begin test case for query 1 */

SELECT STRAIGHT_JOIN COUNT(table1 .`pk`)
FROM m table1 RIGHT JOIN o JOIN o table3 ON table3 .`col_int_key` ON table1 .`col_int` = table3 .`col_int_key` /* TRANSFORM_OUTCOME_UNORDERED_MATCH */;

#/* Query plan Server 0:
# |1|SIMPLE|table1|ALL|||||23||
# |1|SIMPLE|o|index||PRIMARY|4||25|Using index; Using join buffer|
# |1|SIMPLE|table3|index||PRIMARY|4||25|Using where; Using join buffer|
# */

#/* End of test case for query 1 */

DROP TABLE m;
DROP TABLE o;

Patrick Crews (patrick-crews) wrote :

NOTE: This appears related to bug - https://bugs.launchpad.net/drizzle/+bug/585628 Adding STRAIGHT_JOIN to certain queries results in loss of returned rows, but the effect is the opposite.

Still researching a bit on both of these and will update / close one out as a duplicate if evidence appears to point to a single root cause.

Changed in drizzle:
importance: Undecided → Medium
status: New → Confirmed
Changed in drizzle:
assignee: nobody → Prafulla Tekawade (prafulla-tekawade)

Analysis:

Fix for Bug 585644,585628

There was problem in dependency flags setting in a table which is
used to sort tables automatically in case of straght_join option.
These dependency flags are set in simplify_join method.
For a case like (a join b) LEFT JOIN c, c was not getting set to
depend on both a AND b, it was getting set only for either a or b,
as specified in original query.
Fixed it properly. Both bug queries now works fine and
also tested some subquery tests, all looks clean.

Hi Patrick,
I have committed fix for this bug at following branch
https://code.launchpad.net/~prafulla-tekawade/drizzle/drz-bugfix-585644_585628

Could you please take a look at it ?

-
Thanks
Prafulla

Changed in drizzle:
status: Confirmed → Fix Committed
Patrick Crews (patrick-crews) wrote :

Tested the patch with the randgen (result set + execution time comparisons).
Bug is fixed via the patch - randgen testing looked good.

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.