ORDERBY + LIMIT index not chosen when joining an extra table
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Percona Server moved to https://jira.percona.com/projects/PS |
Won't Fix
|
Undecided
|
Unassigned |
Bug Description
Summary:
++++++++
SELECT id, x, y FROM t1 ORDER BY x, y, id LIMIT 0, 20
-> works as expected when there is an index on (x,y): it uses the index
but when adding a join, and the second table filter is BAD (> %10 of t1 population):
SELECT id, x, y FROM t1 JOIN t2 ON t1.id = t2.t1Id WHERE t2.id = z ORDER BY x, y, id LIMIT 0, 20
-> fails using t2.id index instead of keep using (x,y) index
Since so many t1 rows match this join, and we just want the first 20, then it should keep using the (x, y) index.
Test case
+++++++
Notice we just use a 100k record table, in a system with more joins and millions of records, getting the first 20 without the proper index takes dozens of seconds.
use test
drop table if exists contact;
drop table if exists rContactTag;
create table contact (
id int(10),
firstName varchar(200),
lastName varchar(200)
) engine = innodb default charset = utf8;
create table rContactTag (
contactId int(10),
tagId int(10)
) engine = innodb default charset = utf8;
drop procedure if exists insert1;
delimiter //
CREATE PROCEDURE insert1()
BEGIN
SET @x = 0;
PREPARE stmt FROM 'INSERT INTO contact values (?,?,?)';
WHILE @x <= 100000 DO
SET @x = @x + 1;
EXECUTE stmt USING @x, @x, @x;
END WHILE;
END
//
delimiter ;
drop procedure if exists insert2;
delimiter //
CREATE PROCEDURE insert2()
BEGIN
SET @x = 0;
SET @y = 5;
PREPARE stmt FROM 'INSERT INTO rContactTag values (?,?)';
WHILE @x < 50000 DO
SET @x = @x + 2;
EXECUTE stmt USING @x, @y;
END WHILE;
END
//
delimiter ;
call insert1;
call insert2;
alter table contact
add primary key (id),
add key (lastName, firstName);
alter table rContactTag
add key (contactId, tagId),
add key (tagId, contactId);
analyze table contact;
analyze table rContactTag;
Query plans
+++++++++
OK:
mysql> explain SELECT contact.id, contact.firstName, contact.lastName FROM contact ORDER BY contact.lastName, contact.firstName, contact.id LIMIT 0, 20;
+----+-
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-
| 1 | SIMPLE | contact | index | NULL | lastName | 1206 | NULL | 20 | Using index |
+----+-
1 row in set (0.00 sec)
FAIL:
mysql> explain SELECT contact.id, contact.firstName, contact.lastName FROM contact JOIN rContactTag ON rContactTag.
+----+-
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-
| 1 | SIMPLE | rContactTag | ref | contactId,tagId | tagId | 5 | const | 12023 | Using where; Using index; Using temporary; Using filesort |
| 1 | SIMPLE | contact | eq_ref | PRIMARY | PRIMARY | 4 | test.rContactTa
+----+-
OK with straight join, but this is not always good since we don't know how good are the filters
mysql> explain SELECT STRAIGHT_JOIN contact.id, contact.firstName, contact.lastName FROM contact JOIN rContactTag ON rContactTag.
+----+-
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-
| 1 | SIMPLE | contact | index | PRIMARY | lastName | 1206 | NULL | 20 | Using index |
| 1 | SIMPLE | rContactTag | ref | contactId,tagId | contactId | 10 | test.contact.
+----+-
Finally, a real live test with the queries:
SELECT SQL_NO_CACHE contact.id, contact.firstName, contact.lastName FROM contact JOIN rContactTag ON rContactTag.
+------
| id | firstName | lastName |
+------
| 10 | 10 | 10 |
| 100 | 100 | 100 |
| 1000 | 1000 | 1000 |
| 10000 | 10000 | 10000 |
| 10002 | 10002 | 10002 |
| 10004 | 10004 | 10004 |
| 10006 | 10006 | 10006 |
| 10008 | 10008 | 10008 |
| 10010 | 10010 | 10010 |
| 10012 | 10012 | 10012 |
| 10014 | 10014 | 10014 |
| 10016 | 10016 | 10016 |
| 10018 | 10018 | 10018 |
| 1002 | 1002 | 1002 |
| 10020 | 10020 | 10020 |
| 10022 | 10022 | 10022 |
| 10024 | 10024 | 10024 |
| 10026 | 10026 | 10026 |
| 10028 | 10028 | 10028 |
| 10030 | 10030 | 10030 |
+------
20 rows in set (0.59 sec)
SELECT SQL_NO_CACHE STRAIGHT_JOIN contact.id, contact.firstName, contact.lastName FROM contact JOIN rContactTag ON rContactTag.
+------
| id | firstName | lastName |
+------
| 10 | 10 | 10 |
| 100 | 100 | 100 |
| 1000 | 1000 | 1000 |
| 10000 | 10000 | 10000 |
| 10002 | 10002 | 10002 |
| 10004 | 10004 | 10004 |
| 10006 | 10006 | 10006 |
| 10008 | 10008 | 10008 |
| 10010 | 10010 | 10010 |
| 10012 | 10012 | 10012 |
| 10014 | 10014 | 10014 |
| 10016 | 10016 | 10016 |
| 10018 | 10018 | 10018 |
| 1002 | 1002 | 1002 |
| 10020 | 10020 | 10020 |
| 10022 | 10022 | 10022 |
| 10024 | 10024 | 10024 |
| 10026 | 10026 | 10026 |
| 10028 | 10028 | 10028 |
| 10030 | 10030 | 10030 |
+------
20 rows in set (0.00 sec)
I've found a similar "bug" at http:// bugs.mysql. com/bug. php?id= 8036
There's a good explanation at
[29 Jan 2005 2:16] Michael Widenius
I wonder if indeed this was something developed in later releases, or not. And if this is something it can be optimized without affecting other queries.
We have a budget to develop this optimization.
Thanks,
Jose