Percona Server moved to https://jira.percona.com/projects/PS

ORDERBY + LIMIT index not chosen when joining an extra table

Bug #1131426 reported by josecanciani on 2013-02-21

This bug affects 2 people

Affects		Status	Importance	Assigned to	Milestone
	Percona Server moved to https://jira.percona.com/projects/PS	Won't Fix	Undecided	Unassigned

Bug Description

Summary:
++++++++

SELECT id, x, y FROM t1 ORDER BY x, y, id LIMIT 0, 20
-> works as expected when there is an index on (x,y): it uses the index

but when adding a join, and the second table filter is BAD (> %10 of t1 population):

SELECT id, x, y FROM t1 JOIN t2 ON t1.id = t2.t1Id WHERE t2.id = z ORDER BY x, y, id LIMIT 0, 20
-> fails using t2.id index instead of keep using (x,y) index

Since so many t1 rows match this join, and we just want the first 20, then it should keep using the (x, y) index.

Test case
+++++++

Notice we just use a 100k record table, in a system with more joins and millions of records, getting the first 20 without the proper index takes dozens of seconds.

use test

drop table if exists contact;
drop table if exists rContactTag;

create table contact (
    id int(10),
    firstName varchar(200),
    lastName varchar(200)
) engine = innodb default charset = utf8;

create table rContactTag (
contactId int(10),
tagId int(10)
) engine = innodb default charset = utf8;

drop procedure if exists insert1;
delimiter //
CREATE PROCEDURE insert1()
BEGIN
    SET @x = 0;
    PREPARE stmt FROM 'INSERT INTO contact values (?,?,?)';
    WHILE @x <= 100000 DO
        SET @x = @x + 1;
        EXECUTE stmt USING @x, @x, @x;
    END WHILE;
END
//
delimiter ;

drop procedure if exists insert2;
delimiter //
CREATE PROCEDURE insert2()
BEGIN
    SET @x = 0;
    SET @y = 5;
    PREPARE stmt FROM 'INSERT INTO rContactTag values (?,?)';
    WHILE @x < 50000 DO
        SET @x = @x + 2;
        EXECUTE stmt USING @x, @y;
    END WHILE;
END
//
delimiter ;

call insert1;
call insert2;

alter table contact
add primary key (id),
add key (lastName, firstName);

alter table rContactTag
add key (contactId, tagId),
add key (tagId, contactId);

analyze table contact;
analyze table rContactTag;

Query plans
+++++++++

OK:

FAIL:

OK with straight join, but this is not always good since we don't know how good are the filters

Finally, a real live test with the queries:

SELECT SQL_NO_CACHE contact.id, contact.firstName, contact.lastName FROM contact JOIN rContactTag ON rContactTag.contactId = contact.id WHERE rContactTag.tagId = 5 ORDER BY contact.lastName, contact.firstName, contact.id LIMIT 0, 20;
+-------+-----------+----------+
| id | firstName | lastName |
+-------+-----------+----------+
| 10 | 10 | 10 |
| 100 | 100 | 100 |
| 1000 | 1000 | 1000 |
| 10000 | 10000 | 10000 |
| 10002 | 10002 | 10002 |
| 10004 | 10004 | 10004 |
| 10006 | 10006 | 10006 |
| 10008 | 10008 | 10008 |
| 10010 | 10010 | 10010 |
| 10012 | 10012 | 10012 |
| 10014 | 10014 | 10014 |
| 10016 | 10016 | 10016 |
| 10018 | 10018 | 10018 |
| 1002 | 1002 | 1002 |
| 10020 | 10020 | 10020 |
| 10022 | 10022 | 10022 |
| 10024 | 10024 | 10024 |
| 10026 | 10026 | 10026 |
| 10028 | 10028 | 10028 |
| 10030 | 10030 | 10030 |
+-------+-----------+----------+
20 rows in set (0.59 sec)

SELECT SQL_NO_CACHE STRAIGHT_JOIN contact.id, contact.firstName, contact.lastName FROM contact JOIN rContactTag ON rContactTag.contactId = contact.id WHERE rContactTag.tagId = 5 ORDER BY contact.lastName, contact.firstName, contact.id LIMIT 0, 20;
+-------+-----------+----------+
| id | firstName | lastName |
+-------+-----------+----------+
| 10 | 10 | 10 |
| 100 | 100 | 100 |
| 1000 | 1000 | 1000 |
| 10000 | 10000 | 10000 |
| 10002 | 10002 | 10002 |
| 10004 | 10004 | 10004 |
| 10006 | 10006 | 10006 |
| 10008 | 10008 | 10008 |
| 10010 | 10010 | 10010 |
| 10012 | 10012 | 10012 |
| 10014 | 10014 | 10014 |
| 10016 | 10016 | 10016 |
| 10018 | 10018 | 10018 |
| 1002 | 1002 | 1002 |
| 10020 | 10020 | 10020 |
| 10022 | 10022 | 10022 |
| 10024 | 10024 | 10024 |
| 10026 | 10026 | 10026 |
| 10028 | 10028 | 10028 |
| 10030 | 10030 | 10030 |
+-------+-----------+----------+
20 rows in set (0.00 sec)

Tags:

Revision history for this message

josecanciani (a-launchpad-net-jluis-com-ar) wrote on 2013-02-21:

I've found a similar "bug" at http://bugs.mysql.com/bug.php?id=8036

There's a good explanation at
[29 Jan 2005 2:16] Michael Widenius

I wonder if indeed this was something developed in later releases, or not. And if this is something it can be optimized without affecting other queries.

We have a budget to develop this optimization.

Thanks,
Jose

Revision history for this message

Alexey Kopytov (akopytov) wrote on 2013-02-26:

This requires rather invasive changes to the way the query optimizer evaluates possible execution plans. We don't currently have sufficient resources to implement and maintain such a change.

Changed in percona-server:
status:	New → Won't Fix

Revision history for this message

josecanciani (a-launchpad-net-jluis-com-ar) wrote on 2013-02-26:

Thanks for the info, we are in talks with MariaDB for a possible fix. If we can, we'll include it in Mysql base, I'll keep you updated here.

Revision history for this message

Shahriyar Rzayev (rzayev-sehriyar) wrote on 2018-01-25:

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-2896

Report a bug

This report contains Public information

Everyone can see this information.

You are

Subscribing...

Edit bug mail

Other bug subscribers

Remote bug watches

mysql-bugs #8036 Edit

Bug watches keep track of this bug in other bug trackers.