Optimizer chooses wrong plan when joining 2 tables

Bug #1664287 reported by Alexander Rubin
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
MySQL Server
Unknown
Unknown
Percona Server moved to https://jira.percona.com/projects/PS
Status tracked in 5.7
5.5
New
Undecided
Unassigned
5.6
Triaged
Medium
Unassigned
5.7
Triaged
Medium
Unassigned

Bug Description

Upstream bug: https://bugs.mysql.com/bug.php?id=83323
Affects Percona Server 5.6 and 5.7 as well

Optimizer chooses wrong plan when joining 2 tables:

mysql> explain select * from events_new_distr e join profiles p on e.profile_id = p.id where event_date between '2016-01-01' and '2016-01-01' + interval 15 day and status = 'active' and profile_id in (1,2,4,5,6,7,8,9,10,11,12,13,14,15,16) order by event_date desc limit 10\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: p
   partitions: NULL
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 15
     filtered: 10.00
        Extra: Using where; Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: e
   partitions: NULL
         type: ref
possible_keys: profile_id,event_date,comb
          key: comb
      key_len: 5
          ref: optimizer_issue.p.id
         rows: 75696
     filtered: 3.28
        Extra: Using where; Using index
2 rows in set, 1 warning (0.00 sec)

Response time: 40 seconds.

The problem here is the optimizer does not choose the right table order and decided to use temp table.

The right way is this:

mysql> explain select STRAIGHT_JOIN * from events_new_distr e join profiles p on e.profile_id = p.id where event_date between '2016-01-01' and '2016-01-01' + interval 15 day and status = 'active' and profile_id in (1,2,4,5,6,7,8,9,10,11,12,13,14,15,16) order by event_date desc limit 10\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: e
   partitions: NULL
         type: range
possible_keys: profile_id,event_date,comb
          key: event_date
      key_len: 6
          ref: NULL
         rows: 1703196
     filtered: 50.00
        Extra: Using index condition; Using where
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: p
   partitions: NULL
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: optimizer_issue.e.profile_id
         rows: 1
     filtered: 10.00
        Extra: Using where
2 rows in set, 1 warning (0.01 sec)

With ORDER BY + LIMIT 10 MySQL can:
1. Avoid creating temp table and scan the table in the order of the index
2. Abort the query execution when it will find 10 rows.

(That is ORDER BY + LIMIT optimization, http://dev.mysql.com/doc/refman/5.7/en/limit-optimization.html)

However, 75696*15 = 1135440 rows < 1703196 rows.
What is also interesting is that the second plan is not even considered in optimizer trace, attached.

How to repeat:
Smaller test case:

CREATE TABLE `profiles` (
  `id` int(11) NOT NULL DEFAULT '0' AUTO_INCREMENT,
  `email` varchar(255) DEFAULT NULL,
  `status` varchar(100) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `email` (`email`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

mysql> insert into profiles(status) values ('active');
Query OK, 1 row affected (0.00 sec)

...

mysql> insert into profiles (status) select status from profiles;
Query OK, 16384 rows affected (0.20 sec)
Records: 16384 Duplicates: 0 Warnings: 0

CREATE TABLE `events` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `event_date` datetime DEFAULT NULL,
  `profile_id` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `profile_id` (`profile_id`),
  KEY `event_date` (`event_date`),
  KEY `comb` (`profile_id`,`event_date`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

mysql> insert into events (event_date, profile_id) select now() - interval id second, id from profiles;
Query OK, 32768 rows affected (0.93 sec)
Records: 32768 Duplicates: 0 Warnings: 0

...

up to 425984 rows

mysql> update events set event_date = now() - interval id second;
Query OK, 425984 rows affected (15.88 sec)
Rows matched: 425984 Changed: 425984 Warnings: 0

mysql> update events set profile_id = 1;
Query OK, 425971 rows affected (15.41 sec)
Rows matched: 425984 Changed: 425971 Warnings: 0

mysql> update events set profile_id = 2 where mod(id, 1000) = 1;
Query OK, 426 rows affected (0.41 sec)
Rows matched: 426 Changed: 426 Warnings: 0

mysql> update events set profile_id = 3 where mod(id, 100) = 1;
Query OK, 4260 rows affected (0.55 sec)
Rows matched: 4260 Changed: 4260 Warnings: 0

mysql> update events set profile_id = 3 where mod(id, 5000) = 1;
Query OK, 0 rows affected (0.39 sec)
Rows matched: 84 Changed: 0 Warnings: 0

mysql> explain select * from events e join profiles p on e.profile_id = p.id where event_date between now() - interval 1 day and now() and status = 'active' and profile_id in (1,2,4,5,6,7,8,9,10,11,12,13,14,15,16) order by event_date desc limit 10\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: p
   partitions: NULL
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 15
     filtered: 10.00
        Extra: Using where; Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: e
   partitions: NULL
         type: ref
possible_keys: profile_id,event_date,comb
          key: profile_id
      key_len: 5
          ref: test.p.id
         rows: 1
     filtered: 28.74
        Extra: Using where
2 rows in set, 1 warning (0.00 sec)

mysql> select * from events e join profiles p on e.profile_id = p.id where event_date between now() - interval 1 day and now() and status = 'active' and profile_id in (1,2,4,5,6,7,8,9,10,11,12,13,14,15,16) order by event_date desc limit 10;
+----+---------------------+------------+----+-------+--------+
| id | event_date | profile_id | id | email | status |
+----+---------------------+------------+----+-------+--------+
| 2 | 2016-10-10 17:50:04 | 1 | 1 | NULL | active |
| 3 | 2016-10-10 17:50:03 | 1 | 1 | NULL | active |
| 4 | 2016-10-10 17:50:02 | 1 | 1 | NULL | active |
| 5 | 2016-10-10 17:50:01 | 1 | 1 | NULL | active |
| 6 | 2016-10-10 17:50:00 | 1 | 1 | NULL | active |
| 7 | 2016-10-10 17:49:59 | 1 | 1 | NULL | active |
| 8 | 2016-10-10 17:49:58 | 1 | 1 | NULL | active |
| 9 | 2016-10-10 17:49:57 | 1 | 1 | NULL | active |
| 10 | 2016-10-10 17:49:56 | 1 | 1 | NULL | active |
| 11 | 2016-10-10 17:49:55 | 1 | 1 | NULL | active |
+----+---------------------+------------+----+-------+--------+
10 rows in set (0.84 sec)

mysql> explain select STRAIGHT_JOIN * from events e join profiles p on e.profile_id = p.id where event_date between now() - interval 1 day and now() and status = 'active' and profile_id in (1,2,4,5,6,7,8,9,10,11,12,13,14,15,16) order by event_date desc limit 10\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: e
   partitions: NULL
         type: index
possible_keys: profile_id,event_date,comb
          key: event_date
      key_len: 6
          ref: NULL
         rows: 394
     filtered: 14.30
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: p
   partitions: NULL
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: test.e.profile_id
         rows: 1
     filtered: 10.00
        Extra: Using where
2 rows in set, 1 warning (0.00 sec)

mysql> select STRAIGHT_JOIN * from events e join profiles p on e.profile_id = p.id where event_date between now() - interval 1 day and now() and status = 'active' and profile_id in (1,2,4,5,6,7,8,9,10,11,12,13,14,15,16) order by event_date desc limit 10;
+----+---------------------+------------+----+-------+--------+
| id | event_date | profile_id | id | email | status |
+----+---------------------+------------+----+-------+--------+
| 2 | 2016-10-10 17:50:04 | 1 | 1 | NULL | active |
| 3 | 2016-10-10 17:50:03 | 1 | 1 | NULL | active |
| 4 | 2016-10-10 17:50:02 | 1 | 1 | NULL | active |
| 5 | 2016-10-10 17:50:01 | 1 | 1 | NULL | active |
| 6 | 2016-10-10 17:50:00 | 1 | 1 | NULL | active |
| 7 | 2016-10-10 17:49:59 | 1 | 1 | NULL | active |
| 8 | 2016-10-10 17:49:58 | 1 | 1 | NULL | active |
| 9 | 2016-10-10 17:49:57 | 1 | 1 | NULL | active |
| 10 | 2016-10-10 17:49:56 | 1 | 1 | NULL | active |
| 11 | 2016-10-10 17:49:55 | 1 | 1 | NULL | active |
+----+---------------------+------------+----+-------+--------+
10 rows in set (0.00 sec)

Suggested fix:
Apply the ORDER BY + LIMIT optimization to the table joins

Tags: upstream
tags: added: upstream
Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-1782

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.