Optimizer chooses wrong plan when joining 2 tables
Affects | Status | Importance | Assigned to | Milestone | ||
---|---|---|---|---|---|---|
MySQL Server |
Unknown
|
Unknown
|
||||
Percona Server moved to https://jira.percona.com/projects/PS | Status tracked in 5.7 | |||||
5.5 |
New
|
Undecided
|
Unassigned | |||
5.6 |
Triaged
|
Medium
|
Unassigned | |||
5.7 |
Triaged
|
Medium
|
Unassigned |
Bug Description
Upstream bug: https:/
Affects Percona Server 5.6 and 5.7 as well
Optimizer chooses wrong plan when joining 2 tables:
mysql> explain select * from events_new_distr e join profiles p on e.profile_id = p.id where event_date between '2016-01-01' and '2016-01-01' + interval 15 day and status = 'active' and profile_id in (1,2,4,
*******
id: 1
select_type: SIMPLE
table: p
partitions: NULL
type: range
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: NULL
rows: 15
filtered: 10.00
Extra: Using where; Using temporary; Using filesort
*******
id: 1
select_type: SIMPLE
table: e
partitions: NULL
type: ref
possible_keys: profile_
key: comb
key_len: 5
ref: optimizer_
rows: 75696
filtered: 3.28
Extra: Using where; Using index
2 rows in set, 1 warning (0.00 sec)
Response time: 40 seconds.
The problem here is the optimizer does not choose the right table order and decided to use temp table.
The right way is this:
mysql> explain select STRAIGHT_JOIN * from events_new_distr e join profiles p on e.profile_id = p.id where event_date between '2016-01-01' and '2016-01-01' + interval 15 day and status = 'active' and profile_id in (1,2,4,
*******
id: 1
select_type: SIMPLE
table: e
partitions: NULL
type: range
possible_keys: profile_
key: event_date
key_len: 6
ref: NULL
rows: 1703196
filtered: 50.00
Extra: Using index condition; Using where
*******
id: 1
select_type: SIMPLE
table: p
partitions: NULL
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: optimizer_
rows: 1
filtered: 10.00
Extra: Using where
2 rows in set, 1 warning (0.01 sec)
With ORDER BY + LIMIT 10 MySQL can:
1. Avoid creating temp table and scan the table in the order of the index
2. Abort the query execution when it will find 10 rows.
(That is ORDER BY + LIMIT optimization, http://
However, 75696*15 = 1135440 rows < 1703196 rows.
What is also interesting is that the second plan is not even considered in optimizer trace, attached.
How to repeat:
Smaller test case:
CREATE TABLE `profiles` (
`id` int(11) NOT NULL DEFAULT '0' AUTO_INCREMENT,
`email` varchar(255) DEFAULT NULL,
`status` varchar(100) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `email` (`email`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
mysql> insert into profiles(status) values ('active');
Query OK, 1 row affected (0.00 sec)
...
mysql> insert into profiles (status) select status from profiles;
Query OK, 16384 rows affected (0.20 sec)
Records: 16384 Duplicates: 0 Warnings: 0
CREATE TABLE `events` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`event_date` datetime DEFAULT NULL,
`profile_id` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `profile_id` (`profile_id`),
KEY `event_date` (`event_date`),
KEY `comb` (`profile_
) ENGINE=InnoDB DEFAULT CHARSET=latin1
mysql> insert into events (event_date, profile_id) select now() - interval id second, id from profiles;
Query OK, 32768 rows affected (0.93 sec)
Records: 32768 Duplicates: 0 Warnings: 0
...
up to 425984 rows
mysql> update events set event_date = now() - interval id second;
Query OK, 425984 rows affected (15.88 sec)
Rows matched: 425984 Changed: 425984 Warnings: 0
mysql> update events set profile_id = 1;
Query OK, 425971 rows affected (15.41 sec)
Rows matched: 425984 Changed: 425971 Warnings: 0
mysql> update events set profile_id = 2 where mod(id, 1000) = 1;
Query OK, 426 rows affected (0.41 sec)
Rows matched: 426 Changed: 426 Warnings: 0
mysql> update events set profile_id = 3 where mod(id, 100) = 1;
Query OK, 4260 rows affected (0.55 sec)
Rows matched: 4260 Changed: 4260 Warnings: 0
mysql> update events set profile_id = 3 where mod(id, 5000) = 1;
Query OK, 0 rows affected (0.39 sec)
Rows matched: 84 Changed: 0 Warnings: 0
mysql> explain select * from events e join profiles p on e.profile_id = p.id where event_date between now() - interval 1 day and now() and status = 'active' and profile_id in (1,2,4,
*******
id: 1
select_type: SIMPLE
table: p
partitions: NULL
type: range
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: NULL
rows: 15
filtered: 10.00
Extra: Using where; Using temporary; Using filesort
*******
id: 1
select_type: SIMPLE
table: e
partitions: NULL
type: ref
possible_keys: profile_
key: profile_id
key_len: 5
ref: test.p.id
rows: 1
filtered: 28.74
Extra: Using where
2 rows in set, 1 warning (0.00 sec)
mysql> select * from events e join profiles p on e.profile_id = p.id where event_date between now() - interval 1 day and now() and status = 'active' and profile_id in (1,2,4,
+----+-
| id | event_date | profile_id | id | email | status |
+----+-
| 2 | 2016-10-10 17:50:04 | 1 | 1 | NULL | active |
| 3 | 2016-10-10 17:50:03 | 1 | 1 | NULL | active |
| 4 | 2016-10-10 17:50:02 | 1 | 1 | NULL | active |
| 5 | 2016-10-10 17:50:01 | 1 | 1 | NULL | active |
| 6 | 2016-10-10 17:50:00 | 1 | 1 | NULL | active |
| 7 | 2016-10-10 17:49:59 | 1 | 1 | NULL | active |
| 8 | 2016-10-10 17:49:58 | 1 | 1 | NULL | active |
| 9 | 2016-10-10 17:49:57 | 1 | 1 | NULL | active |
| 10 | 2016-10-10 17:49:56 | 1 | 1 | NULL | active |
| 11 | 2016-10-10 17:49:55 | 1 | 1 | NULL | active |
+----+-
10 rows in set (0.84 sec)
mysql> explain select STRAIGHT_JOIN * from events e join profiles p on e.profile_id = p.id where event_date between now() - interval 1 day and now() and status = 'active' and profile_id in (1,2,4,
*******
id: 1
select_type: SIMPLE
table: e
partitions: NULL
type: index
possible_keys: profile_
key: event_date
key_len: 6
ref: NULL
rows: 394
filtered: 14.30
Extra: Using where
*******
id: 1
select_type: SIMPLE
table: p
partitions: NULL
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: test.e.profile_id
rows: 1
filtered: 10.00
Extra: Using where
2 rows in set, 1 warning (0.00 sec)
mysql> select STRAIGHT_JOIN * from events e join profiles p on e.profile_id = p.id where event_date between now() - interval 1 day and now() and status = 'active' and profile_id in (1,2,4,
+----+-
| id | event_date | profile_id | id | email | status |
+----+-
| 2 | 2016-10-10 17:50:04 | 1 | 1 | NULL | active |
| 3 | 2016-10-10 17:50:03 | 1 | 1 | NULL | active |
| 4 | 2016-10-10 17:50:02 | 1 | 1 | NULL | active |
| 5 | 2016-10-10 17:50:01 | 1 | 1 | NULL | active |
| 6 | 2016-10-10 17:50:00 | 1 | 1 | NULL | active |
| 7 | 2016-10-10 17:49:59 | 1 | 1 | NULL | active |
| 8 | 2016-10-10 17:49:58 | 1 | 1 | NULL | active |
| 9 | 2016-10-10 17:49:57 | 1 | 1 | NULL | active |
| 10 | 2016-10-10 17:49:56 | 1 | 1 | NULL | active |
| 11 | 2016-10-10 17:49:55 | 1 | 1 | NULL | active |
+----+-
10 rows in set (0.00 sec)
Suggested fix:
Apply the ORDER BY + LIMIT optimization to the table joins
tags: | added: upstream |
Percona now uses JIRA for bug reports so this bug report is migrated to: https:/ /jira.percona. com/browse/ PS-1782