TokuDB does not use an index even when cardinality is good
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Percona Server (moved to https://jira.percona.com/projects/PS) | Status tracked in 5.7 | | |
5.6 | Fix Released | High | George Ormond Lorch III |
5.7 | Fix Released | High | George Ormond Lorch III |
Bug Description
Please consider the following partitioned table:
CREATE TABLE `BIG_STORAGE_TOKU` (
`HASH_ID` char(64) NOT NULL,
`SERIALIZATIO
`COMPRESSION_
`RAW_DATA` mediumblob NOT NULL,
`LAST_UPDATE` datetime NOT NULL,
`EXPIRE_DATE` date NOT NULL,
KEY `HASH_ID_IX` (`HASH_ID`),
KEY `EXPIRE_DATE_IX` (`EXPIRE_DATE`)
) ENGINE=TokuDB DEFAULT CHARSET=latin1 ROW_FORMAT=
/*!50100 PARTITION BY RANGE (TO_DAYS(
(PARTITION p364 VALUES LESS THAN (736753) ENGINE = TokuDB,
PARTITION p365 VALUES LESS THAN (736754) ENGINE = TokuDB,
PARTITION p366 VALUES LESS THAN (736755) ENGINE = TokuDB,
PARTITION p367 VALUES LESS THAN (736756) ENGINE = TokuDB,
PARTITION p368 VALUES LESS THAN (736757) ENGINE = TokuDB,
PARTITION p369 VALUES LESS THAN (736758) ENGINE = TokuDB,
PARTITION p370 VALUES LESS THAN (736759) ENGINE = TokuDB,
PARTITION p371 VALUES LESS THAN (736760) ENGINE = TokuDB,
PARTITION p372 VALUES LESS THAN (736761) ENGINE = TokuDB,
PARTITION p373 VALUES LESS THAN (736762) ENGINE = TokuDB,
PARTITION p374 VALUES LESS THAN (736763) ENGINE = TokuDB,
PARTITION p375 VALUES LESS THAN (736764) ENGINE = TokuDB,
PARTITION p376 VALUES LESS THAN (736765) ENGINE = TokuDB,
PARTITION p377 VALUES LESS THAN (736766) ENGINE = TokuDB,
PARTITION p378 VALUES LESS THAN (736767) ENGINE = TokuDB,
PARTITION p379 VALUES LESS THAN (736768) ENGINE = TokuDB,
PARTITION p380 VALUES LESS THAN (736769) ENGINE = TokuDB,
PARTITION p381 VALUES LESS THAN (736770) ENGINE = TokuDB,
PARTITION p382 VALUES LESS THAN (736771) ENGINE = TokuDB,
PARTITION p383 VALUES LESS THAN (736772) ENGINE = TokuDB,
PARTITION p384 VALUES LESS THAN (736773) ENGINE = TokuDB,
PARTITION p385 VALUES LESS THAN (736774) ENGINE = TokuDB,
PARTITION p386 VALUES LESS THAN (736775) ENGINE = TokuDB,
PARTITION p387 VALUES LESS THAN (736776) ENGINE = TokuDB,
PARTITION p388 VALUES LESS THAN (736777) ENGINE = TokuDB,
PARTITION p389 VALUES LESS THAN (736778) ENGINE = TokuDB,
PARTITION p390 VALUES LESS THAN (736779) ENGINE = TokuDB,
PARTITION p391 VALUES LESS THAN (736780) ENGINE = TokuDB,
PARTITION p392 VALUES LESS THAN (736781) ENGINE = TokuDB,
PARTITION p393 VALUES LESS THAN (736782) ENGINE = TokuDB,
PARTITION p394 VALUES LESS THAN (736783) ENGINE = TokuDB,
PARTITION p395 VALUES LESS THAN (736784) ENGINE = TokuDB) */
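The partition boundaries are raw TO_DAYS() values. Since MySQL's TO_DAYS() counts days from year 0 (TO_DAYS('2000-01-01') is 730485, a constant offset of 365 from Python's proleptic-Gregorian ordinal), each boundary can be decoded back to a calendar date to confirm the partitions are one day wide. A quick sketch (the helper name is mine, not from the report):

```python
from datetime import date

# MySQL: TO_DAYS('2000-01-01') = 730485; Python: date(2000, 1, 1).toordinal() = 730120.
# The difference is a constant 365 days.
TO_DAYS_OFFSET = 365

def to_days_to_date(n: int) -> date:
    """Convert a MySQL TO_DAYS() value back to a calendar date."""
    return date.fromordinal(n - TO_DAYS_OFFSET)

# The first two partition boundaries from the table definition:
p364 = to_days_to_date(736753)
p365 = to_days_to_date(736754)
print(p364, p365)          # 2017-02-28 2017-03-01
print((p365 - p364).days)  # 1 -> each partition covers exactly one day
```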
Load the table with sysbench using the following Lua code (let it run until the row count reaches about 100,000):
function event(thread_id)
   local s1 = sb_rand_
   db_query("DELETE FROM BIG_STORAGE_INNO WHERE HASH_ID = SHA2('" .. s1 .. "', 256) AND EXPIRE_DATE > NOW()")
   db_query("INSERT INTO BIG_STORAGE_INNO VALUES(SHA2('" .. s1 .. "', 256), 'PROTOSTUFF', 'GZIP', REPEAT(
end
This will create a distribution of roughly 100,000 random records, each with a random payload between 0 and 16K, scattered across 10 partitions.
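Because each HASH_ID is SHA2(x, 256) of a random number, every key is a 64-character hex digest and the keys are effectively unique, which is why the HASH_ID_IX cardinality reported below matches the row count almost exactly. A minimal illustration using Python's hashlib (MySQL's SHA2(s, 256) is SHA-256; the helper name is mine):

```python
import hashlib

def mysql_sha2_256(s: str) -> str:
    """Equivalent of MySQL's SHA2(s, 256): a 64-character hex digest."""
    return hashlib.sha256(s.encode()).hexdigest()

# 100,000 distinct inputs -> 100,000 distinct keys (no collisions):
keys = {mysql_sha2_256(str(i)) for i in range(100_000)}
print(len(keys))                     # 100000 -> cardinality ~= row count
print(len(mysql_sha2_256("70164")))  # 64 -> matches the CHAR(64) HASH_ID column
```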
Run ANALYZE TABLE on it, then try the following EXPLAIN statements:
explain DELETE from BIG_STORAGE_TOKU where HASH_ID = SHA2('70164', 256);
explain SELECT * from BIG_STORAGE_TOKU where HASH_ID = SHA2('70164', 256);
You will see that TokuDB refuses to use any index.
InnoDB works fine in the exact same setup and the index is always chosen; see below.
We are experiencing this in production with much larger tables.
dbcache05>analyze table BIG_STORAGE_TOKU;
+------------------+---------+----------+----------+
| Table            | Op      | Msg_type | Msg_text |
+------------------+---------+----------+----------+
| sbtest.
+------------------+---------+----------+----------+
1 row in set (0.42 sec)
dbcache05>show indexes from BIG_STORAGE_TOKU;
+------------------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table            | Non_unique | Key_name       | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+------------------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| BIG_STORAGE_TOKU |          1 | HASH_ID_IX     |            1 | HASH_ID     | A         |      100024 |     NULL | NULL   |      | BTREE      |         |               |
| BIG_STORAGE_TOKU |          1 | EXPIRE_DATE_IX |            1 | EXPIRE_DATE | A         |          19 |     NULL | NULL   |      | BTREE      |         |               |
+------------------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
2 rows in set (0.02 sec)
dbcache05>explain select * from BIG_STORAGE_TOKU where HASH_ID = SHA2('70164', 256);
+----+-------------+------------------+-------+---------------+------------+---------+-------+--------+-------------+
| id | select_type | table            | type  | possible_keys | key        | key_len | ref   | rows   | Extra       |
+----+-------------+------------------+-------+---------------+------------+---------+-------+--------+-------------+
|  1 | SIMPLE      | BIG_STORAGE_TOKU | ALL   | HASH_ID_IX    | NULL       | NULL    | NULL  | 100024 | Using where |
+----+-------------+------------------+-------+---------------+------------+---------+-------+--------+-------------+
1 row in set (0.00 sec)
dbcache05>explain delete from BIG_STORAGE_TOKU where HASH_ID = SHA2('70164', 256);
+----+-------------+------------------+-------+---------------+------------+---------+-------+--------+-------------+
| id | select_type | table            | type  | possible_keys | key        | key_len | ref   | rows   | Extra       |
+----+-------------+------------------+-------+---------------+------------+---------+-------+--------+-------------+
|  1 | SIMPLE      | BIG_STORAGE_TOKU | ALL   | HASH_ID_IX    | NULL       | NULL    | NULL  | 100024 | Using where |
+----+-------------+------------------+-------+---------------+------------+---------+-------+--------+-------------+
1 row in set (0.00 sec)
This is the result of identical steps, but using InnoDB:
dbcache05>analyze table BIG_STORAGE_INNO;
+------------------+---------+----------+----------+
| Table            | Op      | Msg_type | Msg_text |
+------------------+---------+----------+----------+
| sbtest.
+------------------+---------+----------+----------+
1 row in set (0.10 sec)
dbcache05>show indexes from BIG_STORAGE_INNO;
+------------------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table            | Non_unique | Key_name       | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+------------------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| BIG_STORAGE_INNO |          1 | HASH_ID_IX     |            1 | HASH_ID     | A         |       64665 |     NULL | NULL   |      | BTREE      |         |               |
| BIG_STORAGE_INNO |          1 | EXPIRE_DATE_IX |            1 | EXPIRE_DATE | A         |          14 |     NULL | NULL   |      | BTREE      |         |               |
+------------------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
2 rows in set (0.01 sec)
dbcache05>explain select * from BIG_STORAGE_INNO where HASH_ID = SHA2('70164', 256);
+----+-------------+------------------+-------+---------------+------------+---------+-------+--------+-------------+
| id | select_type | table            | type  | possible_keys | key        | key_len | ref   | rows   | Extra       |
+----+-------------+------------------+-------+---------------+------------+---------+-------+--------+-------------+
|  1 | SIMPLE      | BIG_STORAGE_INNO | ref   | HASH_ID_IX    | HASH_ID_IX | 64      | const |      7 | Using where |
+----+-------------+------------------+-------+---------------+------------+---------+-------+--------+-------------+
1 row in set (0.00 sec)
dbcache05>explain delete from BIG_STORAGE_INNO where HASH_ID = SHA2('70164', 256);
+----+-------------+------------------+-------+---------------+------------+---------+-------+--------+-------------+
| id | select_type | table            | type  | possible_keys | key        | key_len | ref   | rows   | Extra       |
+----+-------------+------------------+-------+---------------+------------+---------+-------+--------+-------------+
|  1 | SIMPLE      | BIG_STORAGE_INNO | range | HASH_ID_IX    | HASH_ID_IX | 64      | const |      7 | Using where |
+----+-------------+------------------+-------+---------------+------------+---------+-------+--------+-------------+
1 row in set (0.00 sec)
Here's the optimizer trace, which shows that MySQL thinks the cost of an index scan is higher than the cost of a full table scan. This is, of course, wrong.
{
  "join_preparation": {
    "expanded_query": "/* select#1 */ select `BIG_STORAGE_TOKU`.`EXPIRE_DATE` AS `EXPIRE_DATE` from `BIG_STORAGE_TOKU` where (`BIG_STORAGE_TOKU`.`HASH_ID` = sha2('70164',256))"
  },
  "join_optimization": {
    "condition_processing": {
      "condition": "WHERE",
      "original_condition": "(`BIG_STORAGE_TOKU`.`HASH_ID` = sha2('70164',256))",
      "steps": [
        {
          "transformation": "equality_propagation",
          "resulting_condition": "multiple equal(sha2('70164',256), `BIG_STORAGE_TOKU`.`HASH_ID`)"
        },
        {
          "transformation": "constant_propagation",
          "resulting_condition": "multiple equal(sha2('70164',256), `BIG_STORAGE_TOKU`.`HASH_ID`)"
        },
        {
          "transformation": "trivial_condition_removal",
          "resulting_condition": "multiple equal(sha2('70164',256), `BIG_STORAGE_TOKU`.`HASH_ID`)"
        }
      ]
    },
    "table_dependencies": [
      {
        "table": "`BIG_STORAGE_TOKU`",
        "row_may_be_null": false,
        "map_bit": 0,
        "depends_on_map_bits": []
      }
    ],
    "ref_optimizer_key_uses": [
      {
        "table": "`BIG_STORAGE_TOKU`",
        "field": "HASH_ID",
        "equals": "sha2('70164',256)",
        "null_rejecting": false
      }
    ],
    "rows_estimation": [
      {
        "table": "`BIG_STORAGE_TOKU`",
        "range_analysis": {
          "table_scan": {
            "rows": 100030,
            "cost": 53351
          },
          "potential_range_indices": [
            {
              "index": "HASH_ID_IX",
              "usable": true,
              "key_parts": ["HASH_ID"]
            },
            {
              "index": "EXPIRE_DATE_IX",
              "usable": false,
              "cause": "not_applicable"
            }
          ],
          "setup_range_conditions": [],
          "group_index_range": {
            "chosen": false,
            "cause": "not_group_by_or_distinct"
          },
          "analyzing_range_alternatives": {
            "range_scan_alternatives": [
              {
                "index": "HASH_ID_IX",
                "ranges": [
                  "ffff925a44d...
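A back-of-envelope check of the trace numbers: the table scan is costed at 53,351 for ~100,030 rows, while an equality lookup on an index whose cardinality (100,024) matches the row count should touch about one row, so any plausible per-row cost makes the index plan cheaper by orders of magnitude. A sketch (the 1.0 per-row lookup cost is an illustrative assumption, not a server constant):

```python
# Figures taken from the optimizer trace and SHOW INDEXES output above.
total_rows = 100030       # "table_scan" rows estimate
table_scan_cost = 53351   # "table_scan" cost estimate
cardinality = 100024      # HASH_ID_IX cardinality after ANALYZE TABLE

# Rows expected to match an equality predicate on HASH_ID:
rows_per_key = max(1, round(total_rows / cardinality))
print(rows_per_key)       # 1 -> the lookup should examine ~1 row

# Even with a generous assumed cost of 1.0 per examined row,
# the index lookup is vastly cheaper than the full scan:
assumed_lookup_cost = rows_per_key * 1.0
print(table_scan_cost / assumed_lookup_cost)  # 53351.0 -> full scan should never win
```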