innodb index stats inadequate using constant innodb_stats_sample_pages

Bug #1394483 reported by Jan Lindström
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
MySQL Server
Unknown
Unknown
Percona Server moved to https://jira.percona.com/projects/PS
Status tracked in 5.7
5.5
Triaged
Medium
Unassigned
5.6
Triaged
Medium
Unassigned
5.7
Triaged
Medium
Unassigned

Bug Description

If you set the number of analyzed pages to very low number compared to actual pages on that table/index it randomly pics those pages (default 8 pages), this leads to fact that query after analyze table returns different results. If the index tree is small, smaller than 10 * n_sample_pages + total_external_size, then the estimate is ok. For bigger index trees it is common that we do not see any borders between key values in the few pages we pick. But still there may be n_sample_pages different key values, or even more. And it just tries to approximate to n_sample_pages (8).

How to repeat:

{noformat}
drop table if exists obs_daily;
CREATE TABLE `obs_daily` ( `iId` int(11) NOT NULL AUTO_INCREMENT, `sLocType` varchar(10) NOT NULL DEFAULT '', `sLocCode` varchar(30) NOT NULL DEFAULT '', `dtDate` date NOT NULL DEFAULT '0000-00-00', PRIMARY KEY (`iId`), UNIQUE KEY `idx_obs_daily_1` (`sLocType`,`sLocCode`,`dtDate`), KEY `idx_obs_daily_2` (`dtDate`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=DYNAMIC;
insert into obs_daily values (null, 'A', 'c1', '2014-11-11'),(null, 'A', 'c2', '2014-11-11'),(null, 'A', 'c3', '2014-11-11'),(null, 'A', 'c4', '2014-11-11'),(null, 'A', 'c5', '2014-11-11'),(null, 'A', 'c6', '2014-11-11'),(null, 'A', 'c7', '2014-11-11'),(null, 'A', 'c8', '2014-11-11');
insert into obs_daily select null, 'A', sLocCode, '2014-08-12' from obs_daily;
insert into obs_daily select null, 'A', concat(sLocCode,iId), '2014-08-13' from obs_daily;
insert into obs_daily select null, 'A', concat(sLocCode,iId), '2014-08-12' from obs_daily;
insert into obs_daily select null, 'A', concat(sLocCode,iId), '2014-08-11' from obs_daily;
insert into obs_daily select null, 'A', concat(sLocCode,iId), '2014-08-10' from obs_daily;
insert into obs_daily select null, 'A', concat(sLocCode,iId), '2014-08-16' from obs_daily;
insert into obs_daily select null, 'A', concat(sLocCode,iId), '2014-08-19' from obs_daily;
insert into obs_daily select null, 'A', concat(sLocCode,iId), '2014-08-22' from obs_daily;
insert into obs_daily select null, 'A', concat(sLocCode,iId), '2014-08-21' from obs_daily;
insert into obs_daily select null, 'A', concat(sLocCode,iId), '2014-08-31' from obs_daily;
insert into obs_daily select null, 'A', concat(sLocCode,iId), '2014-08-30' from obs_daily;
insert into obs_daily select null, 'A', concat('code-1-',iId), '2014-08-31' from obs_daily;
insert into obs_daily select null, 'A', concat('code-2-',iId), '2014-08-31' from obs_daily;
insert into obs_daily select null, 'A', concat('code-2-',iId), '2014-08-29' from obs_daily;
insert into obs_daily select null, 'A', concat('code-3-',iId), '2014-08-09' from obs_daily;
insert into obs_daily select null, 'A', concat('code-4-',iId), '2014-08-05' from obs_daily;
insert into obs_daily select null, 'A', concat('code-5-',iId), '2014-08-03' from obs_daily;
select * from information_schema.INNODB_INDEX_STATS where table_name='obs_daily';
set global innodb_stats_sample_pages = 8;
analyze table obs_daily;
select * from information_schema.INNODB_INDEX_STATS where table_name='obs_daily';
analyze table obs_daily;
select * from information_schema.INNODB_INDEX_STATS where table_name='obs_daily';
analyze table obs_daily;
select * from information_schema.INNODB_INDEX_STATS where table_name='obs_daily';
{noformat}

Output can be following:

{noformat}
MariaDB [test]> select count(*) from obs_daily;
+----------+
| count(*) |
+----------+
| 1048576 |
+----------+
1 row in set (2.22 sec)

MariaDB [test]> select count(distinct sLocType) from obs_daily;
+--------------------------+
| count(distinct sLocType) |
+--------------------------+
| 1 |
+--------------------------+
1 row in set (0.01 sec)

MariaDB [test]> select count(distinct sLocCode) from obs_daily;
+--------------------------+
| count(distinct sLocCode) |
+--------------------------+
| 1007624 |
+--------------------------+
1 row in set (5.65 sec)

MariaDB [test]> select count(distinct dtDate) from obs_daily;
+------------------------+
| count(distinct dtDate) |
+------------------------+
| 15 |
+------------------------+
1 row in set (0.00 sec)

MariaDB [test]> set global innodb_stats_sample_pages = 8;
Query OK, 0 rows affected (0.00 sec)

MariaDB [test]> analyze table obs_daily;
+----------------+---------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+----------------+---------+----------+----------+
| test.obs_daily | analyze | status | OK |
+----------------+---------+----------+----------+
1 row in set (0.02 sec)

MariaDB [test]> select * from information_schema.INNODB_INDEX_STATS where table_name='obs_daily';
+--------------+------------+-----------------+--------+--------------+-------------------+------------------+
| table_schema | table_name | index_name | fields | rows_per_key | index_total_pages | index_leaf_pages |
+--------------+------------+-----------------+--------+--------------+-------------------+------------------+
| test | obs_daily | PRIMARY | 1 | 1 | 2852 | 2844 |
| test | obs_daily | idx_obs_daily_1 | 3 | 115183, 0, 0 | 3628 | 3138 |
| test | obs_daily | idx_obs_daily_2 | 2 | 115183, 0 | 865 | 814 |
+--------------+------------+-----------------+--------+--------------+-------------------+------------------+
3 rows in set (0.00 sec)

MariaDB [test]> analyze table obs_daily;
+----------------+---------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+----------------+---------+----------+----------+
| test.obs_daily | analyze | status | OK |
+----------------+---------+----------+----------+
1 row in set (0.02 sec)

MariaDB [test]> select * from information_schema.INNODB_INDEX_STATS where table_name='obs_daily';
+--------------+------------+-----------------+--------+--------------+-------------------+------------------+
| table_schema | table_name | index_name | fields | rows_per_key | index_total_pages | index_leaf_pages |
+--------------+------------+-----------------+--------+--------------+-------------------+------------------+
| test | obs_daily | PRIMARY | 1 | 1 | 2852 | 2844 |
| test | obs_daily | idx_obs_daily_1 | 3 | 116842, 1, 0 | 3628 | 3138 |
| test | obs_daily | idx_obs_daily_2 | 2 | 116842, 0 | 865 | 814 |
+--------------+------------+-----------------+--------+--------------+-------------------+------------------+
3 rows in set (0.00 sec)

MariaDB [test]> analyze table obs_daily;
+----------------+---------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+----------------+---------+----------+----------+
| test.obs_daily | analyze | status | OK |
+----------------+---------+----------+----------+
1 row in set (0.02 sec)

MariaDB [test]> select * from information_schema.INNODB_INDEX_STATS where table_name='obs_daily';
+--------------+------------+-----------------+--------+--------------+-------------------+------------------+
| table_schema | table_name | index_name | fields | rows_per_key | index_total_pages | index_leaf_pages |
+--------------+------------+-----------------+--------+--------------+-------------------+------------------+
| test | obs_daily | PRIMARY | 1 | 1 | 2852 | 2844 |
| test | obs_daily | idx_obs_daily_1 | 3 | 116407, 1, 1 | 3628 | 3138 |
| test | obs_daily | idx_obs_daily_2 | 2 | 116407, 0 | 865 | 814 |
+--------------+------------+-----------------+--------+--------------+-------------------+------------------+
3 rows in set (0.00 sec)

{noformat}

See https://mariadb.atlassian.net/browse/MDEV-7084 for more discussion.

Revision history for this message
Jan Lindström (jplindst) wrote :
Revision history for this message
Jan Lindström (jplindst) wrote :
Revision history for this message
Jan Lindström (jplindst) wrote :
tags: added: userstat
Revision history for this message
Valerii Kravchuk (valerii-kravchuk) wrote :

So, this is essentially what's suggested in the patch (with new variables to control behavior and tests to check all these):

...
+ /* New logaritmic number of pages that are estimated. We
+ first pick minimun from srv_stats_sample_pages and number of
+ pages on index. Then we pick maximum from previous number of
+ pages and log2(number of index pages) * srv_stats_sample_pages. */
   if (index->stat_index_size > 0) {
- n_sample_pages = index->stat_index_size;
+ n_sample_pages = ut_max(ut_min(srv_stats_sample_pages, index->stat_index_size),
+ log2(index->stat_index_size)*srv_stats_sample_pages);
   } else {
    n_sample_pages = 1;
...

This may surely work better than default setting or even any one picked by DBA based on some reasoning. So, I think we should consider this feature seriously (for 5.5 and for tables that do not use persistent statistics in 5.6).

tags: added: contribution upstream
Revision history for this message
Jan Lindström (jplindst) wrote :
Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-1585

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.