Wrong statistics for InnoDB if persistent statistics used

Bug #1538765 reported by Sveta Smirnova on 2016-01-27
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
MySQL Server
Unknown
Unknown
Percona Server moved to https://jira.percona.com/projects/PS
Status tracked in 5.7
5.6
Triaged
Medium
Unassigned
5.7
Triaged
Medium
Unassigned

Bug Description

When persistent statistics is used for InnoDB it still updates cardinality with values which are incorrect.

Can be related to https://bugs.launchpad.net/percona-server/+bug/1484311 (but exists in 5.7.10) and http://bugs.mysql.com/bug.php?id=75428

How to repeat:

Option file:

--innodb_stats_auto_recalc=0 --innodb_stats_method=nulls_equal --innodb_stats_on_metadata=0 --innodb_stats_persistent=1 --innodb_stats_persistent_sample_pages=1000 --innodb_stats_sample_pages=8 --innodb_stats_transient_sample_pages=8

Test file:

--source include/have_innodb.inc

create table sale
(
id int primary key auto_increment,
customer_id int not null,
product_id int not null,
sale_time datetime not null,
sale_value decimal(10,2) not null,
filler varchar(250) not null,
key(customer_id, sale_time),
key(product_id)
)
engine innodb STATS_AUTO_RECALC=0 STATS_SAMPLE_PAGES=1000;

 insert into sale
(customer_id,product_id,sale_time,sale_value,filler)
values
(1,1,NOW(),100,LPAD('X',250,'X')),
(1,2,NOW(),200,LPAD('X',250,'X')),
(1,3,NOW(),300,LPAD('X',250,'X')),
(2,4,NOW(),100,LPAD('X',250,'X')),
(2,5,NOW(),200,LPAD('X',250,'X')),
(3,6,NOW(),100,LPAD('X',250,'X'));

insert into sale (customer_id, product_id, sale_time, sale_value, filler) select customer_id, product_id, NOW(), sale_value, filler from sale;
insert into sale (customer_id, product_id, sale_time, sale_value, filler) select customer_id, product_id, NOW(), sale_value, filler from sale;
insert into sale (customer_id, product_id, sale_time, sale_value, filler) select customer_id, product_id, NOW(), sale_value, filler from sale;
insert into sale (customer_id, product_id, sale_time, sale_value, filler) select customer_id, product_id, NOW(), sale_value, filler from sale;
insert into sale (customer_id, product_id, sale_time, sale_value, filler) select customer_id, product_id, NOW(), sale_value, filler from sale;
insert into sale (customer_id, product_id, sale_time, sale_value, filler) select customer_id, product_id, NOW(), sale_value, filler from sale;

show indexes from sale;
select count(distinct customer_id), count(distinct sale_time), count(distinct product_id) from sale;
select count(*), count(distinct customer_id), count(distinct customer_id, sale_time), count(distinct product_id) from sale;
analyze table sale;
show indexes from sale;

insert into sale (customer_id, product_id, sale_time, sale_value, filler) select customer_id, product_id, NOW(), sale_value, filler from sale where id < 441/5;
show indexes from sale;
select count(distinct customer_id), count(distinct sale_time), count(distinct product_id) from sale;
select count(*), count(distinct customer_id), count(distinct customer_id, sale_time), count(distinct product_id) from sale;
insert into sale (customer_id, product_id, sale_time, sale_value, filler) select customer_id, product_id, NOW(), sale_value, filler from sale where id < 581/2;
show indexes from sale;
select count(distinct customer_id), count(distinct sale_time), count(distinct product_id) from sale;
select count(*), count(distinct customer_id), count(distinct customer_id, sale_time), count(distinct product_id) from sale;

Actual result:

show indexes from sale;
Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment Index_comment
sale 0 PRIMARY 1 id A 128 NULL NULL BTREE
sale 1 customer_id 1 customer_id A 128 NULL NULL BTREE
sale 1 customer_id 2 sale_time A 128 NULL NULL BTREE
sale 1 product_id 1 product_id A 128 NULL NULL BTREE
select count(distinct customer_id), count(distinct sale_time), count(distinct product_id) from sale;
count(distinct customer_id) count(distinct sale_time) count(distinct product_id)
3 2 6
select count(*), count(distinct customer_id), count(distinct customer_id, sale_time), count(distinct product_id) from sale;
count(*) count(distinct customer_id) count(distinct customer_id, sale_time) count(distinct product_id)
384 3 6 6
analyze table sale;
Table Op Msg_type Msg_text
test.sale analyze status OK
show indexes from sale;
Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment Index_comment
sale 0 PRIMARY 1 id A 384 NULL NULL BTREE
sale 1 customer_id 1 customer_id A 6 NULL NULL BTREE
sale 1 customer_id 2 sale_time A 12 NULL NULL BTREE
sale 1 product_id 1 product_id A 12 NULL NULL BTREE
insert into sale (customer_id, product_id, sale_time, sale_value, filler) select customer_id, product_id, NOW(), sale_value, filler from sale where id < 441/5;
show indexes from sale;
Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment Index_comment
sale 0 PRIMARY 1 id A 461 NULL NULL BTREE
sale 1 customer_id 1 customer_id A 7 NULL NULL BTREE
sale 1 customer_id 2 sale_time A 14 NULL NULL BTREE
sale 1 product_id 1 product_id A 14 NULL NULL BTREE
select count(distinct customer_id), count(distinct sale_time), count(distinct product_id) from sale;
count(distinct customer_id) count(distinct sale_time) count(distinct product_id)
3 3 6
select count(*), count(distinct customer_id), count(distinct customer_id, sale_time), count(distinct product_id) from sale;
count(*) count(distinct customer_id) count(distinct customer_id, sale_time) count(distinct product_id)
461 3 9 6
insert into sale (customer_id, product_id, sale_time, sale_value, filler) select customer_id, product_id, NOW(), sale_value, filler from sale where id < 581/2;
show indexes from sale;
Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment Index_comment
sale 0 PRIMARY 1 id A 694 NULL NULL BTREE
sale 1 customer_id 1 customer_id A 10 NULL NULL BTREE
sale 1 customer_id 2 sale_time A 21 NULL NULL BTREE
sale 1 product_id 1 product_id A 21 NULL NULL BTREE
select count(distinct customer_id), count(distinct sale_time), count(distinct product_id) from sale;
count(distinct customer_id) count(distinct sale_time) count(distinct product_id)
3 4 6
select count(*), count(distinct customer_id), count(distinct customer_id, sale_time), count(distinct product_id) from sale;
count(*) count(distinct customer_id) count(distinct customer_id, sale_time) count(distinct product_id)
694 3 12 6

Expected result:

all calls of show indexes from sale; (except first one) should return

show indexes from sale;
Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment Index_comment
sale 0 PRIMARY 1 id A 384 NULL NULL BTREE
sale 1 customer_id 1 customer_id A 6 NULL NULL BTREE
sale 1 customer_id 2 sale_time A 12 NULL NULL BTREE
sale 1 product_id 1 product_id A 12 NULL NULL BTREE

tags: added: upstream
Sveta Smirnova (svetasmirnova) wrote :
Sveta Smirnova (svetasmirnova) wrote :

FLUSH TABLE fixes SHOW INDEX output.

Sveta Smirnova (svetasmirnova) wrote :

Analysis.

InnoDB stores persistent statistics and does not update it.

Optimizer does not use InnoDB statistics directly, but via class ha_statistics (sql/handler.h) which does not have any member to represent Cardinality. Instead it has member rec_per_key: number of repeated values in index in time when ANALYZE was running last time.

When InnoDB opens table first time it reads its persistent statistics using function ha_innodbase::info with flag HA_STATUS_CONST and stores it in memory. All subsequent table accesses call ha_innodbase::info with different flag: HA_STATUS_VARIABLE. Therefore function execution goes to another branch. In this branch in-memory data is used and also few stats are updated, particularly number of rows in the table (which is pretty accurate).

Optimizer, in its turn, uses number of rows in table (constantly changing) and divides it on rec_per_key value (persistent). Thus we see different cardinality after few updates.

George Ormond Lorch III (gl-az) wrote :

Correct Sveta, this is exactly how InnoDB (and TokuDB) stats work. Generally the rec_per_key is 'static' in that it only gets counted and stored during an ANALYZE TABLE (or auto analyze). The optimizer/SQL layer then uses this static rec_per_key data set that it gets from the ::info call with a 'dynamic' or constantly changing row count estimate, thus giving the appearance that the resulting cardinality calculation is changing when it is the row count that is changing. This is why inaccurate row counts (a nagging TokuDB issue) can cause cardinality calculation errors and thus sub-optimal index choices.

tags: added: sfr-108

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-1693

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.