range_query_buff and blob_buff in ha_tokudb handler could easily cause OOM
Affects | Status | Importance | Assigned to
---|---|---|---
Percona Server (moved to https://jira.percona.com/projects/PS) | Status tracked in 5.7 | |
5.5 | Invalid | Undecided | Unassigned
5.6 | Confirmed | Undecided | Unassigned
5.7 | Confirmed | Undecided | Unassigned
Bug Description
I tested this with PS 5.6.36-82.0.
## test case as follows:
###
### prepare stage, create a TokuDB partitioned table with 100 partitions
###
#### change the connection string as needed
MYSQL="mysql -h127.0.0.1 -P70011 -uroot"
$MYSQL <<EOF
CREATE TABLE test.t1(
id int,
uid int,
post longtext,
index idx_uid (uid)
) ENGINE=TokuDB
PARTITION BY RANGE(uid) (
PARTITION p0 VALUES LESS THAN (100),
PARTITION p1 VALUES LESS THAN (200),
PARTITION p2 VALUES LESS THAN (300),
PARTITION p3 VALUES LESS THAN (400),
PARTITION p4 VALUES LESS THAN (500),
PARTITION p5 VALUES LESS THAN (600),
PARTITION p6 VALUES LESS THAN (700),
PARTITION p7 VALUES LESS THAN (800),
PARTITION p8 VALUES LESS THAN (900),
PARTITION p9 VALUES LESS THAN (1000),
PARTITION p10 VALUES LESS THAN (1100),
PARTITION p11 VALUES LESS THAN (1200),
PARTITION p12 VALUES LESS THAN (1300),
PARTITION p13 VALUES LESS THAN (1400),
PARTITION p14 VALUES LESS THAN (1500),
PARTITION p15 VALUES LESS THAN (1600),
PARTITION p16 VALUES LESS THAN (1700),
PARTITION p17 VALUES LESS THAN (1800),
PARTITION p18 VALUES LESS THAN (1900),
PARTITION p19 VALUES LESS THAN (2000),
PARTITION p20 VALUES LESS THAN (2100),
PARTITION p21 VALUES LESS THAN (2200),
PARTITION p22 VALUES LESS THAN (2300),
PARTITION p23 VALUES LESS THAN (2400),
PARTITION p24 VALUES LESS THAN (2500),
PARTITION p25 VALUES LESS THAN (2600),
PARTITION p26 VALUES LESS THAN (2700),
PARTITION p27 VALUES LESS THAN (2800),
PARTITION p28 VALUES LESS THAN (2900),
PARTITION p29 VALUES LESS THAN (3000),
PARTITION p30 VALUES LESS THAN (3100),
PARTITION p31 VALUES LESS THAN (3200),
PARTITION p32 VALUES LESS THAN (3300),
PARTITION p33 VALUES LESS THAN (3400),
PARTITION p34 VALUES LESS THAN (3500),
PARTITION p35 VALUES LESS THAN (3600),
PARTITION p36 VALUES LESS THAN (3700),
PARTITION p37 VALUES LESS THAN (3800),
PARTITION p38 VALUES LESS THAN (3900),
PARTITION p39 VALUES LESS THAN (4000),
PARTITION p40 VALUES LESS THAN (4100),
PARTITION p41 VALUES LESS THAN (4200),
PARTITION p42 VALUES LESS THAN (4300),
PARTITION p43 VALUES LESS THAN (4400),
PARTITION p44 VALUES LESS THAN (4500),
PARTITION p45 VALUES LESS THAN (4600),
PARTITION p46 VALUES LESS THAN (4700),
PARTITION p47 VALUES LESS THAN (4800),
PARTITION p48 VALUES LESS THAN (4900),
PARTITION p49 VALUES LESS THAN (5000),
PARTITION p50 VALUES LESS THAN (5100),
PARTITION p51 VALUES LESS THAN (5200),
PARTITION p52 VALUES LESS THAN (5300),
PARTITION p53 VALUES LESS THAN (5400),
PARTITION p54 VALUES LESS THAN (5500),
PARTITION p55 VALUES LESS THAN (5600),
PARTITION p56 VALUES LESS THAN (5700),
PARTITION p57 VALUES LESS THAN (5800),
PARTITION p58 VALUES LESS THAN (5900),
PARTITION p59 VALUES LESS THAN (6000),
PARTITION p60 VALUES LESS THAN (6100),
PARTITION p61 VALUES LESS THAN (6200),
PARTITION p62 VALUES LESS THAN (6300),
PARTITION p63 VALUES LESS THAN (6400),
PARTITION p64 VALUES LESS THAN (6500),
PARTITION p65 VALUES LESS THAN (6600),
PARTITION p66 VALUES LESS THAN (6700),
PARTITION p67 VALUES LESS THAN (6800),
PARTITION p68 VALUES LESS THAN (6900),
PARTITION p69 VALUES LESS THAN (7000),
PARTITION p70 VALUES LESS THAN (7100),
PARTITION p71 VALUES LESS THAN (7200),
PARTITION p72 VALUES LESS THAN (7300),
PARTITION p73 VALUES LESS THAN (7400),
PARTITION p74 VALUES LESS THAN (7500),
PARTITION p75 VALUES LESS THAN (7600),
PARTITION p76 VALUES LESS THAN (7700),
PARTITION p77 VALUES LESS THAN (7800),
PARTITION p78 VALUES LESS THAN (7900),
PARTITION p79 VALUES LESS THAN (8000),
PARTITION p80 VALUES LESS THAN (8100),
PARTITION p81 VALUES LESS THAN (8200),
PARTITION p82 VALUES LESS THAN (8300),
PARTITION p83 VALUES LESS THAN (8400),
PARTITION p84 VALUES LESS THAN (8500),
PARTITION p85 VALUES LESS THAN (8600),
PARTITION p86 VALUES LESS THAN (8700),
PARTITION p87 VALUES LESS THAN (8800),
PARTITION p88 VALUES LESS THAN (8900),
PARTITION p89 VALUES LESS THAN (9000),
PARTITION p90 VALUES LESS THAN (9100),
PARTITION p91 VALUES LESS THAN (9200),
PARTITION p92 VALUES LESS THAN (9300),
PARTITION p93 VALUES LESS THAN (9400),
PARTITION p94 VALUES LESS THAN (9500),
PARTITION p95 VALUES LESS THAN (9600),
PARTITION p96 VALUES LESS THAN (9700),
PARTITION p97 VALUES LESS THAN (9800),
PARTITION p98 VALUES LESS THAN (9900),
PARTITION p99 VALUES LESS THAN MAXVALUE
);
EOF
#### populate the partitions, one 10MB longtext record per partition (iterations past i=98 all land in the MAXVALUE partition)
for i in `seq 0 199`; do value=$((($i + 1) * 100 - 1)); $MYSQL -e "insert into test.t1 values($i, $value, repeat('MySQL', 2*1024*1024));"; done
###
### sysbench pressure stage: run the Lua script below with sysbench 0.5 and 64 threads
###
pathtest = string.match(test, "(.*/)")
if pathtest then
   dofile(pathtest .. "common.lua")
else
   require("common")
end

function thread_init(thread_id)
   set_vars()
end

function event(thread_id)
   -- use a bad query to make sure every partition is searched
   db_query("select * from t1 where id = 100")
end
###
### monitor stage, use top or `pidstat -r` to monitor mysqld memory usage
### we can see that RSS memory usage will grow up to 129G !!!!!!
###
### And even more memory is consumed as the number of sysbench threads increases.
###
08:28:39 PM PID minflt/s majflt/s VSZ RSS %MEM Command
08:28:40 PM 15748 0.00 0.00 7963124 437492 0.08 mysqld
08:28:41 PM 15748 0.00 0.00 7963124 437492 0.08 mysqld
08:28:42 PM 15748 166724.00 0.00 8596212 671604 0.13 mysqld
08:28:43 PM 15748 249058.00 0.00 9456372 971248 0.18 mysqld
08:28:44 PM 15748 180253.00 0.00 10873588 1446404 0.27 mysqld
08:28:45 PM 15748 1173937.00 0.00 15592180 4637980 0.88 mysqld
08:28:46 PM 15748 2926548.00 0.00 28386036 15495996 2.93 mysqld
08:28:47 PM 15748 2972244.00 0.00 42666740 27307432 5.16 mysqld
08:28:48 PM 15748 2987646.00 0.00 56935156 39181144 7.40 mysqld
08:28:49 PM 15748 2988845.00 0.00 71178996 51060980 9.65 mysqld
08:28:50 PM 15748 2996056.00 0.00 85445364 62969896 11.90 mysqld
08:28:51 PM 15748 2984125.00 0.00 99676916 74830384 14.14 mysqld
08:28:52 PM 15748 2983838.00 0.00 113871604 86689420 16.38 mysqld
08:28:53 PM 15748 2975173.00 0.00 128140020 98513336 18.61 mysqld
08:28:54 PM 15748 2962377.00 0.00 142236404 110287912 20.84 mysqld
08:28:55 PM 15748 2977520.00 0.00 156381940 122122028 23.07 mysqld
08:28:56 PM 15748 2881326.00 0.00 170052340 133579136 25.24 mysqld
08:28:57 PM 15748 348169.00 0.00 171418356 134846956 25.48 mysqld
08:28:58 PM 15748 79710.00 0.00 171760372 135022380 25.51 mysqld
08:28:59 PM 15748 79712.00 0.00 172102388 135197812 25.54 mysqld
08:29:00 PM 15748 72843.00 0.00 172430068 135366320 25.57 mysqld
08:29:01 PM 15748 79006.00 0.00 172772084 135538936 25.61 mysqld
08:29:02 PM 15748 79604.00 0.00 173099764 135713864 25.64 mysqld
08:29:03 PM 15748 34403.00 0.00 173243124 135779768 25.65 mysqld
08:29:04 PM 15748 0.00 0.00 173243124 135779768 25.65 mysqld
08:29:05 PM 15748 0.00 0.00 173243124 135779768 25.65 mysqld
08:29:06 PM 15748 0.00 0.00 173243124 135779768 25.65 mysqld
The RSS grew from 437492 KB (~427 MB) to 135779768 KB (~129 GB) in less than 30s!
BTW, the tokudb_cache_size I used is about 5G.
After sysbench stopped, the RSS stayed at this high level until I issued
`FLUSH TABLE test.t1` to close the table.
###
### problem analysis
###
If we start mysqld under `valgrind --tool=massif` (massif is the memory profiling tool in valgrind), we can see clearly where the memory goes:
n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B)
-------
85 80,910,947,821 127,076,668,408 127,007,881,441 68,786,967 0
86 81,002,324,005 127,201,890,632 127,133,055,395 68,835,237 0
99.95% (127,133,055,395B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
->49.18% (62,561,194,013B) 0xF909B1: my_malloc (my_malloc.c:38)
| ->48.80% (62,075,750,631B) 0xF91070: my_realloc (my_malloc.c:95)
| | ->48.80% (62,075,722,880B) 0x1692F5E: tokudb:
| | | ->48.80% (62,075,722,880B) 0x1638FDE: ha_tokudb:
| | | ->48.80% (62,075,722,880B) 0x16498E8: ha_tokudb:
| | | ->48.80% (62,075,722,880B) 0x164AF92: ha_tokudb:
| | | ->48.80% (62,075,722,880B) 0x164CF6C: ha_tokudb:
| | | ->48.80% (62,075,722,880B) 0x6CDF3F: handler:
| | | ->48.80% (62,075,722,880B) 0x193ECE5: ha_partition:
| | | ->48.80% (62,075,722,880B) 0x6CDF3F: handler:
->48.95% (62,265,682,762B) 0xF9109F: my_realloc (my_malloc.c:99)
| ->48.95% (62,264,615,082B) 0x1692F5E: tokudb:
| | ->48.95% (62,264,615,082B) 0x164A096: ha_tokudb: (ha_tokudb.cc:5299)
| | ->48.95% (62,264,615,082B) 0x1649AA4: smart_dbt_
| | ->48.95% (62,264,615,082B) 0x17E5006: c_getf_
| | ->48.95% (62,264,615,082B) 0x17BCA55: ft_search_
| | ->48.95% (62,264,615,082B) 0x17BE43A: ft_search_
| | ->48.65% (61,887,126,678B) 0x17BE9D0: toku_ft_
We can see that almost half of the memory (49.18%, 62,561,194,013 B) was allocated via my_malloc and nearly all of the rest via my_realloc, all from the ha_tokudb call stacks above.
To make things clearer, I added some print info to ha_tokudb::close():
```
diff --git a/storage/tokudb/ha_tokudb.cc b/storage/tokudb/ha_tokudb.cc
index 9079d15..a1be44b 100644
--- a/storage/tokudb/ha_tokudb.cc
+++ b/storage/tokudb/ha_tokudb.cc
@@ -2218,6 +2218,11 @@ int ha_tokudb::close() {
     TOKUDB_...
+    sql_print_error(
+        "TokuDB: Table %s blob_buff size is %lu, range_query_buff size is %u",
+        share->..., ..., size_range_query_buff);
     tokudb:...
```
We'll see many error messages in the error log like these:
2017-07-23 20:35:41 15748 [ERROR] TokuDB: Table ./test/t1#P#p40 blob_buff size is 10485764, range_query_buff size is 10485789
2017-07-23 20:35:41 15748 [ERROR] TokuDB: Table ./test/t1#P#p41 blob_buff size is 10485764, range_query_buff size is 10485789
Now the problem is very clear: the memory is mainly consumed by `ha_tokudb::range_query_buff` and `ha_tokudb::blob_buff`. Each ha_tokudb handler
has two such buffer instances, and a BAD query that needs to search every partition requires 100 TokuDB handlers.
So the total memory needed for one thread is 100*2*10M = 2G, and the total memory for 64 threads is 64*2G = 128G!!!
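The back-of-the-envelope arithmetic above can be checked with a quick script (buffer sizes are taken from the error-log lines above; the handler and thread counts come from the test setup):

```python
MIB = 1024 * 1024

# Both per-handler buffers grow to roughly 10 MB, because every row
# carries a 10 MB longtext (see the blob_buff/range_query_buff sizes
# printed in the error log above).
range_query_buff = 10 * MIB
blob_buff = 10 * MIB

partitions = 100   # one ha_tokudb handler per partition for the bad query
threads = 64       # sysbench threads, each holding its own set of handlers

per_thread = partitions * (range_query_buff + blob_buff)
total = threads * per_thread

print(per_thread // MIB, "MiB per thread")   # 2000 MiB per thread (~2G)
print(total // (1024 * MIB), "GiB total")    # 125 GiB total (matches the ~129G RSS)
```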
Maybe we should release `ha_tokudb::range_query_buff` and `ha_tokudb::blob_buff` when each query ends, instead of holding them until the table is closed.
Or some more elegant approach could be taken.
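The grow-and-hold behaviour, and the effect of releasing the buffers at end of query, can be modelled abstractly. This is a toy Python sketch, not TokuDB code: `ToyHandler`, `read_row`, and `release_after` are invented names standing in for the handler, its blob/range reads, and whatever hook would free the buffers.

```python
class ToyHandler:
    """Toy model of a per-partition handler with a grow-only read buffer."""

    def __init__(self):
        self.buff = bytearray()  # stands in for range_query_buff / blob_buff

    def read_row(self, row_size, release_after=False):
        # my_realloc-style growth: the buffer only ever gets bigger, and
        # today it is freed only when the handler itself is closed.
        if len(self.buff) < row_size:
            self.buff = bytearray(row_size)
        if release_after:  # sketch of the proposed fix: free per query
            self.buff = bytearray()

ROW = 10 * 1024 * 1024  # one 10 MB longtext per partition

# Current behaviour: 100 handlers each pin ~10 MB until FLUSH TABLE/close.
handlers = [ToyHandler() for _ in range(100)]
for h in handlers:
    h.read_row(ROW)
pinned = sum(len(h.buff) for h in handlers)    # ~1000 MiB per buffer type

# With release-at-end-of-query, nothing stays pinned between queries.
for h in handlers:
    h.read_row(ROW, release_after=True)
released = sum(len(h.buff) for h in handlers)  # 0
```

The trade-off is re-allocating the buffers on every query versus pinning worst-case-sized buffers for the lifetime of each open handler.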
Changed in percona-server:
assignee: nobody → Sveta Smirnova (svetasmirnova)
Thank you for the report.
Verified as described. Maybe related to https://bugs.mysql.com/bug.php?id=38002 and https://bugs.mysql.com/bug.php?id=15089, though not repeatable with InnoDB.