replication delay too much on tokudb partition table

Bug #1740459 reported by HolmerD on 2017-12-29
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Percona Server moved to https://jira.percona.com/projects/PS
New
Undecided
Unassigned

Bug Description

Hi:
my slave is delay too much, the TPS and QPS very low, memory, io(fusion-io), cpu everything is ok.
analye the latest six binlog files, found 99% binlog event is abount table `abtest_monitor_fac`(partition table), event type is: Write_rows, Update_rows.

table message:
*************************** 1. row ***************************
       Table: abtest_monitor_fac
Create Table: CREATE TABLE `abtest_monitor_fac` (
  `day` int(11) NOT NULL COMMENT '20161022',
  `time` bigint(11) NOT NULL COMMENT '201610221100',
  `hash_id` bigint(20) unsigned NOT NULL COMMENT 'logic id',
  `pv` bigint(20) NOT NULL COMMENT 'pv`',
  `used_time` bigint(11) NOT NULL COMMENT '耗时 微秒',
  `uv` bigint(20) NOT NULL COMMENT 'uv`',
  `last_update_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT '最后修改时间',
  PRIMARY KEY (`time`,`hash_id`),
  KEY `idx_day` (`day`),
  KEY `idx_last_update_time` (`last_update_time`)
) ENGINE=TokuDB DEFAULT CHARSET=utf8
/*!50500 PARTITION BY RANGE COLUMNS(`time`)
(PARTITION p20161028 VALUES LESS THAN (20161029000000) ENGINE = TokuDB,
........
PARTITION p20171230 VALUES LESS THAN (20171231000000) ENGINE = TokuDB) */

+--------------------+----------------------+----------+
| table_name | PARTITION_EXPRESSION | p_count |
+--------------------+----------------------+----------+
| abtest_monitor_fac | `time` | 429 |
+--------------------+----------------------+----------+

+--------------------+--------+-------------+------------------+------------------+
| table_name | engine | table_rows | data_gb | index_gb |
+--------------------+--------+-------------+------------------+------------------+
| abtest_monitor_fac | TokuDB | 16723416943 | 763.579869242385 | 673.313777318224 |
+--------------------+--------+-------------+------------------+------------------+

production environment: single master slave
version: Percona-Server-5.6.33

lscpu:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 40
On-line CPU(s) list: 0-39
Thread(s) per core: 2
Core(s) per socket: 10
Socket(s): 2
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 62
Stepping: 4
CPU MHz: 3000.195
BogoMIPS: 5999.11
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 25600K
NUMA node0 CPU(s): 0-39

cpu used:
12:36:16 PM PID %usr %system %guest %CPU CPU Command
12:36:17 PM 5220 100.00 3.00 0.00 100.00 10 mysqld
12:36:18 PM 5220 100.00 3.00 0.00 100.00 0 mysqld
12:36:19 PM 5220 100.00 3.00 0.00 100.00 2 mysqld
12:36:20 PM 5220 100.00 2.00 0.00 100.00 1 mysqld
12:36:21 PM 5220 100.00 2.00 0.00 100.00 11 mysqld
12:36:22 PM 5220 100.00 3.00 0.00 100.00 10 mysqld
12:36:23 PM 5220 100.00 4.00 0.00 100.00 1 mysqld
12:36:24 PM 5220 100.00 4.00 0.00 100.00 11 mysqld
12:36:25 PM 5220 100.00 2.00 0.00 100.00 13 mysqld
12:36:26 PM 5220 100.00 5.00 0.00 100.00 10 mysqld
12:36:27 PM 5220 100.00 2.00 0.00 100.00 0 mysqld
12:36:28 PM 5220 100.00 3.00 0.00 100.00 10 mysqld
12:36:29 PM 5220 100.00 11.00 0.00 100.00 12 mysqld
12:36:30 PM 5220 100.00 5.00 0.00 100.00 0 mysqld

memory used:
             total used free shared buffers cached
Mem: 126 123 2 0 0 48
-/+ buffers/cache: 74 51
Swap: 31 0 31

perf record message:
# Overhead sys usr Samples Command Shared Object
# ........ ........ ........ ............ ....... ......................................................................................... ..................
#
    15.89% 0.00% 15.89% 40054 mysqld /apps/svr/Percona-Server-5.6.33-rel79.0-Linux.x86_64.ssl101/bin/mysqld 0x4f5245
             |
             --- bitmap_get_next_set
                |
                |--50.36%-- ha_partition::rnd_init(bool)
                | handler::ha_rnd_init(bool)
                | Rows_log_event::do_index_scan_and_update(Relay_log_info const*)
                | Rows_log_event::do_apply_event(Relay_log_info const*)
                | slave_worker_exec_job(Slave_worker*, Relay_log_info*)
                | handle_slave_worker
                | start_thread
                |
                |--48.26%-- ha_partition::rnd_end()
                | Rows_log_event::do_index_scan_and_update(Relay_log_info const*)
                | Rows_log_event::do_apply_event(Relay_log_info const*)
                | slave_worker_exec_job(Slave_worker*, Relay_log_info*)
                | handle_slave_worker
                | start_thread
                 --1.38%-- [...]

     7.30% 0.00% 7.30% 18401 mysqld /apps/svr/Percona-Server-5.6.33-rel79.0-Linux.x86_64.ssl101/lib/mysql/plugin/ha_tokudb.so 0x5a48e
             |
             --- ha_tokudb::index_init(unsigned int, bool)
                |
                |--98.76%-- ha_tokudb::rnd_init(bool)
                | handler::ha_rnd_init(bool)
                | ha_partition::rnd_init(bool)
                | handler::ha_rnd_init(bool)
                | Rows_log_event::do_index_scan_and_update(Relay_log_info const*)
                | Rows_log_event::do_apply_event(Relay_log_info const*)
                | slave_worker_exec_job(Slave_worker*, Relay_log_info*)
                | handle_slave_worker
                | start_thread
                |
                 --1.24%-- handler::ha_rnd_init(bool)
                           ha_partition::rnd_init(bool)
                           handler::ha_rnd_init(bool)
                           Rows_log_event::do_index_scan_and_update(Relay_log_info const*)
                           Rows_log_event::do_apply_event(Relay_log_info const*)
                           slave_worker_exec_job(Slave_worker*, Relay_log_info*)
                           handle_slave_worker
                           start_thread

     6.21% 0.00% 6.21% 15657 mysqld /apps/svr/Percona-Server-5.6.33-rel79.0-Linux.x86_64.ssl101/lib/mysql/plugin/ha_tokudb.so 0xaf438
             |
             --- toku_db_cursor_internal(__toku_db*, __toku_db_txn*, __toku_dbc*, unsigned int, int)
                |
                |--98.99%-- toku_db_cursor(__toku_db*, __toku_db_txn*, __toku_dbc**, unsigned int)
                | ha_tokudb::index_init(unsigned int, bool)
                | ha_tokudb::rnd_init(bool)
                | handler::ha_rnd_init(bool)
                | ha_partition::rnd_init(bool)
                | handler::ha_rnd_init(bool)
                | Rows_log_event::do_index_scan_and_update(Relay_log_info const*)
                | Rows_log_event::do_apply_event(Relay_log_info const*)
                | slave_worker_exec_job(Slave_worker*, Relay_log_info*)
                | handle_slave_worker
                | start_thread
                |
                |--0.84%-- ha_tokudb::index_init(unsigned int, bool)
                | ha_tokudb::rnd_init(bool)
                | handler::ha_rnd_init(bool)
                | ha_partition::rnd_init(bool)
                | handler::ha_rnd_init(bool)
                | Rows_log_event::do_index_scan_and_update(Relay_log_info const*)
                | Rows_log_event::do_apply_event(Relay_log_info const*)
                | slave_worker_exec_job(Slave_worker*, Relay_log_info*)
                | handle_slave_worker
                | start_thread
                 --0.17%-- [...]

Is this a bug?
And how to fix it?

HolmerD (dong.dg) on 2017-12-29
information type: Private Security → Public
HolmerD (dong.dg) on 2017-12-29
information type: Public → Public Security
information type: Public Security → Public
tags: added: performance tokudb
HolmerD (dong.dg) wrote :

This bug fixed in 5.6.38, 5.7.20.
https://github.com/mysql/mysql-server/commit/9414c7035020a1915db56101aa7929d8b4069d83#diff-8c78bfc758be72b7df2306f1d9984c47

If slave_rows_search_algorithms is set to TABLE_SCAN,HASH_SCAN there is a performance improvement.

set global slave_rows_search_algorithms='TABLE_SCAN,HASH_SCAN';

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-3770

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers