tokudb+mariadb: server stalled

Bug #1621852 reported by Manuel Arostegui on 2016-09-09
Affects: Percona Server (moved to https://jira.percona.com/projects/PS)
Status: Opinion
Importance: Undecided
Assigned to: Unassigned

Bug Description

We have a server running several MariaDB (10.0.22) instances with tables using the TokuDB engine (tokudb-7.5.7), replicating from a master in a normal master-slave setup.

We realised that one of the processes was lagging badly and found a query that was stuck. The process was completely hung (STOP SLAVE didn't even work; it just hung trying to kill the replication thread). The only solution was to kill -9 mysqld.

Here is some troubleshooting output.

Affected table structure:

CREATE TABLE `user_groups` (
  `ug_user` int(5) unsigned NOT NULL DEFAULT '0',
  `ug_group` varbinary(255) NOT NULL DEFAULT '',
  PRIMARY KEY (`ug_user`,`ug_group`),
  KEY `ug_group` (`ug_group`)
) ENGINE=TokuDB DEFAULT CHARSET=binary `compression`='tokudb_zlib'

The following query has been running for a long time:

*************************** 4. row ***************************
      Id: 44
    User: system user
    Host:
      db: zhwiki
 Command: Connect
    Time: 81310
   State: update
    Info: INSERT /* User::addGroup 127.0.0.1 */ IGNORE INTO `user_groups` (ug_user,ug_group) VALUES ('1546118','bot')
Progress: 0.000
Nothing else is running:

Id User Host db Command Time State Info Progress
2 event_scheduler localhost NULL Daemon 177 Waiting for next activation NULL 0.000
42 system user NULL Connect 6676093 Waiting for master to send event NULL 0.000
43 system user NULL Connect 81133 Waiting for prior transaction to commit NULL 0.000
44 system user zhwiki Connect 81133 update INSERT /* User::addGroup 127.0.0.1 */ IGNORE INTO `user_groups` (ug_user,ug_group) VALUES ('1546118','bot') 0.000
45 system user NULL Connect 97087 Waiting for room in worker thread event queue NULL 0.000
830653 repl 10.64.37.5:57304 NULL Binlog Dump 2815399 Master has sent all binlog to slave; waiting for binlog to be updated NULL 0.000
839855 repl 10.64.4.11:44118 NULL Binlog Dump 2769700 Master has sent all binlog to slave; waiting for binlog to be updated NULL 0.000
1254774 watchdog 10.64.0.15:51206 information_schema Sleep 5 NULL 0.000
1254775 watchdog 10.64.0.15:51627 mysql Sleep 0 NULL 0.000
1410875 root localhost NULL Sleep 469 NULL 0.000
1410978 root localhost NULL Query 0 init show full processlist 0.000
*************************** 1. row ***************************
  Type: InnoDB
  Name:
Status:
=====================================
2016-09-08 13:16:53 7ffa5a5e7700 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 9 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 6475205 srv_active, 0 srv_shutdown, 177284 srv_idle
srv_master_thread log flush and writes: 6652339
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 34309329
OS WAIT ARRAY INFO: signal count 51354558
Mutex spin waits 57127135, rounds 892504058, OS waits 25672980
RW-shared spins 13506675, rounds 235123829, OS waits 6441900
RW-excl spins 9307856, rounds 143680432, OS waits 1919206
Spin rounds per wait: 15.62 mutex, 17.41 RW-shared, 15.44 RW-excl
------------
TRANSACTIONS
------------
Trx id counter 5158085477
Purge done for trx's n:o < 5158085476 undo n:o < 0 state: running but idle
History list length 572
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 0, not started
MySQL thread id 1410965, OS thread handle 0x7ffa5a5e7700, query id 1309267452 localhost root init
show engine innodb status
---TRANSACTION 4225399955, not started
MySQL thread id 42, OS thread handle 0x7ffa5a585700, query id 0 Waiting for master to send event
---TRANSACTION 5158070750, not started
MySQL thread id 44, OS thread handle 0x7ffa56dce700, query id 1309267436 update
INSERT /* User::addGroup 127.0.0.1 */ IGNORE INTO `user_groups` (ug_user,ug_group) VALUES ('1546118','bot')
---TRANSACTION 4225399812, not started
MySQL thread id 1, OS thread handle 0x7ffa5a73e700, query id 0 Waiting for background binlog tasks
---TRANSACTION 5158070754, ACTIVE (PREPARED) 81069 sec
3 lock struct(s), heap size 1184, 1 row lock(s), undo log entries 3
MySQL thread id 43, OS thread handle 0x7ffa58433700, query id 1309131308 Waiting for prior transaction to commit
Trx #rec lock waits 3121 #table lock waits 0
Trx total rec lock wait time 18 SEC
Trx total table lock wait time 0 SEC
--------
FILE I/O
--------
I/O thread 0 state: waiting for i/o request (insert buffer thread)
I/O thread 1 state: waiting for i/o request (log thread)
I/O thread 2 state: waiting for i/o request (read thread)
I/O thread 3 state: waiting for i/o request (read thread)
I/O thread 4 state: waiting for i/o request (read thread)
I/O thread 5 state: waiting for i/o request (read thread)
I/O thread 6 state: waiting for i/o request (read thread)
I/O thread 7 state: waiting for i/o request (read thread)
I/O thread 8 state: waiting for i/o request (read thread)
I/O thread 9 state: waiting for i/o request (read thread)
I/O thread 10 state: waiting for i/o request (read thread)
I/O thread 11 state: waiting for i/o request (read thread)
I/O thread 12 state: waiting for i/o request (read thread)
I/O thread 13 state: waiting for i/o request (read thread)
I/O thread 14 state: waiting for i/o request (read thread)
I/O thread 15 state: waiting for i/o request (read thread)
I/O thread 16 state: waiting for i/o request (read thread)
I/O thread 17 state: waiting for i/o request (read thread)
I/O thread 18 state: waiting for i/o request (write thread)
I/O thread 19 state: waiting for i/o request (write thread)
I/O thread 20 state: waiting for i/o request (write thread)
I/O thread 21 state: waiting for i/o request (write thread)
I/O thread 22 state: waiting for i/o request (write thread)
I/O thread 23 state: waiting for i/o request (write thread)
I/O thread 24 state: waiting for i/o request (write thread)
I/O thread 25 state: waiting for i/o request (write thread)
Pending normal aio reads: 0 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] , aio writes: 0 [0, 0, 0, 0, 0, 0, 0, 0] ,
 ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0
Pending flushes (fsync) log: 0; buffer pool: 0
4463767 OS file reads, 562387708 OS file writes, 535312345 OS fsyncs
0.00 reads/s, 0 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 2600, seg size 2602, 888183 merges
merged operations:
 insert 2856041, delete mark 2732633, delete 855981
discarded operations:
 insert 0, delete mark 0, delete 0
0.00 hash searches/s, 0.00 non-hash searches/s
---
LOG
---
Log sequence number 2594485252999
Log flushed up to 2594485252999
Pages flushed up to 2594485252999
Last checkpoint at 2594485252999
Max checkpoint age 3478212404
Checkpoint age target 3369518267
Modified age 0
Checkpoint age 0
0 pending log writes, 0 pending chkp writes
514432583 log i/o's done, 0.00 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total memory allocated 8791261184; in additional pool allocated 0
Total memory allocated by read views 432
Internal hash tables (constant factor + variable factor)
    Adaptive hash index 833192288 (151384312 + 681807976)
    Page hash 1107208 (buffer pool 0 only)
    Dictionary cache 37243056 (35402768 + 1840288)
    File system 905296 (812272 + 93024)
    Lock system 21253192 (21250568 + 2624)
    Recovery system 0 (0 + 0)
Dictionary memory allocated 1840288
Buffer pool size 524280
Buffer pool size, bytes 8589803520
Free buffers 8904
Database pages 473762
Old database pages 174877
Modified db pages 0
Percent of dirty pages(LRU & free pages): 0.000
Max dirty pages percent: 75.000
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 2754509, not young 811412628
0.00 youngs/s, 0.00 non-youngs/s
Pages read 5226290, created 4170968, written 54158543
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
No buffer pool page gets since the last printout
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 473762, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
----------------------
INDIVIDUAL BUFFER POOL INFO
----------------------
---BUFFER POOL 0
Buffer pool size 65535
Buffer pool size, bytes 1073725440
Free buffers 1144
Database pages 59149
Old database pages 21829
Modified db pages 0
Percent of dirty pages(LRU & free pages): 0.000
Max dirty pages percent: 75.000
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 351299, not young 98612103
0.00 youngs/s, 0.00 non-youngs/s
Pages read 661252, created 524914, written 7393879
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
No buffer pool page gets since the last printout
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 59149, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
---BUFFER POOL 1
Buffer pool size 65535
Buffer pool size, bytes 1073725440
Free buffers 1196
Database pages 59098
Old database pages 21829
Modified db pages 0
Percent of dirty pages(LRU & free pages): 0.000
Max dirty pages percent: 75.000
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 347217, not young 98373232
0.00 youngs/s, 0.00 non-youngs/s
Pages read 660394, created 522916, written 7025071
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
No buffer pool page gets since the last printout
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 59098, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
---BUFFER POOL 2
Buffer pool size 65535
Buffer pool size, bytes 1073725440
Free buffers 1144
Database pages 59210
Old database pages 21860
Modified db pages 0
Percent of dirty pages(LRU & free pages): 0.000
Max dirty pages percent: 75.000
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 343753, not young 104864852
0.00 youngs/s, 0.00 non-youngs/s
Pages read 658594, created 526885, written 6525218
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
No buffer pool page gets since the last printout
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 59210, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
---BUFFER POOL 3
Buffer pool size 65535
Buffer pool size, bytes 1073725440
Free buffers 1124
Database pages 59221
Old database pages 21870
Modified db pages 0
Percent of dirty pages(LRU & free pages): 0.000
Max dirty pages percent: 75.000
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 338562, not young 100821994
0.00 youngs/s, 0.00 non-youngs/s
Pages read 650016, created 518134, written 6897026
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
No buffer pool page gets since the last printout
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 59221, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
---BUFFER POOL 4
Buffer pool size 65535
Buffer pool size, bytes 1073725440
Free buffers 1026
Database pages 59380
Old database pages 21899
Modified db pages 0
Percent of dirty pages(LRU & free pages): 0.000
Max dirty pages percent: 75.000
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 347327, not young 103727368
0.00 youngs/s, 0.00 non-youngs/s
Pages read 653023, created 523331, written 6911900
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
No buffer pool page gets since the last printout
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 59380, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
---BUFFER POOL 5
Buffer pool size 65535
Buffer pool size, bytes 1073725440
Free buffers 1108
Database pages 59204
Old database pages 21861
Modified db pages 0
Percent of dirty pages(LRU & free pages): 0.000
Max dirty pages percent: 75.000
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 341341, not young 102578398
0.00 youngs/s, 0.00 non-youngs/s
Pages read 647211, created 518967, written 6812852
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
No buffer pool page gets since the last printout
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 59204, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
---BUFFER POOL 6
Buffer pool size 65535
Buffer pool size, bytes 1073725440
Free buffers 1101
Database pages 59203
Old database pages 21861
Modified db pages 0
Percent of dirty pages(LRU & free pages): 0.000
Max dirty pages percent: 75.000
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 343242, not young 102572143
0.00 youngs/s, 0.00 non-youngs/s
Pages read 646626, created 518401, written 6240981
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
No buffer pool page gets since the last printout
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 59203, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
---BUFFER POOL 7
Buffer pool size 65535
Buffer pool size, bytes 1073725440
Free buffers 1061
Database pages 59297
Old database pages 21868
Modified db pages 0
Percent of dirty pages(LRU & free pages): 0.000
Max dirty pages percent: 75.000
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 341768, not young 99862538
0.00 youngs/s, 0.00 non-youngs/s
Pages read 649174, created 517420, written 6351616
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
No buffer pool page gets since the last printout
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 59297, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
--------------
ROW OPERATIONS
--------------
0 queries inside InnoDB, 0 queries in queue
0 read views open inside InnoDB
1 RW transactions active inside InnoDB
0 RO transactions active inside InnoDB
1 out of 1000 descriptors used
Main thread process no. 16572, id 140703350900480, state: sleeping
Number of rows inserted 293087206, updated 887359, deleted 58720205, read 73067195
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
Number of system rows inserted 505799786, updated 0, deleted 505799788, read 505799789
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
============================

The replication thread never goes forward, the query is stuck.

An strace doesn't show anything relevant:

Process 16572 attached
restart_syscall(<... resuming interrupted call ...>) = 1
fcntl(32, F_GETFL) = 0x2 (flags O_RDWR)
fcntl(32, F_SETFL, O_RDWR|O_NONBLOCK) = 0
accept(32, {sa_family=AF_INET6, sin6_port=htons(35658), inet_pton(AF_INET6, "::ffff:10.64.0.15", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 3579
fcntl(32, F_SETFL, O_RDWR) = 0
fcntl(3579, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
setsockopt(3579, SOL_IP, IP_TOS, [8], 4) = 0
setsockopt(3579, SOL_TCP, TCP_NODELAY, [1], 4) = 0
futex(0x13dafe4, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x13dafe0, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x13d8660, FUTEX_WAKE_PRIVATE, 1) = 1
poll([{fd=32, events=POLLIN}, {fd=33, events=POLLIN}], 2, 4294967295) = 1 ([{fd=32, revents=POLLIN}])
fcntl(32, F_GETFL) = 0x2 (flags O_RDWR)
fcntl(32, F_SETFL, O_RDWR|O_NONBLOCK) = 0
accept(32, {sa_family=AF_INET6, sin6_port=htons(35666), inet_pton(AF_INET6, "::ffff:10.64.0.15", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 3579
fcntl(32, F_SETFL, O_RDWR) = 0
fcntl(3579, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
setsockopt(3579, SOL_IP, IP_TOS, [8], 4) = 0
setsockopt(3579, SOL_TCP, TCP_NODELAY, [1], 4) = 0
futex(0x13dafe4, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x13dafe0, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
poll([{fd=32, events=POLLIN}, {fd=33, events=POLLIN}], 2, 4294967295

Stopping the slave doesn't even work and hangs forever (strace doesn't show anything relevant when we try to stop it):

accept(32, {sa_family=AF_INET6, sin6_port=htons(51810), inet_pton(AF_INET6, "::ffff:10.64.0.15", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 3580
fcntl(32, F_SETFL, O_RDWR) = 0
fcntl(3580, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
setsockopt(3580, SOL_IP, IP_TOS, [8], 4) = 0
setsockopt(3580, SOL_TCP, TCP_NODELAY, [1], 4) = 0
futex(0x13dafe4, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x13dafe0, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x13d8660, FUTEX_WAKE_PRIVATE, 1) = 1
poll([{fd=32, events=POLLIN}, {fd=33, events=POLLIN}], 2, 4294967295) = 1 ([{fd=32, revents=POLLIN}])
fcntl(32, F_GETFL) = 0x2 (flags O_RDWR)
fcntl(32, F_SETFL, O_RDWR|O_NONBLOCK) = 0
accept(32, {sa_family=AF_INET6, sin6_port=htons(52870), inet_pton(AF_INET6, "::ffff:10.64.0.15", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 3581
fcntl(32, F_SETFL, O_RDWR) = 0
fcntl(3581, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
setsockopt(3581, SOL_IP, IP_TOS, [8], 4) = 0
setsockopt(3581, SOL_TCP, TCP_NODELAY, [1], 4) = 0
clone(child_stack=0x7ffa56d09f30, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7ffa56d0a9d0, tls=0x7ffa56d0a700, child_tidptr=0x7ffa56d0a9d0) = 42109
poll([{fd=32, events=POLLIN}, {fd=33, events=POLLIN}], 2, 4294967295) = 1 ([{fd=32, revents=POLLIN}])
fcntl(32, F_GETFL) = 0x2 (flags O_RDWR)
fcntl(32, F_SETFL, O_RDWR|O_NONBLOCK) = 0
accept(32, {sa_family=AF_INET6, sin6_port=htons(53205), inet_pton(AF_INET6, "::ffff:10.64.0.15", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 3582
fcntl(32, F_SETFL, O_RDWR) = 0
fcntl(3582, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
setsockopt(3582, SOL_IP, IP_TOS, [8], 4) = 0
setsockopt(3582, SOL_TCP, TCP_NODELAY, [1], 4) = 0
clone(child_stack=0x7ffa56d6bf30, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7ffa56d6c9d0, tls=0x7ffa56d6c700, child_tidptr=0x7ffa56d6c9d0) = 42120
poll([{fd=32, events=POLLIN}, {fd=33, events=POLLIN}], 2, 4294967295

The only solution is to kill -9 mysqld and let it come back up again.

After a few hours it happened again, so we migrated the table to InnoDB, and it has not happened since.
We have seen this error in the past, and migrating to InnoDB stopped it from happening then as well.

George Ormond Lorch III (gl-az) wrote :

Can you ensure that you have debug symbols, then attach gdb and capture a "thread apply all backtrace"? Since this is within MariaDB and not Percona Server, I also suggest you log it with them and reference this issue, as it may be related to some interaction with server-specific code.

Manuel Arostegui (marostegui) wrote :

We do have debug symbols:

0000000000000000 l d .debug_aranges 0000000000000000 .debug_aranges
0000000000000000 l d .debug_info 0000000000000000 .debug_info
0000000000000000 l d .debug_abbrev 0000000000000000 .debug_abbrev
0000000000000000 l d .debug_line 0000000000000000 .debug_line
0000000000000000 l d .debug_str 0000000000000000 .debug_str
0000000000000000 l d .debug_loc 0000000000000000 .debug_loc
0000000000000000 l d .debug_ranges 0000000000000000 .debug_ranges
000000000038de40 l O .data 0000000000000048 _ZL31mysql_sysvar_lock_timeout_debug
000000000038c9e0 l O .data 0000000000000050 _ZL18mysql_sysvar_debug
000000000038f960 g O .bss 0000000000000001 garbage_collection_debug
000000000038e5a0 g O .bss 0000000000000008 tokudb_debug
000000000038fbe0 g O .bss 0000000000000004 toku_ft_debug_mode

We are not able to reproduce this reliably, but once it happens again I will attach gdb and try to capture it.

I will also inform MariaDB.

Thanks
Manuel.

Manuel Arostegui (marostegui) wrote :

This happened again:

I have attached the stacktraces as well as:

- show engine tokudb status
- show table status for the affected table.

The configuration for this host is as follows (it is running multiple instances of MySQL):

[mysqld]

user = mysql
read_only = 1

skip-external-locking
skip-name-resolve
skip-slave-start
log-slave-updates
temp-pool

max_connections = 100
max_connect_errors = 1000000000
max_allowed_packet = 32M
connect_timeout = 3
query_cache_size = 0
query_cache_type = 0
event_scheduler = 1
userstat = 0
log-warnings = 0
thread_stack = 192K
thread_cache_size = 300
interactive_timeout = 28800
wait_timeout = 3600
plugin-load = ha_tokudb
transaction-isolation = REPEATABLE-READ
slave_transaction_retries = 4294967295
slave_parallel_threads = 2
slave_parallel_max_queued = 16M
binlog-format = ROW
expire_logs_days = 7
sql-mode = IGNORE_BAD_TABLE_OPTIONS

table_open_cache = 10000
table_definition_cache = 50000
open-files-limit = 400000
character_set_server = binary
character_set_filesystem = binary
collation_server = binary

default-storage-engine = InnoDB
innodb_file_per_table = 1
innodb_buffer_pool_size = 8G
innodb_log_file_size = 2G
innodb_flush_log_at_trx_commit = 1
innodb_flush_method = O_DIRECT
tokudb_cache_size = 8G
tokudb_lock_timeout = 50000
tokudb_empty_scan = disabled
tokudb_read_buf_size = 256K
tokudb_pk_insert_mode = 2
join_cache_level = 8

# dump and load innodb buffer at start and stop
innodb_buffer_pool_load_at_startup = 1
innodb_buffer_pool_dump_at_shutdown = 1
innodb_use_native_aio = 0
innodb_read_io_threads = 16
innodb_write_io_threads = 8
optimizer_switch = 'mrr=on,mrr_cost_based=on,mrr_sort_keys=on,optimize_join_buffer_size=on,engine_condition_pushdown=on,index_condition_pushdown=off'
[mysqld_multi]
log = /var/log/mysqld_multi.log
mysqld = /opt/wmf-mariadb10/bin/mysqld_safe
mysqladmin = /opt/wmf-mariadb10/bin/mysqladmin

[mysqld2]

port = 3312
socket = /tmp/mysql.s2.sock
log-bin = s2-bin
relay-log = s2-rel
datadir = /srv/sqldata.s2
pid-file = /srv/sqldata.s2/pid
tmpdir = /srv/tmp.s2
server_id = 1719787763312

Manuel Arostegui (marostegui) wrote :

This is the show table status

Manuel Arostegui (marostegui) wrote :

This is the show engine tokudb status

Manuel Arostegui (marostegui) wrote :

This happened again last night, with the same outcome.
We had to kill -9 the server and rebuild the table as InnoDB.

Unfortunately we couldn't get the stacktraces this time.

George Ormond Lorch III (gl-az) wrote :

I found your upstream report at https://jira.mariadb.org/browse/MDEV-10796; good to have it logged in both places.

First question: you have a combination of InnoDB and TokuDB, each configured to use ~8GB of memory, which means you should have at least 24GB of memory on the machine or you risk swapping.

Next, looking at the gdb output, there really isn't a lot of activity. You have a bunch of totally idle threads, and the threads that look active in any fashion are:

Thread 18: in the server event handler, looking for an event to process
Threads 16, 15: look like slave relay connections sending binlog to other servers
Threads 17, 14, 13, 12, 11, 2: connected but waiting for a client command
Thread 10: parallel replication slave SQL thread for this server, in a transaction commit writing a binlog event
Thread 9: parallel replication slave SQL thread for this server, trying to obtain a TokuDB range lock for an insert event
Thread 8: parallel replication slave SQL thread for this server, preparing to execute an event

So there is only one thread in the TokuDB SE at this point in time; I don't see how it can be stalling the entire server unless some lock is held higher up in the stack. It is _possible_ that this state was captured after a range lock was released during the race condition described at https://tokutek.atlassian.net/browse/FT-332, but the symptoms don't match exactly: those are not a complete server stall, just lock contention/slowdown on parallel replication with overlapping row/range locks. We are working on a patch for this, but I don't think it is related.

So if this gdb capture is truly representative of the state of the server during one of these stalls, I think you might need to go back to the MariaDB folks to review the trace and see whether anything points to their server code or to TokuDB.

Manuel Arostegui (marostegui) wrote :

Thanks for the input.

The server isn't swapping - no traces of swap anywhere.

The gdb capture was taken once the problem happened again, so that is the state of the server at the time replication wasn't going ahead.

Again, the only solution (which prevents this from happening) is to convert the table to InnoDB.
I have pinged MariaDB in the ticket you found so they can check the stacktraces too.

Is there anything else I can do to narrow this problem a bit more so we can find out what's going on or get more troubleshooting scenarios?

Manuel Arostegui (marostegui) wrote :

By the way, thread 9 is the query that has hung the server the last few times:

Thread 9 (Thread 0x7f31cb99e700 (LWP 27968)):
#0 0x00007f31cce007be in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007f31c9d207ec in toku_cond_timedwait (cond=<optimized out>, mutex=<optimized out>, wakeup_at=<optimized out>) at /home/jynus/mariadb-10.0.22/storage/tokudb/ft-index/portability/toku_pthread.h:325
#2 wait (killed_callback=0x7f31c9c6fdd0 <tokudb_killed_callback()>, killed_time_ms=<optimized out>, wait_time_ms=<optimized out>, this=0x7f31cb99b460) at /home/jynus/mariadb-10.0.22/storage/tokudb/ft-index/locktree/lock_request.cc:273
#3 toku_db_wait_range_lock (db=db@entry=0x7f2c59fdb0c0, txn=txn@entry=0x7f2f2d1b8e00, request=request@entry=0x7f31cb99b460) at /home/jynus/mariadb-10.0.22/storage/tokudb/ft-index/src/ydb_row_lock.cc:279
#4 0x00007f31c9d3cbd6 in toku_db_get_range_lock (db=db@entry=0x7f2c59fdb0c0, txn=txn@entry=0x7f2f2d1b8e00, left_key=left_key@entry=0x7f31cb99b890, right_key=right_key@entry=0x7f31cb99b890, lock_type=lock_type@entry=toku::lock_request::WRITE) at /home/jynus/mariadb-10.0.22/storage/tokudb/ft-index/src/ydb_row_lock.cc:239
#5 0x00007f31c9d3d113 in toku_db_get_point_write_lock (db=db@entry=0x7f2c59fdb0c0, txn=txn@entry=0x7f2f2d1b8e00, key=key@entry=0x7f31cb99b890) at /home/jynus/mariadb-10.0.22/storage/tokudb/ft-index/src/ydb_row_lock.cc:293
#6 0x00007f31c9d3d752 in env_put_multiple_internal (flags_array=0x7f2c59943420, vals=0x7f2c59943010, keys=0x7f2c599427f0, db_array=0x7f2c59987158, num_dbs=2, src_val=0x7f31cb99b870, src_key=0x7f31cb99b890, txn=0x7f2f2d1b8e00, src_db=0x7f2c59fdb0c0, env=0x7f2fa83c1980) at /home/jynus/mariadb-10.0.22/storage/tokudb/ft-index/src/ydb_write.cc:790
#7 env_put_multiple (env=0x7f2fa83c1980, src_db=0x7f2c59fdb0c0, txn=0x7f2f2d1b8e00, src_key=0x7f31cb99b890, src_val=0x7f31cb99b870, num_dbs=2, db_array=0x7f2c59987158, keys=0x7f2c599427f0, vals=0x7f2c59943010, flags_array=0x7f2c59943420) at /home/jynus/mariadb-10.0.22/storage/tokudb/ft-index/src/ydb_write.cc:1174
#8 0x00007f31c9c86bd8 in ha_tokudb::insert_rows_to_dictionaries_mult (this=this@entry=0x7f2c59942020, pk_key=pk_key@entry=0x7f31cb99b890, pk_val=pk_val@entry=0x7f31cb99b870, txn=txn@entry=0x7f2f2d1b8e00, thd=thd@entry=0x7f2f4c014008) at /home/jynus/mariadb-10.0.22/storage/tokudb/ha_tokudb.cc:3852
#9 0x00007f31c9c95071 in ha_tokudb::write_row (this=0x7f2c59942020, record=0x7f2c59fea820 "@\321\a") at /home/jynus/mariadb-10.0.22/storage/tokudb/ha_tokudb.cc:4006
#10 0x0000000000747f7b in handler::ha_write_row (this=0x7f2c59942020, buf=0x7f2c59fea820 "@\321\a") at /home/jynus/mariadb-10.0.22/sql/handler.cc:6004
#11 0x00000000005b59bf in write_record (thd=thd@entry=0x7f2f4c014008, table=table@entry=0x7f2c5988b208, info=info@entry=0x7f31cb99bb80) at /home/jynus/mariadb-10.0.22/sql/sql_insert.cc:1847
#12 0x00000000005b8fcb in mysql_insert (thd=thd@entry=0x7f2f4c014008, table_list=0x7f2f4c0300e8, fields=..., values_list=..., update_fields=..., update_values=..., duplic=DUP_ERROR, ignore=true) at /home/jynus/mariadb-10.0.22/sql/sql_insert.cc:964
#13 0x00000000005d511c in m...


It very well could be TokuDB as a result of some odd interaction with MariaDB. We do not maintain, build, nor test TokuDB with MariaDB, so it is difficult to say.

So far, we don't have any reports of this behavior or any similar in Percona Server.

Throwing a few things at the wall that maybe the MariaDB folks can think about:

I still do not understand how this one thread being blocked on something could cause all other activity to halt. If there were other threads stuck somewhere in TokuDB, then yes, it could be some internal deadlock. Maybe there is some issue, similar to but distinct from https://tokutek.atlassian.net/browse/FT-332 and https://tokutek.atlassian.net/browse/FT-738, in the locktree logic that is freezing this one thread. Perhaps more context on the queries happening before the hang might help.

One thing that comes to mind is that the TokuDB locktree only implements exclusive read locks, not shared read locks. It is possible that something in the MariaDB parallel replication does not understand or know about this and inadvertently allows a situation where a row lock deadlock can occur. But even then, the blocked thread should eventually time out and return an error to the replication SQL thread.
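The exclusive-only read-lock behaviour described above can be sketched with a toy example (plain Python threading with hypothetical names, not TokuDB code): two "transactions" that merely read the same row still serialize, which is exactly the kind of interaction a caller expecting shared read locks might not anticipate.

```python
import threading
import time

# Illustrative sketch only: an exclusive-only lock, standing in for the
# TokuDB locktree's read locks (which have no shared mode).
row_lock = threading.Lock()
HOLD = 0.15  # how long each "transaction" holds its row lock

def read_row():
    with row_lock:   # a true shared/read lock would admit both readers at once
        time.sleep(HOLD)

t1 = threading.Thread(target=read_row)
t2 = threading.Thread(target=read_row)
start = time.monotonic()
t1.start(); t2.start()
t1.join(); t2.join()
elapsed = time.monotonic() - start
print(elapsed >= 2 * HOLD)  # True: the two readers ran one after the other
```

With a reader-writer lock the two reads would overlap and `elapsed` would be close to one `HOLD`; with the exclusive-only lock they serialize, so two "conflicting" transactions appear where the server may expect none.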

Manuel Arostegui (marostegui) wrote :

Note that we do have parallel replication disabled:

MariaDB SANITARIUM localhost (none) > show global variables like 'slave_parallel_mode';
Empty set (0.00 sec)

I wonder just how 'hung' the server is. Can you get into it via the mysql cli when it is in this state?

If so, perhaps a look at some of the TokuDB information_schema tables might show us something: tokudb_trx, tokudb_lock_waits, and tokudb_locks.

Manuel Arostegui (marostegui) wrote :

When the server is "hung" I can get into the MySQL prompt indeed, but I cannot do anything with the replication thread (e.g. stopping the slave).

I can run mysql commands though, so what you suggest can be done.
I am waiting on MariaDB to check what you suggested in the previous post.

In the meantime, we have converted all the tables to InnoDB as this was biting us quite often and degrading our service.

Thanks

Kristian Nielsen (knielsen) wrote :

The stack trace shows TokuDB waiting for a row lock.

There is a pull request for fixing some bugs with similar symptoms as this:

  https://github.com/percona/PerconaFT/pull/360

There were some race conditions where one transaction releasing a lock could
miss the wakeup of another transaction waiting for the lock. The result is
that the waiting transaction would be stuck until tokudb_lock_timeout
expires.

Maybe that pull request could help in this case also.
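The race described above can be sketched in miniature (plain Python threading with hypothetical names, not PerconaFT code): if the waiter checks the lock state outside the mutex and the releaser's notify fires before the waiter actually blocks, the wakeup is lost and the waiter sleeps out its full timeout, much like a transaction waiting out tokudb_lock_timeout.

```python
import threading
import time

# Illustrative sketch of a "lost wakeup": the releaser signals in the window
# between the waiter's "is the lock free?" check and its actual wait() call.
cond = threading.Condition()
lock_free = False
result = {}

def waiter():
    # BUG: the check happens outside the condition's mutex, so the releaser
    # can set the flag and notify before we start waiting.
    if not lock_free:
        time.sleep(0.2)  # widen the race window so the demo is deterministic
        with cond:
            # The notify already happened; we block for the full timeout,
            # like the stuck transaction waiting out its lock timeout.
            woke = cond.wait(timeout=0.5)
        result["timed_out"] = not woke

def releaser():
    global lock_free
    with cond:
        lock_free = True
        cond.notify_all()  # nobody is waiting yet, so the signal is lost

t = threading.Thread(target=waiter)
t.start()
time.sleep(0.05)  # let the waiter perform its unlocked check first
releaser()
t.join()
print(result["timed_out"])  # True: the wakeup was missed
```

The fix pattern is to make the check and the wait atomic, i.e. re-check the predicate while holding the condition's mutex (`with cond: while not lock_free: cond.wait(...)`), which is broadly what the referenced pull request does for the lock tree's wait path.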

So far we believe this may be related to lock tree stalls. MariaDB imported a patch a while ago to address this, and we have been reviewing and improving that patch for Percona Server. This work is being tracked at https://jira.percona.com/browse/TDB-3; please log in to Percona JIRA and 'watch' it there for future updates. Marking this as Opinion here as there is no other accurately matching state.

Changed in percona-server:
status: New → Opinion

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-3553
