Replication stall with multi-threaded replication
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
| MySQL Server |
Unknown
|
Unknown
|
||
| Percona Server moved to https://jira.percona.com/projects/PS |
Fix Released
|
Medium
|
Sergei Glushchenko | |
| 5.6 |
Fix Released
|
Medium
|
Sergei Glushchenko |
Bug Description
Percona Server version 5.6.17-66.0 compiled from source.
OS: SunOS 5.11 joyent_
Periodically what happens every few days that the replication gets stalled with the coordinator thread in the state "Waiting for Slave Worker to release partition" and the worker threads in the state "Waiting for an event from Coordinator". At that point trying to stop the slave by executing "STOP SLAVE" also gets stalled, together with stalling all the further "SHOW SLAVE STAUS" calls.
Below is how typically the processlist looks in that particular case:
+------
| Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined |
+------
| 2477 | root | localhost | NULL | Sleep | 33355 | | NULL | 1 | 46 |
| 2967 | system user | | NULL | Connect | 24248 | Waiting for master to send event | NULL | 0 | 0 |
| 2968 | system user | | NULL | Connect | 24085 | Waiting for Slave Worker to release partition | NULL | 0 | 0 |
| 2969 | system user | | NULL | Connect | 24248 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 2970 | system user | | NULL | Connect | 24248 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 2971 | system user | | NULL | Connect | 24248 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 2972 | system user | | NULL | Connect | 24248 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 2973 | system user | | NULL | Connect | 24248 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 2974 | system user | | NULL | Connect | 84603 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 2975 | system user | | NULL | Connect | 84603 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 2976 | system user | | NULL | Connect | 84603 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 2977 | system user | | NULL | Connect | 84603 | Waiting for an event from Coordinator | NULL | 0 | 1 |
| 2978 | system user | | NULL | Connect | 84603 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 4268 | root | localhost | some_db | Query | 871 | Killing slave | stop slave | 0 | 0 |
| 4370 | root | localhost | some_db | Sleep | 4 | | NULL | 0 | 0 |
| 4377 | monitoring | 192.168.1.222:11159 | NULL | Query | 464 | init | show slave status | 0 | 0 |
| 4419 | root | localhost | NULL | Query | 0 | init | show processlist | 0 | 0 |
+------
17 rows in set (0.00 sec)
Looking into this further, the InnoDB status shows us that one of the worker thread actually has an open transaction:
---TRANSACTION 53684133006, ACTIVE 24245 sec
2 lock struct(s), heap size 360, 1 row lock(s), undo log entries 1
MySQL thread id 2977, OS thread handle 0x36, query id 56740540 Waiting for an event from Coordinator
TABLE LOCK table `db2`.`tbl_name` /* Partition `p24` */ trx id 53684133006 lock mode IX
RECORD LOCKS space id 3504 page no 23250 n bits 440 index `PRIMARY` of table `db2`.`tbl_name` /* Partition `p24` */ trx id 53684133006 lock_mode X locks rec but not gap
First off all the transaction is a replicated transaction being executed by the worker thread and is still open for an unknown reason, while holding only a single row lock.
Secondly, apparently the coordinator is blocked because the next even to apply is also to the `db2` database and hence it must wait for the worker thread id 2977 to commit the transaction. This is based on the implementation described here http://
The MySQL configuration is as follows:
[mysqld]
core-file
user = mysql
port = 3306
datadir = /db
socket = /tmp/mysql.sock
default-
skip-external-
log_warnings=1
skip_name_resolve
character-
collation-server = utf8mb4_unicode_ci
group_concat_
server-id = 666
# InnoDB settings
innodb_
innodb_
innodb_
innodb_
innodb_
innodb_
innodb_
innodb_
# allows for row_format=
innodb_
innodb_
innodb_
innodb_
innodb_
innodb_doublewrite = 0
innodb_io_capacity = 1200
innodb_
innodb_
innodb_
# Slow query log settings
# The default logs all full table scans,tmp tables,filesorts on disk queries
#use_global_
#long_query_time = 0.5
slow_query_log_file = slowquery.log
slow_query_log = 1
long_query_time = 3
log_slow_filter = "full_scan,
log_slow_verbosity = "full"
# Other general MySQL settings
sync_binlog = 0
query_cache_type = 0
query_cache_size = 0
max_connections = 3000
thread_cache_size = 1000
back_log = 1024
thread_concurrency = 32
innodb_
tmpdir = /var/tmp
max_allowed_packet = 24M
max_join_size = 4294967295
net_buffer_length = 2K
thread_stack = 512K
tmp_table_size = 64M
max_heap_table_size = 64M
table_open_cache = 2000
# Replication settings (master to slave)
binlog_cache_size = 2M
binlog_format=mixed
log-bin = bin
log-error = error.log
expire_logs_days = 5
slave-parallel-
master-
relay-log-
gtid_mode = ON
enforce_
log-slave-updates
replicate-
replicate-
replicate-
replicate-
# Started tuning slave catchup speed, can use more research
slave-checkpoin
slave-checkpoin
I will be attaching the GDB stack traces in a follow up comment here
Related branches
- Laurynas Biveinis (community): Approve on 2014-10-07
-
Diff: 526 lines (+409/-1)9 files modifiedmysql-test/suite/rpl/r/percona_bug1331586.result (+139/-0)
mysql-test/suite/rpl/t/percona_bug1331586-master.opt (+3/-0)
mysql-test/suite/rpl/t/percona_bug1331586-slave.opt (+5/-0)
mysql-test/suite/rpl/t/percona_bug1331586.test (+148/-0)
sql/log_event.h (+25/-0)
sql/rpl_master.cc (+10/-0)
sql/rpl_rli.cc (+2/-1)
sql/rpl_rli.h (+23/-0)
sql/rpl_slave.cc (+54/-0)
Ovais Tariq (ovais-tariq) wrote : | #1 |
description: | updated |
Looks similar to upstream http://
tags: | added: upstream |
Ovais Tariq (ovais-tariq) wrote : | #4 |
@Valeri,
The bug http://
Ovais Tariq (ovais-tariq) wrote : | #5 |
So I have another instance of a slave stuck because one of the worker has a transaction open for 22367 seconds and the coordinator is waiting on the worker to finish.
+-----+
| Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined |
+-----+
| 91 | system user | | NULL | Connect | 35042 | Waiting for master to send event | NULL | 0 | 0 |
| 92 | system user | | NULL | Connect | 22356 | Waiting for Slave Worker to release partition | NULL | 0 | 0 |
| 93 | system user | | NULL | Connect | 35042 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 94 | system user | | NULL | Connect | 35042 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 95 | system user | | NULL | Connect | 35042 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 96 | system user | | NULL | Connect | 35042 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 97 | system user | | NULL | Connect | 35042 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 98 | system user | | NULL | Connect | 71771 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 99 | system user | | NULL | Connect | 71771 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 100 | system user | | NULL | Connect | 71771 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 101 | system user | | NULL | Connect | 71771 | Waiting for an event from Coordinator | NULL | 0 | 1 |
| 102 | system user | | NULL | Connect | 71771 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 176 | root | localhost | NULL | Query | 0 | init | show processlist | 0 | 0 |
+-----+
---TRANSACTION 53789616362, ACTIVE 22367 sec
2 lock struct(s), heap size 360, 1 row lock(s), undo log entries 1
MySQL thread id 101, OS thread handle 0x39, query id 95380210 Waiting for an event from Coordinator
TABLE LOCK table `db2`.`some_tbl` /* Partition `p5` */ trx id 53789616362 lock mode IX
RECORD LOCKS space id 3485 page no 16641 n bits 424 index `PRIMARY` of table `db2`.`some_tbl` /* Partition `p5` *...
Ovais Tariq (ovais-tariq) wrote : | #6 |
More details on the state of the coordinator thread:
set print pretty on
p *w_entry
$9 = {
<Relay_log_info> = {
<Rpl_info> = {
<
err_lock = {
m_mutex = {
},
},
},
},
},
m_psi = 0x0
},
number = 0,
message = '\000' <repeats 1023 times>,
timestamp = '\000' <repeats 15 times>
},
},
members of Rpl_info:
data_lock = {
m_mutex = {
},
},
},
},
},
m_psi = 0x0
},
run_lock = {
},
},
},
},
},
m_psi = 0x0
},
sleep_lock = {
m_mutex = {
},
},
...
Ovais Tariq (ovais-tariq) wrote : | #9 |
So based on the backtrace this is what we have as the position up to which the workers have executed everything:
group_relay_
group_relay_log_pos = 616886798,
Looking at the relay log following are the events in the relay log starting at position 616886798:
# at 616886798
#140618 19:31:55 server id 11 end_log_pos 616886824 CRC32 0xca064602 GTID [commit=yes]
SET @@SESSION.
# at 616886846
#140618 19:31:55 server id 11 end_log_pos 616886911 CRC32 0x893a663b Query thread_id=708866417 exec_time=0 error_code=0
SET TIMESTAMP=
BEGIN
/*!*/;
# at 616886933
#140618 19:31:55 server id 11 end_log_pos 616887078 CRC32 0xb7f1ec9f Query thread_id=708866417 exec_time=0 error_code=0
SET TIMESTAMP=
UPDATE some_table ...
/*!*/;
# at 616887100
#140618 19:32:05 server id 6 end_log_pos 616887123 CRC32 0xfdd70dd7 Stop
DELIMITER ;
# End of log file
So after the UPDATE there is no COMMIT event, but a STOP event. A STOP event is written when mysqld is being shutdown. The above UPDATE is what the worker thread executed in the transaction that is still open.
So what happened was that while IO thread was writing an event group to the relay log, mysqld was shutdown and the event group was written partially.
When MySQL comes back up again, the log is rotated, and IO thread starts writing from the event group which was partially written. So the next log has the following event group:
#140618 19:31:55 server id 11 end_log_pos 616886824 CRC32 0xca064602 GTID [commit=yes]
SET @@SESSION.
# at 478
#140618 19:31:55 server id 11 end_log_pos 616886911 CRC32 0x893a663b Query thread_id=708866417 exec_time=0 error_code=0
SET TIMESTAMP=
SET @@session.
SET @@session.
SET @@session.
SET @@session.
/*!\C utf8mb4 *//*!*/;
SET @@session.
SET @@session.
SET @@session.
BEGIN
/*!*/;
# at 565
#140618 19:31:55 server id 11 end_log_pos 616887078 CRC32 0xb7f1ec9f Query thread_id=708866417 exec_time=0 error_code=0
use `db2`/*!*/;
SET TIMESTAMP=
UPDATE some_tbl ....
/*!*/;
# at 732
#140618 19:31:55 server id 11 end_log_pos 616887109 CRC32 0x8f5b2f7c Xid = 30416488792
COMMIT/*!*/;
You can see that the "end_log_pos" values are the same for the partial event I showed earlier and this one. It is the same UPDATE transaction that was partially written.
Now comes the interesting part. When the coordinator reads the relay log, it sends the partial event to the worker, since the event is partial hence the worker never commits the transaction and the transaction is kept open. The coordinator reads the next event (which is the full version of the partial event) but the coordinator can...
Ovais Tariq (ovais-tariq) wrote : | #10 |
Would need to see how partial events are handled by single-threaded replication and whats happens when SQL_thread sees a STOP event or the start of a new binlog. If SQL_thread does not execute rollback in such case, then that might mean executing the same transaction twice. Because in our case, after the UPDATE in the partial transaction, a BEGIN would be seen which automatically commit the previous partial transaction.
linzuxiong (linzuxiong1988) wrote : | #11 |
It also accours in Percona Server 5.6.16 .
Our data have MyISAM (with partition) and InnoDB.
sunny4715 (wanggongpu) wrote : | #12 |
when I use mysql-5.6.19 to debug the problem, got the heap/stack as the attach file.
gdb > thread apply all bt
this bug hang the slave SQL thread lead Seconds_
This is repeatable with upstream MySQL (see last comment at http://
Here is test case I am using to reproduce the stall
=== added file 'mysql-
--- mysql-test/
+++ mysql-test/
@@ -0,0 +1,3 @@
+--gtid_mode=ON
+--enforce_
+--log-
=== added file 'mysql-
--- mysql-test/
+++ mysql-test/
@@ -0,0 +1,4 @@
+--slave_
+--gtid_mode=ON
+--enforce_
+--log-
=== added file 'mysql-
--- mysql-test/
+++ mysql-test/
@@ -0,0 +1,38 @@
+--source include/
+--source include/
+--source include/
+
+--disable_
+call mtr.add_
+--enable_query_log
+
+create table a(id int) engine=InnoDB;
+start transaction;
+insert into a values(1);
+
+connection slave;
+set global debug="
+--exec echo "wait" > $MYSQLTEST_
+
+connection master;
+insert into a values(2);
+insert into a values(3);
+commit;
+
+connection slave;
+--source include/
+
+--let $rpl_server_number= 2
+--source include/
+--source include/
+
+connection master;
+sync_slave_
+select * from a;
+
+connection master;
+drop table a;
+
+sync_slave_
+
+--source include/rpl_end.inc
=== modified file 'sql/rpl_slave.cc'
--- sql/rpl_slave.cc 2014-06-04 14:48:06 +0000
+++ sql/rpl_slave.cc 2014-07-08 17:35:34 +0000
@@ -6370,6 +6370,12 @@
mysql_
DBUG_
}
+ DBUG_EXECUTE_
+ if (event_type == XID_EVENT)
+ {
+ DBUG_SUICIDE();
+ }
+ );
switch (event_type) {
case STOP_EVENT:
Relay log can contain partially written group at the end of log file in following cases:
1. Slave crashed, in this case group will be written again in the next file
2. STOP SALVE at the specific moment, group will be written again in the next file
3. Server shutdown at the specific moment, group will be written again in the next file
4. FLUSH LOGS, group will *not* be written again in the next file
We need to test all cases and make sure they all work correctly.
Ovais Tariq (ovais-tariq) wrote : | #16 |
There is a related fix in Facebook branch for partial transactions which might be worth checking:
https:/
Percona now uses JIRA for bug reports so this bug report is migrated to: https:/
thread apply all bt