Replication stall with multi-threaded replication

Bug #1331586 reported by Ovais Tariq
18
This bug affects 3 people
Affects Status Importance Assigned to Milestone
MySQL Server
Unknown
Unknown
Percona Server moved to https://jira.percona.com/projects/PS
Fix Released
Medium
Sergei Glushchenko
5.6
Fix Released
Medium
Sergei Glushchenko

Bug Description

Percona Server version 5.6.17-66.0 compiled from source.

OS: SunOS 5.11 joyent_20131105T084235Z i86pc i386 i86pc Solaris

Periodically what happens every few days that the replication gets stalled with the coordinator thread in the state "Waiting for Slave Worker to release partition" and the worker threads in the state "Waiting for an event from Coordinator". At that point trying to stop the slave by executing "STOP SLAVE" also gets stalled, together with stalling all the further "SHOW SLAVE STAUS" calls.

Below is how typically the processlist looks in that particular case:
+------+-------------+---------------------+---------+---------+-------+-----------------------------------------------+-------------------+-----------+---------------+
| Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined |
+------+-------------+---------------------+---------+---------+-------+-----------------------------------------------+-------------------+-----------+---------------+
| 2477 | root | localhost | NULL | Sleep | 33355 | | NULL | 1 | 46 |
| 2967 | system user | | NULL | Connect | 24248 | Waiting for master to send event | NULL | 0 | 0 |
| 2968 | system user | | NULL | Connect | 24085 | Waiting for Slave Worker to release partition | NULL | 0 | 0 |
| 2969 | system user | | NULL | Connect | 24248 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 2970 | system user | | NULL | Connect | 24248 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 2971 | system user | | NULL | Connect | 24248 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 2972 | system user | | NULL | Connect | 24248 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 2973 | system user | | NULL | Connect | 24248 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 2974 | system user | | NULL | Connect | 84603 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 2975 | system user | | NULL | Connect | 84603 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 2976 | system user | | NULL | Connect | 84603 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 2977 | system user | | NULL | Connect | 84603 | Waiting for an event from Coordinator | NULL | 0 | 1 |
| 2978 | system user | | NULL | Connect | 84603 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 4268 | root | localhost | some_db | Query | 871 | Killing slave | stop slave | 0 | 0 |
| 4370 | root | localhost | some_db | Sleep | 4 | | NULL | 0 | 0 |
| 4377 | monitoring | 192.168.1.222:11159 | NULL | Query | 464 | init | show slave status | 0 | 0 |
| 4419 | root | localhost | NULL | Query | 0 | init | show processlist | 0 | 0 |
+------+-------------+---------------------+---------+---------+-------+-----------------------------------------------+-------------------+-----------+---------------+
17 rows in set (0.00 sec)

Looking into this further, the InnoDB status shows us that one of the worker thread actually has an open transaction:
---TRANSACTION 53684133006, ACTIVE 24245 sec
2 lock struct(s), heap size 360, 1 row lock(s), undo log entries 1
MySQL thread id 2977, OS thread handle 0x36, query id 56740540 Waiting for an event from Coordinator
TABLE LOCK table `db2`.`tbl_name` /* Partition `p24` */ trx id 53684133006 lock mode IX
RECORD LOCKS space id 3504 page no 23250 n bits 440 index `PRIMARY` of table `db2`.`tbl_name` /* Partition `p24` */ trx id 53684133006 lock_mode X locks rec but not gap

First off all the transaction is a replicated transaction being executed by the worker thread and is still open for an unknown reason, while holding only a single row lock.
Secondly, apparently the coordinator is blocked because the next even to apply is also to the `db2` database and hence it must wait for the worker thread id 2977 to commit the transaction. This is based on the implementation described here http://andreithedolphin.blogspot.com/2012/10/parallel-slave-in-mysql-replication.html

The MySQL configuration is as follows:
[mysqld]
core-file
user = mysql
port = 3306
datadir = /db
socket = /tmp/mysql.sock
default-storage-engine = innodb
skip-external-locking
log_warnings=1
skip_name_resolve
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci
group_concat_max_len = 1000000

server-id = 666

# InnoDB settings
innodb_data_home_dir = /db
innodb_log_group_home_dir = /db
innodb_data_file_path = ibdata1:100M:autoextend
innodb_buffer_pool_size = 20G
innodb_buffer_pool_instances = 8
innodb_log_file_size = 10G
innodb_buffer_pool_load_at_startup=OFF
innodb_buffer_pool_dump_at_shutdown=OFF
# allows for row_format=compressed
innodb_file_format=Barracuda
innodb_log_buffer_size = 64M
innodb_flush_log_at_trx_commit = 0
innodb_lock_wait_timeout = 50
innodb_file_per_table
innodb_doublewrite = 0
innodb_io_capacity = 1200
innodb_read_io_threads = 6
innodb_write_io_threads = 6
innodb_stats_on_metadata = OFF

# Slow query log settings
# The default logs all full table scans,tmp tables,filesorts on disk queries
#use_global_long_query_time = 1
#long_query_time = 0.5
slow_query_log_file = slowquery.log
slow_query_log = 1
long_query_time = 3
log_slow_filter = "full_scan,tmp_table_on_disk,filesort_on_disk"
log_slow_verbosity = "full"

# Other general MySQL settings
sync_binlog = 0
query_cache_type = 0
query_cache_size = 0
max_connections = 3000
thread_cache_size = 1000
back_log = 1024
thread_concurrency = 32
innodb_thread_concurrency = 64
tmpdir = /var/tmp
max_allowed_packet = 24M
max_join_size = 4294967295
net_buffer_length = 2K
thread_stack = 512K
tmp_table_size = 64M
max_heap_table_size = 64M
table_open_cache = 2000

# Replication settings (master to slave)
binlog_cache_size = 2M
binlog_format=mixed
log-bin = bin
log-error = error.log
expire_logs_days = 5
slave-parallel-workers = 10
master-info-repository = "table"
relay-log-info-repository = "table"
gtid_mode = ON
enforce_gtid_consistency = true
log-slave-updates
replicate-ignore-table = mysql.user
replicate-ignore-table = mysql.db
replicate-ignore-table = mysql.tables_priv
replicate-ignore-table = mysql.proxies_priv

# Started tuning slave catchup speed, can use more research
slave-checkpoint-period = 1000
slave-checkpoint-group = 2048

I will be attaching the GDB stack traces in a follow up comment here

Related branches

Revision history for this message
Ovais Tariq (ovais-tariq) wrote :

thread apply all bt

Revision history for this message
Ovais Tariq (ovais-tariq) wrote :

thread apply all bt full

tags: added: i43091
description: updated
Revision history for this message
Valerii Kravchuk (valerii-kravchuk) wrote :

Looks similar to upstream http://bugs.mysql.com/bug.php?id=72537 (check processlist attached there).

tags: added: upstream
Revision history for this message
Ovais Tariq (ovais-tariq) wrote :

@Valeri,

The bug http://bugs.mysql.com/bug.php?id=72537 links the issue to the table being MyISAM but in my case the table is InnoDB. So this might be a more generic storage-engine independent bug.

Revision history for this message
Ovais Tariq (ovais-tariq) wrote :
Download full text (3.2 KiB)

So I have another instance of a slave stuck because one of the worker has a transaction open for 22367 seconds and the coordinator is waiting on the worker to finish.

+-----+-------------+-----------+------+---------+-------+-----------------------------------------------+------------------+-----------+---------------+
| Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined |
+-----+-------------+-----------+------+---------+-------+-----------------------------------------------+------------------+-----------+---------------+
| 91 | system user | | NULL | Connect | 35042 | Waiting for master to send event | NULL | 0 | 0 |
| 92 | system user | | NULL | Connect | 22356 | Waiting for Slave Worker to release partition | NULL | 0 | 0 |
| 93 | system user | | NULL | Connect | 35042 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 94 | system user | | NULL | Connect | 35042 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 95 | system user | | NULL | Connect | 35042 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 96 | system user | | NULL | Connect | 35042 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 97 | system user | | NULL | Connect | 35042 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 98 | system user | | NULL | Connect | 71771 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 99 | system user | | NULL | Connect | 71771 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 100 | system user | | NULL | Connect | 71771 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 101 | system user | | NULL | Connect | 71771 | Waiting for an event from Coordinator | NULL | 0 | 1 |
| 102 | system user | | NULL | Connect | 71771 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 176 | root | localhost | NULL | Query | 0 | init | show processlist | 0 | 0 |
+-----+-------------+-----------+------+---------+-------+-----------------------------------------------+------------------+-----------+---------------+

---TRANSACTION 53789616362, ACTIVE 22367 sec
2 lock struct(s), heap size 360, 1 row lock(s), undo log entries 1
MySQL thread id 101, OS thread handle 0x39, query id 95380210 Waiting for an event from Coordinator
TABLE LOCK table `db2`.`some_tbl` /* Partition `p5` */ trx id 53789616362 lock mode IX
RECORD LOCKS space id 3485 page no 16641 n bits 424 index `PRIMARY` of table `db2`.`some_tbl` /* Partition `p5` *...

Read more...

Revision history for this message
Ovais Tariq (ovais-tariq) wrote :
Download full text (39.7 KiB)

More details on the state of the coordinator thread:

set print pretty on
p *w_entry

$9 = {
  <Relay_log_info> = {
    <Rpl_info> = {
      <Slave_reporting_capability> = {
        _vptr.Slave_reporting_capability = 0x1146d30 <vtable for Slave_worker+16>,
        err_lock = {
          m_mutex = {
            __pthread_mutex_flags = {
              __pthread_mutex_flag1 = 4,
              __pthread_mutex_flag2 = 0 '\000',
              __pthread_mutex_ceiling = 0 '\000',
              __pthread_mutex_type = 0,
              __pthread_mutex_magic = 19800
            },
            __pthread_mutex_lock = {
              __pthread_mutex_lock64 = {
                __pthread_mutex_pad = "\000\000\000\000\000\000\000"
              },
              __pthread_mutex_lock32 = {
                __pthread_ownerpid = 0,
                __pthread_lockword = 0
              },
              __pthread_mutex_owner64 = 0
            },
            __pthread_mutex_data = 0
          },
          m_psi = 0x0
        },
        m_last_error = {
          number = 0,
          message = '\000' <repeats 1023 times>,
          timestamp = '\000' <repeats 15 times>
        },
        m_thread_name = 0x10b0199 "SQL"
      },
      members of Rpl_info:
      data_lock = {
        m_mutex = {
          __pthread_mutex_flags = {
            __pthread_mutex_flag1 = 4,
            __pthread_mutex_flag2 = 0 '\000',
            __pthread_mutex_ceiling = 0 '\000',
            __pthread_mutex_type = 0,
            __pthread_mutex_magic = 19800
          },
          __pthread_mutex_lock = {
            __pthread_mutex_lock64 = {
              __pthread_mutex_pad = "\000\000\000\000\000\000\000"
            },
            __pthread_mutex_lock32 = {
              __pthread_ownerpid = 0,
              __pthread_lockword = 0
            },
            __pthread_mutex_owner64 = 0
          },
          __pthread_mutex_data = 0
        },
        m_psi = 0x0
      },
      run_lock = {
                m_mutex = {
          __pthread_mutex_flags = {
            __pthread_mutex_flag1 = 4,
            __pthread_mutex_flag2 = 0 '\000',
            __pthread_mutex_ceiling = 0 '\000',
            __pthread_mutex_type = 0,
            __pthread_mutex_magic = 19800
          },
          __pthread_mutex_lock = {
            __pthread_mutex_lock64 = {
              __pthread_mutex_pad = "\000\000\000\000\000\000\000"
            },
            __pthread_mutex_lock32 = {
              __pthread_ownerpid = 0,
              __pthread_lockword = 0
            },
            __pthread_mutex_owner64 = 0
          },
          __pthread_mutex_data = 0
        },
        m_psi = 0x0
      },
      sleep_lock = {
        m_mutex = {
          __pthread_mutex_flags = {
            __pthread_mutex_flag1 = 4,
            __pthread_mutex_flag2 = 0 '\000',
            __pthread_mutex_ceiling = 0 '\000',
            __pthread_mutex_type = 0,
            __pthread_mutex_magic = 19800
          },
          __pthread_mutex_lock = {
            __pthread_mutex_lock64 = {
              __pthread_mutex_pad = "\000\000\000\000\000\000\000"
            },
           ...

Revision history for this message
Ovais Tariq (ovais-tariq) wrote :
Download full text (3.4 KiB)

So based on the backtrace this is what we have as the position up to which the workers have executed everything:
group_relay_log_name = "./relay-bin.000973", '\000' <repeats 482 times>,
group_relay_log_pos = 616886798,

Looking at the relay log following are the events in the relay log starting at position 616886798:
# at 616886798
#140618 19:31:55 server id 11 end_log_pos 616886824 CRC32 0xca064602 GTID [commit=yes]
SET @@SESSION.GTID_NEXT= 'da2d1223-8151-11e3-8e1d-997ca3babb7c:4367700088'/*!*/;
# at 616886846
#140618 19:31:55 server id 11 end_log_pos 616886911 CRC32 0x893a663b Query thread_id=708866417 exec_time=0 error_code=0
SET TIMESTAMP=1403119915/*!*/;
BEGIN
/*!*/;
# at 616886933
#140618 19:31:55 server id 11 end_log_pos 616887078 CRC32 0xb7f1ec9f Query thread_id=708866417 exec_time=0 error_code=0
SET TIMESTAMP=1403119915/*!*/;
UPDATE some_table ...
/*!*/;
# at 616887100
#140618 19:32:05 server id 6 end_log_pos 616887123 CRC32 0xfdd70dd7 Stop
DELIMITER ;
# End of log file

So after the UPDATE there is no COMMIT event, but a STOP event. A STOP event is written when mysqld is being shutdown. The above UPDATE is what the worker thread executed in the transaction that is still open.

So what happened was that while IO thread was writing an event group to the relay log, mysqld was shutdown and the event group was written partially.

When MySQL comes back up again, the log is rotated, and IO thread starts writing from the event group which was partially written. So the next log has the following event group:
#140618 19:31:55 server id 11 end_log_pos 616886824 CRC32 0xca064602 GTID [commit=yes]
SET @@SESSION.GTID_NEXT= 'da2d1223-8151-11e3-8e1d-997ca3babb7c:4367700088'/*!*/;
# at 478
#140618 19:31:55 server id 11 end_log_pos 616886911 CRC32 0x893a663b Query thread_id=708866417 exec_time=0 error_code=0
SET TIMESTAMP=1403119915/*!*/;
SET @@session.pseudo_thread_id=708866417/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=1073741824/*!*/;
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
/*!\C utf8mb4 *//*!*/;
SET @@session.character_set_client=45,@@session.collation_connection=45,@@session.collation_server=224/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
BEGIN
/*!*/;
# at 565
#140618 19:31:55 server id 11 end_log_pos 616887078 CRC32 0xb7f1ec9f Query thread_id=708866417 exec_time=0 error_code=0
use `db2`/*!*/;
SET TIMESTAMP=1403119915/*!*/;
UPDATE some_tbl ....
/*!*/;
# at 732
#140618 19:31:55 server id 11 end_log_pos 616887109 CRC32 0x8f5b2f7c Xid = 30416488792
COMMIT/*!*/;

You can see that the "end_log_pos" values are the same for the partial event I showed earlier and this one. It is the same UPDATE transaction that was partially written.

Now comes the interesting part. When the coordinator reads the relay log, it sends the partial event to the worker, since the event is partial hence the worker never commits the transaction and the transaction is kept open. The coordinator reads the next event (which is the full version of the partial event) but the coordinator can...

Read more...

Revision history for this message
Ovais Tariq (ovais-tariq) wrote :

Would need to see how partial events are handled by single-threaded replication and whats happens when SQL_thread sees a STOP event or the start of a new binlog. If SQL_thread does not execute rollback in such case, then that might mean executing the same transaction twice. Because in our case, after the UPDATE in the partial transaction, a BEGIN would be seen which automatically commit the previous partial transaction.

Revision history for this message
linzuxiong (linzuxiong1988) wrote :

It also accours in Percona Server 5.6.16 .
Our data have MyISAM (with partition) and InnoDB.

Revision history for this message
sunny4715 (wanggongpu) wrote :

when I use mysql-5.6.19 to debug the problem, got the heap/stack as the attach file.
gdb > thread apply all bt

this bug hang the slave SQL thread lead Seconds_Behind_Master more and more big.

Revision history for this message
Valerii Kravchuk (valerii-kravchuk) wrote :

This is repeatable with upstream MySQL (see last comment at http://bugs.mysql.com/bug.php?id=73066)

Revision history for this message
Sergei Glushchenko (sergei.glushchenko) wrote :

Here is test case I am using to reproduce the stall

=== added file 'mysql-test/suite/rpl/t/percona_bug1331586-master.opt'
--- mysql-test/suite/rpl/t/percona_bug1331586-master.opt 1970-01-01 00:00:00 +0000
+++ mysql-test/suite/rpl/t/percona_bug1331586-master.opt 2014-07-08 17:35:34 +0000
@@ -0,0 +1,3 @@
+--gtid_mode=ON
+--enforce_gtid_consistency=true
+--log-slave-updates

=== added file 'mysql-test/suite/rpl/t/percona_bug1331586-slave.opt'
--- mysql-test/suite/rpl/t/percona_bug1331586-slave.opt 1970-01-01 00:00:00 +0000
+++ mysql-test/suite/rpl/t/percona_bug1331586-slave.opt 2014-07-08 17:35:34 +0000
@@ -0,0 +1,4 @@
+--slave_parallel_workers=8
+--gtid_mode=ON
+--enforce_gtid_consistency=true
+--log-slave-updates

=== added file 'mysql-test/suite/rpl/t/percona_bug1331586.test'
--- mysql-test/suite/rpl/t/percona_bug1331586.test 1970-01-01 00:00:00 +0000
+++ mysql-test/suite/rpl/t/percona_bug1331586.test 2014-07-08 17:35:34 +0000
@@ -0,0 +1,38 @@
+--source include/have_debug.inc
+--source include/master-slave.inc
+--source include/have_binlog_format_statement.inc
+
+--disable_query_log
+call mtr.add_suppression("Slave:*");
+--enable_query_log
+
+create table a(id int) engine=InnoDB;
+start transaction;
+insert into a values(1);
+
+connection slave;
+set global debug="d,crash_xid_log_event";
+--exec echo "wait" > $MYSQLTEST_VARDIR/tmp/mysqld.2.expect
+
+connection master;
+insert into a values(2);
+insert into a values(3);
+commit;
+
+connection slave;
+--source include/wait_until_disconnected.inc
+
+--let $rpl_server_number= 2
+--source include/rpl_start_server.inc
+--source include/start_slave.inc
+
+connection master;
+sync_slave_with_master;
+select * from a;
+
+connection master;
+drop table a;
+
+sync_slave_with_master;
+
+--source include/rpl_end.inc

=== modified file 'sql/rpl_slave.cc'
--- sql/rpl_slave.cc 2014-06-04 14:48:06 +0000
+++ sql/rpl_slave.cc 2014-07-08 17:35:34 +0000
@@ -6370,6 +6370,12 @@
     mysql_mutex_unlock(&mi->data_lock);
     DBUG_RETURN(ret);
   }
+ DBUG_EXECUTE_IF("crash_xid_log_event",
+ if (event_type == XID_EVENT)
+ {
+ DBUG_SUICIDE();
+ }
+ );

   switch (event_type) {
   case STOP_EVENT:

Revision history for this message
Sergei Glushchenko (sergei.glushchenko) wrote :

Relay log can contain partially written group at the end of log file in following cases:

1. Slave crashed, in this case group will be written again in the next file
2. STOP SALVE at the specific moment, group will be written again in the next file
3. Server shutdown at the specific moment, group will be written again in the next file
4. FLUSH LOGS, group will *not* be written again in the next file

We need to test all cases and make sure they all work correctly.

Revision history for this message
Ovais Tariq (ovais-tariq) wrote :

There is a related fix in Facebook branch for partial transactions which might be worth checking:
https://github.com/facebook/mysql-5.6/commit/9bcc118bee733a53e871867443d611e0606ee750

Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-1511

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.