Slave IO_THREAD Loses State When Connection is Killed on Master

Bug #1182719 reported by Jervin R
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
MySQL Server
Unknown
Unknown
Percona Server moved to https://jira.percona.com/projects/PS
Status tracked in 5.7
5.1
Won't Fix
Medium
Unassigned
5.5
Triaged
Medium
Unassigned
5.6
Triaged
Medium
Unassigned
5.7
Triaged
Medium
Unassigned

Bug Description

Description:
When the IO_THREAD is killed on the master, somehow the slave loses state i.e. it does not seem to maintain a connection to the master and times out based on slave_net_timeout.

How to repeat:
Setup a simple master-slave, for testing set slave_net_timeout on slave to 10seconds or shorter then, kill the corresponding IO_THREAD connection on the master. With log_warnings = 2 on the master you should see frequent reconnects of the IO_THREAD.

On the slave:

node2 [localhost] {msandbox} ((none)) > show slave status \G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 127.0.0.1
                  Master_User: rsandbox
                  Master_Port: 55311
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000001
          Read_Master_Log_Pos: 4205
               Relay_Log_File: mysql_sandbox55312-relay-bin.000002
                Relay_Log_Pos: 2302
        Relay_Master_Log_File: mysql-bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 4205
              Relay_Log_Space: 2471
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 101
1 row in set (0.00 sec)

node2 [localhost] {msandbox} ((none)) > set global slave_net_timeout = 10;
Query OK, 0 rows affected, 1 warning (0.00 sec)

node2 [localhost] {msandbox} ((none)) >

On the master:

node1 [localhost] {msandbox} ((none)) > set global log_warnings = 2;
Query OK, 0 rows affected (0.00 sec)

node1 [localhost] {msandbox} ((none)) > show processlist;
+----+-------------+-----------------+------+-------------+------+-----------------------------------------------------------------------------+------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+-------------+-----------------+------+-------------+------+-----------------------------------------------------------------------------+------------------+
| 6 | system user | | NULL | Connect | 325 | Waiting for master to send event | NULL |
| 7 | system user | | NULL | Connect | 325 | Slave has read all relay log; waiting for the slave I/O thread to update it | NULL |
| 8 | rsandbox | localhost:44899 | NULL | Binlog Dump | 325 | Master has sent all binlog to slave; waiting for binlog to be updated | NULL |
| 9 | msandbox | localhost | NULL | Query | 0 | NULL | show processlist |
+----+-------------+-----------------+------+-------------+------+-----------------------------------------------------------------------------+------------------+
4 rows in set (0.00 sec)

node1 [localhost] {msandbox} ((none)) > kill 8;
Query OK, 0 rows affected (0.00 sec)

node1 [localhost] {msandbox} ((none)) > show processlist;
+----+-------------+-----------------+------+-------------+------+-----------------------------------------------------------------------------+------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+-------------+-----------------+------+-------------+------+-----------------------------------------------------------------------------+------------------+
| 6 | system user | | NULL | Connect | 337 | Waiting for master to send event | NULL |
| 7 | system user | | NULL | Connect | 337 | Slave has read all relay log; waiting for the slave I/O thread to update it | NULL |
| 9 | msandbox | localhost | NULL | Query | 0 | NULL | show processlist |
| 10 | rsandbox | localhost:44906 | NULL | Binlog Dump | 2 | Master has sent all binlog to slave; waiting for binlog to be updated | NULL |
+----+-------------+-----------------+------+-------------+------+-----------------------------------------------------------------------------+------------------+
4 rows in set (0.00 sec)

node1 [localhost] {msandbox} ((none)) > \q
Bye
[revin@forge rcsandbox_5_5_31]$ tail -f node1/data/msandbox.err
130521 23:15:33 [Note] Server socket created on IP: '127.0.0.1'.
130521 23:15:33 [Note] Event Scheduler: Loaded 0 events
130521 23:15:33 [Note] /wok/bin/mysql/5.5.31/bin/mysqld: ready for connections.
Version: '5.5.31-log' socket: '/tmp/mysql_sandbox55311.sock' port: 55311 MySQL Community Server (GPL)
130521 23:15:36 [Warning] Neither --relay-log nor --relay-log-index were used; so replication may break when this MySQL server acts as a slave and has his hostname changed!! Please use '--relay-log=mysql_sandbox55311-relay-bin' to avoid this problem.
130521 23:15:36 [Note] 'CHANGE MASTER TO executed'. Previous state master_host='', master_port='3306', master_log_file='', master_log_pos='4'. New state master_host='127.0.0.1', master_port='55312', master_log_file='', master_log_pos='4'.
130521 23:15:36 [Note] Slave SQL thread initialized, starting replication in log 'FIRST' at position 0, relay log './mysql_sandbox55311-relay-bin.000001' position: 4
130521 23:15:36 [Note] Slave I/O thread: connected to master 'rsandbox@127.0.0.1:55312',replication started in log 'FIRST' at position 4
130521 23:21:11 [Note] Start binlog_dump to slave_server(102), pos(mysql-bin.000001, 4205)
130521 23:21:21 [Note] Start binlog_dump to slave_server(102), pos(mysql-bin.000001, 4205)
130521 23:21:31 [Note] Start binlog_dump to slave_server(102), pos(mysql-bin.000001, 4205)
130521 23:21:41 [Note] Start binlog_dump to slave_server(102), pos(mysql-bin.000001, 4205)
^C
[revin@forge rcsandbox_5_5_31]$

Suggested fix:
Restarting the slave in this case fixes the problem.

Revision history for this message
Jervin R (revin) wrote :
Jervin R (revin)
tags: added: i31639
Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

Jervin -

Please verify 5.1 and 5.6 as well,

tags: added: upstream
Revision history for this message
Jervin R (revin) wrote :

Verified on 5.1.69, though with log_warnings = 2, the binlog dump reconnect is not logged but you can see the IO thread on the master incrementing every 10seconds:

node1 [localhost] {msandbox} ((none)) > pager grep 'Binlog Dump'
PAGER set to 'grep 'Binlog Dump''
node1 [localhost] {msandbox} ((none)) > show processlist;
| 17 | rsandbox | localhost:45141 | NULL | Binlog Dump | 4 | Has sent all binlog to slave; waiting for binlog to be updated | NULL |
4 rows in set (0.00 sec)

node1 [localhost] {msandbox} ((none)) > show processlist;
| 18 | rsandbox | localhost:45142 | NULL | Binlog Dump | 3 | Has sent all binlog to slave; waiting for binlog to be updated | NULL |
4 rows in set (0.00 sec)

node1 [localhost] {msandbox} ((none)) > show processlist; select sleep(10); show processlist; select sleep(10); show processlist;
| 21 | rsandbox | localhost:45145 | NULL | Binlog Dump | 6 | Has sent all binlog to slave; waiting for binlog to be updated | NULL |
4 rows in set (0.00 sec)

1 row in set (10.00 sec)

| 22 | rsandbox | localhost:45146 | NULL | Binlog Dump | 6 | Has sent all binlog to slave; waiting for binlog to be updated | NULL |
4 rows in set (0.00 sec)

1 row in set (10.00 sec)

| 23 | rsandbox | localhost:45148 | NULL | Binlog Dump | 6 | Has sent all binlog to slave; waiting for binlog to be updated | NULL |
4 rows in set (0.00 sec)

Revision history for this message
Jervin R (revin) wrote :

Verified just the same on 5.6.11:

[revin@forge rcsandbox_5_6_11]$ tail -f node1/data/msandbox.err
2013-05-22 00:03:32 24529 [Note] /wok/bin/mysql/5.6.11/bin/mysqld: ready for connections.
Version: '5.6.11-log' socket: '/tmp/mysql_sandbox56111.sock' port: 56111 MySQL Community Server (GPL)
2013-05-22 00:03:39 24529 [Warning] Neither --relay-log nor --relay-log-index were used; so replication may break when this MySQL server acts as a slave and has his hostname changed!! Please use '--relay-log=mysql_sandbox56111-relay-bin' to avoid this problem.
2013-05-22 00:03:39 24529 [Note] 'CHANGE MASTER TO executed'. Previous state master_host='', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='127.0.0.1', master_port= 56112, master_log_file='', master_log_pos= 4, master_bind=''.
2013-05-22 00:03:39 24529 [Warning] Storing MySQL user name or password information in the master.info repository is not secure and is therefore not recommended. Please see the MySQL Manual for more about this issue and possible alternatives.
2013-05-22 00:03:39 24529 [Note] Slave I/O thread: connected to master 'rsandbox@127.0.0.1:56112',replication started in log 'FIRST' at position 4
2013-05-22 00:03:39 24529 [Warning] Slave SQL: If a crash happens this configuration does not guarantee that the relay log info will be consistent, Error_code: 0
2013-05-22 00:03:39 24529 [Note] Slave SQL thread initialized, starting replication in log 'FIRST' at position 0, relay log './mysql_sandbox56111-relay-bin.000001' position: 4
2013-05-22 00:04:51 24529 [Note] Start binlog_dump to master_thread_id(10) slave_server(102), pos(mysql-bin.000001, 5210)
2013-05-22 00:05:01 24529 [Note] Start binlog_dump to master_thread_id(11) slave_server(102), pos(mysql-bin.000001, 5210)
2013-05-22 00:05:11 24529 [Note] Start binlog_dump to master_thread_id(12) slave_server(102), pos(mysql-bin.000001, 5210)
^C
[revin@forge rcsandbox_5_6_11]$

Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-95

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.