Slave IO_THREAD Loses State When Connection is Killed on Master
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
MySQL Server | Unknown | Unknown | |
Percona Server moved to https://jira.percona.com/projects/PS | Status tracked in 5.7 | | |
5.1 | Won't Fix | Medium | Unassigned |
5.5 | Triaged | Medium | Unassigned |
5.6 | Triaged | Medium | Unassigned |
5.7 | Triaged | Medium | Unassigned |
Bug Description
Description:
When the connection belonging to the slave's IO_THREAD is killed on the master, the slave somehow loses state, i.e. it does not seem to maintain a connection to the master and repeatedly times out based on slave_net_timeout.
How to repeat:
Set up a simple master-slave pair. For testing, set slave_net_timeout on the slave to 10 seconds or shorter, then kill the corresponding IO_THREAD connection on the master. With log_warnings = 2 on the master, you should see frequent reconnects of the IO_THREAD.
On the slave:
node2 [localhost] {msandbox} ((none)) > show slave status \G
*******
Replicate
Replicate_
Master_
Replicate_
1 row in set (0.00 sec)
node2 [localhost] {msandbox} ((none)) > set global slave_net_timeout = 10;
Query OK, 0 rows affected, 1 warning (0.00 sec)
node2 [localhost] {msandbox} ((none)) >
On the master:
node1 [localhost] {msandbox} ((none)) > set global log_warnings = 2;
Query OK, 0 rows affected (0.00 sec)
node1 [localhost] {msandbox} ((none)) > show processlist;
+----+-
| Id | User | Host | db | Command | Time | State | Info |
+----+-
| 6 | system user | | NULL | Connect | 325 | Waiting for master to send event | NULL |
| 7 | system user | | NULL | Connect | 325 | Slave has read all relay log; waiting for the slave I/O thread to update it | NULL |
| 8 | rsandbox | localhost:44899 | NULL | Binlog Dump | 325 | Master has sent all binlog to slave; waiting for binlog to be updated | NULL |
| 9 | msandbox | localhost | NULL | Query | 0 | NULL | show processlist |
+----+-
4 rows in set (0.00 sec)
node1 [localhost] {msandbox} ((none)) > kill 8;
Query OK, 0 rows affected (0.00 sec)
node1 [localhost] {msandbox} ((none)) > show processlist;
+----+-
| Id | User | Host | db | Command | Time | State | Info |
+----+-
| 6 | system user | | NULL | Connect | 337 | Waiting for master to send event | NULL |
| 7 | system user | | NULL | Connect | 337 | Slave has read all relay log; waiting for the slave I/O thread to update it | NULL |
| 9 | msandbox | localhost | NULL | Query | 0 | NULL | show processlist |
| 10 | rsandbox | localhost:44906 | NULL | Binlog Dump | 2 | Master has sent all binlog to slave; waiting for binlog to be updated | NULL |
+----+-
4 rows in set (0.00 sec)
node1 [localhost] {msandbox} ((none)) > \q
Bye
[revin@forge rcsandbox_5_5_31]$ tail -f node1/data/
130521 23:15:33 [Note] Server socket created on IP: '127.0.0.1'.
130521 23:15:33 [Note] Event Scheduler: Loaded 0 events
130521 23:15:33 [Note] /wok/bin/
Version: '5.5.31-log' socket: '/tmp/mysql_
130521 23:15:36 [Warning] Neither --relay-log nor --relay-log-index were used; so replication may break when this MySQL server acts as a slave and has his hostname changed!! Please use '--relay-
130521 23:15:36 [Note] 'CHANGE MASTER TO executed'. Previous state master_host='', master_port='3306', master_log_file='', master_log_pos='4'. New state master_
130521 23:15:36 [Note] Slave SQL thread initialized, starting replication in log 'FIRST' at position 0, relay log './mysql_
130521 23:15:36 [Note] Slave I/O thread: connected to master 'rsandbox@
130521 23:21:11 [Note] Start binlog_dump to slave_server(102), pos(mysql-
130521 23:21:21 [Note] Start binlog_dump to slave_server(102), pos(mysql-
130521 23:21:31 [Note] Start binlog_dump to slave_server(102), pos(mysql-
130521 23:21:41 [Note] Start binlog_dump to slave_server(102), pos(mysql-
^C
[revin@forge rcsandbox_5_5_31]$
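The reconnect cadence in the log above can be checked against slave_net_timeout with a small script. This is only a sketch: the function name `reconnect_intervals` and the regular expression are illustrative assumptions, not part of any MySQL tooling, and the log lines are copied from this report as they appear, including their truncation.

```python
import re
from datetime import datetime

# Assumed pattern for the 5.5 error-log lines shown in this report:
# "YYMMDD HH:MM:SS [Note] Start binlog_dump to slave_server(<id>), ..."
DUMP_RE = re.compile(
    r"^(\d{6} \d{2}:\d{2}:\d{2}) \[Note\] "
    r"Start binlog_dump to slave_server\((\d+)\)"
)

def reconnect_intervals(lines, server_id):
    """Return seconds between consecutive binlog dump starts for one slave."""
    stamps = []
    for line in lines:
        m = DUMP_RE.match(line)
        if m and int(m.group(2)) == server_id:
            stamps.append(datetime.strptime(m.group(1), "%y%m%d %H:%M:%S"))
    # Pair each timestamp with its successor and take the difference.
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]

# Lines taken from the master's error log in this report (truncated as-is).
log = [
    "130521 23:21:11 [Note] Start binlog_dump to slave_server(102), pos(mysql-",
    "130521 23:21:21 [Note] Start binlog_dump to slave_server(102), pos(mysql-",
    "130521 23:21:31 [Note] Start binlog_dump to slave_server(102), pos(mysql-",
    "130521 23:21:41 [Note] Start binlog_dump to slave_server(102), pos(mysql-",
]

print(reconnect_intervals(log, 102))  # [10, 10, 10]
```

Every interval equals the slave_net_timeout value of 10 set on the slave, which is consistent with the slave timing out and reconnecting each time rather than holding the new connection.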
Suggested fix:
Restarting the slave in this case fixes the problem.
Upstream report: http://bugs.mysql.com/bug.php?id=69300