Slave I/O thread won't attempt to automatically reconnect to the master / error-code 1593

Bug #1268735 reported by Agustín G on 2014-01-13
This bug affects 2 people
Affects           Status                  Importance   Assigned to   Milestone
MySQL Server      Unknown                 Unknown
Percona Server    Status tracked in 5.6
  5.1                                     Undecided    Vlad Lesin
  5.5                                     Medium       Vlad Lesin
  5.6                                     Undecided    Vlad Lesin

Bug Description

140111 17:02:01 [ERROR] Slave I/O: The slave I/O thread stops because SET @master_heartbeat_period on master failed. Error: , Error_code: 1593
140111 17:02:01 [Note] Slave I/O thread exiting, read up to log 'mysql-bin.015318', position 887067847

The error in question is:

$ perror 1593
MySQL error code 1593 (ER_SLAVE_FATAL_ERROR): Fatal error: %s

Percona Server 5.5.29-29.4 is running on the affected slave. Looking at the source code for that release, the problematic code path appears to be:

if (mysql_real_query(mysql, query, strlen(query))
    && !check_io_slave_killed(mi->io_thd, mi, NULL))
{
  errmsg= "The slave I/O thread stops because SET @master_heartbeat_period "
          "on master failed.";
  err_code= ER_SLAVE_FATAL_ERROR;
  sprintf(err_buff, "%s Error: %s", errmsg, mysql_error(mysql));
  mysql_free_result(mysql_store_result(mysql));
  goto err;
}
mysql_free_result(mysql_store_result(mysql));

I believe this is a bug. Instead of assuming that any failure of this query is fatal, the code should check "is_network_error(mysql_errno(mysql))" and, for a network error, let the I/O thread reconnect rather than exit (this is done in Percona Server 5.6.13, for instance).
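
A minimal sketch of the check suggested above, written as standalone C rather than as a patch against the server source. The enum values are the numeric codes from errmsg.h and mysqld_error.h. The names hb_action, is_transient_network_error() and classify_heartbeat_failure() are hypothetical, and the set of codes treated as transient here is an assumption, not necessarily the exact list that the server's is_network_error() checks.

```c
#include <stdbool.h>

/* Numeric values of a few MySQL client (CR_*) and server (ER_*) error codes. */
enum {
  ER_CON_COUNT_ERROR   = 1040,
  ER_SERVER_SHUTDOWN   = 1053,
  ER_SLAVE_FATAL_ERROR = 1593,
  CR_CONNECTION_ERROR  = 2002,
  CR_CONN_HOST_ERROR   = 2003,
  CR_SERVER_GONE_ERROR = 2006,
  CR_SERVER_LOST       = 2013
};

/* Hypothetical outcome of the heartbeat setup step. */
typedef enum { HB_OK, HB_RECONNECT, HB_FATAL } hb_action;

/* Assumed set of errors that a reconnect attempt could plausibly cure. */
bool is_transient_network_error(unsigned int errorno)
{
  switch (errorno) {
  case CR_CONNECTION_ERROR:
  case CR_CONN_HOST_ERROR:
  case CR_SERVER_GONE_ERROR:
  case CR_SERVER_LOST:
  case ER_CON_COUNT_ERROR:
  case ER_SERVER_SHUTDOWN:
    return true;
  default:
    return false;
  }
}

/*
 * query_failed: non-zero when the SET @master_heartbeat_period query failed.
 * errorno:      the error number reported by the client library for it.
 */
hb_action classify_heartbeat_failure(int query_failed, unsigned int errorno)
{
  if (!query_failed)
    return HB_OK;
  if (is_transient_network_error(errorno))
    return HB_RECONNECT;  /* retry the connection instead of exiting */
  return HB_FATAL;        /* only now report ER_SLAVE_FATAL_ERROR */
}
```

With this split, a lost connection (2013) during the heartbeat setup would lead to a reconnect attempt, while a genuine SQL-level failure would still stop the I/O thread.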

Additionally, since there is already an error code from mysql_real_query, should it be later overwritten with ER_SLAVE_FATAL_ERROR?
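
One way to keep the original error visible, even if ER_SLAVE_FATAL_ERROR remains the reported code, would be to include the master's error number in the message text. In the log above the text after "Error:" is empty, so the number would be the only clue. A hedged sketch follows; format_io_error() and its message layout are hypothetical and not taken from the server source.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Builds "<errmsg> Error: <master_error> (master errno <n>)" in buf.
 * An empty or NULL master_error is replaced by a placeholder so the
 * message never ends in a bare "Error: ".
 * Returns the value of snprintf(), i.e. the untruncated message length.
 */
int format_io_error(char *buf, size_t len, const char *errmsg,
                    unsigned int master_errno, const char *master_error)
{
  const char *text = (master_error != NULL && master_error[0] != '\0')
                         ? master_error
                         : "(no message)";
  return snprintf(buf, len, "%s Error: %s (master errno %u)",
                  errmsg, text, master_errno);
}
```

Using snprintf() with an explicit length instead of sprintf() also avoids overrunning the buffer if the master returns a long error string.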

tags: added: upstream

Fixed upstream in 5.6 by:

5.6$ bzr log -r 2661.723.1
------------------------------------------------------------
revno: 2661.723.1
committer: Andrei Elkin <email address hidden>
branch nick: rep2-wl2540-checksum
timestamp: Fri 2010-05-28 12:47:19 +0300
message:
  wl#2540 replication checksum

  intermediate changeset implements the task w/o relying on FD (to be refined by the following patch) as well as with per-event (A) (should be removed from all but FD). The 3rd todo will be correct affected tests because of FD is going to be extended by (A) size of 1 bytte. Finally, merging with fixes for bug#49741 shall complete the show
