Connection issues after promoting slave to master
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Percona Server moved to https://jira.percona.com/projects/PS | New | Undecided | Unassigned |
Bug Description
Percona server version: 5.6.21-69.0-675
OS version: Ubuntu 12.04.5
We have a MySQL host with 13 slaves. We switch masters by promoting one of the existing slaves. We switch using a script so it happens fairly quickly, and all slaves connect to the new master at the same time. All the slaves are caught up to the old master before their slave threads are stopped.
When all the slaves connect to the new master, the new master starts a binlog_dump to each slave. Shortly after the binlog_dump is started the slave will disconnect, wait for one minute, reconnect, start the binlog_dump, then disconnect/
This happens continuously for around 10-15 minutes, with fewer slaves having to reconnect each time. After 10-15 minutes they seem to have caught up with downloading the binlog, and they start working properly without intervention.
Stopping and starting the slave thread during the first 10-15 minutes does not seem to change the behaviour.
The master host has 6 binary log files of 1GB each.
While a slave is waiting to reconnect, we see this in SHOW SLAVE STATUS:
Slave_IO_State: Waiting to reconnect after a failed master event read
[...]
Slave_IO_Running: Connecting
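For anyone trying to track how many slaves are stuck in this state during a failover, a minimal sketch along these lines could flag them from the text of SHOW SLAVE STATUS. It assumes the vertical "Key: value" output format shown above; the function names are hypothetical and not part of any MySQL or Percona tool.

```python
def parse_slave_status(text):
    """Turn 'Key: value' lines of SHOW SLAVE STATUS output into a dict.

    Lines without a colon (row separators, elisions such as '[...]')
    are skipped.
    """
    status = {}
    for line in text.splitlines():
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            status[key.strip()] = value.strip()
    return status


def is_waiting_to_reconnect(status):
    """True when the I/O thread is in the reconnect wait described above."""
    return (
        status.get("Slave_IO_Running") == "Connecting"
        and status.get("Slave_IO_State", "").startswith("Waiting to reconnect")
    )


# Sample taken from the excerpt in this report.
sample = """Slave_IO_State: Waiting to reconnect after a failed master event read
[...]
Slave_IO_Running: Connecting
"""
print(is_waiting_to_reconnect(parse_slave_status(sample)))
```

Running this against each slave once a minute would show whether the number of reconnecting slaves really does shrink over the 10-15 minute window.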
In the attached log excerpt from the new master host (failover_log.txt) you can see the disconnect/
Changed in percona-server:
status: Incomplete → New
Thank you for the report.
This can be caused by a network error if too many events are sent at the same time right after the master is started. Please check your network bandwidth and the amount of data the server tries to receive. As a workaround, you can add the slaves one by one: start replication on slave 1, wait 15 minutes, then start it on slave 2, and so on.
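To get a feel for what the one-by-one workaround means for a setup like the one reported, here is a rough sketch that only computes a staggered start schedule. The host names and the start time are placeholders, and the function name is hypothetical; the sketch does not connect to any server.

```python
from datetime import datetime, timedelta


def staggered_schedule(slaves, start, gap_minutes=15):
    """Return (host, start_time) pairs, one slave every gap_minutes."""
    gap = timedelta(minutes=gap_minutes)
    return [(host, start + i * gap) for i, host in enumerate(slaves)]


# Placeholder names for the 13 slaves mentioned in the report.
hosts = [f"slave{n}" for n in range(1, 14)]
# Arbitrary start time, used for illustration only.
schedule = staggered_schedule(hosts, datetime(2015, 1, 1, 0, 0))

for host, when in schedule:
    print(f"{when:%H:%M}  start replication on {host}")
```

With 13 slaves and a 15-minute gap, the last slave starts 3 hours after the first, so the gap is worth tuning against the observed 10-15 minute recovery time.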