slave IO_THREAD blocks replication - sql thread

Bug #1057087 reported by MF
This bug affects 2 people
Affects / Status / Importance / Assigned to / Milestone:
  MySQL Server: Unknown / Unknown
  Percona Server (moved to https://jira.percona.com/projects/PS): Invalid / Undecided / Unassigned
  5.1: Won't Fix / Medium / Unassigned
  5.5: Triaged / Medium / Unassigned
  5.6: Invalid / Undecided / Unassigned

Bug Description

After a snapshot with XtraBackup, the slave was restored with a logical lag of about 6 hours. Replication started with no errors. However, the lag (measured by Seconds_Behind_Master, Relay_Master_Log_File and other variables from SHOW SLAVE STATUS) kept growing by roughly 30 minutes every hour (this is the core of the reported problem). All master binlogs were transferred by the IO thread immediately (1 Gbit LAN). One day later the lag was more than 24 hours. After
STOP SLAVE IO_THREAD;
the slave was up to date in less than 1 hour (with Seconds_Behind_Master: 0).
Then the IO_THREAD was started again and the slave has now been running for about one day without any lag (there is monitoring in place).
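
For clarity, the workaround sequence described above is, as a minimal sketch (run on the slave, with the catch-up monitored manually as in the report):

    -- Workaround as described in this report (illustrative sketch).
    STOP SLAVE IO_THREAD;    -- stop fetching new binlog events from the master
    SHOW SLAVE STATUS\G      -- repeat until the SQL thread has applied the whole relay log
    START SLAVE IO_THREAD;   -- resume fetching once the slave has caught up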

Summary
With the slave IO_THREAD running, replication lag keeps growing and the slave does not catch up with the master within a day.
Without the slave IO_THREAD running, replication catches up with the master in less than 1 hour.
Something is wrong!?

The slave is a Xeon E5 with an SSD for data and SAS 10k disks for logs. The master is "only" a Xeon 56xx with SAS 15k + SAS 10k disks, everything in RAID 1. According to our deployment benchmark, the slave is (logically) faster than the master.
There was no task running on the slave that could slow down replication (RAID check, backup, ...).

Both run Percona-Server-server-55-5.5.27-rel28.1.296. The master is CentOS 6.x and the slave is an up-to-date CentOS 5.x.

Revision history for this message
MF (fuxa-kos) wrote :

During the lag, the process list showed:
slave
| 2 | system user | | | Connect | 10583 | Waiting for master to send event | | 0 | 0 | 1 |
| 3 | system user | | | Connect | 35127 | Reading event from the relay log | | 0 | 0 | 1 |

master
| 10174490 | repl_isp_pr | ...:43412 | | Binlog Dump | 10607 | Master has sent all binlog to slave; waiting for binlog to be updated |
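
For reference, the thread states above can be captured with the standard statements below (illustrative; run on the slave and the master respectively):

    -- On the slave: IO thread ("Waiting for master to send event") and
    -- SQL thread ("Reading event from the relay log").
    SHOW PROCESSLIST;
    SHOW SLAVE STATUS\G      -- Seconds_Behind_Master, Relay_Master_Log_File, ...

    -- On the master: the Binlog Dump thread serving this slave.
    SHOW PROCESSLIST;
    SHOW MASTER STATUS;      -- current binlog file and position for comparison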

Revision history for this message
Valerii Kravchuk (valerii-kravchuk) wrote :

This looks related to http://bugs.mysql.com/bug.php?id=53167 (see my last comment there), http://bugs.mysql.com/bug.php?id=56363 and http://bugs.mysql.com/bug.php?id=66868. It appears to happen at relay log rotation, so when no new relay logs appear there is no problem.
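
If the contention really is tied to relay log rotation, one possible mitigation (an assumption based on the comment above, not a confirmed fix) would be to make rotations less frequent by allowing larger relay logs:

    -- Assumption: less frequent relay log rotation narrows the window for the
    -- contention described above. With max_relay_log_size = 0 relay logs follow
    -- max_binlog_size; 1073741824 (1 GiB) is the documented maximum.
    SET GLOBAL max_relay_log_size = 1073741824;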

Revision history for this message
MF (fuxa-kos) wrote :

I confirm, there were many relay log rotations. I will look at the upstream (master branch) bugs.

Revision history for this message
Valerii Kravchuk (valerii-kravchuk) wrote :

Based on the status and last comment of the upstream report http://bugs.mysql.com/bug.php?id=66868, this is a confirmed bug in 5.1.x and 5.5.x, while in 5.6.x the problem is fixed upstream.

Revision history for this message
Erez Zarum (erezzarum-q) wrote :

It seems this is a critical bug; is there any estimate for a fix in 5.1.x?

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

Erez -

Unfortunately, we currently don't have any estimate.

Revision history for this message
Erez Zarum (erezzarum-q) wrote :

After upgrading to 5.1.68, my slaves started to lag for no apparent reason, and yesterday the lag went up to over 1000 s (I have never had a lag of more than 5 s in the last 2 years, and there were no changes in the schema/application). This seems very critical; perhaps someone should mention it in the release notes.

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

Erez -

What version did you upgrade from? I am not sure that bugs.mysql.com/bug.php?id=66868 is a regression introduced in 5.1.68.

Revision history for this message
Erez Zarum (erezzarum-q) wrote :

Laurynas,
I have upgraded the master from 5.1.67 to 5.1.68, and the slaves from 5.1.66 to 5.1.68.

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

Erez -

http://bugs.mysql.com/bug.php?id=66868 was not introduced between 5.1.66/67 and 5.1.68. Thus, since your replication started lagging after an upgrade from 5.1.66/67 to 5.1.68, I wouldn't assume that the current bug is to blame; I would troubleshoot the issue to determine the actual cause first.
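
A few first-pass checks along those lines (an illustrative starting point, not an exhaustive procedure) could be:

    -- Is the slave behind on fetching or on applying events?
    SHOW SLAVE STATUS\G      -- compare Read_Master_Log_Pos with Exec_Master_Log_Pos
    -- Is the SQL thread stuck on a single long-running statement?
    SHOW PROCESSLIST;
    -- Did the upgrade change any replication-related settings?
    SHOW GLOBAL VARIABLES LIKE '%relay_log%';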

Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-2808
