Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC

Asynchronous slave thread remains stopped after node re-joins the cluster

Bug #1288479 reported by markus_albe on 2014-03-06

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	MySQL patches by Codership	Fix Released	High	Seppo Jaakola	MySQL patches by Codership 5.5.37-25.10
	Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC	Status tracked in 5.6
	5.5	Fix Released	Undecided	Unassigned	Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC 5.5.37-25.10
	5.6	Fix Released	Undecided	Unassigned	Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC 5.6.19-25.6

Bug Description

When a node goes into non-primary state, the asynchronous slave thread is halted, but it then remains stopped after the node successfully re-joins the cluster.

Seppo mentioned that "node could restart slave thread after he joined back in cluster", but he also noted a caveat: "automatic slave restart can be dangerous also, DBA may have started slave in some other cluster node during the node absence".

Thus, the ideal fix would be a switch, "async_slave_auto_restart" or similarly named, that would give DBAs the freedom to have the thread started automatically when the node joins the cluster after going into non-primary state.

markus_albe (markus-albe) on 2014-03-06

tags:

added: i39956

Seppo Jaakola (seppo-jaakola) on 2014-03-20

Changed in codership-mysql:
assignee:	nobody → Seppo Jaakola (seppo-jaakola)
importance:	Undecided → High
status:	New → In Progress
milestone:	none → 5.5.36-25.10

Seppo Jaakola (seppo-jaakola) wrote on 2014-03-20:

#1

The replication slave thread is not stopped automatically when the node changes to non-primary state; the slave thread stops when the next replication event is applied. A node in non-primary state returns 'Unknown command' (error 1047) to the slave applier, and the slave thread then decides to stop. The error message in the log is:

140320 23:56:53 [ERROR] Slave SQL: Error 'Unknown command' on query. Default database: 'test'. Query: 'BEGIN', Error_code: 1047
140320 23:56:53 [Warning] Slave: Unknown command Error_code: 1047
140320 23:56:53 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysqlbin.000001' position 555
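
For monitoring purposes, the log signature above could be detected with a small script. The following is only a sketch: the function name `find_nonprimary_slave_stop` and the regular expression are illustrative assumptions, not part of MySQL or the wsrep patch, and it relies solely on the three log lines quoted in this comment.

```python
import re

# Hypothetical helper (not part of MySQL): matches the final log line that
# reports where the slave SQL thread stopped.
STOP_RE = re.compile(r"We stopped at log '([^']+)' position (\d+)")


def find_nonprimary_slave_stop(lines):
    """Return (binlog_file, position) if the slave SQL thread aborted
    after reporting error 1047, otherwise None."""
    saw_1047 = False
    for line in lines:
        # Error 1047 is what the slave applier sees on a non-primary node.
        if "Slave SQL" in line and "Error_code: 1047" in line:
            saw_1047 = True
        match = STOP_RE.search(line)
        if saw_1047 and match:
            return match.group(1), int(match.group(2))
    return None


# The log excerpt from this comment, used as sample input.
sample = """\
140320 23:56:53 [ERROR] Slave SQL: Error 'Unknown command' on query. Default database: 'test'. Query: 'BEGIN', Error_code: 1047
140320 23:56:53 [Warning] Slave: Unknown command Error_code: 1047
140320 23:56:53 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysqlbin.000001' position 555
""".splitlines()

print(find_nonprimary_slave_stop(sample))  # ('mysqlbin.000001', 555)
```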

Seppo Jaakola (seppo-jaakola) wrote on 2014-03-24:

#2

Created an optional fix for this. If wsrep_restart_slave is set (default is unset), mysql will automatically restart a slave that was stopped because the node went non-primary. The slave restart happens when the node joins back to the cluster or is bootstrapped to start a new cluster.
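
The behaviour described here can be modelled roughly as follows. This is a toy sketch, not the server code: the class `NodeModel` and its method names are hypothetical, and only the option name `wsrep_restart_slave`, its unset default, and error code 1047 come from this report.

```python
from dataclasses import dataclass

NON_PRIMARY_ERROR = 1047  # error code the slave applier gets on a non-primary node


@dataclass
class NodeModel:
    """Hypothetical model of the restart decision; names are illustrative."""
    wsrep_restart_slave: bool = False      # the new option, unset by default
    slave_running: bool = True
    stopped_by_non_primary: bool = False

    def on_slave_apply_error(self, error_code: int) -> None:
        # The slave thread stops on any apply error; remember whether the
        # cause was the node being non-primary.
        self.slave_running = False
        self.stopped_by_non_primary = (error_code == NON_PRIMARY_ERROR)

    def on_joined_or_bootstrapped(self) -> bool:
        """Called when the node rejoins or bootstraps a cluster.
        Returns True if the slave was restarted automatically."""
        if (self.wsrep_restart_slave
                and self.stopped_by_non_primary
                and not self.slave_running):
            self.slave_running = True
            self.stopped_by_non_primary = False
            return True
        return False


node = NodeModel(wsrep_restart_slave=True)
node.on_slave_apply_error(1047)
print(node.on_joined_or_bootstrapped())  # True
```

With the option left at its default, the same sequence leaves the slave stopped, which preserves the manual control a DBA may need if the slave was started on another node in the meantime.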

Fix was pushed in wsrep-5.5 revision: http://bazaar.launchpad.net/~codership/codership-mysql/wsrep-5.5/revision/3967

Seppo Jaakola (seppo-jaakola) wrote on 2014-03-24:

#3

Fix merged in wsrep-5.6, in revision: http://bazaar.launchpad.net/~codership/codership-mysql/5.6/revision/4065

Changed in codership-mysql:
status:	In Progress → Fix Committed
no longer affects:	galera

Seppo Jaakola (seppo-jaakola) wrote on 2014-03-26:

#4

Pushed a tuned fix, which takes into consideration that the node state may have turned back to primary before the slave error is handled. Pushed fixes in revisions:
wsrep-5.5: http://bazaar.launchpad.net/~codership/codership-mysql/wsrep-5.5/revision/3969
wsrep-5.6: http://bazaar.launchpad.net/~codership/codership-mysql/5.6/revision/4066
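
One way to picture the timing issue the tuned fix addresses is sketched below. The function `decide` and the `Action` values are hypothetical, and the branch that restarts immediately is an assumption about how such a race could be handled; the actual logic is in the revisions linked above.

```python
from enum import Enum


class Action(Enum):
    STAY_STOPPED = "stay stopped"
    RESTART_ON_REJOIN = "restart when the node rejoins"
    RESTART_NOW = "restart immediately"


def decide(restart_enabled: bool, error_code: int, node_is_primary_now: bool) -> Action:
    """Hypothetical decision taken at the moment the slave error is handled."""
    # Only error 1047 with the restart option enabled qualifies for a restart.
    if not restart_enabled or error_code != 1047:
        return Action.STAY_STOPPED
    # Assumption: if the node already turned primary again, no later rejoin
    # event will arrive, so waiting for one would leave the slave stopped.
    if node_is_primary_now:
        return Action.RESTART_NOW
    return Action.RESTART_ON_REJOIN


print(decide(True, 1047, node_is_primary_now=True).value)   # restart immediately
print(decide(True, 1047, node_is_primary_now=False).value)  # restart when the node rejoins
```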

Alex Yurchenko (ayurchen) on 2014-05-14

Changed in codership-mysql:
status:	Fix Committed → Fix Released

Shahriyar Rzayev (rzayev-sehriyar) wrote on 2018-01-18:

#5

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PXC-1637
