kolla

HAproxy does not consider wsrep_local_state when considering which db nodes/containers can accept traffic

Bug #1578752 reported by Mark Casey on 2016-05-05

This bug affects 2 people

Affects	Status	Importance	Assigned to	Milestone
kolla	Fix Released	Undecided	Mark Casey

Bug Description

If a MariaDB node/container ("node" from here on, though I mean container) fails and is restarted, it runs a state transfer from a donor node to get back in sync with the cluster. The sync may be incremental (if the container has persistence on the datadir [/var/lib/mysql or similar]) or, if too much data has changed in the interim, a complete retransfer of all data.

Depending on the config, this may make both the recovering node and the donor node fail any queries sent to them. This is a problem because, although these nodes will not execute queries, they will (*LAST I CHECKED) still accept incoming DB connections, so HAProxy's mysql-check will not fail and the nodes will not be removed from the pool of healthy nodes.

One solution is to run a separate HTTP daemon that has access, via a dedicated user, to check the MySQL status variable "wsrep_local_state" on each node. This daemon returns 200 OK if all is well and 503 Service Unavailable if the node in question is currently receiving a Galera State Transfer or acting as the donor in one.

In this way the only queries which fail are those that HAProxy routes to the failed node between the time it fails and the next HAProxy call to this "checking daemon."
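The checking daemon described above can be sketched roughly as follows. This is only an illustration, not the percona-clustercheck script and not what kolla ships. It assumes the health check treats only wsrep_local_state 4 (Synced) as healthy, and it takes the database lookup as a hypothetical callable, get_state, which in a real deployment would run SHOW STATUS LIKE 'wsrep_local_state' as the dedicated user through whatever MySQL driver is available. The names status_for, make_handler and serve, and the port 9200, are assumptions made for this example.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Galera wsrep_local_state values: 1 = Joining, 2 = Donor/Desynced,
# 3 = Joined, 4 = Synced. Only Synced is treated as healthy here (assumption).
SYNCED = 4


def status_for(wsrep_local_state):
    """Map a wsrep_local_state value to (HTTP status code, response body)."""
    if wsrep_local_state == SYNCED:
        return 200, "Galera node is synced.\n"
    return 503, "Galera node is not synced.\n"


def make_handler(get_state):
    """Build a request handler around get_state, a hypothetical callable
    that returns the node's current wsrep_local_state as an int."""

    class ClusterCheckHandler(BaseHTTPRequestHandler):
        def _respond(self, include_body):
            try:
                code, body = status_for(get_state())
            except Exception:
                # If the state cannot be read, report the node as unavailable.
                code, body = 503, "Could not read wsrep_local_state.\n"
            payload = body.encode("utf-8")
            self.send_response(code)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            if include_body:
                self.wfile.write(payload)

        def do_GET(self):
            self._respond(True)

        def do_HEAD(self):
            self._respond(False)

        def do_OPTIONS(self):
            # Health checkers may probe with OPTIONS rather than GET.
            self._respond(True)

        def log_message(self, format, *args):
            # Keep periodic health checks out of the log.
            pass

    return ClusterCheckHandler


def serve(get_state, host="0.0.0.0", port=9200):
    """Serve health checks until interrupted (port 9200 is an assumption)."""
    HTTPServer((host, port), make_handler(get_state)).serve_forever()
```

With something like this running next to each db node, HAProxy would be pointed at the daemon's port with an HTTP check instead of relying on mysql-check alone. Calling serve(lambda: 4) starts a daemon that always reports healthy, which is enough to try the wiring before plugging in a real query.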

This repo (https://github.com/olafz/percona-clustercheck) shows one such setup that uses xinetd, though obviously this could be done behind any HTTP daemon as a "sidecar container" to each db node.

Mark Casey (mark-casey) on 2016-05-05

description:	updated
Changed in kolla:
assignee:	nobody → Mark Casey (mark-casey)

Mark Casey (mark-casey) on 2016-05-09

description:	updated

OpenStack Infra (hudson-openstack) wrote on 2016-05-27: Related fix proposed to kolla (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/322200

OpenStack Infra (hudson-openstack) wrote on 2016-06-03: Fix merged to kolla (master)

Reviewed: https://review.openstack.org/322200
Committed: https://git.openstack.org/cgit/openstack/kolla/commit/?id=b4759b280c6cfad602ad9b4084d4a4f1f5f1ff2f
Submitter: Jenkins
Branch: master

commit b4759b280c6cfad602ad9b4084d4a4f1f5f1ff2f
Author: Ettore Simone <email address hidden>
Date: Fri May 27 16:13:54 2016 +0200

    Enable HAProxy consider MariaDB wsrep_local_state

    This patch enable wsrep_notify_cmd to rename haproxy user in haproxy_blocked
    when the node is not ready to serve and restore it when ready.

    Change-Id: I4f49960d7ff2fa689d6ea730b2574f16f083edc1
    Closes-Bug: 1578752
    Closes-Bug: 1587752
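
The approach in the commit message above can be illustrated with a small sketch. This is not the script from the kolla patch. It only assumes that Galera calls the wsrep_notify_cmd program with a --status argument (values such as Joiner, Donor, Joined, Synced) and that renaming the account HAProxy's mysql-check logs in with makes the check fail. The account names come from the commit message; the '%' host part and the function names are hypothetical.

```python
import argparse

# Account names from the commit message; the '%' host part is an assumption.
ACTIVE = "'haproxy'@'%'"
BLOCKED = "'haproxy_blocked'@'%'"


def rename_statement(status):
    """Return the SQL that unblocks the check user when the node is Synced
    and blocks it for every other status."""
    if status == "Synced":
        return "RENAME USER {} TO {};".format(BLOCKED, ACTIVE)
    return "RENAME USER {} TO {};".format(ACTIVE, BLOCKED)


def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--status", required=True)
    # Galera passes further arguments (--uuid, --primary, --members,
    # --index); this sketch ignores them.
    args, _unknown = parser.parse_known_args(argv)
    # A real script would execute the statement against the local server and
    # would have to tolerate the user already being in the target state,
    # since RENAME USER fails if the source account does not exist.
    print(rename_statement(args.status))
```

For example, main(["--status", "Donor"]) prints the statement that renames haproxy to haproxy_blocked, and main(["--status", "Synced"]) prints the one that restores it.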

Changed in kolla:
status:	New → Fix Released

Doug Hellmann (doug-hellmann) wrote on 2016-07-14: Fix included in openstack/kolla 3.0.0.0b2

This issue was fixed in the openstack/kolla 3.0.0.0b2 development milestone.
