HAProxy does not consider wsrep_local_state when deciding which DB nodes/containers can accept traffic

Bug #1578752 reported by Mark Casey
This bug affects 2 people
Affects | Status       | Importance | Assigned to | Milestone
kolla   | Fix Released | Undecided  | Mark Casey  |

Bug Description

If/when a MariaDB node/container ("node" from here on, but I mean container) fails and is restarted, it runs a Galera state transfer from a donor node to get back into sync with the cluster. This may be an Incremental State Transfer (IST), if the container persists its datadir (/var/lib/mysql or similar), or, if too much data has changed in the interim, a full State Snapshot Transfer (SST) that retransfers all of the data.
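To make the IST/SST distinction concrete, here is a simplified Python model of the choice. It assumes Galera's usual rule: IST is only possible when the joiner knows its last committed sequence number (seqno) and the donor's write-set cache (gcache) still holds every write-set after it; otherwise a full SST is needed. The function name and parameters are illustrative, not part of Galera or kolla, and real donor selection involves more factors than this.

```python
from typing import Optional


def choose_state_transfer(joiner_seqno: Optional[int], gcache_first_seqno: int) -> str:
    """Return "IST" or "SST" for a rejoining node (simplified, hypothetical model).

    joiner_seqno: last seqno the joiner committed, or None if unknown
        (e.g. no persistent datadir, or grastate.dat records seqno -1).
    gcache_first_seqno: oldest write-set seqno still held in the donor's gcache.
    """
    if joiner_seqno is None:
        # No usable local state: everything must be copied.
        return "SST"
    # The joiner needs write-sets from joiner_seqno + 1 onward; IST works
    # only if the donor's gcache still contains the first of those.
    if joiner_seqno + 1 >= gcache_first_seqno:
        return "IST"
    return "SST"


if __name__ == "__main__":
    print(choose_state_transfer(None, 100))  # SST: fresh container, no datadir
    print(choose_state_transfer(150, 100))   # IST: short outage, gcache covers the gap
    print(choose_state_transfer(50, 100))    # SST: long outage, gap already evicted
```

The point for this bug: either path keeps the joiner (and, depending on the SST method, the donor) busy for a while, which is exactly the window HAProxy needs to know about.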

Depending on the configuration, this can make both the recovering node and the donor node fail any queries sent to them. This is a problem because, although these nodes will not execute queries, they will (last I checked) still accept incoming DB connections. As a result HAProxy's mysql-check does not fail, and the nodes are not removed from the pool of healthy backends.
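The weakness described above can be illustrated with a rough Python approximation of a connection-level check. As I understand the HAProxy documentation, `option mysql-check` without a username only parses the server's first packet (the Handshake Initialization packet or an Error packet), and with a username it additionally sends an authentication packet and a QUIT. This sketch mimics the first case by reading the MySQL packet header and the first payload byte (protocol version 10 for a greeting, 0xFF for an error). Nothing in that packet reflects wsrep_local_state, so a joining or donor node still passes. The function name is hypothetical and this is not HAProxy's code.

```python
def greeting_passes_check(first_packet: bytes) -> bool:
    """Approximate a connection-level MySQL health check (hypothetical helper).

    A MySQL packet starts with a 3-byte little-endian payload length and a
    1-byte sequence id. A server greeting (Handshake Initialization) payload
    begins with protocol version 10; an Error packet payload begins with 0xFF.
    """
    if len(first_packet) < 5:
        return False
    length = int.from_bytes(first_packet[0:3], "little")
    payload = first_packet[4:4 + length]
    if length == 0 or len(payload) != length:
        return False  # empty or truncated packet
    # Greeting => check passes; 0xFF (error) or anything else => fails.
    return payload[0] == 0x0A


if __name__ == "__main__":
    # Fabricated example packets, not captured from a real server.
    greeting = bytes([6, 0, 0, 0, 0x0A]) + b"10.1\x00"
    error = bytes([3, 0, 0, 0, 0xFF, 0x10, 0x04])
    print(greeting_passes_check(greeting))  # True, whatever the Galera state
    print(greeting_passes_check(error))     # False
```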

One solution is to run a separate HTTP daemon that uses a dedicated MySQL user to check the status variable "wsrep_local_state" on each node. The daemon returns 200 OK if all is well and 503 Service Unavailable if the node is currently receiving from, or acting as a donor in, a Galera state transfer.
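The decision such a daemon makes can be sketched in a few lines of Python. The numeric values below are Galera's documented wsrep_local_state codes (1 Joining, 2 Donor/Desynced, 3 Joined, 4 Synced), which a real daemon would read with `SHOW GLOBAL STATUS LIKE 'wsrep_local_state'`. The function name and the `available_when_donor` switch are my own illustrative choices; the switch covers setups (e.g. non-blocking SST methods) where a donor can keep serving.

```python
from enum import IntEnum
from http import HTTPStatus


class WsrepLocalState(IntEnum):
    """Numeric values of Galera's wsrep_local_state status variable."""
    JOINING = 1
    DONOR_DESYNCED = 2
    JOINED = 3
    SYNCED = 4


def health_status(state: int, available_when_donor: bool = False) -> HTTPStatus:
    """Map wsrep_local_state to the HTTP status the checking daemon returns."""
    if state == WsrepLocalState.SYNCED:
        return HTTPStatus.OK
    if state == WsrepLocalState.DONOR_DESYNCED and available_when_donor:
        # Only if the SST method leaves the donor able to serve queries.
        return HTTPStatus.OK
    # Joining, Joined, blocking donor, or anything unexpected.
    return HTTPStatus.SERVICE_UNAVAILABLE


if __name__ == "__main__":
    for s in WsrepLocalState:
        print(s.name, int(health_status(s)))
```

Defaulting to 503 for any unknown value is deliberate: a health check should fail closed.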

This way, the only queries that fail are those HAProxy routes to the affected node between the moment it stops serving and the point at which HAProxy's checks against this "checking daemon" mark it down.
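How long that window lasts depends on HAProxy's check timing. As a rough sketch, assuming HAProxy's documented defaults (`inter` 2s, `fall` 3) and that `fastinter`, when set, governs the spacing of checks once a server starts failing, the worst case (ignoring check timeouts and retries) is one full `inter` plus `fall - 1` further check intervals. The function is illustrative only, not something HAProxy exposes.

```python
from typing import Optional


def worst_case_detection_ms(inter_ms: int = 2000, fall: int = 3,
                            fastinter_ms: Optional[int] = None) -> int:
    """Upper bound on how long HAProxy keeps routing to a node after it
    stops serving, ignoring check timeouts (simplified model).

    Worst case: the node breaks right after a successful check, the first
    failing check comes a full inter_ms later, and fall - 1 more failures
    (spaced by fastinter_ms if set, else inter_ms) are needed to mark it DOWN.
    """
    if fall < 1:
        raise ValueError("fall must be >= 1")
    step = inter_ms if fastinter_ms is None else fastinter_ms
    return inter_ms + (fall - 1) * step


if __name__ == "__main__":
    print(worst_case_detection_ms())                           # 6000 with defaults
    print(worst_case_detection_ms(2000, 3, fastinter_ms=500))  # 3000
```

Shortening `inter`/`fastinter` narrows the window at the cost of more frequent queries against each node.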

This repo (https://github.com/olafz/percona-clustercheck) shows one such setup using xinetd, though the same check could be served by any HTTP daemon running as a "sidecar container" next to each DB node.
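A sidecar of that kind could be as small as the following standard-library Python sketch. It is not the percona-clustercheck script: `read_state` is a hypothetical callable standing in for a query of `SHOW GLOBAL STATUS LIKE 'wsrep_local_state'` as the dedicated user, and port 9200 is only an assumed default. HAProxy would then be pointed at this port with an HTTP check instead of mysql-check.

```python
import threading
import urllib.error
import urllib.request
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Callable, Optional

SYNCED = 4  # wsrep_local_state value for a fully synced node


def build_server(read_state: Callable[[], int], host: str = "0.0.0.0",
                 port: int = 9200) -> ThreadingHTTPServer:
    """Create an HTTP server answering every GET with 200 (synced) or 503.

    read_state is hypothetical; a real one would query the local MariaDB.
    """

    class ClusterCheckHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            try:
                state: Optional[int] = read_state()
            except Exception:
                state = None  # DB unreachable: report unavailable
            code = HTTPStatus.OK if state == SYNCED else HTTPStatus.SERVICE_UNAVAILABLE
            body = f"wsrep_local_state={state}\n".encode()
            self.send_response(code)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format: str, *args) -> None:
            pass  # keep health-check noise out of the logs

    return ThreadingHTTPServer((host, port), ClusterCheckHandler)


def start_in_background(server: ThreadingHTTPServer) -> threading.Thread:
    """Serve requests on a daemon thread."""
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return thread


def probe(url: str) -> int:
    """Return the HTTP status code, as an HTTP health check would see it."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        code = err.code
        err.close()
        return code


if __name__ == "__main__":
    server = build_server(lambda: 1, host="127.0.0.1", port=0)  # fake "Joining" node
    start_in_background(server)
    print(probe(f"http://127.0.0.1:{server.server_address[1]}/"))  # 503
    server.shutdown()
    server.server_close()
```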

Mark Casey (mark-casey)
description: updated
Changed in kolla:
assignee: nobody → Mark Casey (mark-casey)
Mark Casey (mark-casey)
description: updated
OpenStack Infra (hudson-openstack) wrote: Related fix proposed to kolla (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/322200

OpenStack Infra (hudson-openstack) wrote: Fix merged to kolla (master)

Reviewed: https://review.openstack.org/322200
Committed: https://git.openstack.org/cgit/openstack/kolla/commit/?id=b4759b280c6cfad602ad9b4084d4a4f1f5f1ff2f
Submitter: Jenkins
Branch: master

commit b4759b280c6cfad602ad9b4084d4a4f1f5f1ff2f
Author: Ettore Simone <email address hidden>
Date: Fri May 27 16:13:54 2016 +0200

    Enable HAProxy consider MariaDB wsrep_local_state

    This patch enable wsrep_notify_cmd to rename haproxy user in haproxy_blocked
    when the node is not ready to serve and restore it when ready.

    Change-Id: I4f49960d7ff2fa689d6ea730b2574f16f083edc1
    Closes-Bug: 1578752
    Closes-Bug: 1587752
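The commit message above describes the approach kolla took: a wsrep_notify_cmd hook renames the haproxy user to haproxy_blocked while the node is not ready, so that a mysql-check configured with that username presumably gets an authentication error, and renames it back once the node is ready. Below is a hedged Python sketch of the decision logic only, not kolla's actual script. It assumes the hook receives the node state as `--status` (e.g. Joiner, Donor, Synced) alongside other arguments; the account host `%`, the `SET SESSION wsrep_on=OFF` guard, and the helper names are my assumptions. Actually executing the statements against the local server is left out.

```python
import argparse
from typing import List, Optional, Sequence

ACTIVE_USER = "haproxy"
BLOCKED_USER = "haproxy_blocked"
USER_HOST = "%"  # assumption: host part of the check user's account


def parse_status(argv: Sequence[str]) -> Optional[str]:
    """Extract --status from wsrep_notify_cmd arguments (None if absent)."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--status")
    # Other arguments (--uuid, --primary, --members, ...) are ignored here.
    args, _ignored = parser.parse_known_args(list(argv))
    return args.status


def plan_rename(status: Optional[str], currently_blocked: bool) -> List[str]:
    """Return the SQL needed to block or unblock the HAProxy check user.

    Only a "Synced" node should pass the check; Joiner, Donor, Joined and
    any other status get the user renamed away so the check login fails.
    """
    if status is None:
        return []  # e.g. a membership-only notification: nothing to do
    want_blocked = status.lower() != "synced"
    if want_blocked == currently_blocked:
        return []  # already in the desired state; RENAME USER would fail
    src, dst = (ACTIVE_USER, BLOCKED_USER) if want_blocked else (BLOCKED_USER, ACTIVE_USER)
    return [
        # Assumption: keep the rename local to this node, since Galera
        # replicates account-management statements to the whole cluster.
        "SET SESSION wsrep_on=OFF",
        f"RENAME USER '{src}'@'{USER_HOST}' TO '{dst}'@'{USER_HOST}'",
    ]


if __name__ == "__main__":
    status = parse_status(["--status", "Donor", "--uuid", "abc", "--primary", "yes"])
    for stmt in plan_rename(status, currently_blocked=False):
        print(stmt)
```

Checking the current state before renaming keeps the hook idempotent, which matters because the notify command can fire several times as a node moves through its states.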

Changed in kolla:
status: New → Fix Released
Doug Hellmann (doug-hellmann) wrote: Fix included in openstack/kolla 3.0.0.0b2

This issue was fixed in the openstack/kolla 3.0.0.0b2 development milestone.
