HAProxy does not consider wsrep_local_state when deciding which DB nodes/containers can accept traffic
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
kolla | Fix Released | Undecided | Mark Casey |
Bug Description
If/when a MariaDB node/container ("node" from here on, but I mean container) fails and is restarted, it runs a state transfer from a donor node to get back into sync with the cluster. The transfer may be incremental (IST), if the container has persistence on the datadir (/var/lib/mysql or similar) and the changes made in the interim are still held in the donor's write-set cache. Otherwise, if too much data has changed in the interim, it is a full state snapshot transfer (SST) that re-copies all data.
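As a rough illustration of when the incremental path is possible, the sketch below models the choice in Python. It is a simplification under stated assumptions: `choose_transfer` is a hypothetical helper, it ignores the cluster UUID comparison and other checks Galera performs, and it assumes the donor's write-set cache (gcache) still holds every write-set from `gcache_oldest_seqno` onward.

```python
def choose_transfer(joiner_seqno: int, gcache_oldest_seqno: int) -> str:
    """Hypothetical, simplified model of IST vs SST selection.

    joiner_seqno: last sequence number the restarted node applied.
        -1 means the node has no usable state (for example, no persistent
        datadir), which always forces a full SST.
    gcache_oldest_seqno: oldest write-set still cached on the donor.
    """
    if joiner_seqno < 0:
        return "SST"  # nothing to build on: copy everything
    first_missing = joiner_seqno + 1
    # IST works only if every missing write-set is still in the donor's cache.
    if first_missing >= gcache_oldest_seqno:
        return "IST"
    return "SST"  # the gap reaches back past the cache: full copy needed
```

For example, with a donor cache starting at seqno 1000, a node that last applied seqno 4200 can catch up incrementally, while one that stopped at 500 needs a full SST.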
Depending on the configuration (notably the state transfer method), this can make both the recovering node and the donor node fail any queries sent to them. This is a problem because, although these nodes will not execute queries, they will (last I checked) still accept incoming DB connections. That means HAProxy's mysql-check will not fail, and the nodes will not be removed from the pool of healthy nodes.
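To make the gap concrete, here is a toy Python model (not a real health checker) contrasting a connection-only check, roughly what HAProxy's mysql-check observes, with a check that also looks at wsrep_local_state. The `Node` class and both check functions are hypothetical; the numeric states are Galera's wsrep_local_state values (1 Joining, 2 Donor/Desynced, 3 Joined, 4 Synced). Treating every non-Synced state as unhealthy is a conservative assumption; with a non-blocking state transfer method a donor may still be able to serve queries.

```python
from dataclasses import dataclass

# Galera wsrep_local_state values
JOINING, DONOR_DESYNCED, JOINED, SYNCED = 1, 2, 3, 4


@dataclass
class Node:
    name: str
    accepts_connections: bool  # does the server complete the MySQL handshake?
    wsrep_local_state: int


def connection_check(node: Node) -> bool:
    """Roughly what a handshake-level check can see: only whether the
    server answers the connection."""
    return node.accepts_connections


def state_check(node: Node) -> bool:
    """Also require the node to report itself fully synced."""
    return node.accepts_connections and node.wsrep_local_state == SYNCED


nodes = [
    Node("db1", True, SYNCED),
    Node("db2", True, DONOR_DESYNCED),  # donor during a blocking transfer
    Node("db3", True, JOINING),         # the recovering node
]

for n in nodes:
    print(n.name, connection_check(n), state_check(n))
```

The connection-only check reports all three nodes healthy, while the state-aware check keeps only db1 in the pool.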
One solution is to run a separate HTTP daemon that has access, via a dedicated user, to the MySQL status variable "wsrep_local_state" on each node. This daemon returns 200 OK if all is well and 503 Service Unavailable if the node in question is currently receiving from, or acting as a donor in, a Galera state transfer.
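A minimal sketch of such a checking daemon, using only Python's standard library. The `make_handler`/`serve` names and the `fetch_state` hook are assumptions for illustration; a real daemon would implement `fetch_state` with a MySQL client library, running `SHOW GLOBAL STATUS LIKE 'wsrep_local_state'` as the dedicated user. As described above, it returns 200 only when the node reports Synced (4), and 503 otherwise, including when the state cannot be read at all.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Optional

SYNCED = 4  # wsrep_local_state value for a fully synced Galera node


def status_for(state: Optional[int]) -> int:
    """Map wsrep_local_state to the HTTP status HAProxy will see.
    None means the state could not be read (e.g. the DB is down)."""
    return 200 if state == SYNCED else 503


def make_handler(fetch_state: Callable[[], Optional[int]]):
    """Build a handler class bound to a (hypothetical) state-fetching hook."""

    class CheckHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            try:
                state = fetch_state()
            except Exception:
                state = None  # any DB error counts as "not available"
            code = status_for(state)
            body = b"OK\n" if code == 200 else b"Service Unavailable\n"
            self.send_response(code)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # keep per-check noise out of stderr

    return CheckHandler


def serve(port: int, fetch_state: Callable[[], Optional[int]]) -> None:
    """Run the check daemon on all interfaces until interrupted."""
    HTTPServer(("0.0.0.0", port), make_handler(fetch_state)).serve_forever()
```

For example, `serve(9200, my_fetch_state)` would run the check on port 9200 (an arbitrary choice). On the HAProxy side, the backend would then use an HTTP check (`option httpchk`, with `port 9200` on each `server` line) instead of `option mysql-check`.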
In this way, the only queries that fail are those HAProxy routes to the failed node between the time it fails and HAProxy's next call to this "checking daemon."
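How long that window lasts depends on HAProxy's check timing. The sketch below computes a rough upper bound, assuming HAProxy's `inter`, `fall`, and `fastinter` server options behave as documented: a server is marked down after `fall` consecutive failed checks, and once it starts failing, checks are spaced by `fastinter` (which defaults to `inter`). `worst_case_window_ms` is a hypothetical helper, and check timeouts and the daemon's own query time are ignored.

```python
from typing import Optional


def worst_case_window_ms(inter_ms: int, fall: int,
                         fastinter_ms: Optional[int] = None) -> int:
    """Upper bound on how long HAProxy may keep routing to a node after
    its health check starts failing (timeouts ignored)."""
    # The first failing check can come up to one full `inter` after the failure.
    # The remaining fall - 1 checks use `fastinter` if set, otherwise `inter`.
    step = fastinter_ms if fastinter_ms is not None else inter_ms
    return inter_ms + (fall - 1) * step
```

With HAProxy's defaults (`inter 2s`, `fall 3`), `worst_case_window_ms(2000, 3)` gives 6000 ms, so in practice the window spans a few checks rather than one unless `fall` is lowered or `fastinter` is set.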
This repo (https:/
description: updated
Changed in kolla:
assignee: nobody → Mark Casey (mark-casey)
description: updated
Related fix proposed to branch: master (review.openstack.org/322200)
Review: https:/