We ran into this issue today as well. More observations for you here.

When rebooting the infra nodes, we found that the Galera node would take a long time to reach a status of "Synced". It would remain in "Joined" status [0]. We found that the status would not change to "Synced" until "wsrep_local_recv_queue" returned to a value of 0. In this particular case the node in question was the primary node from HAProxy's standpoint [1].

The node remains up and primary in the pool from HAProxy's standpoint because HAProxy only checks that it can log in as the monitoring user, not the status of the node in the cluster, and in this case the login succeeds. However, the node has not rejoined the cluster and is effectively down until "wsrep_local_state_comment" reports "Synced". The end result is that the cloud is broken: all API requests to the Galera VIP land on the primary node, which is not active in the cluster, so errors are returned to the services [2], and the resulting stack traces drive up utilization on the node.

The HAProxy Galera check should be more robust and verify that the node is synced and in the cluster before passing the health check; a sketch of such a check follows the status output below.

[0] MariaDB [(none)]> show global status like 'wsrep_%';
+------------------------------+-----------------------------------------+
| Variable_name                | Value                                   |
+------------------------------+-----------------------------------------+
| wsrep_local_state_uuid       | 172cf19e-c986-11e7-b23e-ced96a3e883b    |
| wsrep_protocol_version       | 7                                       |
| wsrep_last_committed         | 414136                                  |
| wsrep_replicated             | 0                                       |
| wsrep_replicated_bytes       | 0                                       |
| wsrep_repl_keys              | 0                                       |
| wsrep_repl_keys_bytes        | 0                                       |
| wsrep_repl_data_bytes        | 0                                       |
| wsrep_repl_other_bytes       | 0                                       |
| wsrep_received               | 5140                                    |
| wsrep_received_bytes         | 5540262                                 |
| wsrep_local_commits          | 0                                       |
| wsrep_local_cert_failures    | 0                                       |
| wsrep_local_replays          | 0                                       |
| wsrep_local_send_queue       | 0                                       |
| wsrep_local_send_queue_max   | 1                                       |
| wsrep_local_send_queue_min   | 0                                       |
| wsrep_local_send_queue_avg   | 0.000000                                |
| wsrep_local_recv_queue       | 390                                     |
| wsrep_local_recv_queue_max   | 1438                                    |
| wsrep_local_recv_queue_min   | 0                                       |
| wsrep_local_recv_queue_avg   | 1079.684448                             |
| wsrep_local_cached_downto    | 409074                                  |
| wsrep_flow_control_paused_ns | 877872864509                            |
| wsrep_flow_control_paused    | 0.409347                                |
| wsrep_flow_control_sent      | 65                                      |
| wsrep_flow_control_recv      | 66                                      |
| wsrep_cert_deps_distance     | 56.951052                               |
| wsrep_apply_oooe             | 0.891276                                |
| wsrep_apply_oool             | 0.000000                                |
| wsrep_apply_window           | 23.620531                               |
| wsrep_commit_oooe            | 0.000000                                |
| wsrep_commit_oool            | 0.000000                                |
| wsrep_commit_window          | 21.765556                               |
| wsrep_local_state            | 3                                       |
*| wsrep_local_state_comment    | Joined                                  |*
| wsrep_cert_index_size        | 66                                      |
| wsrep_causal_reads           | 0                                       |
| wsrep_cert_interval          | 11.836642                               |
| wsrep_incoming_addresses     | 0.0.0.0:3306,0.0.0.0:3306,0.0.0.0:3306  |
| wsrep_desync_count           | 0                                       |
| wsrep_evs_delayed            |                                         |
| wsrep_evs_evict_list         |                                         |
| wsrep_evs_repl_latency       | 0.00067127/0.00067127/0.00067127/0/1    |
| wsrep_evs_state              | OPERATIONAL                             |
| wsrep_gcomm_uuid             | d2cf151b-ca34-11e7-853d-efeef6f46c49    |
| wsrep_cluster_conf_id        | 23                                      |
| wsrep_cluster_size           | 3                                       |
| wsrep_cluster_state_uuid     | 172cf19e-c986-11e7-b23e-ced96a3e883b    |
| wsrep_cluster_status         | Primary                                 |
| wsrep_connected              | ON                                      |
| wsrep_local_bf_aborts        | 0                                       |
| wsrep_local_index            | 2                                       |
| wsrep_provider_name          | Galera                                  |
| wsrep_provider_vendor        | Codership Oy
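For reference, here is a minimal sketch of a state-aware check, in the style of percona-clustercheck, rather than the shipped check. The passwordless "monitoring" user on the local socket is an assumption based on this deployment; adjust the credentials to match yours. It answers HTTP 200 only when the node actually reports "Synced", and 503 otherwise:

#!/usr/bin/env bash
# Sketch of a Galera-aware health check: report healthy only when the
# local node is actually synced with the cluster, not merely accepting
# logins. Assumes a passwordless "monitoring" user on the local socket
# (an assumption; substitute your own credentials).
STATE=$(mysql -u monitoring -N -s \
    -e "SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';" | awk '{print $2}')
if [ "$STATE" = "Synced" ]; then
    printf 'HTTP/1.1 200 OK\r\n\r\nGalera node is synced.\r\n'
    exit 0
else
    # Covers Joined, Donor/Desynced, or an unreachable mysqld.
    printf 'HTTP/1.1 503 Service Unavailable\r\n\r\nGalera node state: %s\r\n' "${STATE:-unknown}"
    exit 1
fi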
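Served on a side port (for example 9200 via xinetd, as percona-clustercheck does), HAProxy could then poll it with httpchk instead of a bare login test. The backend name, server names, and addresses below are placeholders, not our actual config:

backend galera
    mode tcp
    balance leastconn
    # Poll the check script above instead of only verifying a MySQL login;
    # a node that is up but still joining answers 503 and is pulled from
    # the pool until it reports Synced.
    option httpchk GET /
    server infra01 10.0.0.11:3306 check port 9200 inter 2000 rise 2 fall 3
    server infra02 10.0.0.12:3306 check port 9200 inter 2000 rise 2 fall 3 backup
    server infra03 10.0.0.13:3306 check port 9200 inter 2000 rise 2 fall 3 backup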