We ran into this issue today as well. More observations for you here.

When rebooting the infra nodes, we found that the Galera node would take a long time to reach a status of "Synced". It would remain in "Joined" status [0]. We found that the status would not change to "Synced" until "wsrep_local_recv_queue" returned to a value of 0. In this particular case the node in question was the primary node from HAProxy's standpoint [1].

The node remains up and primary in the pool from HAProxy's standpoint because HAProxy only checks that it can log in as the monitoring user, not the status of the node in the cluster, and in this case the login succeeds. However, the node has not rejoined the cluster and is effectively down until "wsrep_local_state_comment" reports "Synced". The end result is that the cloud is broken: all API requests to the Galera VIP land on the primary node, which is not active in the cluster, so errors are returned to the services [2], and the resulting stack traces drive up utilization on the node.

The HAProxy Galera check should be more robust and verify that the node is synced and in the cluster before passing the health check; a sketch of such a check follows the status output below.

[0] MariaDB [(none)]> show global status like 'wsrep_%';
+------------------------------+-----------------------------------------+
| Variable_name                | Value                                   |
+------------------------------+-----------------------------------------+
| wsrep_local_state_uuid       | 172cf19e-c986-11e7-b23e-ced96a3e883b    |
| wsrep_protocol_version       | 7                                       |
| wsrep_last_committed         | 414136                                  |
| wsrep_replicated             | 0                                       |
| wsrep_replicated_bytes       | 0                                       |
| wsrep_repl_keys              | 0                                       |
| wsrep_repl_keys_bytes        | 0                                       |
| wsrep_repl_data_bytes        | 0                                       |
| wsrep_repl_other_bytes       | 0                                       |
| wsrep_received               | 5140                                    |
| wsrep_received_bytes         | 5540262                                 |
| wsrep_local_commits          | 0                                       |
| wsrep_local_cert_failures    | 0                                       |
| wsrep_local_replays          | 0                                       |
| wsrep_local_send_queue       | 0                                       |
| wsrep_local_send_queue_max   | 1                                       |
| wsrep_local_send_queue_min   | 0                                       |
| wsrep_local_send_queue_avg   | 0.000000                                |
| wsrep_local_recv_queue       | 390                                     |
| wsrep_local_recv_queue_max   | 1438                                    |
| wsrep_local_recv_queue_min   | 0                                       |
| wsrep_local_recv_queue_avg   | 1079.684448                             |
| wsrep_local_cached_downto    | 409074                                  |
| wsrep_flow_control_paused_ns | 877872864509                            |
| wsrep_flow_control_paused    | 0.409347                                |
| wsrep_flow_control_sent      | 65                                      |
| wsrep_flow_control_recv      | 66                                      |
| wsrep_cert_deps_distance     | 56.951052                               |
| wsrep_apply_oooe             | 0.891276                                |
| wsrep_apply_oool             | 0.000000                                |
| wsrep_apply_window           | 23.620531                               |
| wsrep_commit_oooe            | 0.000000                                |
| wsrep_commit_oool            | 0.000000                                |
| wsrep_commit_window          | 21.765556                               |
| wsrep_local_state            | 3                                       |
*| wsrep_local_state_comment    | Joined                                  |*
| wsrep_cert_index_size        | 66                                      |
| wsrep_causal_reads           | 0                                       |
| wsrep_cert_interval          | 11.836642                               |
| wsrep_incoming_addresses     | 0.0.0.0:3306,0.0.0.0:3306,0.0.0.0:3306  |
| wsrep_desync_count           | 0                                       |
| wsrep_evs_delayed            |                                         |
| wsrep_evs_evict_list         |                                         |
| wsrep_evs_repl_latency       | 0.00067127/0.00067127/0.00067127/0/1    |
| wsrep_evs_state              | OPERATIONAL                             |
| wsrep_gcomm_uuid             | d2cf151b-ca34-11e7-853d-efeef6f46c49    |
| wsrep_cluster_conf_id        | 23                                      |
| wsrep_cluster_size           | 3                                       |
| wsrep_cluster_state_uuid     | 172cf19e-c986-11e7-b23e-ced96a3e883b    |
| wsrep_cluster_status         | Primary                                 |
| wsrep_connected              | ON                                      |
| wsrep_local_bf_aborts        | 0                                       |
| wsrep_local_index            | 2                                       |
| wsrep_provider_name          | Galera                                  |
| wsrep_provider_vendor        | Codership Oy
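For reference, here is a minimal sketch of a state-aware check, in the style of percona-clustercheck, rather than the shipped check. The passwordless "monitoring" user on the local socket is an assumption based on this deployment; adjust the credentials to match yours. It answers HTTP 200 only when the node actually reports "Synced", and 503 otherwise:

#!/usr/bin/env bash
# Sketch of a Galera-aware health check: report healthy only when the
# local node is actually synced with the cluster, not merely accepting
# logins. Assumes a passwordless "monitoring" user on the local socket
# (an assumption; substitute your own credentials).
STATE=$(mysql -u monitoring -N -s \
    -e "SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';" | awk '{print $2}')
if [ "$STATE" = "Synced" ]; then
    printf 'HTTP/1.1 200 OK\r\n\r\nGalera node is synced.\r\n'
    exit 0
else
    # Covers Joined, Donor/Desynced, or an unreachable mysqld.
    printf 'HTTP/1.1 503 Service Unavailable\r\n\r\nGalera node state: %s\r\n' "${STATE:-unknown}"
    exit 1
fi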
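Served on a side port (for example 9200 via xinetd, as percona-clustercheck does), HAProxy could then poll it with httpchk instead of a bare login test. The backend name, server names, and addresses below are placeholders, not our actual config:

backend galera
    mode tcp
    balance leastconn
    # Poll the check script above instead of only verifying a MySQL login;
    # a node that is up but still joining answers 503 and is pulled from
    # the pool until it reports Synced.
    option httpchk GET /
    server infra01 10.0.0.11:3306 check port 9200 inter 2000 rise 2 fall 3
    server infra02 10.0.0.12:3306 check port 9200 inter 2000 rise 2 fall 3 backup
    server infra03 10.0.0.13:3306 check port 9200 inter 2000 rise 2 fall 3 backup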