I tested with good results on:

percona2 mysql> select @@version,@@version_comment; show status like 'wsrep_provider_version';
+----------------+---------------------------------------------------------------------------------------------------+
| @@version      | @@version_comment                                                                                   |
+----------------+---------------------------------------------------------------------------------------------------+
| 5.6.21-69.0-56 | Percona XtraDB Cluster (GPL), Release rel69.0, Revision 910, WSREP version 25.8, wsrep_25.8.r4126  |
+----------------+---------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

+------------------------+---------------+
| Variable_name          | Value         |
+------------------------+---------------+
| wsrep_provider_version | 3.8(r1dd46ba) |
+------------------------+---------------+
1 row in set (0.00 sec)

First I tested without the new options, i.e. with evs.version=0 and evs.auto_evict not set. When one of the nodes starts suffering from high packet loss or high latency, the cluster still goes into non-Primary state, recovers after some time, and later goes non-Primary again. So in general the cluster status is flapping, and while the broken/delayed node is in the cluster we can also observe huge commit delays. However, I was not able to end up with any node hitting an exception in gcomm and getting completely stuck like before. The wsrep_evs_delayed counter grows for the bad node, for example:

| wsrep_local_state_comment | Initialized
(...)
| wsrep_incoming_addresses  | unspecified,unspecified,unspecified,unspecified,unspecified,192.168.90.11:3306
| wsrep_evs_delayed         | fbebe800-59e2-11e4-85ec-7698aa6cc406:tcp://192.168.90.2:4567:255
| wsrep_evs_evict_list      |
| wsrep_evs_repl_latency    | 1.8112/1.8112/1.8112/0/1
| wsrep_evs_state           | GATHER
(...)
| wsrep_cluster_status      | non-Primary

So even without using the auto eviction functionality, there is a much better chance that the cluster will auto-recover after an intermittent network problem.

After I enabled the new evs.version=1 and set evs.auto_evict=25 on all nodes, the cluster still had flapping problems because of the single bad node, but as soon as the wsrep_evs_delayed counter reached 25 for that node, it was properly evicted from the cluster, and no more problems were observed after that. The bad node's uuid appears in the wsrep_evs_evict_list:

| wsrep_evs_evict_list | 572af5eb-5dd2-11e4-8f67-4ed3860f88c4

and in the error log on the bad node we can see:

2014-10-27 13:15:04 19941 [Note] WSREP: (572af5eb, 'tcp://0.0.0.0:4567') address 'tcp://192.168.90.2:4567' pointing to uuid 572af5eb is blacklisted, skipping
(...)
2014-10-27 13:15:06 19941 [Warning] WSREP: handshake with a292793c tcp://192.168.90.4:4567 failed: 'evicted'
2014-10-27 13:15:06 19941 [Warning] WSREP: handling gmcast protocol message failed: this node has been evicted out of the cluster, gcomm backend restart is required (FATAL) at gcomm/src/gmcast_proto.cpp:handle_failed():208
2014-10-27 13:15:06 19941 [ERROR] WSREP: exception from gcomm, backend must be restarted: this node has been evicted out of the cluster, gcomm backend restart is required (FATAL) at gcomm/src/gmcast_proto.cpp:handle_failed():208
2014-10-27 13:15:06 19941 [Note] WSREP: gcomm: terminating thread
2014-10-27 13:15:06 19941 [Note] WSREP: gcomm: joining thread
2014-10-27 13:15:06 19941 [Note] WSREP: gcomm: closing backend
2014-10-27 13:15:06 19941 [Note] WSREP: Forced PC close
2014-10-27 13:15:06 19941 [Note] WSREP: gcomm: closed
(...)
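For completeness, this is roughly how I had the options set in my test, via wsrep_provider_options on every node (just a sketch of my configuration; the threshold of 25 is only the value I picked for this test, and if you already pass other provider options there, append these to the existing list):

# my.cnf on every cluster node (test values)
[mysqld]
wsrep_provider_options="evs.version=1;evs.auto_evict=25"

You can verify what the provider is actually running with:

mysql> SHOW GLOBAL VARIABLES LIKE 'wsrep_provider_options'\G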
So the eviction function seems to work as expected. I have some comments though:

* All the nodes should have evs.version=1 and evs.auto_evict set; in my test, when only half of the nodes had them, the bad node was not entirely evicted and the cluster ended up in an endless non-Primary state.
* A normal, clean node restart can increase the wsrep_evs_delayed counter by 1, so beware of setting evs.auto_evict to very low values.
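To keep an eye on how close a delayed node is getting to the threshold, a simple status query on each node is enough (just a monitoring sketch built on the counters shown above):

mysql> SHOW GLOBAL STATUS WHERE Variable_name IN
       ('wsrep_evs_delayed','wsrep_evs_evict_list','wsrep_evs_state','wsrep_cluster_status');

The wsrep_evs_delayed entry lists the delayed node's uuid, address and count; once a node's uuid shows up in wsrep_evs_evict_list, expect to see the gcomm exception in its error log shortly after, as in the example above.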