I reproduced this problem on a 4-node cluster (percona1 through percona4) running PXC 5.5.37 with wsrep 2.10(r175).
I introduced a bad network (packet loss and delay) on percona1.
1) After some time the cluster started malfunctioning: all nodes went into wsrep_local_state_comment = Initialized and wsrep_cluster_status = non-Primary.
2) I made percona2 the primary. All other nodes joined the cluster, while percona1 (the node with network issues) kept trying to connect, and eventually the entire cluster went down. wsrep stopped working and I had to issue kill -9 on all nodes to bring the cluster up again.
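
For anyone trying to reproduce this, the steps above can be sketched roughly as follows. This is a dry-run sketch, not necessarily the exact commands used here: it assumes the loss/delay is simulated with tc netem and that the non-Primary node is promoted through the pc.bootstrap provider option. The interface name (eth0), the delay and loss values, and the helper function names are all hypothetical. Each function only prints the command it would run.

```shell
#!/bin/sh
# Dry-run sketch: every helper only PRINTS a command, nothing is executed.
# eth0, 200ms and 20% are assumed values, not taken from the report.

# Degrade the network on one node (would be run as root on percona1).
netem_add() {
    # $1 = interface, $2 = delay, $3 = packet loss
    echo "tc qdisc add dev $1 root netem delay $2 loss $3"
}

# Remove the netem qdisc again to restore the network.
netem_del() {
    echo "tc qdisc del dev $1 root netem"
}

# Check whether a node is in the primary component.
check_status() {
    echo "mysql -e \"SHOW STATUS LIKE 'wsrep_cluster_status'\""
}

# Ask a non-Primary node (percona2 in this report) to form a new primary component.
bootstrap_primary() {
    echo "mysql -e \"SET GLOBAL wsrep_provider_options='pc.bootstrap=1'\""
}

netem_add eth0 200ms 20%
check_status
bootstrap_primary
netem_del eth0
```

Copy the printed commands to the relevant node to run them for real; the tc commands need root.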
[root@percona2 ~]# mysql
mysql> show status like 'wsrep%';
+----------------------------+--------------------------------------+
| Variable_name              | Value                                |
+----------------------------+--------------------------------------+
| wsrep_local_state_uuid     | 9f581f39-eb03-11e3-8eb4-97664aaec97d |
| wsrep_protocol_version     | 4                                    |
| wsrep_last_committed       | 0                                    |
| wsrep_replicated           | 0                                    |
| wsrep_replicated_bytes     | 0                                    |
| wsrep_received             | 151                                  |
| wsrep_received_bytes       | 34204                                |
| wsrep_local_commits        | 0                                    |
| wsrep_local_cert_failures  | 0                                    |
| wsrep_local_replays        | 0                                    |
| wsrep_local_send_queue     | 0                                    |
| wsrep_local_send_queue_avg | 0.000000                             |
| wsrep_local_recv_queue     | 0                                    |
| wsrep_local_recv_queue_avg | 0.000000                             |
| wsrep_flow_control_paused  | 0.000000                             |
| wsrep_flow_control_sent    | 0                                    |
| wsrep_flow_control_recv    | 0                                    |
| wsrep_cert_deps_distance   | 0.000000                             |
| wsrep_apply_oooe           | 0.000000                             |
| wsrep_apply_oool           | 0.000000                             |
| wsrep_apply_window         | 0.000000                             |
| wsrep_commit_oooe          | 0.000000                             |
| wsrep_commit_oool          | 0.000000                             |
| wsrep_commit_window        | 0.000000                             |
| wsrep_local_state          | 0                                    |
| wsrep_local_state_comment  | Initialized                          |
| wsrep_cert_index_size      | 0                                    |
| wsrep_causal_reads         | 0                                    |
| wsrep_incoming_addresses   |                                      |
| wsrep_cluster_conf_id      | 18446744073709551615                 |
| wsrep_cluster_size         | 0                                    |
| wsrep_cluster_state_uuid   | 9f581f39-eb03-11e3-8eb4-97664aaec97d |
| wsrep_cluster_status       | non-Primary                          |
| wsrep_connected            | ON                                   |
| wsrep_local_bf_aborts      | 0                                    |
| wsrep_local_index          | 18446744073709551615                 |
| wsrep_provider_name        | Galera                               |
| wsrep_provider_vendor      | Codership Oy <email address hidden>  |
| wsrep_provider_version     | 2.10(r175)                           |
| wsrep_ready                | OFF                                  |
+----------------------------+--------------------------------------+
mysql> select 1;
ERROR 1047 (08S01): Unknown command
[root@percona2 ~]# tail -f /var/log/mysqld.log
140605 10:08:25 [ERROR] WSREP: exception from gcomm, backend must be restarted:8f61539e-ec8c-11e3-ae76-6b64399ac83dInstall message self state does not match, message state: prim=0,un=0,last_seq=2,last_prim=view_id(PRIM,8f61539e-ec8c-11e3-ae76-6b64399ac83d,1165),to_seq=445,weight=1, local state: prim=0,un=1,last_seq=2,last_prim=view_id(PRIM,8f61539e-ec8c-11e3-ae76-6b64399ac83d,1165),to_seq=445,weight=1 (FATAL)
         at gcomm/src/pc_proto.cpp:handle_install():1047
140605 10:08:25 [Note] WSREP: Received self-leave message.
140605 10:08:25 [Note] WSREP: Flow-control interval: [0, 0]
140605 10:08:25 [Note] WSREP: Received SELF-LEAVE. Closing connection.
140605 10:08:25 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 0)
140605 10:08:25 [Note] WSREP: RECV thread exiting 0: Success
140605 10:08:25 [Note] WSREP: New cluster view: global state: 9f581f39-eb03-11e3-8eb4-97664aaec97d:0, view# -1: non-Primary, number of nodes: 0, my index: -1, protocol version 2
140605 10:08:25 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
140605 10:08:25 [Note] WSREP: applier thread exiting (code:0)