I can confirm that the whole cluster can go down in such a case. Well, at least no node remains available for access.
Let's look at the following example.
node1>show variables like '%myisam';
+------------------------+-------+
| Variable_name          | Value |
+------------------------+-------+
| wsrep_replicate_myisam | OFF   |
+------------------------+-------+
1 row in set (0.00 sec)
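For context: with wsrep_replicate_myisam=OFF, DDL still replicates to all nodes (it goes through total order isolation), but DML against MyISAM tables is applied only locally. A hypothetical sequence like the following (not taken from the original test, shown only to illustrate how the divergence below can arise) is enough to leave each node with different contents:

node1>create table myisam1 (id int, key(id)) engine=MyISAM; -- DDL: replicated everywhere
node1>insert into myisam1 values (2),(3),(100); -- MyISAM DML: stays on node1 only
node2>insert into myisam1 values (500); -- likewise stays on node2 only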
node1>show create table myisam1\G
*************************** 1. row ***************************
Table: myisam1
Create Table: CREATE TABLE `myisam1` (
`id` int(11) DEFAULT NULL,
KEY `id` (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
node1>select * from myisam1;
+------+
| id |
+------+
| 0 |
| 1 |
| 2 |
| 3 |
| 99 |
| 100 |
+------+
6 rows in set (0.00 sec)
node2>select * from myisam1;
+------+
| id |
+------+
| 0 |
| 1 |
| 99 |
| 500 |
+------+
4 rows in set (0.00 sec)
node3>select * from myisam1;
+------+
| id |
+------+
| 0 |
| 1 |
| 99 |
+------+
3 rows in set (0.00 sec)
node1>alter table myisam1 engine=InnoDB;
Query OK, 6 rows affected (0.03 sec)
Records: 6 Duplicates: 0 Warnings: 0
node1>update myisam1 set id=300 where id=3;
Query OK, 1 row affected (0.01 sec)
Rows matched: 1 Changed: 1 Warnings: 0
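The ALTER above runs in total order isolation, so every node now has myisam1 as InnoDB, and the following UPDATE replicates as a normal write set. It modifies a row that exists only on node1; a hypothetical check on node2 (consistent with its SELECT output above) shows why the write set cannot be applied there:

node2>select * from myisam1 where id=3;
Empty set (0.00 sec)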
Now both node2 and node3 perform an emergency shutdown (not a crash), as the data inconsistency is detected:
130803 2:59:53 [ERROR] WSREP: Failed to apply trx: source: 40c408fb-f844-11e2-8f5a-02577ce11790 version: 2 local: 0 state: APPLYING flags: 1 conn_id: 11 trx_id: 473664 seqnos (l: 13, g: 215593, s: 215592, d: 215591, ts: 1375513225722038951)
130803 2:59:53 [ERROR] WSREP: Failed to apply app buffer: seqno: 215593, status: WSREP_FATAL
at galera/src/replicator_smm.cpp:apply_wscoll():52
at galera/src/replicator_smm.cpp:apply_trx_ws():118
130803 2:59:53 [ERROR] WSREP: Node consistency compromized, aborting...
And after a while, node1 does this:
130803 3:00:25 [Note] WSREP: declaring bbbbdab0-f844-11e2-8c18-73c48a26a0f8 stable
130803 3:00:25 [Note] WSREP: (40c408fb-f844-11e2-8f5a-02577ce11790, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://10.8.0.203:4567
130803 3:00:25 [Note] WSREP: Node 40c408fb-f844-11e2-8f5a-02577ce11790 state prim
130803 3:00:25 [Warning] WSREP: 40c408fb-f844-11e2-8f5a-02577ce11790 sending install message failed: Resource temporarily unavailable
130803 3:00:25 [Note] WSREP: view(view_id(NON_PRIM,40c408fb-f844-11e2-8f5a-02577ce11790,17) memb {
40c408fb-f844-11e2-8f5a-02577ce11790,
} joined {
} left {
} partitioned {
392c5097-fb84-11e2-bc39-43e986d05fbc,
bbbbdab0-f844-11e2-8c18-73c48a26a0f8,
})
130803 3:00:25 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
130803 3:00:25 [Note] WSREP: Flow-control interval: [16, 16]
130803 3:00:25 [Note] WSREP: Received NON-PRIMARY.
130803 3:00:25 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 215593)
130803 3:00:25 [Note] WSREP: New cluster view: global state: 36c05e7e-c781-11e2-0800-942090b7f498:215593, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version 2
130803 3:00:25 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
130803 3:00:25 [Note] WSREP: view(view_id(NON_PRIM,40c408fb-f844-11e2-8f5a-02577ce11790,18) memb {
40c408fb-f844-11e2-8f5a-02577ce11790,
} joined {
} left {
} partitioned {
392c5097-fb84-11e2-bc39-43e986d05fbc,
bbbbdab0-f844-11e2-8c18-73c48a26a0f8,
})
130803 3:00:25 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
130803 3:00:25 [Note] WSREP: Flow-control interval: [16, 16]
130803 3:00:25 [Note] WSREP: Received NON-PRIMARY.
130803 3:00:25 [Note] WSREP: New cluster view: global state: 36c05e7e-c781-11e2-0800-942090b7f498:215593, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version 2
130803 3:00:25 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
130803 3:00:27 [Note] WSREP: (40c408fb-f844-11e2-8f5a-02577ce11790, 'tcp://0.0.0.0:4567') reconnecting to bbbbdab0-f844-11e2-8c18-73c48a26a0f8 (tcp://10.8.0.202:4567), attempt 0
130803 3:00:27 [Note] WSREP: (40c408fb-f844-11e2-8f5a-02577ce11790, 'tcp://0.0.0.0:4567') reconnecting to 392c5097-fb84-11e2-bc39-43e986d05fbc (tcp://10.8.0.203:4567), attempt 0
130803 3:01:06 [Note] WSREP: (40c408fb-f844-11e2-8f5a-02577ce11790, 'tcp://0.0.0.0:4567') reconnecting to bbbbdab0-f844-11e2-8c18-73c48a26a0f8 (tcp://10.8.0.202:4567), attempt 30
130803 3:01:07 [Note] WSREP: (40c408fb-f844-11e2-8f5a-02577ce11790, 'tcp://0.0.0.0:4567') reconnecting to 392c5097-fb84-11e2-bc39-43e986d05fbc (tcp://10.8.0.203:4567), attempt 30
130803 3:01:48 [Note] WSREP: (40c408fb-f844-11e2-8f5a-02577ce11790, 'tcp://0.0.0.0:4567') reconnecting to bbbbdab0-f844-11e2-8c18-73c48a26a0f8 (tcp://10.8.0.202:4567), attempt 60
130803 3:01:49 [Note] WSREP: (40c408fb-f844-11e2-8f5a-02577ce11790, 'tcp://0.0.0.0:4567') reconnecting to 392c5097-fb84-11e2-bc39-43e986d05fbc (tcp://10.8.0.203:4567), attempt 60
130803 3:02:29 [Note] WSREP: (40c408fb-f844-11e2-8f5a-02577ce11790, 'tcp://0.0.0.0:4567') reconnecting to bbbbdab0-f844-11e2-8c18-73c48a26a0f8 (tcp://10.8.0.202:4567), attempt 90
etc.
And as a result:
node1>select * from myisam1;
ERROR 1047 (08S01): Unknown command
wsrep status:
| wsrep_local_state          | 0                                    |
| wsrep_local_state_comment  | Initialized                          |
| wsrep_cert_index_size      | 7                                    |
| wsrep_causal_reads         | 0                                    |
| wsrep_incoming_addresses   | 10.8.0.201:3306                      |
| wsrep_cluster_conf_id      | 18446744073709551615                 |
| wsrep_cluster_size         | 1                                    |
| wsrep_cluster_state_uuid   | 36c05e7e-c781-11e2-0800-942090b7f498 |
| wsrep_cluster_status       | non-Primary                          |
| wsrep_connected            | ON                                   |
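While node1 is stuck in the non-Primary state it refuses all queries with ERROR 1047. To inspect its data anyway, one option (a sketch, not from the original report) is to temporarily start the server with the Galera provider disabled, which makes it a standalone MySQL instance:

node1# mysqld --wsrep-provider=none   # standalone start, no replication
node1>select * from myisam1;          # local data can now be examined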
Start attempts of node2 or node3 are failing, as there is no node in Primary status. This is a log fragment from node3 being restarted and trying to connect to the cluster:
130803 3:16:43 [Note] WSREP: gcomm: connecting to group 'cluster1', peer 'node1:,node2:,node3:'
130803 3:16:43 [Warning] WSREP: (a64c939a-fc0c-11e2-9b1f-2fa595cede12, 'tcp://0.0.0.0:4567') address 'tcp://10.8.0.203:4567' points to own listening address, blacklisting
130803 3:16:43 [Note] WSREP: (a64c939a-fc0c-11e2-9b1f-2fa595cede12, 'tcp://0.0.0.0:4567') address 'tcp://10.8.0.203:4567' pointing to uuid a64c939a-fc0c-11e2-9b1f-2fa595cede12 is blacklisted, skipping
130803 3:16:43 [Note] WSREP: declaring 40c408fb-f844-11e2-8f5a-02577ce11790 stable
130803 3:16:43 [Note] WSREP: view(view_id(NON_PRIM,40c408fb-f844-11e2-8f5a-02577ce11790,21) memb {
40c408fb-f844-11e2-8f5a-02577ce11790,
a64c939a-fc0c-11e2-9b1f-2fa595cede12,
} joined {
} left {
} partitioned {
1a6cd2fd-fc0b-11e2-b1cc-3a867f991b74,
392c5097-fb84-11e2-bc39-43e986d05fbc,
bbbbdab0-f844-11e2-8c18-73c48a26a0f8,
})
130803 3:16:44 [Note] WSREP: (a64c939a-fc0c-11e2-9b1f-2fa595cede12, 'tcp://0.0.0.0:4567') address 'tcp://10.8.0.203:4567' pointing to uuid a64c939a-fc0c-11e2-9b1f-2fa595cede12 is blacklisted, skipping
130803 3:16:45 [Note] WSREP: (a64c939a-fc0c-11e2-9b1f-2fa595cede12, 'tcp://0.0.0.0:4567') address 'tcp://10.8.0.203:4567' pointing to uuid a64c939a-fc0c-11e2-9b1f-2fa595cede12 is blacklisted, skipping
(...)
130803 3:17:14 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
at gcomm/src/pc.cpp:connect():139
130803 3:17:14 [ERROR] WSREP: gcs/src/gcs_core.c:gcs_core_open():195: Failed to open backend connection: -110 (Connection timed out)
130803 3:17:14 [ERROR] WSREP: gcs/src/gcs.c:gcs_open():1289: Failed to open channel 'cluster1' at 'gcomm://node1,node2,node3': -110 (Connection timed out)
130803 3:17:14 [ERROR] WSREP: gcs connect failed: Connection timed out
130803 3:17:14 [ERROR] WSREP: wsrep::connect() failed: 6
130803 3:17:14 [ERROR] Aborting
The only way to restore the cluster is to re-bootstrap it from the most advanced node.
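A minimal sketch of such a re-bootstrap (the commands are assumptions typical for this setup, not part of the original report): force a new Primary Component on the most advanced node, then restart the others so they rejoin via SST:

node1>SET GLOBAL wsrep_provider_options='pc.bootstrap=YES';
-- or, alternatively, restart node1 as a new cluster:
node1# mysqld --wsrep-cluster-address=gcomm://
node2# service mysql start   # rejoins and syncs (SST) from node1
node3# service mysql start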
The above happened because node1 considered the situation a split-brain condition. It would have been avoided if at least one other node had data consistent with the primary node, or if we had an arbitrator node (http://www.codership.com/wiki/doku.php?id=galera_arbitrator).
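For reference, an arbitrator is a quorum-only member provided by the garbd daemon; it votes in quorum calculations but stores no data. A hypothetical invocation for this cluster (flag spelling assumed from the garbd documentation):

garbd --group cluster1 --address gcomm://node1:4567,node2:4567,node3:4567 --daemon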
Now the question is whether an emergency shutdown of all other nodes should leave the surviving node in a non-primary state? It does not happen when we shut down all other nodes manually - the single remaining node then stays in Primary state...
I think it should, since this way we are able to investigate the data differences without allowing the primary node (which in fact could be the least advanced one) to diverge further or to serve as an untrusted source of data for SST to the other nodes.
Given the above, I think this is not a bug but correct behaviour for a severe data inconsistency situation.