WSREP Crash on MyISAM -> InnoDB

Bug #1037380 reported by Justice London
This bug affects 1 person
Affects: MySQL patches by Codership
    Status: Incomplete    Importance: Undecided    Assigned to: Seppo Jaakola

Affects: Percona XtraDB Cluster (moved to https://jira.percona.com/projects/PXC)
    Status: Invalid    Importance: Undecided    Assigned to: Unassigned

Bug Description

A developer changed a MyISAM table to InnoDB and, either as a coincidence or as a direct result, the cluster crashed at the same time with a very... odd... error:

120815 19:15:34 [ERROR] WSREP: Failed to apply app buffer: Ã
W,P^S^B, seqno: 15897511, status: WSREP_FATAL
         at galera/src/replicator_smm.cpp:apply_wscoll():50
         at galera/src/replicator_smm.cpp:apply_trx_ws():121
120815 19:15:34 [ERROR] WSREP: Node consistency compromized, aborting...

The garbage text at the beginning of the second line was actually part of the error log. This happened on two of the three nodes; the third node, on which the query was running, stayed up without being restarted, but had issues. Please let me know what additional information you need.

Revision history for this message
Justice London (jlondon) wrote :

Additional info. Here's the table/update that seems to have caused it:

120815 19:15:34 [ERROR] Slave SQL: Could not execute Update_rows event on table jefferscharms.log; Can't find record in 'log', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 770, Error_code: 1032
120815 19:15:34 [Warning] WSREP: RBR event 2 Update_rows apply warning: 120, 15897511
120815 19:15:34 [ERROR] WSREP: Failed to apply trx: source: 94a95a29-e405-11e1-0800-49b2799b4b14 version: 2 local: 0 state: CERTIFYING flags: 1 conn_id: 21458122 trx_id: 8015220559 seqnos (l: 5462049, g: 15897511, s: 15897510, d: 15896902, ts: 1345083333006589653)

Revision history for this message
Justice London (jlondon) wrote :

This might be due to the issue below. Is this perhaps something you guys can help Codership fix, as it apparently is still an issue?

https://groups.google.com/forum/#!msg/codership-team/vXiLaoKdBQI/NPZRVCLr6K0J

Changed in codership-mysql:
status: New → In Progress
assignee: nobody → Seppo Jaakola (seppo-jaakola)
Revision history for this message
Seppo Jaakola (seppo-jaakola) wrote :

The mailing list link does not seem to be related.

The crash is due to data inconsistency in the cluster: Galera has recognized that this node does not have the same data content as the other nodes in the cluster and performs an emergency shutdown.

Note that while the table was of type MyISAM, it was not subject to replication (unless you have deliberately configured wsrep_replicate_myisam). So whatever you write to this table remains local to the respective node. Then, when you alter the table to InnoDB, it becomes a replicated table, and if the nodes contained different data at that point, a crash is certain.

Can you confirm that the table contents were consistent before the ALTER to InnoDB was issued?
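
For reference, a minimal way to check this (a sketch, not part of the original report) is to run the same statements on every node and compare the output before the conversion:

node1> SELECT COUNT(*) FROM jefferscharms.log;
node1> CHECKSUM TABLE jefferscharms.log;

Repeat on node2 and node3; the row counts and checksums must be identical on all nodes before ALTER TABLE jefferscharms.log ENGINE=InnoDB is issued. Otherwise the converted table starts replicating with per-node differences already in place.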

Changed in codership-mysql:
status: In Progress → Incomplete
Revision history for this message
Justice London (jlondon) wrote :

Very likely there was a data disparity between the servers. I would say that regardless of the situation (for instance, say the data was only on one server and not yet on the other three), the cluster should be able to respond in a less drastic manner than bringing the whole cluster down.

Revision history for this message
Henrik Ingo (hingo) wrote :

Justice,

It shouldn't be the case that the whole cluster goes down. Only the node that is inconsistent shuts down. (Note also that it doesn't crash; it shuts down by design.) It can often happen that all nodes are inconsistent, so that all the "slave" nodes shut down, but the "master" node, i.e. the one that you are executing the transaction against, is of course consistent with itself, so at least that one will keep running.

Galera is designed to be tightly coupled like this on purpose; we work hard to avoid data inconsistency. Other systems, notably MySQL replication, will allow nodes to be inconsistent. That seemingly gives better availability, but it also makes it quite a common problem that your nodes keep diverging.

Revision history for this message
Przemek (pmalkowski) wrote :

I can confirm the whole cluster can go down in such a case; at least, no node remains available for access.
Let's look at the following example.

node1>show variables like '%myisam';
+------------------------+-------+
| Variable_name          | Value |
+------------------------+-------+
| wsrep_replicate_myisam | OFF   |
+------------------------+-------+
1 row in set (0.00 sec)

node1>show create table myisam1\G
*************************** 1. row ***************************
       Table: myisam1
Create Table: CREATE TABLE `myisam1` (
  `id` int(11) DEFAULT NULL,
  KEY `id` (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

node1>select * from myisam1;
+------+
| id   |
+------+
|    0 |
|    1 |
|    2 |
|    3 |
|   99 |
|  100 |
+------+
6 rows in set (0.00 sec)

node2>select * from myisam1;
+------+
| id   |
+------+
|    0 |
|    1 |
|   99 |
|  500 |
+------+
4 rows in set (0.00 sec)

node3>select * from myisam1;
+------+
| id   |
+------+
|    0 |
|    1 |
|   99 |
+------+
3 rows in set (0.00 sec)

node1>alter table myisam1 engine=InnoDB;
Query OK, 6 rows affected (0.03 sec)
Records: 6 Duplicates: 0 Warnings: 0

node1>update myisam1 set id=300 where id=3;
Query OK, 1 row affected (0.01 sec)
Rows matched: 1 Changed: 1 Warnings: 0

Now both node2 and node3 perform an emergency shutdown (not a crash), as data inconsistency was detected:

130803 2:59:53 [ERROR] WSREP: Failed to apply trx: source: 40c408fb-f844-11e2-8f5a-02577ce11790 version: 2 local: 0 state: APPLYING flags: 1 conn_id: 11 trx_id: 473664 seqnos (l: 13, g: 215593, s: 215592, d: 215591, ts: 1375513225722038951)
130803 2:59:53 [ERROR] WSREP: Failed to apply app buffer: seqno: 215593, status: WSREP_FATAL
         at galera/src/replicator_smm.cpp:apply_wscoll():52
         at galera/src/replicator_smm.cpp:apply_trx_ws():118
130803 2:59:53 [ERROR] WSREP: Node consistency compromized, aborting...

But also, after a while, node1 does this:

130803 3:00:25 [Note] WSREP: declaring bbbbdab0-f844-11e2-8c18-73c48a26a0f8 stable
130803 3:00:25 [Note] WSREP: (40c408fb-f844-11e2-8f5a-02577ce11790, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://10.8.0.203:4567
130803 3:00:25 [Note] WSREP: Node 40c408fb-f844-11e2-8f5a-02577ce11790 state prim
130803 3:00:25 [Warning] WSREP: 40c408fb-f844-11e2-8f5a-02577ce11790 sending install message failed: Resource temporarily unavailable
130803 3:00:25 [Note] WSREP: view(view_id(NON_PRIM,40c408fb-f844-11e2-8f5a-02577ce11790,17) memb {
        40c408fb-f844-11e2-8f5a-02577ce11790,
} joined {
} left {
} partitioned {
        392c5097-fb84-11e2-bc39-43e986d05fbc,
        bbbbdab0-f844-11e2-8c18-73c48a26a0f8,
})
130803 3:00:25 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
130803 3:00:25 [Note] WSREP: Flow-control interval: [16, 16]
130803 3:00:25 [Note] WSREP: Received NON-PRIMARY.
130803 3:00:25 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 215593)
130803 3:00:25 [Note] WSREP: New cluster view: global state: 36c05e7e-c781-11e2-0800-942090b7f498:215593, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version 2
130803 3:00:25 [Note] WSREP: wsrep_notify_cm...


Changed in percona-xtradb-cluster:
status: New → Invalid
Revision history for this message
Jay Janssen (jay-janssen) wrote :

I guess what I wonder here is whether a node crashing for consistency reasons should first leave the cluster gracefully, instead of ungracefully (like other crashes).

Revision history for this message
Teemu Ollakka (teemu-ollakka) wrote :

It looks like the inconsistent nodes tried to leave gracefully, but due to unfortunate timing the remaining node failed to maintain the PC (primary component). As a workaround, set wsrep_provider_options='pc.bootstrap=1' to restore the PC.
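
Applied to Przemek's example above, the workaround would look like this on the surviving node (a sketch; it assumes node1 is the node that ended up in the non-Primary component):

node1> SET GLOBAL wsrep_provider_options='pc.bootstrap=1';
node1> SHOW STATUS LIKE 'wsrep_cluster_status';
node1> SHOW STATUS LIKE 'wsrep_local_state_comment';

After the bootstrap, wsrep_cluster_status should report Primary and wsrep_local_state_comment should return to Synced. The nodes that shut down for consistency reasons still have to be restarted (and will typically rejoin via SST).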

Revision history for this message
Jay Janssen (jay-janssen) wrote : Re: [Bug 1037380] WSREP Crash on MyISAM -> InnoDB

I guess I wonder why "unfortunate timing" should cause this to behave in such a way? Is this a bug? Shouldn't the cluster behavior be consistent if a node is truly leaving gracefully?

On Aug 26, 2013, at 8:57 AM, Teemu Ollakka <email address hidden> wrote:

> It looks like inconsistent nodes tried to leave gracefully, but due to
> unfortunate timing the remaining node failed to maintain PC. As a
> workaround set wsrep_provider_options='pc.bootstrap=1' to restore PC.

Jay Janssen, MySQL Consulting Lead, Percona
http://about.me/jay.janssen

Revision history for this message
Teemu Ollakka (teemu-ollakka) wrote :

Jay, it's a bug: https://bugs.launchpad.net/galera/+bug/1217225.

Revision history for this message
Jay Janssen (jay-janssen) wrote :

Thanks Teemu, glad we're finding these little races.

On Aug 27, 2013, at 3:28 AM, Teemu Ollakka <email address hidden> wrote:

> Jay, it's a bug: https://bugs.launchpad.net/galera/+bug/1217225.

Jay Janssen, MySQL Consulting Lead, Percona
http://about.me/jay.janssen

Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PXC-1241
