2013-10-10 08:04:51 |
Daniel Ylitalo |
description |
A few times a week our cluster ends up in something like a "last man standing" situation, where every node except the one the delete query originated from shuts down.
It only seems to happen during delete queries. Here is the error:
131010 1:56:55 [ERROR] Slave SQL: Could not execute Delete_rows event on table mytaste_se.blog_top; Can't find record in 'blog_top', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 1096, Error_code: 1032
131010 1:56:55 [Warning] WSREP: RBR event 2 Delete_rows apply warning: 120, 1408063004
131010 1:56:55 [ERROR] WSREP: Failed to apply trx: source: d2d40980-30de-11e3-86d7-43f3e3363655 version: 2 local: 0 state: APPLYING flags: 1 conn_id: 844727 trx_id: 6422274089 seqnos (l: 93953259, g: 1408063004, s: 1408063003, d: 1408062624, ts: 1381363015763698458)
131010 1:56:55 [ERROR] WSREP: Failed to apply app buffer: seqno: 1408063004, status: WSREP_FATAL
at galera/src/replicator_smm.cpp:apply_wscoll():52
at galera/src/replicator_smm.cpp:apply_trx_ws():118
131010 1:56:55 [ERROR] WSREP: Node consistency compromized, aborting...
131010 1:56:55 [Note] WSREP: Closing send monitor...
131010 1:56:55 [Note] WSREP: Closed send monitor.
131010 1:56:55 [Note] WSREP: gcomm: terminating thread
131010 1:56:55 [Note] WSREP: gcomm: joining thread
131010 1:56:55 [Note] WSREP: gcomm: closing backend
I don't see why a node should go offline because of an inconsistency during a delete. If the row doesn't exist, why not just continue? The row is supposed to be gone anyway. For update queries the node should obviously go offline, but for delete queries? |
I also noted that when this happens, the wsrep_notify_cmd isn't executed on the node that goes offline. |
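To see which tables and event types are involved each time this happens, the error log can be scanned for the failing row events. This is a minimal sketch, assuming the log lines follow the format quoted above; `find_apply_errors` and the regular expression are illustrative names, not part of Galera or MySQL.

```python
import re

# Matches the "Slave SQL" error line quoted in this report.
# Assumption: other occurrences in the error log use the same layout.
APPLY_ERROR = re.compile(
    r"Could not execute (?P<event>\w+) event on table "
    r"(?P<table>[^;]+); .*?Error_code: (?P<code>\d+)"
)


def find_apply_errors(lines):
    """Return (event, table, error_code) tuples for failed row events."""
    hits = []
    for line in lines:
        match = APPLY_ERROR.search(line)
        if match:
            hits.append(
                (match.group("event"), match.group("table"), int(match.group("code")))
            )
    return hits


# Two lines copied from the log in this report.
sample = [
    "131010 1:56:55 [ERROR] Slave SQL: Could not execute Delete_rows event on "
    "table mytaste_se.blog_top; Can't find record in 'blog_top', Error_code: 1032; "
    "handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, "
    "end_log_pos 1096, Error_code: 1032",
    "131010 1:56:55 [Note] WSREP: Closing send monitor...",
]

print(find_apply_errors(sample))
```

Running the same function over the full error log of each node that went offline would show whether the failures are always Delete_rows events on the same table.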
|