node does not abort/exit when trx replaying fails

Bug #735465 reported by Alex Yurchenko
Affects   Status         Importance   Assigned to     Milestone
Galera    Fix Released   Critical     Teemu Ollakka   0.8.0

Bug Description

For example, this was caught in a node's log:

110313 19:28:12 [Note] WSREP: Member 2 (ip-10-226-70-143) synced with group.
110313 19:34:53 [ERROR] WSREP: invalid state APPLYING (FATAL)
  at galera/src/replicator_smm.cpp:abort_trx():679
110313 19:34:53 [Warning] WSREP: cancel commit bad exit: 6 9470094
110313 19:44:32 [ERROR] Slave SQL: Could not execute Update_rows event on table test.comm04; Deadlock found when trying to get lock; try restarting transaction, Error_code: 1213; handler error HA_ERR_LOCK_DEADLOCK; the event's master log FIRST, end_log_pos 8700, Error_code: 1213
110313 19:44:32 [Warning] WSREP: RBR event 30 apply warning: 149, 6393734
110313 19:44:32 [Warning] WSREP trx_replay failed for: 7
110313 20:28:03 [Warning] WSREP attempting net_end_statement while replaying
110313 20:28:03 InnoDB: Error: MySQL is freeing a thd
InnoDB: though trx->n_mysql_tables_in_use is 1
InnoDB: and trx->mysql_n_tables_locked is 1.
TRANSACTION 9948B6, not started, process no 31374, OS thread id 46920119482112
mysql tables in use 1, locked 1
MySQL thread id 6858, query id 6314006 ip-10-227-181-219.eu-west-1.compute.internal 10.227.181.219 test sleeping

The bug is twofold:
1) the node is in an inconsistent state but does not abort;
2) the node hangs but still appears live, holding up the whole cluster.
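
Until a node aborts on its own in this situation, the condition can at least be spotted from outside by scanning the error log. The following is only a minimal sketch, not part of Galera: the two patterns are copied from the log excerpt above, while the function name and the idea of an external log scan are assumptions made for illustration.

```python
import re

# Patterns taken from the log excerpt in this report. Treating them as
# markers of an inconsistent node is an assumption of this sketch.
FATAL_PATTERNS = [
    re.compile(r"\[ERROR\] WSREP: invalid state \w+ \(FATAL\)"),
    re.compile(r"\[Warning\] WSREP trx_replay failed for: \d+"),
]


def find_inconsistency_markers(log_lines):
    """Return (line_number, line) pairs for lines matching any pattern.

    Hypothetical helper: line numbers are 1-based positions in log_lines.
    """
    hits = []
    for number, line in enumerate(log_lines, start=1):
        if any(pattern.search(line) for pattern in FATAL_PATTERNS):
            hits.append((number, line.rstrip("\n")))
    return hits


# Three lines copied from the excerpt above, used as sample input.
sample = [
    "110313 19:28:12 [Note] WSREP: Member 2 (ip-10-226-70-143) synced with group.",
    "110313 19:34:53 [ERROR] WSREP: invalid state APPLYING (FATAL)",
    "110313 19:44:32 [Warning] WSREP trx_replay failed for: 7",
]

for number, line in find_inconsistency_markers(sample):
    print(number, line)
```

A non-empty result means the node logged a fatal state or a failed replay and should be treated as suspect even if it still appears to be a live cluster member.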

Changed in galera:
milestone: none → 0.8.0
assignee: nobody → Teemu Ollakka (teemu-ollakka)
importance: Undecided → Critical
status: New → Confirmed
Changed in galera:
status: Confirmed → Fix Released