shutdown hangs because of failing replayer

Bug #1358701 reported by Teemu Ollakka
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
MySQL patches by Codership
Status tracked in 5.6
5.5
High
Teemu Ollakka
5.6
High
Teemu Ollakka
Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC
Status tracked in 5.6
5.5
Confirmed
Undecided
Unassigned
5.6
Fix Released
Undecided
Unassigned

Bug Description

Normal shutdown under load, server remained hanging because replayer failed to finish:

2014-08-19 10:08:41 12146 [Note] /home/vagrant/galera/node1/mysql/bin/mysqld: Normal shutdown

2014-08-19 10:08:41 12146 [Note] WSREP: Stop replication
2014-08-19 10:08:41 12146 [Note] WSREP: Closing send monitor...
2014-08-19 10:08:41 12146 [Note] WSREP: gcomm: closing backend
2014-08-19 10:08:41 12146 [Note] WSREP: evs::proto(dd2a6026, OPERATIONAL, view_id(REG,01068177,59)): state change: OPERATIONAL -> LEAVING
2014-08-19 10:08:41 12146 [Note] WSREP: evs::proto(dd2a6026, LEAVING, view_id(REG,01068177,59)): state change: LEAVING -> CLOSED
2014-08-19 10:08:41 12146 [Note] WSREP: evs::proto(dd2a6026, LEAVING, view_id(REG,01068177,59)): delivering view view(view_id(TRANS,01068177,59) memb {
 dd2a6026,0
} joined {
} left {
} partitioned {
 01068177,0
 1e608ebc,0
})
2014-08-19 10:08:41 12146 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2014-08-19 10:08:41 12146 [Note] WSREP: evs::proto(dd2a6026, LEAVING, view_id(REG,01068177,59)): delivering view view((empty))
2014-08-19 10:08:41 12146 [Note] WSREP: gcomm: closed
2014-08-19 10:08:41 12146 [Note] WSREP: Flow-control interval: [16, 16]
2014-08-19 10:08:41 12146 [Note] WSREP: Received NON-PRIMARY.
2014-08-19 10:08:41 12146 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 187802)
2014-08-19 10:08:41 12146 [Note] WSREP: Received self-leave message.
2014-08-19 10:08:41 12146 [Note] WSREP: Flow-control interval: [0, 0]
2014-08-19 10:08:41 12146 [Note] WSREP: Received SELF-LEAVE. Closing connection.
2014-08-19 10:08:41 12146 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 187802)
2014-08-19 10:08:41 12146 [Note] WSREP: RECV thread exiting 0: Success
2014-08-19 10:08:41 12146 [Note] WSREP: recv_thread() joined.
2014-08-19 10:08:41 12146 [Note] WSREP: Closing replication queue.
2014-08-19 10:08:41 12146 [Note] WSREP: Closing slave action queue.
2014-08-19 10:08:41 12146 [Note] WSREP: Closing client connections
2014-08-19 10:08:41 12146 [Note] WSREP: applier has been aborted, skipping apply_rbr: 187802
2014-08-19 10:08:41 12146 [Warning] WSREP: Failed to apply app buffer: seqno: 187802, status: 1
  at /home/teemu/work/pytest/build/galera_build-debug-gh71/galera/src/trx_handle.cpp:apply():340
  at /home/teemu/work/pytest/build/galera_build-debug-gh71/galera/src/replicator_smm.cpp:apply_trx_ws():37
Retrying 2th time
2014-08-19 10:08:41 12146 [Note] WSREP: applier has been aborted, skipping apply_rbr: 187802
2014-08-19 10:08:41 12146 [Warning] WSREP: Failed to apply app buffer: seqno: 187802, status: 1
  at /home/teemu/work/pytest/build/galera_build-debug-gh71/galera/src/trx_handle.cpp:apply():340
  at /home/teemu/work/pytest/build/galera_build-debug-gh71/galera/src/replicator_smm.cpp:apply_trx_ws():37
Retrying 3th time
2014-08-19 10:08:41 12146 [Note] WSREP: applier has been aborted, skipping apply_rbr: 187802
2014-08-19 10:08:41 12146 [Warning] WSREP: Failed to apply app buffer: seqno: 187802, status: 1
  at /home/teemu/work/pytest/build/galera_build-debug-gh71/galera/src/trx_handle.cpp:apply():340
  at /home/teemu/work/pytest/build/galera_build-debug-gh71/galera/src/replicator_smm.cpp:apply_trx_ws():37
Retrying 4th time
2014-08-19 10:08:41 12146 [Note] WSREP: applier has been aborted, skipping apply_rbr: 187802
2014-08-19 10:08:41 12146 [Warning] WSREP: failed to replay trx: source: dd2a6026-2787-11e4-b24a-67f75d26ccfe version: 3 local: 1 state: REPLAYING flags: 1 conn_id: 360 trx_id: 387759 seqnos (l: 15344, g: 187802, s: 187798, d: 187792, ts: 806609869780997)
2014-08-19 10:08:41 12146 [Warning] WSREP: Failed to apply trx 187802 4 times
  at /home/teemu/work/pytest/build/galera_build-debug-gh71/galera/src/replicator_smm.cpp:replay_trx():849
2014-08-19 10:08:41 12146 [ERROR] WSREP: trx_replay failed for: 6, query: COMMIT
2014-08-19 10:08:41 12146 [ERROR] Aborting

2014-08-19 10:08:41 12146 [Note] WSREP: Closing client connections
2014-08-19 10:08:43 12146 [Note] WSREP: killing local connection: 353
2014-08-19 10:08:43 12146 [Note] WSREP: Service disconnected.
2014-08-19 10:08:43 12146 [Note] WSREP: rollbacker thread exiting
2014-08-19 10:08:44 12146 [Note] WSREP: Some threads may fail to exit.
2014-08-19 10:08:44 12146 [Note] Binlog end
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'partition'
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'PERFORMANCE_SCHEMA'
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'ARCHIVE'
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'BLACKHOLE'
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'INNODB_SYS_DATAFILES'
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'INNODB_SYS_TABLESPACES'
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'INNODB_SYS_FOREIGN_COLS'
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'INNODB_SYS_FOREIGN'
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'INNODB_SYS_FIELDS'
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'INNODB_SYS_COLUMNS'
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'INNODB_SYS_INDEXES'
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'INNODB_SYS_TABLESTATS'
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'INNODB_SYS_TABLES'
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'INNODB_FT_INDEX_TABLE'
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'INNODB_FT_INDEX_CACHE'
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'INNODB_FT_CONFIG'
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'INNODB_FT_BEING_DELETED'
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'INNODB_FT_DELETED'
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'INNODB_FT_DEFAULT_STOPWORD'
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'INNODB_METRICS'
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'INNODB_BUFFER_POOL_STATS'
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'INNODB_BUFFER_PAGE_LRU'
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'INNODB_BUFFER_PAGE'
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'INNODB_CMP_PER_INDEX_RESET'
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'INNODB_CMP_PER_INDEX'
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'INNODB_CMPMEM_RESET'
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'INNODB_CMPMEM'
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'INNODB_CMP_RESET'
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'INNODB_CMP'
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'INNODB_LOCK_WAITS'
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'INNODB_LOCKS'
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'INNODB_TRX'
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'InnoDB'
2014-08-19 10:08:44 12146 [Note] InnoDB: FTS optimize thread exiting.
2014-08-19 10:08:44 12146 [Note] InnoDB: Starting shutdown...
2014-08-19 10:09:44 12146 [Note] InnoDB: Waiting for 2 active transactions to finish
2014-08-19 10:10:45 12146 [Note] InnoDB: Waiting for 2 active transactions to finish
2014-08-19 10:11:45 12146 [Note] InnoDB: Waiting for 2 active transactions to finish
2014-08-19 10:12:45 12146 [Note] InnoDB: Waiting for 2 active transactions to finish
2014-08-19 10:13:45 12146 [Note] InnoDB: Waiting for 2 active transactions to finish
2014-08-19 10:14:45 12146 [Note] InnoDB: Waiting for 2 active transactions to finish

The problem being in wsrep_apply_events():

  if (thd->killed == THD::KILL_CONNECTION)
  {
    WSREP_INFO("applier has been aborted, skipping apply_rbr: %lld",
               (long long) wsrep_thd_trx_seqno(thd));
    DBUG_RETURN(WSREP_CB_FAILURE);
  }

Failing replaying left some monitors grabbed inside galera so galera threads failed to finish gracefully.

Revision history for this message
Miguel Angel Nieto (miguelangelnieto) wrote :

Confirmed by the developer that the bug can be reproduced and the fix has been already merged.

https://code.launchpad.net/~codership/codership-mysql/lp1358701/+merge/234695

Revision history for this message
Teemu Ollakka (teemu-ollakka) wrote :
Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PXC-1717

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers