shutdown hangs because of failing replayer

Bug #1358701 reported by Teemu Ollakka on 2014-08-19
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
MySQL patches by Codership
Status tracked in 5.6
5.5
High
Teemu Ollakka
5.6
High
Teemu Ollakka
Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC
Status tracked in 5.6
5.5
Confirmed
Undecided
Unassigned
5.6
Fix Released
Undecided
Unassigned

Bug Description

Normal shutdown under load, server remained hanging because replayer failed to finish:

2014-08-19 10:08:41 12146 [Note] /home/vagrant/galera/node1/mysql/bin/mysqld: Normal shutdown

2014-08-19 10:08:41 12146 [Note] WSREP: Stop replication
2014-08-19 10:08:41 12146 [Note] WSREP: Closing send monitor...
2014-08-19 10:08:41 12146 [Note] WSREP: gcomm: closing backend
2014-08-19 10:08:41 12146 [Note] WSREP: evs::proto(dd2a6026, OPERATIONAL, view_id(REG,01068177,59)): state change: OPERATIONAL -> LEAVING
2014-08-19 10:08:41 12146 [Note] WSREP: evs::proto(dd2a6026, LEAVING, view_id(REG,01068177,59)): state change: LEAVING -> CLOSED
2014-08-19 10:08:41 12146 [Note] WSREP: evs::proto(dd2a6026, LEAVING, view_id(REG,01068177,59)): delivering view view(view_id(TRANS,01068177,59) memb {
 dd2a6026,0
} joined {
} left {
} partitioned {
 01068177,0
 1e608ebc,0
})
2014-08-19 10:08:41 12146 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2014-08-19 10:08:41 12146 [Note] WSREP: evs::proto(dd2a6026, LEAVING, view_id(REG,01068177,59)): delivering view view((empty))
2014-08-19 10:08:41 12146 [Note] WSREP: gcomm: closed
2014-08-19 10:08:41 12146 [Note] WSREP: Flow-control interval: [16, 16]
2014-08-19 10:08:41 12146 [Note] WSREP: Received NON-PRIMARY.
2014-08-19 10:08:41 12146 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 187802)
2014-08-19 10:08:41 12146 [Note] WSREP: Received self-leave message.
2014-08-19 10:08:41 12146 [Note] WSREP: Flow-control interval: [0, 0]
2014-08-19 10:08:41 12146 [Note] WSREP: Received SELF-LEAVE. Closing connection.
2014-08-19 10:08:41 12146 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 187802)
2014-08-19 10:08:41 12146 [Note] WSREP: RECV thread exiting 0: Success
2014-08-19 10:08:41 12146 [Note] WSREP: recv_thread() joined.
2014-08-19 10:08:41 12146 [Note] WSREP: Closing replication queue.
2014-08-19 10:08:41 12146 [Note] WSREP: Closing slave action queue.
2014-08-19 10:08:41 12146 [Note] WSREP: Closing client connections
2014-08-19 10:08:41 12146 [Note] WSREP: applier has been aborted, skipping apply_rbr: 187802
2014-08-19 10:08:41 12146 [Warning] WSREP: Failed to apply app buffer: seqno: 187802, status: 1
  at /home/teemu/work/pytest/build/galera_build-debug-gh71/galera/src/trx_handle.cpp:apply():340
  at /home/teemu/work/pytest/build/galera_build-debug-gh71/galera/src/replicator_smm.cpp:apply_trx_ws():37
Retrying 2th time
2014-08-19 10:08:41 12146 [Note] WSREP: applier has been aborted, skipping apply_rbr: 187802
2014-08-19 10:08:41 12146 [Warning] WSREP: Failed to apply app buffer: seqno: 187802, status: 1
  at /home/teemu/work/pytest/build/galera_build-debug-gh71/galera/src/trx_handle.cpp:apply():340
  at /home/teemu/work/pytest/build/galera_build-debug-gh71/galera/src/replicator_smm.cpp:apply_trx_ws():37
Retrying 3th time
2014-08-19 10:08:41 12146 [Note] WSREP: applier has been aborted, skipping apply_rbr: 187802
2014-08-19 10:08:41 12146 [Warning] WSREP: Failed to apply app buffer: seqno: 187802, status: 1
  at /home/teemu/work/pytest/build/galera_build-debug-gh71/galera/src/trx_handle.cpp:apply():340
  at /home/teemu/work/pytest/build/galera_build-debug-gh71/galera/src/replicator_smm.cpp:apply_trx_ws():37
Retrying 4th time
2014-08-19 10:08:41 12146 [Note] WSREP: applier has been aborted, skipping apply_rbr: 187802
2014-08-19 10:08:41 12146 [Warning] WSREP: failed to replay trx: source: dd2a6026-2787-11e4-b24a-67f75d26ccfe version: 3 local: 1 state: REPLAYING flags: 1 conn_id: 360 trx_id: 387759 seqnos (l: 15344, g: 187802, s: 187798, d: 187792, ts: 806609869780997)
2014-08-19 10:08:41 12146 [Warning] WSREP: Failed to apply trx 187802 4 times
  at /home/teemu/work/pytest/build/galera_build-debug-gh71/galera/src/replicator_smm.cpp:replay_trx():849
2014-08-19 10:08:41 12146 [ERROR] WSREP: trx_replay failed for: 6, query: COMMIT
2014-08-19 10:08:41 12146 [ERROR] Aborting

2014-08-19 10:08:41 12146 [Note] WSREP: Closing client connections
2014-08-19 10:08:43 12146 [Note] WSREP: killing local connection: 353
2014-08-19 10:08:43 12146 [Note] WSREP: Service disconnected.
2014-08-19 10:08:43 12146 [Note] WSREP: rollbacker thread exiting
2014-08-19 10:08:44 12146 [Note] WSREP: Some threads may fail to exit.
2014-08-19 10:08:44 12146 [Note] Binlog end
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'partition'
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'PERFORMANCE_SCHEMA'
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'ARCHIVE'
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'BLACKHOLE'
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'INNODB_SYS_DATAFILES'
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'INNODB_SYS_TABLESPACES'
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'INNODB_SYS_FOREIGN_COLS'
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'INNODB_SYS_FOREIGN'
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'INNODB_SYS_FIELDS'
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'INNODB_SYS_COLUMNS'
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'INNODB_SYS_INDEXES'
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'INNODB_SYS_TABLESTATS'
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'INNODB_SYS_TABLES'
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'INNODB_FT_INDEX_TABLE'
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'INNODB_FT_INDEX_CACHE'
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'INNODB_FT_CONFIG'
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'INNODB_FT_BEING_DELETED'
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'INNODB_FT_DELETED'
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'INNODB_FT_DEFAULT_STOPWORD'
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'INNODB_METRICS'
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'INNODB_BUFFER_POOL_STATS'
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'INNODB_BUFFER_PAGE_LRU'
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'INNODB_BUFFER_PAGE'
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'INNODB_CMP_PER_INDEX_RESET'
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'INNODB_CMP_PER_INDEX'
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'INNODB_CMPMEM_RESET'
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'INNODB_CMPMEM'
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'INNODB_CMP_RESET'
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'INNODB_CMP'
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'INNODB_LOCK_WAITS'
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'INNODB_LOCKS'
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'INNODB_TRX'
2014-08-19 10:08:44 12146 [Note] Shutting down plugin 'InnoDB'
2014-08-19 10:08:44 12146 [Note] InnoDB: FTS optimize thread exiting.
2014-08-19 10:08:44 12146 [Note] InnoDB: Starting shutdown...
2014-08-19 10:09:44 12146 [Note] InnoDB: Waiting for 2 active transactions to finish
2014-08-19 10:10:45 12146 [Note] InnoDB: Waiting for 2 active transactions to finish
2014-08-19 10:11:45 12146 [Note] InnoDB: Waiting for 2 active transactions to finish
2014-08-19 10:12:45 12146 [Note] InnoDB: Waiting for 2 active transactions to finish
2014-08-19 10:13:45 12146 [Note] InnoDB: Waiting for 2 active transactions to finish
2014-08-19 10:14:45 12146 [Note] InnoDB: Waiting for 2 active transactions to finish

The problem being in wsrep_apply_events():

  if (thd->killed == THD::KILL_CONNECTION)
  {
    WSREP_INFO("applier has been aborted, skipping apply_rbr: %lld",
               (long long) wsrep_thd_trx_seqno(thd));
    DBUG_RETURN(WSREP_CB_FAILURE);
  }

Failing replaying left some monitors grabbed inside galera so galera threads failed to finish gracefully.

Confirmed by the developer that the bug can be reproduced and the fix has been already merged.

https://code.launchpad.net/~codership/codership-mysql/lp1358701/+merge/234695

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PXC-1717

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers