The PMP was taken later on, after the database was stuck for over an hour.
DONOR:
set starts around 4:09. However, the logs have been removed from the server, cannot access them anymore.
Then a long time nothing was logged and at the moment the database got stuck (confirmed by customer), we got those 2 errors:
130730 6:03:10 [Note] WSREP: Provider paused at a6280df4-f907-11e2-b79a-3fa52bb11ac5:261230
130730 6:04:40 [ERROR] Threadpool could not create additional thread to handle queries, because the number of allowed threads was reached. Increasing 'thread_pool_max_threads' parameter can help in this situation.
If 'extra_port' parameter is set, you can still connect to the database with superuser account (it must be TCP connection using extra_port as TCP port) and troubleshoot the situation. A likely cause of pool blocks are clients that lock resources for long time. 'show
processlist' or 'show engine innodb status' can give additional hints.
130730 6:04:40 [Note] Threadpool has been blocked for 30 seconds
JOINER:
130730 04:09:52 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
130730 04:09:52 mysqld_safe Skipping wsrep-recover for empty datadir: /var/lib/mysql
130730 04:09:52 mysqld_safe Assigning 00000000-0000-0000-0000-000000000000:-1 to wsrep_start_position
130730 4:09:52 [Note] WSREP: wsrep_start_position var submitted: '00000000-0000-0000-0000-000000000000:-1'
130730 4:09:52 [ERROR] An old style --language value with language specific part detected: /usr/share/mysql/english/
130730 4:09:52 [ERROR] Use --lc-messages-dir without language specific part instead.
130730 4:09:52 [Note] WSREP: Read nil XID from storage engines, skipping position init
130730 4:09:52 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/libgalera_smm.so'
130730 4:09:52 [Note] WSREP: wsrep_load(): Galera 2.5(r150) by Codership Oy <email address hidden> loaded succesfully.
130730 4:09:52 [Note] WSREP: Found saved state: cfa1bbe6-ad61-11e2-0800-ddbec9296193:1136915979
130730 4:09:52 [Note] WSREP: Reusing existing '/var/lib/mysql//galera.cache'.
130730 4:09:52 [Note] WSREP: Passing config to GCS: base_host = 192.168.20.122; base_port = 4567; cert.log_conflicts = no; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 2G; gcs.fc_debug = 0; gcs.fc_factor = 0.99; gcs.fc_limit = 256; gcs.fc_master_slave = yes; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3
130730 4:09:52 [Note] WSREP: Assign initial position for certification: 1136915979, protocol version: -1
130730 4:09:52 [Note] WSREP: wsrep_sst_grab()
130730 4:09:52 [Note] WSREP: Start replication
130730 4:09:52 [Note] WSREP: Setting initial position to cfa1bbe6-ad61-11e2-0800-ddbec9296193:1136915979
130730 4:09:52 [Note] WSREP: protonet asio version 0
130730 4:09:52 [Note] WSREP: backend: asio
130730 4:09:52 [Note] WSREP: GMCast version 0
130730 4:09:52 [Note] WSREP: (8e9245e4-f908-11e2-0800-b18be9f1839f, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
130730 4:09:52 [Note] WSREP: (8e9245e4-f908-11e2-0800-b18be9f1839f, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
130730 4:09:52 [Note] WSREP: EVS version 0
130730 4:09:52 [Note] WSREP: PC version 0
130730 4:09:52 [Note] WSREP: gcomm: connecting to group 'Production-XtraDB-Cluster', peer '192.168.20.121:,192.168.20.122:,192.168.20.120:'
130730 4:09:52 [Warning] WSREP: (8e9245e4-f908-11e2-0800-b18be9f1839f, 'tcp://0.0.0.0:4567') address 'tcp://192.168.20.122:4567' points to own listening address, blacklisting
130730 4:09:52 [Note] WSREP: (8e9245e4-f908-11e2-0800-b18be9f1839f, 'tcp://0.0.0.0:4567') address 'tcp://192.168.20.122:4567' pointing to uuid 8e9245e4-f908-11e2-0800-b18be9f1839f is blacklisted, skipping
130730 4:09:52 [Note] WSREP: declaring a6277ce6-f907-11e2-a7d9-77ba0c38d3c4 stable
130730 4:09:52 [Note] WSREP: Node a6277ce6-f907-11e2-a7d9-77ba0c38d3c4 state prim
130730 4:09:52 [Note] WSREP: view(view_id(PRIM,8e9245e4-f908-11e2-0800-b18be9f1839f,16) memb { 8e9245e4-f908-11e2-0800-b18be9f1839f, a6277ce6-f907-11e2-a7d9-77ba0c38d3c4,
} joined {
} left {
} partitioned {
})
130730 4:09:52 [Note] WSREP: discarding pending addr without UUID: tcp://192.168.20.120:4567
130730 4:09:53 [Note] WSREP: gcomm: connected
130730 4:09:53 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
130730 4:09:53 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
130730 4:09:53 [Note] WSREP: Opened channel 'Production-XtraDB-Cluster'
130730 4:09:53 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 2
130730 4:09:53 [Note] WSREP: Waiting for SST to complete.
130730 4:09:53 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 8f2b72c8-f908-11e2-0800-645981eab011
130730 4:09:53 [Note] WSREP: STATE EXCHANGE: sent state msg: 8f2b72c8-f908-11e2-0800-645981eab011
130730 4:09:53 [Note] WSREP: STATE EXCHANGE: got state msg: 8f2b72c8-f908-11e2-0800-645981eab011 from 0 (DB3)
130730 4:09:53 [Note] WSREP: STATE EXCHANGE: got state msg: 8f2b72c8-f908-11e2-0800-645981eab011 from 1 (DB2)
130730 4:09:53 [Note] WSREP: Quorum results:
version = 2,
component = PRIMARY,
conf_id = 3,
members = 1/2 (joined/total),
act_id = 3582,
last_appl. = -1,
protocols = 0/4/2 (gcs/repl/appl),
group UUID = a6280df4-f907-11e2-b79a-3fa52bb11ac5
130730 4:09:53 [Note] WSREP: Flow-control interval: [253, 256]
130730 4:09:53 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 3582)
130730 4:09:53 [Note] WSREP: State transfer required:
Group state: a6280df4-f907-11e2-b79a-3fa52bb11ac5:3582
Local state: cfa1bbe6-ad61-11e2-0800-ddbec9296193:1136915979
130730 4:09:53 [Note] WSREP: New cluster view: global state: a6280df4-f907-11e2-b79a-3fa52bb11ac5:3582, view# 4: Primary, number of nodes: 2, my index: 0, protocol version 2
130730 4:09:53 [Warning] WSREP: Gap in state sequence. Need state transfer.
130730 4:09:55 [Note] WSREP: Running: 'wsrep_sst_xtrabackup --role 'joiner' --address '192.168.20.122' --auth 'user:pass' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --parent '39743''
WSREP_SST: [INFO] Streaming with tar (20130730 04:09:55.183)
WSREP_SST: [INFO] Note: Advanced xtrabackup features - encryption,compression etc. not available with tar. (20130730 04:09:55.186)
WSREP_SST: [INFO] However, xbstream requires manual cleanup of data directory before SST - lp:1193240. (20130730 04:09:55.189)
130730 4:09:55 [Note] WSREP: Prepared SST request: xtrabackup|192.168.20.122:4444/xtrabackup_sst
130730 4:09:55 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
130730 4:09:55 [Warning] WSREP: moving position backwards: 1136915979 -> 3582
130730 4:09:55 [Note] WSREP: Assign initial position for certification: 3582, protocol version: 2
130730 4:09:55 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (cfa1bbe6-ad61-11e2-0800-ddbec9296193) does not match group state UUID (a6280df4-f907-11e2-b79a-3fa52bb11ac5): 1 (Operation not permitted)
at galera/src/replicator_str.cpp:prepare_for_IST():436. IST will be unavailable.
130730 4:09:55 [Note] WSREP: Node 0 (DB3) requested state transfer from '*any*'. Selected 1 (DB2)(SYNCED) as donor.
130730 4:09:55 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 3582)
130730 4:09:55 [Note] WSREP: Requesting state transfer: success, donor: 1
Nothing after that, and not at 6AM when the master got stuck.
The PMP was taken later on, after the database was stuck for over an hour.
DONOR:
set starts around 4:09. However, the logs have been removed from the server, cannot access them anymore.
Then a long time nothing was logged and at the moment the database got stuck (confirmed by customer), we got those 2 errors:
130730 6:03:10 [Note] WSREP: Provider paused at a6280df4- f907-11e2- b79a-3fa52bb11a c5:261230 pool_max_ threads' parameter can help in this situation.
130730 6:04:40 [ERROR] Threadpool could not create additional thread to handle queries, because the number of allowed threads was reached. Increasing 'thread_
If 'extra_port' parameter is set, you can still connect to the database with superuser account (it must be TCP connection using extra_port as TCP port) and troubleshoot the situation. A likely cause of pool blocks are clients that lock resources for long time. 'show
processlist' or 'show engine innodb status' can give additional hints.
130730 6:04:40 [Note] Threadpool has been blocked for 30 seconds
JOINER: 0000-0000- 0000-0000000000 00:-1 to wsrep_start_ position position var submitted: '00000000- 0000-0000- 0000-0000000000 00:-1' mysql/english/ libgalera_ smm.so' ad61-11e2- 0800-ddbec92961 93:1136915979 mysql// galera. cache'. keep_pages_ size = 0; gcache.mem_size = 0; gcache.name = /var/lib/ mysql// galera. cache; gcache.page_size = 128M; gcache.size = 2G; gcs.fc_debug = 0; gcs.fc_factor = 0.99; gcs.fc_limit = 256; gcs.fc_master_slave = yes; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_ q_hard_ limit = 922337203685477 5807; gcs.recv_ q_soft_ limit = 0.25; gcs.sync_donor = NO; replicator. causal_ read_timeout = PT30S; replicator. commit_ order = 3 ad61-11e2- 0800-ddbec92961 93:1136915979 f908-11e2- 0800-b18be9f183 9f, 'tcp:// 0.0.0.0: 4567') listening at tcp://0.0.0.0:4567 f908-11e2- 0800-b18be9f183 9f, 'tcp:// 0.0.0.0: 4567') multicast: , ttl: 1 XtraDB- Cluster' , peer '192.168. 20.121: ,192.168. 20.122: ,192.168. 20.120: ' f908-11e2- 0800-b18be9f183 9f, 'tcp:// 0.0.0.0: 4567') address 'tcp:// 192.168. 20.122: 4567' points to own listening address, blacklisting f908-11e2- 0800-b18be9f183 9f, 'tcp:// 0.0.0.0: 4567') address 'tcp:// 192.168. 20.122: 4567' pointing to uuid 8e9245e4- f908-11e2- 0800-b18be9f183 9f is blacklisted, skipping f907-11e2- a7d9-77ba0c38d3 c4 stable f907-11e2- a7d9-77ba0c38d3 c4 state prim id(PRIM, 8e9245e4- f908-11e2- 0800-b18be9f183 9f,16) memb {
8e9245e4- f908-11e2- 0800-b18be9f183 9f,
a6277ce6- f907-11e2- a7d9-77ba0c38d3 c4, 168.20. 120:4567 XtraDB- Cluster' f908-11e2- 0800-645981eab0 11 f908-11e2- 0800-645981eab0 11 f908-11e2- 0800-645981eab0 11 from 0 (DB3) f908-11e2- 0800-645981eab0 11 from 1 (DB2) f907-11e2- b79a-3fa52bb11a c5 f907-11e2- b79a-3fa52bb11a c5:3582 ad61-11e2- 0800-ddbec92961 93:1136915979 f907-11e2- b79a-3fa52bb11a c5:3582, view# 4: Primary, number of nodes: 2, my index: 0, protocol version 2 sst_xtrabackup --role 'joiner' --address '192.168.20.122' --auth 'user:pass' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --parent '39743'' compression etc. not available with tar. (20130730 04:09:55.186) 192.168. 20.122: 4444/xtrabackup _sst ad61-11e2- 0800-ddbec92961 93) does not match group state UUID (a6280df4- f907-11e2- b79a-3fa52bb11a c5): 1 (Operation not permitted) src/replicator_ str.cpp: prepare_ for_IST( ):436. IST will be unavailable.
130730 04:09:52 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
130730 04:09:52 mysqld_safe Skipping wsrep-recover for empty datadir: /var/lib/mysql
130730 04:09:52 mysqld_safe Assigning 00000000-
130730 4:09:52 [Note] WSREP: wsrep_start_
130730 4:09:52 [ERROR] An old style --language value with language specific part detected: /usr/share/
130730 4:09:52 [ERROR] Use --lc-messages-dir without language specific part instead.
130730 4:09:52 [Note] WSREP: Read nil XID from storage engines, skipping position init
130730 4:09:52 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/
130730 4:09:52 [Note] WSREP: wsrep_load(): Galera 2.5(r150) by Codership Oy <email address hidden> loaded succesfully.
130730 4:09:52 [Note] WSREP: Found saved state: cfa1bbe6-
130730 4:09:52 [Note] WSREP: Reusing existing '/var/lib/
130730 4:09:52 [Note] WSREP: Passing config to GCS: base_host = 192.168.20.122; base_port = 4567; cert.log_conflicts = no; gcache.dir = /var/lib/mysql/; gcache.
130730 4:09:52 [Note] WSREP: Assign initial position for certification: 1136915979, protocol version: -1
130730 4:09:52 [Note] WSREP: wsrep_sst_grab()
130730 4:09:52 [Note] WSREP: Start replication
130730 4:09:52 [Note] WSREP: Setting initial position to cfa1bbe6-
130730 4:09:52 [Note] WSREP: protonet asio version 0
130730 4:09:52 [Note] WSREP: backend: asio
130730 4:09:52 [Note] WSREP: GMCast version 0
130730 4:09:52 [Note] WSREP: (8e9245e4-
130730 4:09:52 [Note] WSREP: (8e9245e4-
130730 4:09:52 [Note] WSREP: EVS version 0
130730 4:09:52 [Note] WSREP: PC version 0
130730 4:09:52 [Note] WSREP: gcomm: connecting to group 'Production-
130730 4:09:52 [Warning] WSREP: (8e9245e4-
130730 4:09:52 [Note] WSREP: (8e9245e4-
130730 4:09:52 [Note] WSREP: declaring a6277ce6-
130730 4:09:52 [Note] WSREP: Node a6277ce6-
130730 4:09:52 [Note] WSREP: view(view_
} joined {
} left {
} partitioned {
})
130730 4:09:52 [Note] WSREP: discarding pending addr without UUID: tcp://192.
130730 4:09:53 [Note] WSREP: gcomm: connected
130730 4:09:53 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
130730 4:09:53 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
130730 4:09:53 [Note] WSREP: Opened channel 'Production-
130730 4:09:53 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 2
130730 4:09:53 [Note] WSREP: Waiting for SST to complete.
130730 4:09:53 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 8f2b72c8-
130730 4:09:53 [Note] WSREP: STATE EXCHANGE: sent state msg: 8f2b72c8-
130730 4:09:53 [Note] WSREP: STATE EXCHANGE: got state msg: 8f2b72c8-
130730 4:09:53 [Note] WSREP: STATE EXCHANGE: got state msg: 8f2b72c8-
130730 4:09:53 [Note] WSREP: Quorum results:
version = 2,
component = PRIMARY,
conf_id = 3,
members = 1/2 (joined/total),
act_id = 3582,
last_appl. = -1,
protocols = 0/4/2 (gcs/repl/appl),
group UUID = a6280df4-
130730 4:09:53 [Note] WSREP: Flow-control interval: [253, 256]
130730 4:09:53 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 3582)
130730 4:09:53 [Note] WSREP: State transfer required:
Group state: a6280df4-
Local state: cfa1bbe6-
130730 4:09:53 [Note] WSREP: New cluster view: global state: a6280df4-
130730 4:09:53 [Warning] WSREP: Gap in state sequence. Need state transfer.
130730 4:09:55 [Note] WSREP: Running: 'wsrep_
WSREP_SST: [INFO] Streaming with tar (20130730 04:09:55.183)
WSREP_SST: [INFO] Note: Advanced xtrabackup features - encryption,
WSREP_SST: [INFO] However, xbstream requires manual cleanup of data directory before SST - lp:1193240. (20130730 04:09:55.189)
130730 4:09:55 [Note] WSREP: Prepared SST request: xtrabackup|
130730 4:09:55 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
130730 4:09:55 [Warning] WSREP: moving position backwards: 1136915979 -> 3582
130730 4:09:55 [Note] WSREP: Assign initial position for certification: 3582, protocol version: 2
130730 4:09:55 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (cfa1bbe6-
at galera/
130730 4:09:55 [Note] WSREP: Node 0 (DB3) requested state transfer from '*any*'. Selected 1 (DB2)(SYNCED) as donor.
130730 4:09:55 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 3582)
130730 4:09:55 [Note] WSREP: Requesting state transfer: success, donor: 1
Nothing after that, and not at 6AM when the master got stuck.