Replication slave from PXC crashes when deadlock found and corrupts databate / can't resume
Affects | Status | Importance | Assigned to | Milestone | ||
---|---|---|---|---|---|---|
Percona Server moved to https://jira.percona.com/projects/PS | Status tracked in 5.7 | |||||
5.5 |
Incomplete
|
Undecided
|
Unassigned | |||
5.6 |
Incomplete
|
Undecided
|
Unassigned | |||
5.7 |
Incomplete
|
Undecided
|
Unassigned | |||
Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC |
New
|
Undecided
|
Unassigned |
Bug Description
When there is a long reporting query running (old query which takes 15 minutes) causes system lock on the replication SQL thread. Later when it tries to resume it fails with error Error_code: 1062; handler error HA_ERR_
# VERSION installed on the reporting slave
-------
libperconaser
percona-
percona-
percona-
percona-toolkit 2.2.12 all
percona-
# VERSION installed on the galera cluster node acting as a master
-------
percona-
percona-toolkit 2.2.12 all
percona-
percona-
percona-
percona-
percona-
# ERRORS found in the error log - this is repeatable situation
2015-02-12 04:15:09 31165 [Warning] Slave SQL: Could not execute Update_rows event on table mygame.
2015-02-12 04:20:06 31165 [Warning] Slave SQL: Could not execute Update_rows event on table mygame.
2015-02-12 04:50:09 31165 [Warning] Slave SQL: Could not execute Update_rows event on table mygame.
2015-02-12 05:00:10 31165 [Warning] Slave SQL: Could not execute Update_rows event on table mygame.
2015-02-12 05:26:33 31165 [Warning] Slave SQL: Could not execute Update_rows event on table mygame.
2015-02-12 08:20:13 31165 [Warning] Slave SQL: Could not execute Update_rows event on table mygame.
2015-02-12 08:31:38 31165 [Warning] Slave SQL: Could not execute Update_rows event on table mygame.
2015-02-12 10:17:29 31165 [Warning] Slave SQL: Could not execute Update_rows event on table mygame.
2015-02-12 10:28:24 31165 [Warning] Slave SQL: Could not execute Update_rows event on table mygame.
2015-02-12 10:33:53 31165 [Warning] Slave SQL: Could not execute Update_rows event on table mygame.
2015-02-12 10:34:02 31165 [Warning] Slave SQL: Could not execute Update_rows event on table mygame.
2015-02-12 10:39:20 31165 [Warning] Slave SQL: Could not execute Update_rows event on table mygame.
2015-02-12 11:22:33 31165 [Warning] Slave SQL: Could not execute Update_rows event on table mygame.
2015-02-12 11:26:52 31165 [Warning] Slave SQL: Could not execute Update_rows event on table mygame.
2015-02-12 11:26:54 31165 [Warning] Slave SQL: Could not execute Update_rows event on table mygame.
2015-02-12 12:28:38 31165 [Warning] Aborted connection 490826 to db: 'mygame_report' user: 'root' host: '127.0.0.1' (Got an error reading communication packets)
2015-02-12 13:27:32 31165 [Warning] Slave SQL: Could not execute Write_rows event on table mygame.
2015-02-12 13:27:32 31165 [ERROR] Slave SQL: Could not execute Write_rows event on table mygame.state; Duplicate entry '8570235' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_
2015-02-12 13:27:32 31165 [Warning] Slave: Duplicate entry '8570235' for key 'PRIMARY' Error_code: 1062
2015-02-12 13:27:32 31165 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped
at log 'binlog.000012' position 166132545
# CONFIG FILE on the galera node acting as master
-------
[MYSQLD]
user=mysql
basedir=/usr/
datadir=
socket=
pid_file=mysqld.pid
port=3306
log_error=
log_warnings=2
# Use only ipv4 for connections
bind_address=
#log_output=FILE
### INNODB OPTIONS
innodb_
innodb_
innodb_
innodb_
## You may want to tune the below depending on number of cores and disk sub
#innodb_
#innodb_
innodb_
innodb_
innodb_
innodb_
innodb_
innodb_
innodb_
innodb_
#innodb_
innodb_
innodb_
## avoid statistics update when doing e.g show tables
innodb_
default_
## HEAVY WRITING ENVIROMENT ##
innodb_
## SPEEDUP ##
## /dev/shm uses virtual memory instead of a persistent storage device
tmpdir=/tmp
back_log=2048
## MYGAME SPECIFIC SETTINGS ##
lower_case_
#autocommit=0
init_connect='SET autocommit=0'
## DEFAULT CHARACTER SET ##
skip-character-
collation_
character_
# REPLICATION SPECIFIC _ GENERAL
#server_id must be unique across all mysql servers participating in replication.
server_id=8
# REPLICATION SPECIFIC
binlog_format=ROW
log_bin=binlog
#relay_
slave_net_
#skip-slave-start=1
#read_only=1
expire_logs_days=7
sync_binlog=0
log_slave_updates=1
#gtid_mode=ON
# allow create users fuctions
log_bin_
#enforce_
# OTHER THINGS, BUFFERS ETC
key_buffer_size = 32M
tmp_table_size = 64M
max_heap_table_size = 64M
## SAFETY ##
max_allowed_packet = 16M
max_connect_errors = 1000000
#sort_buffer_size = 256K
#read_buffer_size = 256K
#read_rnd_
#myisam_
skip_name_resolve
memlock=0
sysdate_is_now=1
max_connections=200
## EXTRA SAFETY ##
symbolic-links=0
thread_
open_files_
table_definitio
query_cache_type = 0
query_cache_size = 0
table_open_
# 5.6 backwards compatibility
explicit_
## SLOW LOG ##
slow_query_log = 1
slow_query_log_file = /var/log/
log_slow_verbosity = FULL
long_query_time = 10
slow_query_
log_queries_
event_scheduler=0
##
## WSREP options
##
# Full path to wsrep provider library or 'none'
wsrep_provider=
wsrep_node_
# Provider specific configuration options
wsrep_provider_
# Logical cluster name. Should be the same for all nodes.
wsrep_cluster_
# Group communication system handle
wsrep_cluster_
# Human_readable node name (non-unique). Hostname by default.
wsrep_node_
# Address for incoming client connections. Autodetect by default.
#wsrep_
# How many threads will process writesets from other nodes
#wsrep_
# DBUG options for wsrep provider
#wsrep_dbug_option
# Generate fake primary keys for non-PK tables (required for multi-master
# and parallel applying operation)
wsrep_certify_
# Location of the directory with data files. Needed for non-mysqldump
# state snapshot transfers. Defaults to mysql_real_
#wsrep_
# Maximum number of rows in write set
wsrep_max_
# Maximum size of write set
wsrep_max_
# to enable debug level logging, set this to 1
wsrep_debug=0
# convert locking sessions into transactions
wsrep_convert_
# how many times to retry deadlocked autocommits
wsrep_retry_
# change auto_increment_
wsrep_auto_
# replicate myisam
wsrep_replicate
# retry autoinc insert, which failed for duplicate key error
wsrep_drupal_
# enable "strictly synchronous" semantics for read operations
wsrep_causal_
# Command to call when node status or cluster membership changes.
# Will be passed all or some of the following options:
# --status - new status of this node
# --uuid - UUID of the cluster
# --primary - whether the component is primary or not ("yes"/"no")
# --members - comma-separated list of members
# --index - index of this node in the list
wsrep_notify_
##
## WSREP State Transfer options
##
# State Snapshot Transfer method
# ClusterControl currently DOES NOT support wsrep_sst_
wsrep_sst_
# Address on THIS node to receive SST at. DON'T SET IT TO DONOR ADDRESS!!!
# (SST method dependent. Defaults to the first IP of the first interface)
#wsrep_
# SST authentication string. This will be used to send SST to joining nodes.
# Depends on SST method. For mysqldump method it is root:<root password>
wsrep_sst_
# Desired SST donor name.
wsrep_sst_
# Protocol version to use
# wsrep_protocol_
[MYSQL]
socket=
#default_
[client]
socket=
#default_
[mysqldump]
max_allowed_packet = 16M
socket=
#default_
[MYSQLD_SAFE]
pid_file=mysqld.pid
log_error=
basedir=/usr/
datadir=
# CONFIG FILE on the reporting slave
-------
[MYSQLD]
user=mysql
basedir=/usr/
datadir=
socket=
pid_file=mysqld.pid
port=3306
log_error=
log_warnings=2
# Use only ipv4 for connections
bind_address=
#log_output=FILE
### INNODB OPTIONS
innodb_
innodb_
innodb_
innodb_
## You may want to tune the below depending on number of cores and disk sub
#innodb_
#innodb_
#innodb_
#innodb_
innodb_
innodb_
innodb_
innodb_
innodb_
innodb_
#innodb_
innodb_
#innodb_
## avoid statistics update when doing e.g show tables
innodb_
default_
## HEAVY WRITING ENVIROMENT ##
innodb_
## SPEEDUP ##
## /dev/shm uses virtual memory instead of a persistent storage device
tmpdir=/data02/tmp
back_log=2048
## MYGAME SPECIFIC SETTINGS ##
lower_case_
#autocommit=0
init_connect='SET autocommit=0'
## DEFAULT CHARACTER SET ##
skip-character-
collation_
character_
# REPLICATION SPECIFIC _ GENERAL
#server_id must be unique across all mysql servers participating in replication.
server_id=9
# REPLICATION SPECIFIC
#binlog_format=ROW
#log_bin=binlog
#relay_
slave_net_
#skip-slave-start=1
innodb_
slave_transacti
master_
relay_log_
#read_only=1
#expire_logs_days=7
#sync_binlog=0
#log_slave_
#gtid_mode=ON
# allow create users fuctions
log_bin_
#enforce_
# OTHER THINGS, BUFFERS ETC
key_buffer_size = 32M
tmp_table_size = 64M
max_heap_table_size = 64M
## SAFETY ##
max_allowed_packet = 16M
max_connect_errors = 1000000
#sort_buffer_size = 256K
#read_buffer_size = 256K
#read_rnd_
#myisam_
skip_name_resolve
#memlock=0
sysdate_is_now=1
max_connections=200
## EXTRA SAFETY ##
symbolic-links=0
thread_
open_files_
table_definitio
query_cache_type = 0
query_cache_size = 0
table_open_
# 5.6 backwards compatibility
explicit_
## SLOW LOG ##
slow_query_log = 0
slow_query_log_file = /var/log/
log_slow_verbosity = FULL
long_query_time = 10
slow_query_
log_queries_
event_scheduler=1
[MYSQL]
socket=
#default_
[client]
socket=
#default_
[mysqldump]
max_allowed_packet = 16M
socket=
#default_
[MYSQLD_SAFE]
pid_file=mysqld.pid
log_error=
basedir=/usr/
datadir=
We tried to mitigate this behavior by adding lock_wait_ timeout= 900 on_retries= 100000
innodb_
slave_transacti
and it was successful until today.
Otherwise any really large query causes system lock on the replication thread and then this can happen.
Found something similar at /www.kickstarte r.com/backing- and-hacking/ the-day- the-replication -died
https:/
and https:/ /bugs.launchpad .net/percona- xtradb- cluster/ +bug/1214465, but we have binlog_ format= NONE and pt-table-checksum runs twice a day to check galera cluster and replication slaves and it's OK.
wsrep_forced_
This is repeatable error.