Bug #1367562 “Crash since Upgrade” : Bugs : Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC

Nilnandan Joshi (nilnandan-joshi) wrote on 2014-09-16:

#1

Hi,

Can you provide more information about this error, such as my.cnf and the full error logs from both nodes? It would also help if you could provide any GRA_xxx.log files created in the datadir at the time of the crash and, if binary logging is enabled on both nodes, the binlog files from around the time the crash happened.
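
For anyone collecting the same files, here is a minimal sketch of how the GRA logs could be listed. It assumes the files match the pattern GRA_*.log and sit directly in the datadir; the helper name find_gra_logs is made up for this example and is not part of any Percona tool.

```python
from pathlib import Path


def find_gra_logs(datadir):
    """Return GRA_*.log files found directly in datadir, oldest first.

    Assumption: the files follow the GRA_*.log naming pattern and are
    readable by the user running this script.
    """
    candidates = Path(datadir).glob("GRA_*.log")
    # Sort by modification time so files written around the crash are easy to spot.
    return sorted(candidates, key=lambda p: p.stat().st_mtime)
```

With the datadir from the configuration posted in this report, the call would be find_gra_logs("/var/lib/mysql"); an empty list means no such files were written.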

Changed in percona-xtradb-cluster:
status:	New → Incomplete

Jai Gupta (jai-g) wrote on 2014-09-16:

#2

=====LOG=====
20:14:52 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona XtraDB Cluster better by reporting any
bugs at https://bugs.launchpad.net/percona-xtradb-cluster

key_buffer_size=1073741824
read_buffer_size=131072
max_used_connections=55
max_threads=258
thread_count=14
connection_count=6
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1151559 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x11ba66160
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7ff1f83ebd38 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x35)[0x8f5835]
/usr/sbin/mysqld(handle_fatal_signal+0x4b4)[0x664384]
/lib64/libpthread.so.0(+0xf710)[0x7ff823081710]
/usr/sbin/mysqld(_ZN10MDL_ticket7destroyEPS_+0xc)[0x653aac]
/usr/sbin/mysqld(_Z17mysql_ull_cleanupP3THD+0x49)[0x5fa1f9]
/usr/sbin/mysqld(_ZN3THD7cleanupEv+0xd2)[0x6b44a2]
/usr/sbin/mysqld(_ZN3THD17release_resourcesEv+0x288)[0x6b5068]
/usr/sbin/mysqld(_Z29one_thread_per_connection_endP3THDb+0x1c)[0x58800c]
/usr/sbin/mysqld(_Z24do_handle_one_connectionP3THD+0x101)[0x6baff1]
/usr/sbin/mysqld(handle_one_connection+0x47)[0x6bb247]
/usr/sbin/mysqld(pfs_spawn_thread+0x12a)[0xaee54a]
/lib64/libpthread.so.0(+0x79d1)[0x7ff8230799d1]
/lib64/libc.so.6(clone+0x6d)[0x7ff82158086d]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0): is an invalid pointer
Connection ID (thread ID): 991886
Status: KILL_CONNECTION

You may download the Percona XtraDB Cluster operations manual by visiting
http://www.percona.com/software/percona-xtradb-cluster/. You may find information
in the manual which will help you identify the cause of the crash.
140908 15:14:58 mysqld_safe Number of processes running now: 0
140908 15:14:58 mysqld_safe WSREP: not restarting wsrep node automatically
140908 15:14:58 mysqld_safe mysqld from pid file /var/lib/mysql/xxxxxxxx.pid ended

====conf====
[MYSQLD]
user=mysql
datadir=/var/lib/mysql
log_error=/var/log/mysqld.log
log_warnings=2
#log_output=FILE
bind_address=xxxxxxxxxxxxxxxxxxxxxxx

### INNODB OPTIONS 
innodb_buffer_pool_size=160G
innodb_flush_log_at_trx_commit=2
innodb_file_per_table=1
innodb_data_file_path = ibdata1:100M:autoextend
## You may want to tune the below depending on number of cores and disk sub
innodb_read_io_threads=4
innodb_write_io_threads=4
innodb_io_capacity=400
innodb_doublewrite=1
innodb_log_file_size=1024M
innodb_log_buffer_size=96M
innodb_buffer_pool_instances=8
innodb_log_files_in_group=2
innodb_thread_concurrency=0
#innodb_file_format=barracuda
innodb_flush_method = O_DIRECT
innodb_autoinc_lock_mode=2
## avoid statistics update when doing e.g show tables
innodb_stats_on_metadata=0
default_storage_engine=innodb
innodb_buffer_pool_load_at_startup=1
innodb_buffer_pool_dump_at_shutdown=1

#Time
wait_timeout = 300
connect_timeout=60
interactive_timeout=300

# CHARACTER SET
collation_server = utf8_unicode_ci
init_connect='SET NAMES utf8'
character_set_server = utf8

# REPLICATION SPECIFIC
binlog_format=ROW

# OTHER THINGS, BUFFERS ETC
key_buffer_size = 24M
tmp_table_size = 64M
max_heap_table_size = 64M
max_allowed_packet = 512M
#sort_buffer_size = 256K
#read_buffer_size = 256K
#read_rnd_buffer_size = 512K
#myisam_sort_buffer_size = 8M
skip_name_resolve
max_connect_errors = 100000000
sql_mode = ''
sysdate_is_now=1
max_connections=200
thread_cache_size=512
query_cache_type = 0
query_cache_size = 0
table_open_cache=1024
lower_case_table_names=0
# 5.6 backwards compatibility
explicit_defaults_for_timestamp=1
##
## WSREP options
##

# Full path to wsrep provider library or 'none'
wsrep_provider=/usr/lib64/libgalera_smm.so

wsrep_node_address=xxxxxxxxxxxxxxxxxxxxxx1
# Provider specific configuration options
wsrep_provider_options="gcache.size=32768M"

# Logical cluster name. Should be the same for all nodes.
wsrep_cluster_name="xxxxxxxxxxx"

# Group communication system handle
wsrep_cluster_address="gcomm://xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

# Human_readable node name (non-unique). Hostname by default.
#wsrep_node_name=

# Address for incoming client connections. Autodetect by default.
#wsrep_node_incoming_address=

# How many threads will process writesets from other nodes
wsrep_slave_threads=48

# DBUG options for wsrep provider
#wsrep_dbug_option

# Generate fake primary keys for non-PK tables (required for multi-master
# and parallel applying operation)
wsrep_certify_nonPK=1

# Location of the directory with data files. Needed for non-mysqldump
# state snapshot transfers. Defaults to mysql_real_data_home.
#wsrep_data_home_dir=

# Maximum number of rows in write set
wsrep_max_ws_rows=131072

# Maximum size of write set
wsrep_max_ws_size=1073741824

# to enable debug level logging, set this to 1
wsrep_debug=0

# convert locking sessions into transactions
wsrep_convert_LOCK_to_trx=0

# how many times to retry deadlocked autocommits
wsrep_retry_autocommit=1

# change auto_increment_increment and auto_increment_offset automatically
wsrep_auto_increment_control=1

# replicate myisam
wsrep_replicate_myisam=1
# retry autoinc insert, which failed for duplicate key error
wsrep_drupal_282555_workaround=0

# enable "strictly synchronous" semantics for read operations
wsrep_causal_reads=0

# Command to call when node status or cluster membership changes.
# Will be passed all or some of the following options:
# --status  - new status of this node
# --uuid    - UUID of the cluster
# --primary - whether the component is primary or not ("yes"/"no")
# --members - comma-separated list of members
# --index   - index of this node in the list
#wsrep_notify_cmd=

##
## WSREP State Transfer options
##

# State Snapshot Transfer method
# ClusterControl currently DOES NOT support wsrep_sst_method=mysqldump
wsrep_sst_method=rsync

# Address on THIS node to receive SST at. DON'T SET IT TO DONOR ADDRESS!!!
# (SST method dependent. Defaults to the first IP of the first interface)
#wsrep_sst_receive_address=

# SST authentication string. This will be used to send SST to joining nodes.
# Depends on SST method. For mysqldump method it is root:<root password>
wsrep_sst_auth="xxxxxx"

# Desired SST donor name.
#wsrep_sst_donor=

# Protocol version to use
# wsrep_protocol_version=

[MYSQL]
socket=/var/lib/mysql/mysql.sock
default_character_set=utf8

[client]
socket=/var/lib/mysql/mysql.sock
default_character_set=utf8

[mysqldump]
max_allowed_packet = 512M
socket=/var/lib/mysql/mysql.sock
default_character_set=utf8

[MYSQLD_SAFE]
pid_file=mysqld.pid
log_error=/var/log/mysqld.log
datadir=/var/lib/mysql

Miguel Angel Nieto (miguelangelnieto) wrote on 2014-09-22:

#3

Hi,

I would like to mention a few things:

1- One of the crashes is caused by data inconsistencies:

2014-09-16 12:25:24 14868 [ERROR] Slave SQL: Error 'Table 'xxxx.xxxxbackup_ids_temp' doesn't exist' on query. Default database: 'xxxx'. Query: 'CREATE INDEX xxxxbackidstemp_bacitepar_ix ON xxxxbackup_ids_temp (backupid, itemname, parentitemid)', Error_code: 1146

2014-09-16 12:25:24 14868 [ERROR] Slave SQL: Error 'Table 'xxxx.xxxxbackup_ids_temp' doesn't exist' on query. Default database: 'xxxx'. Query: 'CREATE UNIQUE INDEX xxxxbackidstemp_baciteite_uix ON xxxxbackup_ids_temp (backupid, itemname, itemid)', Error_code: 1146

How are those temporary tables created? Are you creating them in InnoDB?

Those inconsistencies cause crashes after several retries.

2- You use GET_LOCK and RELEASE_LOCK. Those are not supported in Galera, see:

https://blueprints.launchpad.net/codership-mysql/+spec/get-lock-support

3- Can you share: SHOW CREATE TABLE xxxxxxxsessions\G

4- Do you write to that table from different servers? I see rollbacks on queries that affect that table. If your application relies on GET_LOCK to write to xxxxxxxsessions consistently from multiple servers, then you need to review your application logic because, as I said, it is not supported in Galera and the application may not be doing what it is expected to do. This is just advice and may not be related to the crash.
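
As a starting point for that review, here is a rough sketch of how application SQL could be scanned for user-lock calls. It only looks for the two function names mentioned above; the helper name find_user_lock_calls, the sample statements and the table name sessions are all hypothetical and not taken from the reporter's application.

```python
import re

# Matches calls to the two user-lock functions named in this report,
# case-insensitively, e.g. "GET_LOCK(" or "release_lock (".
USER_LOCK_CALL = re.compile(r"\b(GET_LOCK|RELEASE_LOCK)\s*\(", re.IGNORECASE)


def find_user_lock_calls(statements):
    """Return (statement_index, FUNCTION_NAME) for every user-lock call found."""
    hits = []
    for index, sql in enumerate(statements):
        for match in USER_LOCK_CALL.finditer(sql):
            hits.append((index, match.group(1).upper()))
    return hits


# Hypothetical statements, only to show the shape of the input.
statements = [
    "SELECT GET_LOCK('session_42', 10)",
    "UPDATE sessions SET data = 'x' WHERE id = 42",
    "select release_lock('session_42')",
]
hits = find_user_lock_calls(statements)
```

Here hits contains one entry for statement 0 (GET_LOCK) and one for statement 2 (RELEASE_LOCK). A plain text match like this will also flag calls inside comments or string literals, so treat the result as a list of places to inspect by hand.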

Launchpad Janitor (janitor) wrote on 2014-11-22:

#4

[Expired for Percona XtraDB Cluster because there has been no activity for 60 days.]

Changed in percona-xtradb-cluster:
status:	Incomplete → Expired

Przemek (pmalkowski) wrote on 2016-04-14:

#5

I think this may be caused by moving user locks under the MDL context:
https://bugs.launchpad.net/percona-server/+bug/1401528
where Percona Server, and then PXC, uses an implementation similar to what was later introduced in 5.7:
http://dev.mysql.com/doc/refman/5.7/en/miscellaneous-functions.html#function_get-lock

Still, as Miguel said, user locks are not supported in Galera.
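
The backtrace in comment #2 fits this reading: the frames just below the signal handler are mysql_ull_cleanup and MDL_ticket::destroy, reached while a connection was being torn down. A small sketch of how those frames can be pulled out of such a log follows. The regular expression is written for the line format seen in this report, and the list of marker substrings is my own choice for illustration, not an official classification.

```python
import re

# Matches backtrace lines of the form seen in this report, e.g.
# /usr/sbin/mysqld(_ZN3THD7cleanupEv+0xd2)[0x6b44a2]
FRAME_RE = re.compile(
    r"^(?P<binary>\S+?)\((?P<symbol>[^+()]*)\+0x[0-9a-f]+\)\[0x[0-9a-f]+\]$"
)

# Assumption: substrings treated as "user-lock related" for this sketch.
USER_LOCK_MARKERS = ("mysql_ull_", "MDL_ticket")


def parse_frames(log_text):
    """Return (binary, mangled_symbol) for each backtrace line in log_text."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append((match.group("binary"), match.group("symbol")))
    return frames


def user_lock_frames(frames):
    """Return the mangled symbols that contain one of the marker substrings."""
    return [sym for _, sym in frames if any(m in sym for m in USER_LOCK_MARKERS)]


# Frames copied from the crash log in comment #2.
sample = """\
/usr/sbin/mysqld(handle_fatal_signal+0x4b4)[0x664384]
/lib64/libpthread.so.0(+0xf710)[0x7ff823081710]
/usr/sbin/mysqld(_ZN10MDL_ticket7destroyEPS_+0xc)[0x653aac]
/usr/sbin/mysqld(_Z17mysql_ull_cleanupP3THD+0x49)[0x5fa1f9]
/usr/sbin/mysqld(_ZN3THD7cleanupEv+0xd2)[0x6b44a2]
"""
flagged = user_lock_frames(parse_frames(sample))
```

For the sample above, flagged holds the two mangled symbols for MDL_ticket::destroy and mysql_ull_cleanup. Symbols stay mangled here; a tool such as c++filt can turn them into readable names.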

Shahriyar Rzayev (rzayev-sehriyar) wrote on 2018-01-18:

#6

Percona now uses JIRA for bug reports, so this bug report has been migrated to: https://jira.percona.com/browse/PXC-1733
