server crash with InnoDB: Assertion failure in thread ... in file rem0rec.cc line 580

Bug #1589919 reported by Cloudseeds
This bug affects 4 people
Affects: Percona XtraDB Cluster (moved to https://jira.percona.com/projects/PXC)
Status: Invalid
Importance: Undecided
Assigned to: Unassigned

Bug Description

Hi,

We have a running cluster with 3 nodes. One of the servers crashes every few weeks with the following error messages:

2016-06-07 03:36:19 7f08a47be700 InnoDB: Assertion failure in thread 139675096049408 in file rem0rec.cc line 580
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
01:36:19 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona XtraDB Cluster better by reporting any
bugs at https://bugs.launchpad.net/percona-xtradb-cluster

key_buffer_size=16777216
read_buffer_size=131072
max_used_connections=370
max_threads=1002
thread_count=59
connection_count=50
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 415502 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
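For reference, the estimate printed above can be re-checked by hand. A minimal sketch, assuming the MySQL 5.6 default sort_buffer_size of 262144 bytes (the log does not print that variable, so the result here differs from the 415502 K reported above):

```shell
# Re-check the crash report's worst-case memory formula.
# sort_buffer_size is an assumption (5.6 default); it is not shown in the log.
key_buffer_size=16777216
read_buffer_size=131072
sort_buffer_size=262144
max_threads=1002

kb=$(( (key_buffer_size + (read_buffer_size + sort_buffer_size) * max_threads) / 1024 ))
echo "estimated worst case: ${kb} K bytes"
```

With these values the estimate stays around 400 MB, i.e. the crash is unlikely to be a simple buffer-sizing problem on this node.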

Thread pointer: 0x24b8f820
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f08a47bdd60 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x3b)[0x904edb]
/usr/sbin/mysqld(handle_fatal_signal+0x471)[0x677f61]
/usr/lib64/libpthread.so.0(+0xf130)[0x7f0e6b67f130]
/usr/lib64/libc.so.6(gsignal+0x37)[0x7f0e699f55d7]
/usr/lib64/libc.so.6(abort+0x148)[0x7f0e699f6cc8]
/usr/sbin/mysqld[0x9f40bb]
/usr/sbin/mysqld(_Z25wsrep_rec_get_foreign_keyPhPmPKhP12dict_index_tS4_m+0x7e)[0x9f673e]
/usr/sbin/mysqld(_Z24wsrep_append_foreign_keyP5trx_tP14dict_foreign_tPKhP12dict_index_tmm+0x43e)[0x97559e]
/usr/sbin/mysqld[0xa03555]
/usr/sbin/mysqld[0xa05a7c]
/usr/sbin/mysqld[0xa2d938]
/usr/sbin/mysqld[0xa31c9a]
/usr/sbin/mysqld[0xa328cb]
/usr/sbin/mysqld[0xa1575b]
/usr/sbin/mysqld[0x97818f]
/usr/sbin/mysqld(_ZN7handler13ha_delete_rowEPKh+0x10f)[0x5bd30f]
/usr/sbin/mysqld(_Z12mysql_deleteP3THDP10TABLE_LISTP4ItemP10SQL_I_ListI8st_orderEyy+0xe9c)[0x8440ac]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x24c8)[0x6ff558]
/usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x5b8)[0x705678]
/usr/sbin/mysqld[0x705ed1]
/usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x1b92)[0x708232]
/usr/sbin/mysqld(_Z10do_commandP3THD+0x274)[0x709d24]
/usr/sbin/mysqld(_Z24do_handle_one_connectionP3THD+0x182)[0x6d15b2]
/usr/sbin/mysqld(handle_one_connection+0x40)[0x6d17b0]
/usr/lib64/libpthread.so.0(+0x7df5)[0x7f0e6b677df5]
/usr/lib64/libc.so.6(clone+0x6d)[0x7f0e69ab61ad]
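As a side note for anyone triaging a similar trace: the mangled C++ frames in the backtrace can be decoded with c++filt (from binutils). A small sketch using the two wsrep frames above:

```shell
# Demangle the two wsrep-related frames from the backtrace above.
for sym in \
    _Z25wsrep_rec_get_foreign_keyPhPmPKhP12dict_index_tS4_m \
    _Z24wsrep_append_foreign_keyP5trx_tP14dict_foreign_tPKhP12dict_index_tmm
do
    echo "$sym" | c++filt
done
```

The demangled names show the assertion fires while wsrep extracts a foreign key from a record, which matches the ha_delete_row / mysql_delete frames further down: the crash happens during a DELETE on a table involved in a foreign-key relationship.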

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7f08800c9a20): is an invalid pointer
Connection ID (thread ID): 20345247
Status: NOT_KILLED

You may download the Percona XtraDB Cluster operations manual by visiting
http://www.percona.com/software/percona-xtradb-cluster/. You may find information
in the manual which will help you identify the cause of the crash.
160607 03:36:21 mysqld_safe Number of processes running now: 0
160607 03:36:21 mysqld_safe WSREP: not restarting wsrep node automatically
160607 03:36:21 mysqld_safe mysqld from pid file /run/mysqld/mysql.pid ended

We use the following versions:
Percona-XtraDB-Cluster-server-56-5.6.26-25.12.1.el7.x86_64
Percona-XtraDB-Cluster-galera-3-3.12.2-1.rhel7.x86_64
Percona-XtraDB-Cluster-shared-56-5.6.26-25.12.1.el7.x86_64
Percona-XtraDB-Cluster-56-5.6.26-25.12.1.el7.x86_64

We have not been able to determine the cause.

Regards,

Guido

Cloudseeds (iico)
description: updated
Revision history for this message
Krunal Bauskar (krunal-bauskar) wrote :

Can you help us by providing the following details:

a. Reproducible test cases (TCs), if any
b. Log files
c. The configuration options for each node in the cluster

Revision history for this message
Cloudseeds (iico) wrote :

We don't know how to reproduce it and have no specific log files.
All nodes have the same configuration, except for the IP address.

[mysqld]

wsrep_slave_threads = 8
innodb_autoinc_lock_mode = 2

# WSREP #
wsrep_provider = /usr/lib64/galera3/libgalera_smm.so
wsrep_cluster_address = "gcomm://10.1.1.14,10.1.1.15,10.1.1.11"
wsrep_cluster_name = 'PXC Cluster'
wsrep_node_address = 10.1.1.14
wsrep_node_incoming_address = 10.1.1.14
wsrep_node_name = 'db1'
wsrep_sst_method = xtrabackup-v2
wsrep_sst_auth = "sstuser:pass"
wsrep_sst_receive_address = 10.1.1.14
wsrep_retry_autocommit = 1
wsrep_sync_wait = 0

# General #
datadir = /var/lib/mysql
socket = /var/lib/mysql/mysql.sock
bind_address = 10.1.1.14
default_storage_engine = InnoDB
lower_case_table_names = 1
ssl = false

# MyISAM #
key_buffer_size = 16M
myisam_recover = FORCE,BACKUP

# SAFETY #
max_allowed_packet = 64M
max_connect_errors = 1000000
skip_name_resolve
sysdate_is_now = 1
innodb = FORCE
thread_stack = 262144
back_log = 2048
performance_schema = OFF
skip_show_database

# Binary Logging #
#log_bin = /var/lib/mysql-binlog/mysql-bin
#log_bin_index = /var/lib/mysql-binlog/mysql-bin
expire_logs_days = 14
sync_binlog = 1
binlog_format = row

# Logging #
slow_query_log = ON
log_queries_not_using_indexes = ON
long_query_time = 0.5
log_error = /var/lib/mysql/mysql.log

# CACHES AND LIMITS #
tmp_table_size = 64M
max_heap_table_size = 64M
join_buffer_size = 8M
query_cache_type = 1
query_cache_size = 512M
query_cache_limit = 8M
query_cache_min_res_unit = 1024
max_connections = 1000
thread_cache_size = 500
open-files-limit = 65535
table_definition_cache = 7000
table_open_cache = 7000

# InnoDB Related #
innodb_autoinc_lock_mode = 2
innodb_buffer_pool_instances = 20
innodb_buffer_pool_size = 20G
innodb_data_file_path = ibdata1:10M:autoextend
innodb_file_format = Barracuda
innodb_file_per_table = 1
innodb_flush_log_at_trx_commit = 0
innodb_flush_method = O_DIRECT
innodb_locks_unsafe_for_binlog = 1
innodb_log_file_size = 128M
innodb_log_files_in_group = 2
innodb_read_io_threads = 32
innodb_write_io_threads = 32
innodb_thread_concurrency = 16

[mysqld_safe]
pid-file = /run/mysqld/mysql.pid
syslog

Revision history for this message
Elena C (ecurry007) wrote :

Similar bug with 5.6.22-72.0-56-log on a three-node Percona XtraDB Cluster.
That issue brought all three nodes down.

The error is posted below. I will provide my.cnf if needed.
Please comment on where to look next, as the error log does not provide many details.

Thank you in advance,
EC

**************
2016-08-31 21:29:02 7f8c78271700 InnoDB: Assertion failure in thread 140241287976704 in file rem0rec.cc line 580
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
2016-08-31 21:29:02 27913 [Note] WSREP: cleanup transaction for LOCAL_STATE: SELECT option_value FROM dma_options WHERE option_name = 'featured-content' LIMIT 1
2016-08-31 21:29:02 27913 [Note] WSREP: cleanup transaction for LOCAL_STATE: SET autocommit=1
04:29:02 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona XtraDB Cluster better by reporting any
bugs at https://bugs.launchpad.net/percona-xtradb-cluster

key_buffer_size=536870912
read_buffer_size=131072
max_used_connections=1501
max_threads=362
thread_count=85
connection_count=82
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 3545279 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x10f0bed0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
...
stack_bottom = 7f8c78270d38 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x35)[0x8fa965]
/usr/sbin/mysqld(handle_fatal_signal+0x4b4)[0x665644]
/lib64/libpthread.so.0(+0xf7e0)[0x7f8cb72b27e0]
/lib64/libc.so.6(gsignal+0x35)[0x7f8cb56fb625]
/lib64/libc.so.6(abort+0x175)[0x7f8cb56fce05]
/usr/sbin/mysqld[0x98f8bf]
/usr/sbin/mysqld(_Z25wsrep_rec_get_foreign_keyPhPmPKhP12dict_index_tS4_m+0x72)[0x98fed2]
/usr/sbin/mysqld(_Z24wsrep_append_foreign_keyP5trx_tP14dict_foreign_tPKhP12dict_index_tmm+0x3a6)[0x914616]
/usr/sbin/mysqld[0x99fac3]
/usr/sbin/mysqld[0x9a34e2]
/usr/sbin/mysqld[0x9cb7de]
/usr/sbin/mysqld[0x9cc159]
/usr/sbin/mysqld[0x9cd781]
/usr/sbin/mysqld[0x9abdd0]
/usr/sbin/mysqld[0x99fd57]
/usr/sbin/mysqld[0x9a34e2]
/usr/sbin/mysqld[0x9cb7de]
/usr/sbin/mysqld[0x9cc159]
/usr/sbin/mysqld[0x9cd781]
/usr/sbin/mysqld[0x9b0aeb]
/usr/sbin/mysqld[0x91a53f]
/usr/sbin/mysqld(_ZN7handler13ha_delete_rowEPKh+0xac)[0x5a95ac]
/usr/sbin/mysqld(_Z12mysql_deleteP3THDP10TABLE_LISTP4ItemP10SQL_I_ListI8st_orderEyy+0xd5c)[0x8407bc]
/u...


Revision history for this message
Krunal Bauskar (krunal-bauskar) wrote :

The log file info doesn't help, but I wanted to check whether you can upgrade to the latest version, because we haven't seen this reported in a long time and there has been a series of important bug fixes in the last 3-4 versions.

Revision history for this message
Elena C (ecurry007) wrote :

The cluster is running on:
Server version: 5.6.22-72.0-56-log Percona XtraDB Cluster (GPL), Release rel72.0, Revision 978, WSREP version 25.8, wsrep_25.8.r4150

Revision history for this message
Elena C (ecurry007) wrote :

It is possible that we are encountering an OOM issue, as the syslog messages included:
Out of memory: Kill process 28080 (mysqld) score 980 or sacrifice child
Killed process 28080, UID 498, (mysqld) total-vm:34727848kB, anon-rss:24101248kB, file-rss:1244kB
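Those OOM-killer lines are consistent with the my.cnf posted later in this comment: even before the (truncated) innodb_buffer_pool_size, the global and per-connection buffers alone can claim several GB. A rough sketch using values from that my.cnf (per-connection cost here counts only sort_buffer_size and join_buffer_size, which understates the true worst case):

```shell
# Rough worst-case memory estimate from the my.cnf values in this comment.
# Only sort_buffer_size and join_buffer_size are counted per connection;
# innodb_buffer_pool_size (truncated in the post) comes on top of this.
key_buffer_mb=512
query_cache_mb=128
sort_buffer_mb=8
join_buffer_mb=8
max_connections=360

total_mb=$(( key_buffer_mb + query_cache_mb + (sort_buffer_mb + join_buffer_mb) * max_connections ))
echo "~${total_mb} MB before the InnoDB buffer pool"
```

That is roughly 6.4 GB on top of the buffer pool, on a 24 GB box, so the OOM kill is plausible independently of the assertion failure.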

The system currently has 24 GB of RAM and 2 CPUs.

I am posting the values of my.cnf below; maybe you will be able to pinpoint something that may trigger this issue.
*****
[client]
user = root
port = 3306
socket = /var/lib/mysql/mysql.sock

[mysqld_safe]
user = mysql
pid-file = /var/lib/mysql/mysqld.pid
socket = /var/lib/mysql/mysql.sock
port = 3306
basedir = /usr
datadir = /var/lib/mysql

expire_logs_days = 3
general_log = 0
general_log_file = /var/log/mysql/mysql_general.log
log-error = /var/log/mysql/mysql_error.log
log-bin = /var/lib/mysql/mysql_bin
log_slave_updates
skip_slave_start
slow_query_log = 1
slow_query_log_file = /var/log/mysql/mysql_slow.log
tmpdir = /tmp
server-id = xx

[mysqld]
user = mysql
pid-file = /var/lib/mysql/mysqld.pid
socket = /var/lib/mysql/mysql.sock
port = 3306
basedir = /usr
datadir = /var/lib/mysql

expire_logs_days = 3
general_log = 0
general_log_file = /var/log/mysql/mysql_general.log
log-error = /var/log/mysql/mysql_error.log
log-bin = /var/lib/mysql/mysql_bin
slow_query_log = 1
slow_query_log_file = /var/log/mysql/mysql_slow.log
tmpdir = /tmp
server-id = xx

binlog_format = ROW

# added binlog_row_image = MINIMAL to assist with the rbr write fail - to process large amount of rows
binlog_row_image = MINIMAL

default_storage_engine = InnoDB
enforce_storage_engine = InnoDB
back_log = 128
key_buffer_size = 512M
join_buffer_size = 8M
long_query_time = 2
# log-queries-not-using-indexes
max_allowed_packet = 64M
max_connections = 360
max_heap_table_size = 32M

net_read_timeout = 30
net_write_timeout = 30
query_cache_limit = 2M
query_cache_size = 128M
sort_buffer_size = 8M
table_open_cache = 512
thread_cache_size = 128
thread_stack = 256K
wait_timeout = 28800
explicit_defaults_for_timestamp

# innodb_force_recovery = 3
innodb_buffer_pool_size =...


Revision history for this message
Elena C (ecurry007) wrote :

Sorry - one more thing - the Percona MySQL error log does contain quite a few similar messages with various SQL statements, as in the examples below:
[Note] WSREP: cleanup transaction for LOCAL_STATE: SELECT `auth_user`.`id`, `auth_user`.`password`, `auth_user`.`last_login`, `auth_user`.`is_superuser`, `auth_user`.`username`, `auth_user`.`first_name`, `auth_user`.`last_name`, `auth_user`.`email`, `auth_user`.`is_staff`, `auth_user`.`is_active`, `auth_user`.`date_joined` FROM `auth_user` LIMIT 1

2016-09-01 19:36:00 1328 [Note] WSREP: cleanup transaction for LOCAL_STATE: select racefixtur0_.round as round24_, racefixtur0_.language as language24_, racefixtur0_.lastUpdated as lastUpda3_24_, racefixtur0_.seasonName as seasonName24_, racefixtur0_.shortName as shortName24_, racefixtur0_.fullName as fullName24_, racefixtur0_.dateStartFP1 as dateStar7_24_, racefixtur0_.dateStartFP2 as dateStar8_24_, racefixtur0_.dateStartFP3 as dateStar9_24_, racefixtur0_.dateStartQUA as dateSta10_24_, racefixtur0_.dateStartGPX as dateSta11_24_, racefixtur0_.tbdFlag as tbdFlag24_, racefixtur0_.dateEnd as dateEnd24_, racefixtur0_.circuitId as circuitId24_, racefixtur0_.circuitName as circuit15_24_, racefixtur0_.location as location24_, racefixtur0_.raceStatus as raceStatus24_, racefixtur0_.racePhase as racePhase24_, racefixtur0_.countryCode as country19_24_, racefixtur0_.raceId as raceId24_, racefixtur0_.expireDate as expireDate24_ from race_fixture racefixtur0_ where racefixtur0_.language in ('en' , 'fr' , 'ja')
2016-09-01 19:36:00 1328 [Note] WSREP: cleanup transaction for LOCAL_STATE: SELECT DISTINCT t1.ID, t1.DESCRIPTIONKEY, t1.LASTUPDATED, t1.PARENT_ID, t1.SECTIONKEY, t1.TEXT, t1.TYPE, t1.USAGETYPE, t0.ID, t0.ENTITYID, t0.ENTITYTYPE, t0.LASTUPDATED, t0.LOCALE, t0.TRANSLATEDALT, t0.TRANSLATEDHTML, t0.TRANSLATEDTEXT, t0.TRANSLATEDTITLE, t0.TRANSLATEDURL FROM DESCRIPTION t1 LEFT OUTER JOIN (DESCRIPTION_TRANSLATION t2 JOIN TRANSLATION t0 ON (t0.ID = t2.translation_ID)) ON (t2.Description_ID = t1.ID) WHERE (t1.PARENT_ID = 405)
2016-09-01 19:36:00 1328 [Note] WSREP: cleanup transaction for LOCAL_STATE: commit

and then the server dies:

2016-09-01 19:36:00 1328 [Note] WSREP: cleanup transaction for LOCAL_STATE: SELECT meta_key, meta_value FROM dma_sitemeta WHERE meta_key IN ('site_name', 'siteurl', 'active_sitewide_plugins', '_site_transient_timeout_theme_roots', '_site_transient_theme_roots', 'site_admins', 'can_compress_scripts', 'global_terms_enabled', 'ms_files_rewriting') AND site_id = 1
2016-09-01 19:36:00 1328 [Note] WSREP: cleanup transaction for LOCAL_STATE: SET autocommit=1
160901 19:36:02 mysqld_safe Number of processes running now: 0
160901 19:36:02 mysqld_safe WSREP: not restarting wsrep node automatically
160901 19:36:02 mysqld_safe mysqld from pid file /var/lib/mysql/mysqld.pid ended

Revision history for this message
Krunal Bauskar (krunal-bauskar) wrote :

I would suggest you try to upgrade and check whether this continues to happen.

Changed in percona-xtradb-cluster:
status: New → Incomplete
Revision history for this message
Viktor Csiky (strongholdmedia) wrote :

I remember that this bug, and another one, were fixed upstream in 5.6.31 but were still present in Percona XtraDB Cluster 5.6.30-25.16.3. You did not release a version based on 5.6.31, so after some refused backport attempts and a lack of information, we migrated to MariaDB Cluster before Percona XtraDB Cluster 5.6.32-25.17 was released.
Anyway, based on the above, I think it is fairly safe to close this now.

Revision history for this message
Krunal Bauskar (krunal-bauskar) wrote :

* No problem. Thanks.

* PXC will miss you. Hope to see you back on PXC in due course. We have exciting improvements in PXC 5.7 :)

Changed in percona-xtradb-cluster:
status: Incomplete → Invalid
Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports, so this bug report has been migrated to: https://jira.percona.com/browse/PXC-1904
