Percona XtraDB Cluster crashes when executing a delete query

Bug #1271533 reported by René Feiner
This bug report is a duplicate of: Bug #1267507: cluster crashes on importing data.
This bug affects 1 person
Affects: Percona XtraDB Cluster (moved to https://jira.percona.com/projects/PXC)
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

We're running a two-node Percona XtraDB Cluster, version 5.5.34-31.1-log (release 31.1, wsrep_25.9.r3928), on Debian Wheezy x64.
Our servers are multi-core machines with 64 GB of memory and 4x SSD in RAID.

When we execute a series of delete queries on various tables (a cleanup script), the cluster crashes during a delete from a table with only 150,000 records (we delete about 100,000 of them). When we execute only this last query on its own, the cluster does not crash.
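
For illustration, the cleanup script is a series of statements roughly like the following (the table names and the retention condition are placeholders, not our actual schema):
------------------------------------------------------------
-- Hypothetical sketch of the cleanup script: several large
-- multi-row DELETEs executed back to back in one session.
DELETE FROM session_log  WHERE created_at < NOW() - INTERVAL 30 DAY;
DELETE FROM audit_trail  WHERE created_at < NOW() - INTERVAL 30 DAY;
-- The next statement is the one during which both nodes crash;
-- it removes roughly 100,000 of the table's ~150,000 rows.
DELETE FROM import_queue WHERE created_at < NOW() - INTERVAL 30 DAY;
------------------------------------------------------------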

We can reproduce this behaviour on our servers and also in a test environment running on virtual machines.

The following information is logged in mysql.log during the crash (on node 1):
------------------------------------------------------------
140122 13:22:13 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000000 of size 134217728 bytes
140122 13:22:36 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000001 of size 134217728 bytes
140122 13:22:59 [Note] WSREP: Deleted page /var/lib/mysql/gcache.page.000000
12:22:59 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona XtraDB Cluster better by reporting any
bugs at https://bugs.launchpad.net/percona-xtradb-cluster

key_buffer_size=8388608
read_buffer_size=131072
max_used_connections=10
max_threads=502
thread_count=8
connection_count=8
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1106966 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7fe3d74d8fd0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7fe389a5aaa0 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x7f224e]
/usr/sbin/mysqld(handle_fatal_signal+0x491)[0x6d7c31]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xf030)[0x7ff07d490030]
/usr/lib/libgalera_smm.so(_ZN6galera13Certification16purge_for_trx_v3EPNS_9TrxHandleE+0x9b)[0x7ff07a8e2d9b]
/usr/lib/libgalera_smm.so(_ZNK6galera13Certification15PurgeAndDiscardclERSt4pairIKlPNS_9TrxHandleEE+0xc0)[0x7ff07a8eacb0]
/usr/lib/libgalera_smm.so(_ZSt8for_eachISt17_Rb_tree_iteratorISt4pairIKlPN6galera9TrxHandleEEENS3_13Certification15PurgeAndDiscardEET0_T_SB_SA_+0x2c)[0x7ff07a8eaedc]
/usr/lib/libgalera_smm.so(_ZN6galera13Certification16purge_trxs_upto_Elb+0x72)[0x7ff07a8e4722]
/usr/lib/libgalera_smm.so(_ZN6galera13Certification15purge_trxs_uptoElb+0x4b)[0x7ff07a9122bb]
/usr/lib/libgalera_smm.so(_ZN6galera13ReplicatorSMM18process_commit_cutEll+0x94)[0x7ff07a90c864]
/usr/lib/libgalera_smm.so(_ZN6galera15GcsActionSource8dispatchEPvRK10gcs_actionRb+0x43d)[0x7ff07a8f08ad]
/usr/lib/libgalera_smm.so(_ZN6galera15GcsActionSource7processEPvRb+0x5b)[0x7ff07a8f14eb]
/usr/lib/libgalera_smm.so(_ZN6galera13ReplicatorSMM10async_recvEPv+0x63)[0x7ff07a910d03]
/usr/lib/libgalera_smm.so(galera_recv+0x23)[0x7ff07a920ee3]
/usr/sbin/mysqld[0x6955c2]
/usr/sbin/mysqld(start_wsrep_THD+0x38b)[0x552e6b]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x6b50)[0x7ff07d487b50]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7ff07c0c6a7d]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0): is an invalid pointer
Connection ID (thread ID): 4
Status: NOT_KILLED

You may download the Percona XtraDB Cluster operations manual by visiting
http://www.percona.com/software/percona-xtradb-cluster/. You may find information
in the manual which will help you identify the cause of the crash.
140122 13:23:00 mysqld_safe Number of processes running now: 0
140122 13:23:00 mysqld_safe WSREP: not restarting wsrep node automatically
140122 13:23:00 mysqld_safe mysqld from pid file /var/lib/mysql/dc1-db-1.pid ended
------------------------------------------------------------

The following information is logged in mysql.log during the crash (on node 2):
------------------------------------------------------------
140122 13:22:13 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000000 of size 134217728 bytes
140122 13:22:36 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000001 of size 134217728 bytes
12:22:59 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona XtraDB Cluster better by reporting any
bugs at https://bugs.launchpad.net/percona-xtradb-cluster

key_buffer_size=8388608
read_buffer_size=131072
max_used_connections=9
max_threads=502
thread_count=7
connection_count=7
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1106966 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x2cd73b0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f4888a2eaa0 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x7f224e]
/usr/sbin/mysqld(handle_fatal_signal+0x491)[0x6d7c31]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xf030)[0x7f48963d5030]
/usr/lib/libgalera_smm.so(_ZN6galera13Certification16purge_for_trx_v3EPNS_9TrxHandleE+0x9b)[0x7f4893827d9b]
/usr/lib/libgalera_smm.so(_ZNK6galera13Certification15PurgeAndDiscardclERSt4pairIKlPNS_9TrxHandleEE+0xc0)[0x7f489382fcb0]
/usr/lib/libgalera_smm.so(_ZSt8for_eachISt17_Rb_tree_iteratorISt4pairIKlPN6galera9TrxHandleEEENS3_13Certification15PurgeAndDiscardEET0_T_SB_SA_+0x2c)[0x7f489382fedc]
/usr/lib/libgalera_smm.so(_ZN6galera13Certification16purge_trxs_upto_Elb+0x72)[0x7f4893829722]
/usr/lib/libgalera_smm.so(_ZN6galera13Certification15purge_trxs_uptoElb+0x4b)[0x7f48938572bb]
/usr/lib/libgalera_smm.so(_ZN6galera13ReplicatorSMM18process_commit_cutEll+0x94)[0x7f4893851864]
/usr/lib/libgalera_smm.so(_ZN6galera15GcsActionSource8dispatchEPvRK10gcs_actionRb+0x43d)[0x7f48938358ad]
/usr/lib/libgalera_smm.so(_ZN6galera15GcsActionSource7processEPvRb+0x5b)[0x7f48938364eb]
/usr/lib/libgalera_smm.so(_ZN6galera13ReplicatorSMM10async_recvEPv+0x63)[0x7f4893855d03]
/usr/lib/libgalera_smm.so(galera_recv+0x23)[0x7f4893865ee3]
/usr/sbin/mysqld[0x6955c2]
/usr/sbin/mysqld(start_wsrep_THD+0x38b)[0x552e6b]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x6b50)[0x7f48963ccb50]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f489500ba7d]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0): is an invalid pointer
Connection ID (thread ID): 1
Status: NOT_KILLED

You may download the Percona XtraDB Cluster operations manual by visiting
http://www.percona.com/software/percona-xtradb-cluster/. You may find information
in the manual which will help you identify the cause of the crash.
140122 13:23:00 mysqld_safe Number of processes running now: 0
140122 13:23:00 mysqld_safe WSREP: not restarting wsrep node automatically
140122 13:23:00 mysqld_safe mysqld from pid file /var/lib/mysql/dc2-db-1.pid ended
------------------------------------------------------------

Configuration node 1:
------------------------------------------------------------
[mysqld]

# General
server_id = 1
binlog_format = ROW
log_bin = mysql-bin
expire_logs_days = 5
datadir = /var/lib/mysql
tmpdir = /tmp
user = mysql
default-storage-engine = InnoDB

# Cluster configuration
wsrep_cluster_address = gcomm://10.31.35.174
wsrep_provider = /usr/lib/libgalera_smm.so
wsrep_node_address = 10.31.35.170
wsrep_slave_threads = 4
wsrep_cluster_name = domainname.com
wsrep_sst_method = xtrabackup
wsrep_sst_auth = xxx:xxx
wsrep_node_name = dc1-db-1
innodb_locks_unsafe_for_binlog = 1
innodb_autoinc_lock_mode = 2
log_slave_updates = 1

# Caching & limits
tmp-table-size = 32M
max-heap-table-size = 32M
query-cache-type = 0
query-cache-size = 0
max-connections = 500
thread-cache-size = 50
open-files-limit = 65535
table-definition-cache = 4096
table-open-cache = 10240

# InnoDB #
innodb_file_per_table = 1
innodb-flush-method = O_DIRECT
innodb-log-files-in-group = 2
innodb-log-file-size = 512M
innodb-flush-log-at-trx-commit = 2
innodb-file-per-table = 1
innodb-buffer-pool-size = 48G

# Logging
log-error = /var/log/mysql.log
general-log = 0
general-log-file = "/var/log/mysql_query.log"

# Skip reverse DNS lookup of clients
skip-name-resolve
------------------------------------------------------------

Differences at node 2:
------------------------------------------------------------
server_id = 2
wsrep_cluster_address = gcomm://10.31.35.170
wsrep_node_address = 10.31.35.174
wsrep_node_name = dc2-db-1
------------------------------------------------------------

The issue could be related to https://bugs.launchpad.net/percona-xtradb-cluster/+bug/1269842, but we're about to bring these new servers into production and don't want to run 5.6, which is still in RC.

René Feiner (rene-feiner) wrote:

We've replaced galera 3 with galera 2: everything works fine.
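
In case it helps others confirm which provider a node has loaded, the Galera version can be checked from the MySQL client using the standard wsrep status and system variables:
------------------------------------------------------------
-- Shows the Galera library version the node is currently running
-- (a 2.x value after the downgrade, 3.x before).
SHOW GLOBAL STATUS LIKE 'wsrep_provider_version';

-- Shows the path of the loaded provider library.
SHOW GLOBAL VARIABLES LIKE 'wsrep_provider';
------------------------------------------------------------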
