Crash since Upgrade

Bug #1367562 reported by Jai Gupta
This bug affects 1 person
Affects: Percona XtraDB Cluster (moved to https://jira.percona.com/projects/PXC)
Status: Expired
Importance: Undecided
Assigned to: Unassigned

Bug Description

Since we upgraded Percona XtraDB Cluster from version 5.6.19-25.6 to 5.6.20-25.7, our nodes have been crashing.

==== Node 1 crashed on 2014-09-08 ====
20:14:52 UTC - mysqld got signal 11 ;

key_buffer_size=1073741824
read_buffer_size=131072
max_used_connections=55
max_threads=258
thread_count=14
connection_count=6
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1151559 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x11ba66160
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7ff1f83ebd38 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x35)[0x8f5835]
/usr/sbin/mysqld(handle_fatal_signal+0x4b4)[0x664384]
/lib64/libpthread.so.0(+0xf710)[0x7ff823081710]
/usr/sbin/mysqld(_ZN10MDL_ticket7destroyEPS_+0xc)[0x653aac]
/usr/sbin/mysqld(_Z17mysql_ull_cleanupP3THD+0x49)[0x5fa1f9]
/usr/sbin/mysqld(_ZN3THD7cleanupEv+0xd2)[0x6b44a2]
/usr/sbin/mysqld(_ZN3THD17release_resourcesEv+0x288)[0x6b5068]
/usr/sbin/mysqld(_Z29one_thread_per_connection_endP3THDb+0x1c)[0x58800c]
/usr/sbin/mysqld(_Z24do_handle_one_connectionP3THD+0x101)[0x6baff1]
/usr/sbin/mysqld(handle_one_connection+0x47)[0x6bb247]
/usr/sbin/mysqld(pfs_spawn_thread+0x12a)[0xaee54a]
/lib64/libpthread.so.0(+0x79d1)[0x7ff8230799d1]
/lib64/libc.so.6(clone+0x6d)[0x7ff82158086d]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0): is an invalid pointer
Connection ID (thread ID): 991886
Status: KILL_CONNECTION

==== Node 5 crashed on 2014-09-09 ====
03:56:03 UTC - mysqld got signal 11 ;
key_buffer_size=1073741824
read_buffer_size=131072
max_used_connections=46
max_threads=258
thread_count=12
connection_count=10
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1151559 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x14ed42840
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f95941d7d38 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x35)[0x8f5835]
/usr/sbin/mysqld(handle_fatal_signal+0x4b4)[0x664384]
/lib64/libpthread.so.0(+0xf710)[0x7f9bbe219710]
/usr/sbin/mysqld(_ZN10MDL_ticket7destroyEPS_+0xc)[0x653aac]
/usr/sbin/mysqld(_Z17mysql_ull_cleanupP3THD+0x49)[0x5fa1f9]
/usr/sbin/mysqld(_ZN3THD7cleanupEv+0xd2)[0x6b44a2]
/usr/sbin/mysqld(_ZN3THD17release_resourcesEv+0x288)[0x6b5068]
/usr/sbin/mysqld(_Z29one_thread_per_connection_endP3THDb+0x1c)[0x58800c]
/usr/sbin/mysqld(_Z24do_handle_one_connectionP3THD+0x101)[0x6baff1]
/usr/sbin/mysqld(handle_one_connection+0x47)[0x6bb247]
/usr/sbin/mysqld(pfs_spawn_thread+0x12a)[0xaee54a]
/lib64/libpthread.so.0(+0x79d1)[0x7f9bbe2119d1]
/lib64/libc.so.6(clone+0x6d)[0x7f9bbc71886d]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0): is an invalid pointer
Connection ID (thread ID): 493659
Status: KILL_CONNECTION

Both nodes restarted cleanly without needing a full SST.
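
The two backtraces above look like the same call path (connection teardown, then mysql_ull_cleanup, then MDL_ticket::destroy). A minimal sketch for checking that mechanically, assuming the frame lines are pasted into strings; the helper name `mysqld_frames` and the shortened excerpts are illustrative only, not part of any Percona tooling:

```python
import re

# Matches lines such as: /usr/sbin/mysqld(handle_fatal_signal+0x4b4)[0x664384]
FRAME_RE = re.compile(
    r"^(?P<binary>\S+?)\((?P<symbol>[^+)]*)\+0x[0-9a-f]+\)\[0x[0-9a-f]+\]$"
)


def mysqld_frames(trace: str) -> list[str]:
    """Return the symbol names of the mysqld frames in a backtrace, in order."""
    symbols = []
    for line in trace.splitlines():
        match = FRAME_RE.match(line.strip())
        # Skip shared-library frames: their load addresses differ between hosts.
        if match and match.group("binary").endswith("mysqld"):
            symbols.append(match.group("symbol"))
    return symbols


# Excerpts copied from the two crash reports above (shortened for illustration).
NODE1 = """\
/usr/sbin/mysqld(handle_fatal_signal+0x4b4)[0x664384]
/lib64/libpthread.so.0(+0xf710)[0x7ff823081710]
/usr/sbin/mysqld(_ZN10MDL_ticket7destroyEPS_+0xc)[0x653aac]
/usr/sbin/mysqld(_Z17mysql_ull_cleanupP3THD+0x49)[0x5fa1f9]
/usr/sbin/mysqld(_ZN3THD7cleanupEv+0xd2)[0x6b44a2]
"""

NODE5 = """\
/usr/sbin/mysqld(handle_fatal_signal+0x4b4)[0x664384]
/lib64/libpthread.so.0(+0xf710)[0x7f9bbe219710]
/usr/sbin/mysqld(_ZN10MDL_ticket7destroyEPS_+0xc)[0x653aac]
/usr/sbin/mysqld(_Z17mysql_ull_cleanupP3THD+0x49)[0x5fa1f9]
/usr/sbin/mysqld(_ZN3THD7cleanupEv+0xd2)[0x6b44a2]
"""

if __name__ == "__main__":
    same = mysqld_frames(NODE1) == mysqld_frames(NODE5)
    print("same mysqld call path:", same)
```

Comparing symbol names rather than raw lines ignores the libpthread and libc addresses, which are expected to differ from host to host.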

Revision history for this message
Nilnandan Joshi (nilnandan-joshi) wrote :

Hi,

Can you provide some more information about this error, such as my.cnf and the full error logs of both nodes? It would also be helpful if you could provide the GRA_xxx.log files, if any were created in the datadir during the crash, and the binlog files from around the time of the crash, if binary logging is enabled on both nodes.

Changed in percona-xtradb-cluster:
status: New → Incomplete
Revision history for this message
Jai Gupta (jai-g) wrote :

=====LOG=====
20:14:52 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona XtraDB Cluster better by reporting any
bugs at https://bugs.launchpad.net/percona-xtradb-cluster

key_buffer_size=1073741824
read_buffer_size=131072
max_used_connections=55
max_threads=258
thread_count=14
connection_count=6
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1151559 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x11ba66160
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7ff1f83ebd38 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x35)[0x8f5835]
/usr/sbin/mysqld(handle_fatal_signal+0x4b4)[0x664384]
/lib64/libpthread.so.0(+0xf710)[0x7ff823081710]
/usr/sbin/mysqld(_ZN10MDL_ticket7destroyEPS_+0xc)[0x653aac]
/usr/sbin/mysqld(_Z17mysql_ull_cleanupP3THD+0x49)[0x5fa1f9]
/usr/sbin/mysqld(_ZN3THD7cleanupEv+0xd2)[0x6b44a2]
/usr/sbin/mysqld(_ZN3THD17release_resourcesEv+0x288)[0x6b5068]
/usr/sbin/mysqld(_Z29one_thread_per_connection_endP3THDb+0x1c)[0x58800c]
/usr/sbin/mysqld(_Z24do_handle_one_connectionP3THD+0x101)[0x6baff1]
/usr/sbin/mysqld(handle_one_connection+0x47)[0x6bb247]
/usr/sbin/mysqld(pfs_spawn_thread+0x12a)[0xaee54a]
/lib64/libpthread.so.0(+0x79d1)[0x7ff8230799d1]
/lib64/libc.so.6(clone+0x6d)[0x7ff82158086d]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0): is an invalid pointer
Connection ID (thread ID): 991886
Status: KILL_CONNECTION

You may download the Percona XtraDB Cluster operations manual by visiting
http://www.percona.com/software/percona-xtradb-cluster/. You may find information
in the manual which will help you identify the cause of the crash.
140908 15:14:58 mysqld_safe Number of processes running now: 0
140908 15:14:58 mysqld_safe WSREP: not restarting wsrep node automatically
140908 15:14:58 mysqld_safe mysqld from pid file /var/lib/mysql/xxxxxxxx.pid ended

====conf====
[MYSQLD]
user=mysql
datadir=/var/lib/mysql
log_error=/var/log/mysqld.log
log_warnings=2
#log_output=FILE
bind_address=xxxxxxxxxxxxxxxxxxxxxxx

### INNODB OPTIONS
innodb_buffer_pool_size=160G
innodb_flush_log_at_trx_commit=2
innodb_file_per_table=1
innodb_data_file_path = ibdata1:100M:autoextend
## You may want to tune the below depending on number of cores and disk sub
innodb_read_io_threads=4
innodb_write_io_threads=4
innodb_io_capacity=400
innodb_doublewrite=1
innodb_log_file_size=1024M
innodb_log_buffer_size=96M
innodb_buffer_pool_instances=8
innodb_log_files_in_group=2
innodb_thread_concurrency=0
#innodb_file_format=barracuda
innodb_flush_method = O_DIRECT
innodb_autoinc...


Revision history for this message
Miguel Angel Nieto (miguelangelnieto) wrote :

Hi,

I would like to mention a few things:

1- One of the crashes is caused by data inconsistencies:

2014-09-16 12:25:24 14868 [ERROR] Slave SQL: Error 'Table 'xxxx.xxxxbackup_ids_temp' doesn't exist' on query. Default database: 'xxxx'. Query: 'CREATE INDEX xxxxbackidstemp_bacitepar_ix ON xxxxbackup_ids_temp (backupid, itemname, parentitemid)', Error_code: 1146

2014-09-16 12:25:24 14868 [ERROR] Slave SQL: Error 'Table 'xxxx.xxxxbackup_ids_temp' doesn't exist' on query. Default database: 'xxxx'. Query: 'CREATE UNIQUE INDEX xxxxbackidstemp_baciteite_uix ON xxxxbackup_ids_temp (backupid, itemname, itemid)', Error_code: 1146

How are those temporary tables created? Are you creating them in InnoDB?

Those inconsistencies cause crashes after several retries.

2- You use GET_LOCK and RELEASE_LOCK. Those are not supported in Galera, see:

https://blueprints.launchpad.net/codership-mysql/+spec/get-lock-support

3- Can you share: SHOW CREATE TABLE xxxxxxxsessions\G

4- Do you write to that table from different servers? I see rollbacks on queries that affect that table. If your application relies on GET_LOCK to write to xxxxxxxsessions consistently from multiple servers, then you need to review your application logic because, as I said, it is not supported in Galera and the application may not be doing what it is expected to do. This is just advice and may not be related to the crash.
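
To get a feel for how much the application depends on user-level locks, one option is to scan captured statements (for example from a general or slow query log) for the locking functions. A minimal sketch, assuming the statements are already available as strings; the function name `find_user_lock_calls` and the sample statements are hypothetical and not taken from this cluster:

```python
import re

# MySQL user-level locking functions, matched case-insensitively when
# followed by an opening parenthesis.
USER_LOCK_RE = re.compile(
    r"\b(GET_LOCK|RELEASE_LOCK|IS_FREE_LOCK|IS_USED_LOCK)\s*\(",
    re.IGNORECASE,
)


def find_user_lock_calls(statements):
    """Return (statement_number, function_name) for each user-lock call found."""
    hits = []
    for number, statement in enumerate(statements, start=1):
        for match in USER_LOCK_RE.finditer(statement):
            hits.append((number, match.group(1).upper()))
    return hits


# Hypothetical statements standing in for lines taken from a query log.
SAMPLE = [
    "SELECT GET_LOCK('session_42', 10)",
    "UPDATE sessions SET data = 'x' WHERE id = 42",
    "select release_lock('session_42')",
]

if __name__ == "__main__":
    for number, name in find_user_lock_calls(SAMPLE):
        print(f"statement {number}: {name}")
```

This is a plain text match, so it will also flag occurrences inside string literals or comments; it is only meant to narrow down where to look in the application code.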

Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for Percona XtraDB Cluster because there has been no activity for 60 days.]

Changed in percona-xtradb-cluster:
status: Incomplete → Expired
Revision history for this message
Przemek (pmalkowski) wrote :

I think this may be caused by user locks having been moved into the MDL context:
https://bugs.launchpad.net/percona-server/+bug/1401528
where Percona Server, and then PXC, uses an implementation similar to what was later introduced in 5.7:
http://dev.mysql.com/doc/refman/5.7/en/miscellaneous-functions.html#function_get-lock

Still, as Miguel said, user locks are not supported in Galera.

Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports, so this bug report has been migrated to: https://jira.percona.com/browse/PXC-1733
