Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC

system crashes randomly and frequently

Series 5.6
Bug #1365093

Bug #1365093 reported by malte on 2014-09-03

This bug affects 3 people

Affects		Status	Importance	Assigned to	Milestone
	Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC	Status tracked in 5.6
	5.6	Incomplete	Undecided	Anna

Bug Description

Recently I posted a new bug report about my system just randomly crashing usually about once per day. I was then told the bug causing this was already identified and reported here: https://bugs.launchpad.net/percona-xtradb-cluster/+bug/1353644

If you read about that bug, it says that it only happens if you have binlog turned OFF (which I had). So I turned binlog on, but yet my system is still crashing, perhaps even a little more frequently now than before I turned on binlog. Here is the output in error.log:

17:52:17 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona XtraDB Cluster better by reporting any
bugs at https://bugs.launchpad.net/percona-xtradb-cluster

key_buffer_size=16777216
read_buffer_size=131072
max_used_connections=62
max_threads=402
thread_count=53
connection_count=53
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 176835 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7f6bfe3ce000
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f6be91f3dc0 thread_stack 0x30000
/usr/sbin/mysqld(my_print_stacktrace+0x2c)[0x7f7710d4304c]
/usr/sbin/mysqld(handle_fatal_signal+0x3cb)[0x7f7710a89a2b]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x10340)[0x7f770ed00340]
/lib/x86_64-linux-gnu/libc.so.6(+0x98f94)[0x7f770e1a2f94]
/usr/sbin/mysqld(memdup_root+0x2b)[0x7f7710d3ac5b]
/usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x287a)[0x7f7710b25c7a]
/usr/sbin/mysqld(_Z10do_commandP3THD+0x16c)[0x7f7710b268fc]
/usr/sbin/mysqld(_Z24do_handle_one_connectionP3THD+0x31d)[0x7f7710ae7d3d]
/usr/sbin/mysqld(handle_one_connection+0x40)[0x7f7710ae7de0]
/usr/sbin/mysqld(pfs_spawn_thread+0x140)[0x7f7710fbbfb0]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8182)[0x7f770ecf8182]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f770e20538d]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0): is an invalid pointer
Connection ID (thread ID): 143
Status: NOT_KILLED

You may download the Percona XtraDB Cluster operations manual by visiting
http://www.percona.com/software/percona-xtradb-cluster/. You may find information
in the manual which will help you identify the cause of the crash.
140903 11:52:17 mysqld_safe Number of processes running now: 0
140903 11:52:17 mysqld_safe mysqld restarted
140903 11:52:17 mysqld_safe Skipping wsrep-recover for empty datadir: /var/lib/mysql
140903 11:52:17 mysqld_safe Assigning 00000000-0000-0000-0000-000000000000:-1 to wsrep_start_position
2014-09-03 11:52:17 0 [Warning] Using unique option prefix key_buffer instead of key_buffer_size is deprecated and will be removed in a future release. Please use the full name instead.
2014-09-03 11:52:17 0 [Note] WSREP: wsrep_start_position var submitted: '00000000-0000-0000-0000-000000000000:-1'
2014-09-03 11:52:17 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2014-09-03 11:52:17 15659 [Warning] Using unique option prefix myisam-recover instead of myisam-recover-options is deprecated and will be removed in a future release. Please use the full name instead.
2014-09-03 11:52:17 15659 [Note] Plugin 'FEDERATED' is disabled.
2014-09-03 11:52:17 15659 [Note] InnoDB: Using atomics to ref count buffer pool pages
2014-09-03 11:52:17 15659 [Note] InnoDB: The InnoDB memory heap is disabled
2014-09-03 11:52:17 15659 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2014-09-03 11:52:17 15659 [Note] InnoDB: Compressed tables use zlib 1.2.8
2014-09-03 11:52:17 15659 [Note] InnoDB: Using Linux native AIO
2014-09-03 11:52:17 15659 [Note] InnoDB: Using CPU crc32 instructions
2014-09-03 11:52:17 15659 [Note] InnoDB: Initializing buffer pool, size = 40.0G
2014-09-03 11:52:19 15659 [Note] InnoDB: Completed initialization of buffer pool
2014-09-03 11:52:19 15659 [Note] InnoDB: Highest supported file format is Barracuda.
2014-09-03 11:52:19 15659 [Note] InnoDB: Log scan progressed past the checkpoint lsn 2357398215693
2014-09-03 11:52:19 15659 [Note] InnoDB: Database was not shutdown normally!
2014-09-03 11:52:19 15659 [Note] InnoDB: Starting crash recovery.
2014-09-03 11:52:19 15659 [Note] InnoDB: Reading tablespace information from the .ibd files...
2014-09-03 11:52:19 15659 [Note] InnoDB: Restoring possible half-written data pages
2014-09-03 11:52:19 15659 [Note] InnoDB: from the doublewrite buffer...
InnoDB: Doing recovery: scanned up to log sequence number 2357403458560
InnoDB: Doing recovery: scanned up to log sequence number 2357408701440
InnoDB: Doing recovery: scanned up to log sequence number 2357413944320
InnoDB: Doing recovery: scanned up to log sequence number 2357419187200
InnoDB: Doing recovery: scanned up to log sequence number 2357424430080
InnoDB: Doing recovery: scanned up to log sequence number 2357429672960
InnoDB: Doing recovery: scanned up to log sequence number 2357434915840
InnoDB: Doing recovery: scanned up to log sequence number 2357440158720
InnoDB: Doing recovery: scanned up to log sequence number 2357445401600
InnoDB: Doing recovery: scanned up to log sequence number 2357450644480
InnoDB: Doing recovery: scanned up to log sequence number 2357455887360
InnoDB: Doing recovery: scanned up to log sequence number 2357461130240
InnoDB: Doing recovery: scanned up to log sequence number 2357466373120
InnoDB: Doing recovery: scanned up to log sequence number 2357471616000
InnoDB: Doing recovery: scanned up to log sequence number 2357476858880
InnoDB: Doing recovery: scanned up to log sequence number 2357482101760
InnoDB: Doing recovery: scanned up to log sequence number 2357487344640
InnoDB: Doing recovery: scanned up to log sequence number 2357492587520
InnoDB: Doing recovery: scanned up to log sequence number 2357497830400
InnoDB: Doing recovery: scanned up to log sequence number 2357503073280
InnoDB: Doing recovery: scanned up to log sequence number 2357508316160
InnoDB: Doing recovery: scanned up to log sequence number 2357513559040
InnoDB: Doing recovery: scanned up to log sequence number 2357518801920
InnoDB: Doing recovery: scanned up to log sequence number 2357524044800
InnoDB: Doing recovery: scanned up to log sequence number 2357529287680
InnoDB: Doing recovery: scanned up to log sequence number 2357534530560
InnoDB: Doing recovery: scanned up to log sequence number 2357539773440
InnoDB: Doing recovery: scanned up to log sequence number 2357545016320
InnoDB: Doing recovery: scanned up to log sequence number 2357550259200
InnoDB: Doing recovery: scanned up to log sequence number 2357555502080
InnoDB: Doing recovery: scanned up to log sequence number 2357560744960
InnoDB: Doing recovery: scanned up to log sequence number 2357565987840
InnoDB: Doing recovery: scanned up to log sequence number 2357571230720
InnoDB: Doing recovery: scanned up to log sequence number 2357576473600
InnoDB: Doing recovery: scanned up to log sequence number 2357581716480
InnoDB: Doing recovery: scanned up to log sequence number 2357586959360
InnoDB: Doing recovery: scanned up to log sequence number 2357592202240
InnoDB: Doing recovery: scanned up to log sequence number 2357597445120
InnoDB: Doing recovery: scanned up to log sequence number 2357602688000
InnoDB: Doing recovery: scanned up to log sequence number 2357607930880
InnoDB: Doing recovery: scanned up to log sequence number 2357613173760
InnoDB: Doing recovery: scanned up to log sequence number 2357614493851
InnoDB: Transaction 4124481611 was in the XA prepared state.
InnoDB: 1 transaction(s) which must be rolled back or cleaned up
InnoDB: in total 0 row operations to undo
InnoDB: Trx id counter is 4124482048
2014-09-03 11:52:21 15659 [Note] InnoDB: Starting an apply batch of log records to the database...
InnoDB: Progress in percent: 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
InnoDB: Apply batch completed
InnoDB: Last MySQL binlog file position 0 378198424, file name mysql-bin.000496
2014-09-03 11:52:24 15659 [Note] InnoDB: 128 rollback segment(s) are active.
InnoDB: Starting in background the rollback of uncommitted transactions
2014-09-03 11:52:24 7ff38d7ff700 InnoDB: Rollback of non-prepared transactions completed
2014-09-03 11:52:29 15659 [Note] InnoDB: Waiting for purge to start
2014-09-03 11:52:29 15659 [Note] InnoDB: Percona XtraDB (http://www.percona.com) 5.6.19-67.0 started; log sequence number 2357614493851
2014-09-03 11:52:29 15659 [Note] Recovering after a crash using /var/log/mysql/mysql-bin
2014-09-03 11:52:29 15659 [Note] WSREP: Binlog recovery, found wsrep position 00000000-0000-0000-0000-000000000000:-1
2014-09-03 11:52:30 15659 [Note] WSREP: Binlog recovery scan stopped at Xid event 2065850
2014-09-03 11:52:30 15659 [Note] Starting crash recovery...
2014-09-03 11:52:30 7ffea39117c0 InnoDB: Starting recovery for XA transactions...
2014-09-03 11:52:30 7ffea39117c0 InnoDB: Transaction 4124481611 in prepared state after recovery
2014-09-03 11:52:30 7ffea39117c0 InnoDB: Transaction contains changes to 1 rows
2014-09-03 11:52:30 7ffea39117c0 InnoDB: 1 transactions in prepared state after recovery
2014-09-03 11:52:30 15659 [Note] Found 1 prepared transaction(s) in InnoDB
2014-09-03 11:52:30 15659 [Note] Crash recovery finished.
2014-09-03 11:52:30 15659 [Note] RSA private key file not found: /var/lib/mysql//private_key.pem. Some authentication plugins will not work.
2014-09-03 11:52:30 15659 [Note] RSA public key file not found: /var/lib/mysql//public_key.pem. Some authentication plugins will not work.
2014-09-03 11:52:30 15659 [Note] Server hostname (bind-address): '0.0.0.0'; port: 3306
2014-09-03 11:52:30 15659 [Note] - '0.0.0.0' resolves to '0.0.0.0';
2014-09-03 11:52:30 15659 [Note] Server socket created on IP: '0.0.0.0'.
2014-09-03 11:52:30 15659 [Note] Event Scheduler: Loaded 0 events
2014-09-03 11:52:30 15659 [Note] WSREP: Read nil XID from storage engines, skipping position init
2014-09-03 11:52:30 15659 [Note] WSREP: wsrep_load(): loading provider library 'none'
2014-09-03 11:52:30 15659 [Note] [Debug] WSREP: dummy_init
2014-09-03 11:52:30 15659 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.6.19-67.0-56-log' socket: '/var/run/mysqld/mysqld.sock' port: 3306 Percona XtraDB Cluster (GPL), Release 25.6, wsrep_25.6.r4111

Revision history for this message

Alexey (alxextor) wrote on 2014-09-22:

mysql_log.txt Edit (2.5 KiB, text/plain)

Hi,

We update Percona XtraDB Cluster from "5.6.15" to "5.6.19" and after that to "5.6.20". In both cases we have same issue.

My log file in attachment

Revision history for this message

Miguel Angel Nieto (miguelangelnieto) wrote on 2014-09-24:

Hi,

How many nodes do you have in your cluster? Are all those nodes crashing or just one of them? Do they have the same my.cnf?

Please, attach the my.cnf, dmesg and /var/log/messages of the log that is crashing.

Revision history for this message

Alexey (alxextor) wrote on 2014-09-24:

log_files.tgz Edit (15.0 KiB, application/x-tar)

Hi Miguel,

We have 3 nodes in cluster. 2 of them used for service requests. And these 2 nodes crashes randomly.

They have same configuration and same my.cnf.

In attachments:

full dmesg
cat /var/log/messages | grep -v 'snmpd\|yum\|ntpd'
my.cnf without comments

Revision history for this message

Miguel Angel Nieto (miguelangelnieto) wrote on 2014-09-24:

Thanks. I'm going to need some more info.

The output of pt-summary and "SHOW GLOBAL STATUS LIKE '%wsrep%'" from those two nodes.

You can download pt-summary from http://www.percona.com/get/pt-summary

I can see these messages in dmesg and messages:

mysqld[32258]: segfault at 55 ip 00007fdb2d13c867 sp 00007fdb0c073dc0 error 4 in libgcc_s-4.4.7-20120601.so.1[7fdb2d12d000+16000]

Sep 23 17:58:05 dc1-crp-db1 kernel: mysqld[31161]: segfault at 51 ip 00007ffa454b1867 sp 00007ffa243e7dc0 error 4 in libgcc_s-4.4.7-20120601.so.1[7ffa454a2000+16000]

but they are pretty useless, like the stacktrace.

Also, in the log you sent in the bug description seems that galera libraries are not being loaded:

2014-09-03 11:52:29 15659 [Note] WSREP: Binlog recovery, found wsrep position 00000000-0000-0000-0000-000000000000:-1
2014-09-03 11:52:30 15659 [Note] WSREP: wsrep_load(): loading provider library 'none'
2014-09-03 11:52:30 15659 [Note] [Debug] WSREP: dummy_init

Revision history for this message

Alexey (alxextor) wrote on 2014-09-24:

files.tgz Edit (3.7 KiB, application/x-tar)

ok,

files in attachment

Revision history for this message

Alexey (alxextor) wrote on 2014-09-24:

<quote>2014-09-03 11:52:29 15659 [Note] WSREP: Binlog recovery, found wsrep position 00000000-0000-0000-0000-000000000000:-1
2014-09-03 11:52:30 15659 [Note] WSREP: wsrep_load(): loading provider library 'none'
2014-09-03 11:52:30 15659 [Note] [Debug] WSREP: dummy_init</quote>

We are tried install updates this week. =)

Revision history for this message

Anna (sonyque) wrote on 2014-10-21:

Hi,

i have the same issue. Did you solve the problem?

Revision history for this message

Alexey (alxextor) wrote on 2014-10-22:

Hi Anna,

we have not solved problem. Now we use

Name : Percona-XtraDB-Cluster-56
Release : 25.5.759.rhel6

because newer versions crashes in our case.

Revision history for this message

Milos Blazevic (milos-blazevic) wrote on 2016-02-24:

Hi,

I'd like to report another occurrence of this particular bug, in a 3-node PXC cluster running on IO optimized VMs with 64GB of RAM, and 32 CPU cores each...

One of the nodes crashed for no apparent reason:

****
12:43:55 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona XtraDB Cluster better by reporting any
bugs at https://bugs.launchpad.net/percona-xtradb-cluster

key_buffer_size=8388608
read_buffer_size=131072
max_used_connections=1256
max_threads=4002
thread_count=38
connection_count=5
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1602318 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x37e70130
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
160217 16:43:55 mysqld_safe Number of processes running now: 0
160217 16:43:55 mysqld_safe WSREP: not restarting wsrep node automatically
160217 16:43:55 mysqld_safe mysqld from pid file /var/lib/mysql/pxc-01.pid ended
160217 17:00:11 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
160217 17:00:11 mysqld_safe WSREP: Running position recovery with --log_error='/var/lib/mysql/wsrep_recovery.QF2aWv' --pid-file='/var/lib/mysql/pxc-01-recover.pid'
2016-02-17 17:00:12 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2016-02-17 17:00:12 0 [Note] /usr/sbin/mysqld (mysqld 5.6.26-74.0-56-log) starting as process 128623 ...
160217 17:00:20 mysqld_safe WSREP: Recovered position 8a89896c-a304-11e5-bed2-1e4a9bc0b937:14497921
Log of wsrep recovery (--wsrep-recover):
2016-02-17 17:00:12 128623 [Note] Plugin 'FEDERATED' is disabled.
2016-02-17 17:00:12 128623 [Note] InnoDB: Using atomics to ref count buffer pool pages
****

The cluster had been running stabily for ~6 months on a less powerful VMs, only to be upgraded to a newer version of XtraDB cluster and more powerful VMs two months ago... and this is the first and only crash of the kind that had happened.

Unfortunately, there's not much more to add - we didn't have oprofile or gdb attached to mysql server process on any of the nodes, and other than "pt-summary" output (dating from 10 days later) for all 3 nodes, which I can provide on request, I doubt I can give any more useful info.

Interestingly enough, the segfault message is the same:

Feb 17 16:43:55 pxc-01 kernel: mysqld[123325]: segfault at 55 ip 00007f6ad0e81867 sp 00007f61cb6d9b00 error 4 in libgcc_s-4.4.7-20120601.so.1[7f6ad0e72000+16000]

Regards,
Milos.

Hi,

I'd like to report another occurrence of this particular bug, in a 3-node PXC cluster running on IO optimized VMs with 64GB of RAM, and 32 CPU cores each...

One of the nodes crashed for no apparent reason:

key_buffer_size=8388608
read_buffer_size=131072
max_used_connections=1256
max_threads=4002
thread_count=38
connection_count=5
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1602318 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Interestingly enough, the segfault message is the same:

Feb 17 16:43:55 pxc-01 kernel: mysqld[123325]: segfault at 55 ip 00007f6ad0e81867 sp 00007f61cb6d9b00 error 4 in libgcc_s-4.4.7-20120601.so.1[7f6ad0e72000+16000]

Regards,
Milos.

Revision history for this message

Milos Blazevic (milos-blazevic) wrote on 2016-02-28:

#10

xtradb_cluster-my.cnf Edit (1.3 KiB, text/plain)

Download full text (3.3 KiB)

Couple of days ago, a different node crashed:

08:49:21 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona XtraDB Cluster better by reporting any
bugs at https://bugs.launchpad.net/percona-xtradb-cluster

key_buffer_size=8388608
read_buffer_size=131072
max_used_connections=1263
max_threads=4002
thread_count=35
connection_count=2
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1602318 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x3a6f0770
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7fc69734cd38 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x35)[0x900aa5]
/usr/sbin/mysqld(handle_fatal_signal+0x4b4)[0x667e04]
/lib64/libpthread.so.0(+0xf790)[0x7fd379669790]
/usr/sbin/mysqld(_ZN10MDL_ticket7destroyEPS_+0xc)[0x6574bc]
/usr/sbin/mysqld(_Z17mysql_ull_cleanupP3THD+0x49)[0x5fd779]
/usr/sbin/mysqld(_ZN3THD7cleanupEv+0xd0)[0x6b8420]
/usr/sbin/mysqld(_ZN3THD17release_resourcesEv+0x298)[0x6b8ff8]
/usr/sbin/mysqld(_Z29one_thread_per_connection_endP3THDb+0x2e)[0x58551e]
/usr/sbin/mysqld(_Z24do_handle_one_connectionP3THD+0x101)[0x6bf011]
/usr/sbin/mysqld(handle_one_connection+0x47)[0x6bf277]
/usr/sbin/mysqld(pfs_spawn_thread+0x12a)[0xafd66a]
/lib64/libpthread.so.0(+0x7a51)[0x7fd379661a51]
/lib64/libc.so.6(clone+0x6d)[0x7fd377b6593d]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0): is an invalid pointer
Connection ID (thread ID): 153707045
Status: KILL_CONNECTION

Other bug subscribers

Bug attachments

Add attachment

Remote bug watches

Bug watches keep track of this bug in other bug trackers.