Node crash when using innobackupex and "Backup Locks"

Bug #1401133 reported by Aleksey Sokolov
This bug affects 21 people
Affects / Status / Importance / Assigned to / Milestone:
  MySQL patches by Codership: New / Undecided / Unassigned
  Percona XtraDB Cluster (moved to https://jira.percona.com/projects/PXC; status tracked in 5.6):
    5.5: Invalid / Undecided / Unassigned
    5.6: Fix Released / High / Unassigned

Bug Description

Good evening, dear colleagues.

After upgrading PXC to the latest version, we started having problems with innobackupex and "Backup locks".

Current MySQL server version: "Server version: 5.6.21-70.1-56-log Percona XtraDB Cluster (GPL), Release rel70.1, Revision 938, WSREP version 25.8, wsrep_25.8.r4150"

Before the upgrade: "Server version: 5.6.20-68.0-56-log Percona XtraDB Cluster (GPL), Release rel68.0, Revision 888, WSREP version 25.7, wsrep_25.7.r4126"

OS: Scientific Linux release 6.6 (Carbon)
The number of nodes in the cluster: 3

When running the "innobackupex" tool, errors occur while the locks are taken and the node crashes.

Trying to run innobackupex:

[root@natrium mysql]# innobackupex --parallel=2 /mnt/xtrabackup/
..............................................
..............................................
>> log scanned up to (2892514252116)
>> log scanned up to (2892514252116)
[01] ...done
xtrabackup: Creating suspend file '/mnt/xtrabackup/2014-12-10_11-49-36/xtrabackup_suspended_2' with pid '24638'
>> log scanned up to (2892514252116)

141210 12:36:34 innobackupex: Continuing after ibbackup has suspended
141210 12:36:34 innobackupex: Executing LOCK TABLES FOR BACKUP...
DBD::mysql::db do failed: MySQL server has gone away at /usr/bin/innobackupex line 3035.
innobackupex: got a fatal error with the following stacktrace: at /usr/bin/innobackupex line 3038
        main::mysql_query('HASH(0x2a82530)', 'LOCK TABLES FOR BACKUP') called at /usr/bin/innobackupex line 3440
        main::mysql_lock_tables('HASH(0x2a82530)') called at /usr/bin/innobackupex line 1982
        main::backup() called at /usr/bin/innobackupex line 1592
innobackupex: Error:
Error executing 'LOCK TABLES FOR BACKUP': DBD::mysql::db do failed: MySQL server has gone away at /usr/bin/innobackupex line 3035.
141210 12:36:34 innobackupex: Waiting for ibbackup (pid=24638) to finish

The node crashes immediately. Logs:

10:07:54 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona XtraDB Cluster better by reporting any
bugs at https://bugs.launchpad.net/percona-xtradb-cluster

key_buffer_size=134217728
read_buffer_size=131072
max_used_connections=8
max_threads=2050
thread_count=386
connection_count=1
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 949471 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x35)[0x8f97d5]
/usr/sbin/mysqld(handle_fatal_signal+0x4b4)[0x6655c4]
/lib64/libpthread.so.0(+0xf710)[0x7fd38c8f1710]
/usr/lib64/libjemalloc.so.1(+0xf2c4)[0x7fd38cb0e2c4]
/usr/lib64/libjemalloc.so.1(+0x2834f)[0x7fd38cb2734f]
/usr/lib64/libjemalloc.so.1(malloc+0x28d)[0x7fd38cb0585d]
/usr/sbin/mysqld(my_malloc+0x32)[0x8f4ef2]
/usr/sbin/mysqld(init_dynamic_array2+0x62)[0x8e09d2]
/usr/sbin/mysqld(_ZN8Gtid_set4initEv+0x47)[0x881c37]
/usr/sbin/mysqld(_ZN3THDC1Ebb+0x966)[0x6b7866]
/usr/sbin/mysqld(_Z26handle_connections_socketsv+0x43f)[0x581d4f]
/usr/sbin/mysqld(_Z11mysqld_mainiPPc+0xf1b)[0x58a7cb]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x7fd38ad23d5d]
/usr/sbin/mysqld[0x57a229]
You may download the Percona XtraDB Cluster operations manual by visiting
http://www.percona.com/software/percona-xtradb-cluster/. You may find information
in the manual which will help you identify the cause of the crash.
141210 12:07:55 mysqld_safe Number of processes running now: 0
141210 12:07:55 mysqld_safe WSREP: not restarting wsrep node automatically
141210 12:07:55 mysqld_safe mysqld from pid file /var/lib/mysql/natrium.la.net.ua.pid ended

I restarted MySQL (the node recovered via IST) and tried again:

[root@natrium mysql]# innobackupex --parallel=2 /mnt/xtrabackup/
..............................................
..............................................
xtrabackup: Creating suspend file '/mnt/xtrabackup/2014-12-10_13-45-24/xtrabackup_suspended_2' with pid '10719'

141210 14:34:31 innobackupex: Continuing after ibbackup has suspended
141210 14:34:31 innobackupex: Executing LOCK TABLES FOR BACKUP...
141210 14:34:31 innobackupex: Backup tables lock acquired

141210 14:34:31 innobackupex: Starting to backup non-InnoDB tables and files
innobackupex: in subdirectories of '/var/lib/mysql/'
innobackupex: Backing up files '/var/lib/mysql//mysql/*.{frm,isl,MYD,MYI,MAD,MAI,MRG,TRG,TRN,ARM,ARZ,CSM,CSV,opt,par}' (75 files)
>> log scanned up to (2895378455386)
innobackupex: Backing up files '/var/lib/mysql//__zabbix/*.{frm,isl,MYD,MYI,MAD,MAI,MRG,TRG,TRN,ARM,ARZ,CSM,CSV,opt,par}' (122 files)
innobackupex: Backing up file '/var/lib/mysql//test/db.opt'
innobackupex: Backing up files '/var/lib/mysql//performance_schema/*.{frm,isl,MYD,MYI,MAD,MAI,MRG,TRG,TRN,ARM,ARZ,CSM,CSV,opt,par}' (53 files)
innobackupex: Backing up files '/var/lib/mysql//billing/*.{frm,isl,MYD,MYI,MAD,MAI,MRG,TRG,TRN,ARM,ARZ,CSM,CSV,opt,par}' (563 files)
>> log scanned up to (2895378479147)
>> log scanned up to (2895378479640)
innobackupex: Backing up file '/var/lib/mysql//billing_storage/db.opt'
innobackupex: Backing up file '/var/lib/mysql//billing_storage/file_chunks.frm'
innobackupex: Backing up file '/var/lib/mysql//billing_storage/files.frm'
innobackupex: Backing up files '/var/lib/mysql//billing_backups/*.{frm,isl,MYD,MYI,MAD,MAI,MRG,TRG,TRN,ARM,ARZ,CSM,CSV,opt,par}' (38 files)
innobackupex: Backing up file '/var/lib/mysql//quick/db.opt'
innobackupex: Backing up file '/var/lib/mysql//quick/KEY_COLUMN_USAGE.frm'
innobackupex: Backing up file '/var/lib/mysql//quick/REFERENTIAL_CONSTRAINTS.frm'
innobackupex: Backing up file '/var/lib/mysql//asterisk/db.opt'
innobackupex: Backing up file '/var/lib/mysql//asterisk/permissions.frm'
innobackupex: Backing up file '/var/lib/mysql//asterisk/sipfriends.frm'
141210 14:34:34 innobackupex: Finished backing up non-InnoDB tables and files

141210 14:34:34 innobackupex: Executing LOCK BINLOG FOR BACKUP...
DBD::mysql::db do failed: Deadlock found when trying to get lock; try restarting transaction at /usr/bin/innobackupex line 3035.
innobackupex: got a fatal error with the following stacktrace: at /usr/bin/innobackupex line 3038
        main::mysql_query('HASH(0x2148530)', 'LOCK BINLOG FOR BACKUP') called at /usr/bin/innobackupex line 3490
        main::mysql_lock_binlog('HASH(0x2148530)') called at /usr/bin/innobackupex line 2000
        main::backup() called at /usr/bin/innobackupex line 1592
innobackupex: Error:
Error executing 'LOCK BINLOG FOR BACKUP': DBD::mysql::db do failed: Deadlock found when trying to get lock; try restarting transaction at /usr/bin/innobackupex line 3035.
141210 14:34:34 innobackupex: Waiting for ibbackup (pid=10719) to finish

On the second attempt, a deadlock error occurred. The server kept running for another 15 minutes and then crashed again. Logs:

12:55:01 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona XtraDB Cluster better by reporting any
bugs at https://bugs.launchpad.net/percona-xtradb-cluster

key_buffer_size=134217728
read_buffer_size=131072
max_used_connections=5
max_threads=2050
thread_count=385
connection_count=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 949471 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x35)[0x8f97d5]
/usr/sbin/mysqld(handle_fatal_signal+0x4b4)[0x6655c4]
/lib64/libpthread.so.0(+0xf710)[0x7fd0951ff710]
/usr/lib64/libjemalloc.so.1(+0xf2c4)[0x7fd09541c2c4]
/usr/lib64/libjemalloc.so.1(+0x2834f)[0x7fd09543534f]
/usr/lib64/libjemalloc.so.1(malloc+0x28d)[0x7fd09541385d]
/usr/sbin/mysqld(my_malloc+0x32)[0x8f4ef2]
/usr/sbin/mysqld(init_dynamic_array2+0x62)[0x8e09d2]
/usr/sbin/mysqld(_ZN8Gtid_set4initEv+0x47)[0x881c37]
/usr/sbin/mysqld(_ZN3THDC1Ebb+0x966)[0x6b7866]
/usr/sbin/mysqld(_Z26handle_connections_socketsv+0x43f)[0x581d4f]
/usr/sbin/mysqld(_Z11mysqld_mainiPPc+0xf1b)[0x58a7cb]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x7fd093631d5d]
/usr/sbin/mysqld[0x57a229]
You may download the Percona XtraDB Cluster operations manual by visiting
http://www.percona.com/software/percona-xtradb-cluster/. You may find information
in the manual which will help you identify the cause of the crash.
141210 14:55:01 mysqld_safe Number of processes running now: 0
141210 14:55:01 mysqld_safe WSREP: not restarting wsrep node automatically
141210 14:55:01 mysqld_safe mysqld from pid file /var/lib/mysql/natrium.la.net.ua.pid ended

Prior to the last update such problems did not arise (innobackupex previously used "FLUSH TABLES WITH READ LOCK").

All packages were installed from the Percona RPM repository:

[root@natrium /]# rpm -qa | grep Percona
Percona-XtraDB-Cluster-client-56-5.6.21-25.8.938.el6.x86_64
Percona-XtraDB-Cluster-shared-56-5.6.21-25.8.938.el6.x86_64
Percona-XtraDB-Cluster-full-56-5.6.21-25.8.938.el6.x86_64
Percona-XtraDB-Cluster-galera-3-3.8-1.3390.rhel6.x86_64
Percona-XtraDB-Cluster-56-debuginfo-5.6.21-25.8.938.el6.x86_64
Percona-XtraDB-Cluster-server-56-5.6.21-25.8.938.el6.x86_64
Percona-XtraDB-Cluster-garbd-3-3.8-1.3390.rhel6.x86_64
Percona-Server-shared-51-5.1.73-rel14.12.624.rhel6.x86_64
Percona-XtraDB-Cluster-test-56-5.6.21-25.8.938.el6.x86_64
Percona-XtraDB-Cluster-galera-3-debuginfo-3.8-1.3390.rhel6.x86_64

Total size of all databases: 256 GB
We also use partitioning.

/etc/my.cnf:

[mysqld_safe]

open-files-limit = 120000
malloc-lib=/usr/lib64/libjemalloc.so.1

[mysqld]

skip-name-resolve

user = mysql
bind-address = 10.10.91.4
port = 3306
max_connections = 2048

datadir = /var/lib/mysql
socket = /var/lib/mysql/mysql.sock
tmpdir = /tmp/mysql

symbolic-links = 0

table_open_cache = 8192
table_definition_cache = 4496
table_open_cache_instances = 16

thread_cache_size = 64

default_storage_engine = InnoDB
explicit_defaults_for_timestamp = On
ft_min_word_len = 3

large-pages

slow_query_log = On
slow_launch_time = 5

general_log_file = '/var/log/mysql/general/general.log'
general_log = Off

query_cache_size = 0
query_cache_type = off

innodb_buffer_pool_size = 64G
innodb_buffer_pool_instances = 16

innodb_log_file_size = 8G
innodb_log_buffer_size = 16M
innodb_log_block_size = 4096
innodb_log_group_home_dir = /var/lib/mysql_logs/innodb
innodb_data_file_path = /ibdata1:64M:autoextend
innodb_data_home_dir = /var/lib/mysql_logs/innodb

innodb_print_all_deadlocks = 0

innodb_open_files = 8192
innodb_file_per_table = 1
innodb_rollback_on_timeout = On
innodb_flush_log_at_trx_commit = 2
innodb_doublewrite = 1
innodb_flush_method = O_DIRECT
innodb_lock_wait_timeout = 300
innodb_flush_neighbors = 0

innodb_io_capacity = 40000
innodb_io_capacity_max = 70000

innodb_write_io_threads = 24
innodb_read_io_threads = 24
innodb_purge_threads = 4
innodb_random_read_ahead = On

innodb_support_xa = 0

innodb_autoinc_lock_mode = 2 # Galera
innodb_locks_unsafe_for_binlog = 1 # Galera

innodb_buffer_pool_load_at_startup = On
innodb_buffer_pool_dump_at_shutdown = On

log-bin = /var/lib/mysql_logs/binary/binlog
max_binlog_size = 1024M
binlog_format = ROW
binlog_cache_size = 5M
max_binlog_files = 10
expire_logs_days = 0
sync_binlog = 0

relay_log = /var/lib/mysql_logs/relay/relaylog
slave_load_tmpdir = /tmp/mysql

server-id = 12
skip-slave-start
log_slave_updates = On

log_error = "/var/log/mysql/error.log"
slow_query_log_file = "/var/log/mysql/slow.log"

wsrep_provider=/usr/lib64/libgalera_smm.so
wsrep_provider_options="gcache.size = 1G; gcache.page_size = 1G; gcache.name = /var/lib/mysql_logs/galera/galera.cache; gcs.fc_limit = 20000"

wsrep_node_address="10.10.91.4"

wsrep_cluster_name="PXC"
wsrep_node_name="natrium"

wsrep_sst_method = xtrabackup-v2
wsrep_sst_auth = sst_xtrabackup:*****

wsrep_notify_cmd = '/usr/local/bin/wsrep_notify.sh'
wsrep_replicate_myisam = On
wsrep_forced_binlog_format = ROW
wsrep_log_conflicts = Off
wsrep_auto_increment_control = On
wsrep_retry_autocommit = 10
wsrep_slave_threads = 128
wsrep_convert_LOCK_to_trx = 1

wsrep_max_ws_size = 2147483648 # 2G
wsrep_max_ws_rows = 1048576 # 1M

tags: added: pxc
tags: added: crash innobackupex sst xtrabackup
Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

@Aleksey,

Thanks for reporting. Would it be possible to provide 'thread apply all bt', 'bt', and 'thread apply all bt full' output from a coredump?

You can use https://gist.github.com/ronin13/8d403c2f88826fcc7aba to generate traces.

Revision history for this message
Aleksey Sokolov (absokolov2010) wrote :

Hi, Raghavendra.
Thank you for reply.

To get a backtrace I did not use xtrabackup; instead I did the following:

1) mysql> lock tables for backup;
Query OK, 0 rows affected (0,00 sec)

Waiting 3-5 sec.

2) mysql> lock binlog for backup;
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction

Waiting 3-5 sec.

3) mysql> lock binlog for backup;
Query OK, 0 rows affected (0,00 sec)

Waiting 3-5 sec.

4) mysql> unlock tables;
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction

Waiting 3-5 sec.

5) mysql> unlock tables;
Query OK, 0 rows affected (0,00 sec)

Waiting 3-5 sec.

6) mysql> unlock binlog;
Query OK, 0 rows affected (0,00 sec)

After the above steps, the server crashed within 6-8 minutes. The error first occurs at the 2nd step.
Backtrace in an attachment to the comment.

Note: as a temporary workaround, I disabled "Backup locks" in the innobackupex script:

@@ -4966,7 +4966,7 @@
         get_mysql_slave_status($con);
     }

- $have_backup_locks = defined($con->{vars}->{have_backup_locks});
+ $have_backup_locks = 0;
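
(The script enables backup locks when the server exposes the have_backup_locks variable, which is what the replaced expression checks. A quick way to see what a given server reports, for example:

mysql> SHOW VARIABLES LIKE 'have_backup_locks';

On a PXC/Percona Server 5.6 build with backup-lock support this should return YES.)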

Error log:

19:22:51 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona XtraDB Cluster better by reporting any
bugs at https://bugs.launchpad.net/percona-xtradb-cluster

key_buffer_size=134217728
read_buffer_size=131072
max_used_connections=10
max_threads=2050
thread_count=131
connection_count=2
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 949471 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
reporting any
bugs at https://bugs.launchpad.net/percona-xtradb-cluster

key_buffer_size=134217728
read_buffer_size=131072
max_used_connections=10
max_threads=2050
thread_count=131
connection_count=2
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 949471 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
2015-01-21 21:22:51 10374 [Warning] WSREP: last inactive check more than PT1.5S ago (PT9M40.3504S), skipping check
2015-01-21 21:22:51 10374 [Note] WSREP: (b1f16bc9, 'tcp://0.0.0.0:4567') address 'tcp://10.10.91.4:4567' pointing to uuid b1f16bc9 is blacklisted, skipping
stack_bottom = 0 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x35)[0x8f97d5]
/usr/sbin/mysqld(handle_fatal_signal+0x4b4...


Revision history for this message
Aleksey Sokolov (absokolov2010) wrote :

I've repeated the procedure again, this time with coredump generation. Please find backtrace attached.

Revision history for this message
Assen-g (assen-g) wrote :

Confirming the issue on RHEL 6.6 (fully up to date). The issue has been especially nasty since updating to the latest packages a few days ago:

[assen@ares ~]$ rpm -qa | grep Percona
Percona-XtraDB-Cluster-shared-56-5.6.21-25.8.938.el6.x86_64
Percona-XtraDB-Cluster-56-5.6.21-25.8.938.el6.x86_64
Percona-XtraDB-Cluster-galera-3-3.8-1.3390.rhel6.x86_64
Percona-XtraDB-Cluster-server-56-5.6.21-25.8.938.el6.x86_64
Percona-Server-shared-51-5.1.73-rel14.12.624.rhel6.x86_64
Percona-XtraDB-Cluster-client-56-5.6.21-25.8.938.el6.x86_64

[assen@ares ~]$ rpm -qa | grep xtrabackup
percona-xtrabackup-2.2.8-5059.el6.x86_64

Please fix this, as it kills each cluster member once a day (each member is backed up once a day in a rolling fashion, so that we get a backup every 8 hours).

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

@Assen-g

Currently the only viable workaround, if you are hitting this issue, is to use rsync SST. The fix should be in shortly.
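
As a sketch, switching a node to rsync SST is a one-line change in the [mysqld] section of my.cnf; the rest of the configuration quoted in the report stays the same:

wsrep_sst_method = rsync

Note that rsync SST keeps the donor blocked for the duration of the transfer, and that wsrep_sst_auth is only used by the xtrabackup SST script.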

tags: added: backup-locks
Revision history for this message
FeiWang (wfxiang08) wrote :

@Raghavendra D Prabhu (raghavendra-prabhu)
I also encountered this problem: when SST happens, the backup lock is taken, the node crashes, and then possibly the whole cluster does. I hope a bug fix comes out quickly.

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

Note: with the latest PXB 2.2.9, anyone hitting the issue can use

[sst]
inno-backup-opts='--no-backup-locks'

(with other options) in the my.cnf file.

That option forces FTWRL.

Revision history for this message
Laimonas Anusauskas (lanusauskas) wrote :

We also have what seems to be the same issue, with PXB crashing a PXC node on CentOS 6.6.

End of log from PXB:
---- cut here -----
150303 12:52:55 innobackupex: Executing LOCK BINLOG FOR BACKUP...
DBD::mysql::db do failed: Deadlock found when trying to get lock; try restarting transaction at /usr/bin/innobackupex line 3036.
innobackupex: got a fatal error with the following stacktrace: at /usr/bin/innobackupex line 3039
        main::mysql_query('HASH(0x162a220)', 'LOCK BINLOG FOR BACKUP') called at /usr/bin/innobackupex line 3501
        main::mysql_lock_binlog('HASH(0x162a220)') called at /usr/bin/innobackupex line 2000
        main::backup() called at /usr/bin/innobackupex line 1592
innobackupex: Error:
Error executing 'LOCK BINLOG FOR BACKUP': DBD::mysql::db do failed: Deadlock found when trying to get lock; try restarting transaction at /usr/bin/innobackupex line 3036.
150303 12:52:55 innobackupex: Waiting for ibbackup (pid=2844) to finish
Innobackupex failed. Please check logfiles
---------------------------

Log from PXC:

---- cut here -----
20:52:55 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona XtraDB Cluster better by reporting any
bugs at https://bugs.launchpad.net/percona-xtradb-cluster

key_buffer_size=0
read_buffer_size=131072
max_used_connections=2
max_threads=1002
thread_count=42
connection_count=1
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 131477641 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x3334940
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7fb5a83c0d68 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x35)[0x8f97d5]
/usr/sbin/mysqld(handle_fatal_signal+0x4b4)[0x6655c4]
/lib64/libpthread.so.0(+0xf710)[0x7fb5ac0d4710]
[0x7fab840000a8]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0): is an invalid pointer
Connection ID (thread ID): 148
Status: KILL_CONNECTION

You may download the Percona XtraDB Cluster operations manual by visiting
http://www.percona.com/software/percona-xtradb-cluster/. You may find information
in the manual which will help you identify the cause of the crash.
150303 12:52:57 mysqld_safe Number of processes running now: 0
150303 12:52:57 mysqld_safe WSREP: not restarting wsrep node automatically
150303 12:52:57 mysqld_safe mysqld from pid file /var/untd/mysql/log/mysqld.pid ended
---- cut here -----

Laimonas

Revision history for this message
Laimonas Anusauskas (lanusauskas) wrote :

Percona-XtraDB-Cluster-galera-3-3.8-1.3390.rhel6.x86_64
Percona-XtraDB-Cluster-garbd-3-3.8-1.3390.rhel6.x86_64
Percona-XtraDB-Cluster-client-56-5.6.21-25.8.938.el6.x86_64
Percona-XtraDB-Cluster-shared-56-5.6.21-25.8.938.el6.x86_64
Percona-XtraDB-Cluster-server-56-5.6.21-25.8.938.el6.x86_64
percona-xtrabackup-2.2.8-5059.el6.x86_64

Revision history for this message
infernix (infernix) wrote :

We are seeing this problem on 5.6.22-25.8-978.wheezy in combination with xtrabackup 2.2.6-5042-1.wheezy:

*** glibc detected *** /usr/sbin/mysqld: malloc(): smallbin double linked list corrupted: 0x00007f5b5ac445d0 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x76a16)[0x7f5e7abdea16]
/lib/x86_64-linux-gnu/libc.so.6(+0x7a2f8)[0x7f5e7abe22f8]
/lib/x86_64-linux-gnu/libc.so.6(__libc_malloc+0x70)[0x7f5e7abe38a0]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(_ZnwmRKSt9nothrow_t+0x1d)[0x7f5e7b3fc10d]
stack_bottom = 7f5ba035de50 thread_stack 0x30000
/usr/sbin/mysqld(_ZN10MDL_ticket6createEP11MDL_context13enum_mdl_type+0x25)[0x674085]
/usr/sbin/mysqld(_ZN11MDL_context21try_acquire_lock_implEP11MDL_requestPP10MDL_ticket+0xb0)[0x675490]
/usr/sbin/mysqld(_ZN11MDL_context12acquire_lockEP11MDL_requestm+0xa5)[0x675b45]
/usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x8ccd9e]
/usr/sbin/mysqld(handle_fatal_signal+0x36c)[0x6828dc]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xf0a0)[0x7f5e7c97f0a0]
/usr/sbin/mysqld(_ZN18Global_backup_lock18acquire_protectionEP3THD17enum_mdl_durationm+0x58)[0x66c678]
[0x7f5dcc0000a8]

xtrabackup reports:

150325 03:02:51 innobackupex: Executing LOCK BINLOG FOR BACKUP...
 150325 03:02:51 innobackupex: Executing FLUSH NO_WRITE_TO_BINLOG ENGINE LOGS...
 DBD::mysql::db do failed: Deadlock found when trying to get lock; try restarting transaction at /usr/bin/innobackupex line 3036.
 innobackupex: got a fatal error with the following stacktrace: at /usr/bin/innobackupex line 3039
        main::mysql_query('HASH(0x18ad1a0)', 'FLUSH NO_WRITE_TO_BINLOG ENGINE LOGS') called at /usr/bin/innobackupex line 2024
        main::backup() called at /usr/bin/innobackupex line 1592
 innobackupex: Error:
 Error executing 'FLUSH NO_WRITE_TO_BINLOG ENGINE LOGS': DBD::mysql::db do failed: Deadlock found when trying to get lock; try restarting transaction at /usr/bin/innobackupex line 3036.
 150325 03:02:51 innobackupex: Waiting for ibbackup (pid=33362) to finish

We will upgrade xtrabackup to 2.2.9 and test.

Revision history for this message
infernix (infernix) wrote :

Today we had innobackupex segfault mysql even with this in /etc/mysql/my.cnf:

[sst]
inno-backup-opts='--no-backup-locks'

This was with xtrabackup 2.2.10 and mysql 5.6.22-25.8-978.wheezy.

It happened at the end of an SST:

150405 01:39:12 innobackupex: Executing LOCK BINLOG FOR BACKUP...
>> log scanned up to (1084384190394)
DBD::mysql::db do failed: Deadlock found when trying to get lock; try restarting transaction at /usr//bin/innobackupex line 3044.
innobackupex: got a fatal error with the following stacktrace: at /usr//bin/innobackupex line 3047
        main::mysql_query('HASH(0x137f940)', 'LOCK BINLOG FOR BACKUP') called at /usr//bin/innobackupex line 3529
        main::mysql_lock_binlog('HASH(0x137f940)') called at /usr//bin/innobackupex line 2008
        main::backup() called at /usr//bin/innobackupex line 1601
innobackupex: Error:

It has been 4 months since this issue was reported, and 2 months since @raghavendra-prabhu said the fix should land soon. Has there been any progress? This is a really troublesome issue for cluster reliability.

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

@infernix,

Yes, with that option you shouldn't be seeing it.

From "mysql 5.6.22-25.8-978.wheezy", it looks like you are not using PXC. Can you provide the output of dpkg -l | grep -i percona?

Can you confirm this?

Also, make sure that option is set on the donor, where the backup takes place.

You can also use FORCE_FTWRL=1 in /etc/default/mysql
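
For example, a sketch of the Debian-style defaults file mentioned above:

# /etc/default/mysql (use /etc/sysconfig/mysql on CentOS)
FORCE_FTWRL=1

This should make the SST script fall back to FLUSH TABLES WITH READ LOCK instead of backup locks.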

From your log, it looks like it is still using backup locks even with that option.

Please provide full error log of this. (and donor innobackup log)

Revision history for this message
infernix (infernix) wrote :

We've had to redo 2 out of 3 nodes from scratch so logs are gone unfortunately.

We've now edited innobackupex ($have_backup_locks = 0) to prevent this from ever reoccurring, no matter what config options are set/parsed.

ii percona-toolkit 2.2.13 all Advanced MySQL and system command-line tools
ii percona-xtrabackup 2.2.10-1.wheezy amd64 Open source backup tool for InnoDB and XtraDB
ii percona-xtradb-cluster-56 5.6.22-25.8-978.wheezy amd64 Percona XtraDB Cluster with Galera
ii percona-xtradb-cluster-client-5.6 5.6.22-25.8-978.wheezy amd64 Percona XtraDB Cluster database client binaries
ii percona-xtradb-cluster-common-5.6 5.6.22-25.8-978.wheezy amd64 Percona XtraDB Cluster database common files (e.g. /etc/mysql/my.cnf)
ii percona-xtradb-cluster-galera-3 3.9.3494.wheezy amd64 Metapackage for latest version of galera3.
ii percona-xtradb-cluster-galera-3.x 3.9.3494.wheezy amd64 Galera components of Percona XtraDB Cluster
ii percona-xtradb-cluster-server-5.6 5.6.22-25.8-978.wheezy amd64 Percona XtraDB Cluster database server binaries

Revision history for this message
Thomas Roog (thomas.roog) wrote :

It also affected us this weekend. It happens every time there is higher load on the cluster.
There are 2 reporting slaves attached to that node, and a backup with correct binary log positions is required to restore them. Can we please get this fixed?

150226 22:17:18 innobackupex: Finished backing up non-InnoDB tables and files

150226 22:17:18 innobackupex: Executing LOCK BINLOG FOR BACKUP...
150226 22:17:18 innobackupex: Executing FLUSH NO_WRITE_TO_BINLOG ENGINE LOGS...
DBD::mysql::db do failed: Deadlock found when trying to get lock; try restarting transaction at /usr/bin/innobackupex line 3036.
innobackupex: got a fatal error with the following stacktrace: at /usr/bin/innobackupex line 3039.
 main::mysql_query('HASH(0x26ff468)', 'FLUSH NO_WRITE_TO_BINLOG ENGINE LOGS') called at /usr/bin/innobackupex line 2024
 main::backup() called at /usr/bin/innobackupex line 1592
innobackupex: Error:
Error executing 'FLUSH NO_WRITE_TO_BINLOG ENGINE LOGS': DBD::mysql::db do failed: Deadlock found when trying to get lock; try restarting transaction at /usr/bin/innobackupex line 3036.
150226 22:17:18 innobackupex: Waiting for ibbackup (pid=4367) to finish

Revision history for this message
Thomas Roog (thomas.roog) wrote :

Will stopping the replication slaves during the backup help? I don't want to run FTWRL, as it can cause flow control to stop all read/write operations on the rest of the cluster, while this node is only used for backups and as a replication master.

root@primysql08prod1-galera3:~# dpkg -l | grep percona
ii percona-nagios-plugins 1.1.4-1 all Percona Monitoring Plugins for Nagios
ii percona-toolkit 2.2.12 all Advanced MySQL and system command-line tools
ii percona-xtrabackup 2.2.9-5067-1.trusty amd64 Open source backup tool for InnoDB and XtraDB
ii percona-xtradb-cluster-client-5.6 5.6.22-25.8-978.trusty amd64 Percona XtraDB Cluster database client binaries
ii percona-xtradb-cluster-common-5.6 5.6.22-25.8-978.trusty amd64 Percona XtraDB Cluster database common files (e.g. /etc/mysql/my.cnf)
ii percona-xtradb-cluster-galera-3.x 3.9.3494.trusty amd64 Galera components of Percona XtraDB Cluster
ii percona-xtradb-cluster-server-5.6 5.6.22-25.8-978.trusty amd64 Percona XtraDB Cluster database server binaries

Revision history for this message
Medali (medaliziedi) wrote :

Hello,
We had the same error in our 9-node cluster:
 - percona-xtrabackup 2.2.9
 - percona-xtradb-cluster-server-5.6 5.6.22-25.8-978
------------------------------------------
root@db # innobackupex --help | grep lock
        This option controls if backup locks should be used instead of FLUSH
        when backup locks are not supported by the server. This option is
        enabled by default, disable with --no-backup-locks.
    --debug-sleep-before-unlock=SECONDS
        effect when backup locks are used to create the backup.
        queries that block it. Default is 0 seconds, which means
        unblock the global lock. Default is "all".
    --lock-wait-timeout=SECONDS
        for queries that would block FTWRL before running it. If there are
    --lock-wait-threshold=SECONDS
        --lock-wait-timeout. FTWRL is not started until such long-running
        queries exist. This option has no effect if --lock-wait-timeout is
    --lock-wait-query-type=all|update
        before innobackupex will issue the global lock. Default is all.
    --no-lock
        Use this option to disable table lock with "FLUSH TABLES WITH READ
        inconsistent backup. If you are considering to use --no-lock because
        your backups are failing to acquire the lock, this could be because
        of incoming replication events preventing the lock from succeeding.
-------------------
Adding :
[sst]
inno-backup-opts='--no-backup-locks'
does not resolve the problem.

When you call the 'innobackupex' binary directly, you can pass '--no-backup-locks' (or '--no-lock') as a parameter to work around this problem.
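
For example, a sketch (the target directory simply reuses the path from the original report):

innobackupex --no-backup-locks --parallel=2 /mnt/xtrabackup/

--no-backup-locks makes innobackupex fall back to FLUSH TABLES WITH READ LOCK, while --no-lock skips the global lock entirely and is only safe under the conditions described in the --help output above.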

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

@Medali,

Can you provide the error (and innobackupex) logs when you used

 [sst]
inno-backup-opts='--no-backup-locks'

Alternatively, you can set FORCE_FTWRL=1 in /etc/sysconfig/mysql (or its systemd service variant) on CentOS, and in /etc/default/mysql on Debian.

Revision history for this message
Reiner Rusch (reiner-9) wrote :

I hit the same problem and it cost me some really hard days.

So to summarize:

1) The tip with --no-backup-locks did not work for @Medali
2) The next tip: force FLUSH TABLES WITH READ LOCK ....

Quite a contrast, isn't it?
One tip does no locking at all; the other takes a read lock explicitly.
-> 1) No locking should keep the machine more responsive, but can cause problems with MyISAM (more on that later)
-> 2) Can get you into serious trouble with deadlocks

Either way, locking tables (on one node, if all works fine) is a problem, and read locks are not much better.
It is a pity to find this (see the last section) only after searching for hints for hours:
https://www.percona.com/doc/percona-xtradb-cluster/5.6/limitation.html
That also doesn't seem to be really safe. Hmm. (I wish that last section were in red and bold!)

My guess (still testing) is that the only solution is this:
https://www.percona.com/blog/2012/03/23/how-flush-tables-with-read-lock-works-with-innodb-tables/
So do no locking at all (perhaps even forbid it) and run innobackupex with
--no-lock, in combination with --safe-slave-backup if a slave is involved.
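
A sketch of such an invocation (the target path is illustrative; --safe-slave-backup only applies when the node being backed up is itself a replication slave):

innobackupex --no-lock --safe-slave-backup /mnt/xtrabackup/

Per the help text quoted earlier, --no-lock can yield an inconsistent backup if non-InnoDB tables are written during the copy, so it is a trade-off rather than a general fix.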

Any ideas?

Revision history for this message
Reiner Rusch (reiner-9) wrote :

... and not to forget: teach your customers not to use MyISAM any more ;-)

Revision history for this message
Alexey Kopytov (akopytov) wrote :

There are 2 parts to this bug report:

1. A deadlock is triggered by the LOCK BINLOG FOR BACKUP statement
executed by XtraBackup.
2. The XtraBackup connection is closed due to the above error while
holding a backup (TABLE) lock, which results in heap corruption and a
server crash, possibly some time later.

I believe #2 has been reported and fixed in PS 5.6.24-72.2 with
https://github.com/percona/percona-server/pull/26

Which means that, if my assumption is correct, the latest PXC release
5.6.24-25.11 should not crash when backup locks are used by XtraBackup,
but XtraBackup may still fail with an error due to #1.

#1 (i.e. a deadlock with LOCK BINLOG FOR BACKUP) looks like a
PXC-specific issue, but so far I have been unable to reproduce it with
either PXC 5.6.22-25.8 or PXC 5.6.24-25.11. Here's what I tried:

1. sysbench OLTP_RW workload + concurrent LOCK * FOR BACKUP statements
in the same sequence as executed by XtraBackup on the same node (the
statement sequence is sketched after this list)
2. sysbench autocommit updates + the same concurrent LOCK * FOR BACKUP
statements on the same node
3. the above 2 cases, but with the LOCK * FOR BACKUP statements executed on
the same node where the queries coming from sysbench are executed
4. all of the above cases with binary logging enabled.
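
For reference, the statements involved, as they appear in the innobackupex output and the manual reproduction quoted earlier in this report, are roughly:

LOCK TABLES FOR BACKUP;                  -- taken before the non-InnoDB file copy
LOCK BINLOG FOR BACKUP;                  -- where the reported deadlock errors occur
FLUSH NO_WRITE_TO_BINLOG ENGINE LOGS;
-- ... followed by UNLOCK BINLOG and UNLOCK TABLES at the end of the backup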

At this point I’m out of ideas on how to reproduce the deadlock with
LOCK BINLOG FOR BACKUP issue.

If you can still reproduce either #1 or #2 with PXC 5.6.24-25.11 or a
later release, please provide your my.cnf _and_ at least a general
description of your workload (tables, queries, storage engines involved,
etc.)

Thank you.

Revision history for this message
Krunal Bauskar (krunal-bauskar) wrote :

Has anyone been able to reproduce it after the fix for #2 mentioned in comment #21?

Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports, so this bug report has been migrated to: https://jira.percona.com/browse/PXC-1020
