Percona backup fails with signal 11
Affects: Percona XtraBackup (moved to https://jira.percona.com/projects/PXB)
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)
Bug Description
Running the following command:
innobackupex --kill-
Output:
171204 09:26:43 Killing query 21512388 (duration 1078 sec): SELECT MAX(`cloudtrail
her_time, region_name AS region_name FROM `cloudtrail_
cloudtrail_
171204 09:26:43 Killing query 21512527 (duration 1236 sec): SELECT MAX(`cloudtrail
her_time, region_name AS region_name FROM `cloudtrail_
cloudtrail_
171204 09:26:43 Killing query 21512539 (duration 1101 sec): SELECT MAX(`cloudtrail
her_time, region_name AS region_name FROM `cloudtrail_
cloudtrail_
09:26:43 UTC - xtrabackup got signal 11 ;
This could be because you hit a bug or data is corrupted.
This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x10000
innobackupex(
innobackupex(
/lib/x86_
/lib/x86_
innobackupex[
/lib/x86_
/lib/x86_
Please report a bug at https:/
Shahriyar Rzayev (rzayev-sehriyar) wrote (#1):
Madolyn Sullivan (madolyns) wrote (#2):
xtrabackup version 2.4.8.
We are planning to upgrade to 2.4.9 before the end of the year.
my.cnf
[client]
port = 3306
socket = /var/run/
# Here is entries for some specific programs
# The following values assume you have at least 32M ram
# This was formally known as [safe_mysqld]. Both versions are currently parsed.
[mysqld_safe]
socket = /var/run/
nice = 0
[mysqld]
#
# * Basic Settings
#
#
# * IMPORTANT
# If you make changes to these settings and your system uses apparmor, you may
# also need to also adjust /etc/apparmor.
#
init_connect="set autocommit=1"
user = mysql
pid-file = /var/run/
socket = /var/run/
port = 3306
basedir = /usr
datadir = /mnt/lvm_
tmpdir = /mnt/lvm_stripe/tmp
skip-external-
# Charset and Collation
character-
collation-server = utf8_general_ci
event_scheduler = 0
#
# Instead of skip-networking the default is now to listen only on
# localhost which is more compatible and is not less secure.
bind-address = 0.0.0.0
#
# * Fine Tuning
#
key_buffer_size = 256M
max_allowed_packet = 67108864
thread_stack = 256K
thread_cache_size = 8
sort_buffer_size = 16M
read_buffer_size = 16M
read_rnd_
join_buffer_size = 512M
auto-increment-
auto-increment-
# This replaces the startup script and checks MyISAM tables if needed
# the first time they are touched
myisam-recover = BACKUP
max_connections = 3200
max_connect_errors = 10
concurrent_insert = 2
connect_timeout = 10
wait_timeout = 180
net_read_timeout = 30
net_write_timeout = 30
back_log = 128
tmp_table_size = 32M
max_heap_table_size = 32M
bulk_insert_
open-files-limit = 1024
# Default Table Settings
#
# * Query Cache Configuration
#
query_cache_limit = 1M
query_cache_size = 16M
query_cache_type = 1
#
# * Logging
#
# Both location gets rotated by the cronjob.
# Be aware that this log type is a performance killer.
#log = /var/log/
#
# Error logging goes to syslog. This is a Debian improvement :)
log_error = /var/log/
#
# * Replication
#
#
# Here you can see queries with especially long duration
slow_query_log = 1
slow_query_log_file = /var/log/
long_query_time = 30
#
# The following can be used as easy to replay backup logs or for replication.
# note: if you are setting up a replication slave, see README.Debian about
# other settings you may need to change.
server-id = 33981
log_bin = /mnt/lvm_
binlog_format = mixed
log_slave_updates = false
expire_logs_days = 3
max_binlog_size = 100M
binlog_cache_size = 32K
#binlog_do_db = include_
#binlog_ignore_db = include_
sync_binlog = 0
slave_compresse
#
# * Inn...
Madolyn Sullivan (madolyns) wrote (#3):
I have seen this in the past when it tried to kill the same query twice, but that was not the case with the first example I sent. Here is another example, where it tried to kill a query that it had already killed:
171203 19:20:36 Killing query 7779197 (duration 1311 sec): SELECT id, instance_id, aws_dedicated_
es`.`customer_id` = 889 AND `aws_instances`
171203 19:20:36 >> log scanned up to (1292144075016)
171203 19:20:36 Killing query 7777546 (duration 1915 sec): SELECT id, instance_id, aws_dedicated_
es`.`customer_id` = 889 AND `aws_instances`
171203 19:20:36 Killing query 7777547 (duration 1915 sec): SELECT id, instance_id, aws_dedicated_
es`.`customer_id` = 889 AND `aws_instances`
171203 19:20:36 Killing query 7777548 (duration 1915 sec): SELECT id, instance_id, aws_dedicated_
es`.`customer_id` = 889 AND `aws_instances`
171203 19:20:36 Killing query 7777549 (duration 1846 sec): SELECT aws_instance_
and (created_at < DATE_ADD(
17-05' or (is_active=1 ) ))
171203 19:20:36 Killing query 7777550 (duration 1465 sec): SELECT volume_id, aws_availabilit
stomer_id` = 889
171203 19:20:36 Killing query 7779129 (duration 1413 sec): SELECT volume_id, aws_availabilit
stomer_id` = 889
171203 19:20:36 Killing query 7779139 (duration 1256 sec): SELECT instance_id, aws_availabilit
s`.`customer_id` = 889
171203 19:20:36 Killing query 7779197 (duration 1311 sec): SELECT id, instance_id, aws_dedicated_
es`.`customer_id` = 889 AND `aws_instances`
171203 19:20:36 Killing query 7777546 (duration 1915 sec): SELECT id, instance_id, aws_dedicated_
es`.`customer_id` = 889 AND `aws_instances`
171203 19:20:36 Killing query 7777547 (duration 1915 sec): SELECT id, instance_id, aws_dedicated_
es`.`customer_id` = 889 AND `aws_instances`
171203 19:20:36 Killing query 7777548 (duration 1915 sec): SELECT id, instance_id, aws_dedicated_
es`.`customer_id` = 889 AND `aws_instances`
171203 19:20:36 Killing query 7777549 (duration 1846 sec): SELECT aws_instance_
and (created_at < DATE_ADD(
17-05' or (is_active=1 ) ))
171203 19:20:36 Killing query 7777550 (duration 1465 sec): SELECT volume_id, aws_availabilit
Shahriyar Rzayev (rzayev-sehriyar) wrote (#4):
Hi,
Sorry for the late reply.
I have tested with PXB 2.4.9 + PS 5.7 (an ordinary server) and could not reproduce the segfault with repeated killing. Do you have any other information that could help reproduce this issue?
171207 21:42:42 Executing FLUSH TABLES WITH READ LOCK...
171207 21:42:42 Kill query timeout 1 seconds.
171207 21:42:43 Connecting to MySQL server host: localhost, user: root, password: set, port: not set, socket: /home/shahriyar
erver-5.
171207 21:42:43 Killing query 768 (duration 22 sec): select benchmark(9999999, md5(c)) from sysbench_
171207 21:42:43 Killing query 769 (duration 22 sec): select benchmark(9999999, md5(c)) from sysbench_
171207 21:42:43 Killing query 770 (duration 22 sec): select benchmark(9999999, md5(c)) from sysbench_
171207 21:42:43 Killing query 771 (duration 22 sec): select benchmark(9999999, md5(c)) from sysbench_
171207 21:42:43 Killing query 772 (duration 22 sec): select benchmark(9999999, md5(c)) from sysbench_
171207 21:42:43 Killing query 773 (duration 22 sec): select benchmark(9999999, md5(c)) from sysbench_
171207 21:42:43 Killing query 774 (duration 22 sec): select benchmark(9999999, md5(c)) from sysbench_
171207 21:42:43 Killing query 775 (duration 22 sec): select benchmark(9999999, md5(c)) from sysbench_
171207 21:42:43 Killing query 776 (duration 22 sec): select benchmark(9999999, md5(c)) from sysbench_
171207 21:42:43 Killing query 777 (duration 22 sec): select benchmark(9999999, md5(c)) from sysbench_
171207 21:42:43 Killing query 778 (duration 22 sec): select benchmark(9999999, md5(c)) from sysbench_
171207 21:42:43 Killing query 779 (duration 22 sec): select benchmark(9999999, md5(c)) from sysbench_
171207 21:42:43 Killing query 780 (duration 22 sec): select benchmark(9999999, md5(c)) from sysbench_
171207 21:42:43 Killing query 781 (duration 22 sec): select benchmark(9999999, md5(c)) from sysbench_
171207 21:42:43 Killing query 782 (duration 22 sec): select benchmark(9999999, md5(c)) from sysbench_
171207 21:42:43 Killing query 783 (duration 22 sec): select benchmark(9999999, md5(c)) from sysbench_
171207 21:42:43 Killing query 784 (duration 22 sec): select benchmark(9999999, md5(c)) from sysbench_
171207 21:42:43 Killing query 785 (duration 22 sec): select benchmark(9999999, md5(c)) from sysbench_
171207 21:42:43 Killing query 786 (duration 22 sec): select benchmark(9999999, md5(c)) from sysbench_
171207 21:42:43 Killing query 787 (duration 22 sec): select benchmark(9999999, md5(c)) from sysbench_
171207 21:42:43 Killing query 788 (duration 22 sec): select benchmark(9999999, md5(c)) from sysbench_
171207 21:42:43 Killing query 789 (duration 22 sec): select benchmark(9999999, md5(c)) from sysbench_
171207 21:42:43 Killing query 790 (duration 22 sec): select benchmar...
Madolyn Sullivan (madolyns) wrote (#5):
This seems to happen when killing long-running queries.
Here is another example of the issue:
innobackupex --version
innobackupex version 2.4.9 Linux (x86_64) (revision id: a467167cdd4)
command:
innobackupex --kill-
171208 16:05:34 [01] Compressing ./cloudpercept4
171208 16:05:34 [01] ...done
171208 16:05:34 >> log scanned up to (1320480395795)
171208 16:05:34 Executing FLUSH TABLES WITH READ LOCK...
171208 16:05:34 Kill query timeout 900 seconds.
171208 16:05:35 >> log scanned up to (1320480495779)
171208 16:05:36 >> log scanned up to (1320480500250)
171208 16:05:37 >> log scanned up to (1320480500270)
171208 16:05:38 >> log scanned up to (1320480500280)
171208 16:05:39 >> log scanned up to (1320480500290)
171208 16:05:40 >> log scanned up to (1320480500300)
171208 16:05:41 >> log scanned up to (1320480500310)
171208 16:05:42 >> log scanned up to (1320480500320)
171208 16:05:43 >> log scanned up to (1320480500330)
171208 16:05:44 >> log scanned up to (1320480500330)
171208 16:05:45 >> log scanned up to (1320480500350)
171208 16:05:46 >> log scanned up to (1320480500350)
171208 16:05:47 >> log scanned up to (1320480500360)
.....
....
171208 16:20:28 >> log scanned up to (1320480504999)
171208 16:20:29 >> log scanned up to (1320480504999)
171208 16:20:30 >> log scanned up to (1320480504999)
171208 16:20:31 >> log scanned up to (1320480505009)
171208 16:20:32 >> log scanned up to (1320480505009)
171208 16:20:33 >> log scanned up to (1320480505009)
171208 16:20:34 Connecting to MySQL server host: localhost, user: root, password: set, port: 3306, socket: /var/run/
171208 16:20:34 Killing query 8406100 (duration 900 sec): INSERT INTO `aws_instance_
171208 16:20:34 >> log scanned up to (1320482193245)
171208 16:20:36 >> log scanned up to (1320494626970)
171208 16:20:38 >> log scanned up to (1320507474637)
16:20:38 UTC - xtrabackup got signal 11 ;
This could be because you hit a bug or data is corrupted.
This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collectio...
Madolyn Sullivan (madolyns) wrote (#6):
Any update on this? It is happening on all our databases, every time it has to kill long-running queries.
Here is another example from today:
171214 17:09:48 [01] Compressing ./cloudpercept3
171214 17:09:49 >> log scanned up to (6064103372944)
171214 17:09:50 >> log scanned up to (6064103658133)
171214 17:09:51 >> log scanned up to (6064103658133)
171214 17:09:52 >> log scanned up to (6064103658133)
171214 17:09:53 >> log scanned up to (6064103658133)
171214 17:09:54 >> log scanned up to (6064103659372)
171214 17:09:55 >> log scanned up to (6064103659372)
171214 17:09:56 >> log scanned up to (6064103660003)
171214 17:09:57 >> log scanned up to (6064103660003)
171214 17:09:58 >> log scanned up to (6064103660003)
171214 17:09:59 >> log scanned up to (6064103660003)
171214 17:10:00 >> log scanned up to (6064103660650)
171214 17:10:01 >> log scanned up to (6064103661273)
171214 17:10:02 >> log scanned up to (6064103661273)
171214 17:10:03 >> log scanned up to (6064103671412)
171214 17:10:04 >> log scanned up to (6064103694885)
171214 17:10:05 >> log scanned up to (6064103694885)
171214 17:10:06 [01] ...done
171214 17:10:06 [01] Compressing ./sys/sys_
171214 17:10:06 [01] ...done
171214 17:10:06 >> log scanned up to (6064103705029)
171214 17:10:07 Executing FLUSH TABLES WITH READ LOCK...
171214 17:10:07 Kill query timeout 900 seconds.
171214 17:10:07 >> log scanned up to (6064103706307)
171214 17:10:08 >> log scanned up to (6064103706307)
171214 17:10:09 >> log scanned up to (6064103706307)
171214 17:10:10 >> log scanned up to (6064103706307)
171214 17:10:11 >> log scanned up to (6064103706307)
171214 17:10:12 >> log scanned up to (6064103706307)
171214 17:10:13 >> log scanned up to (6064103706307)
171214 17:10:14 >> log scanned up to (6064103706317)
171214 17:10:15 >> log scanned up to (6064103706317)
171214 17:10:16 >> log scanned up to (6064103706317)
171214 17:10:17 >> log scanned up to (6064103706317)
171214 17:10:18 >> log scanned up to (6064103706317)
171214 17:10:19 >> log scanned up to (6064103706317)
171214 17:10:20 >> log scanned up to (6064103706317)
171214 17:10:21 >> log scanned up to (6064103706317)
171214 17:10:22 >> log scanned up to (6064103706317)
171214 17:10:23 >> log scanned up to (6064103706327)
171214 17:10:24 >> log scanned up to (6064103706327)
171214 17:10:25 >> log scanned up to (6064103706327)
171214 17:10:26 >> log scanned up to (6064103706327)
171214 17:10:27 >> log scanned up to (6064103706327)
171214 17:10:28 >> log scanned up to (6064103706327)
171214 17:10:29 >> log scanned up to (6064103706327)
171214 17:10:30 >> log scanned up to (6064103706327)
171214 17:10:31 >> log scanned up to (6064103706327)
171214 17:10:32 >> log scanned up to (6064103706327)
171214 17:10:33 >> log scanned up to (6064103706327)
171214 17:10:34 >> log scanned up to (6064103706327)
171214 17:10:35 >> log scanned up to (6064103706327)
171214 17:10:36 >> log scanned up to (6...
Shahriyar Rzayev (rzayev-sehriyar) wrote (#7):
Hi, I am still unable to reproduce this with SELECT queries in my test case; I used a 900-second timeout as you did:
171215 14:09:53 >> log scanned up to (53580790)
171215 14:09:53 Waiting 1 seconds for queries running longer than 900 seconds to finish
171215 14:09:53 Executing FLUSH TABLES WITH READ LOCK...
171215 14:09:53 Kill query timeout 900 seconds.
171215 14:09:54 >> log scanned up to (53580790)
171215 14:09:55 >> log scanned up to (53580790)
171215 14:09:56 >> log scanned up to (53580790)
171215 14:09:57 >> log scanned up to (53580790)
.
.
171215 14:24:53 Killing query 538 (duration 923 sec): select benchmark(9999999, md5(c)) from sysbench_
171215 14:24:53 Killing query 539 (duration 923 sec): select benchmark(9999999, md5(c)) from sysbench_
171215 14:24:53 Killing query 540 (duration 923 sec): select benchmark(9999999, md5(c)) from sysbench_
171215 14:24:53 Killing query 541 (duration 923 sec): select benchmark(9999999, md5(c)) from sysbench_
171215 14:24:53 Killing query 542 (duration 923 sec): select benchmark(9999999, md5(c)) from sysbench_
171215 14:24:53 Killing query 543 (duration 923 sec): select benchmark(9999999, md5(c)) from sysbench_
171215 14:24:53 Killing query 544 (duration 923 sec): select benchmark(9999999, md5(c)) from sysbench_
171215 14:24:53 Kill query thread stopped
171215 14:24:53 Starting to backup non-InnoDB tables and files
Looking at your output:
INSERT INTO `cloudtrail_
171214 17:25:10 >> log scanned up to (6070245752182)
17:25:10 UTC - xtrabackup got signal 11 ;
Could you please paste the `cloudtrail_
Madolyn Sullivan (madolyns) wrote (#8):
It seems like you are running a different command than I am. I am not getting this message:
Waiting 1 seconds for queries running longer than 900 seconds to finish
What command are you running?
I also see this crash when the KILL query returns:
Error: failed to execute query KILL 7779129: Unknown thread id: 7779129
It had already issued this kill command; perhaps the transaction was rolling back and it tried to kill it again?
Does the rest of the info I sent with this ticket help?
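For context, the suspected sequence can be modeled with a short, hypothetical Python sketch. This is not xtrabackup's actual code, and every name in it is invented for illustration: the kill loop works from a snapshot of long-running query IDs, a query can finish (or already have been killed) before KILL reaches it, and a second KILL on the same id then fails with "Unknown thread id" — a plausible spot for a crash if that error path is mishandled.

```python
# Hypothetical sketch of the suspected race; NOT xtrabackup's actual
# code, and all names here are invented for illustration only.

class UnknownThreadId(Exception):
    """Simulates MySQL's "Unknown thread id" (ER_NO_SUCH_THREAD) error."""

def kill(server_threads, thread_id):
    # Simulated KILL: removing an id that is already gone raises,
    # just as the server rejects KILL for a vanished connection.
    if thread_id not in server_threads:
        raise UnknownThreadId(f"Unknown thread id: {thread_id}")
    server_threads.remove(thread_id)

def kill_long_queries(server_threads, snapshot):
    # A robust loop logs the error and continues; a crash would mean
    # the tool mishandles this path somewhere.
    errors = []
    for tid in snapshot:
        try:
            kill(server_threads, tid)
        except UnknownThreadId as exc:
            errors.append(str(exc))
    return errors

threads = {7779129, 7779197}
snapshot = [7779129, 7779197, 7779129]  # same id listed twice
errs = kill_long_queries(threads, snapshot)
print(errs)  # the duplicate id yields one "Unknown thread id" error
```

The point of the sketch is only that a stale snapshot makes a second KILL on the same id inevitable under load, so the error must be tolerated rather than treated as fatal.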
09:26:43 UTC - xtrabackup got signal 11 ;
This could be because you hit a bug or data is corrupted.
This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x10000
innobackupex(
innobackupex(
/lib/x86_
/lib/x86_
innobackupex[
/lib/x86_
/lib/x86_
Here is the table structure
`cloudtrail_
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`customer_id` int(11) NOT NULL,
`aws_account_id` bigint(20) NOT NULL,
`region_name` varchar(255) NOT NULL,
`gather_time` datetime NOT NULL,
`fetched` tinyint(1) DEFAULT NULL,
`key_file` varchar(255) NOT NULL,
`created_at` datetime NOT NULL,
`updated_at` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `by_all_unique` (`aws_account_
KEY `index_
KEY `index_
KEY `processing_
) ENGINE=InnoDB AUTO_INCREMENT=
Shahriyar Rzayev (rzayev-sehriyar) wrote (#9):
Now it looks like yours, but I still have had no success reproducing it:
171218 12:14:40 Executing FLUSH TABLES WITH READ LOCK...
171218 12:14:40 Kill query timeout 900 seconds.
171218 12:14:40 >> log scanned up to (53462105)
171218 12:14:41 >> log scanned up to (53462210)
.
.
171218 14:33:41 Killing query 939 (duration 1030 sec): insert into sysbench_
171218 14:33:41 Killing query 945 (duration 1030 sec): insert into sysbench_
171218 14:33:41 Killing query 970 (duration 1029 sec): insert into sysbench_
171218 14:33:41 Killing query 982 (duration 1029 sec): insert into sysbench_
171218 14:33:41 Killing query 984 (duration 1029 sec): insert into sysbench_
171218 14:33:41 Killing query 812 (duration 1035 sec): insert into sysbench_
171218 14:33:41 Killing query 843 (duration 1033 sec): insert into sysbench_
171218 14:33:41 Killing query 893 (duration 1031 sec): insert into sysbench_
171218 14:33:41 Killing query 930 (duration 1030 sec): insert into sysbench_
Error: failed to execute query KILL 930: Unknown thread id: 930
171218 14:33:42 Killing query 939 (duration 1030 sec): insert into sysbench_
Error: failed to execute query KILL 939: Unknown thread id: 939
171218 14:33:42 Killing query 945 (duration 1030 sec): insert into sysbench_
Error: failed to execute query KILL 945: Unknown thread id: 945
171218 14:33:42 Killing query 970 (duration 1029 sec): insert into sysbench_
Error: failed to execute query KILL 970: Unknown thread id: 970
171218 14:33:42 Killing query 982 (duration 1029 sec): insert into sysbench_
Error: failed to execute query KILL 982: Unknown thread id: 982
171218 14:33:42 Killing query 984 (duration 1029 sec): insert into sysbench_
Error: failed to execute query KILL 984: Unknown thread id: 984
171218 14:33:42 Kill query thread stopped
171218 14:33:42 Starting to backup non-InnoDB tables and files
Madolyn Sullivan (madolyns) wrote (#10):
And the info I provided:
This could be because you hit a bug or data is corrupted.
This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x10000
innobackupex(
innobackupex(
/lib/
/lib/
innobackupex[
/lib/
/lib/
is not helping to troubleshoot this issue?
Shahriyar Rzayev (rzayev-sehriyar) wrote (#11):
Actually, no.
Here is what you can do to dig further into the problem (this is the general procedure).
Basically you have two choices here:
* Build xtrabackup from source with debug symbols, as described below.
First read this doc to install the dependency packages before executing the commands below:
https:/
cd to some folder.
git clone https:/
cd percona-xtrabackup
rm -rf target
mkdir -p target
cmake -DCMAKE_
make install
mkdir ./target/
mv ./target/bin ./target/
tar -zcf ../percona-
# After doing this you will have percona-
# After extracting, you can find the innobackupex binary inside the extracted folder.
Once you have built and extracted it, replace the old binary path with the new one in your backup script and append the --core-file option to get a core dump.
* Or try to install the debuginfo package of xtrabackup to get the debug symbols. For CentOS:
yum install percona-
Then just append the --core-file option and try to get a core dump.
If you manage to get the core dump file, open it with gdb; the syntax is something like this:
$ gdb full_path_
You will be in the debugger; then type:
bt # paste the output as a comment here
and
bt full # paste the output as a separate comment here
I hope those outputs will show in which line of the code the crash happened.
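Putting the steps above together, a minimal sketch of the core-dump workflow might look like this. The binary and core paths are placeholders, not actual paths from this system:

```shell
# Allow the crashing process to write a core file of any size.
ulimit -c unlimited
# Verify the new soft limit took effect (prints the current core limit).
ulimit -c
# Re-run the backup with --core-file appended to your usual options,
# and once it crashes, open the core with the matching binary:
#   gdb /usr/bin/xtrabackup /path/to/core
#   (gdb) bt        # paste this output as a comment
#   (gdb) bt full   # paste this output as a separate comment
```

Note that `ulimit -c` only raises the soft limit for the current shell and its children, so run it in the same session (or script) that launches the backup.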
Shahriyar Rzayev (rzayev-sehriyar) wrote (#12):
Percona now uses JIRA for bug reports, so this bug report has been migrated to: https:/
Hi,
Could you please specify the versions of xtrabackup and the MySQL server, and provide your my.cnf?
Also, any other info that helps us understand and reproduce the issue would be appreciated.