Percona backup fails with signal 11

Bug #1736177 reported by Madolyn Sullivan
This bug affects 1 person
Affects: Percona XtraBackup (moved to https://jira.percona.com/projects/PXB)
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

running the following command:
innobackupex --kill-wait-query-type=all --kill-long-queries-timeout=900 --kill-long-query-type=all --compress --compress-threads=4 /mnt/restore/innobackupex/2017_12_04/base --no-timestamp

output:
171204 09:26:43 Killing query 21512388 (duration 1078 sec): SELECT MAX(`cloudtrail_track_files`.`gather_time`) AS maximum_gat\
her_time, region_name AS region_name FROM `cloudtrail_track_files` WHERE `cloudtrail_track_files`.`customer_id` = 1733 AND `\
cloudtrail_track_files`.`aws_account_id` = 137 GROUP BY region_name
171204 09:26:43 Killing query 21512527 (duration 1236 sec): SELECT MAX(`cloudtrail_track_files`.`gather_time`) AS maximum_gat\
her_time, region_name AS region_name FROM `cloudtrail_track_files` WHERE `cloudtrail_track_files`.`customer_id` = 1791 AND `\
cloudtrail_track_files`.`aws_account_id` = 859 GROUP BY region_name
171204 09:26:43 Killing query 21512539 (duration 1101 sec): SELECT MAX(`cloudtrail_track_files`.`gather_time`) AS maximum_gat\
her_time, region_name AS region_name FROM `cloudtrail_track_files` WHERE `cloudtrail_track_files`.`customer_id` = 1800 AND `\
cloudtrail_track_files`.`aws_account_id` = 1798 GROUP BY region_name
09:26:43 UTC - xtrabackup got signal 11 ;
This could be because you hit a bug or data is corrupted.
This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x10000
innobackupex(my_print_stacktrace+0x2c)[0xbe1bec]
innobackupex(handle_fatal_signal+0x262)[0xa24dc2]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x10330)[0x7f54e3e5e330]
/lib/x86_64-linux-gnu/libc.so.6(+0x3d467)[0x7f54e1d82467]
innobackupex[0x738d8c]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8184)[0x7f54e3e56184]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f54e1e42ffd]

Please report a bug at https://bugs.launchpad.net/percona-xtrabackup
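
The innobackupex frames above are raw return addresses. One way to map such addresses to function names and source locations, purely as a sketch (it assumes a non-PIE binary at the stock package path and that matching debug symbols are installed), is addr2line:

# -f prints function names, -C demangles C++ symbols, -e names the binary the addresses belong to
addr2line -f -C -e /usr/bin/innobackupex 0xbe1bec 0xa24dc2 0x738d8c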

Shahriyar Rzayev (rzayev-sehriyar) wrote:

Hi,
Could you please specify the version of xtrabackup and MySQL server and provide my.cnf?
Also any other info to help understand and reproduce the issue will be appreciated.

Madolyn Sullivan (madolyns) wrote:

xtrabackup version 2.4.8
We are planning to upgrade to 2.4.9 before the end of the year.

my.cnf
[client]
port = 3306
socket = /var/run/mysqld/mysqld.sock

# Here is entries for some specific programs
# The following values assume you have at least 32M ram

# This was formally known as [safe_mysqld]. Both versions are currently parsed.
[mysqld_safe]
socket = /var/run/mysqld/mysqld.sock
nice = 0

[mysqld]
#
# * Basic Settings
#

#
# * IMPORTANT
# If you make changes to these settings and your system uses apparmor, you may
# also need to also adjust /etc/apparmor.d/usr.sbin.mysqld.
#
init_connect="set autocommit=1"

user = mysql
pid-file = /var/run/mysqld/mysqld.pid
socket = /var/run/mysqld/mysqld.sock
port = 3306
basedir = /usr
datadir = /mnt/lvm_stripe/mysql
tmpdir = /mnt/lvm_stripe/tmp
skip-external-locking

# Charset and Collation
character-set-server = utf8
collation-server = utf8_general_ci
event_scheduler = 0

#
# Instead of skip-networking the default is now to listen only on
# localhost which is more compatible and is not less secure.
bind-address = 0.0.0.0
#
# * Fine Tuning
#
key_buffer_size = 256M
max_allowed_packet = 67108864
thread_stack = 256K
thread_cache_size = 8
sort_buffer_size = 16M
read_buffer_size = 16M
read_rnd_buffer_size = 16M
join_buffer_size = 512M

auto-increment-increment = 1
auto-increment-offset = 1

# This replaces the startup script and checks MyISAM tables if needed
# the first time they are touched
myisam-recover = BACKUP
max_connections = 3200
max_connect_errors = 10
concurrent_insert = 2
connect_timeout = 10
wait_timeout = 180
net_read_timeout = 30
net_write_timeout = 30
back_log = 128
tmp_table_size = 32M
max_heap_table_size = 32M
bulk_insert_buffer_size = 32M
open-files-limit = 1024

# Default Table Settings

#
# * Query Cache Configuration
#
query_cache_limit = 1M
query_cache_size = 16M
query_cache_type = 1
#
# * Logging
#
# Both location gets rotated by the cronjob.
# Be aware that this log type is a performance killer.
#log = /var/log/mysql/mysql.log
#
# Error logging goes to syslog. This is a Debian improvement :)
log_error = /var/log/mysql/myerror.log
#
# * Replication
#

#
# Here you can see queries with especially long duration
slow_query_log = 1
slow_query_log_file = /var/log/mysql/slow.log

long_query_time = 30
#
# The following can be used as easy to replay backup logs or for replication.
# note: if you are setting up a replication slave, see README.Debian about
# other settings you may need to change.
server-id = 33981
log_bin = /mnt/lvm_stripe/mysql/binlogs/binlog.log
binlog_format = mixed
log_slave_updates = false
expire_logs_days = 3
max_binlog_size = 100M
binlog_cache_size = 32K
#binlog_do_db = include_database_name
#binlog_ignore_db = include_database_name

sync_binlog = 0

slave_compressed_protocol = 0
#
# * Inn...

Madolyn Sullivan (madolyns) wrote:

I have seen this in the past when it tried to kill the same query twice, but that was not the case with the first example I sent. Here is another example where it tried to kill a query that it had already killed:

171203 19:20:36 Killing query 7779197 (duration 1311 sec): SELECT id, instance_id, aws_dedicated_host_id FROM `aws_instances` WHERE `aws_instanc\
es`.`customer_id` = 889 AND `aws_instances`.`tenancy` = 'host'
171203 19:20:36 >> log scanned up to (1292144075016)
171203 19:20:36 Killing query 7777546 (duration 1915 sec): SELECT id, instance_id, aws_dedicated_host_id FROM `aws_instances` WHERE `aws_instanc\
es`.`customer_id` = 889 AND `aws_instances`.`tenancy` = 'host'
171203 19:20:36 Killing query 7777547 (duration 1915 sec): SELECT id, instance_id, aws_dedicated_host_id FROM `aws_instances` WHERE `aws_instanc\
es`.`customer_id` = 889 AND `aws_instances`.`tenancy` = 'host'
171203 19:20:36 Killing query 7777548 (duration 1915 sec): SELECT id, instance_id, aws_dedicated_host_id FROM `aws_instances` WHERE `aws_instanc\
es`.`customer_id` = 889 AND `aws_instances`.`tenancy` = 'host'
171203 19:20:36 Killing query 7777549 (duration 1846 sec): SELECT aws_instance_type_id, instance_id FROM `aws_instances` WHERE (customer_id=889 \
and (created_at < DATE_ADD('2017-05-01' , INTERVAL 1 MONTH) or launch_time < DATE_ADD('2017-05-01' , INTERVAL 1 MONTH)) and (inactivated_at > '20\
17-05' or (is_active=1 ) ))
171203 19:20:36 Killing query 7777550 (duration 1465 sec): SELECT volume_id, aws_availability_zone_id FROM `aws_volumes` WHERE `aws_volumes`.`cu\
stomer_id` = 889
171203 19:20:36 Killing query 7779129 (duration 1413 sec): SELECT volume_id, aws_availability_zone_id FROM `aws_volumes` WHERE `aws_volumes`.`cu\
stomer_id` = 889
171203 19:20:36 Killing query 7779139 (duration 1256 sec): SELECT instance_id, aws_availability_zone_id FROM `aws_instances` WHERE `aws_instance\
s`.`customer_id` = 889
171203 19:20:36 Killing query 7779197 (duration 1311 sec): SELECT id, instance_id, aws_dedicated_host_id FROM `aws_instances` WHERE `aws_instanc\
es`.`customer_id` = 889 AND `aws_instances`.`tenancy` = 'host'
171203 19:20:36 Killing query 7777546 (duration 1915 sec): SELECT id, instance_id, aws_dedicated_host_id FROM `aws_instances` WHERE `aws_instanc\
es`.`customer_id` = 889 AND `aws_instances`.`tenancy` = 'host'
171203 19:20:36 Killing query 7777547 (duration 1915 sec): SELECT id, instance_id, aws_dedicated_host_id FROM `aws_instances` WHERE `aws_instanc\
es`.`customer_id` = 889 AND `aws_instances`.`tenancy` = 'host'
171203 19:20:36 Killing query 7777548 (duration 1915 sec): SELECT id, instance_id, aws_dedicated_host_id FROM `aws_instances` WHERE `aws_instanc\
es`.`customer_id` = 889 AND `aws_instances`.`tenancy` = 'host'
171203 19:20:36 Killing query 7777549 (duration 1846 sec): SELECT aws_instance_type_id, instance_id FROM `aws_instances` WHERE (customer_id=889 \
and (created_at < DATE_ADD('2017-05-01' , INTERVAL 1 MONTH) or launch_time < DATE_ADD('2017-05-01' , INTERVAL 1 MONTH)) and (inactivated_at > '20\
17-05' or (is_active=1 ) ))
171203 19:20:36 Killing query 7777550 (duration 1465 sec): SELECT volume_id, aws_availability_zone_id FROM `aws_...

Shahriyar Rzayev (rzayev-sehriyar) wrote:

Hi,
Sorry for the late reply.
I have tested with PXB 2.4.9 + PS 5.7 (an ordinary server) and could not reproduce the segfault with repeated killing. Do you have any other information that could help reproduce this issue?

171207 21:42:42 Executing FLUSH TABLES WITH READ LOCK...
171207 21:42:42 Kill query timeout 1 seconds.
171207 21:42:43 Connecting to MySQL server host: localhost, user: root, password: set, port: not set, socket: /home/shahriyar.rzaev/XB_TEST/server_dir/PS061217-percona-s
erver-5.7.20-18-linux-x86_64-debug/socket.sock
171207 21:42:43 Killing query 768 (duration 22 sec): select benchmark(9999999, md5(c)) from sysbench_test_db.sbtest20
171207 21:42:43 Killing query 769 (duration 22 sec): select benchmark(9999999, md5(c)) from sysbench_test_db.sbtest21
171207 21:42:43 Killing query 770 (duration 22 sec): select benchmark(9999999, md5(c)) from sysbench_test_db.sbtest22
171207 21:42:43 Killing query 771 (duration 22 sec): select benchmark(9999999, md5(c)) from sysbench_test_db.sbtest23
171207 21:42:43 Killing query 772 (duration 22 sec): select benchmark(9999999, md5(c)) from sysbench_test_db.sbtest24
171207 21:42:43 Killing query 773 (duration 22 sec): select benchmark(9999999, md5(c)) from sysbench_test_db.sbtest20
171207 21:42:43 Killing query 774 (duration 22 sec): select benchmark(9999999, md5(c)) from sysbench_test_db.sbtest21
171207 21:42:43 Killing query 775 (duration 22 sec): select benchmark(9999999, md5(c)) from sysbench_test_db.sbtest22
171207 21:42:43 Killing query 776 (duration 22 sec): select benchmark(9999999, md5(c)) from sysbench_test_db.sbtest23
171207 21:42:43 Killing query 777 (duration 22 sec): select benchmark(9999999, md5(c)) from sysbench_test_db.sbtest24
171207 21:42:43 Killing query 778 (duration 22 sec): select benchmark(9999999, md5(c)) from sysbench_test_db.sbtest20
171207 21:42:43 Killing query 779 (duration 22 sec): select benchmark(9999999, md5(c)) from sysbench_test_db.sbtest21
171207 21:42:43 Killing query 780 (duration 22 sec): select benchmark(9999999, md5(c)) from sysbench_test_db.sbtest22
171207 21:42:43 Killing query 781 (duration 22 sec): select benchmark(9999999, md5(c)) from sysbench_test_db.sbtest23
171207 21:42:43 Killing query 782 (duration 22 sec): select benchmark(9999999, md5(c)) from sysbench_test_db.sbtest24
171207 21:42:43 Killing query 783 (duration 22 sec): select benchmark(9999999, md5(c)) from sysbench_test_db.sbtest20
171207 21:42:43 Killing query 784 (duration 22 sec): select benchmark(9999999, md5(c)) from sysbench_test_db.sbtest21
171207 21:42:43 Killing query 785 (duration 22 sec): select benchmark(9999999, md5(c)) from sysbench_test_db.sbtest22
171207 21:42:43 Killing query 786 (duration 22 sec): select benchmark(9999999, md5(c)) from sysbench_test_db.sbtest23
171207 21:42:43 Killing query 787 (duration 22 sec): select benchmark(9999999, md5(c)) from sysbench_test_db.sbtest24
171207 21:42:43 Killing query 788 (duration 22 sec): select benchmark(9999999, md5(c)) from sysbench_test_db.sbtest20
171207 21:42:43 Killing query 789 (duration 22 sec): select benchmark(9999999, md5(c)) from sysbench_test_db.sbtest21
171207 21:42:43 Killing query 790 (duration 22 sec): select benchmar...

Madolyn Sullivan (madolyns) wrote:

It seems to happen when killing long-running queries. Here is another example of the issue:

innobackupex --version
innobackupex version 2.4.9 Linux (x86_64) (revision id: a467167cdd4)

command:
innobackupex --kill-wait-query-type=all --kill-long-queries-timeout=900 --kill-long-query-type=all --compress --compress-threads=4 /mnt/restore/innobackupex/2017_12_08/incr_2017_12_08_16_00_01 --no-timestamp --incremental --incremental-basedir=/mnt/restore/innobackupex/2017_12_08//incr_2017_12_08_15_00_01/

171208 16:05:34 [01] Compressing ./cloudpercept47/azure_reservation_orders_azure_tags.ibd to /mnt/restore/innobackupex/2017_12_08/incr_2017_12_08_16_00_01/cloudpercept47/azure_reservation_orders_azure_tags.ibd.delta.qp
171208 16:05:34 [01] ...done
171208 16:05:34 >> log scanned up to (1320480395795)
171208 16:05:34 Executing FLUSH TABLES WITH READ LOCK...
171208 16:05:34 Kill query timeout 900 seconds.
171208 16:05:35 >> log scanned up to (1320480495779)
171208 16:05:36 >> log scanned up to (1320480500250)
171208 16:05:37 >> log scanned up to (1320480500270)
171208 16:05:38 >> log scanned up to (1320480500280)
171208 16:05:39 >> log scanned up to (1320480500290)
171208 16:05:40 >> log scanned up to (1320480500300)
171208 16:05:41 >> log scanned up to (1320480500310)
171208 16:05:42 >> log scanned up to (1320480500320)
171208 16:05:43 >> log scanned up to (1320480500330)
171208 16:05:44 >> log scanned up to (1320480500330)
171208 16:05:45 >> log scanned up to (1320480500350)
171208 16:05:46 >> log scanned up to (1320480500350)
171208 16:05:47 >> log scanned up to (1320480500360)
.....
....
171208 16:20:28 >> log scanned up to (1320480504999)
171208 16:20:29 >> log scanned up to (1320480504999)
171208 16:20:30 >> log scanned up to (1320480504999)
171208 16:20:31 >> log scanned up to (1320480505009)
171208 16:20:32 >> log scanned up to (1320480505009)
171208 16:20:33 >> log scanned up to (1320480505009)
171208 16:20:34 Connecting to MySQL server host: localhost, user: root, password: set, port: 3306, socket: /var/run/mysqld/mysqld.sock
171208 16:20:34 Killing query 8406100 (duration 900 sec): INSERT INTO `aws_instance_usage_hours_monthlies` (`avg_amortized_cost`, `aws_instance_id`, `compute_cost`, `customer_id`, `ebs_io_cost`, `ebs_piops_cost`, `ebs_storage_cost`, `ebs_total_cost`, `ec2_dedicated_tenancy_cost`, `ec2_io_cost`, `ec2_optimized_ebs_surcharge_cost`, `ec2_other_cost`, `full_cost`, `job_execution_version_id`, `month`, `on_demand_hours`, `reserved_hours`, `spot_hours`, `total_cost`, `total_hours`, `used_pct`) VALUES (0.0, 3161096194893, 0.04841926, 889, NULL, NULL, NULL, NULL, 0.0, 0.0000028, 0.0, 0.0, 0.04842206, 3161095937272, '2017-04-01', 0.0, 0.0, 1.0, 0.04842206, 1.0, 0)
171208 16:20:34 >> log scanned up to (1320482193245)
171208 16:20:36 >> log scanned up to (1320494626970)
171208 16:20:38 >> log scanned up to (1320507474637)
16:20:38 UTC - xtrabackup got signal 11 ;
This could be because you hit a bug or data is corrupted.
This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collectio...

Madolyn Sullivan (madolyns) wrote:

Any update on this? This is happening on all our databases, and it happens every time it has to kill long-running queries.
Another example from today:

171214 17:09:48 [01] Compressing ./cloudpercept39/#sql-ib2248-2806336235.ibd to /mnt/restore/innobackupex/2017_12_14/incr_2017_12_14_17_00_01/cloudpercept39/#sql-ib2248-2806336235.ibd.delta.qp
171214 17:09:49 >> log scanned up to (6064103372944)
171214 17:09:50 >> log scanned up to (6064103658133)
171214 17:09:51 >> log scanned up to (6064103658133)
171214 17:09:52 >> log scanned up to (6064103658133)
171214 17:09:53 >> log scanned up to (6064103658133)
171214 17:09:54 >> log scanned up to (6064103659372)
171214 17:09:55 >> log scanned up to (6064103659372)
171214 17:09:56 >> log scanned up to (6064103660003)
171214 17:09:57 >> log scanned up to (6064103660003)
171214 17:09:58 >> log scanned up to (6064103660003)
171214 17:09:59 >> log scanned up to (6064103660003)
171214 17:10:00 >> log scanned up to (6064103660650)
171214 17:10:01 >> log scanned up to (6064103661273)
171214 17:10:02 >> log scanned up to (6064103661273)
171214 17:10:03 >> log scanned up to (6064103671412)
171214 17:10:04 >> log scanned up to (6064103694885)
171214 17:10:05 >> log scanned up to (6064103694885)
171214 17:10:06 [01] ...done
171214 17:10:06 [01] Compressing ./sys/sys_config.ibd to /mnt/restore/innobackupex/2017_12_14/incr_2017_12_14_17_00_01/sys/sys_config.ibd.delta.qp
171214 17:10:06 [01] ...done
171214 17:10:06 >> log scanned up to (6064103705029)
171214 17:10:07 Executing FLUSH TABLES WITH READ LOCK...
171214 17:10:07 Kill query timeout 900 seconds.
171214 17:10:07 >> log scanned up to (6064103706307)
171214 17:10:08 >> log scanned up to (6064103706307)
171214 17:10:09 >> log scanned up to (6064103706307)
171214 17:10:10 >> log scanned up to (6064103706307)
171214 17:10:11 >> log scanned up to (6064103706307)
171214 17:10:12 >> log scanned up to (6064103706307)
171214 17:10:13 >> log scanned up to (6064103706307)
171214 17:10:14 >> log scanned up to (6064103706317)
171214 17:10:15 >> log scanned up to (6064103706317)
171214 17:10:16 >> log scanned up to (6064103706317)
171214 17:10:17 >> log scanned up to (6064103706317)
171214 17:10:18 >> log scanned up to (6064103706317)
171214 17:10:19 >> log scanned up to (6064103706317)
171214 17:10:20 >> log scanned up to (6064103706317)
171214 17:10:21 >> log scanned up to (6064103706317)
171214 17:10:22 >> log scanned up to (6064103706317)
171214 17:10:23 >> log scanned up to (6064103706327)
171214 17:10:24 >> log scanned up to (6064103706327)
171214 17:10:25 >> log scanned up to (6064103706327)
171214 17:10:26 >> log scanned up to (6064103706327)
171214 17:10:27 >> log scanned up to (6064103706327)
171214 17:10:28 >> log scanned up to (6064103706327)
171214 17:10:29 >> log scanned up to (6064103706327)
171214 17:10:30 >> log scanned up to (6064103706327)
171214 17:10:31 >> log scanned up to (6064103706327)
171214 17:10:32 >> log scanned up to (6064103706327)
171214 17:10:33 >> log scanned up to (6064103706327)
171214 17:10:34 >> log scanned up to (6064103706327)
171214 17:10:35 >> log scanned up to (6064103706327)
171214 17:10:36 >> log scanned up to (6...

Shahriyar Rzayev (rzayev-sehriyar) wrote:

Hi, I am still unable to reproduce this with SELECT queries in my test case; I used a 900-second timeout as you did:

171215 14:09:53 >> log scanned up to (53580790)
171215 14:09:53 Waiting 1 seconds for queries running longer than 900 seconds to finish
171215 14:09:53 Executing FLUSH TABLES WITH READ LOCK...
171215 14:09:53 Kill query timeout 900 seconds.
171215 14:09:54 >> log scanned up to (53580790)
171215 14:09:55 >> log scanned up to (53580790)
171215 14:09:56 >> log scanned up to (53580790)
171215 14:09:57 >> log scanned up to (53580790)
.
.
171215 14:24:53 Killing query 538 (duration 923 sec): select benchmark(9999999, md5(c)) from sysbench_test_db.sbtest23
171215 14:24:53 Killing query 539 (duration 923 sec): select benchmark(9999999, md5(c)) from sysbench_test_db.sbtest24
171215 14:24:53 Killing query 540 (duration 923 sec): select benchmark(9999999, md5(c)) from sysbench_test_db.sbtest20
171215 14:24:53 Killing query 541 (duration 923 sec): select benchmark(9999999, md5(c)) from sysbench_test_db.sbtest21
171215 14:24:53 Killing query 542 (duration 923 sec): select benchmark(9999999, md5(c)) from sysbench_test_db.sbtest22
171215 14:24:53 Killing query 543 (duration 923 sec): select benchmark(9999999, md5(c)) from sysbench_test_db.sbtest23
171215 14:24:53 Killing query 544 (duration 923 sec): select benchmark(9999999, md5(c)) from sysbench_test_db.sbtest24
171215 14:24:53 Kill query thread stopped
171215 14:24:53 Starting to backup non-InnoDB tables and files

Looking at your output:

INSERT INTO `cloudtrail_track_files` (`aws_account_id`, `created_at`, `customer_id`, `fetched`, `gather_time`, `key_file`, `region_name`, `updated_at`) VALUES (1236950582317, '2017-12-14 17:10:08', 1032, 0, '2017-12-14 16:15:00', 'AWSLogs/015644606986/CloudTrail/us-east-2/2017/12/14/015644606986_CloudTrail_us-east-2_20171214T1615Z_1B0pP9ZvAwsXqEZJ.json.gz', 'us-east-2', '2017-12-14 17:10:08')
171214 17:25:10 >> log scanned up to (6070245752182)
17:25:10 UTC - xtrabackup got signal 11 ;

Could you please paste the `cloudtrail_track_files` table structure? I will try with insert statements as well.

Madolyn Sullivan (madolyns) wrote:

It seems like you are running a different command than I am? I'm not getting this message:
Waiting 1 seconds for queries running longer than 900 seconds to finish
What command are you running?

I also see this crash when the KILL query returns this error:
Error: failed to execute query KILL 7779129: Unknown thread id: 7779129

It had already issued this KILL command; perhaps the transaction was rolling back and it tried to kill it again?
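
For reference, issuing KILL again for a thread id that no longer exists reproduces exactly that error in the mysql client (a minimal illustration, not captured from this server):

mysql> KILL 7779129;
ERROR 1094 (HY000): Unknown thread id: 7779129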

Does the rest of the info I sent with this ticket help?
09:26:43 UTC - xtrabackup got signal 11 ;
This could be because you hit a bug or data is corrupted.
This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x10000
innobackupex(my_print_stacktrace+0x2c)[0xbe1bec]
innobackupex(handle_fatal_signal+0x262)[0xa24dc2]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x10330)[0x7f54e3e5e330]
/lib/x86_64-linux-gnu/libc.so.6(+0x3d467)[0x7f54e1d82467]
innobackupex[0x738d8c]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8184)[0x7f54e3e56184]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f54e1e42ffd]

Here is the table structure:

CREATE TABLE `cloudtrail_track_files` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `customer_id` int(11) NOT NULL,
  `aws_account_id` bigint(20) NOT NULL,
  `region_name` varchar(255) NOT NULL,
  `gather_time` datetime NOT NULL,
  `fetched` tinyint(1) DEFAULT NULL,
  `key_file` varchar(255) NOT NULL,
  `created_at` datetime NOT NULL,
  `updated_at` datetime NOT NULL,
  PRIMARY KEY (`id`),
  KEY `by_all_unique` (`aws_account_id`,`gather_time`,`region_name`),
  KEY `index_cloudtrail_track_files_on_customer_id` (`customer_id`),
  KEY `index_cloudtrail_track_files_on_fetched_and_aws_account_id` (`fetched`,`aws_account_id`),
  KEY `processing_cloudtrail_idx` (`aws_account_id`,`fetched`,`created_at`)
) ENGINE=InnoDB AUTO_INCREMENT=3779663418859 DEFAULT CHARSET=utf8

Shahriyar Rzayev (rzayev-sehriyar) wrote:

Now it looks like yours, but I still cannot reproduce it:

171218 12:14:40 Executing FLUSH TABLES WITH READ LOCK...
171218 12:14:40 Kill query timeout 900 seconds.
171218 12:14:40 >> log scanned up to (53462105)
171218 12:14:41 >> log scanned up to (53462210)
.
.
171218 14:33:41 Killing query 939 (duration 1030 sec): insert into sysbench_test_db.query_test(c) select benchmark(9999999, md5(c)) from sysbench_test_db.sbtest3
171218 14:33:41 Killing query 945 (duration 1030 sec): insert into sysbench_test_db.query_test(c) select benchmark(9999999, md5(c)) from sysbench_test_db.sbtest5
171218 14:33:41 Killing query 970 (duration 1029 sec): insert into sysbench_test_db.query_test(c) select benchmark(9999999, md5(c)) from sysbench_test_db.sbtest19
171218 14:33:41 Killing query 982 (duration 1029 sec): insert into sysbench_test_db.query_test(c) select benchmark(9999999, md5(c)) from sysbench_test_db.sbtest5
171218 14:33:41 Killing query 984 (duration 1029 sec): insert into sysbench_test_db.query_test(c) select benchmark(9999999, md5(c)) from sysbench_test_db.sbtest21
171218 14:33:41 Killing query 812 (duration 1035 sec): insert into sysbench_test_db.query_test(c) select benchmark(9999999, md5(c)) from sysbench_test_db.sbtest8
171218 14:33:41 Killing query 843 (duration 1033 sec): insert into sysbench_test_db.query_test(c) select benchmark(9999999, md5(c)) from sysbench_test_db.sbtest19
171218 14:33:41 Killing query 893 (duration 1031 sec): insert into sysbench_test_db.query_test(c) select benchmark(9999999, md5(c)) from sysbench_test_db.sbtest18
171218 14:33:41 Killing query 930 (duration 1030 sec): insert into sysbench_test_db.query_test(c) select benchmark(9999999, md5(c)) from sysbench_test_db.sbtest23
Error: failed to execute query KILL 930: Unknown thread id: 930
171218 14:33:42 Killing query 939 (duration 1030 sec): insert into sysbench_test_db.query_test(c) select benchmark(9999999, md5(c)) from sysbench_test_db.sbtest3
Error: failed to execute query KILL 939: Unknown thread id: 939
171218 14:33:42 Killing query 945 (duration 1030 sec): insert into sysbench_test_db.query_test(c) select benchmark(9999999, md5(c)) from sysbench_test_db.sbtest5
Error: failed to execute query KILL 945: Unknown thread id: 945
171218 14:33:42 Killing query 970 (duration 1029 sec): insert into sysbench_test_db.query_test(c) select benchmark(9999999, md5(c)) from sysbench_test_db.sbtest19
Error: failed to execute query KILL 970: Unknown thread id: 970
171218 14:33:42 Killing query 982 (duration 1029 sec): insert into sysbench_test_db.query_test(c) select benchmark(9999999, md5(c)) from sysbench_test_db.sbtest5
Error: failed to execute query KILL 982: Unknown thread id: 982
171218 14:33:42 Killing query 984 (duration 1029 sec): insert into sysbench_test_db.query_test(c) select benchmark(9999999, md5(c)) from sysbench_test_db.sbtest21
Error: failed to execute query KILL 984: Unknown thread id: 984
171218 14:33:42 Kill query thread stopped
171218 14:33:42 Starting to backup non-InnoDB tables and files

Madolyn Sullivan (madolyns) wrote:

And the info I provided:

  This could be because you hit a bug or data is corrupted.
  This error can also be caused by malfunctioning hardware.
  Attempting to collect some information that could help diagnose the problem.
  As this is a crash and something is definitely wrong, the information
  collection process might fail.

  Thread pointer: 0x0
  Attempting backtrace. You can use the following information to find out
  where mysqld died. If you see no messages after this, something went
  terribly wrong...
  stack_bottom = 0 thread_stack 0x10000
  innobackupex(my_print_stacktrace+0x2c)[0xbe1bec]
  innobackupex(handle_fatal_signal+0x262)[0xa24dc2]
  /lib/x86_64-linux-gnu/libpthread.so.0(+0x10330)[0x7f54e3e5e330]
  /lib/x86_64-linux-gnu/libc.so.6(+0x3d467)[0x7f54e1d82467]
  innobackupex[0x738d8c]
  /lib/x86_64-linux-gnu/libpthread.so.0(+0x8184)[0x7f54e3e56184]
  /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f54e1e42ffd]

is not helping to troubleshoot this issue?

Shahriyar Rzayev (rzayev-sehriyar) wrote:

Actually, no.

Here is what you can do to dig further into the problem (this is a general procedure).

Basically, you have two choices here:
* Build xtrabackup from source with DEBUG enabled, as described below.
First, read this doc to install the dependency packages before executing the commands below:
https://www.percona.com/doc/percona-xtrabackup/LATEST/installation/compiling_xtrabackup.html

cd to some folder.
git clone https://github.com/percona/percona-xtrabackup -b 2.4
cd percona-xtrabackup
rm -rf target
mkdir -p target

cmake -DCMAKE_INSTALL_PREFIX=./target -DWITH_DEBUG=1 -DWITH_SSL=system -DWITH_MAN_PAGES=OFF -DDOWNLOAD_BOOST=1 -DWITH_BOOST=/tmp/boost_23/ && make -j2
make install
mkdir ./target/percona-xtrabackup-2.4.x-debug
mv ./target/bin ./target/xtrabackup-test ./target/percona-xtrabackup-2.4.x-debug/
tar -zcf ../percona-xtrabackup-2.4.x-debug.tar.gz ./target/percona-xtrabackup-2.4.x-debug

# After doing this you will have a percona-xtrabackup-2.4.x-debug.tar.gz archive which you can extract and use from any folder.
# After extracting, you will find the innobackupex binary inside the extracted folder.

Once you have built and extracted it, replace the old binary path with the new one in your backup script and append the --core-file option to get a core dump.

* Or install the xtrabackup debuginfo package to get debug symbols. For CentOS:
yum install percona-xtrabackup-24-debuginfo

In this case, just append the --core-file option and try to get a core dump.

If you manage to get the core dump file, open it with gdb; the syntax is something like this:
$ gdb full_path_to/innobackupex core.878273_file

You will be in the debugger; then type:
bt       # paste the output as a comment here
and
bt full  # paste the output as a separate comment here

I hope those outputs will show which line of the code the crash happened on.
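
One more note: the OS also has to be allowed to write the core file. A minimal sketch for a typical Linux host (exact behaviour varies by distribution, and apport or systemd-coredump may intercept cores):

# allow core files of unlimited size in the shell/session that launches the backup
ulimit -c unlimited
# check where the kernel writes core files; a leading "|" means a crash handler pipes them elsewhere
cat /proc/sys/kernel/core_pattern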

Shahriyar Rzayev (rzayev-sehriyar) wrote:

Percona now uses JIRA for bug reports, so this bug report has been migrated to: https://jira.percona.com/browse/PXB-1486
