Master database crashes apparently triggered by network outage

Bug #1620902 reported by David Turner
This bug affects 2 people
Affects: Percona Server (moved to https://jira.percona.com/projects/PS)
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

Network issues between two datacenters triggered an apparent MySQL bug that crashed a large number of our MySQL masters.

Linux <HOSTNAME> 3.18.27-031827-generic #201602160131 SMP Wed Feb 17 01:07:24 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

The crash occurred on several different server versions:

5.6.21-70.1
5.6.28-76.1
5.6.29-76.2
5.6.30-76.3
5.6.31-77.0

The following is the relevant output from one error log:

2016-09-02 09:27:13 12003 [Warning] Aborted connection 32826055 to db: 'unconnected' user: 'repl' host: 'XXX.XXX.XXX.XXX' (Failed on my_net_write())
2016-09-02 09:27:23 12003 [Note] Start binlog_dump to master_thread_id(32826532) slave_server(551830430), pos(, 4)
2016-09-02 09:27:36 12003 [Warning] Aborted connection 32826292 to db: 'unconnected' user: 'repl' host: 'XXX.XXX.XXX.XXX' (Got an error reading communication packets)
09:27:40 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona Server better by reporting any
bugs at http://bugs.percona.com/

key_buffer_size=16777216
read_buffer_size=131072
max_used_connections=6832
max_threads=16386
thread_count=6709
connection_count=6709
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 6535565 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7ec9a6a3b000
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7ee802b6be40 thread_stack 0x30000
/usr/sbin/mysqld(my_print_stacktrace+0x2c)[0x8e7e3c]
/usr/sbin/mysqld(handle_fatal_signal+0x461)[0x66c631]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0)[0x7effb040ecb0]
/usr/sbin/mysqld[0x1329000]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7ec9fd87e010): is an invalid pointer
Connection ID (thread ID): 32826829
Status: NOT_KILLED

You may download the Percona Server operations manual by visiting
http://www.percona.com/software/percona-server/. You may find information
in the manual which will help you identify the cause of the crash.
160902 09:27:43 mysqld_safe Number of processes running now: 0
160902 09:27:43 mysqld_safe mysqld restarted
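
When the same crash hits many masters, it can help to pull the aborted-connection warnings and the crash timestamp out of each error log automatically. The following is a minimal sketch that assumes the log line shapes shown in this report; the function name `summarize_crash` and the regular expressions are illustrative and not part of any Percona or MySQL tool.

```python
import re
from datetime import datetime

# Patterns for the two line shapes seen in this report's error log.
ABORTED = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \d+ \[Warning\] "
    r"Aborted connection (\d+) .*?\((.*)\)\s*$"
)
SIGNAL = re.compile(r"^(\d{2}:\d{2}:\d{2}) UTC - mysqld got signal (\d+)")


def summarize_crash(lines):
    """Return (aborted, crash).

    aborted is a list of (timestamp, connection_id, reason) tuples;
    crash is (timestamp, signal_number), or None if no signal line follows
    an aborted-connection warning.
    """
    aborted, crash = [], None
    for line in lines:
        m = ABORTED.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            aborted.append((ts, int(m.group(2)), m.group(3)))
            continue
        m = SIGNAL.match(line)
        if m and aborted:
            # The signal line carries only a time of day, so the date is
            # borrowed from the last dated line (assumes no midnight rollover).
            t = datetime.strptime(m.group(1), "%H:%M:%S").time()
            crash = (datetime.combine(aborted[-1][0].date(), t), int(m.group(2)))
    return aborted, crash


# Three lines copied from the error log above.
sample = [
    "2016-09-02 09:27:13 12003 [Warning] Aborted connection 32826055 to db: 'unconnected' user: 'repl' host: 'XXX.XXX.XXX.XXX' (Failed on my_net_write())",
    "2016-09-02 09:27:36 12003 [Warning] Aborted connection 32826292 to db: 'unconnected' user: 'repl' host: 'XXX.XXX.XXX.XXX' (Got an error reading communication packets)",
    "09:27:40 UTC - mysqld got signal 11 ;",
]
aborted, crash = summarize_crash(sample)
gap = crash[0] - aborted[-1][0]
print(f"{len(aborted)} aborted connections; crash {gap.total_seconds():.0f}s after the last one")
```

For the excerpt above, the last aborted replication connection precedes the signal 11 by 4 seconds.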

=======
My.cnf
=======

#
# The MySQL database server configuration file.
#
# You can copy this to one of:
# - "/etc/mysql/my.cnf" to set global options,
# - "~/.my.cnf" to set user-specific options.
#
# One can use all long options that the program supports.
# Run program with --help to get a list of available options and with
# --print-defaults to see which it would actually understand and use.
#
# For explanations see
# http://dev.mysql.com/doc/mysql/en/server-system-variables.html

# This will be passed to all mysql clients
# It has been reported that passwords should be enclosed with ticks/quotes
# especially if they contain "#" chars...
# Remember to edit /etc/mysql/debian.cnf when changing the socket location.
[client]
port = 3306
socket = /var/run/mysqld/mysqld.sock
# Here is entries for some specific programs
# The following values assume you have at least 32M ram

# This was formerly known as [safe_mysqld]. Both versions are currently parsed.
[mysqld_safe]

socket = /var/run/mysqld/mysqld.sock
nice = 0
numa_interleave = 1
flush_caches = 1
open_files_limit = 65536

[mysqld]
#
# * Basic Settings
#
user = mysql
pid-file = /var/run/mysqld/mysqld.pid
socket = /var/run/mysqld/mysqld.sock
port = 3306
basedir = /usr
datadir = /var/lib/mysql
tmpdir = /tmp
lc-messages-dir = /usr/share/mysql
default-time-zone = '+0:00'
skip-external-locking
# don't do dns or something
skip-name-resolve

performance_schema = 0
# it'd be cool to set this to only listen in 127.0.0.1 and 10.x.y.z but not
# the public IP. I don't know if bind-address accepts multiple values,
# though
bind-address = 0.0.0.0
#
# * Fine Tuning
#
# These are the Debian defaults and probably need to be tuned
key_buffer = 16M
max_allowed_packet = 128M
thread_stack = 192K
thread_cache_size = 8
# This replaces the startup script and checks MyISAM tables if needed
# the first time they are touched
myisam-recover = BACKUP
max_connections = 16384

max_user_connections = 16374
#table_cache = 64
#thread_concurrency = 10
#
# * Query Cache Configuration
#
query_cache_limit = 1M
query_cache_size = 16M
#
# * Logging and Replication
#
# Both location gets rotated by the cronjob.
# Be aware that this log type is a performance killer.
# As of 5.1 you can enable the log at runtime!
#general_log_file = /var/log/mysql/mysql.log
#general_log = 1
#
# Error log - should be very few entries.
#
log-warnings = 2
log_error = /var/log/mysql/error.log
#
# Here you can see queries with especially long duration
slow_query_log = 1
slow_query_log_file = /var/log/mysql/mysql-slow.log
long_query_time = 10
#log-queries-not-using-indexes
log_slow_verbosity = microtime,innodb
slow_query_log_use_global_control = all
#
# The following can be used as easy to replay backup logs or for replication.
# note: if you are setting up a replication slave, see README.Debian about
# other settings you may need to change.
server-id = 35791131
report-host = schemalessdb473-pek1

log-bin = /var/lib/mysql/log/mysql-bin.log

auto_increment_increment = 2
auto_increment_offset = 2
enforce-gtid-consistency
gtid-mode = ON
# force an fsync every statement (trading performance to avoid corruption)
sync_binlog = 1
log-slave-updates
expire_logs_days = 5
slave-net-time = 30
max_binlog_size = 1G
binlog_format = MIXED
table_definition_cache = 20000
table_open_cache_instances = 64
lock_wait_timeout = 300

relay_log_info_repository = TABLE
relay_log_recovery = ON

default-storage-engine = innodb

#binlog_do_db = include_database_name
#binlog_ignore_db = include_database_name
#
# * InnoDB
#
# InnoDB is enabled by default with a 10MB datafile in /var/lib/mysql/.
# Read the manual for more InnoDB related options. There are many!
innodb_buffer_pool_size = 94553705021
innodb_flush_method = O_DIRECT
innodb_file_per_table
innodb_log_file_size = 512M
innodb_file_format = ANTELOPE
innodb_flush_log_at_trx_commit = 1
innodb_kill_idle_transaction = 0
#
# * Security Features
#
# Read the manual, too, if you want chroot!
# chroot = /var/lib/mysql/
#
# For generating SSL certificates I recommend the OpenSSL GUI "tinyca".
#
# ssl-ca=/etc/mysql/cacert.pem
# ssl-cert=/etc/mysql/server-cert.pem
# ssl-key=/etc/mysql/server-key.pem

character-set-server=utf8mb4
collation-server=utf8mb4_unicode_ci

[mysqldump]
quick
quote-names
max_allowed_packet = 16M
single-transaction

[mysql]
#no-auto-rehash # faster start of mysql but no tab completion

[isamchk]
key_buffer = 16M

#
# * IMPORTANT: Additional settings that can override those from this file!
# The files must end with '.cnf', otherwise they'll be ignored.
#
!includedir /etc/mysql/conf.d/

# vim: set syntax=conf:

Ovais Tariq (ovais-tariq) wrote :

Here are relevant logs from another MySQL server with the crash:

2016-09-02 08:40:55 58538 [Note] While initializing dump thread for slave with UUID <xxx>, found a zombie dump thread with the same UUID. Master is killing the zombie dump thread(1506741).
2016-09-02 08:40:55 58538 [Note] Start binlog_dump to master_thread_id(8749347) slave_server(41716794), pos(, 4)
2016-09-02 08:41:35 58538 [Warning] Aborted connection 8749347 to db: 'unconnected' user: 'repl' host: 'X.X.X.X' (Failed on my_net_write())
2016-09-02 08:41:40 58538 [Note] While initializing dump thread for slave with UUID <xxx>, found a zombie dump thread with the same UUID. Master is killing the zombie dump thread(1506741).
2016-09-02 08:41:40 58538 [Note] Start binlog_dump to master_thread_id(8749361) slave_server(41716794), pos(, 4)
2016-09-02 08:41:50 58538 [Warning] Aborted connection 1506741 to db: 'unconnected' user: 'repl' host: 'X.X.X.X' (failed on net_flush())
2016-09-02 08:41:54 58538 [Note] While initializing dump thread for slave with UUID <xxx>, found a zombie dump thread with the same UUID. Master is killing the zombie dump thread(8749333).
2016-09-02 08:41:54 58538 [Note] Start binlog_dump to master_thread_id(8749365) slave_server(409262272), pos(, 4)
2016-09-02 08:42:04 58538 [Warning] Aborted connection 8749333 to db: 'unconnected' user: 'repl' host: 'X.X.X.X' (Failed on my_net_write())
2016-09-02 08:43:14 58538 [Warning] Aborted connection 8749365 to db: 'unconnected' user: 'repl' host: 'X.X.X.X' (Failed on my_net_write())
2016-09-02 08:43:29 58538 [Warning] Aborted connection 8749361 to db: 'unconnected' user: 'repl' host: 'X.X.X.X' (Failed on my_net_write())
2016-09-02 08:43:33 58538 [Warning] Aborted connection 8749377 to db: 'unconnected' user: 'repl' host: 'X.X.X.X' (Got an error reading communication packets)
08:43:53 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona Server better by reporting any
bugs at http://bugs.percona.com/

key_buffer_size=16777216
read_buffer_size=131072
max_used_connections=830
max_threads=4098
thread_count=811
connection_count=811
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1816973 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7fa07171e000
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7fa13ec5be10 thread_stack 0x30000
/usr/sbin/mysqld(my_print_stacktrace+0x2c)[0x8e66dc]
/usr/sbin/mysqld(handle_fatal_signal+0x461)[0x66bcb1]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0)[0x7fb37fdfdcb0]
/usr/sbin/mysqld[0x1320e40]
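
The two crash reports give two data points for the memory estimate that mysqld prints, which allows a quick consistency check between the hosts. The sketch below is illustrative arithmetic only: it takes `max_threads` and the printed estimate from the two logs in this report and derives the per-thread increment. The inferred `sort_buffer_size` is an assumption, since that variable appears neither in the posted my.cnf nor in the logs.

```python
# Values copied from the two crash reports in this bug.
read_buffer_size = 131072  # bytes, identical in both logs
first = {"max_threads": 16386, "estimate_k": 6535565}
second = {"max_threads": 4098, "estimate_k": 1816973}

# Growth of the printed estimate per additional thread, in K bytes.
per_thread_k = (first["estimate_k"] - second["estimate_k"]) / (
    first["max_threads"] - second["max_threads"]
)

# If the per-thread term is read_buffer_size + sort_buffer_size, as the
# printed formula states, the implied sort_buffer_size in bytes is:
implied_sort_buffer_size = per_thread_k * 1024 - read_buffer_size

print(per_thread_k)              # 384.0
print(implied_sort_buffer_size)  # 262144.0
```

An implied value of 262144 bytes (256 KiB) is, as far as I know, the stock MySQL 5.6 default for `sort_buffer_size`, which would suggest that both masters ran with that setting unchanged.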

Ovais Tariq (ovais-tariq) wrote :

We don't have coredumps right now.

Ovais Tariq (ovais-tariq) wrote :

The address in the stack trace is probably incorrect, because it did not resolve to the correct symbol name.

Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports, so this report has been migrated to: https://jira.percona.com/browse/PS-3550
