mysqld got signal 6 at rally job

Bug #1438516 reported by Alexander Nevenchannyy
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Fuel for OpenStack
Fix Committed
Medium
Pavel Boldin
5.1.x
Won't Fix
Medium
Pavel Boldin
6.0.x
Won't Fix
Medium
Pavel Boldin
7.0.x
Fix Committed
Medium
Pavel Boldin

Bug Description

api: '1.0'
astute_sha: ed76b0cacf34a4a683b464ebd86e0beb273b5473
auth_required: true
build_id: 2015-03-12_22-54-44
build_number: '192'
feature_groups:
- mirantis
fuellib_sha: fda8128b9ca7a8ce818421040f597a50eece8078
fuelmain_sha: 3764b8a73b3a93fd7ee66937ba4c4c77da409b78
nailgun_sha: c186f71158ed27b03d8db87561ea66c19e39b452
ostf_sha: ecb8e294b0acbdc5b0300d5e39028fb26ecc9088
production: docker
python-fuelclient_sha: 59513d6b75f86060ff5059f39fdd9cca56c83f19
release: '6.1'
release_versions:
  2014.2-6.1:
    VERSION:
      api: '1.0'
      astute_sha: ed76b0cacf34a4a683b464ebd86e0beb273b5473
      build_id: 2015-03-12_22-54-44
      build_number: '192'
      feature_groups:
      - mirantis
      fuellib_sha: fda8128b9ca7a8ce818421040f597a50eece8078
      fuelmain_sha: 3764b8a73b3a93fd7ee66937ba4c4c77da409b78
      nailgun_sha: c186f71158ed27b03d8db87561ea66c19e39b452
      ostf_sha: ecb8e294b0acbdc5b0300d5e39028fb26ecc9088
      production: docker
      python-fuelclient_sha: 59513d6b75f86060ff5059f39fdd9cca56c83f19
      release: '6.1'

Deploy was following confguration:
Baremetal, Ubuntu, HA, Neutron-gre,Ceph-all, Debug, nova-quotas, 6.1-192
Controllers:3 Computes:45

Rally job was: create-from-volume-and-delete-volume

Rally error:
2015-03-30 05:15:59.750 2794 DEBUG keystoneclient.auth.identity.v2 [-] Making authentication request to http://172.16.45.101:5000/v2.0/tokens get_auth_ref /opt/stack/.venv/lib/python2.7/site-packages/keystoneclient/auth/identity/v2.py:76
2015-03-30 05:16:07.348 2794 DEBUG keystoneclient.session [-] Request returned failure status: 500 request /opt/stack/.venv/lib/python2.7/site-packages/keystoneclient/session.py:396
2015-03-30 05:16:07.394 2794 ERROR rally.benchmark.runners.base [-] Authorization Failed: An unexpected error prevented the server from fulfilling your request: (OperationalError) (2013, "Lost connection to MySQL server at 'reading initial communication packet', system error: 0") None None (Disable debug mode to suppress these details.) (HTTP 500)
2015-03-30 05:16:07.394 2794 TRACE rally.benchmark.runners.base Traceback (most recent call last):
2015-03-30 05:16:07.394 2794 TRACE rally.benchmark.runners.base File "/opt/stack/.venv/lib/python2.7/site-packages/rally/benchmark/runners/base.py", line 79, in _run_scenario_once
2015-03-30 05:16:07.394 2794 TRACE rally.benchmark.runners.base method_name)(**kwargs) or scenario_output
2015-03-30 05:16:07.394 2794 TRACE rally.benchmark.runners.base File "/opt/stack/.venv/lib/python2.7/site-packages/rally/benchmark/scenarios/cinder/volumes.py", line 172, in create_from_volume_and_delete_volume
2015-03-30 05:16:07.394 2794 TRACE rally.benchmark.runners.base **kwargs)
2015-03-30 05:16:07.394 2794 TRACE rally.benchmark.runners.base File "/opt/stack/.venv/lib/python2.7/site-packages/rally/benchmark/scenarios/base.py", line 255, in func_atomic_actions
2015-03-30 05:16:07.394 2794 TRACE rally.benchmark.runners.base f = func(self, *args, **kwargs)
2015-03-30 05:16:07.394 2794 TRACE rally.benchmark.runners.base File "/opt/stack/.venv/lib/python2.7/site-packages/rally/benchmark/scenarios/cinder/utils.py", line 81, in _create_volume
2015-03-30 05:16:07.394 2794 TRACE rally.benchmark.runners.base volume = self.clients("cinder").volumes.create(size, **kwargs)
2015-03-30 05:16:07.394 2794 TRACE rally.benchmark.runners.base File "/opt/stack/.venv/lib/python2.7/site-packages/rally/benchmark/scenarios/base.py", line 195, in clients
2015-03-30 05:16:07.394 2794 TRACE rally.benchmark.runners.base return getattr(self._clients, client_type)()
2015-03-30 05:16:07.394 2794 TRACE rally.benchmark.runners.base File "/opt/stack/.venv/lib/python2.7/site-packages/rally/osclients.py", line 69, in wrapper
2015-03-30 05:16:07.394 2794 TRACE rally.benchmark.runners.base self.cache[key] = func(self, *args, **kwargs)
2015-03-30 05:16:07.394 2794 TRACE rally.benchmark.runners.base File "/opt/stack/.venv/lib/python2.7/site-packages/rally/osclients.py", line 215, in cinder
2015-03-30 05:16:07.394 2794 TRACE rally.benchmark.runners.base kc = self.keystone()
2015-03-30 05:16:07.394 2794 TRACE rally.benchmark.runners.base File "/opt/stack/.venv/lib/python2.7/site-packages/rally/osclients.py", line 69, in wrapper
2015-03-30 05:16:07.394 2794 TRACE rally.benchmark.runners.base self.cache[key] = func(self, *args, **kwargs)
2015-03-30 05:16:07.394 2794 TRACE rally.benchmark.runners.base File "/opt/stack/.venv/lib/python2.7/site-packages/rally/osclients.py", line 118, in keystone
2015-03-30 05:16:07.394 2794 TRACE rally.benchmark.runners.base client = create_keystone_client(kw)
2015-03-30 05:16:07.394 2794 TRACE rally.benchmark.runners.base File "/opt/stack/.venv/lib/python2.7/site-packages/rally/osclients.py", line 80, in create_keystone_client
2015-03-30 05:16:07.394 2794 TRACE rally.benchmark.runners.base return keystone_v2.Client(**args)
2015-03-30 05:16:07.394 2794 TRACE rally.benchmark.runners.base File "/opt/stack/.venv/lib/python2.7/site-packages/keystoneclient/v2_0/client.py", line 152, in __init__
2015-03-30 05:16:07.394 2794 TRACE rally.benchmark.runners.base self.authenticate()
2015-03-30 05:16:07.394 2794 TRACE rally.benchmark.runners.base File "/opt/stack/.venv/lib/python2.7/site-packages/keystoneclient/utils.py", line 318, in inner
2015-03-30 05:16:07.394 2794 TRACE rally.benchmark.runners.base return func(*args, **kwargs)
2015-03-30 05:16:07.394 2794 TRACE rally.benchmark.runners.base File "/opt/stack/.venv/lib/python2.7/site-packages/keystoneclient/httpclient.py", line 503, in authenticate
2015-03-30 05:16:07.394 2794 TRACE rally.benchmark.runners.base resp = self.get_raw_token_from_identity_service(**kwargs)
2015-03-30 05:16:07.394 2794 TRACE rally.benchmark.runners.base File "/opt/stack/.venv/lib/python2.7/site-packages/keystoneclient/v2_0/client.py", line 196, in get_raw_token_from_identity_service
2015-03-30 05:16:07.394 2794 TRACE rally.benchmark.runners.base _("Authorization Failed: %s") % e)
2015-03-30 05:16:07.394 2794 TRACE rally.benchmark.runners.base AuthorizationFailure: Authorization Failed: An unexpected error prevented the server from fulfilling your request: (OperationalError) (2013, "Lost connection to MySQL server at 'reading initial communication packet', system error: 0") None None (Disable debug mode to suppress these details.) (HTTP 500)
2015-03-30 05:16:07.394 2794 TRACE rally.benchmark.runners.base
2015-03-30 05:16:07.397 2794 INFO rally.benchmark.runners.base [-] Task ffeb1f5f-ca66-4243-bf79-0888b3abb85f | ITER: 380 END: Error AuthorizationFailure: Authorization Failed: An unexpected error prevented the server from fulfilling your request: (OperationalError) (2013, "Lost connection to MySQL server at 'reading initial communication packet', system error: 0") None None (Disable debug mode to suppress these details.) (HTTP 500)

In mysqld logs at active controller we see:
05:15:54 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

key_buffer_size=67108864
read_buffer_size=131072
max_used_connections=887
max_threads=4096
thread_count=886
connection_count=886
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1696288 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7fd178ce8330
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7fcf26c16e00 thread_stack 0x30000
/usr/sbin/mysqld(my_print_stacktrace+0x2c)[0x7fd16b9c0e1c]
/usr/sbin/mysqld(handle_fatal_signal+0x3d2)[0x7fd16b719912]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x10340)[0x7fd16a419340]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x39)[0x7fd169a70cc9]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7fd169a740d8]
/usr/sbin/mysqld(+0x7dba78)[0x7fd16ba89a78]
/usr/sbin/mysqld(+0x7df8d1)[0x7fd16ba8d8d1]
/usr/sbin/mysqld(+0x7e07fb)[0x7fd16ba8e7fb]
/usr/sbin/mysqld(+0x7c4797)[0x7fd16ba72797]
/usr/sbin/mysqld(+0x7375bf)[0x7fd16b9e55bf]
/usr/sbin/mysqld(_ZN7handler13ha_delete_rowEPKh+0xb1)[0x7fd16b64f351]
/usr/sbin/mysqld(_Z12mysql_deleteP3THDP10TABLE_LISTP4ItemP10SQL_I_ListI8st_orderEyy+0xe5a)[0x7fd16b8f6cca]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x1ae1)[0x7fd16b7a4f91]
/usr/sbin/mysqld(+0x4fd710)[0x7fd16b7ab710]
/usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x2742)[0x7fd16b7ae342]
/usr/sbin/mysqld(_Z10do_commandP3THD+0x124)[0x7fd16b7aef34]
/usr/sbin/mysqld(_Z24do_handle_one_connectionP3THD+0x2d5)[0x7fd16b771a85]
/usr/sbin/mysqld(handle_one_connection+0x40)[0x7fd16b771af0]
/usr/sbin/mysqld(pfs_spawn_thread+0x140)[0x7fd16bbb6a40]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8182)[0x7fd16a411182]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fd169b3447d]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7fcf30634410): is an invalid pointer
Connection ID (thread ID): 40368
Status: NOT_KILLED

Tags: scale
Revision history for this message
Alexander Nevenchannyy (anevenchannyy) wrote :

root@node-1:/var/log/mysql# dpkg -l | grep mysql
ii libdbd-mysql-perl 4.025-1 amd64 Perl5 database interface to the MySQL database
ii libmysqlclient18:amd64 5.5.41-0ubuntu0.14.04.1 amd64 MySQL database client library
ii mysql-client-5.6 5.6.16-2~mos6.1+1 amd64 MySQL database client binaries
ii mysql-client-core-5.6 5.6.16-2~mos6.1+1 amd64 MySQL database core client binaries
ii mysql-common 5.5.41-0ubuntu0.14.04.1 all MySQL database common files, e.g. /etc/mysql/my.cnf
ii mysql-server-wsrep-5.6 5.6.16-2~mos6.1+1 amd64 MySQL database server binaries and system database setup
ii mysql-server-wsrep-core-5.6 5.6.16-2~mos6.1+1 amd64 MySQL database server binaries
ii mysql-wsrep-common-5.6 5.6.16-2~mos6.1+1 all MySQL 5.6 specific common files, e.g. /etc/mysql/conf.d/my-5.6.cnf
ii python-mysqldb 1.2.5-1~mos6.1 amd64 Python interface to MySQL

Dina Belova (dbelova)
Changed in fuel:
status: New → Confirmed
Revision history for this message
Pavel Boldin (pboldin) wrote :

Analysis is:
1. The `handler::ha_delete_row' method calls virtual method 'delete_row' that is being overloaded by the storage class implementation such as e.g. `ha_myisam'
2. The traces without resolved name is therefore most likely an implementation methods contained in a shared object without debuginfo.

Questions:
* Is `mysql' starting after that? Is there any binary file corruption?
* What about disk space left?
* Where can I get the exact `mysqld' package content, preferable with debuginfo?
* Please show me the `/proc/<mysql-server pid>/maps' file content.

Revision history for this message
Alexander Nevenchannyy (anevenchannyy) wrote :

Atteched mysql maps

Changed in fuel:
assignee: MOS Linux (mos-linux) → Pavel Boldin (pboldin)
Revision history for this message
Pavel Boldin (pboldin) wrote :

Well, the maps seems to be useless, because the kernel loaded MySQL on a different base this time (it is position-independent code).

However, we can try to find the code using relative offsets. I will try to do this.

But first, please answer the above questions plus the following one:
1. Was this `maps' taken from exact the same host where fault occured?

Revision history for this message
Alexander Nevenchannyy (anevenchannyy) wrote :

Pavel, yes this is host where we have problem at March 30 at 5 AM, so maps in attach are have offset more then 30 hours and i't was taken from new mysqld process, that was started automatically.

Revision history for this message
Alexei Sheplyakov (asheplyakov) wrote :

> thread_count=886
> connection_count=886

MySQL is not designed for such a load. It spawns a thread per connection (which is does not scale at all,
For one even allocating an initial stack for every thread (on x86_64) requires 8MB*886 ~ 7GB of RAM.
Using the thread pool plugin [1] (which queues several connections to one thread) can mitigate the problem
(please note: the thread pool plugin is available only in MariaDB and commercial version of MySQL).
However to scale properly OpenStack needs a different storage (something which can scale horizontally,
and most likely this "something is not relational database at all).

[1] https://mariadb.com/kb/en/mariadb/thread-pool-in-mariadb

Revision history for this message
Dina Belova (dbelova) wrote :

Alexei, so what will be the recommendations for the 6.1 release? I mean I'm not very sure that workaround you've proposed might be applied for 6.1 release

Revision history for this message
Dina Belova (dbelova) wrote :

Also please move the bug to the IN PROGRESS as it looks like you're actively working on it, folks

Dina Belova (dbelova)
Changed in fuel:
status: Confirmed → In Progress
Revision history for this message
Pavel Boldin (pboldin) wrote :

The stacktraces from this look familiar.
https://bugs.launchpad.net/codership-mysql/+bug/1264809

Revision history for this message
Pavel Boldin (pboldin) wrote :

This looks like a bug inside the InnoDB implementation. More details to follow.

Revision history for this message
Pavel Boldin (pboldin) wrote :

Moving to 7.0 since it is a hard to debug and is not affecting our system badly.

Changed in fuel:
milestone: 6.1 → 7.0
Revision history for this message
Pavel Boldin (pboldin) wrote :

So, the function at 0x7db5a0 that calls abort from 0x7dba78 is absolutely matched by the code with function '_ZL39wsrep_row_upd_check_foreign_constraintsP10upd_node_tP10btr_pcur_tP12dict_table_tP12dict_index_tPmP9que_thr_tP5mtr_t.isra.17.part.18', one can ensure from the parts of the attached files.

Revision history for this message
Pavel Boldin (pboldin) wrote :

One item upper in the stack is inside the `row_upd_clust_rec_by_insert' function and is at the line 2280 of the file `storage/innobase/row/row0upd.cc' in our version of the code.

Revision history for this message
Pavel Boldin (pboldin) wrote :

This is the assumption that is failing:

                                 ut_a(foreign->referenced_table
  7e7f9f: 49 8b 47 48 mov 0x48(%r15),%rax
  7e7fa3: 48 8b 90 b8 00 00 00 mov 0xb8(%rax),%rdx
  7e7faa: 48 85 d2 test %rdx,%rdx
  7e7fad: 0f 84 c9 01 00 00 je 7e817c <_ZL39wsrep_row_upd_check_foreign_constraintsP10upd_node_tP10btr_pcur_tP12dict_table_tP12dict_index_tPmP9que_thr_tP5mtr_t.isra.17.part.18+0x4bc>
                                     ->n_foreign_key_checks_running > 0);

Revision history for this message
Pavel Boldin (pboldin) wrote :

OK, so the problem is the code re-inventing the `os_(inc|dec)_counter' calls, that is writing:

                        if (foreign->referenced_table) {
                                mutex_enter(&(dict_sys->mutex));

                                (foreign->referenced_table
                                 ->n_foreign_key_checks_running)++;

                                mutex_exit(&(dict_sys->mutex));
                        }

instead of:

                        if (foreign->referenced_table) {
                                os_inc_counter(dict_sys->mutex,
                                               foreign->referenced_table
                                               ->n_foreign_key_checks_running);
                        }

The difference is the latter providing an atomic update while first is not.

Furthermore, `os_inc_counter' is implemented as an atomic in our version (HAVE_ATOMIC_BUILTINS is defined) and is not using `mutex'es at all.

The proposed fix is to use `os_inc_counter'/`os_dec_counter' pair instead.

Revision history for this message
Pavel Boldin (pboldin) wrote :
Revision history for this message
Pavel Boldin (pboldin) wrote :

The solution is to update the package with the latest WSREP patch.

Revision history for this message
Pavel Boldin (pboldin) wrote :

Here is the implementation in the MySQL: https://mariadb.com/kb/en/mariadb/what-is-mariadb-galera-cluster/.

So we can update to the latest MariaDB as well.

Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to packages/precise/MySQL-wsrep (5.1.1)

Fix proposed to branch: 5.1.1
Change author: Pavel Boldin <email address hidden>
Review: https://review.fuel-infra.org/5643

Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to packages/trusty/mysql-wsrep-5.6 (6.1)

Fix proposed to branch: 6.1
Change author: Pavel Boldin <email address hidden>
Review: https://review.fuel-infra.org/5644

Revision history for this message
OSCI Robot (oscirobot) wrote :

DEB package MySQL-wsrep has been built for project packages/precise/MySQL-wsrep
Package version == 5.6.23, package release == ubuntu5

Changeset: https://review.fuel-infra.org/5808
project: packages/precise/MySQL-wsrep
branch: 5.1.1-updates
author: Pavel Boldin
committer: Pavel Boldin
subject: Fix mysql abort updating to 5.6.23+wsrep-25.10
status: patchset-created

Files placed on repository:
mysql-client-5.6_5.6.23-ubuntu5_amd64.deb
mysql-client-core-5.6_5.6.23-ubuntu5_amd64.deb
mysql-server-wsrep-5.6_5.6.23-ubuntu5_amd64.deb
mysql-server-wsrep-core-5.6_5.6.23-ubuntu5_amd64.deb
mysql-source-5.6_5.6.23-ubuntu5_amd64.deb
mysql-testsuite-5.6_5.6.23-ubuntu5_amd64.deb
mysql-wsrep-common-5.6_5.6.23-ubuntu5_all.deb

NOTE: Changeset is not merged, created temporary package repository.
DEB repository URL: http://osci-obs.vm.mirantis.net:82/ubuntu-fuel-5.1.1-updates-stable-5808/ubuntu

Revision history for this message
OSCI Robot (oscirobot) wrote :

DEB package MySQL-wsrep has been built for project packages/precise/MySQL-wsrep
Package version == 5.6.23, package release == ubuntu5

Changeset: https://review.fuel-infra.org/5810
project: packages/precise/MySQL-wsrep
branch: 6.0-updates
author: Pavel Boldin
committer: Pavel Boldin
subject: Fix mysql abort updating to 5.6.23+wsrep-25.10
status: patchset-created

Files placed on repository:
mysql-client-5.6_5.6.23-ubuntu5_amd64.deb
mysql-client-core-5.6_5.6.23-ubuntu5_amd64.deb
mysql-server-wsrep-5.6_5.6.23-ubuntu5_amd64.deb
mysql-server-wsrep-core-5.6_5.6.23-ubuntu5_amd64.deb
mysql-source-5.6_5.6.23-ubuntu5_amd64.deb
mysql-testsuite-5.6_5.6.23-ubuntu5_amd64.deb
mysql-wsrep-common-5.6_5.6.23-ubuntu5_all.deb

NOTE: Changeset is not merged, created temporary package repository.
DEB repository URL: http://osci-obs.vm.mirantis.net:82/ubuntu-fuel-6.0-updates-stable-5810/ubuntu

Pavel Boldin (pboldin)
Changed in fuel:
status: In Progress → Fix Committed
status: Fix Committed → In Progress
Changed in fuel:
milestone: 7.0 → 6.1
status: In Progress → New
Changed in fuel:
status: New → Confirmed
Revision history for this message
Dina Belova (dbelova) wrote :

@Pavel, any news about this bug?

Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

Lowering to medium and moving to the 6.1.x as this is "not affecting our system badly".

Changed in fuel:
importance: High → Medium
milestone: 6.1 → 6.1.1
Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Change abandoned on packages/trusty/mysql-wsrep-5.6 (6.1)

Change abandoned by Pavel Boldin <email address hidden> on branch: 6.1
Review: https://review.fuel-infra.org/5644

Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Related fix proposed to packages/trusty/mysql-wsrep-5.6 (7.0)

Related fix proposed to branch: 7.0
Change author: Pavel Boldin <email address hidden>
Review: https://review.fuel-infra.org/7712

Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Related fix merged to packages/trusty/mysql-wsrep-5.6 (7.0)

Reviewed: https://review.fuel-infra.org/7712
Submitter: Alexei Sheplyakov <email address hidden>
Branch: 7.0

Commit: 6cb2f2488836da5288a0e7d0f46573caae1e0566
Author: Pavel Boldin <email address hidden>
Date: Wed Jun 10 14:40:46 2015

Update to MySQL 5.6.23 and wsrep 25.10

Amongst other things this fixes the (Heisen)bug caused by mixing mutexes
and atomic operations to manage a variable in a thread safe manner.

The wsrep patch was created by diffing original mysql-5.6.23 against [1]
provided by galeracluster and fixing conflicts with existing patches.

[1] http://releases.galeracluster.com/source/mysql-wsrep-5.6.23-25.10.tar.gz

Related-Bug: #1462998
Related-Bug: #1438516
Change-Id: I3eff2760f8e0696a9798f9d2e99bfe78e8a6e314

Revision history for this message
Vitaly Sedelnik (vsedelnik) wrote :

Won't Fix for 6.0-updates and 5.1.1-updates because of Medium importance

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.