mysqld crashed with "got signal 11"

Bug #1505118 reported by ksp
This bug affects 1 person

Affects: Percona XtraDB Cluster (moved to https://jira.percona.com/projects/PXC)
Status tracked in: 5.6
Series 5.6: Status New, Importance Undecided, Assigned to Przemek

Bug Description

Debian 7
Server version: 5.6.25-73.1-56-log Percona XtraDB Cluster (GPL), Release rel73.1, Revision 011f1e6, WSREP version 25.12, wsrep_25.12

Sometimes mysqld crashes. I have a few stack traces.

(my_print_stacktrace+0x2e)[0x901dae]
(handle_fatal_signal+0x491)[0x6c1d01]
(ull_get_key(unsigned char const*, unsigned long*, char)+0x14)[0x666614]
(my_hash_first_from_hash_value+0x77)[0x8e8db7]
(my_hash_search+0x11)[0x8e8f51]
(Item_func_release_lock::val_int()+0xb9)[0x66b719]
(Item::send(Protocol*, String*)+0x15c)[0x62a86c]
(Protocol::send_result_set_row(List<Item>*)+0xa9)[0x6bcc39]
(select_send::send_data(List<Item>&)+0x67)[0x708b67]
(JOIN::exec()+0x4d8)[0x717028]
(mysql_select(THD*, TABLE_LIST*, unsigned int, List<Item>&, Item*, SQL_I_List<st_order>*, SQL_I_List<st_order>*, Item*, unsigned long long, select_result*, st_select_lex_unit*, st_select_lex*)+0x265)[0x759fd5]
(handle_select(THD*, select_result*, unsigned long)+0x17d)[0x75a28d][0x5d62f3]
(mysql_execute_command(THD*)+0xc95)[0x735be5]
(Prepared_statement::execute(String*, bool)+0x345)[0x74bb05][0x74bc48]
(mysql_sql_stmt_execute(THD*)+0xba)[0x74c79a]
(mysql_execute_command(THD*)+0x17de)[0x73672e]
(mysql_parse(THD*, char*, unsigned int, Parser_state*)+0x598)[0x73b688][0x73b789]
(dispatch_command(enum_server_command, THD*, char*, unsigned int)+0x1163)[0x73cf63]
(do_command(THD*)+0x1e9)[0x73dc49]
(do_handle_one_connection(THD*)+0x167)[0x70efa7]
(handle_one_connection+0x47)[0x70f117]
(pfs_spawn_thread+0x143)[0xb5af53]

(my_print_stacktrace+0x2e)[0x901dae]
(handle_fatal_signal+0x491)[0x6c1d01]
(MDL_lock::Ticket_list::remove_ticket(MDL_ticket*)+0x11)[0x6b5241]
(MDL_lock::remove_ticket(MDL_lock::Ticket_list MDL_lock::*, MDL_ticket*)+0x48)[0x6b5df8]
(MDL_context::release_lock(enum_mdl_duration, MDL_ticket*)+0x1a)[0x6b696a]
(mysql_ull_cleanup(THD*)+0x49)[0x671049]
(THD::cleanup()+0xd0)[0x70ac90]
(THD::release_resources()+0x148)[0x70ae28]
(one_thread_per_connection_end(THD*, bool)+0x1d)[0x600c1d]
(do_handle_one_connection(THD*)+0xee)[0x70ef2e]
(handle_one_connection+0x47)[0x70f117]
(pfs_spawn_thread+0x143)[0xb5af53]

(my_print_stacktrace+0x2e)[0x901dae]
(handle_fatal_signal+0x491)[0x6c1d01]
(MDL_ticket::destroy(MDL_ticket*)+0xd)[0x6b4d0d]
(mysql_ull_cleanup(THD*)+0x49)[0x671049]
(THD::cleanup()+0xd0)[0x70ac90]
(THD::release_resources()+0x148)[0x70ae28]
(one_thread_per_connection_end(THD*, bool)+0x1d)[0x600c1d]
(do_handle_one_connection(THD*)+0xee)[0x70ef2e]
(handle_one_connection+0x47)[0x70f117]
(pfs_spawn_thread+0x143)[0xb5af53]

I have the same server, OS, and software on another node of the cluster, but everything there works normally.

How can I fix this?

Tags: i68136
Revision history for this message
ksp (kspahc) wrote :

It crashed again and I have a core dump:

gdb /usr/sbin/mysqld /home/corefiles/core.5340
(gdb) info threads
* 2 Thread 0x7f375ebe8700 (LWP 16681) 0x00000000008b8381 in THD::binlog_get_pending_rows_event (this=this@entry=0x7f36d20cb9e0, is_transactional=is_transactional@entry=true)
    at /mnt/workspace/percona-xtradb-cluster-5.6-debian-binary/label_exp/debian-wheezy-x64/percona-xtradb-cluster-5.6-5.6.25-25.12/sql/binlog.cc:7990

(gdb) thread 2
[Switching to thread 2 (Thread 0x7f375ebe8700 (LWP 16681))]
#0 0x00000000008b8381 in THD::binlog_get_pending_rows_event (this=this@entry=0x7f36d20cb9e0, is_transactional=is_transactional@entry=true)
    at /mnt/workspace/percona-xtradb-cluster-5.6-debian-binary/label_exp/debian-wheezy-x64/percona-xtradb-cluster-5.6-5.6.25-25.12/sql/binlog.cc:7990
7990 in /mnt/workspace/percona-xtradb-cluster-5.6-debian-binary/label_exp/debian-wheezy-x64/percona-xtradb-cluster-5.6-5.6.25-25.12/sql/binlog.cc

(gdb) info stack
#0 0x00000000008b8381 in THD::binlog_get_pending_rows_event (this=this@entry=0x7f36d20cb9e0, is_transactional=is_transactional@entry=true)
    at /mnt/workspace/percona-xtradb-cluster-5.6-debian-binary/label_exp/debian-wheezy-x64/percona-xtradb-cluster-5.6-5.6.25-25.12/sql/binlog.cc:7990

(gdb) list +
7985 in /mnt/workspace/percona-xtradb-cluster-5.6-debian-binary/label_exp/debian-wheezy-x64/percona-xtradb-cluster-5.6-5.6.25-25.12/sql/binlog.cc

(gdb) print binlog_get_pending_rows_event
$1 = {Rows_log_event *(const THD * const, bool)} 0x8b8380 <THD::binlog_get_pending_rows_event(bool) const>

Revision history for this message
Krunal Bauskar (krunal-bauskar) wrote :

Do you have a reproducible test case?
If you have one and you can share it, that would be good (please also share your my.cnf settings).

Also, I assume you have taken Galera's limitations into consideration. Given that MDL is involved, this is a locking-related limitation. Does your app use GET_LOCK/RELEASE_LOCK?
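
(For context, the typical GET_LOCK/RELEASE_LOCK pattern looks like the sketch below; the lock name and timeout are illustrative only, not taken from the reporter's application.)

-- Acquire a named, server-wide advisory lock, waiting up to 10 seconds;
-- returns 1 on success, 0 on timeout, NULL on error.
SELECT GET_LOCK('app_lock_name', 10);

-- ... application work protected by the lock ...

-- Returns 1 if released, 0 if the lock is held by another session,
-- NULL if no such lock exists.
SELECT RELEASE_LOCK('app_lock_name');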

Changed in percona-xtradb-cluster:
status: New → Incomplete
Revision history for this message
ksp (kspahc) wrote :

I can't reproduce the situation yet, but sometimes mysqld crashes with

Query (8291120): SELECT RELEASE_LOCK('OA_1.7260e4bd5dfd0a71c335310e686134e7c0057ca5')

and sometimes I get the message

2016-01-04 01:07:02 15235 [Note] WSREP: MDL conflict db=OA_1.7260e4bd5dfd0a71c335310e686134e7c0057ca5 table= ticket=6 solved by abort

My my.cnf

[mysqld]
core-file
wsrep_cluster_name=pxctest
wsrep_provider=/usr/lib/libgalera_smm.so
wsrep_provider_options ="gmcast.listen_addr=tcp://10.10.0.1:4567;"
wsrep_cluster_address=gcomm://10.10.0.1,10.10.0.2
wsrep_slave_threads=48
wsrep_sst_method=rsync
wsrep_notify_cmd=/usr/bin/galeranotify.py
binlog_format=ROW
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
datadir=/opt/mysql/newcluster/lib/mysql
bind-address = 10.10.0.1
skip-name-resolve
innodb_buffer_pool_size=40G
innodb_log_file_size=256M
innodb_log_buffer_size=4M
innodb_flush_log_at_trx_commit=2
innodb_thread_concurrency=48
innodb_flush_method=O_DIRECT
innodb_file_per_table
transaction-isolation=READ-COMMITTED
key_buffer_size=20G
query_cache_type = 1
query_cache_size = 64M
query_cache_limit = 16M
log-warnings=2
wsrep_debug=1
slow_query_log=1
log_queries_not_using_indexes
wsrep_log_conflicts=1
explicit_defaults_for_timestamp=1
max_connections=1000
wsrep_retry_autocommit=10
net_read_timeout=120
net_write_timeout=240
max_allowed_packet = 1024M
read_buffer_size = 32M
tmp_table_size = 1280M
max_heap_table_size = 1280M
[client]
socket=/var/run/mysqld/mysqld.sock

Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for Percona XtraDB Cluster because there has been no activity for 60 days.]

Changed in percona-xtradb-cluster:
status: Incomplete → Expired
Revision history for this message
Przemek (pmalkowski) wrote :

In Percona Server 5.6, we introduced:
https://www.percona.com/doc/percona-server/5.6/scalability/multiple_user_level_locks.html#multiple-user-level-locks
These now live in the MDL context, the same way as in MySQL 5.7:
http://dev.mysql.com/doc/refman/5.7/en/miscellaneous-functions.html#function_get-lock

The function *mysql_ull_cleanup* that we can see in the stack traces is not even present in MySQL Community 5.6:
$ grep -r 'mysql_ull_cleanup' mysql-5.6.30/

$ grep -r 'mysql_ull_cleanup' mysql-5.7.12/
mysql-5.7.12/sql/item_func.h:void mysql_ull_cleanup(THD *thd);
mysql-5.7.12/sql/item_func.cc:void mysql_ull_cleanup(THD *thd)
mysql-5.7.12/sql/item_func.cc: DBUG_ENTER("mysql_ull_cleanup");
mysql-5.7.12/sql/sql_class.cc: mysql_ull_cleanup(this);

$ grep -r 'mysql_ull_cleanup' Percona-XtraDB-Cluster-5.6.27-76.0/
Percona-XtraDB-Cluster-5.6.27-76.0/sql/item_func.h:void mysql_ull_cleanup(THD *thd);
Percona-XtraDB-Cluster-5.6.27-76.0/sql/wsrep_thd.cc: mysql_ull_cleanup(thd);
Percona-XtraDB-Cluster-5.6.27-76.0/sql/item_func.cc:void mysql_ull_cleanup(THD *thd)
Percona-XtraDB-Cluster-5.6.27-76.0/sql/item_func.cc: DBUG_ENTER("mysql_ull_cleanup");
Percona-XtraDB-Cluster-5.6.27-76.0/sql/sql_class.cc: mysql_ull_cleanup(this);

So in addition to the fact that user locks are not supported in Galera:
http://galeracluster.com/documentation-webpages/limitations.html#table-locking
https://blueprints.launchpad.net/codership-mysql/+spec/get-lock-support

they are now also more complex in PXC 5.6.19+, which I think may lead to these crashes.
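
As a hypothetical illustration of the behavioral change described in those docs (the lock names are made up): with multiple user-level locks, one session can hold several named locks at once, whereas stock MySQL 5.6 implicitly released the previous lock on every new GET_LOCK call.

SELECT GET_LOCK('lock_a', 0);  -- returns 1
SELECT GET_LOCK('lock_b', 0);  -- returns 1; 'lock_a' stays held here,
                               -- while stock 5.6 would have released it
SELECT IS_USED_LOCK('lock_a'); -- returns this connection's id: still held
SELECT RELEASE_LOCK('lock_b'); -- each lock must now be released explicitly
SELECT RELEASE_LOCK('lock_a');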

tags: added: i68136
Revision history for this message
Przemek (pmalkowski) wrote :

New example crash traces, which occurred on PXC 5.6.24-72.2-56:

Thread pointer: 0x1e579ee0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f7cf4e7cd38 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x35)[0x8fd375]
/usr/sbin/mysqld(handle_fatal_signal+0x4b4)[0x666264]
/lib64/libpthread.so.0(+0xf790)[0x7f8199298790]
/usr/sbin/mysqld(_ZN8MDL_lock11Ticket_list13remove_ticketEP10MDL_ticket+0x3e)[0x6559ce]
/usr/sbin/mysqld(_ZN8MDL_lock13remove_ticketEMS_NS_11Ticket_listEP10MDL_ticket+0x93)[0x658533]
/usr/sbin/mysqld(_ZN11MDL_context12release_lockE17enum_mdl_durationP10MDL_ticket+0x1a)[0x6585da]
/usr/sbin/mysqld(_Z17mysql_ull_cleanupP3THD+0x49)[0x5fbdf9]
/usr/sbin/mysqld(_ZN3THD7cleanupEv+0xd0)[0x6b66a0]
/usr/sbin/mysqld(_ZN3THD17release_resourcesEv+0x288)[0x6b7268]
/usr/sbin/mysqld(_Z29one_thread_per_connection_endP3THDb+0x2e)[0x584a9e]
/usr/sbin/mysqld(_Z24do_handle_one_connectionP3THD+0x101)[0x6bd251]
/usr/sbin/mysqld(handle_one_connection+0x47)[0x6bd4b7]
/usr/sbin/mysqld(pfs_spawn_thread+0x12a)[0xaf978a]
/lib64/libpthread.so.0(+0x7a51)[0x7f8199290a51]
/lib64/libc.so.6(clone+0x6d)[0x7f81977949ad]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0): is an invalid pointer
Connection ID (thread ID): 12129117
Status: KILL_CONNECTION

-------

Thread pointer: 0x1d3c5a00
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7ff99c41dd38 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x35)[0x8fd375]
/usr/sbin/mysqld(handle_fatal_signal+0x4b4)[0x666264]
/lib64/libpthread.so.0(+0xf790)[0x7ff9d768b790]
/usr/sbin/mysqld(_ZN11MDL_context12release_lockE17enum_mdl_durationP10MDL_ticket+0x39)[0x6585f9]
/usr/sbin/mysqld(_Z17mysql_ull_cleanupP3THD+0x49)[0x5fbdf9]
/usr/sbin/mysqld(_ZN3THD7cleanupEv+0xd0)[0x6b66a0]
/usr/sbin/mysqld(_ZN3THD17release_resourcesEv+0x288)[0x6b7268]
/usr/sbin/mysqld(_Z29one_thread_per_connection_endP3THDb+0x2e)[0x584a9e]
/usr/sbin/mysqld(_Z24do_handle_one_connectionP3THD+0x101)[0x6bd251]
/usr/sbin/mysqld(handle_one_connection+0x47)[0x6bd4b7]
/usr/sbin/mysqld(pfs_spawn_thread+0x12a)[0xaf978a]
/lib64/libpthread.so.0(+0x7a51)[0x7ff9d7683a51]
/lib64/libc.so.6(clone+0x6d)[0x7ff9d5b8793d]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0): is an invalid pointer
Connection ID (thread ID): 12608838
Status: KILL_CONNECTION

Revision history for this message
Sveta Smirnova (svetasmirnova) wrote :

Krunal,

the status of this report is "In Progress". Were you able to reproduce it?

Revision history for this message
Przemek (pmalkowski) wrote :

Just wanted to point out the exact similarities between the crashes in the initial report and the examples I posted above.
First, all of the crashes reference user-level lock (ull) functions (mysql_ull_cleanup, ull_get_key, etc.), and most of them reference MDL locks too.

Here are the very similar parts of the second crash from the initial report and the PXC 5.6.24 crash I mentioned above:

(MDL_lock::Ticket_list::remove_ticket(MDL_ticket*)+0x11)[0x6b5241]
(MDL_lock::remove_ticket(MDL_lock::Ticket_list MDL_lock::*, MDL_ticket*)+0x48)[0x6b5df8]
(MDL_context::release_lock(enum_mdl_duration, MDL_ticket*)+0x1a)[0x6b696a]
(mysql_ull_cleanup(THD*)+0x49)[0x671049]
vs
/usr/sbin/mysqld(_ZN8MDL_lock11Ticket_list13remove_ticketEP10MDL_ticket+0x3e)[0x6559ce]
/usr/sbin/mysqld(_ZN8MDL_lock13remove_ticketEMS_NS_11Ticket_listEP10MDL_ticket+0x93)[0x658533]
/usr/sbin/mysqld(_ZN11MDL_context12release_lockE17enum_mdl_durationP10MDL_ticket+0x1a)[0x6585da]
/usr/sbin/mysqld(_Z17mysql_ull_cleanupP3THD+0x49)[0x5fbdf9]

Revision history for this message
Krunal Bauskar (krunal-bauskar) wrote :

Sveta,

No, I was not able to reproduce it when I tried last time.
I am not sure who set the status to In Progress (going by the timestamp, it seems it was set on the day of filing).

Revision history for this message
Sveta Smirnova (svetasmirnova) wrote :

Thank you for the update, Krunal.

Status changed to the appropriate one: "New".

Revision history for this message
Sveta Smirnova (svetasmirnova) wrote :

I tried to repeat the bug today, without luck.

Anyone who can still reproduce it: please try to re-create the sequence of actions you are using. How do you use GET_LOCK/RELEASE_LOCK? Do you use locks with the same name on different nodes? What do you do between the GET_LOCK and RELEASE_LOCK calls?
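
As a purely illustrative sketch of the kind of sequence that would help (the node roles and lock name are assumptions, not taken from any report here):

-- On node 1:
SELECT GET_LOCK('repro_lock', 10);   -- expect 1

-- On node 2, while node 1 still holds the lock; user-level locks are
-- not replicated by Galera, so each node grants them independently
-- and this is also expected to return 1 rather than block:
SELECT GET_LOCK('repro_lock', 10);

-- Back on node 1:
SELECT RELEASE_LOCK('repro_lock');   -- expect 1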

Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports, so this bug report has been migrated to: https://jira.percona.com/browse/PXC-1860
