Server deadlock while DDL
Affects | Status | Importance | Assigned to | Milestone | ||
---|---|---|---|---|---|---|
MySQL patches by Codership | Status tracked in 5.6 | |||||
| 5.5 |
Undecided
|
Seppo Jaakola | |||
| 5.6 |
Undecided
|
Teemu Ollakka | |||
Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC | Status tracked in 5.6 | |||||
| 5.5 |
Fix Released
|
Undecided
|
Unassigned | ||
| 5.6 |
Fix Released
|
Undecided
|
Unassigned |
Bug Description
TOI-ased DDL causes server hang. I will attach the backtrace.
Related branches
Backtrace for PXC 5.5.33
Produced from
perl runall-new.pl --basedir=/chip --galera=ms --reporter=
The yy and zz files attached.
Thread apply all bt full of codership 5.5.32
Log of the command line.
GDB backtrace of node1
GDB backtrace for node2.
I have added backtrace for nodes 1 and 2 - this hang happens on node1 when DDL is added to the grammar. Without it, it works fine.
runall log for previous gdb backtraces.
From the trace it looks like, few threads are in galera_pre_commit, 2-3 in MDL_context:
summary: |
- Server hung due to DDL + Server deadlock while DDL |
Grammar files.
Changed in percona-xtradb-cluster: | |
milestone: | none → 5.5.33-23.7.6 |
Wrt. table1000_
=======
Thread 14 (Thread 0x7f0570404700 (LWP 29937)):
#0 0x00007f05832c003f in pthread_
#1 0x00007f0580dbeb25 in wait (this=0x7f05703
#2 galera:
#3 0x00007f0580db978a in galera:
#4 0x00007f0580dcd074 in galera_
#5 0x000000000068370b in wsrep_TOI_begin (table_list=0x0, table_=
#6 wsrep_to_
#7 0x000000000060fe24 in mysql_create_
#8 0x000000000059f999 in mysql_execute_
#9 0x00000000005a6e41 in mysql_parse (thd=thd@
#10 0x00000000005a7ab3 in wsrep_mysql_parse (thd=thd@
#11 0x00000000005a8a89 in dispatch_command (command=
#12 0x00000000005aa994 in do_command (thd=0x2840900) at /media/
#13 0x000000000065834a in do_handle_
#14 0x000000000065857e in handle_
Backtrace attached.
node0 vardir with logs.
GDB backtrace.
With wsrep_debug it crashed as:
.........
130827 17:26:43 [Note] WSREP: BF Aborted, thd: 25 is_AC: 0, retry: 0 - 1 SQL: UPDATE IGNORE `table100_
_char_
InnoDB: DEBUG: update_statistics for test/table1000_
InnoDB: DEBUG: update_statistics for test/table1000_
130827 17:26:43 [ERROR] WSREP: galera/
11:56:43 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona Server better by reporting any
bugs at http://
key_buffer_
read_buffer_
max_used_
max_threads=153
thread_count=19
connection_
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_
Hope that's ok; if not, decrease some variables in the equation.
Thread pointer: 0x2b9d730
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7ff73c0e1e68 thread_stack 0x40000
/pxc/bin/
/pxc/bin/
/usr/lib/
/usr/lib/
/usr/lib/
/pxc/lib/
/pxc/lib/
/pxc/lib/
/pxc/bin/
/pxc/bin/
/pxc/bin/
/pxc/bin/
/pxc/bin/
/pxc/bin/
/pxc/bin/
/pxc/bin/
/pxc/bin/
/pxc/bin/
/pxc/bin/
/usr/lib/
/usr/lib/
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7ff6d0004c10): DROP TABLE IF EXISTS b
Connection ID (thread ID...
GDB trace of when it is hung, mysql.err to follow.
mysql error log.
Attached in #17 and #18, the logs for
perl runall-new.pl --basedir=/pxc --galera=ms --reporter=
# 2013-08-27T17:41:22 [575] Using Log::Log4perl
# 2013-08-27T17:41:22 [575] Copyright (c) 2010,2011 Oracle and/or its affiliates. All rights reserved. Use is subject to license terms.
# 2013-08-27T17:41:22 [575] Please see http://
# 2013-08-27T17:41:22 [575] Starting
# 2013-08-27T17:41:22 [575] # runall-new.pl \
# 2013-08-27T17:41:22 [575] # --basedir=/pxc \
# 2013-08-27T17:41:22 [575] # --galera=ms \
# 2013-08-27T17:41:22 [575] # --reporter=
# 2013-08-27T17:41:22 [575] # --grammar=
# 2013-08-27T17:41:22 [575] # --gendata=
# 2013-08-27T17:41:22 [575] # --threads=16 \
# 2013-08-27T17:41:22 [575] # --queries=1000 \
# 2013-08-27T17:41:22 [575] # --mysqld=
# 2013-08-27T17:41:22 [575] # --mysqld=
# 2013-08-27T17:41:22 [575] # --mysqld=
# 2013-08-27T17:41:22 [575] # --mysqld=
# 2013-08-27T17:41:22 [575] master_port : 19300 slave_port : 19302 ports : 19300 19302 MTR_BUILD_THREAD : 930
# 2013-08-27T17:41:23 [575] /media/
# 2013-08-27T17:41:23 [575] /media/
# 2013-08-27T17:41:23 [575] /pxc does not look like a bzr branch, cannot get revision info.
# 2013-08-27T17:41:23 [575] Creating MySQL 5.5.33 database at /pxc/mysql-
# 2013-08-27T17:41:31 [575] Creating MySQL 5.5.33 database at /pxc/mysql-
# 2013-08-27T17:41:38 [575] MySQL Version:5.5.33
# 2013-08-27T17:41:38 [575] Binary: /pxc/bin/mysqld
# 2013-08-27T17:41:38 [575] Type: Debug
# 2013-08-27T17:41:38 [575] Datadir: /pxc/mysql-
# 2013-08-27T17:41:38 [575] Tmpdir: /pxc/mysql-
# 2013-08-27T17:41:38 [575] Corefile: /pxc/mysql-
# 2013-08-27T17:41:38 [575] Starting MySQL 5.5.33: "/pxc/bin/mysqld" "--no-defaults" "--basedir=/pxc" "--datadir=
vardir containing mysql error and general log of node0 and node1
Noticed
"
130827 20:31:21 [Note] [Debug] WSREP: galera/
"
in the logs.
Changed in codership-mysql: | |
assignee: | nobody → Seppo Jaakola (seppo-jaakola) |
status: | New → In Progress |
Seppo Jaakola (seppo-jaakola) wrote : | #22 |
The problem seems to happen after a COMMIT processing was interrupted (e.g. certification failure). When such commit processing returns, it does not always release transactional MDL locks. So some MDL locks will remain even for the next queries in the connection.
And later some DDL processor get blocked because of the MDL lock.
Seppo Jaakola (seppo-jaakola) wrote : | #23 |
A potential fix for releasing the transactional MDL locks was pushed in revision: http://
I am still able to reproduce with ROW_TYPE_COMPRESSED or ROW_TYPE_DYNAMIC (in table formats). Attached the trace.
Log of
perl runall-new.pl --basedir=/pxc --galera=ms --threads=16 --queries=50000 9.26s user 8.95s system 5% cpu 5:10.36 total
Archie% perl runall-new.pl --basedir=/pxc --galera=ms --reporter=
# 2013-08-30T14:15:57 [1169] Using Log::Log4perl
# 2013-08-30T14:15:57 [1169] Copyright (c) 2010,2011 Oracle and/or its affiliates. All rights reserved. Use is subject to license terms.
# 2013-08-30T14:15:57 [1169] Please see http://
# 2013-08-30T14:15:57 [1169] Starting
# 2013-08-30T14:15:57 [1169] # runall-new.pl \
# 2013-08-30T14:15:57 [1169] # --basedir=/pxc \
# 2013-08-30T14:15:57 [1169] # --galera=ms \
# 2013-08-30T14:15:57 [1169] # --reporter=
# 2013-08-30T14:15:57 [1169] # --grammar=
# 2013-08-30T14:15:57 [1169] # --gendata=
# 2013-08-30T14:15:57 [1169] # --threads=16 \
# 2013-08-30T14:15:57 [1169] # --queries=50000 \
# 2013-08-30T14:15:57 [1169] # --mysqld=
# 2013-08-30T14:15:57 [1169] # --mysqld=
# 2013-08-30T14:15:57 [1169] # --mysqld=
# 2013-08-30T14:15:57 [1169] # --mysqld=
# 2013-08-30T14:15:57 [1169] master_port : 19300 slave_port : 19302 ports : 19300 19302 MTR_BUILD_THREAD : 930
# 2013-08-30T14:15:57 [1169] /media/
# 2013-08-30T14:15:57 [1169] /media/
# 2013-08-30T14:15:57 [1169] /pxc does not look like a bzr branch, cannot get revision info.
# 2013-08-30T14:15:57 [1169] Creating MySQL 5.5.33 database at /pxc/mysql-
# 2013-08-30T14:16:04 [1169] Creating MySQL 5.5.33 database at /pxc/mysql-
# 2013-08-30T14:16:10 [1169] MySQL Version:5.5.33
# 2013-08-30T14:16:10 [1169] Binary: /pxc/bin/mysqld
# 2013-08-30T14:16:10 [1169] Type: Debug
# 2013-08-30T14:16:10 [1169] Datadir: /pxc/mysql-
# 2013-08-30T14:16:10 [1169] Tmpdir: /pxc/mysql-
# 2013-08-30T14:16:10 [1169] Corefile: /pxc/mysql-
# 2013-08-30T14:16:10 [1169] Starting MySQL 5.5.33: "/pxc/bin/mysqld" "--no-defaults" "--basedir=/pxc" "--datadir=
Transition crash while
=======
perl runall-new.pl --basedir=/pxc --galera=ms --reporter=
# 2013-08-30T17:51:21 [48533] Using Log::Log4perl
# 2013-08-30T17:51:21 [48533] Copyright (c) 2010,2011 Oracle and/or its affiliates. All rights reserved. Use is subject to license terms.
# 2013-08-30T17:51:21 [48533] Please see http://
# 2013-08-30T17:51:21 [48533] Starting
# 2013-08-30T17:51:21 [48533] # runall-new.pl \
# 2013-08-30T17:51:21 [48533] # --basedir=/pxc \
# 2013-08-30T17:51:21 [48533] # --galera=ms \
# 2013-08-30T17:51:21 [48533] # --reporter=
# 2013-08-30T17:51:21 [48533] # --grammar=
# 2013-08-30T17:51:21 [48533] # --gendata=
# 2013-08-30T17:51:21 [48533] # --threads=16 \
# 2013-08-30T17:51:21 [48533] # --queries=50000 \
# 2013-08-30T17:51:21 [48533] # --mysqld=
# 2013-08-30T17:51:21 [48533] # --mysqld=
# 2013-08-30T17:51:21 [48533] # --mysqld=
# 2013-08-30T17:51:21 [48533] # --mem
# 2013-08-30T17:51:21 [48533] master_port : 19300 slave_port : 19302 ports : 19300 19302 MTR_BUILD_THREAD : 930
# 2013-08-30T17:51:21 [48533] /media/
# 2013-08-30T17:51:21 [48533] /media/
# 2013-08-30T17:51:21 [48533] /pxc does not look like a bzr branch, cannot get revision info.
# 2013-08-30T17:51:21 [48533] Creating MySQL 5.5.33 database at /pxc/mysql-
# 2013-08-30T17:51:28 [48533] Creating MySQL 5.5.33 database at /pxc/mysql-
# 2013-08-30T17:51:45 [48533] MySQL Version:5.5.33
# 2013-08-30T17:51:45 [48533] Binary: /pxc/bin/mysqld
# 2013-08-30T17:51:45 [48533] Type: Debug
# 2013-08-30T17:51:45 [48533] Datadir: /pxc/mysql-
# 2013-08-30T17:51:45 [48533] Tmpdir: /pxc/mysql-
# 2013-08-30T17:51:45 [48533] Corefile: /pxc/mysql-
# 2013-08-30T17:51:45 [48533] Starting MySQL 5.5.33: "/pxc/bin/mysqld" "--no-defaults" "--basedir=/pxc" "--datadir=
Based on the logs, it looks like a DDL MDL conflict may be
causing this.
130902 22:13:29 [Note] WSREP: BF kill (1, seqno: 5216), victim: (25) trx: 7836
130902 22:13:29 [Note] WSREP: Aborting query: SET AUTOCOMMIT=ON
130902 22:13:29 [Note] WSREP: victim 7836 in state 7
130902 22:13:29 [Note] WSREP: MDL conflict db=test table=r ticket=4 solved by abort
130902 22:13:29 [Note] WSREP: MDL conflict
request: (14 seqno 5216 wsrep (2, 1, 0) cmd 3 9 DROP TABLE IF EXISTS r)
granted: (25 seqno 5220 wsrep (0, 1, 7) cmd 3 31 SET AUTOCOMMIT=ON)
130902 22:13:29 [Note] WSREP: MDL ticket: type: shared write space: TABLE db: test name: r (Waiting for table metadata lock)
130902 22:13:29 [Note] WSREP: DROP caused BF abort
130902 22:13:29 [Note] WSREP: MDL ticket: type: shared write space: TABLE db: test name: r (Waiting for table metadata lock)
130902 22:13:29 [Note] WSREP: wsrep_abort_thd, by: 140481475598080, victim: 140481472669440
130902 22:13:29 [Note] WSREP: abort transaction: BF: DROP TABLE IF EXISTS r victim: SET AUTOCOMMIT=ON
130902 22:13:29 [Note] WSREP: cluster conflict due to high priority abort for threads:
130902 22:13:29 [Note] WSREP: Winning thread:
THD: 14, mode: total order, state: executing, conflict: no conflict, seqno: 5216
SQL: DROP TABLE IF EXISTS r
130902 22:13:29 [Note] WSREP: Victim thread:
THD: 25, mode: local, state: executing, conflict: cert failure, seqno: 5220
SQL: SET AUTOCOMMIT=ON
130902 22:13:29 [Note] WSREP: BF kill (1, seqno: 5216), victim: (25) trx: 7836
130902 22:13:29 [Note] WSREP: Aborting query: SET AUTOCOMMIT=ON
130902 22:13:29 [Note] WSREP: victim 7836 in state 7
130902 22:13:29 [Note] WSREP: MDL victim killed at add_ticket
130902 22:13:29 [Note] WSREP: cert failure, thd: 25 is_AC: 0, retry: 0 - 1 SQL: SET AUTOCOMMIT=ON
130902 22:13:29 [Note] WSREP: TO BEGIN: 5210, 0 : ALTER TABLE n RENAME `table1_
130902 22:13:29 [Note] WSREP: TO BEGIN: 5211, 0 : DROP TABLE `table1_
130902 22:13:29 [Note] WSREP: TO BEGIN: 5213, 0 : CREATE TABLE n LIKE `table1_
130902 22:13:29 [Note] WSREP: TO BEGIN: 5158, 0 : DROP TABLE IF EXISTS t
130902 22:13:29 [Note] WSREP: empty rbr buffer, query: DROP TABLE IF EXISTS c
130902 22:13:29 [Note] WSREP: TO BEGIN: 5214, 0 : DROP TABLE IF EXISTS c
InnoDB: DEBUG: update_statistics for test/table1000_
InnoDB: DEBUG: update_statistics for test/table1000_
130902 22:13:29 [Note] WSREP: TO BEGIN: 5154, 0 : DROP TABLE IF EXISTS l
130902 22:14:20 [ERROR] /pxc/bin/mysqld: Lock wait timeout exceeded; try restarting transaction
130902 22:14:20 [ERROR] /pxc/bin/mysqld: Lock wait timeout exceeded; try restarting transaction
130902 22:14:20 [ERROR] /pxc/bin/mysqld: Sort aborted: Lock wait timeout exceeded; try restarting transaction
130902 22:14:20 [ERROR] /pxc/bin/mysqld: Sort aborted: Lock wait timeout exceeded; try restarting transaction
130902 22:14:20 [Note] WSREP: TO BEGIN: 5212, 0 : DROP TABLE IF EXISTS y
130902 22:14:20 [Note] WSREP: TO BEGIN: 5215, 0 : DROP TABLE IF EXISTS h
I...
Following grammars:
a)
$tables = {
partitions => [ undef , 'KEY (pk) PARTITIONS 2' ],
engines => [ 'InnoDB' ],
pk => [ 'int', 'int auto_increment' ],
rows => [ 1, 100, 1000 ],
row_formats => ['COMPRESSED'
};
$fields = {
types => [ 'int', 'char(12)'],
indexes => [undef, 'key' ]
};
$data = {
numbers => [ 'digit', 'null', undef ],
strings => [ 'letter', 'english', 'char(12)' ]
}
b)
$tables = {
partitions => [ undef , 'KEY (pk) PARTITIONS 2' ],
engines => [ 'InnoDB' ],
pk => [ 'int', 'int auto_increment' ],
rows => [ 1, 100, 1000 ],
row_formats => ['DEFAULT']
};
$fields = {
types => [ 'int', 'char(12)', 'varchar', 'date', 'bigint', 'enum' ],
indexes => [undef, 'key' ],
charsets => ['utf8','latin1']
};
$data = {
numbers => [ 'digit', 'null', undef ],
temporals => ['date', 'year', 'null', undef ],
enum => [ 'letter' ],
strings => [ 'letter', 'english', 'char(12)', 'varchar(246)', 'varchar(100)' ]
}
still hang the server (after latest fix).
If timeout is used, everything works fine (till now).
After latest commit, the combined grammar of
$tables = {
partitions => [ undef , 'KEY (pk) PARTITIONS 2' ],
engines => [ 'InnoDB' ],
pk => [ 'int', 'int auto_increment' ],
rows => [ 1, 100, 1000 ],
row_formats => ['COMPRESSED'
};
$fields = {
types => [ 'int', 'char(12)', 'varchar', 'date', 'bigint', 'enum' ],
indexes => [undef, 'key' ],
charsets => ['utf8','latin1']
};
$data = {
numbers => [ 'digit', 'null', undef ],
temporals => ['date', 'year', 'null', undef ],
enum => [ 'letter' ],
strings => [ 'letter', 'english', 'char(12)', 'varchar(246)', 'varchar(100)' ]
}
hangs.
Combination (b) doesn't
Combination (a) causes crash - reported here - lp:1123233
Seppo Jaakola (seppo-jaakola) wrote : | #30 |
Pushed fixes for releasing MDL after failed BEGIN/START TRANSACTION and "SET AUTOCOMMIT". Fix is in revision: http://
No hangs after last fix. Related details also posted here https:/
Changed in percona-xtradb-cluster: | |
status: | New → In Progress |
I re-added temporary table since https:/
=======
Running with galera grammar (yy and zz attached).
# 2013-09-16T09:50:01 [20538] Copyright (c) 2010,2011 Oracle and/or its affiliates. All rights reserved. Use is subject to license terms.
# 2013-09-16T09:50:01 [20538] Please see http://
# 2013-09-16T09:50:01 [20538] Starting
# 2013-09-16T09:50:01 [20538] # runall-new.pl \
# 2013-09-16T09:50:01 [20538] # --basedir=
# 2013-09-16T09:50:01 [20538] # --galera=ms \
# 2013-09-16T09:50:01 [20538] # --reporter=
# 2013-09-16T09:50:01 [20538] # --grammar=
# 2013-09-16T09:50:01 [20538] # --gendata=
# 2013-09-16T09:50:01 [20538] # --threads=16 \
# 2013-09-16T09:50:01 [20538] # --queries=50000 \
# 2013-09-16T09:50:01 [20538] # --mysqld=
# 2013-09-16T09:50:01 [20538] # --seed=time \
# 2013-09-16T09:50:01 [20538] # --mysqld=
# 2013-09-16T09:50:01 [20538] master_port : 19300 slave_port : 19302 ports : 19300 19302 MTR_BUILD_THREAD : 930
# 2013-09-16T09:50:02 [20538] /mnt/workspace/
# 2013-09-16T09:50:02 [20538] /mnt/workspace/
# 2013-09-16T09:50:02 [20538] /mnt/workspace/
# 2013-09-16T09:50:02 [20538] Creating MySQL 5.5.33 database at /mnt/workspace/
# 2013-09-16T09:50:08 [20538] Creating MySQL 5.5.33 database at /mnt/workspace/
# 2013-09-16T09:50:20 [20538] MySQL Version:5.5.33
# 2013-09-16T09:50:20 [20538] Binary: /mnt/workspace/
# 2013-09-16T09:50:20 [20538] Type: Release
# 2013-09-16T09:50:20 [20538] Datadir: /mnt/workspace/
# 2013-09-16T09:50:20 [20538] Tmpdir: /mnt/workspace/
Changed in percona-xtradb-cluster: | |
milestone: | 5.5.33-23.7.6 → future-5.5 |
I see deadlock with 5.6 tree as well, so the fix will need to ported there.
Teemu Ollakka (teemu-ollakka) wrote : | #34 |
Merged lp:~codership/codership-mysql/5.5-23 revs 3097, 3012 to lp:codership-mysql/5.6 (rev 3969).
See also https:/
Teemu Ollakka (teemu-ollakka) wrote : | #35 |
Percona now uses JIRA for bug reports so this bug report is migrated to: https:/
Backtrace/gdb bt of codership 5.5.32