Print locking transaction which cause "Lock wait timeout" errors into log file

Bug #1657737 reported by Sveta Smirnova on 2017-01-19
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
MySQL Server
Unknown
Unknown
Percona Server moved to https://jira.percona.com/projects/PS
Status tracked in 5.7
5.5
Won't Fix
Undecided
Unassigned
5.6
Fix Released
Wishlist
Vlad Lesin
5.7
Fix Released
Wishlist
Vlad Lesin

Bug Description

Description:
Currently InnoDB has option innodb_print_all_deadlocks which tells InnoDB to redirect information about all deadlocks into log file. Please implement similar feature, but for "Lock wait timeout" errors. Print information about both failed and blocking transactions.

Since blocking transaction is often multiple statement transaction, if possible, print all statements. You may print all statements depending if historical Performance Schema instrumentation for statements enabled or not: in this case server already stored statements, participated in the transaction.

How to repeat:
Ask your colleague to run high concurrent load which produces "Lock wait timeout" errors, try to find reason for them.

TokuDB can do this if option tokudb_lock_timeout_debug is set

tags: added: upstream
tags: added: sfr-92
Vlad Lesin (vlad-lesin) wrote :

The above implementation is merged to trunk, but the task is not closed because the necessity of audit plugin using as well as necessity of some additional information requesting from I_S and PFS is still under the question.

Vlad, please set the milestones too

Vlad Lesin (vlad-lesin) wrote :

Laurynas, done.

Vlad Lesin (vlad-lesin) wrote :

This is my draft implementation with blocking statements history and output with audit log plugin instead off error log: https://github.com/vlad-lesin/percona-server/tree/lwt-bug-lp1657737-5.7

The audit log plugins needs fixes to have the ability to filter out only the certain kinds of events. Currently audit_log_policy is mutually exclusive, except "ALL". It's highly desirable to have the ability to output, for example, QUERIES+LOCK messages.

Vlad Lesin (vlad-lesin) wrote :

A little FAQ about the current implementation(https://github.com/percona/percona-server/pull/1970, https://github.com/percona/percona-server/pull/1971).

Q: How to turn on the feature output?
A: There is new global boolean variable print_lock_wait_timeout_info which turns on the functionality. It is turned off by default.

Q: How to get query with query id from error log output?
A: The query id can be seen in "show process list" output. Besides query id is logged in general query log which can be turned on with "log=/path_to/mysql_query.log" option. See also https://dev.mysql.com/doc/refman/5.7/en/query-log.html.

Q: What is thread_id in error log output?
A: This is mysql thread id which is shown in "show processlist" results.

Q: What the relation between PFS thread id and thread id which is output to error log by the feature?
A: PFS has their own thread id space. Our thread id corresponds to "PROCESSLIST_ID" field in performance_schema.events_statements_current. If we have blocking thread id then we can get blocking thread statements history with the following query:

   SELECT s.SQL_TEXT FROM performance_schema.events_statements_history s
   INNER JOIN performance_schema.threads t ON t.THREAD_ID = s.THREAD_ID
   WHERE t.PROCESSLIST_ID = %d
   UNION
   SELECT s.SQL_TEXT FROM performance_schema.events_statements_current s
   INNER JOIN performance_schema.threads t ON t.THREAD_ID = s.THREAD_ID
   WHERE t.PROCESSLIST_ID = %d;

Where PROCESSLIST_ID is thread id from error log output.

Vlad Lesin (vlad-lesin) wrote :

The question is why don't output blocking transaction query string instead of or along with query id.

Let's consider the code from this https://github.com/percona/percona-server/pull/1971 pull request.

This is the place in the code where the list of blocking transactions is obtained:

Thread 28 "mysqld" hit Breakpoint 2, lock_wait_suspend_thread (
    thr=0x7fffa4010418)
    at ./storage/innobase/lock/lock0wait.cc:323
323 blocking[blocking_count].trx_id = lock_get_trx_id(curr_lock);
(gdb) l
318 lock_queue_iterator_reset(&iter, wait_lock, ULINT_UNDEFINED);
319 for (curr_lock = lock_queue_iterator_get_prev(&iter);
320 curr_lock != NULL;
321 curr_lock = lock_queue_iterator_get_prev(&iter)) {
322 if (lock_has_to_wait(trx->lock.wait_lock, curr_lock)) {
323 blocking[blocking_count].trx_id = lock_get_trx_id(curr_lock);
324 blocking[blocking_count].thread_id =
325 curr_lock->trx->mysql_thd ?
326 thd_get_thread_id(curr_lock->trx->mysql_thd) : 0;
327 blocking[blocking_count].query_id =

If we look at the THD object of blocking transaction we will see that it does not contain query string:

(gdb) p curr_lock->trx->mysql_thd->m_query_string
$2 = {str = 0x0, length = 0}

The question is why? On what exactly step the string was cleaned up? Let's look on the following stack trace:

Thread 27 "mysqld" hit Breakpoint 6, THD::reset_query (this=0x7fffa0128060) at ./sql/sql_class.h:4564
4564 void reset_query() { set_query(LEX_CSTRING()); }
(gdb) bt
#0 THD::reset_query (this=0x7fffa0128060) at ./sql/sql_class.h:4564
#1 0x000000000164747b in dispatch_command (thd=0x7fffa0128060, com_data=0x7ffff11c2de0, command=COM_QUERY)
    at ./sql/sql_parse.cc:1946
#2 0x0000000001644b80 in do_command (thd=0x7fffa0128060) at ./sql/sql_parse.cc:1021
#3 0x0000000001796f22 in handle_connection (arg=0x3949180) at ./sql/conn_handler/connection_handler_per_thread.cc:312
#4 0x0000000001e7e5ad in pfs_spawn_thread (arg=0x3962a00) at ./storage/perfschema/pfs.cc:2188
#5 0x00007ffff6f5e6ba in start_thread (arg=0x7ffff11c3700) at pthread_create.c:333
#6 0x00007ffff63f33dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

The query string is cleaned up after the statement result is sent. So it can't be accessed from the place where the list of blocking transactions is obtained.

The query string can be obtained from performance_schema.events_statements_current table. This requires additional code to execute query inside of the server(what is preferable for better portability) or get access to internal PFS data directly, what is easier, but can require PFS code patching and increases code cohesion.

Vlad Lesin (vlad-lesin) wrote :

In the above FAQ please change print_lock_wait_timeout_info to innodb_print_lock_wait_timeout_info.

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-255

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.