Statistics collected in query response time tables are incorrect

Bug #1639808 reported by Maciej Dobrzanski
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Percona Server moved to https://jira.percona.com/projects/PS
Status tracked in 5.7
5.5
Won't Fix
Medium
Unassigned
5.6
Triaged
Medium
Unassigned
5.7
Triaged
Medium
Unassigned

Bug Description

Query Response Time feature/plug-in provides aggregated query response time metrics through a series of tables. By definition response time is the entire time it takes a system to react to a given input, which in this case is a database query started by a user or an application. However, it's not what query_response_time tables collect and aggregate.

Depending on Percona Server version, the times sent for aggregation are collected in the following ways:

1. QRT plug-in in MySQL 5.6+

static void query_response_time_audit_notify(MYSQL_THD thd,
                                             unsigned int event_class,
                                             const void *event)
{
  ..
      query_response_time_collect(query_type,
                                  thd->utime_after_query -
                                  thd->utime_after_lock);

2. Built-in QRT in MySQL 5.5 ==

static inline ulonglong get_query_exec_time(THD *thd, ulonglong cur_utime)
{
  ..
  res= cur_utime - thd->utime_after_lock;
  ..
  return res;
}

void log_slow_statement(THD *thd)
{
  ..
  ulonglong query_exec_time= get_query_exec_time(thd, end_utime_of_query);
#ifdef HAVE_RESPONSE_TIME_DISTRIBUTION
  if (opt_query_response_time_stats)
  {
    query_response_time_collect(query_exec_time);
  }
#endif

In both cases query response time is calculated as a timestamp difference between the query end (end_utime_of_query, thd->utime_after_query) and the moment when it successfully got past table locks and was free to execute (thd->utime_after_lock). This is incorrect for a couple of reasons:

1. Time spent in MDL contributes as much to query response time as anything else.
2. InnoDB implements thd_storage_lock_wait() where it moves thd->utime_after_lock to subtract any row lock time and row lock waits contribute to query response time as well.

This means, contrary to their names, the query_response_time tables do not actually provide any information about query response times.

tags: added: query-response-time
Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

Your observation is correct. It looks like it was implemented this way to match the slow query log behavior, which BTW also raises questions for the users: https://bugs.mysql.com/bug.php?id=71767, https://bugs.mysql.com/bug.php?id=65139.

The current behavior should be documented and then switch to full query time (controlled by option or just by default) should be planned too.

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

bug 1640062 for docs

Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-1760

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.