Comment 3 for bug 1680803

Revision history for this message
Elena Stepanova (elenst) wrote :

To reproduce, the 10.2 server should be started with InnoDB, and even then I imagine it can be system- or luck-dependent.

MariaDB 10.2 prints information about InnoDB system threads in the processlist. Like this:

+----+-------------+-----------------+------+---------+------+--------------------------+------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+----+-------------+-----------------+------+---------+------+--------------------------+------------------+----------+
| 4 | system user | | NULL | Daemon | NULL | InnoDB purge worker | NULL | 0.000 |
| 2 | system user | | NULL | Daemon | NULL | InnoDB purge worker | NULL | 0.000 |
| 1 | system user | | NULL | Daemon | NULL | InnoDB purge worker | NULL | 0.000 |
| 3 | system user | | NULL | Daemon | NULL | InnoDB purge coordinator | NULL | 0.000 |
| 5 | system user | | NULL | Daemon | NULL | InnoDB shutdown handler | NULL | 0.000 |
| 26 | root | localhost:40405 | test | Query | 0 | init | show processlist | 0.000 |
+----+-------------+-----------------+------+---------+------+--------------------------+------------------+----------+

Note NULL in the Time column.

The pt tool hangs (or rather falls into an endless loop) in summarize_processlist, where it uses the column to populate time and mtime.
For 'time', it seems to work okay, as it's populated by adding the value, and apparently it's implicitly converted into 0 on the way.
For 'mtime', under certain conditions it uses the value as is. Thus mtime becomes equal 'NULL' (the string).
Further, dark magic applies inside fuzzy_formula, which compares the value to various numbers and continues performing arithmetical operations until the result of the comparison is satisfactory. Among other things, it keeps increasing the 'factor' variable which is later used for more arithmetic.

The rest depends on the system/luck, apparently. 'factor' overflows rather quickly, if it becomes 0, the loop ends. If it becomes something ugly, the loop continues forever.

On the MariaDB side, printing the threads is a conscious decision (see https://jira.mariadb.org/browse/MDEV-11703). The question will be raised whether they should have NULL in the Time column, although coming up with a reasonable alternative might be challenging.

An easy fix for the pt tool is to convert a non-digital value to 0. Something like this, but maybe a bit smarter (as far as shell/awk scripting goes):

--- bin/pt-mysql-summary.orig 2017-04-19 00:29:48.992822955 +0300
+++ bin/pt-mysql-summary 2017-04-19 00:33:26.692815665 +0300
@@ -1334,6 +1334,9 @@
          }
          \$1 == \"Time:\" {
             t = \$2;
+ if ( t == \"NULL\" ) {
+ t = 0;
+ }
          }
          \$1 == \"Command:\" {
             c = \$2;