Percona Server moved to https://jira.percona.com/projects/PS

Assertion `pending || ttp->thd' failed in sql/sql_timer.cc:193

Bug #1427526 reported by Ramesh Sivaraman on 2015-03-03

This bug affects 1 person

	Status	Importance	Assigned to
Percona Server moved to https://jira.percona.com/projects/PS	Status tracked in 5.7
5.1	Invalid	Undecided	Unassigned
5.5	Invalid	Undecided	Unassigned
5.6	Triaged	High	Unassigned
5.7	Triaged	High	Unassigned

Bug Description

** Testcase

DROP DATABASE test;CREATE DATABASE test;USE test;
SET @@SESSION.max_statement_time = 10;
select get_lock('test_bug16407', 60);
SELECT table_name, table_rows, data_length, max_data_length, data_free FROM information_schema.tables WHERE table_name='t1'; ;

To reproduce the issue we need to run multi thread pquery binary. The attached tarball gives the testcase as an exact match of our system, including some handy utilities
$ vi {epoch}_mybase # Update base path in this file (the only change
required!)
$ ./{epoch}_init # Initializes the data dir
$ ./{epoch}_start # Starts mysqld
$ ./{epoch}_cl # To check mysqld is up
$ ./{epoch}_run_pquery # Run the testcase with pquery binary(produces
output)
$ vi /dev/shm/{epoch}/error.log.out # Verify the error log
$ ./{epoch}_gdb # Brings you to a gdb prompt attached to correct
mysqld
& generated core
$ ./{epoch}_parse_core # Create {epoch}_STD.gdb and {epoch}_FULL.gdb;
standard and full var gdb stack traces
etc.

PS : No extra star-up variable is required to reproduce this issue

*GDB info

#0 0x00007ff87208a771 in pthread_kill () from /lib64/libpthread.so.0
#1 0x0000000000ab97ca in my_write_core (sig=6) at /mnt/workspace/percona-server-5.6-binaries-debug-yassl/label_exp/centos6-64/percona-server-5.6.22-71.0/mysys/stacktrace.c:422
#2 0x00000000007301a3 in handle_fatal_signal (sig=6) at /mnt/workspace/percona-server-5.6-binaries-debug-yassl/label_exp/centos6-64/percona-server-5.6.22-71.0/sql/signal_handler.cc:236
#3 <signal handler called>
#4 0x00007ff870c8e5c9 in raise () from /lib64/libc.so.6
#5 0x00007ff870c8fcd8 in abort () from /lib64/libc.so.6
#6 0x00007ff870c87536 in __assert_fail_base () from /lib64/libc.so.6
#7 0x00007ff870c875e2 in __assert_fail () from /lib64/libc.so.6
#8 0x00000000008de7fb in reap_timer (ttp=0x7ff849c890b0, pending=false) at /mnt/workspace/percona-server-5.6-binaries-debug-yassl/label_exp/centos6-64/percona-server-5.6.22-71.0/sql/sql_timer.cc:193
#9 0x00000000008de8a8 in thd_timer_reset (ttp=0x7ff849c890b0) at /mnt/workspace/percona-server-5.6-binaries-debug-yassl/label_exp/centos6-64/percona-server-5.6.22-71.0/sql/sql_timer.cc:230
#10 0x00000000007df113 in reset_statement_timer (thd=0x7ff8597de000) at /mnt/workspace/percona-server-5.6-binaries-debug-yassl/label_exp/centos6-64/percona-server-5.6.22-71.0/sql/sql_parse.cc:1202
#11 0x00000000007ea882 in mysql_execute_command (thd=0x7ff8597de000) at /mnt/workspace/percona-server-5.6-binaries-debug-yassl/label_exp/centos6-64/percona-server-5.6.22-71.0/sql/sql_parse.cc:5491
#12 0x00000000007edf41 in mysql_parse (thd=0x7ff8597de000, rawbuf=0x7ff849c1f010 "select get_lock('test_bug16407', 60)", length=36, parser_state=0x7ff860a6ddf0) at /mnt/workspace/percona-server-5.6-binaries-debug-yassl/label_exp/centos6-64/percona-server-5.6.22-71.0/sql/sql_parse.cc:6922
#13 0x00000000007dfdb7 in dispatch_command (command=COM_QUERY, thd=0x7ff8597de000, packet=0x7ff8597e2001 "", packet_length=38) at /mnt/workspace/percona-server-5.6-binaries-debug-yassl/label_exp/centos6-64/percona-server-5.6.22-71.0/sql/sql_parse.cc:1434
#14 0x00000000007decd5 in do_command (thd=0x7ff8597de000) at /mnt/workspace/percona-server-5.6-binaries-debug-yassl/label_exp/centos6-64/percona-server-5.6.22-71.0/sql/sql_parse.cc:1051
#15 0x00000000007a6c4d in do_handle_one_connection (thd_arg=0x7ff8597de000) at /mnt/workspace/percona-server-5.6-binaries-debug-yassl/label_exp/centos6-64/percona-server-5.6.22-71.0/sql/sql_connect.cc:1532
#16 0x00000000007a6755 in handle_one_connection (arg=0x7ff8597de000) at /mnt/workspace/percona-server-5.6-binaries-debug-yassl/label_exp/centos6-64/percona-server-5.6.22-71.0/sql/sql_connect.cc:1443
#17 0x0000000000dc9e6c in pfs_spawn_thread (arg=0x7ff86b7466a0) at /mnt/workspace/percona-server-5.6-binaries-debug-yassl/label_exp/centos6-64/percona-server-5.6.22-71.0/storage/perfschema/pfs.cc:1860
#18 0x00007ff872085df3 in start_thread () from /lib64/libpthread.so.0
#19 0x00007ff870d4f1ad in clone () from /lib64/libc.so.6

Tags:

Revision history for this message

Ramesh Sivaraman (rameshvs02) wrote on 2015-03-03:

1424952842_bug_bundle.tar.gz Edit (93.7 KiB, application/x-tar)

Revision history for this message

Ramesh Sivaraman (rameshvs02) wrote on 2015-03-04:

You may have to wait up to 5 minutes after running ./{epoch}_run_pquery to get the crash.

Revision history for this message

Yura Sorokin (yura-sorokin) wrote on 2015-11-11:

Unable to reproduce on 5.6 (commit ee0f346).

However a race condition is theoretically possible.

Consider the following function "thd_timer_reset()" that calls "reap_timer()"

thd_timer_t * thd_timer_reset(thd_timer_t *ttp)
{
  bool unreachable;
  int status, state;
  DBUG_ENTER("thd_timer_reset");

status= my_os_timer_reset(&ttp->timer, &state);

  pthread_mutex_lock(&ttp->mutex);
  unreachable= reap_timer(ttp, status ? true : !state);
  ttp->destroy= unreachable ? false : true;
  pthread_mutex_unlock(&ttp->mutex);

DBUG_RETURN(unreachable ? ttp : NULL);
}

static bool reap_timer(thd_timer_t *ttp, bool pending)
{
bool unreachable;

DBUG_ASSERT(!ttp->destroy);

DBUG_ASSERT(pending || ttp->thd);

unreachable= pending ? ttp->thd == NULL : true;

ttp->thd= NULL;

return unreachable;
}

Here is one of the possible cases when "pending == false" and "ttp->thd == NULL" which triggers the assertion.
1. Two threads call "thd_timer_reset()" on the same object.
2. Thread 1 executes "status= my_os_timer_reset(&ttp->timer, &state);" and returns "status ==0" and "state == 1".
3. Thread 2 executes "status= my_os_timer_reset(&ttp->timer, &state);" and returns "status ==0" and "state == 1".
4. Thread 1 successfully acquires "ttp->mutex".
5. Thread 2 tries to acquire "ttp->mutex" and hangs.
6. Thread 1 executes "reap_timer()" with "pending == false" which returns "true". As a side effect "ttp->thd" is set to NULL.
7. Thread 1 sets "ttp->destroy" to "false"and releases "ttp->mutex".
8. Thread 2 finally acquires "ttp->mutex".
9. Thread 2 executes "reap_timer()" with "pending == false" and "ttp->thd == NULL".
10. The "DBUG_ASSERT(pending || ttp->thd);" assertion is triggered in Thread 2.

Revision history for this message

Laurynas Biveinis (laurynas-biveinis) wrote on 2015-11-12:

Did you use the "bug bundle" of comment #1 in attempts to reproduce?

Can you please update the 5.1 and 5.5 bug status?

Revision history for this message

Yura Sorokin (yura-sorokin) wrote on 2015-11-12:

Yes, I was using the "bug bundle" trying to reproduce.
I only observed "pquery" utility itself crashing several times. After that "mysqld" process was still alive.

[yura@localhost bug1427526-sql_timer]$ ./1424952842_run_pquery
Executing testcase ./1424952842.sql against mysqld with socket /dev/shm/1424952842/socket.sock using pquery...
Infile is ./1424952842.sql
Database is test
Starting with 30 threads
Query limit per thread is 400000
User is root
Socket is /dev/shm/1424952842/socket.sock
MySQL Connection Info: Localhost via UNIX socket
MySQL Client Info: 5.6.27-75.0
MySQL Server Info: 5.6.27-75.0-debug
* SUMMARY: 104465/400000 queries failed (73.88% were successful)
./1424952842_run_pquery: line 5: 4106 Segmentation fault ./1424952842_pquery --infile=./1424952842.sql --database=test --threads=30 --queries=400000 --user=root --socket=/dev/shm/1424952842/socket.sock

As for 5.1 and 5.5, because "sql_timer.cc" was added only in 5.6, they are not a target of investigation.

Yura Sorokin (yura-sorokin) on 2015-11-12

no longer affects:	percona-server/5.5
no longer affects:	percona-server/5.1

Revision history for this message

Roel Van de Paar (roel11) wrote on 2015-11-19:

For completeness, in regards pquery crashing from comment #5, turns out this was a problem in PS 5.6/5.7 which is already fixed in 5.7 and will shortly be fixed in 5.6.

Revision history for this message

Shahriyar Rzayev (rzayev-sehriyar) wrote on 2018-01-25:

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-878