Assertion `pending || ttp->thd' failed in sql/sql_timer.cc:193

Bug #1427526 reported by Ramesh Sivaraman
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Percona Server moved to https://jira.percona.com/projects/PS
Status tracked in 5.7
5.1
Invalid
Undecided
Unassigned
5.5
Invalid
Undecided
Unassigned
5.6
Triaged
High
Unassigned
5.7
Triaged
High
Unassigned

Bug Description

** Testcase

DROP DATABASE test;CREATE DATABASE test;USE test;
SET @@SESSION.max_statement_time = 10;
select get_lock('test_bug16407', 60);
SELECT table_name, table_rows, data_length, max_data_length, data_free FROM information_schema.tables WHERE table_name='t1'; ;

To reproduce the issue we need to run multi thread pquery binary. The attached tarball gives the testcase as an exact match of our system, including some handy utilities
$ vi {epoch}_mybase # Update base path in this file (the only change
required!)
$ ./{epoch}_init # Initializes the data dir
$ ./{epoch}_start # Starts mysqld
$ ./{epoch}_cl # To check mysqld is up
$ ./{epoch}_run_pquery # Run the testcase with pquery binary(produces
output)
$ vi /dev/shm/{epoch}/error.log.out # Verify the error log
$ ./{epoch}_gdb # Brings you to a gdb prompt attached to correct
mysqld
& generated core
$ ./{epoch}_parse_core # Create {epoch}_STD.gdb and {epoch}_FULL.gdb;
standard and full var gdb stack traces
etc.

PS : No extra star-up variable is required to reproduce this issue

*GDB info

#0 0x00007ff87208a771 in pthread_kill () from /lib64/libpthread.so.0
#1 0x0000000000ab97ca in my_write_core (sig=6) at /mnt/workspace/percona-server-5.6-binaries-debug-yassl/label_exp/centos6-64/percona-server-5.6.22-71.0/mysys/stacktrace.c:422
#2 0x00000000007301a3 in handle_fatal_signal (sig=6) at /mnt/workspace/percona-server-5.6-binaries-debug-yassl/label_exp/centos6-64/percona-server-5.6.22-71.0/sql/signal_handler.cc:236
#3 <signal handler called>
#4 0x00007ff870c8e5c9 in raise () from /lib64/libc.so.6
#5 0x00007ff870c8fcd8 in abort () from /lib64/libc.so.6
#6 0x00007ff870c87536 in __assert_fail_base () from /lib64/libc.so.6
#7 0x00007ff870c875e2 in __assert_fail () from /lib64/libc.so.6
#8 0x00000000008de7fb in reap_timer (ttp=0x7ff849c890b0, pending=false) at /mnt/workspace/percona-server-5.6-binaries-debug-yassl/label_exp/centos6-64/percona-server-5.6.22-71.0/sql/sql_timer.cc:193
#9 0x00000000008de8a8 in thd_timer_reset (ttp=0x7ff849c890b0) at /mnt/workspace/percona-server-5.6-binaries-debug-yassl/label_exp/centos6-64/percona-server-5.6.22-71.0/sql/sql_timer.cc:230
#10 0x00000000007df113 in reset_statement_timer (thd=0x7ff8597de000) at /mnt/workspace/percona-server-5.6-binaries-debug-yassl/label_exp/centos6-64/percona-server-5.6.22-71.0/sql/sql_parse.cc:1202
#11 0x00000000007ea882 in mysql_execute_command (thd=0x7ff8597de000) at /mnt/workspace/percona-server-5.6-binaries-debug-yassl/label_exp/centos6-64/percona-server-5.6.22-71.0/sql/sql_parse.cc:5491
#12 0x00000000007edf41 in mysql_parse (thd=0x7ff8597de000, rawbuf=0x7ff849c1f010 "select get_lock('test_bug16407', 60)", length=36, parser_state=0x7ff860a6ddf0) at /mnt/workspace/percona-server-5.6-binaries-debug-yassl/label_exp/centos6-64/percona-server-5.6.22-71.0/sql/sql_parse.cc:6922
#13 0x00000000007dfdb7 in dispatch_command (command=COM_QUERY, thd=0x7ff8597de000, packet=0x7ff8597e2001 "", packet_length=38) at /mnt/workspace/percona-server-5.6-binaries-debug-yassl/label_exp/centos6-64/percona-server-5.6.22-71.0/sql/sql_parse.cc:1434
#14 0x00000000007decd5 in do_command (thd=0x7ff8597de000) at /mnt/workspace/percona-server-5.6-binaries-debug-yassl/label_exp/centos6-64/percona-server-5.6.22-71.0/sql/sql_parse.cc:1051
#15 0x00000000007a6c4d in do_handle_one_connection (thd_arg=0x7ff8597de000) at /mnt/workspace/percona-server-5.6-binaries-debug-yassl/label_exp/centos6-64/percona-server-5.6.22-71.0/sql/sql_connect.cc:1532
#16 0x00000000007a6755 in handle_one_connection (arg=0x7ff8597de000) at /mnt/workspace/percona-server-5.6-binaries-debug-yassl/label_exp/centos6-64/percona-server-5.6.22-71.0/sql/sql_connect.cc:1443
#17 0x0000000000dc9e6c in pfs_spawn_thread (arg=0x7ff86b7466a0) at /mnt/workspace/percona-server-5.6-binaries-debug-yassl/label_exp/centos6-64/percona-server-5.6.22-71.0/storage/perfschema/pfs.cc:1860
#18 0x00007ff872085df3 in start_thread () from /lib64/libpthread.so.0
#19 0x00007ff870d4f1ad in clone () from /lib64/libc.so.6

Tags: qa
Revision history for this message
Ramesh Sivaraman (rameshvs02) wrote :
Revision history for this message
Ramesh Sivaraman (rameshvs02) wrote :

You may have to wait up to 5 minutes after running ./{epoch}_run_pquery to get the crash.

Revision history for this message
Yura Sorokin (yura-sorokin) wrote :

Unable to reproduce on 5.6 (commit ee0f346).

However a race condition is theoretically possible.

Consider the following function "thd_timer_reset()" that calls "reap_timer()"

thd_timer_t * thd_timer_reset(thd_timer_t *ttp)
{
  bool unreachable;
  int status, state;
  DBUG_ENTER("thd_timer_reset");

  status= my_os_timer_reset(&ttp->timer, &state);

  pthread_mutex_lock(&ttp->mutex);
  unreachable= reap_timer(ttp, status ? true : !state);
  ttp->destroy= unreachable ? false : true;
  pthread_mutex_unlock(&ttp->mutex);

  DBUG_RETURN(unreachable ? ttp : NULL);
}

static bool reap_timer(thd_timer_t *ttp, bool pending)
{
  bool unreachable;

  DBUG_ASSERT(!ttp->destroy);

  DBUG_ASSERT(pending || ttp->thd);

  unreachable= pending ? ttp->thd == NULL : true;

  ttp->thd= NULL;

  return unreachable;
}

Here is one of the possible cases when "pending == false" and "ttp->thd == NULL" which triggers the assertion.
1. Two threads call "thd_timer_reset()" on the same object.
2. Thread 1 executes "status= my_os_timer_reset(&ttp->timer, &state);" and returns "status ==0" and "state == 1".
3. Thread 2 executes "status= my_os_timer_reset(&ttp->timer, &state);" and returns "status ==0" and "state == 1".
4. Thread 1 successfully acquires "ttp->mutex".
5. Thread 2 tries to acquire "ttp->mutex" and hangs.
6. Thread 1 executes "reap_timer()" with "pending == false" which returns "true". As a side effect "ttp->thd" is set to NULL.
7. Thread 1 sets "ttp->destroy" to "false"and releases "ttp->mutex".
8. Thread 2 finally acquires "ttp->mutex".
9. Thread 2 executes "reap_timer()" with "pending == false" and "ttp->thd == NULL".
10. The "DBUG_ASSERT(pending || ttp->thd);" assertion is triggered in Thread 2.

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

Did you use the "bug bundle" of comment #1 in attempts to reproduce?

Can you please update the 5.1 and 5.5 bug status?

Revision history for this message
Yura Sorokin (yura-sorokin) wrote :

Yes, I was using the "bug bundle" trying to reproduce.
I only observed "pquery" utility itself crashing several times. After that "mysqld" process was still alive.

[yura@localhost bug1427526-sql_timer]$ ./1424952842_run_pquery
Executing testcase ./1424952842.sql against mysqld with socket /dev/shm/1424952842/socket.sock using pquery...
Infile is ./1424952842.sql
Database is test
Starting with 30 threads
Query limit per thread is 400000
User is root
Socket is /dev/shm/1424952842/socket.sock
MySQL Connection Info: Localhost via UNIX socket
MySQL Client Info: 5.6.27-75.0
MySQL Server Info: 5.6.27-75.0-debug
* SUMMARY: 104465/400000 queries failed (73.88% were successful)
./1424952842_run_pquery: line 5: 4106 Segmentation fault ./1424952842_pquery --infile=./1424952842.sql --database=test --threads=30 --queries=400000 --user=root --socket=/dev/shm/1424952842/socket.sock

As for 5.1 and 5.5, because "sql_timer.cc" was added only in 5.6, they are not a target of investigation.

no longer affects: percona-server/5.5
no longer affects: percona-server/5.1
Revision history for this message
Roel Van de Paar (roel11) wrote :

For completeness, in regards pquery crashing from comment #5, turns out this was a problem in PS 5.6/5.7 which is already fixed in 5.7 and will shortly be fixed in 5.6.

Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-878

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.