race condition in group commit + pfs + threadpool

Bug #1318609 reported by Sergei Glushchenko
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Percona Server moved to https://jira.percona.com/projects/PS
Confirmed
Undecided
Sergei Glushchenko
5.1
New
Undecided
Unassigned
5.5
New
Undecided
Unassigned
5.6
Confirmed
Undecided
Sergei Glushchenko

Bug Description

Performance Schema has instrumentation for events waits
accounting. When it enabled instrumented code populates internal
data structures with collected data. Here is the layout of these
structures in memory:

There is an array of threads:

PFS_thread thread_array[];

each PFS_thread contains an array of events waits and pointer to the
top of events waits stack:

PFS_events_waits m_events_waits_stack[];
PFS_events_waits *m_events_waits_current;

The crash happens when m_events_waits_current went below array
bounds. Then code which aggregates event waits statistics overwrites
other members of PFS_thread.

m_events_waits_current goes below the bottom due to race condition.

Group commit code does kind of hack in process_commit_stage_queue
and process_after_commit_stage_queue. It attaches to other threads
changing PSI_thread (thread-local variable) of current thread with
PSI_thread of the thread which it attached to. Thus, two threads
sharing the same PSI_thread are running, which gives room for race
condition since PFS code mostly does unprotected reads/updates.

Following the stack trace of two concurrent threads.
Server code is slightly modified to catch the race:

(lldb) t 2
* thread #2: tid = 0xa5e930, 0x0000000100472508 mysqld`end_mutex_wait_v1(locker=<unavailable>, rc=<unavailable>) + 399 at pfs.cc:3576, stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)
    frame #0: 0x0000000100472508 mysqld`end_mutex_wait_v1(locker=<unavailable>, rc=<unavailable>) + 399 at pfs.cc:3576
   3573 if (wait != thread->m_events_waits_current)
   3574 {
   3575 fprintf(stderr, "[%d, %p, %d]\n", (int)pthread_mach_thread_np(pthread_self()), thread, __LINE__);
-> 3576 *((char*)0)='x';
   3577 }
   3578 }
   3579 }
(lldb) bt
* thread #2: tid = 0xa5e930, 0x0000000100472508 mysqld`end_mutex_wait_v1(locker=<unavailable>, rc=<unavailable>) + 399 at pfs.cc:3576, stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)
  * frame #0: 0x0000000100472508 mysqld`end_mutex_wait_v1(locker=<unavailable>, rc=<unavailable>) + 399 at pfs.cc:3576
    frame #1: 0x0000000100425b45 mysqld`trx_commit_low(trx_t*, mtr_t*) [inlined] pfs_mutex_enter_func(priority=<unavailable>) + 117 at sync0sync.ic:379
    frame #2: 0x0000000100425ad0 mysqld`trx_commit_low(trx_t*, mtr_t*) [inlined] trx_write_serialisation_history(trx=0x000000013e1be478, mtr=<unavailable>) + 173 at trx0trx.cc:1169
    frame #3: 0x0000000100425a23 mysqld`trx_commit_low(trx=0x000000013e1be478, mtr=0x0000000141a5d910) + 104 at trx0trx.cc:1545
    frame #4: 0x00000001004266f6 mysqld`trx_commit(trx=<unavailable>) + 120 at trx0trx.cc:1592
    frame #5: 0x0000000100426eb9 mysqld`trx_commit_for_mysql(trx=0x000000013e1be478) + 86 at trx0trx.cc:1820
    frame #6: 0x0000000100365437 mysqld`innobase_commit(handlerton*, THD*, bool) [inlined] innobase_commit_low(trx_t*) + 684 at ha_innodb.cc:3788
    frame #7: 0x0000000100365428 mysqld`innobase_commit(hton=<unavailable>, thd=<unavailable>, commit_trx=<unavailable>) + 669 at ha_innodb.cc:3950
    frame #8: 0x0000000100043d07 mysqld`ha_commit_low(thd=0x000000013e1baa00, all=false, run_after_commit=false) + 76 at handler.cc:1546
    frame #9: 0x0000000100270485 mysqld`MYSQL_BIN_LOG::process_commit_stage_queue(this=0x00000001009e0ec0, thd=<unavailable>, first=<unavailable>) + 105 at binlog.cc:6540
    frame #10: 0x0000000100266efe mysqld`MYSQL_BIN_LOG::ordered_commit(this=0x00000001009e0ec0, thd=0x000000013e12e400, all=<unavailable>, skip_commit=<unavailable>) + 520 at binlog.cc:6926
    frame #11: 0x000000010026607f mysqld`MYSQL_BIN_LOG::commit(this=0x00000001009e0ec0, thd=0x000000013e12e400, all=<unavailable>) + 541 at binlog.cc:6366
    frame #12: 0x0000000100043bef mysqld`ha_commit_trans(thd=0x000000013e12e400, all=false, ignore_global_read_lock=<unavailable>) + 558 at handler.cc:1489
    frame #13: 0x00000001002225b7 mysqld`trans_commit_stmt(thd=0x000000013e12e400) + 44 at transaction.cc:434
    frame #14: 0x0000000100189720 mysqld`mysql_execute_command(thd=0x000000013e12e400) + 2276 at sql_parse.cc:5376
    frame #15: 0x00000001001880f7 mysqld`mysql_parse(thd=0x000000013e12e400, rawbuf=<unavailable>, length=<unavailable>, parser_state=<unavailable>) + 802 at sql_parse.cc:6647
    frame #16: 0x0000000100186789 mysqld`dispatch_command(command=<unavailable>, thd=0x000000013e12e400, packet=<unavailable>, packet_length=<unavailable>) + 3670 at sql_parse.cc:1438
    frame #17: 0x0000000100187b59 mysqld`do_command(thd=0x000000013e12e400) + 292 at sql_parse.cc:1051
    frame #18: 0x0000000100229b37 mysqld`threadpool_process_request(thd=0x000000013e12e400) + 117 at threadpool_common.cc:316
    frame #19: 0x000000010022b88e mysqld`worker_main(void*) [inlined] handle_event(connection=0x000000013fd0e040) + 28 at threadpool_unix.cc:1472
    frame #20: 0x000000010022b872 mysqld`worker_main(param=0x00000001009ba200) + 299 at threadpool_unix.cc:1525
    frame #21: 0x00000001004705af mysqld`pfs_spawn_thread(arg=0x000000013c87b510) + 311 at pfs.cc:1858
    frame #22: 0x00007fff8bf16899 libsystem_pthread.dylib`_pthread_body + 138
    frame #23: 0x00007fff8bf1672a libsystem_pthread.dylib`_pthread_start + 137

(lldb) t 37
* thread #37: tid = 0xa5e932, 0x0000000100472d6c mysqld`end_cond_wait_v1(locker=<unavailable>, rc=<unavailable>) + 368 at pfs.cc:3806, stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)
    frame #0: 0x0000000100472d6c mysqld`end_cond_wait_v1(locker=<unavailable>, rc=<unavailable>) + 368 at pfs.cc:3806
   3803 if (wait != thread->m_events_waits_current)
   3804 {
   3805 fprintf(stderr, "[%d, %p, %d]\n", (int)pthread_mach_thread_np(pthread_self()), thread, __LINE__);
-> 3806 *((char*)0)='x';
   3807 }
   3808 }
   3809 }
(lldb) bt
* thread #37: tid = 0xa5e932, 0x0000000100472d6c mysqld`end_cond_wait_v1(locker=<unavailable>, rc=<unavailable>) + 368 at pfs.cc:3806, stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)
  * frame #0: 0x0000000100472d6c mysqld`end_cond_wait_v1(locker=<unavailable>, rc=<unavailable>) + 368 at pfs.cc:3806
    frame #1: 0x000000010026695c mysqld`Stage_manager::enroll_for(Stage_manager::StageID, THD*, st_mysql_mutex*) [inlined] inline_mysql_cond_wait(st_mysql_cond*, st_mysql_mutex*, char const*, unsigned int) + 324 at mysql_thread.h:1155
    frame #2: 0x00000001002668ea mysqld`Stage_manager::enroll_for(this=0x00000001009e1ec0, stage=<unavailable>, thd=0x000000013e1baa00, stage_mutex=<unavailable>) + 210 at binlog.cc:1452
    frame #3: 0x00000001002705d0 mysqld`MYSQL_BIN_LOG::change_stage(this=<unavailable>, thd=<unavailable>, stage=<unavailable>, queue=<unavailable>, leave_mutex=<unavailable>, enter_mutex=0x00000001009e0ec8) + 38 at binlog.cc:6640
    frame #4: 0x0000000100266d6f mysqld`MYSQL_BIN_LOG::ordered_commit(this=0x00000001009e0ec0, thd=0x000000013e1baa00, all=<unavailable>, skip_commit=<unavailable>) + 121 at binlog.cc:6854
    frame #5: 0x000000010026607f mysqld`MYSQL_BIN_LOG::commit(this=0x00000001009e0ec0, thd=0x000000013e1baa00, all=<unavailable>) + 541 at binlog.cc:6366
    frame #6: 0x0000000100043bef mysqld`ha_commit_trans(thd=0x000000013e1baa00, all=false, ignore_global_read_lock=<unavailable>) + 558 at handler.cc:1489
    frame #7: 0x00000001002225b7 mysqld`trans_commit_stmt(thd=0x000000013e1baa00) + 44 at transaction.cc:434
    frame #8: 0x0000000100189720 mysqld`mysql_execute_command(thd=0x000000013e1baa00) + 2276 at sql_parse.cc:5376
    frame #9: 0x00000001001880f7 mysqld`mysql_parse(thd=0x000000013e1baa00, rawbuf=<unavailable>, length=<unavailable>, parser_state=<unavailable>) + 802 at sql_parse.cc:6647
    frame #10: 0x0000000100186789 mysqld`dispatch_command(command=<unavailable>, thd=0x000000013e1baa00, packet=<unavailable>, packet_length=<unavailable>) + 3670 at sql_parse.cc:1438
    frame #11: 0x0000000100187b59 mysqld`do_command(thd=0x000000013e1baa00) + 292 at sql_parse.cc:1051
    frame #12: 0x0000000100229b37 mysqld`threadpool_process_request(thd=0x000000013e1baa00) + 117 at threadpool_common.cc:316
    frame #13: 0x000000010022b88e mysqld`worker_main(void*) [inlined] handle_event(connection=0x000000013c889950) + 28 at threadpool_unix.cc:1472
    frame #14: 0x000000010022b872 mysqld`worker_main(param=0x00000001009ba800) + 299 at threadpool_unix.cc:1525
    frame #15: 0x00000001004705af mysqld`pfs_spawn_thread(arg=0x000000013f79ec20) + 311 at pfs.cc:1858
    frame #16: 0x00007fff8bf16899 libsystem_pthread.dylib`_pthread_body + 138
    frame #17: 0x00007fff8bf1672a libsystem_pthread.dylib`_pthread_start + 137

Notably, I cannot reproduce the race condition with threadpool turned off.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.