race condition in group commit + pfs + threadpool
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Percona Server moved to https://jira.percona.com/projects/PS | Confirmed | Undecided | Sergei Glushchenko |
5.1 | New | Undecided | Unassigned |
5.5 | New | Undecided | Unassigned |
5.6 | Confirmed | Undecided | Sergei Glushchenko |
Bug Description
Performance Schema has instrumentation for event waits accounting.
When it is enabled, the instrumented code populates internal data
structures with the collected data. Here is the layout of these
structures in memory:
There is an array of threads:
PFS_thread thread_array[];
Each PFS_thread contains an array of event waits and a pointer to the
top of the event waits stack:
PFS_events_waits m_events_
PFS_events_waits *m_events_
The crash happens when m_events_
bounds. The code that aggregates event waits statistics then overwrites
other members of PFS_thread.
m_events_
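The layout described above can be modelled with a minimal sketch. Everything in it is an assumption made for illustration: `ThreadRecord`, `WaitRecord`, `kWaitStackSize`, `push_wait` and `pop_wait` are hypothetical names, not the server's definitions, and the sketch uses an index rather than a pointer for the stack top so that it stays well-defined. It only shows why a stack top that leaves the array's range would end up touching whatever member is laid out next, and what a bounds check on push and pop might look like.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified stand-ins for the Performance Schema structures.
// Names and sizes are illustrative, not the real server definitions.
constexpr std::size_t kWaitStackSize = 4;

struct WaitRecord {
  unsigned long timer_start = 0;
  unsigned long timer_end = 0;
};

struct ThreadRecord {
  WaitRecord wait_stack[kWaitStackSize];  // fixed-size stack of wait events
  std::size_t wait_top = 0;               // index of the next free slot
  unsigned long other_stats = 0;          // member laid out after the stack
};

// True if the slot for the next push lies inside the stack.
bool can_push(const ThreadRecord &t) { return t.wait_top < kWaitStackSize; }

// True if there is an event to pop and the top is still in range.
bool can_pop(const ThreadRecord &t) {
  return t.wait_top > 0 && t.wait_top <= kWaitStackSize;
}

// Checked push: refuses to write once the top has reached the end.
bool push_wait(ThreadRecord &t, unsigned long now) {
  if (!can_push(t)) return false;
  t.wait_stack[t.wait_top].timer_start = now;
  ++t.wait_top;
  return true;
}

// Checked pop: refuses to move the top below the start of the stack.
bool pop_wait(ThreadRecord &t, unsigned long now) {
  if (!can_pop(t)) return false;
  --t.wait_top;
  t.wait_stack[t.wait_top].timer_end = now;
  return true;
}
```

Without the checks, a push past the last slot would write into the memory that follows `wait_stack`, which in this model is `wait_top` and `other_stats`.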
Group commit code uses a kind of hack in process_
and process_
it replaces the PSI_thread (a thread-local variable) of the current
thread with the PSI_thread of the thread it attached to. As a result,
two running threads share the same PSI_thread, which leaves room for a
race condition, since PFS code mostly does unprotected reads/updates.
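To make the failure mode concrete, here is a small model of two threads updating one shared stack top without protection. It is a sketch under stated assumptions, not the server's code: `SharedRecord`, `StartWait`, `EndWait` and the two replay functions are hypothetical names. The interleaving is replayed step by step on a single OS thread, because a real unsynchronized race would be undefined behavior in C++ and would not fail reliably.

```cpp
#include <cassert>

// Hypothetical model of one instrumentation record shared by two threads.
struct SharedRecord {
  int wait_top = 0;  // signed, so an underflow shows up as a negative value
};

// "Start wait" split into its read and write halves, so that a particular
// schedule can be replayed deterministically.
struct StartWait {
  int seen = 0;
  void read(const SharedRecord &r) { seen = r.wait_top; }
  void write(SharedRecord &r) const { r.wait_top = seen + 1; }
};

// "End wait" split the same way.
struct EndWait {
  int seen = 0;
  void read(const SharedRecord &r) { seen = r.wait_top; }
  void write(SharedRecord &r) const { r.wait_top = seen - 1; }
};

// Replays one unlucky schedule: both pushes read the same top, so one
// push is lost, and the second pop moves the top below the stack base.
int replay_lost_update() {
  SharedRecord rec;
  StartWait a_start, b_start;
  EndWait a_end, b_end;

  a_start.read(rec);   // A sees 0
  b_start.read(rec);   // B sees 0
  a_start.write(rec);  // top = 1
  b_start.write(rec);  // top = 1, A's push is lost

  a_end.read(rec);     // A sees 1
  a_end.write(rec);    // top = 0
  b_end.read(rec);     // B sees 0
  b_end.write(rec);    // top = -1, out of bounds
  return rec.wait_top;
}

// Same four operations, but each read/write pair runs without interruption.
int replay_serialized() {
  SharedRecord rec;
  StartWait a_start, b_start;
  EndWait a_end, b_end;

  a_start.read(rec); a_start.write(rec);  // top = 1
  b_start.read(rec); b_start.write(rec);  // top = 2
  b_end.read(rec);   b_end.write(rec);    // top = 1
  a_end.read(rec);   a_end.write(rec);    // top = 0
  return rec.wait_top;
}
```

In this model `replay_lost_update()` ends with the top at -1, while `replay_serialized()` returns it to 0. The real code keeps a pointer rather than an index, but the same lost update would leave that pointer outside the array.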
The following are the stack traces of two concurrent threads.
The server code was slightly modified to catch the race:
(lldb) t 2
* thread #2: tid = 0xa5e930, 0x0000000100472508 mysqld`
frame #0: 0x0000000100472508 mysqld`
3573 if (wait != thread-
3574 {
3575 fprintf(stderr, "[%d, %p, %d]\n", (int)pthread_
-> 3576 *((char*)0)='x';
3577 }
3578 }
3579 }
(lldb) bt
* thread #2: tid = 0xa5e930, 0x0000000100472508 mysqld`
* frame #0: 0x0000000100472508 mysqld`
frame #1: 0x0000000100425b45 mysqld`
frame #2: 0x0000000100425ad0 mysqld`
frame #3: 0x0000000100425a23 mysqld`
frame #4: 0x00000001004266f6 mysqld`
frame #5: 0x0000000100426eb9 mysqld`
frame #6: 0x0000000100365437 mysqld`
frame #7: 0x0000000100365428 mysqld`
frame #8: 0x0000000100043d07 mysqld`
frame #9: 0x0000000100270485 mysqld`
frame #10: 0x0000000100266efe mysqld`
frame #11: 0x000000010026607f mysqld`
frame #12: 0x0000000100043bef mysqld`
frame #13: 0x00000001002225b7 mysqld`
frame #14: 0x0000000100189720 mysqld`
frame #15: 0x00000001001880f7 mysqld`
frame #16: 0x0000000100186789 mysqld`
frame #17: 0x0000000100187b59 mysqld`
frame #18: 0x0000000100229b37 mysqld`
frame #19: 0x000000010022b88e mysqld`
frame #20: 0x000000010022b872 mysqld`
frame #21: 0x00000001004705af mysqld`
frame #22: 0x00007fff8bf16899 libsystem_
frame #23: 0x00007fff8bf1672a libsystem_
(lldb) t 37
* thread #37: tid = 0xa5e932, 0x0000000100472d6c mysqld`
frame #0: 0x0000000100472d6c mysqld`
3803 if (wait != thread-
3804 {
3805 fprintf(stderr, "[%d, %p, %d]\n", (int)pthread_
-> 3806 *((char*)0)='x';
3807 }
3808 }
3809 }
(lldb) bt
* thread #37: tid = 0xa5e932, 0x0000000100472d6c mysqld`
* frame #0: 0x0000000100472d6c mysqld`
frame #1: 0x000000010026695c mysqld`
frame #2: 0x00000001002668ea mysqld`
frame #3: 0x00000001002705d0 mysqld`
frame #4: 0x0000000100266d6f mysqld`
frame #5: 0x000000010026607f mysqld`
frame #6: 0x0000000100043bef mysqld`
frame #7: 0x00000001002225b7 mysqld`
frame #8: 0x0000000100189720 mysqld`
frame #9: 0x00000001001880f7 mysqld`
frame #10: 0x0000000100186789 mysqld`
frame #11: 0x0000000100187b59 mysqld`
frame #12: 0x0000000100229b37 mysqld`
frame #13: 0x000000010022b88e mysqld`
frame #14: 0x000000010022b872 mysqld`
frame #15: 0x00000001004705af mysqld`
frame #16: 0x00007fff8bf16899 libsystem_
frame #17: 0x00007fff8bf1672a libsystem_
Notably, I cannot reproduce the race condition with threadpool turned off.