Comment 5 for bug 1191375

Revision history for this message
Sergei Glushchenko (sergei.glushchenko) wrote :

    The stack trace in description points to access violation at the
    pfs->m_parent_thread_internal_id= parent->m_thread_internal_id; in
    pfs_spawn_thread.

    I've tracked this bug down and can explain it pretty well. Since
    PFS designed to handle only limited amount of threads all the
    threads above this number will not have THR_PFS assigned
    (i.e. create_thread in pfs_spawn_thread can return NULL and it
    will be assigned to thread-specific THR_PFS). This however did not
    taken into account neither in spawn_thread_v1 (where NULL can be
    assigned to psi_arg->m_parent_thread) nor pfs_spawn_thread itself
    (where NULL-valued psi_arg->m_parent_thread can be accessed).
    This probably not the issue for upstream since new threads
    commonly spawned by "old" threads which likely have their THR_PFS
    keys assigned. But with threadpool enabled every worker thread can
    spawn new thread when it about to wait. I saw this kind of
    failures, but it is hard to construct MTR test case for them.

    My previous assumption was that parent spawned a child thread,
    finished it's wait and stopped, while child was too slow to finish
    it's initialization in the meantime. This is also possible but
    I've never seen this in practice due to thread_pool_idle_timeout
    and threadpool internal logic.

    This bug was fixed upstream (rev. 0.18870.1) with following
    message:

    Bug#16939689 VALGRIND ERROR IN PFS_SPAWN_THREAD

    Before this fix, valgrind errors could be reported under stress tests.

    In particular, the following line in pfs_spawn_thread()
    memcpy(pfs->m_username, parent->m_username, sizeof(pfs->m_username));

    could cause memcpy to be called for the same source and destination,
    which should not happen.

    The root cause, by analysis, is that the code is reading attributes
    from the parent thread, while the parent thread instrumentation is already
    destroyed (and was reused for the child, triggering valgrind complaints).

    The fix is to capture all the parent attributes needed:
    - the thread internal id
    - the thread user name
    - the thread host name
    in the parent thread, before spawning a child,
    and use that to initialize the child instrumentation,
    instead of pasing a pointer that might become invalid.