deadlock in pthread_cond_signal under high contention

Bug #1888857 reported by Anton Nikolaevsky
This bug affects 1 person
Affects: glibc (Ubuntu)
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

Hello!

I'm working on a large C++-based cross-platform project. I noticed that on arm64-based systems some of my processes sporadically became paralyzed by a deadlock hitting all the threads posting to a single boost::asio::io_service. I investigated the deadlock further and reduced the problem to the simple test application available at https://github.com/bezkrovatki/deadlock_in_pthread_cond_signal
There you may find the test source code, a detailed description, the deadlock call stacks for all threads and their compact view as a call graph.
In short, the test has threads of two types:
  (1) producers - Np threads calling pthread_cond_signal after unlocking a mutex at a rate of Rp calls per second;
  (2) consumers - Nc threads calling pthread_cond_wait at a rate of Rc calls per second.
Np, Rp and Rc can be specified with command line parameters; Nc is equal to the number of CPU cores of the particular system running the test. Once started on an arm64-based multi-core device, the test eventually gets all its threads blocked if Np, Rp and Rc are high enough to keep contention high around the pthread_cond_signal calls. A minimal sketch of this structure is given below.
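To make the setup concrete, here is a minimal, hypothetical sketch of the producer/consumer structure described above; it is not the code from the linked repository, and the Rp/Rc rate limiting is omitted. Compile with g++ -pthread.

    // Minimal sketch of the test structure described in the report
    // (assumption: this is NOT the actual test from the linked repository).
    #include <pthread.h>
    #include <thread>
    #include <vector>

    static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  cv  = PTHREAD_COND_INITIALIZER;
    static long pending = 0;               // work counter, only touched under mtx

    static void producer()                 // one of Np signalling threads
    {
        for (;;)
        {
            pthread_mutex_lock(&mtx);
            ++pending;                     // publish work under the lock
            pthread_mutex_unlock(&mtx);
            pthread_cond_signal(&cv);      // signal AFTER unlocking, as in the report
        }
    }

    static void consumer()                 // one of Nc waiting threads
    {
        for (;;)
        {
            pthread_mutex_lock(&mtx);
            while (pending == 0)
                pthread_cond_wait(&cv, &mtx);
            --pending;                     // consume one item
            pthread_mutex_unlock(&mtx);
        }
    }

    int main()
    {
        const unsigned Np = 4;                               // producers (command line parameter in the real test)
        unsigned Nc = std::thread::hardware_concurrency();   // consumers = number of CPU cores
        if (Nc == 0)
            Nc = 4;

        std::vector<std::thread> threads;
        for (unsigned i = 0; i < Np; ++i) threads.emplace_back(producer);
        for (unsigned i = 0; i < Nc; ++i) threads.emplace_back(consumer);
        for (auto &t : threads) t.join();  // runs until it (hopefully) deadlocks
    }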

The deadlock can be worked around by
* reducing the probability of concurrent pthread_cond_signal calls by tuning Np, Rp and Rc;
* moving the pthread_cond_signal call under the lock (sketched just below)
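For the second workaround, here is a hypothetical variant of the producer from the sketch above (reusing its mtx, cv and pending globals) with the signal issued while the mutex is still held; with this variant the deadlock did not reproduce.

    // Workaround sketch: signal while still holding the mutex.
    static void producer_signal_under_lock()
    {
        for (;;)
        {
            pthread_mutex_lock(&mtx);
            ++pending;
            pthread_cond_signal(&cv);      // signal BEFORE unlocking
            pthread_mutex_unlock(&mtx);
        }
    }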

Moreover, the deadlock can be broken by ptrace: attaching a debugger, generating a dump with Google Breakpad, etc. makes the process revive. Once I was able to wake the process from the deadlock with SIGSTOP/SIGCONT; however, the healing effect was very limited and the process returned to the deadlocked state within a few seconds.
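For completeness, here is a hypothetical helper (not part of the test) that reproduces the SIGSTOP/SIGCONT nudge for a given PID; in the reporter's experiment this woke the stuck threads only briefly.

    // Hypothetical nudge helper: suspend and resume a deadlocked process by PID.
    #include <signal.h>
    #include <sys/types.h>
    #include <unistd.h>
    #include <cstdlib>

    int main(int argc, char **argv)
    {
        if (argc < 2)
            return 1;
        pid_t pid = static_cast<pid_t>(std::atoi(argv[1]));  // PID of the stuck process
        kill(pid, SIGSTOP);   // suspend all threads
        sleep(1);
        kill(pid, SIGCONT);   // resume; per the report this briefly broke the deadlock
        return 0;
    }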

I would like to note that a problem with similar-looking symptoms was reported and fixed in the kernel several years ago (see https://groups.google.com/forum/#!topic/mechanical-sympathy/QbmpZxp6C64 and https://github.com/torvalds/linux/commit/76835b0ebf8a7fe85beb03c75121419a7dec52f0).

However, I believe this time the problem is on the NPTL implementation side because:

  * 100% of the observed deadlocks, both in our product and in the tests, appear to have the same structure: a single producer blocked in __condvar_quiesce_and_switch_g1, all other producers blocked in __condvar_acquire_lock, and all consumers blocked in __pthread_cond_wait_common;

  * mutex misbehavior was never observed either in the test or in my project;

  * wakeups by ptrace/signal simply mean that the wait on a futex was interrupted and, on the next iteration (if any), at least one of these call paths made progress after observing changed global state; that can be a side effect of a race in userland just as well as in the kernel;

  * if I put the pthread_cond_signal call under the lock, I cannot reproduce the problem, even though the mutex then becomes more contended than the condvar's internal signalling data.

I looked at the NPTL source code (https://elixir.bootlin.com/glibc/glibc-2.27/source/nptl/pthread_cond_common.c#L280) a bit. I'm not burdened with a deep knowledge of the implemented algorithm and its dark corners, but from the source code the observed deadlock looks quite plausible to me:

  1) All producers (signalling threads) except one are blocked in __condvar_acquire_lock@pthread_cond_common.c:280; they are waiting for the single signalling thread that was lucky enough to acquire the condvar's internal lock.

  2) According to the comments lavishly sown around the code, that "lucky" signalling thread waits for some of the consumers (waiting threads) to leave group G1, so that it can close the group and perform the group switch in __condvar_quiesce_and_switch_g1@pthread_cond_common.c:412.

  3) And all consumers (waiting threads) wait, of course, for the producers to send a signal; see __pthread_cond_wait_common@pthread_cond_wait.c:502.

  4) And if you look carefully at the code around __pthread_cond_wait_common@pthread_cond_wait.c:502, you can see that when the futex wait for signals gets interrupted, the code first wakes our "lucky" thread blocked in __condvar_quiesce_and_switch_g1@pthread_cond_common.c:412 (by calling __condvar_dec_grefs@pthread_cond_wait.c:149) and only then re-evaluates the condition and returns to waiting on the futex if necessary.
    This can explain how ptrace/a signal is able to break the deadlock; a rough sketch of this ordering follows the list.
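As a reading aid, here is a heavily simplified pseudocode paraphrase of the waiter-side loop as described in (4); it is an approximation of the reporter's reading of glibc 2.27, not the actual NPTL source.

    // Pseudocode paraphrase of the loop around pthread_cond_wait.c:502
    // (simplified; NOT the real glibc source).
    //
    // while (no signal available in our group g) {
    //     take a group reference (g_refs[g]);
    //     futex_wait(&g_signals[g], 0);       // may be interrupted by ptrace/a signal
    //     __condvar_dec_grefs(cond, g, ...);  // drop the reference FIRST; this can wake
    //                                         // the signaller stuck in
    //                                         // __condvar_quiesce_and_switch_g1
    //     reload g_signals[g];                // only THEN re-evaluate; if still no
    //                                         // signal, loop and block again
    // }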

--
I posted the bug report here because the glibc wiki strongly recommends starting with the distribution bug tracker. All arm64-based devices I tested were running Ubuntu 18.04.

ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: libc6 2.27-3ubuntu1
Uname: Linux 4.9.187-52 aarch64
ApportVersion: 2.20.9-0ubuntu7.15
Architecture: arm64
Date: Fri Jul 24 14:05:57 2020
Dependencies:
 gcc-8-base 8.3.0-6ubuntu1~18.04.1
 libc6 2.27-3ubuntu1
 libgcc1 1:8.3.0-6ubuntu1~18.04.1
ProcEnviron:
 TERM=rxvt-unicode-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=C.UTF-8
 SHELL=/bin/bash
SourcePackage: glibc
UpgradeStatus: No upgrade log present (probably fresh install)

Revision history for this message
Balint Reczey (rbalint) wrote :

Could you please test the packages from https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/4121/+packages ?

This has a glibc upstream snapshot that includes a fix for https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1858203 .

Revision history for this message
Balint Reczey (rbalint) wrote :

Thank you for the very detailed bug report!

Revision history for this message
Anton Nikolaevsky (bezkrovatki) wrote :

Thank you for the prompt reply!
I've installed the packages and left the test application running for the weekend. I'll write you back on Monday.

Revision history for this message
Anton Nikolaevsky (bezkrovatki) wrote :

With the packages you gave me I was not able to reproduce the problem.
Thank you very much!
