Comment 0 for bug 1899800

Michael Bacarella (mbacarella) wrote :

This bug was submitted by Qin Li to glibc bugzilla earlier this year, with a one-line patch, though it hasn't been merged into glibc yet:

https://sourceware.org/bugzilla/show_bug.cgi?id=25847

This bug in pthread condition variables can deadlock the OCaml runtime, as well as the Python and .NET runtimes.

The bug was introduced in glibc 2.27, so it affects Ubuntu 18.04 onwards. I confirm that both my OCaml app and the repro from the bugzilla deadlock on Ubuntu 20.04 and Ubuntu 18.04. To further strengthen the case that this is a glibc bug, neither my app nor the repro deadlocks on Ubuntu 16.04.
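For anyone checking whether a given machine falls in the affected range, here is a minimal Python sketch. It only compares the running glibc version against 2.27, the version the bugzilla report names as the first affected one. The names FIRST_AFFECTED, parse_glibc_version and may_be_affected are made up for this example, and the check says nothing about whether a distro has backported a fix.

```python
import platform
import re

# Assumption taken from the bugzilla report: first affected release is 2.27.
FIRST_AFFECTED = (2, 27)


def parse_glibc_version(text):
    """Return (major, minor) from strings like '2.31' or '2.31-0ubuntu9.1'."""
    match = re.match(r"(\d+)\.(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def may_be_affected(version_text):
    """True if the version is at or above the first affected release.

    This ignores distro patches and any later upstream fix, so treat a
    True result as "worth testing with the repro", not as proof.
    """
    version = parse_glibc_version(version_text)
    return version is not None and version >= FIRST_AFFECTED


if __name__ == "__main__":
    # platform.libc_ver() returns ('glibc', '<version>') on glibc systems
    # and empty strings when it cannot detect the C library.
    lib, version_text = platform.libc_ver()
    if lib != "glibc":
        print("glibc not detected on this system")
    else:
        print(f"glibc {version_text}: may be affected = "
              f"{may_be_affected(version_text)}")
```

The package version strings listed further down (for example 2.27-3ubuntu1.2) can be passed to may_be_affected directly, since only the leading major.minor pair is read.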

To rule out kernel issues, I further confirm that no deadlock happens when I copy Ubuntu 16.04's libc to 18.04 and redirect the dynamic linker so my app loads the earlier libc.
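One common way to run a program against a different libc is to invoke that libc's own dynamic loader directly and pass it a library search path. The sketch below shows that approach in Python; it is not necessarily the exact method I used, and the paths in the usage note are hypothetical placeholders. The helper names build_loader_command and run_with_libc are made up for this example. The --library-path option is the one documented for ld.so.

```python
import subprocess


def build_loader_command(loader, library_dir, program, args=()):
    """Build an argv that starts `program` through an alternate loader.

    loader:      path to the ld.so shipped with the other glibc
    library_dir: directory holding that glibc's libc.so.6, libpthread, etc.
    program:     the binary to run
    """
    return [loader, "--library-path", library_dir, program, *args]


def run_with_libc(loader, library_dir, program, args=(), timeout=None):
    """Run the program under the alternate loader and return its exit code.

    A timeout (in seconds) is handy for a repro that is expected to hang:
    subprocess.TimeoutExpired is raised if the program does not finish.
    """
    command = build_loader_command(loader, library_dir, program, args)
    completed = subprocess.run(command, timeout=timeout, check=False)
    return completed.returncode
```

With a copy of the older libc unpacked under a hypothetical /opt/glibc-2.23/lib, the call would look like run_with_libc("/opt/glibc-2.23/lib/ld-linux-x86-64.so.2", "/opt/glibc-2.23/lib", "./repro"). The loader and the libraries have to come from the same glibc build, otherwise the program usually fails at startup.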

I confirm that the one-line patch (available at the above bugzilla) applies cleanly on top of:

* glibc-2.31-0ubuntu9.1 (Ubuntu 20.04 latest)
* glibc-2.28-10 (Debian Buster/10 latest)
* glibc-2.27-3ubuntu1.2 (Ubuntu 18.04 latest)

I confirm that the one-line patch to glibc cures the deadlock issue in my OCaml apps.

On Ubuntu 20.04 with the patch applied, I have not been able to get the repro to deadlock in 5 days. My OCaml apps have not deadlocked in 5 days either.

On Debian Buster/10 with the patch applied, the repro has not deadlocked in about 5 days. This is my desktop box, and normal applications such as the GNOME environment continue to work as usual with the patched glibc.

On Ubuntu 18.04 with the patch applied, the repro still deadlocks, but only after about 24-48 hours. Prior to patching glibc, it would take only a few hours. I have not seen my OCaml apps deadlock since applying this patch, however.

On Ubuntu 16.04, I have never been able to get the repro to deadlock, and my OCaml apps never deadlocked on this platform. This is expected, since this platform runs glibc 2.23, which predates the bug (the bugzilla report says it was introduced in 2.27).

As for why 18.04 still deadlocks, I suspect another, unrelated pthread bug was introduced in glibc 2.27 and fixed by 2.28. When applied to glibc 2.27, the one-line patch appears to reduce the frequency of deadlocks by roughly an order of magnitude.

Please kindly consider merging the patch into Ubuntu glibc.

More background about this bug, for the sake of future internet searchers:
* https://discuss.ocaml.org/t/is-there-a-known-recent-linux-locking-bug-that-affects-the-ocaml-runtime