Race condition using ATOMIC_FASTBINS in _int_free causes crash or heap corruption
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| eglibc | Fix Released | Medium | | |
| eglibc (Ubuntu) | | Undecided | Adam Conrad | |
| Precise | | Undecided | Adam Conrad | |
Bug Description
[Impact]
* This bug is likely to cause a SIGSEGV crash in multithreaded applications that perform many memory deallocations with the ATOMIC_FASTBINS feature enabled.
[Test Case]
* Because this is a race condition, there is no simple way to reproduce it; however, one can try following the instructions in the upstream bug (https:/
https:/
[Regression Potential]
* The fix for this issue has been merged upstream with no further issues reported.
[Other Info]
* Original bug description:
We have an application which makes heavy allocation and de-allocation demands from multiple threads. We run this application continuously on many servers, and once every several CPU months or years, we were getting a crash in _int_free that did not look like vanilla heap corruption. I believe I have narrowed it down to a race condition in _int_free due to the ATOMIC_FASTBINS feature. Basically, in the lockless FASTBIN _int_free path, a chunk is pulled into a local variable with the intent to add it to the fastbins list. However, the heap consolidation/trim code can race with this, and can coalesce the entire block and/or give it back to the OS before _int_free has a chance to try and store it into the fastbins list.
The problem is very challenging to reproduce in situ, but using gdb I have a recipe which demonstrates the crash 100% of the time on my 12.04 x64 system running eglibc 2.15. It relies on malloc_trim, although in our in situ data, the consolidation is triggered as a result of a normal free. malloc_trim is just easier to control.
While I am not a glibc developer, I could not see any easy ways to fix the situation shy of disabling ATOMIC_FASTBINS.
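To make the window described above concrete, here is a minimal sketch of a lock-free bin insert written with C11 atomics. It is not glibc's code: `struct chunk` and `bin_push` are hypothetical names, and the real fastbin path uses glibc's own atomic macros. The comment inside the loop marks the point where another thread can consolidate or release the chunk held in the local variable.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a heap chunk; not glibc's real layout. */
struct chunk {
    size_t size;
    struct chunk *fd;              /* next entry in the bin */
};

/* Lock-free LIFO push in the style of a fastbin insert.
   Returns the head that was current when the swap succeeded. */
static struct chunk *bin_push(struct chunk *_Atomic *bin, struct chunk *p)
{
    assert(p != NULL);
    struct chunk *old = atomic_load(bin);
    do {
        /* Race window: between loading `old` and the swap below no lock
           is held, so another thread may pop `old`, coalesce it, or let
           the allocator hand its page back to the OS.  Linking to `old`
           is fine; reading through it (old->size) is not. */
        p->fd = old;
    } while (!atomic_compare_exchange_weak(bin, &old, p));
    return old;
}
```

On a failed exchange, `atomic_compare_exchange_weak` stores the current head back into `old`, so the loop simply retries with fresh state.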
I am attaching the reproduction source. Other pertinent information follows:
> jpieper@
> Description: Ubuntu 12.04 LTS
> Release: 12.04
> jpieper@
> libc6:
> Installed: 2.15-0ubuntu10
> Candidate: 2.15-0ubuntu10
> Version table:
> *** 2.15-0ubuntu10 0
> 500 http://
> 100 /var/lib/
What I expect: I expect the attached application, when run using the gdb script in the comments, to complete with no failures.
What happened: A SIGSEGV after the final continue.
| Josh Pieper (jpieper) wrote : | #1 |
|
|
#4 |
Created attachment 6833
Reproduction recipe
I reported the following bug in ubuntu's libc about 6 months ago where the ATOMIC_FASTBINS feature can crash or cause heap corruption.
https:/
I'm pasting the content of that problem here and will attach the reproduction recipe. Unfortunately, the reproduction recipe is only reliable with the exact version of eglibc in Ubuntu. Given enough time to build an arbitrary version, I should be able to create one that works with any version.
| Josh Pieper (jpieper) wrote : | #3 |
Reported upstream at: http://
| Changed in eglibc: | |
| importance: | Unknown → Medium |
| status: | Unknown → Confirmed |
|
|
#5 |
I can't reproduce this on Fedora 17 x86_64 (glibc-2.15 + patches). Can you reproduce this with vanilla glibc? Also, can you reproduce this with newer versions of eglibc? glibc-2.15 had ATOMIC_FASTBINS removed, i.e. made into an implicit option, so there's no way to disable it.
| Changed in eglibc: | |
| status: | Confirmed → Incomplete |
I was able to reproduce this bug on Fedora 19, x86_64.
Reproduction required a slight modification of the recipe: the breakpoint in malloc.c needs to happen at line 3865. This line will continue to change as new versions of the library are released.
Required packages: glibc-debuginfo boost-devel boost-thread gcc-c++
[testuser@localhost ~]$ yum info glibc
Loaded plugins: auto-update-
Installed Packages
Name : glibc
Arch : x86_64
Version : 2.17
Release : 18.fc19
[testuser@localhost ~]$ cat /etc/redhat-release
Fedora release 19 (Schrödinger’s Cat)
[testuser@localhost ~]$ cat gdb_script
break main
r
set scheduler-locking on
break 54
break 59
break 60
c
break malloc.c:3865
c
thread 2
c
c
thread 1
c
[testuser@localhost ~]$
| Nate Gallaher (nate+launchpad) wrote : | #6 |
I can reproduce this on the Ubuntu Trusty Tahr amd64 daily snapshot: (d6856805ca67b4
This confirms the bug's existence up through eglibc 2.17-93ubuntu4. I have attached an updated recipe for this target.
testuser@trusty:~$ apt-cache policy libc6
libc6:
Installed: 2.17-93ubuntu4
Candidate: 2.17-93ubuntu4
testuser@trusty:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu Trusty Tahr (development branch)
Release: 14.04
Codename: trusty
Created attachment 7250
Trusty Tahr Reproduction Recipe
|
|
#12 |
(In reply to Nate Gallaher from comment #4)
> Created attachment 7250 [details]
> Trusty Tahr Reproduction Recipe
On glibc master the test does not produce a SIGSEGV and can be continued and exits normally.
It would really help if you could describe in detail what you think the race condition between malloc_trim and the fastbin implementation is.
|
|
#13 |
Carlos, this is faster to debug on paper than by trying to debug an optimized program.
For a minimal example of what goes wrong, I could trigger an assert in an unoptimized build of malloc. In an optimized build you need to go to the assembly to see where gcc scheduled the loads.
The idea is simple: while we free one chunk, the chunk on top of the fastbin could be allocated in another thread, resized, and then returned to the top of the fastbin, triggering an assertion, or a segfault when trim unmaps the corresponding page.
The program follows:
#include <malloc.h>   /* malloc_trim */
#include <stdlib.h>
#include <pthread.h>

void *freea (void *p)
{
  free (p); // 1
  return NULL;
}

int main ()
{
  pthread_t x;
  char *u, *v;
  u = malloc (16);
  pthread_create (&x, NULL, freea, u);
  v = malloc (16);
  free (v); // 2
  malloc_trim (0);
  v = malloc (512); // 3
  free (v);
  malloc_trim (0);
  v = malloc (16);
  free (v); // 4
}
First, step into free 1 until you reach the fragment below.
At that point, run free 2 so that v lands on top of the fastbin.
  unsigned int idx = fastbin_
  fb = &fastbin (av, idx);
  mchunkptr fd;
  mchunkptr old = *fb; // v
  unsigned int old_idx = ~0u;
  do
    {
      /* Another simple check: make sure the top of the bin is not the
         record we are going to add (i.e., double free). */
      if (__builtin_expect (old == p, 0))
        {
          errstr = "double free or corruption (fasttop)";
          goto errout;
        }
Now, at this point, run step 3, after which v is a chunk of size 528.
      if (old != NULL)
        old_idx = fastbin_
      p->fd = fd = old;
Then continue with step 4, which returns v to the top of the fastbin, the same state as after step 2.
    }
  while ((old = catomic_
Because 33 != 2, the check below reports an error.
  if (fd != NULL && __builtin_expect (old_idx != idx, 0))
    {
      errstr = "invalid fastbin entry (free)";
      goto errout;
    }
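The interleaving above can be replayed deterministically on a single thread. The sketch below is only an illustration under stated assumptions: `struct chunk`, `bin_index`, and `replay_stale_check` are hypothetical names, and `bin_index` is a simplified size-to-bin mapping chosen so that the numbers match the "33 != 2" in the walkthrough; it is not glibc's macro.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a heap chunk; not glibc's real layout. */
struct chunk { size_t size; struct chunk *fd; };

/* Assumed mapping for illustration: 32 -> 2, 528 -> 33. */
static unsigned bin_index(size_t size) { return (unsigned)(size >> 4); }

/* Replays steps 1-4 in order; returns 1 if the stale size check fires. */
static int replay_stale_check(void)
{
    struct chunk u = { 32, NULL };   /* chunk being freed at step 1 */
    struct chunk v = { 32, NULL };   /* chunk freed at step 2 */
    struct chunk *bin = &v;          /* after step 2, v tops the bin */
    unsigned idx = bin_index(u.size);

    struct chunk *old = bin;         /* thread A: old = *fb */

    /* Step 3 (other thread): the same address is reused as a larger chunk. */
    bin = NULL;
    v.size = 528;

    /* Thread A resumes and reads the size through its stale pointer. */
    unsigned old_idx = bin_index(old->size);

    /* Step 4 (other thread): v is a small chunk again, back on top. */
    v.size = 32;
    v.fd = NULL;
    bin = &v;

    /* Thread A: the compare-and-exchange succeeds because the head
       pointer is bit-for-bit the one it loaded earlier. */
    u.fd = old;
    if (bin == old)
        bin = &u;

    return old_idx != idx;           /* stale 33 versus expected 2 */
}
```

The exchange cannot notice the intermediate states because it compares only the head pointer, which is why the size read in the middle is the part that goes wrong.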
[For the benefit of Carlos and other developers]
There are two patches for this bug posted to libc-alpha@, one from Ondrej and one from myself.
The attached Trusty Tahr reproduction testcase has the wrong line numbers for its GDB breakpoints; they must be corrected to get the failure.
Created attachment 7331
Trusty reproduction testcase
Fixed trusty testcase.
|
|
#16 |
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".
The branch, master has been updated
via abc26e998f74750
via 362b47fe09ca9a9
from b9bcbbcbe7afa94
Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.
- Log -------
https:/
commit abc26e998f74750
Author: Maxim Kuvyrkov <email address hidden>
Date: Tue Dec 24 09:55:03 2013 +1300
Restore accidentally deleted bug-fix entries in NEWS.
* NEWS: Restore accidentally deleted bug-fix entries.
https:/
commit 362b47fe09ca9a9
Author: Maxim Kuvyrkov <email address hidden>
Date: Tue Dec 24 09:44:50 2013 +1300
Fix race in free() of fastbin chunk: BZ #15073
Perform sanity check only if we have_lock. Due to lockless nature of fastbins
we need to be careful derefencing pointers to fastbin entries (chunksize(old)
in this case) in multithreaded environments.
The fix is to add have_lock to the if-condition checks. The rest of the patch
only makes code more readable.
* malloc/malloc.c (_int_free): Perform sanity check only if we
have_lock.
-------
Summary of changes:
ChangeLog | 11 +++++++++++
NEWS | 23 +++++++
malloc/malloc.c | 20 +++++++
3 files changed, 35 insertions(+), 19 deletions(-)
Fixed by the above commit.
Real credit goes to Josh Pieper, whose carefully prepared testcase made it possible to investigate and fix the bug.
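The approach described in the commit message, reading through the old head only when the arena lock is held, can be sketched as follows. This is a simplified illustration with C11 atomics, not the upstream patch: `struct chunk`, `bin_index`, and `bin_push_checked` are hypothetical names, and only the two error strings are taken from the fragment quoted earlier in this thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a heap chunk; not glibc's real layout. */
struct chunk { size_t size; struct chunk *fd; };

/* Assumed size-to-bin mapping, for illustration only. */
static unsigned bin_index(size_t size) { return (unsigned)(size >> 4); }

/* Pushes p onto the bin.  Returns NULL on success or an error string. */
static const char *bin_push_checked(struct chunk *_Atomic *bin,
                                    struct chunk *p,
                                    unsigned idx, bool have_lock)
{
    struct chunk *old = atomic_load(bin);
    unsigned old_idx = ~0u;
    do {
        /* Comparing pointers is safe; nothing is dereferenced here. */
        if (old == p)
            return "double free or corruption (fasttop)";
        /* Read old->size only when the lock guarantees that old has
           not been consolidated or unmapped in the meantime. */
        if (have_lock && old != NULL)
            old_idx = bin_index(old->size);
        p->fd = old;
    } while (!atomic_compare_exchange_weak(bin, &old, p));

    /* Without the lock, old_idx was never computed, so skip the check. */
    if (have_lock && old != NULL && old_idx != idx)
        return "invalid fastbin entry (free)";
    return NULL;
}
```

The trade-off in this sketch is that the lockless path loses one sanity check, while the double-free test still runs because it only compares pointers.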
| Changed in eglibc: | |
| status: | Incomplete → Fix Released |
Fixed on trunk only. I've pinged release managers to merge the fix to 2.15 onwards.
| Changed in eglibc: | |
| status: | Fix Released → Confirmed |
| Changed in eglibc (Ubuntu): | |
| assignee: | nobody → Adam Conrad (adconrad) |
|
|
#19 |
The branch, 2.15 has been created
at 9875bb22212391e
- Log -------
https:/
commit 9875bb22212391e
Author: Maxim Kuvyrkov <email address hidden>
Date: Tue Dec 24 09:44:50 2013 +1300
Fix race in free() of fastbin chunk: BZ #15073
Perform sanity check only if we have_lock. Due to lockless nature of fastbins
we need to be careful derefencing pointers to fastbin entries (chunksize(old)
in this case) in multithreaded environments.
The fix is to add have_lock to the if-condition checks. The rest of the patch
only makes code more readable.
* malloc/malloc.c (_int_free): Perform sanity check only if we
have_lock.
Conflicts:
ChangeLog
NEWS
-------
|
|
#20 |
The branch, 2.15 has been deleted
was 9875bb22212391e
- Log -------
9875bb22212391e
-------
|
|
#21 |
The branch, release/2.15/master has been updated
via 9875bb22212391e
from 53fa2b6063a484e
- Log -------
https:/
commit 9875bb22212391e
Author: Maxim Kuvyrkov <email address hidden>
Date: Tue Dec 24 09:44:50 2013 +1300
Fix race in free() of fastbin chunk: BZ #15073
Perform sanity check only if we have_lock. Due to lockless nature of fastbins
we need to be careful derefencing pointers to fastbin entries (chunksize(old)
in this case) in multithreaded environments.
The fix is to add have_lock to the if-condition checks. The rest of the patch
only makes code more readable.
* malloc/malloc.c (_int_free): Perform sanity check only if we
have_lock.
Conflicts:
ChangeLog
NEWS
-------
Summary of changes:
ChangeLog | 7 +++++++
NEWS | 2 +-
malloc/malloc.c | 20 +++++++
3 files changed, 20 insertions(+), 9 deletions(-)
|
|
#23 |
The branch, release/2.16/master has been updated
via c972bcc9ebdb5c2
from 02eff8c4f82241c
- Log -------
https:/
commit c972bcc9ebdb5c2
Author: Maxim Kuvyrkov <email address hidden>
Date: Tue Dec 24 09:44:50 2013 +1300
Fix race in free() of fastbin chunk: BZ #15073
Perform sanity check only if we have_lock. Due to lockless nature of fastbins
we need to be careful derefencing pointers to fastbin entries (chunksize(old)
in this case) in multithreaded environments.
The fix is to add have_lock to the if-condition checks. The rest of the patch
only makes code more readable.
* malloc/malloc.c (_int_free): Perform sanity check only if we
have_lock.
Conflicts:
ChangeLog
NEWS
-------
Summary of changes:
ChangeLog | 7 +++++++
NEWS | 2 +-
malloc/malloc.c | 20 +++++++
3 files changed, 20 insertions(+), 9 deletions(-)
|
|
#25 |
The branch, release/2.17/master has been updated
via 3db0119ef56decc
from 15256e58adc62d8
- Log -------
https:/
commit 3db0119ef56decc
Author: Maxim Kuvyrkov <email address hidden>
Date: Tue Dec 24 09:44:50 2013 +1300
Fix race in free() of fastbin chunk: BZ #15073
Perform sanity check only if we have_lock. Due to lockless nature of fastbins
we need to be careful derefencing pointers to fastbin entries (chunksize(old)
in this case) in multithreaded environments.
The fix is to add have_lock to the if-condition checks. The rest of the patch
only makes code more readable.
* malloc/malloc.c (_int_free): Perform sanity check only if we
have_lock.
Conflicts:
ChangeLog
NEWS
-------
Summary of changes:
ChangeLog | 7 +++++++
NEWS | 2 +-
malloc/malloc.c | 20 +++++++
3 files changed, 20 insertions(+), 9 deletions(-)
|
|
#27 |
The branch, release/2.18/master has been updated
via 8b43a2274a593ce
from ca0dd6386ed2b5c
- Log -------
https:/
commit 8b43a2274a593ce
Author: Maxim Kuvyrkov <email address hidden>
Date: Tue Dec 24 09:44:50 2013 +1300
Fix race in free() of fastbin chunk: BZ #15073
Perform sanity check only if we have_lock. Due to lockless nature of fastbins
we need to be careful derefencing pointers to fastbin entries (chunksize(old)
in this case) in multithreaded environments.
The fix is to add have_lock to the if-condition checks. The rest of the patch
only makes code more readable.
* malloc/malloc.c (_int_free): Perform sanity check only if we
have_lock.
Conflicts:
ChangeLog
NEWS
-------
Summary of changes:
ChangeLog | 7 +++++++
NEWS | 2 +-
malloc/malloc.c | 20 +++++++
3 files changed, 20 insertions(+), 9 deletions(-)
| Changed in eglibc: | |
| status: | Confirmed → Fix Released |
| Changed in eglibc (Ubuntu): | |
| status: | Confirmed → In Progress |
| Changed in eglibc (Ubuntu Precise): | |
| status: | New → In Progress |
| assignee: | nobody → Adam Conrad (adconrad) |
|
|
#29 |
The annotated tag, glibc-2.19 has been created
at 62acb0ba856abf4
tagging 9a869d822025be8
replaces glibc-2.18
tagged by Allan McRae
on Fri Feb 7 19:12:54 2014 +1000
- Log -------
The GNU C Library
=================
The GNU C Library version 2.19 is now available.
The GNU C Library is used as *the* C library in the GNU systems
and most systems with the Linux kernel.
The GNU C Library is primarily designed to be a portable
and high performance C library. It follows all relevant
standards including ISO C11 and POSIX.1-2008. It is also
internationalized and has one of the most complete
internationaliz
The GNU C Library webpage is at http://
Packages for the 2.19 release may be downloaded from:
http://
http://
The mirror list is at http://
NEWS for version 2.19
=======
* The following bugs are resolved with this release:
156, 387, 431, 762, 832, 926, 2801, 4772, 6786, 6787, 6807, 6810, 6981,
7003, 9721, 9954, 10253, 10278, 11087, 11157, 11214, 12100, 12486, 12751,
12986, 13028, 13982, 13985, 14029, 14032, 14120, 14143, 14155, 14286,
14547, 14699, 14752, 14782, 14876, 14910, 15004, 15048, 15073, 15089,
15128, 15218, 15268, 15277, 15308, 15362, 15374, 15400, 15425, 15427,
15483, 15522, 15531, 15532, 15593, 15601, 15608, 15609, 15610, 15632,
15640, 15670, 15672, 15680, 15681, 15723, 15734, 15735, 15736, 15748,
15749, 15754, 15760, 15763, 15764, 15797, 15799, 15825, 15843, 15844,
15846, 15847, 15849, 15850, 15855, 15856, 15857, 15859, 15867, 15886,
15887, 15890, 15892, 15893, 15895, 15897, 15901, 15905, 15909, 15915,
15917, 15919, 15921, 15923, 15939, 15941, 15948, 15963, 15966, 15968,
15985, 15988, 15997, 16032, 16034, 16036, 16037, 16038, 16041, 16046,
16055, 16071, 16072, 16074, 16077, 16078, 16103, 16112, 16143, 16144,
16146, 16150, 16151, 16153, 16167, 16169, 16172, 16195, 16214, 16245,
16271, 16274, 16283, 16289, 16293, 16314, 16316, 16330, 16337, 16338,
16356, 16365, 16366, 16369, 16372, 16375, 16379, 16384, 16385, 16386,
16387, 16390, 16394, 16398, 16400, 16407, 16408, 16414, 16430, 16431,
16453, 16474, 16506, 16510, 16529
* Slovenian translations for glibc messages have been contributed by the
Translation Project's Slovenian team of translators.
* The public headers no longer use __unused nor __block. This change is to
support compiling programs that are derived from BSD sources and use
__unused internally, and to support compiling with Clang's -fblock
extension which uses __block.
* CVE-2012-4412 The strcoll implementation caches indices and rules for
large collation sequences to optimize multiple passes. This cache
computation may overflow for large collation sequences and may cause a
stack or b...
| Adam Conrad (adconrad) wrote : | #30 |
This was fixed in trusty with the upload of 2.19.
| Changed in eglibc (Ubuntu): | |
| status: | In Progress → Fix Released |
| dtsomp (dtsomp) wrote : | #31 |
When should we expect the fix to be merged in the Precise pipeline? It seems it's the only LTS left out right now.
| description: | updated |
| lezbak (lezgin-bakircioglu) wrote : | #33 |
After testing on our 12.04 LTS system, I can now confirm that it works for me.
| Dariusz Gadomski (dgadomski) wrote : | #34 |
Updated debdiff against eglibc_
| Chris J Arges (arges) wrote : | #35 |
Can you resubmit/recheck this patch against 10.10 and bump version to 10.11?
| Dariusz Gadomski (dgadomski) wrote : | #37 |
Chris, please find the updated debdiff.
Hello Josh, or anyone else affected,
Accepted eglibc into precise-proposed. The package will build now and be available at https:/
Please help us by testing this new package. See https:/
If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-
Further information regarding the verification process can be found at https:/
| Changed in eglibc (Ubuntu Precise): | |
| status: | In Progress → Fix Committed |
| tags: | added: verification-needed |
| Sebastien Bacher (seb128) wrote : | #39 |
(seems like the changes got sponsored, unsubscribing sponsors)
| Adam Conrad (adconrad) wrote : | #40 |
A Canonical customer has been running this exact patch (character-
| tags: |
added: verification-done removed: verification-needed |
| Launchpad Janitor (janitor) wrote : | #41 |
This bug was fixed in the package eglibc - 2.15-0ubuntu10.12
---------------
eglibc (2.15-0ubuntu10.12) precise; urgency=medium
* cvs-vfprintf-
longer parsing %s format arguments as multibyte strings (LP: #1109327)
* cvs-__SSE_
feraiseexcept to fix backported -m32 builds of GCC 4.8 (LP: #1165387)
* cvs-canonical-
to do a canonical lookup for a host using AI_CANONNAME (LP: #1057526)
* cvs-atomic-
-- Adam Conrad <email address hidden> Wed, 25 Mar 2015 13:28:41 -0600
| Changed in eglibc (Ubuntu Precise): | |
| status: | Fix Committed → Fix Released |
| Adam Conrad (adconrad) wrote : Update Released | #42 |
The verification of the Stable Release Update for eglibc has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.


Status changed to 'Confirmed' because the bug affects multiple users.