race condition on rmm for module ldap (ldap cache)

Bug #1752683 reported by Rafael David Tinoco on 2018-03-01
14
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Apache2 Web Server
Fix Released
Unknown
apache2 (Ubuntu)
Status tracked in Bionic
Trusty
Medium
Rafael David Tinoco
Xenial
Medium
Rafael David Tinoco
Artful
Medium
Rafael David Tinoco
Bionic
Medium
Rafael David Tinoco

Bug Description

[Impact]

 * Apache users using ldap module might face this if using multiple threads and shared memory activated for apr memory allocator (default in Ubuntu).

[Test Case]

 * Configure apache to use ldap module, for authentication e.g., and wait for the race condition to happen.
 * Analysis made out of a dump from a production environment.
 * Bug has been reported multiple times upstream in the past 10 years.

[Regression Potential]

 * ldap module has broken locking mechanism when using apr mem mgmt.
 * ldap would continue to have broken locking mechanism.
 * race conditions could still exist.
 * could could brake ldap module.
 * patch is upstreamed in next version to be released.

[Other Info]

ORIGINAL CASE DESCRIPTION:

Problem summary:

apr_rmm_init acts as a relocatable memory management initialization

it is used in: mod_auth_digest and util_ldap_cache

From the dump was brought to my knowledge, in the following sequence:

- util_ldap_compare_node_copy()
- util_ald_strdup()
- apr_rmm_calloc()
- find_block_of_size()

Had a "cache->rmm_addr" with no lock at "find_block_of_size()"

cache->rmm_addr->lock { type = apr_anylock_none }

And an invalid "next" offset (out of rmm->base->firstfree).

This rmm_addr was initialized with NULL as a locking mechanism:

From apr-utils:

apr_rmm_init()

    if (!lock) { <-- 2nd argument to apr_rmm_init()
        nulllock.type = apr_anylock_none; <--- found in the dump
        nulllock.lock.pm = NULL;
        lock = &nulllock;
    }

From apache:

# mod_auth_digest

    sts = apr_rmm_init(&client_rmm,
                       NULL, /* no lock, we'll do the locking ourselves */
                       apr_shm_baseaddr_get(client_shm),
                       shmem_size, ctx);

# util_ldap_cache

        result = apr_rmm_init(&st->cache_rmm, NULL,
                              apr_shm_baseaddr_get(st->cache_shm), size,
                              st->pool);

It appears that the ldap module chose to use "rmm" for memory allocation, using
the shared memory approach, but without explicitly definiting a lock to it.
Without it, its up to the caller to guarantee that there are locks for rmm
synchronization (just like mod_auth_digest does, using global mutexes).

Because of that, there was a race condition in "find_block_of_size" and a call
touching "rmm->base->firstfree", possibly "move_block()", in a multi-threaded
apache environment, since there were no lock guarantees inside rmm logic (lock
was "apr_anylock_none" and the locking calls don't do anything).

In find_block_of_size:

    apr_rmm_off_t next = rmm->base->firstfree;

We have:

    rmm->base->firstfree
 Decimal:356400
 Hex:0x57030

But "next" turned into:

Name : next
 Decimal:8320808657351632189
 Hex:0x737973636970653d

Causing:

        struct rmm_block_t *blk = (rmm_block_t*)((char*)rmm->base + next);

        if (blk->size == size)

To segfault.

Upstream bugs:

https://bz.apache.org/bugzilla/show_bug.cgi?id=58483
https://bz.apache.org/bugzilla/show_bug.cgi?id=60296
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=814980#15

Changed in apache2 (Ubuntu):
status: New → In Progress
assignee: nobody → Rafael David Tinoco (inaddy)
importance: Undecided → Medium
tags: added: sts
Rafael David Tinoco (inaddy) wrote :

I have found the cause of why the race condition happened and was able to verify that apache2 upstream version (ldap module) also suffers from the same issue. Instead of re-describing the problem here I'll give you the URLs of where I'm working upstream:

This bug was already opened in 2015:

https://bz.apache.org/bugzilla/show_bug.cgi?id=58483

And again in 2016:

https://bz.apache.org/bugzilla/show_bug.cgi?id=60296

My notes are here:

https://bz.apache.org/bugzilla/show_bug.cgi?id=60296#c7

And actually the notes were the ones that helped me finding all the upstream bits for this particular bug. As you can see in apache upstream bug, an engineer from EA.COM proposed a small patch solving the external lock need that I also identified. He has also reported this in debian project:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=814980#15

I sent my findings upstream as a new comment to the existing bug in apache2 project because I wanted to check on what was going to be done by the maintainer, since there are multiple ways of solving this (internal and/or external lock), and I could help in any of them.

Maintainer responded:

https://bz.apache.org/bugzilla/show_bug.cgi?id=60296#c8

Saying that he has already fixed this behaviour:

2 Changes with Apache 2.5.1
3
4 *) mod_ldap: Avoid possible crashes, hangs, and busy loops due to
5 improper merging of the cache lock in vhost config.
6 PR 43164 [Eric Covener]

by changing an external (to rmm allocator logic) lock that wasn't working. The lock was being copied in a callback after it was being used, causing the race condition because there was no external (to rmm) lock and rmm lock wasn't being used.

Rafael David Tinoco (inaddy) wrote :

PPA containing a testfix for Xenial: https://launchpad.net/~inaddy/+archive/ubuntu/lp1752683

Rafael David Tinoco (inaddy) wrote :
description: updated
Rafael David Tinoco (inaddy) wrote :

Waiting confirmation that the fix is good in order to subscribe sponsors for the patches to be uploaded to T/X/A/B in reverse order.

Changed in apache2:
status: Unknown → New
Rafael David Tinoco (inaddy) wrote :

I have proposed the same fix to Debian BUG:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=814980

Rafael David Tinoco (inaddy) wrote :

I've got the confirmation that the patch fixes the issue. End user installed patched apache2 into multiple servers and the crashes stopped occurring.

tags: added: sts-sponsor
Eric Desrochers (slashd) wrote :

I see a current upload in Xenial from cpaelzer for LP: #1466926 with a few automatic testing failures[1].

While the failures are only found in Xenial release, I prefer to sync-up with cpaelzer first tomorrow to make sure we won't interfere with his current apache2 work.

Once I can confirm everything is clear, I'll go ahead and sponsor bionic.

Thanks Rafael !

[1] - https://people.canonical.com/~ubuntu-archive/pending-sru.html

- Eric

Eric Desrochers (slashd) wrote :

I couldn't talk to cpaelzer, but after double-checking, the only release on which cpaelzer seems to work for now for apache2 is Xenial.

The bug he is working on has been fix on Bionic a while ago, so I don't think he'll need extra upload in bionic (at least for this particular bug).

# Bionic : debian/changelog
  * mpm_event: Fix "scoreboard full" errors. Closes: #834708 LP: #1466926

 -- Stefan Fritsch <email address hidden> Wed, 21 Dec 2016 23:46:06 +0100

With that being said I think I can safely say, our way is clear for bionic.

Note that it would be best before proceeding with the SRU to re-evaluate the situation (especially for Xenial).

- Eric

Eric Desrochers (slashd) wrote :

Sponsored for bionic.

Changed in apache2 (Ubuntu Trusty):
status: New → In Progress
Changed in apache2 (Ubuntu Xenial):
status: New → In Progress
Changed in apache2 (Ubuntu Artful):
status: New → In Progress
Changed in apache2 (Ubuntu Trusty):
assignee: nobody → Rafael David Tinoco (inaddy)
Changed in apache2 (Ubuntu Xenial):
assignee: nobody → Rafael David Tinoco (inaddy)
Changed in apache2 (Ubuntu Artful):
assignee: nobody → Rafael David Tinoco (inaddy)
Changed in apache2 (Ubuntu Trusty):
importance: Undecided → Medium
Changed in apache2 (Ubuntu Xenial):
importance: Undecided → Medium
Changed in apache2 (Ubuntu Artful):
importance: Undecided → Medium
Eric Desrochers (slashd) on 2018-03-29
Changed in apache2 (Ubuntu Bionic):
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package apache2 - 2.4.29-1ubuntu4

---------------
apache2 (2.4.29-1ubuntu4) bionic; urgency=medium

  * Avoid crashes, hangs and loops by fixing mod_ldap locking: (LP: #1752683)
    - added debian/patches/util_ldap_cache_lock_fix.patch

 -- Rafael David Tinoco <email address hidden> Fri, 02 Mar 2018 02:19:31 +0000

Changed in apache2 (Ubuntu Bionic):
status: Fix Committed → Fix Released
Dan Streetman (ddstreet) wrote :

Once the LP: #1466926 apache2 SRU in xenial -proposed is either promoted or removed, I'll upload this to affected releases.

Changed in apache2:
status: New → Fix Released

Hello Rafael, or anyone else affected,

Accepted apache2 into artful-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/apache2/2.4.27-2ubuntu4 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed.Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-artful to verification-done-artful. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-artful. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in apache2 (Ubuntu Artful):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-artful
Brian Murray (brian-murray) wrote :

Hello Rafael, or anyone else affected,

Accepted apache2 into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/apache2/2.4.18-2ubuntu3.7 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed.Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in apache2 (Ubuntu Xenial):
status: In Progress → Fix Committed
tags: added: verification-needed-xenial
Changed in apache2 (Ubuntu Trusty):
status: In Progress → Fix Committed
tags: added: verification-needed-trusty
Brian Murray (brian-murray) wrote :

Hello Rafael, or anyone else affected,

Accepted apache2 into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/apache2/2.4.7-1ubuntu4.19 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed.Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-trusty to verification-done-trusty. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-trusty. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Rafael David Tinoco (inaddy) wrote :

I have asked end user for verification and was informed that after deployed no more crashes were seen (intermittent and hard to reproduce, based on workload and several variables). I think it is safe to mark this as verification-done.

tags: added: verification-done
removed: verification-needed verification-needed-artful verification-needed-trusty verification-needed-xenial
Dan Streetman (ddstreet) wrote :
Download full text (4.8 KiB)

Notes on the apache-deps pending sru autopkgtest failures:

For trusty, the 'puppet' autopkgtest is failing:
http://autopkgtest.ubuntu.com/packages/p/puppet/trusty/amd64

the failure for this sru is here:
https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-trusty/trusty/amd64/p/puppet/20180405_220028_67950@/log.gz

it appears to have been failing since 2016, however:
https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-trusty/trusty/amd64/p/puppet/20160713_165538@/log.gz

so it's not related to this sru and can be ignored.

For xenial, the gvfs (s390x) autopkgtest fails, but the log indicates it's a testbed setup script failure - so almost certainly some transient issue with the s390x autopkgtest setup, not anything caused by this sru.
https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-xenial/xenial/s390x/g/gvfs/20180405_214901_9f312@/log.gz

Additionally, the gvfs autopkgtest has been failing for over a month, so it seems safe to ignore its failures w.r.t this sru:
http://autopkgtest.ubuntu.com/packages/g/gvfs/xenial/s390x

Also for xenial, libapache2-mod-perl2 autopkgtest is failing. This started failing with the last version of apache, however before that this autopkgtest wasn't run since 2016 - so it's very likely something else introduced this test failure, since the last xenial apache2 version was removed from -proposed and its code is not included in this upload, but the libapache2-mod-perl2 autopkgtest failure is the same. I believe this can be ignored as unrelated to this sru.
http://autopkgtest.ubuntu.com/packages/lib/libapache2-mod-perl2/xenial/amd64

For artful, resource-agent (s390x) autopkgtest fails:
http://autopkgtest.ubuntu.com/packages/c/cacti/artful/amd64

however the specific failing tests are first 'IPaddr2' which fails with what looks like a broken test (and certainly seems unrelated to apache2):
autopkgtest [20:43:48]: test IPaddr2: [-----------------------
sed: -e expression #1, char 11: unterminated `s' command
autopkgtest [20:43:48]: test IPaddr2: -----------------------]
autopkgtest [20:43:49]: test IPaddr2: - - - - - - - - - - results - - - - - - - - - -
IPaddr2 FAIL non-zero exit status 1
autopkgtest [20:43:49]: test IPaddr2: - - - - - - - - - - stderr - - - - - - - - - -
sed: -e expression #1, char 11: unterminated `s' command

and the tests 'command4' and 'command6' which are both mysql tests, not for apache2. So appear unrelated to this.

Also for artful, the cacti autopkgtest fails:
http://autopkgtest.ubuntu.com/packages/c/cacti/artful/amd64

This failure is during a test that has something to do with apache2, so possibly is related:
autopkgtest [20:31:42]: test check-all-pages: [-----------------------
--2018-04-05 20:31:42-- http://localhost/cacti/index.php
Resolving localhost (localhost)... 127.0.0.1
Connecting to localhost (localhost)|127.0.0.1|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 134 [text/html]
Saving to: ‘/tmp/tmp.JiG38g3XKE’

     0K 100% 24.8M=0...

Read more...

tags: added: verification-done-artful verification-done-trusty verification-done-xenial
Dan Streetman (ddstreet) wrote :

I reproduced the 'cacti' autopkgtest on my local system, without this sru apache2 package; so the test failure is not related to this sru.

This and the last comment provide justification for ignoring all the pending-sru autopkgtest failures.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package apache2 - 2.4.27-2ubuntu4

---------------
apache2 (2.4.27-2ubuntu4) artful; urgency=medium

  * Avoid crashes, hangs and loops by fixing mod_ldap locking: (LP: #1752683)
    - added debian/patches/util_ldap_cache_lock_fix.patch

 -- Rafael David Tinoco <email address hidden> Fri, 02 Mar 2018 02:14:42 +0000

Changed in apache2 (Ubuntu Artful):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for apache2 has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package apache2 - 2.4.18-2ubuntu3.7

---------------
apache2 (2.4.18-2ubuntu3.7) xenial; urgency=medium

  * Avoid crashes, hangs and loops by fixing mod_ldap locking: (LP: #1752683)
    - added debian/patches/util_ldap_cache_lock_fix.patch

 -- Rafael David Tinoco <email address hidden> Thu, 01 Mar 2018 18:29:12 +0000

Changed in apache2 (Ubuntu Xenial):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package apache2 - 2.4.7-1ubuntu4.19

---------------
apache2 (2.4.7-1ubuntu4.19) trusty; urgency=medium

  * Avoid crashes, hangs and loops by fixing mod_ldap locking: (LP: #1752683)
    - added debian/patches/util_ldap_cache_lock_fix.patch

 -- Rafael David Tinoco <email address hidden> Fri, 02 Mar 2018 01:48:33 +0000

Changed in apache2 (Ubuntu Trusty):
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.