Comment 11 for bug 1767105

Revision history for this message
In , Rafael David Tinoco (rafaeldtinoco) wrote :

Max or Eric,

Are there any changes in status of this bug ?

I can corroborate to what Max said, and proposed, since I'm analysing a dump that was brought to me, exactly in this situation (and also could find another old bug with the same race condition):

https://bz.apache.org/bugzilla/show_bug.cgi?id=58483

These are my notes so far:

Problem summary:

apr_rmm_init acts as a relocatable memory management initialization

it is used in: mod_auth_digest and util_ldap_cache

From the dump was brought to my knowledge, in the following sequence:

- util_ldap_compare_node_copy()
- util_ald_strdup()
- apr_rmm_calloc()
- find_block_of_size()

Had a "cache->rmm_addr" with no lock at "find_block_of_size()"

cache->rmm_addr->lock { type = apr_anylock_none }

And an invalid "next" offset (out of rmm->base->firstfree).

This rmm_addr was initialized with NULL as a locking mechanism:

From apr-utils:

apr_rmm_init()

    if (!lock) { <-- 2nd argument to apr_rmm_init()
        nulllock.type = apr_anylock_none; <--- found in the dump
        nulllock.lock.pm = NULL;
        lock = &nulllock;
    }

From apache:

# mod_auth_digest

    sts = apr_rmm_init(&client_rmm,
                       NULL, /* no lock, we'll do the locking ourselves */
                       apr_shm_baseaddr_get(client_shm),
                       shmem_size, ctx);

# util_ldap_cache

        result = apr_rmm_init(&st->cache_rmm, NULL,
                              apr_shm_baseaddr_get(st->cache_shm), size,
                              st->pool);

It appears that the ldap module chose to use "rmm" for memory allocation, using
the shared memory approach, but without explicitly definiting a lock to it.
Without it, its up to the caller to guarantee that there are locks for rmm
synchronization (just like mod_auth_digest does, using global mutexes).

Because of that, there was a race condition in "find_block_of_size" and a call
touching "rmm->base->firstfree", possibly "move_block()", in a multi-threaded
apache environment, since there were no lock guarantees inside rmm logic (lock
was "apr_anylock_none" and the locking calls don't do anything).

In find_block_of_size:

    apr_rmm_off_t next = rmm->base->firstfree;

We have:

    rmm->base->firstfree
 Decimal:356400
 Hex:0x57030

But "next" turned into:

Name : next
 Decimal:8320808657351632189
 Hex:0x737973636970653d

Causing:

        struct rmm_block_t *blk = (rmm_block_t*)((char*)rmm->base + next);

        if (blk->size == size)

To segfault.