uint32_t overflow cause busy loop in 1.0.18

Bug #1451184 reported by bin on 2015-05-03
10
This bug affects 2 people
Affects Status Importance Assigned to Milestone
libmemcached
Undecided
Unassigned

Bug Description

We are using libmemcached 1.0.18 and observed occasionally very high CPU consumption.
After some debugging, the cause is located in libmemcached/purge.cc :

bool memcached_purge(memcached_instance_st* ptr)
{
    .............
    uint32_t no_msg= memcached_server_response_count(ptr) - 1;
   for (uint32_t x= 0; x < no_msg; x++) // busy loop here
   {
         .....
   }
}

During debugging, we found that memcached_server_response_count(ptr) sometimes return a large value which equals to (uint32_t)(0-1) .

In order to fix this issue, we took an quick & dirty solution, which refuses to decrement by 1 when 0. This patch is attached below.

But still, I haven't found the root cause, and I'm afraid the root cause would leads to more possible hidden issues. So hope someone else could catch and fix this bug.

bin (0x3fffffff) wrote :
Brandon DuRette (brandond) wrote :

memcached_server_response_count(ptr) returns 0 when ptr is NULL or ptr->cursor_active is NULL. Decrementing this and assigning to a uint32_t produces 2^32-1, so the error check if (no_msg > 0) only fails if ptr->cursor_active == 1. I suspect the right answer is to not use uint32_t here (before it was uint32_t it was int; not sure why it was converted to unsigned) and instead use int32_t. That change is here:

http://bazaar.launchpad.net/~tangent-trunk/libmemcached/1.2/revision/509/libmemcached/memcached_purge.c

For what it's worth, I've seen this bug tickled when calling set with a key containing a newline (\n) from PHP 5.5's Memcached class. The Memcached server doesn't like that (seems to be interpreting the second line as a new command) and errors out -- eventually triggering this loop as it attempts to unwind from the error.

kenorb (kenorb) wrote :

The same in here, PHP just crashes.
PHP 5.6.15 (cli) (built: Oct 31 2015 14:39:02)

Bt:

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0 libmemcached.11.dylib 0x000000010968328d memcached_purge(memcached_instance_st*) + 61
1 libmemcached.11.dylib 0x0000000109681c8d io_flush(memcached_instance_st*, bool, memcached_return_t&) + 45
2 libmemcached.11.dylib 0x0000000109681720 _io_write(memcached_instance_st*, void const*, unsigned long, bool, unsigned long&) + 165
3 libmemcached.11.dylib 0x0000000109681769 memcached_io_write(memcached_instance_st*, void const*, unsigned long, bool) + 20
4 libmemcached.11.dylib 0x000000010967f667 __mget_by_key_real(memcached_st*, char const*, unsigned long, char const* const*, unsigned long const*, unsigned long, bool) + 1247
5 libmemcached.11.dylib 0x000000010967fd40 memcached_mget_by_key + 20
6 memcached.so 0x000000010965b8d9 php_memc_get_impl + 637
7 php 0x000000010894b458 dtrace_execute_internal + 126
8 php 0x00000001089c436e zend_do_fcall_common_helper_SPEC + 1450
9 php 0x0000000108981b12 execute_ex + 78

kenorb (kenorb) wrote :
TimBunce (tim-bunce) wrote :

Just to be clear, "This PR probably fixes the issue" in the previous comment refers to the fact that the Memcached-libmemcached repo contains a copy of the libmemcached code. It's the changes in that PR that need to be applied here to the upstream libmemcached code. So "This PR probably fixes the issue" -> "The changes in this PR probably fix the issue if we applied them here".

There's also the patch provided by "bin (0x3fffffff)" in the first comment on this issue.

I've not applied the PR to Memcached-libmemcached yet because I'm waiting on some acknowledgement from the libmemcached team of which fix is likely to be applied here.

Any news?

lawlielt (lawlielt) wrote :

is there anyone to fix this in next release?

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers