libmemcached

_is_auto_eject_host() not causing run_distribution()

Bug #810888 reported by Ondrej Holecek on 2011-07-15

This bug report is a duplicate of: Bug #777672: libmemcached autoeject/conn retry problem.

This bug affects 1 person

Affects        Status   Importance   Assigned to   Milestone
libmemcached   New      Medium       Brian Aker

Bug Description

We are using libmemcached-0.44, but I observed the same problem on
the latest 0.50 as well.

We have a cluster of, say, 10 memcached servers, with
MEMCACHED_DISTRIBUTION_CONSISTENT enabled.

One of the memcached servers went down and memcached_get() returned
MEMCACHED_SERVER_MARKED_DEAD, since the key maps to the dead server.
So we called memcached_autoeject() to redistribute keys to the other
servers. (Note that memcached_autoeject() is not mentioned anywhere
in the docs.) memcached_get() still returned
MEMCACHED_SERVER_MARKED_DEAD.

I discovered that run_distribution(ptr) is never called because of the
"if (_is_auto_eject_host(ptr) && ptr->next_distribution_rebuild)"
condition in _regen_for_auto_eject().

The only place where ptr->next_distribution_rebuild is set is inside
run_distribution(), which cannot be reached because of that
condition. So, my first proposal is:

void memcached_autoeject(memcached_st *ptr)
{
  if (_is_auto_eject_host(ptr))
    run_distribution(ptr);
}
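
To make the circular dependency concrete, here is a minimal standalone model of the guard as this report describes it. It is a sketch, not libmemcached code: struct model_st, model_run_distribution() and both model_autoeject_*() functions are hypothetical names, and the 10-second rebuild interval is an arbitrary assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for memcached_st; only the fields needed here. */
struct model_st {
    bool auto_eject_hosts;          /* models _is_auto_eject_host() */
    long next_distribution_rebuild; /* 0 until a rebuild schedules one */
    int  rebuild_count;             /* how many times a rebuild ran */
};

/* Models run_distribution(): per the report, the only writer of
   next_distribution_rebuild. The +10 interval is an assumption. */
void model_run_distribution(struct model_st *m, long now)
{
    assert(m != NULL);
    m->rebuild_count++;
    m->next_distribution_rebuild = now + 10;
}

/* Guard as quoted in the report: it requires a nonzero rebuild time,
   which only a previous rebuild could have set. */
void model_autoeject_guarded(struct model_st *m, long now)
{
    assert(m != NULL);
    if (m->auto_eject_hosts && m->next_distribution_rebuild)
        model_run_distribution(m, now);
}

/* First proposal from the report: drop the rebuild-time check. */
void model_autoeject_proposed(struct model_st *m, long now)
{
    assert(m != NULL);
    if (m->auto_eject_hosts)
        model_run_distribution(m, now);
}
```

Starting from a state where next_distribution_rebuild is 0, the guarded variant never triggers a rebuild, while the proposed variant rebuilds on the first call. Whether an unconditional rebuild is acceptable in the real library is a separate question that this model does not answer.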

I'm not sure if the patch is correct. In any case, it did not fix
all my problems: memcached_get() was still returning
MEMCACHED_SERVER_MARKED_DEAD. So, my second proposal is:

@@ -134,6 +134,7 @@ static memcached_return_t update_continuum(memcached_st *ptr)
     {
       if (list[host_index].next_retry <= now.tv_sec)
       {
+        list[host_index].server_failure_counter = 0;
         live_servers++;
       }

Consider the case where the server has come back up and the next
retry timeout has passed. Look at this code in memcached_connect():

  if (ptr->root->server_failure_limit &&
      ptr->server_failure_counter >= ptr->root->server_failure_limit)
  {
    set_last_disconnected_host(ptr);

    // @todo fix this by fixing behavior to no longer make use of
    // memcached_st
    if (_is_auto_eject_host(ptr->root))
    {
      run_distribution((memcached_st *)ptr->root);
    }

    return MEMCACHED_SERVER_MARKED_DEAD;
  }

run_distribution() is called, but it does not reset
server_failure_counter to zero. The server is reachable again, but the
condition "ptr->server_failure_counter >=
ptr->root->server_failure_limit" still holds, so
MEMCACHED_SERVER_MARKED_DEAD is returned again and again.

So the server never comes back into rotation.
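
The stuck state can be reproduced in a small standalone model. This is a sketch under assumptions, not the library's implementation: struct model_server, model_connect() and model_rebuild() are hypothetical names that mirror only the two checks quoted in this report, and the server is assumed to be reachable whenever the failure-limit check passes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a server entry; not memcached_server_st. */
struct model_server {
    unsigned failure_counter; /* models server_failure_counter */
    long     next_retry;      /* models next_retry, in seconds */
};

enum model_rc { MODEL_SUCCESS, MODEL_MARKED_DEAD };

/* Models the failure-limit check quoted from memcached_connect(). */
enum model_rc model_connect(const struct model_server *s,
                            unsigned failure_limit)
{
    assert(s != NULL);
    if (failure_limit && s->failure_counter >= failure_limit)
        return MODEL_MARKED_DEAD;
    return MODEL_SUCCESS; /* assumption: the server is reachable again */
}

/* Models the retry check in update_continuum(). Returns true when the
   server counts as live. reset_counter selects whether the second
   proposal from the report is applied. */
bool model_rebuild(struct model_server *s, long now, bool reset_counter)
{
    assert(s != NULL);
    if (s->next_retry <= now) {
        if (reset_counter)
            s->failure_counter = 0;
        return true;
    }
    return false;
}
```

With a failure limit of 5 and a counter already at 5, a rebuild after the retry time counts the server as live in both variants, but model_connect() only succeeds afterwards when the counter was reset during the rebuild.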

Brian Aker (brianaker) on 2011-07-20

summary:	- consistent hashing broken
	+ _is_auto_eject_host() not causing run_distribution()
Changed in libmemcached:
assignee:	nobody → Brian Aker (brianaker)
importance:	Undecided → Medium
