OpenSRF

improve resilience to failure of a memcached server

Bug #1695050 reported by Galen Charlton on 2017-06-01

This bug affects 1 person

Affects   Status   Importance   Assigned to   Milestone
OpenSRF   New      Wishlist     Unassigned

Bug Description

If more than one memcached server is used, keys are spread out among them as determined by the memcached client (libmemcached for C code, Cache::Memcached for Perl). However, in the (uncommon) event that one of the memcached instances fails, C backends in particular can end up in loops where they repeatedly attempt to send data to the failed instance.

As it happens, libmemcached has some tools for dealing with failed servers:

http://docs.libmemcached.org/memcached_behavior.html#MEMCACHED_BEHAVIOR_REMOVE_FAILED_SERVERS

as well as replicating cached keys:

http://docs.libmemcached.org/libmemcached_configuration.html#memcached

Making use of these could improve resilience in the face of memcached failure.

OpenSRF master