libmemcached

Server timeout state reset on IO error

Bug #931696 reported by Trevor North on 2012-02-13

This bug report is a duplicate of: Bug #928696: incorrect handling of server restart.

This bug affects 1 person

Affects	Status	Importance	Assigned to	Milestone
libmemcached	New	Undecided	Unassigned

Bug Description

When an IO error is encountered, the server state is reset to MEMCACHED_SERVER_STATE_NEW even if it is currently MEMCACHED_SERVER_STATE_IN_TIMEOUT. The subsequent call to memcached_mark_server_for_timeout then incorrectly pushes the next connection retry time further back and increments the server failure counter again. This throws off the connection back-off handling: it looks as though there has been another failure when in fact we are still dealing with the same in-progress failure.
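To make the sequence concrete, here is a minimal sketch of the behaviour described above. It is a simplified model, not libmemcached's actual code: ServerModel, mark_for_timeout, on_io_error_reported and on_io_error_guarded are hypothetical names, and the guard inside mark_for_timeout is an assumption about how the timeout marking depends on the current state.

```cpp
#include <ctime>

// Hypothetical, simplified stand-ins; these are not libmemcached's real types.
enum class ServerState { New, InTimeout };

struct ServerModel {
  ServerState state = ServerState::New;
  std::time_t next_retry = 0;
  unsigned failure_counter = 0;
};

// Assumed behaviour: only a server that is not already in timeout gets a
// fresh retry time and an incremented failure counter.
void mark_for_timeout(ServerModel& s, std::time_t now, std::time_t retry_timeout) {
  if (s.state != ServerState::InTimeout) {
    s.next_retry = now + retry_timeout;
    s.state = ServerState::InTimeout;
    ++s.failure_counter;
  }
}

// Behaviour as reported: the state is reset unconditionally, so the check in
// mark_for_timeout never sees InTimeout and every IO error counts as new.
void on_io_error_reported(ServerModel& s, std::time_t now, std::time_t retry_timeout) {
  s.state = ServerState::New;
  mark_for_timeout(s, now, retry_timeout);
}

// Workaround sketch: leave a server that is already in timeout untouched.
void on_io_error_guarded(ServerModel& s, std::time_t now, std::time_t retry_timeout) {
  if (s.state != ServerState::InTimeout) {
    s.state = ServerState::New;
  }
  mark_for_timeout(s, now, retry_timeout);
}
```

In this model, two IO errors at t = 100 and t = 102 with a retry timeout of 5 leave the reported variant with a failure count of 2 and a retry time of 107, while the guarded variant keeps a count of 1 and a retry time of 105.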

This may only manifest as a problem when using consistent distribution, because of the point at which the continuum is recalculated; I haven't tested any of the other distribution options. It is probably also more noticeable when using the dead server retry behaviour included in 1.0.3+. In a nutshell, after a server in the pool is taken offline, it should be possible to observe that retries do not occur at the expected intervals and that failure counts are inaccurate.
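The observable effect on retry timing can be illustrated with a small simulation. This is only a sketch under stated assumptions: simulate and Outcome are hypothetical names, the client is assumed to issue one request per second, and every request against the offline server is assumed to surface an IO error. None of this is taken from libmemcached itself.

```cpp
#include <vector>

// Hypothetical model; the names and the error-per-request assumption are
// illustrative only.
enum class State { New, InTimeout };

struct Outcome {
  std::vector<int> retry_times;  // seconds at which a reconnect was attempted
  unsigned failure_counter = 0;
};

// Simulates one request per second against a server that stays offline.
// preserve_timeout = false models the reported reset-to-new behaviour.
Outcome simulate(bool preserve_timeout, int seconds, int retry_timeout) {
  Outcome out;
  State state = State::New;
  int next_retry = 0;
  for (int t = 0; t < seconds; ++t) {
    // Leaving the timeout window is the only legitimate path back to New.
    if (state == State::InTimeout && t >= next_retry) {
      state = State::New;
      out.retry_times.push_back(t);
    }
    // Assumption: every request surfaces an IO error while the server is down.
    if (!preserve_timeout) {
      state = State::New;  // reported behaviour: state reset on any IO error
    }
    // Timeout marking only acts on servers that are not already in timeout.
    if (state != State::InTimeout) {
      next_retry = t + retry_timeout;
      state = State::InTimeout;
      ++out.failure_counter;
    }
  }
  return out;
}
```

Under these assumptions, simulate(true, 12, 5) retries at t = 5 and t = 10 and ends with a failure count of 3, whereas simulate(false, 12, 5) never reaches a retry and ends with a failure count of 12, because each error pushes the retry time one second further out.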

I patched io.cc and quit.cc to work around this as part of the following commit to my branch: http://bazaar.launchpad.net/~trevor/libmemcached/dead-retry/revision/978

This may well be fixing the symptom rather than the cause, but I have had the change running in production for quite some time now with no apparent side effects. I do understand that those changes cause at least some of the tests to fail, though, which certainly warrants further investigation.

I've been meaning to find the time to put together a proper example test case and results for this, but that has been proving impossible of late. I still wanted to get the issue logged, though. Please let me know if I've not been clear enough here or if I can provide any more useful information.


Trevor North (trevor) on 2012-02-13

description: updated

Trevor North (trevor) on 2012-02-13

affects: libmemcached (Ubuntu) → libmemcached


Trevor North (trevor) wrote on 2012-02-13:

Now that I've got this tagged against the project rather than a distro package, I notice that bug #928696 has a rather better description of the problem seen. Please feel free to mark this as a duplicate or merge it as appropriate.
