Server timeout state reset on IO error

Bug #931696 reported by Trevor North
This bug report is a duplicate of: Bug #928696: incorrect handling of server restart.
This bug affects 1 person
Affects: libmemcached
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

When an IO error is encountered, the server state is reset to MEMCACHED_SERVER_STATE_NEW even if it is currently MEMCACHED_SERVER_STATE_IN_TIMEOUT. The subsequent call to memcached_mark_server_for_timeout then incorrectly pushes the next connection retry time further back and increments the server failure counter again. This disrupts the connection back-off handling: it looks as though another failure has occurred, when in fact we are still dealing with the same, in-progress failure.
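
To make the sequence concrete, here is a minimal sketch of the behaviour described above, written as a simplified C model. The names (server_state, struct server, mark_for_timeout, on_io_error, RETRY_TIMEOUT) are hypothetical stand-ins for MEMCACHED_SERVER_STATE_NEW / MEMCACHED_SERVER_STATE_IN_TIMEOUT and memcached_mark_server_for_timeout; this is not libmemcached's actual code, and the 2-second retry interval is an assumption.

```c
#include <time.h>

/* Hypothetical, simplified model of a server's retry state. These names are
   illustrative only; they are not libmemcached's real types or functions. */
enum server_state { STATE_NEW, STATE_CONNECTED, STATE_IN_TIMEOUT };

struct server {
  enum server_state state;
  unsigned failure_count; /* failures counted for this server */
  time_t next_retry;      /* earliest time a reconnect may be attempted */
};

#define RETRY_TIMEOUT 2 /* assumed retry interval in seconds */

/* Models the guard described in the report: a server already in timeout is
   left alone; otherwise a new failure is counted and the next retry is
   scheduled RETRY_TIMEOUT seconds from now. */
static void mark_for_timeout(struct server *s, time_t now) {
  if (s->state == STATE_IN_TIMEOUT)
    return;
  s->state = STATE_IN_TIMEOUT;
  s->failure_count++;
  s->next_retry = now + RETRY_TIMEOUT;
}

/* The problematic IO-error path: the state is reset to NEW unconditionally,
   which defeats the IN_TIMEOUT guard in mark_for_timeout. */
static void on_io_error(struct server *s, time_t now) {
  s->state = STATE_NEW;
  mark_for_timeout(s, now);
}
```

In this model, two IO errors at t=100 and t=101 during the same outage leave failure_count at 2 and next_retry at 103, where a single in-progress failure should give 1 and 102.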

This may only manifest itself as a problem when using consistent distribution, because of the point at which the continuum is recalculated; I haven't tested with any of the other distribution options. It is probably also a more obvious problem when making use of the dead server retry behaviour included in 1.0.3+. In a nutshell, after a server in the pool is taken offline it should be possible to observe that retries do not occur at the expected intervals and that failure counts are not accurate.

I patched io.cc and quit.cc to work around this as part of the following commit to my branch: http://bazaar.launchpad.net/~trevor/libmemcached/dead-retry/revision/978
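
One way such a workaround could look is sketched below, again as a simplified C model rather than the contents of the linked revision: the IO-error path only resets the state to NEW when the server is not already in timeout. All names (server_state, struct server, mark_for_timeout, on_io_error_guarded, RETRY_TIMEOUT) are hypothetical, and the 2-second retry interval is assumed.

```c
#include <time.h>

/* Hypothetical, simplified model; the names are illustrative and are not
   taken from io.cc, quit.cc or the linked revision. */
enum server_state { STATE_NEW, STATE_CONNECTED, STATE_IN_TIMEOUT };

struct server {
  enum server_state state;
  unsigned failure_count;
  time_t next_retry;
};

#define RETRY_TIMEOUT 2 /* assumed retry interval in seconds */

static void mark_for_timeout(struct server *s, time_t now) {
  if (s->state == STATE_IN_TIMEOUT)
    return; /* same outage: keep the existing schedule and count */
  s->state = STATE_IN_TIMEOUT;
  s->failure_count++;
  s->next_retry = now + RETRY_TIMEOUT;
}

/* Guarded IO-error path: only reset to NEW when the server is not already
   in timeout, so an in-progress failure is not counted twice. */
static void on_io_error_guarded(struct server *s, time_t now) {
  if (s->state != STATE_IN_TIMEOUT)
    s->state = STATE_NEW;
  mark_for_timeout(s, now);
}
```

With the guard, a second IO error during the same outage leaves the failure count and retry time unchanged. Moving the server out of IN_TIMEOUT once the retry time has passed is left to the reconnect logic, which this sketch does not model.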

This may well be fixing the symptom rather than the cause, but I have had the change running in production for quite some time now with no apparent side effects. I do understand that these changes cause at least some of the tests to fail, though, which certainly warrants further investigation.

I've been meaning to find the time to put together a proper example test case and results for this, but that has proved impossible of late. I still wanted to get the issue logged, though; please let me know if I haven't been clear enough or if I can provide any more useful information.

Trevor North (trevor): description updated
Trevor North (trevor): affects changed from libmemcached (Ubuntu) to libmemcached
Trevor North (trevor) wrote:

Now that I've got this tagged against the project rather than a distro package, I notice that bug #928696 has a rather better description of the problem. Please feel free to mark this as a duplicate or merge it as appropriate.
