Comment 7 for bug 881983

Trevor North (trevor) wrote :

I have a branch which more or less achieves what is described above by way of a new dead server retry timeout behaviour.

As with the current behaviour under consistent distribution and auto ejection, keys on the dead server are moved once we hit the initial failure limit, by taking that host out of the continuum. We then reset the continuum to force a retry each time the dead server retry timeout expires, in the same manner as is done for standard connection retries. Each dead server retry will result in a miss if the host is still unavailable, which isn't ideal, but I wanted to achieve this whilst maintaining compatibility with current behaviour, so I have kept the changes to a bare minimum.
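To make the mechanism concrete, here is a minimal Python sketch of the behaviour described above. It is a simulation only, not the libmemcached code from the branch: the class name `Client`, the parameters `failure_limit`, `dead_timeout` and `points_per_host`, and the MD5-based ring are all assumptions made for illustration.

```python
import bisect
import hashlib


def _point(label):
    # Stable 32-bit position on the ring (hash choice is an assumption).
    return int(hashlib.md5(label.encode()).hexdigest()[:8], 16)


class Client:
    """Toy client: consistent distribution, auto ejection, dead server retry."""

    def __init__(self, hosts, failure_limit=2, dead_timeout=30.0,
                 points_per_host=100):
        self.hosts = list(hosts)
        self.failure_limit = failure_limit
        self.dead_timeout = dead_timeout
        self.points_per_host = points_per_host
        self.failures = {h: 0 for h in self.hosts}
        self.next_retry = {}  # ejected host -> time of its next retry
        self._rebuild()

    def _rebuild(self):
        # The continuum contains only hosts that are not currently ejected.
        live = [h for h in self.hosts if h not in self.next_retry]
        self.ring = sorted((_point(f"{h}-{i}"), h)
                           for h in live
                           for i in range(self.points_per_host))
        self.positions = [p for p, _ in self.ring]

    def pick(self, key, now):
        # Re-admit any ejected host whose dead retry timeout has elapsed.
        due = [h for h, t in self.next_retry.items() if now >= t]
        if due:
            for h in due:
                del self.next_retry[h]
            self._rebuild()
        if not self.ring:
            raise RuntimeError("no live hosts")
        i = bisect.bisect(self.positions, _point(key)) % len(self.ring)
        return self.ring[i][1]

    def report(self, host, ok, now):
        # Record the outcome of an operation against `host`.
        if ok:
            self.failures[host] = 0
            return
        self.failures[host] += 1
        if self.failures[host] >= self.failure_limit:
            # Eject the host and schedule the next dead server retry.
            self.next_retry[host] = now + self.dead_timeout
            self._rebuild()
```

In this sketch the failure counter is only cleared on success, so a host that is re-admitted while still down is ejected again after a single failed operation. That single failure is the one miss per retry mentioned above.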

It's worth noting that there are a couple of instances where an IO failure would incorrectly reset a server's state to new even if it was already in timeout. I've corrected this at the point where the state is set, although I suspect that in some cases the IO in question shouldn't be attempted in the first place.
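The state correction amounts to a guard on the transition. The following Python sketch shows the idea only; the `State` enum and the function `state_after_io_failure` are hypothetical names for illustration and do not correspond to identifiers in the patch.

```python
from enum import Enum


class State(Enum):
    # Hypothetical server states, named after the ones discussed above.
    NEW = "new"
    CONNECTED = "connected"
    IN_TIMEOUT = "in_timeout"


def state_after_io_failure(current):
    """Return the state a server should hold after an IO failure."""
    # The uncorrected behaviour would return State.NEW unconditionally,
    # which discards the fact that the server is already in timeout.
    if current is State.IN_TIMEOUT:
        return State.IN_TIMEOUT
    return State.NEW
```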

I've made no attempt to leave keys on their newly allocated servers once the dead server is brought back to life, and I don't believe it would be sensible to do so. With multiple clients running, network flapping would result in effectively random distribution if we attempted this, negating the point of using consistent distribution.
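The flapping concern can be shown with a toy model. The Python sketch below is an assumption-laden illustration: `StickyClient` is a hypothetical client that leaves keys on their reallocated host, and `home` uses a simple CRC32 slot walk rather than a real continuum.

```python
import zlib

HOSTS = ["a", "b", "c"]


def home(key, down=()):
    # Walk from the key's hashed slot to the first host not marked down.
    start = zlib.crc32(key.encode()) % len(HOSTS)
    for step in range(len(HOSTS)):
        host = HOSTS[(start + step) % len(HOSTS)]
        if host not in down:
            return host
    raise RuntimeError("no live hosts")


class StickyClient:
    """Hypothetical client that keeps keys where they were reallocated."""

    def __init__(self):
        self.down = set()   # hosts this client currently believes are dead
        self.pinned = {}    # key -> host it was moved to during an outage

    def pick(self, key):
        if key in self.pinned:
            return self.pinned[key]
        host = home(key, self.down)
        if host != home(key):
            # Remember the reallocation even after the host recovers.
            self.pinned[key] = host
        return host
```

If one client sees a host flap and another does not, the two end up with different pinned maps and disagree on where a key lives, even though every host is up again. Recomputing the placement from the restored continuum, as the branch does, keeps all clients in agreement.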

Bar the correction to the server state reset on IO failure when in timeout, the changes do not alter the behaviour currently seen if the new dead retry timeout is not used, so they should be completely backwards compatible.

The branch is available at https://code.launchpad.net/~trevor/libmemcached/dead-retry and I've attached a patch which will apply to 1.0.2.

Feedback would be welcome as ideally this isn't something I want to have to maintain separately.