Comment 7 for bug 1675500

Kota Tsuyuzaki (tsuyuzaki-kota) wrote :

@Matt

Thanks for the valuable analysis of this bug; the cause you found makes sense to me. It comes from incorrect usage of the more_nodes iterator.

However, I think your fix still needs improvement. I wrote a unit test for the case where handoff nodes attempt to push their local to ....

With your patch, a handoff node still tries to replicate a 4th replica even on a 3-replica ring, because the replicator doesn't count its local copy as the third replica, right? I think that then causes a race between handoff nodes to push the replica.

Consider the similar case in the object-replicator: object-replicators on handoffs attempt to push their local replica *only* to primaries, while object-replicators on *primaries* can push their local replica to a handoff when a device failure is found while syncing.

My patch for db_replicator with this comment is designed to work the same way as the object-replicator, i.e. the db replicator on a handoff never tries to push the replica to other handoffs, which I think is the desirable behavior.
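
To make the intended rule concrete, here is a minimal sketch of the target selection described above. It is not the actual Swift code: the function name select_push_targets, its arguments, and the plain string node ids are all hypothetical, and it only illustrates the idea under those assumptions.

```python
def select_push_targets(primaries, handoffs, local_id, failed_ids=frozenset()):
    """Return the node ids this replicator should try to push to.

    Hypothetical illustration only (names are not from Swift):
    - a replicator on a handoff pushes *only* to primaries;
    - a replicator on a primary may fall back to a handoff when
      another primary is found to be failed while syncing.
    """
    is_primary = local_id in primaries
    handoff_iter = iter(handoffs)
    targets = []
    for node in primaries:
        if node == local_id:
            # Never push to ourselves.
            continue
        if node in failed_ids:
            if is_primary:
                # Only a primary is allowed to use a handoff instead.
                fallback = next(handoff_iter, None)
                if fallback is not None:
                    targets.append(fallback)
            # A handoff simply skips the failed primary.
            continue
        targets.append(node)
    return targets


primaries = ["p1", "p2", "p3"]
handoffs = ["h1", "h2"]
# A handoff never targets another handoff, even when a primary is down.
print(select_push_targets(primaries, handoffs, "h1", {"p2"}))
# A primary may use a handoff in place of the failed primary.
print(select_push_targets(primaries, handoffs, "p1", {"p2"}))
```

With this rule, two handoffs holding the same db would both push to the primaries but never to each other, so they cannot race to create an extra replica.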

Thoughts?