Comment 2 for bug 1843376

Revision history for this message
John A Meinel (jameinel) wrote :

There were 2 ways that the test TestOnLeaderFailure can fail

1) We get to the BlockUntilLeadership call, and then we start shutting down. That would cause BlockUntilLeadershipReleased to return ErrBlockCancelled (because the leadershipClaimer isn't running over the API and would notice that tomb.Dying() is returning)

With my patch, we no longer block on the send to claimLease. Which means we eventually close the claimLease channel as the goroutine exits.

In this case, you see 3 calls, Claim + Block + Claim.

The main loop then sees the closed claimLease channel *before* it sees tomb.Dying() itself. Which had an unconditional "if claimLease returns, and it isn't leadership.ErrBlockCancelled (and a closed channel returns nil error), then it would try to make another claim". That can be fixed by noticing the channel was closed and treating that as block cancelled.

2) The other way it could fail is that the main loop *does* notice tomb.Dying while the setMinion goroutine is running. The goroutine does eventually call BlockUntilLeadershipCancelled, but the main loop has already exited and since it doesn't wait for the Cancelled to return/close the channel, the test moves forward and doesn't see the request.

In this case you see 1 call, just to Claim.