Comment 3 for bug 2025724

John A Meinel (jameinel) wrote :

So we've actually had the stated policy that "you should not use Charm leadership to manage application leadership" for a few reasons:

1) The actual policy from Charm leadership is that "there will not be >1 leader at any given time", which is different from "there is always guaranteed to be 1 leader".

2) The actual mechanism we use is "you have a lease for X time, and you should renew that lease in X/2 time". (where we have chosen 60s as the length of lease, and 30s as the renewal).
Most applications would not be happy to be down for 60s while Juju notices that someone else needs to take over, whereas management of the app is usually perfectly fine at that interval.

3) There is a big difference between "my application is not responding" and "my charm is not responding". The Juju Unit agent being alive and responsive doesn't mean that your running application hasn't stopped accepting requests.
You *really* want a health check on the actual application to be the thing that maintains the primary of the application, and charm leadership is a *very* weak proxy for that.
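
To make points 1 and 2 concrete, here is a rough sketch of a lease with a 60s length and 30s renewal. It is a toy model, not Juju's actual lease code: `LeaseTable`, `claim` and `leader` are made-up names, and time is passed in explicitly so the behaviour is easy to follow.

```python
from dataclasses import dataclass
from typing import Optional

LEASE_SECONDS = 60.0               # lease length quoted above
RENEW_SECONDS = LEASE_SECONDS / 2  # renewal interval quoted above


@dataclass
class LeaseTable:
    """Toy single-lease store (hypothetical, for illustration only)."""
    holder: Optional[str] = None
    expires_at: float = 0.0

    def leader(self, now: float) -> Optional[str]:
        # Zero leaders is a legal state: an expired lease has no holder.
        return self.holder if now < self.expires_at else None

    def claim(self, unit: str, now: float) -> bool:
        # A claim or renewal succeeds only if the lease is free or already ours,
        # so there is never more than one leader.
        current = self.leader(now)
        if current is not None and current != unit:
            return False
        self.holder = unit
        self.expires_at = now + LEASE_SECONDS
        return True


table = LeaseTable()
assert table.claim("unit/0", now=0.0)
assert not table.claim("unit/1", now=10.0)       # someone else holds it
assert table.claim("unit/0", now=RENEW_SECONDS)  # renewal, lease now ends at t=90

# Suppose unit/0 dies right after that renewal. Nobody notices for a full lease.
assert table.leader(now=89.0) == "unit/0"
assert table.leader(now=90.0) is None            # no leader at all
assert table.claim("unit/1", now=90.0)
```

In this model a unit that dies just after renewing is still "leader" for the full 60s, and between expiry and the next successful claim there is no leader at all.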

If you are ok with that delay, I would be ok with 'you can request leadership, but it won't take effect until the current lease expires'. Implementation is also a little bit tricky, as we don't have any way to disallow other units from getting the lease (all units are in a 'block until the lease expires' wait; in a healthy state the expiry never happens, but when it does they each make a claim, which might succeed).

There are lots of potential problems. For example, you request that unit/1 become the next leader, but then unit/1 dies, and you've blocked /0 and /2 from becoming leader in its absence. We could layer *another* lease on top of it: "I want leadership for the next 2 min".
And if you have buggy code and each unit decides that it wants to request the next leader, who wins?
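
A rough sketch of what that layered reservation might look like, again as a toy model and not a proposal for the real implementation. `ReservableLease`, `reserve` and the 120s timeout are made up for illustration, and "first request wins" is just one arbitrary answer to the question above.

```python
from dataclasses import dataclass
from typing import Optional

LEASE_SECONDS = 60.0
RESERVATION_SECONDS = 120.0  # the "next 2 min" above; an assumed value


@dataclass
class ReservableLease:
    """Toy lease plus a time-limited 'next leader' reservation (hypothetical)."""
    holder: Optional[str] = None
    expires_at: float = 0.0
    reserved_for: Optional[str] = None
    reservation_expires_at: float = 0.0

    def leader(self, now: float) -> Optional[str]:
        return self.holder if now < self.expires_at else None

    def _reservation(self, now: float) -> Optional[str]:
        # A reservation that has timed out is treated as if it never existed.
        return self.reserved_for if now < self.reservation_expires_at else None

    def reserve(self, unit: str, now: float) -> bool:
        # Assumption: first request wins until the reservation times out.
        live = self._reservation(now)
        if live is not None:
            return live == unit
        self.reserved_for = unit
        self.reservation_expires_at = now + RESERVATION_SECONDS
        return True

    def claim(self, unit: str, now: float) -> bool:
        current = self.leader(now)
        if current is not None and current != unit:
            return False
        live = self._reservation(now)
        if live is not None and live != unit:
            # Covers both rivals and the current leader's renewal, so the
            # existing lease is allowed to run out.
            return False
        self.holder = unit
        self.expires_at = now + LEASE_SECONDS
        self.reserved_for = None  # reservation is consumed (or already stale)
        return True


lease = ReservableLease()
assert lease.claim("unit/0", now=0.0)       # lease ends at t=60
assert lease.reserve("unit/1", now=10.0)    # reservation ends at t=130
assert not lease.reserve("unit/2", now=11.0)
assert not lease.claim("unit/0", now=30.0)  # renewal refused

# unit/1 dies before it can claim. /0 and /2 stay locked out until t=130.
assert not lease.claim("unit/2", now=60.0)
assert lease.leader(now=60.0) is None
assert lease.claim("unit/2", now=130.0)
```

Even with the timeout, the dead-unit case in this sketch leaves the application with no leader from t=60 to t=130, which is worse than the plain 60s lease.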