Comment 28 for bug 1728111

Dmitrii Shcherbakov (dmitriis) wrote :

Tim,

Net split example: the Juju controllers and MAAS region controllers sit on layer 2 networks different from those of the rack controllers and application servers in a data center. E.g. there are 9 racks to manage in different locations within the same DC, and you want to keep a single Juju & MAAS regiond control plane located separately so that you can add more racks. In this setup, a Juju controller that is the primary in a replica set may lose access to the management network for rack "k". That is a net split, but your applications are unaffected: only the machine & unit agents are.

I think that what we encounter is mostly deployment-time problems, because after a model has converged there is little use for Juju leadership hooks. They may be needed again when you scale your infrastructure (deployment time again), but by then service-level clustering will already have been done.

Another use-case is rolling upgrades: a single unit should initiate them even if the "rolling" part is managed at the service level. But there are two different types of rolling upgrades:

1. stateless applications: the ordering of operations (by a leader) should be done on the Juju side, because when this is done manually it is operator-driven in many cases. Otherwise we will need a "software-upgrader" application to handle that and maintain the deployment state;
2. stateful applications: service-level quorum awareness is required, so a leader unit only initiates an upgrade, which is then carried out by the software itself.
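
As a rough illustration of case 1 (leader-ordered upgrade of a stateless application), here is a minimal Python sketch. It is not Juju code: `rolling_upgrade`, `upgrade` and `healthy` are hypothetical names, and the stop-on-first-failure policy is an assumption made only for the example.

```python
from typing import Callable, Iterable, List


def unit_number(unit: str) -> int:
    """Return the numeric suffix of a unit name such as 'app/10'."""
    return int(unit.rsplit("/", 1)[1])


def rolling_upgrade(
    units: Iterable[str],
    upgrade: Callable[[str], object],
    healthy: Callable[[str], bool],
) -> List[str]:
    """Upgrade units one at a time in unit-number order (hypothetical helper).

    Assumption for this sketch: the leader stops at the first unit that
    fails its health check, leaving the remaining units untouched.
    Returns the units that were upgraded and came back healthy.
    """
    completed = []
    for unit in sorted(units, key=unit_number):
        upgrade(unit)
        if not healthy(unit):
            break  # leave the rest of the deployment on the old version
        completed.append(unit)
    return completed
```

The point is only that a single unit (the leader) owns the ordering and the deployment state; how each unit is actually upgraded is left to the callbacks.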

In the cases I've seen we go through the following logic:

1. a leader unit defines who will bootstrap a service-level cluster;
2. service-level elections are performed (ordered connections to a master, Paxos, Raft, Totem RRP etc.);
3. leadership is managed at the service level. Leader settings contain an indication of a completed bootstrap procedure and leadership hooks are no-ops.
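
The three steps above can be sketched as follows. This is a simulation in plain Python, not charm code: `Model`, `run_hook` and the `bootstrapped` key are hypothetical stand-ins for Juju leader settings (which a real charm would read and write with the `is-leader`, `leader-get` and `leader-set` hook tools).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Model:
    """Stand-in for the leader settings shared by all units of an application."""
    leader: str
    leader_settings: Dict[str, str] = field(default_factory=dict)


def run_hook(unit: str, model: Model, joined: Set[str], log: List[str]) -> None:
    """Simulate one hook invocation on one unit (hypothetical charm logic)."""
    bootstrapped = model.leader_settings.get("bootstrapped") == "true"
    if not bootstrapped:
        if unit == model.leader:
            # Step 1: only the Juju leader bootstraps the service-level cluster
            # and records that fact in the leader settings.
            joined.add(unit)
            model.leader_settings["bootstrapped"] = "true"
            log.append(f"{unit}: bootstrap")
        else:
            # Non-leaders wait until the bootstrap flag is published.
            log.append(f"{unit}: wait")
        return
    if unit not in joined:
        # Step 2: remaining units join; elections happen inside the service.
        joined.add(unit)
        log.append(f"{unit}: join")
    else:
        # Step 3: bootstrap is complete, so Juju leadership no longer matters.
        log.append(f"{unit}: no-op")
```

Once the `bootstrapped` flag is set, a change of Juju leader has no effect in this sketch, which mirrors the "leadership hooks are no-ops" point.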

A practical example:

1. Percona cluster (the master bootstraps, slaves join without bootstrapping);
2. new slaves join the quorum;
3. any service-level failure conditions require disaster recovery and manual intervention.