add rolling update / partial update support

Bug #1243768 reported by JuanJo Ciarlante
This bug affects 4 people
Affects          Status      Importance   Assigned to   Milestone
Canonical Juju   Triaged     Low          Unassigned
Charm Helpers    New         Undecided    Unassigned
juju-core        Won't Fix   Medium       Unassigned

Bug Description

For the discussion below, "update" refers to upgrade-charm or config-change.

To add reliability to mission-critical services, we should be able to do rolling updates ("update ok -> next") and/or partially update a subset of units.
Example use case: ~1000 cassandra units restarting at the same time (think also of the thundering-herd effect on a shared storage layer).

Suggested usage, e.g.:
$ juju upgrade-charm --max-in-flight=2 ...

Then, to properly support resuming the above if it is interrupted:
$ juju upgrade-charm --num-units-to-update=10

Then, to complete it, either roll back or roll forward:
$ juju upgrade-charm --roll-back ...
$ juju upgrade-charm --roll-forward ...
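
The flags above are proposals only and do not exist in juju. As a rough illustration of the intended "update ok -> next" semantics, here is a minimal Python sketch. The names `batches`, `rolling_update` and the `update_unit` callback are hypothetical and are not part of any Juju API; the sketch assumes `update_unit` returns True when a unit updated cleanly.

```python
from typing import Callable, Iterator, List, Sequence, Tuple


def batches(units: Sequence[str], max_in_flight: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most max_in_flight units."""
    if max_in_flight < 1:
        raise ValueError("max_in_flight must be at least 1")
    for start in range(0, len(units), max_in_flight):
        yield list(units[start:start + max_in_flight])


def rolling_update(
    units: Sequence[str],
    max_in_flight: int,
    update_unit: Callable[[str], bool],  # hypothetical per-unit update hook
) -> Tuple[List[str], List[str]]:
    """Update units batch by batch and stop after the first failing batch.

    Returns (updated, pending) so that an interrupted run can later be
    resumed (roll forward) or undone (roll back).
    """
    updated: List[str] = []
    for batch in batches(units, max_in_flight):
        results = [update_unit(unit) for unit in batch]
        # Record the units in this batch that did succeed.
        updated.extend(unit for unit, ok in zip(batch, results) if ok)
        if not all(results):
            break  # do not touch further units after a failure
    pending = [unit for unit in units if unit not in updated]
    return updated, pending
```

With five units, `max_in_flight=2` and a failure on the fourth unit, the run stops after the second batch and leaves the fourth and fifth units pending, which is the state a later roll-forward or roll-back would start from.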

FYI, webops has currently worked around this in some charms with:
juju set units-to-update=0,2,5
juju set units-to-update=all
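
How the affected charms interpret `units-to-update` is not shown in this report. A minimal sketch of one plausible interpretation follows, assuming the value is either `all` or a comma-separated list of unit numbers and that unit names look like `cassandra/2`. The function name `should_update` is hypothetical.

```python
def should_update(unit_name: str, units_to_update: str) -> bool:
    """Return True if this unit is selected by the units-to-update value.

    Assumed format (not confirmed by the report): "all", or a
    comma-separated list of unit numbers such as "0,2,5".
    """
    value = units_to_update.strip().lower()
    if value == "all":
        return True
    if not value:
        return False  # nothing selected: no unit updates
    # "cassandra/2" -> 2
    unit_number = int(unit_name.rsplit("/", 1)[1])
    selected = {int(part) for part in value.split(",") if part.strip()}
    return unit_number in selected
```

A charm hook could call this with its own unit name and the current config value, and return early when it is not selected.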

Curtis Hovey (sinzui)
Changed in juju-core:
status: New → Triaged
importance: Undecided → High
tags: added: improvement upgrade-charm
JuanJo Ciarlante (jjo) wrote :

ERRATA: in the first post, it should obviously be 'upgrade-charm' (and likely 'set') instead of 'deploy'.

John A Meinel (jameinel) wrote :

I'm not sure that this should be classified as High. Certainly it is feature work that isn't on the immediate roadmap. It is definitely a nice-to-have, and might get higher priority from stakeholder escalation ("we must have this for site X, where we are rolling out 100s of units of Y").

jjo: you can edit a description after the fact, which I'll do now.

description: updated
John A Meinel (jameinel) wrote :

Note that you can simulate this a little bit by doing multiple service deploys.

So you do something like:

 juju deploy cs:cassandra-10 cassandra-A
 juju add-unit -n 100 cassandra-A

Then, when you want to upgrade:
 juju deploy cs:cassandra-11 cassandra-B
 juju add-unit -n 10 cassandra-B
 juju remove-unit -n 10 cassandra-A

See that things keep working:
 juju add-unit -n 50 cassandra-B
 juju remove-unit -n 50 cassandra-A
etc

However, you'd need a way to configure the charm so that its configuration can cross services. (Peer relations only work inside a service ATM.)

The nice thing about this method is that it gives you very fine control over the process, and it lets you do the upgrade as clearly bringing up *more* units of work, rather than shutting down the units before adding more capacity. (though you can always remove-unit before you add-unit if you want that.)

It is a workaround, though, and being able to roll out upgrades in place would be very nice to have.
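
The add-then-remove sequence above can be planned ahead of time. The sketch below only computes the batch plan and does not call juju. The function name `migration_steps` is hypothetical, and each step means "add this many units to the new service, then remove the same number from the old one".

```python
from typing import List, Sequence, Tuple


def migration_steps(
    total_units: int, batch_sizes: Sequence[int]
) -> List[Tuple[int, int, int]]:
    """Plan a staged move of units from an old service to a new one.

    Each step is (moved, old_remaining, new_total). Batches are taken in
    order; whatever is left after the last batch is moved in a final step.
    """
    steps: List[Tuple[int, int, int]] = []
    old_remaining = total_units
    new_total = 0
    for size in batch_sizes:
        if old_remaining == 0:
            break
        moved = min(size, old_remaining)
        old_remaining -= moved
        new_total += moved
        steps.append((moved, old_remaining, new_total))
    if old_remaining:
        # Final step: move everything that is still on the old service.
        new_total += old_remaining
        steps.append((old_remaining, 0, new_total))
    return steps
```

For the 100-unit example with batches of 10 and then 50, the plan is three steps, with the last step moving the remaining 40 units.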

Changed in juju-core:
importance: High → Medium
Anastasia (anastasia-macmood) wrote :

Re-targeting for Juju 2.x

Changed in juju-core:
status: Triaged → Won't Fix
Changed in juju:
status: New → Triaged
importance: Undecided → Wishlist
tags: added: canonical-bootstack
Peter Sabaini (peter-sabaini) wrote :

This would be extremely useful for operating critical applications.

E.g., consider a workload that is able to fail over between AZs. Being able to do charm upgrades per AZ and then fail over the application would greatly reduce the risk of user impact. Lacking this feature means we have to schedule a maintenance window for the whole cloud.
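
A per-AZ upgrade needs the units grouped by zone first. A minimal sketch, assuming the operator already has a mapping from unit name to availability zone; the function name `units_by_zone` and the example zone names are hypothetical.

```python
from typing import Dict, List, Tuple


def units_by_zone(unit_zones: Dict[str, str]) -> List[Tuple[str, List[str]]]:
    """Group units by availability zone, in a stable (sorted) order.

    The result can be walked one zone at a time: upgrade every unit in
    the zone, fail the workload over, then move on to the next zone.
    """
    grouped: Dict[str, List[str]] = {}
    for unit, zone in unit_zones.items():
        grouped.setdefault(zone, []).append(unit)
    return [(zone, sorted(grouped[zone])) for zone in sorted(grouped)]
```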

Xav Paice (xavpaice) wrote :

https://launchpad.net/layer-coordinator provides a facility which may be what we need to allow this to happen in the charm code, rather than expecting Juju to know how to go about restarting the applications.

Xav Paice (xavpaice) wrote :

Added charm-helpers to this bug report, since what is most likely needed is a change to charm-helpers, followed by changes to the multitude of charms that need rolling restarts, upgrades, etc.

Canonical Juju QA Bot (juju-qa-bot) wrote :

This bug has not been updated in 2 years, so we're marking it Low importance. If you believe this is incorrect, please update the importance.

Changed in juju:
importance: Wishlist → Low
tags: added: expirebugs-bot