Add a wait command

Bug #1488777 reported by Stuart Bishop on 2015-08-26
This bug affects 10 people
Affects: juju
Importance: Wishlist
Assigned to: Unassigned

Bug Description

Charm developers writing tests need to wait for a freshly deployed environment to settle before running tests.

Operations need to wait for an environment to settle after making changes, so that they can run smoke tests.

For the purposes of these use cases, an environment is settled when no hooks are running and no hooks are scheduled to run. At that moment, the environment is in a stable state, and it remains stable until further juju operations are performed (by the operator, or scheduled).

The proposed command name is 'juju wait'.

With Juju 1.24, it may be possible to simply inspect the unit and agent statuses. Whether this works depends on whether the approach is racy, and whether any races can be worked around to make it reliable.
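A minimal sketch of the status-inspection idea, in Python. It assumes the status has already been parsed into a mapping of unit names to a dict with an `agent-status` field; that shape, the field name, and the value `idle` are assumptions for illustration, not a confirmed Juju output format. It also does nothing about the possible race noted above: a unit could report idle in the gap between two hooks.

```python
from typing import Mapping

# Hypothetical status shape, assumed for illustration only:
#   {"mysql/0": {"agent-status": "idle"},
#    "wordpress/0": {"agent-status": "executing"}}


def is_settled(units: Mapping[str, Mapping[str, str]]) -> bool:
    """Return True when there is at least one unit and every unit is idle."""
    if not units:
        # Nothing deployed yet is treated as "not settled".
        return False
    return all(unit.get("agent-status") == "idle" for unit in units.values())
```

A caller would take a fresh snapshot for each check, and might require several consecutive settled snapshots before trusting the result.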

If the statuses are unsuitable, a plugin adding this command can be found at https://launchpad.net/juju-wait. As far as I'm aware, it is the only implementation of the only algorithm that works for older versions of Juju.

Please add a 'juju wait' command with these semantics to juju-core, so developers and operations no longer need to install a plugin from a ppa: to write reliable integration tests and deployment scripts.

Curtis Hovey (sinzui) on 2015-08-26
Changed in juju-core:
status: New → Triaged
importance: Undecided → Low
tags: added: charmers feature

Is it possible this is already present internally? What does the bundle deployer use?

Aaron Bentley (abentley) on 2016-04-01
tags: added: jujuqa
Stuart Bishop (stub) on 2016-04-02
tags: added: canonical-is
Ryan Beisner (1chb1n) wrote :

Opinionated assertions follow. :-)

I think this is important in at least two arenas:

 - Production deployment automation where post-deployment activities need to take place; and

 - Test automation, where the deployed model needs to be inspected post-deployment.

The impact of not having some sort of approach in place is that testing begins too early, introducing race conditions where a test may pass when the weather is good, but may start to fail when the substrate is under load, or the internet is slower, or any other variable impacts the timing of things.

We too (OpenStack Engineering) have struggled over time, and believe we have conquered the art of systematically detecting when a Juju deployment is actually "done." We did this by implementing extended status messaging in the charms, allowing each charm to declare itself "ready." When all units in a model do so, we proceed. We've found that all other approaches leave varying degrees of raciness.
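The "proceed when every unit declares itself ready" check can be sketched as below. This is an illustration only: the function name, the mapping of unit names to status messages, and the marker text passed by the caller are all hypothetical, since no uniform ready message exists across charms.

```python
from typing import Dict, List


def units_not_ready(messages: Dict[str, str], ready_marker: str) -> List[str]:
    """Return the sorted names of units whose status message lacks the marker.

    `messages` maps a unit name to its extended status message (assumed shape).
    `ready_marker` is whatever text the charms in question use to say "ready".
    """
    return sorted(
        name for name, message in messages.items() if ready_marker not in message
    )
```

An empty result means every unit has declared itself ready; a non-empty one names the units still being waited on, which is useful in test logs.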

We've found that in cases where not all charms possess intentional extended status advertising, the juju-wait plugin almost always reliably waits the necessary amount of time for the deployment to complete. Even with that, we've found a couple of gaps and currently have a juju-wait fork and corresponding merge proposal to address those.

Unfortunately at this time, there is no global uniform extended status message to watch across all charms.
We've implemented a predictable extended status message in the OpenStack charms, but that message may differ from what other charms report. This means that currently, an approach that works for one type of workload may not translate well to another.

Some observations, opinions, mixed with some facts:

`juju deploy foo` exits 0 nearly immediately, so that cannot be used to block/wait.

juju-deployer exits 0, many minutes (2 to 35min in practice) before the OpenStack deployments are actually done.

Amulet's wait logic has grown to be closer to perfect in waiting for things to wrap up, but we still observe test races when using that alone, especially when subordinates are involved.

Amulet's "wait for extended status" logic is solid, if the charm is written to declare itself ready.

For legacy charms which do not / will not have extended status, juju-wait is nearly perfect in predicting readiness.

...

To sum up:

1. juju-wait is the closest thing to a global, generically usable way to block/wait for deployment readiness that I have seen and used.

2. Extended status is the only guaranteed way to block/wait for a deployment to complete, presuming that the charm author's logic checks that the required relations are met, deployed services are up, processes are running and any expected network sockets are bound, listening and responsive -- before declaring itself ready.

3. Since a typical OpenStack deployment includes two charms which do not have extended status (mysql and mongodb), we do both [1] and [2], and that successfully avoids the "Am-I-really-ready?" races we used to battle. Now we are free to chase more meaningful races in the deployed workloads. ;-)

4. Sleep not. If you find yourself about to add time.sleep() anywhere in anything outside of a retry loop,...
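As an illustration of sleeping only inside a retry loop, here is a minimal polling sketch with a deadline. The function name, parameters, and defaults are hypothetical and not taken from juju-wait or Amulet; `check` stands for any readiness test the caller supplies.

```python
import time
from typing import Callable


def wait_until(
    check: Callable[[], bool],
    timeout: float = 600.0,
    interval: float = 4.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `check` until it returns True or `timeout` seconds elapse.

    Returns True on success and False on timeout. The check always runs
    at least once, even with a zero timeout.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        # The only sleep lives here, between retries.
        sleep(interval)
```

The caller gets an explicit failure on timeout instead of a fixed delay that is too short on a loaded substrate and wastefully long on a fast one.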


tags: added: uosci
Changed in juju-core:
importance: Low → Wishlist
Paul Gear (paulgear) wrote :

Anastasia asked me to comment on this from the Canonical IS perspective, but I don't think I could do so any better than Ryan has already done.

We need a reliable method of determining when it is 100% safe to run post-deployment steps. This is all but essential for any CI infrastructure depending on juju, and I think it should be considered much higher priority than wishlist - probably at least a medium.

affects: juju-core → juju
tags: added: conjure