bad timeout caused kill-controller to leave resources behind

Bug #1604102 reported by Curtis Hovey
This bug affects 1 person
Affects        Status        Importance  Assigned to   Milestone
juju-ci-tools  Fix Released  High        Curtis Hovey

Bug Description

We observed that Azure sometimes had instances left behind, preventing subsequent tests from getting enough instances to test with. Andrew reviewed the logs and saw that Juju was still waiting on instances when CI killed the long-running process. CI must let Juju try to finish. Azure can take a long time to reclaim resources: 10 minutes is not enough even for a trivial deployment, and Juju can take as long as 30 minutes to clean up.

Sane output looks like this example:

    2016-07-18 10:01:50 INFO cmd cmd.go:141 admin@local/azure-arm-deploy (dying), 1 machine
    Waiting on 1 model
    2016-07-18 10:01:52 INFO cmd cmd.go:141 admin@local/azure-arm-deploy (dying)
    All hosted models reclaimed, cleaning up controller machines

If the console log is missing "All hosted models reclaimed, cleaning up controller machines", then Juju did not clean up. If CI's log also shows that it collected timings, then CI prematurely interrupted Juju.
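The log check described above could be automated along these lines. This is only a sketch: `juju_finished_cleanup` and `RECLAIMED_MARKER` are hypothetical names, not part of juju-ci-tools, and it assumes the console log is available as a single string.

```python
# Hypothetical helper, not existing juju-ci-tools code.
# The marker text is the final line Juju prints in the sane output above.
RECLAIMED_MARKER = (
    "All hosted models reclaimed, cleaning up controller machines")


def juju_finished_cleanup(console_log):
    """Return True if the console log shows Juju completed its cleanup."""
    return RECLAIMED_MARKER in console_log


sane_log = """\
2016-07-18 10:01:50 INFO cmd cmd.go:141 admin@local/azure-arm-deploy (dying), 1 machine
Waiting on 1 model
2016-07-18 10:01:52 INFO cmd cmd.go:141 admin@local/azure-arm-deploy (dying)
All hosted models reclaimed, cleaning up controller machines
"""

# A log cut off while Juju was still waiting lacks the marker.
interrupted_log = "\n".join(sane_log.splitlines()[:2])
```

A CI job could flag any run where this returns False as a likely source of leaked instances.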

This issue is currently masked by the azure cleanup script that reclaims resources older than 6 hours.

Curtis Hovey (sinzui) wrote :

EnvJujuClient.kill_controller() sets a 600 second timeout for all calls to bring down the controller/state-server and their machines. This is twice the time needed for GCE and 4-5x the time needed by other clouds.

Azure is the exception: a trivial stack of 3 machines takes 666 seconds. It can take 30 minutes to bring down a large deployment. We could change the timeout to 1800 seconds. I prefer to only pass 1800 when the client.config['type'] is 'azure'.
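A minimal sketch of that preference, assuming the config is a dict with a 'type' key as in client.config. The function and constant names are hypothetical and this is not the actual fix that landed:

```python
# Hypothetical names; only the 600 and 1800 second values come from this bug.
DEFAULT_KILL_TIMEOUT = 600   # seconds, current value for all clouds
AZURE_KILL_TIMEOUT = 1800    # seconds, Azure can need up to 30 minutes


def kill_controller_timeout(config):
    """Return the kill-controller timeout in seconds for a client config."""
    if config.get('type') == 'azure':
        return AZURE_KILL_TIMEOUT
    return DEFAULT_KILL_TIMEOUT
```

Keeping the shorter default for other clouds means a genuinely hung teardown there is still detected after 10 minutes rather than 30.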

Changed in juju-ci-tools:
assignee: nobody → Leo Zhang (nealpzhang)
Richard Harding (rharding) wrote :

Note to check https://bugs.launchpad.net/juju-core/+bug/1571687 when this is fix-committed to make sure that it's not an issue.

Curtis Hovey (sinzui)
Changed in juju-ci-tools:
assignee: Leo Zhang (nealpzhang) → Curtis Hovey (sinzui)
status: Triaged → In Progress
Curtis Hovey (sinzui)
Changed in juju-ci-tools:
status: In Progress → Fix Committed
Curtis Hovey (sinzui)
Changed in juju-ci-tools:
status: Fix Committed → Fix Released