bad timeout caused kill-controller to leave resources behind
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
juju-ci-tools | Fix Released | High | Curtis Hovey |
Bug Description
We observed that Azure sometimes had instances left behind, preventing subsequent tests from getting enough instances to test with. Andrew reviewed the logs and saw that Juju was still waiting for instances to be reclaimed when CI killed the long-running process. CI must let Juju try to finish. Azure can take a long time to reclaim resources: 10 minutes is not enough even for a trivial deployment, and Juju can take as long as 30 minutes to clean up.
Sane output looks like this example:
2016-07-18 10:01:50 INFO cmd cmd.go:141 admin@local/
Waiting on 1 model
2016-07-18 10:01:52 INFO cmd cmd.go:141 admin@local/
All hosted models reclaimed, cleaning up controller machines
If the console log is missing "All hosted models reclaimed, cleaning up controller machines", then Juju did not clean up; and if CI's log shows that it went on to collect timings, we can see that CI prematurely interrupted Juju.
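The check described above can be sketched as a simple scan of the console log for the reclaim marker. This is illustrative only: the function name `juju_cleaned_up` is hypothetical, not part of juju-ci-tools.

```python
# Marker line that Juju prints once all hosted models are reclaimed.
RECLAIMED_MARKER = (
    'All hosted models reclaimed, cleaning up controller machines')


def juju_cleaned_up(console_log_text):
    """Return True if the console log shows Juju finished reclaiming
    all hosted models before the process exited (hypothetical helper)."""
    return RECLAIMED_MARKER in console_log_text
```

A log that ends at "Waiting on 1 model" with no reclaim marker indicates CI interrupted Juju mid-teardown.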
This issue is currently masked by the Azure cleanup script that reclaims resources older than 6 hours.
Changed in juju-ci-tools:
assignee: Leo Zhang (nealpzhang) → Curtis Hovey (sinzui)
status: Triaged → In Progress

Changed in juju-ci-tools:
status: In Progress → Fix Committed

Changed in juju-ci-tools:
status: Fix Committed → Fix Released
EnvJujuClient.kill_controller() sets a 600-second timeout for all calls to bring down the controller/state-server and their machines. This is twice the time GCE needs, and 4-5x the time needed by the other clouds.
Azure is the exception: a trivial stack of 3 machines takes 666 seconds, and it can take 30 minutes to bring down a large deployment. We could change the timeout to 1800 seconds for everyone, but I prefer to pass 1800 only when client.config['type'] is 'azure'.
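The proposed fix can be sketched as below. This is a minimal sketch, not the actual jujupy code: the helper name `kill_controller_timeout` and the plain-dict `config` parameter are assumptions standing in for the real EnvJujuClient internals; only the 600/1800 values and the 'azure' provider-type check come from the discussion above.

```python
# Default teardown timeout (seconds): ample for GCE and the other clouds.
DEFAULT_KILL_TIMEOUT = 600
# Azure reclaims resources slowly; large deployments can take ~30 minutes.
AZURE_KILL_TIMEOUT = 1800


def kill_controller_timeout(config):
    """Pick the kill-controller timeout from the client config.

    `config` is assumed to be a dict-like with a 'type' key naming the
    cloud provider, mirroring client.config['type'] in the report.
    """
    if config.get('type') == 'azure':
        return AZURE_KILL_TIMEOUT
    return DEFAULT_KILL_TIMEOUT
```

With this, only Azure runs wait up to 30 minutes; the other clouds keep the existing 600-second budget.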