Canonical Juju

Azure-arm leaves machine-0 from the admin model behind

Bug #1571687 reported by Curtis Hovey on 2016-04-18

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	Canonical Juju	Fix Released	Undecided	Unassigned

Bug Description

Juju CI is finding tens of resource groups left behind each week in Azure. On Monday 2016-04-18, Azure had 26 resource groups/instances still running that had been created one or more days earlier. Most instances were from April 15. A few were older, from April 13, and some were from April 16. All but two were machine-0 from the admin model.

Curtis Hovey (sinzui) on 2016-04-20

Changed in juju-core:
milestone:	2.0-beta5 → 2.0-rc1

Andrew Wilkins (axwalk) wrote on 2016-04-22:

That would be because CI is timing out on kill-controller:
http://data.vapour.ws/juju-ci/products/version-3914/native-deploy-landscape-azure/build-51/consoleText

I've found deleting VMs in Azure to be considerably slower than on other clouds. Resource group deletion is also very slow, and this is necessary to destroy a model.

I do think we should stop trying to be "friendly" in kill-controller, and just talk directly to the cloud API like we used to with --force. That would probably speed things up a bit, because then we'd just delete everything at once by deleting the resource group.

Andrew Wilkins (axwalk) wrote on 2016-04-22:

kill-controller improvements may be in order, but the crux of this issue is that CI needs to wait for destruction to complete, or should expect leakage.

Changed in juju-core:
status:	Triaged → Invalid

Curtis Hovey (sinzui) wrote on 2016-04-22:

When CI times out, the build is marked as a failure. The example shows a success. I do not see a keyboard interrupt or any Python exceptions raised in the example. http://data.vapour.ws/juju-ci/products/version-3914/native-deploy-landscape-azure/build-51/consoleText

What does CI need to see in the log to know it didn't wait long enough?

Changed in juju-core:
status:	Invalid → Incomplete

Cheryl Jennings (cherylj) wrote on 2016-04-24:

CI would at least need to see the message "All hosted models reclaimed, cleaning up controller machines" to know that kill-controller is trying to take down admin/machine-0.

Azure is taking FOREVER to kill / destroy controllers. I timed the last one and it was just about 10 minutes: http://paste.ubuntu.com/16033544/
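
As a rough illustration of the check described above, CI could scan the captured console output for that message. This is only a sketch: the function name and the idea of passing the log in as a string are assumptions, not part of juju-ci-tools. Finding the message shows only that kill-controller reached the controller-machine stage, not that Azure finished deleting the resource group.

```python
# Message quoted in this bug report; printed by kill-controller once all
# hosted models are gone and it starts removing the controller machines.
TEARDOWN_MARKER = "All hosted models reclaimed, cleaning up controller machines"


def reached_controller_teardown(console_log):
    """Return True if the log shows kill-controller started on machine-0.

    Hypothetical helper: `console_log` is the full console text as a string.
    """
    return any(TEARDOWN_MARKER in line for line in console_log.splitlines())
```

A build whose log lacks the message could then be flagged as likely to have leaked the admin model's machine-0.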

Curtis Hovey (sinzui) on 2016-04-25

Changed in juju-core:
milestone:	2.0-beta6 → 2.0-beta7

Curtis Hovey (sinzui) on 2016-05-13

Changed in juju-core:
milestone:	2.0-beta7 → 2.0-beta8

Cheryl Jennings (cherylj) on 2016-05-26

Changed in juju-core:
milestone:	2.0-beta8 → none

Anastasia (anastasia-macmood) wrote on 2016-06-22:

Is this still an issue?

Curtis Hovey (sinzui) wrote on 2016-08-03:

I suspect that juju-ci-tools is not waiting long enough for kill-controller to complete. Azure can take 30 minutes to delete a large deployment, but the timeout is set for 10 minutes. I think this issue will go away when bug 1604102 is fixed.
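
A minimal sketch of the kind of change this implies: run the teardown command with a deadline sized for Azure, and report a timeout separately from an ordinary failure so the build can be treated as a possible leak. The helper name, the status strings, and the 30-minute constant are assumptions taken from the numbers in this comment; this is not the actual juju-ci-tools code, and the command to run is left to the caller.

```python
import subprocess

# Assumed deadline, based on the "30 minutes" figure above (the old one was 10).
AZURE_TEARDOWN_TIMEOUT = 30 * 60


def run_with_deadline(command, timeout_seconds=AZURE_TEARDOWN_TIMEOUT):
    """Run `command` (a list of arguments) and classify the outcome.

    Returns "done" on exit code 0, "failed" on a non-zero exit code, and
    "timed-out" if the deadline passed (the child process is killed).
    """
    try:
        completed = subprocess.run(command, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # Teardown was interrupted, so cloud resources may be left behind.
        return "timed-out"
    return "done" if completed.returncode == 0 else "failed"
```

A caller would pass its own teardown command and treat "timed-out" as a reason to check the cloud for leftover resource groups.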

Anastasia (anastasia-macmood) wrote on 2016-08-07:

@Curtis,

The dependent bug 1604102 has been "Fix committed" since 2016-08-05.

So is this still an issue or has it indeed been addressed?

Anastasia (anastasia-macmood) on 2016-08-07

Changed in juju-core:
importance:	High → Undecided

Canonical Juju QA Bot (juju-qa-bot) on 2016-08-23

affects:

juju-core → juju

Curtis Hovey (sinzui) on 2016-08-25

Changed in juju:
status:	Incomplete → Fix Released
