destroy-controller fails when an application has no machines

Bug #1672549 reported by Curtis Hovey
This bug affects 2 people
Affects         Status        Importance  Assigned to  Milestone
Canonical Juju  Fix Released  High        Anastasia
2.1             Won't Fix     Undecided   Unassigned

Bug Description

As seen at
    http://reports.vapour.ws/releases/issue/581c994e749a5605e21355ee

The assess_cloud.py test consistently fails to destroy a nearly empty model when destroying the controller. The test generates clouds.yaml, bootstraps, deploys ubuntu, and lastly calls remove-unit. The test thinks it is successful and begins teardown, but destroy-controller fails. We can see that the controller reports it is removing the application (since the machine was already removed). We see the application removed, but we never see the model destroyed.
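
For anyone trying this by hand, the sequence above can be sketched roughly as follows. This is not the code from assess_cloud.py: the function names, the controller name "repro", and the assumption that the first unit is ubuntu/0 are all illustrative, and the real test also generates clouds.yaml and waits for the deployment to settle, which this sketch leaves out.

```python
import subprocess


def repro_commands(cloud, controller="repro", app="ubuntu"):
    """Return the juju CLI calls for the sequence described in this bug.

    `cloud`, `controller` and the "<app>/0" unit name are assumptions
    made for illustration; adjust them to the environment under test.
    """
    return [
        ["juju", "bootstrap", cloud, controller],
        ["juju", "deploy", app],
        # Removing the only unit leaves the application with zero units.
        ["juju", "remove-unit", f"{app}/0"],
        # This is the step reported to hang on maas and openstack.
        ["juju", "destroy-controller", controller,
         "--destroy-all-models", "-y"],
    ]


def run(commands, timeout=1800):
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True, timeout=timeout)
```

Calling run(repro_commands("my-maas")) would execute the steps against a cloud named my-maas (a made-up name); the timeout is only there so that a hung destroy-controller surfaces as an error instead of blocking forever.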

This is only seen with the maas and openstack providers.

Other tests do not see this because they do not call remove-unit. Some tests call remove-application and maybe destroy-machine, but none call remove-unit to leave an application with 0 units.

Curtis Hovey (sinzui)
summary: - destroy-comtroller fails when an application has no machines
+ destroy-cobtroller fails when an application has no machines
Changed in juju:
milestone: none → 2.2-rc1
summary: - destroy-cobtroller fails when an application has no machines
+ destroy-controller fails when an application has no machines
John A Meinel (jameinel) wrote :

I don't think this is related to bug #1668792 as that is more about "the controller is dead but I'm unable to 'kill-controller' it". This bug is more about 'incomplete teardown of one piece prevents further teardown of another'.

I believe we do have some other related bugs, things like:
  juju add-machine lxd:0

can cause destroy-controller to fail, as it is waiting for the machine to die, but it never actually triggers tear-down to start.
And there is another bug about destroy-controller failing if there is an application with no actual units (failed to provision a machine) where there is a unit in 'dying' but nothing is actually cleaning it up.

Anastasia (anastasia-macmood) wrote :

There seems to be a workaround for scenarios where an application with no units cannot be removed: restart the agent.

From an IRC conversation:
[22:43:36] <ivy> I think I've resolved the problem! One of the units lost the contact with the controller although the agent was alive... the juju log on the unit said:
[22:43:38] <ivy> 2017-03-23 09:46:31 ERROR juju.worker.dependency engine.go:539 "api-caller" manifold worker returned unexpected error: cannot open api: unable to connect to API: websocket.Dial [details omitted ]getsockopt: no route to host
[22:43:41] <ivy> 2017-03-23 09:46:34 ERROR juju.worker.dependency engine.go:539 "api-caller" manifold worker returned unexpected error: cannot open api: try again (try again)
[22:43:42] <ivy> I restarted the agent and after a while the relation with glance disappeared, and so did the application!

Note that this does not help in situations where destroy-controller itself fails. The offending application needs to be removed first and the agent restarted; destroy-controller should then succeed.
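
To find agents that may need the restart, one option is to scan the unit logs for the "api-caller" errors quoted in the IRC excerpt. The sketch below is only an illustration: the regular expression is inferred from the two log lines shown here, not from any documented Juju log format, and the function name is made up.

```python
import re

# Layout inferred from the quoted lines:
# "<date> <time> <LEVEL> <module> <file>:<line> <message>"
LOG_LINE = re.compile(
    r'^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) '
    r'(?P<level>[A-Z]+) (?P<module>\S+) (?P<source>\S+:\d+) '
    r'(?P<message>.*)$'
)


def api_caller_errors(lines):
    """Return (timestamp, message) pairs for api-caller ERROR lines."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # Not in the assumed layout; skip it.
        if (match.group("level") == "ERROR"
                and '"api-caller" manifold worker' in match.group("message")):
            hits.append((match.group("time"), match.group("message")))
    return hits
```

Feeding it the lines of a unit's log file returns the timestamps at which the agent failed to reach the API, which is the symptom described in the IRC conversation.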

We still need to address removal of an application with no units or machines in code.

Curtis Hovey (sinzui)
Changed in juju:
milestone: 2.2-beta1 → 2.2-beta2
Changed in juju:
milestone: 2.2-beta2 → 2.2-beta3
Anastasia (anastasia-macmood) wrote :

There was another oversight in the code that was responsible for model cleanup after destruction. The oversight was corrected as a drive-by fix in
https://github.com/juju/juju/pull/7184/, a PR against develop, which is
heading into 2.2-beta3.

Changed in juju:
status: Triaged → Fix Committed
assignee: nobody → Ian Booth (wallyworld)
Anastasia (anastasia-macmood) wrote :

Just saw another re-occurrence. Still looking :D

Changed in juju:
status: Fix Committed → In Progress
assignee: Ian Booth (wallyworld) → Anastasia (anastasia-macmood)
Anastasia (anastasia-macmood) wrote :

PR against develop (2.2): https://github.com/juju/juju/pull/7218

Changed in juju:
status: In Progress → Fix Committed
Changed in juju:
status: Fix Committed → Fix Released