machine destruction depends on machine agents

Bug #1217781 reported by William Reade
This bug affects 6 people
Affects: juju-core
Importance: High
Assigned to: Dimiter Naydenov

Bug Description

When a machine has been provisioned but its agent hasn't yet come up, the machine can't be removed. This is especially a problem when something will prevent the machine agent from *ever* coming up. We can, and probably should, handle this in the provisioner, around provisioner_task.go:199:

        switch machine.Life() {
        case state.Dying:
            if _, err := machine.InstanceId(); err == nil {
                continue

...but the obvious fix (calling EnsureDead if the agent has not started) has the same slight weirdness as unit destruction: there's no clear handoff of responsibility. The only signal is the agent setting a status, and EnsureDead doesn't handle that. So it would be racy, but probably not harmfully so. Two things racing to set the machine dead shouldn't be a problem, and the machine agent (MA) should gracefully handle being set to dead from outside.
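The proposed fix and its race can be sketched in Go. This is a minimal sketch under stated assumptions. `Machine`, its `agentStarted` flag and `cleanUpDying` are hypothetical in-memory stand-ins, not juju's `state` package API. The sketch only shows the argument above: if `EnsureDead` is idempotent, the provisioner and the agent can both set a machine dead without harm.

```go
package main

import (
	"fmt"
	"sync"
)

// Life mirrors the Alive/Dying/Dead lifecycle discussed in this report.
type Life int

const (
	Alive Life = iota
	Dying
	Dead
)

// Machine is a hypothetical in-memory stand-in, not juju's state.Machine.
type Machine struct {
	mu           sync.Mutex
	id           string
	life         Life
	agentStarted bool // hypothetical: has the machine agent ever come up?
}

// Life returns the machine's current lifecycle value.
func (m *Machine) Life() Life {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.life
}

// AgentStarted reports whether the machine agent has come up.
func (m *Machine) AgentStarted() bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.agentStarted
}

// EnsureDead is idempotent: setting an already-dead machine to dead changes
// nothing, so concurrent callers cannot conflict.
func (m *Machine) EnsureDead() {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.life = Dead
}

// cleanUpDying is a hypothetical provisioner pass: any dying machine whose
// agent never started is set to dead on the agent's behalf.
func cleanUpDying(machines []*Machine) {
	for _, m := range machines {
		if m.Life() != Dying {
			continue
		}
		// Racy by design: the agent may start between this check and
		// EnsureDead. Because EnsureDead is idempotent, that race is harmless.
		if !m.AgentStarted() {
			m.EnsureDead()
		}
	}
}

func main() {
	stuck := &Machine{id: "1", life: Dying}                       // agent never came up
	healthy := &Machine{id: "2", life: Dying, agentStarted: true} // agent handles its own death

	var wg sync.WaitGroup
	wg.Add(2)
	// The provisioner and a late-starting agent both try to set machine 1 dead.
	go func() { defer wg.Done(); cleanUpDying([]*Machine{stuck, healthy}) }()
	go func() { defer wg.Done(); stuck.EnsureDead() }()
	wg.Wait()

	fmt.Println(stuck.id, stuck.Life() == Dead)     // 1 true
	fmt.Println(healthy.id, healthy.Life() == Dead) // 2 false
}
```

Whichever goroutine runs first, machine 1 ends up dead exactly once in effect. Machine 2 is left dying for its own agent to finish.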

See also lp:1190715 for context.

William Reade (fwereade)
Changed in juju-core:
status: New → Triaged
importance: Undecided → High
Jonathan Davies (jpds)
tags: added: cts
Kapil Thangavelu (hazmat) wrote :

This also needs to cope with machines that have been deleted externally.

Chris J Arges (arges)
tags: added: cts-cloud-review
removed: cts
Chris J Arges (arges)
tags: added: cts
John A Meinel (jameinel)
Changed in juju-core:
milestone: none → 1.16.0
William Reade (fwereade) wrote :

Destroying any units assigned to unprovisioned machines allows us to set those machines to dying; the CodeNotProvisioned handling in provisioner_task.go is confirmed to clean up dying, unprovisioned machines.
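The committed fix can also be sketched in Go. This is a hedged sketch, not juju's code. `errNotProvisioned` stands in for the CodeNotProvisioned error. The rule that `Destroy` refuses while units are still assigned is an assumption made for illustration. `processDying` is a hypothetical provisioner step that sets a dying, never-provisioned machine to dead, because no agent will ever run on it to do so.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotProvisioned is a hypothetical stand-in for juju's CodeNotProvisioned error.
var errNotProvisioned = errors.New("machine is not provisioned")

// Life mirrors the Alive/Dying/Dead lifecycle discussed in this report.
type Life int

const (
	Alive Life = iota
	Dying
	Dead
)

// Machine is a hypothetical in-memory stand-in, not juju's state.Machine.
type Machine struct {
	ID         string
	Life       Life
	InstanceID string   // empty until the provider has started an instance
	Units      []string // names of units assigned to this machine
}

// InstanceId fails with errNotProvisioned until an instance exists.
func (m *Machine) InstanceId() (string, error) {
	if m.InstanceID == "" {
		return "", errNotProvisioned
	}
	return m.InstanceID, nil
}

// Destroy moves an alive machine to Dying, but (by assumption) only once no
// units remain assigned to it.
func (m *Machine) Destroy() error {
	if len(m.Units) > 0 {
		return fmt.Errorf("machine %s still has units %v", m.ID, m.Units)
	}
	if m.Life == Alive {
		m.Life = Dying
	}
	return nil
}

// processDying sets dying, never-provisioned machines to Dead and returns
// their IDs. Provisioned dying machines are left alone for their agents.
func processDying(machines []*Machine) []string {
	var removed []string
	for _, m := range machines {
		if m.Life != Dying {
			continue
		}
		if _, err := m.InstanceId(); errors.Is(err, errNotProvisioned) {
			m.Life = Dead
			removed = append(removed, m.ID)
		}
	}
	return removed
}

func main() {
	m := &Machine{ID: "1", Units: []string{"ubuntu/0"}}
	fmt.Println(m.Destroy() != nil) // true: a unit is still assigned

	m.Units = nil // as if the unit had been destroyed first
	if err := m.Destroy(); err != nil {
		fmt.Println("unexpected:", err)
		return
	}
	removed := processDying([]*Machine{m})
	fmt.Println(removed, m.Life == Dead) // [1] true
}
```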

Changed in juju-core:
status: Triaged → Fix Committed
Tim Penhey (thumper)
Changed in juju-core:
milestone: 1.16.0 → 1.15.1
Mark Ramm (mark-ramm)
Changed in juju-core:
assignee: nobody → Dimiter Naydenov (dimitern)
Curtis Hovey (sinzui)
Changed in juju-core:
status: Fix Committed → Fix Released
tags: removed: cts
Matt Rae (mattrae) wrote :

Hi, I'm using juju 1.16.2, and machines that never came up still do not get terminated. They stay in the 'dying' state forever (overnight, at least).

$ juju -v bootstrap
$ juju status
environment: maas
machines:
  "0":
    agent-state: started
    agent-version: 1.16.2
    dns-name: node1001.cloud
    instance-id: /MAAS/api/1.0/nodes/node-70cc4f36-3730-11e3-a5f3-525400c5247e/
    series: precise
services: {}
$ juju deploy ubuntu

# Now power off the newly allocated machine at some point before its machine agent comes up.

$ juju destroy-service ubuntu
$ juju terminate-machine 1
$ juju status
environment: maas
machines:
  "0":
    agent-state: started
    agent-version: 1.16.2
    dns-name: node1001.cloud
    instance-id: /MAAS/api/1.0/nodes/node-70cc4f36-3730-11e3-a5f3-525400c5247e/
    series: precise
  "1":
    agent-state: pending
    dns-name: node1003.cloud
    instance-id: /MAAS/api/1.0/nodes/node-77bece04-3730-11e3-a5f3-525400c5247e/
    life: dying
    series: precise
services: {}

$ apt-cache policy juju-core
juju-core:
  Installed: 1.16.2-0ubuntu1~ubuntu12.04.1~juju1
  Candidate: 1.16.2-0ubuntu1~ubuntu12.04.1~juju1
  Version table:
 *** 1.16.2-0ubuntu1~ubuntu12.04.1~juju1 0
        500 http://ppa.launchpad.net/juju/stable/ubuntu/ precise/main amd64 Packages
        100 /var/lib/dpkg/status
     1.16.0-0ubuntu1~ctools1 0
        500 http://10.0.0.1/ubuntu-cloud/ubuntu/ precise-updates/cloud-tools/main amd64 Packages

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 12.04.3 LTS
Release: 12.04
Codename: precise

William Reade (fwereade) wrote :

Machines whose instances were shut down out of band aren't automatically destroyed by juju, because we can't always trust what providers tell us. In EC2, for instance, eventual consistency can sometimes be very eventual. lp:1089289 is now fixed in trunk and is being backported to 1.16 for release in 1.16.4. With that fix, those machines can be removed with "juju destroy-machine --force".
