juju add-machine ssh: may not clean up properly on failure
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
juju-core | Won't Fix | Medium | Unassigned |
Bug Description
If you are manually provisioning a machine and it is able to create the record in state, but fails at some point before actually setting up the machine (for example the ssh into the host fails), it tries to clean up the record in the database by calling DestroyMachine(
However, because manual provisioning assigns an instance-id, we end up in a case where the API server thinks the machine will come up soon, and so it doesn't want to remove the entry entirely.
A possible fix is that if we see we need to call DestroyMachine, then we also call something to remove the instance-id from the record in the DB. Alternatively, we do something with ForceDestroyMac
There is still the problem that the Machine record for a manually provisioned machine *looks* exactly like one that was provisioned from the environment; it just has an invalid instance-id which we can't actually kill. (I could certainly see killing a manually provisioned machine causing the Provisioner code to bounce itself when it gets an error trying to Terminate an invalid instance-id.)
It might be that we end up with special state in the Machine doc that clearly marks it as manual, so a regular DestroyMachine acts differently. However, if you actually did have the machine set up with an agent running, we want that agent to clean itself up. This particular bug is different because we know that we failed to actually set up the agent, so we just want the record removed from state.
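The idea above (clear the instance-id, then destroy, and only for records marked as manual) can be sketched in Go. This is a minimal in-memory illustration, not juju-core code: `machineDoc`, `state`, `destroyMachine`, `cleanupFailedManualProvision`, the `Manual` field, and the sample instance-id strings are all hypothetical names invented for the example. It also assumes that "doesn't want to remove the entry entirely" means the record is left in place and marked dying.

```go
package main

import (
	"errors"
	"fmt"
)

// machineDoc is a hypothetical, simplified stand-in for a machine record in state.
type machineDoc struct {
	ID         string
	InstanceID string // non-empty once an instance is believed to exist
	Manual     bool   // hypothetical marker for manually provisioned machines
	Dying      bool   // assumption: set when removal is deferred to the agent
}

// state is an in-memory stand-in for the state database.
type state struct {
	machines map[string]*machineDoc
}

var errNotFound = errors.New("machine not found")

// destroyMachine models the behaviour described in the report: a record that
// has an instance-id is kept (here, marked dying) because an agent is expected
// to come up and finish the cleanup; a record without one is removed outright.
func (s *state) destroyMachine(id string) error {
	m, ok := s.machines[id]
	if !ok {
		return errNotFound
	}
	if m.InstanceID != "" {
		m.Dying = true
		return nil
	}
	delete(s.machines, id)
	return nil
}

// cleanupFailedManualProvision sketches the proposed fix: when manual
// provisioning failed before any agent was set up, clear the instance-id
// first so that destroyMachine removes the record instead of keeping it.
func (s *state) cleanupFailedManualProvision(id string) error {
	m, ok := s.machines[id]
	if !ok {
		return errNotFound
	}
	if !m.Manual {
		return fmt.Errorf("machine %s was not manually provisioned", id)
	}
	m.InstanceID = ""
	return s.destroyMachine(id)
}

func main() {
	s := &state{machines: map[string]*machineDoc{
		"1": {ID: "1", InstanceID: "manual:host-a", Manual: true},
		"2": {ID: "2", InstanceID: "manual:host-b", Manual: true},
	}}

	// Plain destroy: the record lingers because it carries an instance-id.
	if err := s.destroyMachine("1"); err != nil {
		fmt.Println("destroy failed:", err)
	}
	_, present := s.machines["1"]
	fmt.Println("machine 1 still in state after plain destroy:", present)

	// Proposed cleanup: the instance-id is cleared, so the record is removed.
	if err := s.cleanupFailedManualProvision("2"); err != nil {
		fmt.Println("cleanup failed:", err)
	}
	_, present = s.machines["2"]
	fmt.Println("machine 2 still in state after cleanup:", present)
}
```

The `Manual` check keeps the shortcut away from environment-provisioned machines, where a real instance may exist and the agent should clean itself up.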
tags: added: add-machine
Changed in juju-core: importance: High → Medium
tags: added: manual-story
There have been a lot of changes in this area in Juju 2.
If you are not happy with the current Juju 2 behavior, please file a new bug against "juju" with up-to-date information.