Comment 3 for bug 900873

Juan L. Negron (negronjl) wrote : Re: [Bug 900873] Re: Automatically terminate machines that do not register with ZK

Maybe add a --force to juju terminate-machine that will override whatever
error/warning is currently associated with the "machine not available"
scenario.
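
A rough sketch of what I have in mind, in Python. All the names here (terminate_machine, MachineNotAvailable, the dict-based machine record) are made up for illustration and are not juju's actual code:

```python
class MachineNotAvailable(Exception):
    """Raised when a machine has not registered an agent yet."""


def terminate_machine(machine, force=False):
    """Hypothetical check: refuse non-running machines unless force is set.

    `machine` is a plain dict with "id" and "state" keys, purely illustrative.
    """
    if machine["state"] != "running" and not force:
        raise MachineNotAvailable(
            "machine %s is %s; use --force to terminate anyway"
            % (machine["id"], machine["state"])
        )
    # A real implementation would ask the provider to shut the instance down here.
    return "terminated machine %s" % machine["id"]
```

The idea is only that the existing "not available" check stays the default, and --force is an explicit opt-out for the user who has waited long enough.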

My two cents ...

Thanks,

Juan

On Wed, Dec 7, 2011 at 2:51 PM, Clint Byrum <email address hidden> wrote:

> There is definitely a scenario where we start an instance and it fails
> to start the agent for uncontrollable reasons. It could get expensive if
> it's a terminal problem (such as the AMI the user has chosen doesn't
> support cloud-init properly or some such thing). Start it, pay $0.08,
> it fails; start a new one, pay $0.08, it fails. We MUST guard against
> that scenario.
>
> I think the current status that is shown in juju status is enough to
> allow users to decide for themselves when to give up on an instance.
> What would be useful is the ability to use 'juju terminate-machine'
> while it is still "pending", which right now will fail because
> it is "not available". By enabling that, you allow the user to say "I've
> waited long enough, kill it" and the provisioner will then select a new
> destination for the unit. Of course, the user could just work around the
> issue by adding a new unit, and removing the failing one.
>
> I think ultimately, we can only automate what happens after the agent
> has checked in. Until then, we give up control to whatever the provider
> does, and so we cannot do *anything* intelligently in juju except tell
> the user that we're in that state.
>
> Leaving New/Undecided for now, but I'd suggest that this be changed to
> suggest a documentation item for troubleshooting.. "What do I do if my
> instance does not come up?"
>
> --
> You received this bug notification because you are subscribed to juju.
> https://bugs.launchpad.net/bugs/900873
>
> Title:
> Automatically terminate machines that do not register with ZK
>
> Status in juju:
> New
>
> Bug description:
> Machines that fail to come up after being provisioned should be
> automatically terminated. This seems to be rare, but can potentially
> happen. See this blog post:
> http://www.outflux.net/blog/archives/2011/12/05/ec2-instances-in-support-of-a-bsp/
>
> We will need to define some sort of reasonable heuristic for this,
> given eventuality and the fact that this sort of automation can
> readily cascade into other issues. For providers like EC2, this could
> also rapidly increase costs as machines are repeatedly brought up and
> then terminated.
>
> Tread carefully, in other words.