juju ensure-availability should be able to target existing machines

Bug #1394755 reported by Adam Collard
This bug affects 1 person
Affects Status Importance Assigned to Milestone
juju-core
Fix Released
High
Nate Finch
1.23
Fix Released
High
Nate Finch

Bug Description

With Juju 1.21b3 I can run "juju ensure-availability --to name1.maas,name2.maas" (specifying maas node names that have not been added to the environment via add-machine).

However, if you first add those machines to the environment with add-machine, ensure-availability --to machine1,machine2 will not work.

Ian Booth (wallyworld)
Changed in juju-core:
milestone: none → 1.22
status: New → Triaged
importance: Undecided → High
Curtis Hovey (sinzui)
tags: added: ha
Mark Ramm (mark-ramm)
Changed in juju-core:
assignee: nobody → Nate Finch (natefinch)
Nate Finch (natefinch) wrote :

Wait, can you explain this better? I thought you were asking for ensure-availability --to ... but you're saying that exists, so what exactly are you asking for?

Nate Finch (natefinch)
description: updated
Björn Tillenius (bjornt) wrote : Re: [Bug 1394755] Re: juju ensure-availability should be able to target existing machines

On Thu, Dec 18, 2014 at 03:44:29PM -0000, Nate Finch wrote:
> Wait, can you explain this better? I thought you were asking for
> ensure-availability --to ... but you're saying that exists, so what
> exactly are you asking for?

What currently works is

    juju ensure-availability --to machine1.maas,machine2.maas

That will add two new machines to the environment.

What doesn't work is this work flow:

    juju add-machine machine1.maas
    <Added machine 1>
    juju add-machine machine2.maas
    <Added machine 2>
    juju ensure-availability --to 1,2

We might even want the latter to be:

    juju ensure-availability --to lxc:1,lxc:2

Unless there's a reason why the state server shouldn't be in an LXC?

--
Björn Tillenius | https://launchpad.net/~bjornt

Kapil Thangavelu (hazmat) wrote :

fwiw, that's fairly equivalent to the extant bug of ha supporting manual
provider, albeit phrased more generically.


Changed in juju-core:
milestone: 1.22-alpha1 → 1.23
Curtis Hovey (sinzui) wrote :

I think this bug is a duplicate of bug 1357760. The underlying issue is that HA cannot use a machine with a jujud on it.

Ian Booth (wallyworld)
Changed in juju-core:
assignee: Nate Finch (natefinch) → nobody
Ian Booth (wallyworld) wrote :

There's potentially significant work involved in allowing existing machines to be converted to state servers. The machine-provisioning workflow is not set up to change a machine's role to state server after the machine has already been provisioned. We'd need to change that, install mongo, convert any existing non-state-server config when the machine was provisioned to run units, and make sure nothing conflicts with units already installed there. It's all quite messy.

Dean Henrichsmeyer (dean) wrote :

To be clear, the driver of this is not deployment. When deploying we can easily ensure-availability and then do what we need to do with the machine.

The problem is when you have a deployed environment using available hardware and then a node that is a state server dies. As it stands now, you can't get Juju back into HA because it can't use an existing node. I just wanted to add some clarification to the use case.

Nate Finch (natefinch) wrote :

The main problem arises when all machines in a MAAS environment are in use: if a state server goes down, the only way to replace it is to create a new state server on one of the machines already added to the Juju environment.

Now, the mitigating factor is that the environment will still work with one state server down; that's the whole point of HA. So you can keep working until you replace the down machine in MAAS. However, if that looks like it will take a while, you may want to replace the state server sooner rather than later. To core this seems like an edge case, but Landscape wants this ability.

In this case, the easiest fix for core would be to spin up a container on an existing machine in the environment and put a state server in that container. That way the state server won't collide with any existing deployments on that machine, but it still gives us a redundant state server for HA.

Kapil Thangavelu (hazmat) wrote :

the clarification is helpful, but the issue also prevents ha from being
useable w/ manual provider which hopefully is not an edge case. the
container workaround sounds viable.


Nate Finch (natefinch) wrote :

Core has actually changed course on the solution, and now thinks that the best way to fix this is the original suggestion - to upgrade an existing machine to become a state server, rather than using a container for the state server. This is what I am currently working on.

Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 1.23 → 1.24-alpha1
Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 1.24-alpha1 → 1.23-beta1
Nate Finch (natefinch)
Changed in juju-core:
assignee: nobody → Nate Finch (natefinch)
assignee: Nate Finch (natefinch) → Wayne Witzel III (wwitzel3)
assignee: Wayne Witzel III (wwitzel3) → Nate Finch (natefinch)
Changed in juju-core:
status: Triaged → In Progress
Nate Finch (natefinch) wrote :

We had some trouble with the infrastructure around watchers, but we've gotten past that, and our proofs of concept are working well, so I expect this to be finished fairly soon.

Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 1.23-beta1 → 1.23-beta2
Nate Finch (natefinch) wrote :

This is coming along great. It works, but requires a manual restart of jujud in the process, so I'm just trying to figure out how to make that restart automatic.

Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 1.23-beta2 → 1.23-beta3
Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 1.23-beta3 → 1.23-beta4
Nate Finch (natefinch) wrote :

The Landscape team tested this, and it works to their satisfaction, though there was one confusing point. When using --to for machines already in Juju, you need to use their machine IDs, not their MAAS node names. This is primarily because Juju doesn't use MAAS node names to reference machines already in the environment.

This is what Landscape tried first:

    juju bootstrap --to nodeA
    juju machine add nodeB
    juju machine add nodeC
    juju ensure-availability --to nodeB,nodeC

But, what they needed to do was this:

    juju bootstrap --to nodeA
    juju machine add nodeB
    juju machine add nodeC
    juju ensure-availability --to 1,2

It would be nice if we could reference machines in the environment by their node names, but that is currently not supported.
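To make the distinction concrete, here is a minimal POSIX shell sketch of the rule described above, assuming a simplified reading of it: an entry made only of digits (such as `1`) is treated as the ID of a machine already in the environment, while anything else (such as `nodeB` or `name1.maas`) is treated as a provider node name, i.e. a request for a new machine. The functions `classify_placement` and `classify_to_list` are hypothetical illustrations, not part of Juju, and Juju's actual parsing of `--to` may differ.

```shell
#!/bin/sh
# Hypothetical helper (not part of Juju): classify one entry of an
# `ensure-availability --to` list under the simplified rule above.
classify_placement() {
  case "$1" in
    '') echo "invalid" ;;
    *[!0-9]*) echo "provider-node" ;;  # non-numeric, e.g. nodeB: asks for a new machine
    *) echo "existing-machine" ;;      # all digits, e.g. 1: a machine already in the environment
  esac
}

# Hypothetical helper: split a comma-separated --to value and print
# each entry with its classification, one per line.
classify_to_list() {
  old_ifs=$IFS          # plain sh has no `local`, so save and restore IFS
  IFS=,
  set -f                # disable globbing while word-splitting the list
  for entry in $1; do
    printf '%s: %s\n' "$entry" "$(classify_placement "$entry")"
  done
  set +f
  IFS=$old_ifs
}

classify_to_list "nodeB,nodeC"
classify_to_list "1,2"
```

Under this sketch, the first call prints `nodeB: provider-node` and `nodeC: provider-node`, and the second prints `1: existing-machine` and `2: existing-machine`, which mirrors why Landscape's first attempt did not target the machines they had just added.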

Nate Finch (natefinch)
Changed in juju-core:
status: In Progress → Fix Committed
Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 1.23-beta4 → 1.24-alpha1
Curtis Hovey (sinzui)
Changed in juju-core:
status: Fix Committed → Fix Released