enable-ha to an existing machine gave confusing error

Bug #1817564 reported by Tim Penhey on 2019-02-25
This bug affects 2 people
Affects  Status  Importance  Assigned to      Milestone
juju             Medium      Joseph Phillips
2.6              Medium      Joseph Phillips

Bug Description

Was testing enable-ha.

Had a new controller, added one new machine.

$ juju status -m controller
Model       Controller  Cloud/Region         Version  SLA          Timestamp
controller  test        localhost/localhost  2.5.2.1  unsupported  03:15:36+13:00

Machine  State    DNS             Inst id        Series  AZ  Message
0        started  10.172.145.216  juju-a6fd6a-0  bionic      Running
1        started  10.172.145.89   juju-a6fd6a-1  bionic      Running

$ juju enable-ha --to 1
ERROR failed to create new controller machines: availability zone "#:1" not valid

Expected machine 1 to become a new controller machine, and a new machine 2 to be created as the third of three controller machines.

Tim Penhey (thumper) wrote :

For the record, when I added a second machine:

$ juju enable-ha --to 1,2
maintaining machines: 0
converting machines: 1, 2

The command behaved as expected.
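
As a rough illustration of the behaviour expected in both cases, here is a minimal Python sketch. It is not Juju's implementation; the function name `plan_enable_ha` and the default target of three controllers are assumptions made for the example. It splits a request into machines to maintain, machines to convert, and the number of new machines to create.

```python
def plan_enable_ha(current, to, target=3):
    """Hypothetical helper, not part of Juju.

    current: ids of machines that are already controllers
    to:      ids passed via --to
    target:  desired number of controllers (assumed to be 3 here)
    """
    maintaining = sorted(current)
    # How many more controllers are needed to reach the target.
    needed = max(target - len(maintaining), 0)
    # Convert the requested machines first, up to the number needed.
    converting = [m for m in to if m not in current][:needed]
    # Anything still missing would have to be a newly created machine.
    adding = needed - len(converting)
    return {"maintaining": maintaining, "converting": converting, "adding": adding}


# --to 1 with one spare machine: convert machine 1, create one more.
print(plan_enable_ha(["0"], ["1"]))
# --to 1,2 with two spare machines: convert both, create none.
print(plan_enable_ha(["0"], ["1", "2"]))
```

The first call corresponds to the failing command in the description, the second to the working one above.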

Changed in juju:
milestone: none → 2.5.3
assignee: nobody → Joseph Phillips (manadart)
Changed in juju:
milestone: 2.5.3 → 2.5.4
Changed in juju:
milestone: 2.5.4 → 2.5.5
Changed in juju:
milestone: 2.5.6 → 2.5.8
Changed in juju:
milestone: 2.5.8 → 2.7-beta1
status: Triaged → In Progress
Changed in juju:
status: In Progress → Fix Committed
Richard Harding (rharding) wrote :

@joe can you please backport this in your AM?

Pedro Guimarães (pguimaraes) wrote :

Hi, I am trying the juju 2.6/candidate snap.
I manually added 3 machines to my controller model, then ran:

juju enable-ha -c controller --to=1,2

That complains about a missing juju-ha-space configuration: ERROR juju-ha-space is not set and a unique usable address was not found for machines: 0

Also, during enable-ha, all machines in the controller model went to the "down" state, although the jujud-machine-X systemd service on each was "active (running)".

juju show-machine -m controller 0/1/2 shows that all controllers have 2 networks and that they can reach each other on one of them.

Here is the full crashdump: https://drive.google.com/open?id=15NMlsZ3ZT8oniE52HNLqN9zAcoh50P6O

Some notes: I added a model, and 30 machines to that model, before enabling controller HA.
I have run enable-ha successfully on an AWS environment using 3 larger machines (8 vCPUs and 32G, rather than 4 vCPUs and 8G).

Machines in the model started to flip between "started" and "down" after I ran this command.

This is an ongoing deployment for a customer.

Pedro Guimarães (pguimaraes) wrote :

I am marking this bug as field-critical since we need enable-ha functionality for the manual provider on an ongoing deployment.

Lou Peers (louie-pe) wrote :

Could someone please look at comment #4 (for Pedro) and check the logs for anything wrong? enable-ha was failing on this environment. We're currently in the middle of a deployment and this is blocking our progress. Many thanks.

Pedro Guimarães (pguimaraes) wrote :

On AWS, controllers had only a single interface.

On this deployment, juju list-machines -m controller also shows:

Machine  State  DNS   Inst id      Series  AZ  Message
0        down   IP_1  manual:      xenial
1        down   IP_2  manual:IP_2  xenial
2        down   IP_3  manual:IP_3  xenial

Node 0 has no IP in the "Inst id" field.
The AWS machines were also that way.

Tim Penhey (thumper) wrote :

I'm removing the field-critical designation as this is a missing feature of the manual provider and doesn't match the SLA guidelines.

The manual provider does not support spaces, so the error message being written to the logs is unhelpful. The fundamental issue here is that the machine Juju has been given has multiple network addresses, and Juju won't guess which one to use.

For the controllers to work with manual machines, each controller should have just one non-local address.
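
To make the "one non-local address" condition concrete, here is a minimal Python sketch of the kind of check described above. It is an illustration only, not Juju's code: the function names are hypothetical, "local" is assumed to mean loopback or link-local, and 192.0.2.10 is a made-up second address from the documentation range.

```python
import ipaddress


def usable_addresses(addresses):
    """Return the addresses that are neither loopback nor link-local.

    Assumption: this is what "non-local" means for the purpose of the sketch.
    """
    usable = []
    for raw in addresses:
        ip = ipaddress.ip_address(raw)
        if ip.is_loopback or ip.is_link_local:
            continue
        usable.append(raw)
    return usable


def pick_controller_address(addresses):
    """Return the single usable address, or refuse to guess."""
    usable = usable_addresses(addresses)
    if len(usable) != 1:
        raise ValueError(
            "expected exactly one usable address, found %d: %s"
            % (len(usable), usable)
        )
    return usable[0]


# One non-local address: unambiguous.
print(pick_controller_address(["127.0.0.1", "10.172.145.216"]))

# Two non-local addresses: the choice is ambiguous, so the sketch raises.
try:
    pick_controller_address(["10.172.145.216", "192.0.2.10"])
except ValueError as err:
    print("refusing to guess:", err)
```

With two networks on each controller, as in the deployment above, a check like this has no single answer, which matches the "unique usable address was not found" error.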
