Canonical Juju

enable-ha to an existing machine gave confusing error

Bug #1817564 reported by Tim Penhey on 2019-02-25

This bug affects 2 people

Affects         Status        Importance  Assigned to      Milestone
Canonical Juju  Fix Released  Medium      Joseph Phillips  Canonical Juju 2.7-beta1
2.6             Fix Released  Medium      Joseph Phillips  Canonical Juju 2.6.4

Bug Description

Was testing enable-ha.

Had a new controller, added one new machine.

$ juju status -m controller
Model       Controller  Cloud/Region         Version  SLA          Timestamp
controller  test        localhost/localhost  2.5.2.1  unsupported  03:15:36+13:00

Machine  State    DNS             Inst id        Series  AZ  Message
0        started  10.172.145.216  juju-a6fd6a-0  bionic      Running
1        started  10.172.145.89   juju-a6fd6a-1  bionic      Running

$ juju enable-ha --to 1
ERROR failed to create new controller machines: availability zone "#:1" not valid

Expected machine 1 to become a new controller machine, and a new machine 2 to be created as the third of three controller machines.

Tim Penhey (thumper) wrote on 2019-02-25:

For the record, when I added a second machine:

$ juju enable-ha --to 1,2
maintaining machines: 0
converting machines: 1, 2

The command behaved as expected.

Richard Harding (rharding) on 2019-02-26

Changed in juju:
milestone:	none → 2.5.3
assignee:	nobody → Joseph Phillips (manadart)

Canonical Juju QA Bot (juju-qa-bot) on 2019-03-26

Changed in juju:
milestone:	2.5.3 → 2.5.4

Canonical Juju QA Bot (juju-qa-bot) on 2019-04-02

Changed in juju:
milestone:	2.5.4 → 2.5.5

Anastasia (anastasia-macmood) on 2019-05-14

Changed in juju:
milestone:	2.5.6 → 2.5.8

Joseph Phillips (manadart) on 2019-05-16

Changed in juju:
milestone:	2.5.8 → 2.7-beta1
status:	Triaged → In Progress

Joseph Phillips (manadart) on 2019-05-16

Changed in juju:
status:	In Progress → Fix Committed

Richard Harding (rharding) wrote on 2019-06-12:

@joe can you please backport this in your AM?

Joseph Phillips (manadart) wrote on 2019-06-13:

2.6: https://github.com/juju/juju/pull/10325

Pedro Guimarães (pguimaraes) wrote on 2019-06-18:

Hi, I am trying juju 2.6/candidate snap.
I manually added 3 machines to my controller model, then ran:

juju enable-ha -c controller --to=1,2

That complains about a missing juju-ha-space configuration: ERROR juju-ha-space is not set and a unique usable address was not found for machines: 0

Also, during enable-ha, all machines in the controller model went to the "down" state, although systemctl reported each jujud-machine-X service as "active (running)".

juju show-machine -m controller 0/1/2 shows that all controllers have 2 networks and that they can reach each other on one of them.

Here is the full crashdump: https://drive.google.com/open?id=15NMlsZ3ZT8oniE52HNLqN9zAcoh50P6O

Some notes: I added a model and 30 machines to that model before enabling controller HA.
I have run a successful enable-ha on an AWS environment, using 3 machines (bigger, with 8 vCPUs and 32G, rather than 4 CPUs / 8G).

Machines in the model started to flip between "started" and "down" after I ran this command.

This is an ongoing deployment for a customer.

Pedro Guimarães (pguimaraes) wrote on 2019-06-18:

I am marking this bug as field-critical since we need enable-ha functionality for the manual provider on an ongoing deployment.

Lou Peers (louie-pe) wrote on 2019-06-18:

Please could someone look at comment #4 (for Pedro) and look into the logs, check if there is anything wrong, because enable-ha was failing on this env. We're currently in the middle of a deployment and this is blocking our progress. Many thanks.

Pedro Guimarães (pguimaraes) wrote on 2019-06-18:

On AWS, controllers had only a single interface.

On this deployment, juju list-machines -m controller also shows:

Machine  State  DNS   Inst id      Series  AZ  Message
0        down   IP_1  manual:      xenial
1        down   IP_2  manual:IP_2  xenial
2        down   IP_3  manual:IP_3  xenial

Node 0 has no IP in the "Inst id" field.
The AWS machines were also that way.

Tim Penhey (thumper) wrote on 2019-06-18:

I'm removing the field-critical designation as this is a missing feature of the manual provider and doesn't match the SLA guidelines.

The manual provider does not support spaces, so the error message being written to the logs is unhelpful. The fundamental issue here is that the machine Juju has been given has multiple network addresses, and Juju won't guess which one to use.

For the controllers to work with a manual machine, each controller should have just one non-local address.
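
As a rough illustration of the "just one non-local address" condition, the sketch below checks a list of addresses for a machine. It is not Juju's implementation: the function names are hypothetical, "non-local" is assumed here to mean neither loopback nor link-local, and the addresses other than 10.172.145.216 are made-up examples.

```python
import ipaddress


def non_local_addresses(addresses):
    """Return the addresses that are neither loopback nor link-local.

    Assumption: this is only an approximation of what "non-local" means
    for a controller machine; Juju's own address scoping may differ.
    """
    usable = []
    for raw in addresses:
        ip = ipaddress.ip_address(raw)
        if ip.is_loopback or ip.is_link_local:
            continue
        usable.append(raw)
    return usable


def has_unique_usable_address(addresses):
    """True when exactly one non-local address remains after filtering."""
    return len(non_local_addresses(addresses)) == 1


# A machine with a single non-local address passes the check.
single = ["127.0.0.1", "10.172.145.216"]
# A machine on two networks (hypothetical second address) does not.
dual = ["127.0.0.1", "10.172.145.216", "192.0.2.10"]

print(has_unique_usable_address(single))
print(has_unique_usable_address(dual))
```

A machine like the second one is ambiguous in the sense described above: there is more than one candidate address, and nothing in the list says which one to prefer.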

Anastasia (anastasia-macmood) on 2019-12-09

Changed in juju:
status:	Fix Committed → Fix Released

Duplicates of this bug

Bug #1832393