ensure-availability not safe from adding 7+ nodes, can't remove stale ones

Bug #1559062 reported by Mario Splivalo
This bug affects 1 person
Affects: juju-core
Status: Won't Fix
Importance: High
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

I wanted to enable juju-high-availability by running: "juju ensure-availability".

However, while juju was setting up the additional state servers, I ran 'ensure-availability' again. This caused juju to start demoting the newly added machines and spinning up additional ones.

I can run 'ensure-availability' any number of times, and juju will spin up additional servers each time.

In the end I had 12 machines: three of them were in 'has-vote' status, two were in 'no-vote' status, and the rest had no state-server-member-status at all. The latter are easily removed with 'juju remove-machine', but I am stuck with the 'no-vote' machines.

Running 'juju ensure-availability' now does nothing, and running 'juju remove-machine' on the non-voting machines fails with "ERROR no machines were destroyed: machine is required by the environment".

As I have a customer who accidentally ended up in this situation, I'd like to know whether there is a way to remove the 'no-vote' machines (even if it involves altering juju's mongodb database).

I have tested this against the OpenStack provider (http://pastebin.ubuntu.com/15414152/). Even when I tell OpenStack to kill a non-voting machine, juju doesn't seem to be aware of it. 'juju status' still lists that machine as active, and 'destroy-machine' returns the same ERROR message (http://paste.ubuntu.com/15414194/).

It seems that juju should prevent 'ensure-availability' from being run while the state server machines are still being set up. Once the replica set is built and juju recognizes the members as 'has-vote', running 'ensure-availability' again does nothing.
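
Until juju enforces this itself, a caller could guard against re-running the command too early. The following is only a minimal sketch of such a check, not anything juju provides: the function name `ha_settled`, the `wanted` parameter, and the shape of the input mapping (machine id to its state-server-member-status, or None when the machine has none) are all assumptions made for illustration. How the mapping is built from 'juju status' output is left to the caller.

```python
from typing import Mapping, Optional


def ha_settled(member_status: Mapping[str, Optional[str]], wanted: int = 3) -> bool:
    """Return True once at least `wanted` machines report 'has-vote'.

    `member_status` is a hypothetical mapping of machine id to the
    state-server-member-status shown by 'juju status' (None if absent).
    """
    voters = [m for m, status in member_status.items() if status == "has-vote"]
    return len(voters) >= wanted


if __name__ == "__main__":
    # Right after the first 'ensure-availability': new machines have no status yet.
    still_building = {"0": "has-vote", "1": None, "2": None}
    if not ha_settled(still_building):
        print("state servers still being set up; do not re-run ensure-availability")
```

A wrapper would poll until `ha_settled` returns True before allowing 'juju ensure-availability' to be invoked again.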

Mario Splivalo (mariosplivalo) wrote :

Just an update, as I managed to 'clean' juju status from 'obsolete' machines.

Once I had deleted the non-voting machines with 'nova delete' and waited for some time, I ran 'juju ensure-availability' again. Juju then 'removed' the machines from 'state-server-member-state' (group? option?). After that I used 'juju remove-machine --force' to get rid of those machines.
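
The sequence described above can be sketched as an ordered command list. This is a dry-run illustration only: the helper name `cleanup_commands` and the server and machine identifiers in the usage example are hypothetical placeholders, and the length of the wait after 'nova delete' is not specified in this report, so it is left as a comment rather than a timed step.

```python
from typing import List, Sequence


def cleanup_commands(nova_servers: Sequence[str],
                     machine_ids: Sequence[str]) -> List[List[str]]:
    """Build, without executing, the commands for the workaround.

    nova_servers: OpenStack server names/ids backing the 'no-vote' machines.
    machine_ids:  the corresponding juju machine ids.
    """
    # 1. Delete the non-voting instances directly in OpenStack.
    cmds = [["nova", "delete", server] for server in nova_servers]
    # 2. After waiting "for some time" (duration not given in the report),
    #    re-run ensure-availability so juju drops them as state servers.
    cmds.append(["juju", "ensure-availability"])
    # 3. Force-remove the now-unneeded machines from the environment.
    cmds += [["juju", "remove-machine", "--force", m] for m in machine_ids]
    return cmds


if __name__ == "__main__":
    # Placeholder identifiers, for illustration only.
    for cmd in cleanup_commands(["example-server-a"], ["4"]):
        print(" ".join(cmd))
```

Printing the commands rather than running them keeps the wait between steps 1 and 2 under the operator's control.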

Cheryl Jennings (cherylj) wrote :

I see the same behavior on the current tip of master.

Changed in juju-core:
status: New → Triaged
importance: Undecided → High
tags: added: ensure-availability
Changed in juju-core:
status: Triaged → Won't Fix