Can't remove down controller from HA cluster, even though cluster has minimum of three required nodes

Bug #1690573 reported by Craig Bender
This bug affects 3 people
Affects         Status     Importance  Assigned to  Milestone
Canonical Juju  Won't Fix  Low         Unassigned

Bug Description

Lost a controller, added a new controller, but cannot remove the down controller.

Machine  State    DNS            Inst id  Series  AZ
0        down     10.1x4.xx.50   xdatne   xenial  default
1        started  10.xx44.xx.56  n3asas   xenial  default
2        started  10.1x4.xx.62   gxa7wk   xenial  default
4        started  10.1x4.xx.176  ph4b8x   xenial  default

ubuntu@osinfra-ch2-g01:~$ juju remove-machine 0 --force
ERROR no machines were destroyed: machine is required by the model

The remaining nodes spend a lot of time trying to talk to the down node.

Tags: enable-ha juju
Craig Bender (craig-bender) wrote :

You have to pass the -n 3 flag to juju enable-ha; then the down controller can be removed.
The docs are missing that bit.

Recovery section: https://jujucharms.com/docs/2.1/controllers-ha

Also, the docs should probably mention that the constraints used when HA was initially enabled need to be passed again; otherwise enable-ha will grab any machine.

Perhaps that's a separate bug.
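
The constraints point above could look like the following minimal sketch. `juju enable-ha` does accept `--constraints`, but the value `mem=8G` is a hypothetical placeholder (the original constraints are not given in this report), and the `run` helper and `enable_ha_constrained` function are illustrative names only. The script defaults to a dry run that prints the command instead of executing it.

```shell
# Dry-run helper (illustrative): prints the command unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical value; substitute the constraints used when HA was first enabled.
HA_CONSTRAINTS="${HA_CONSTRAINTS:-mem=8G}"

enable_ha_constrained() {
    # Re-assert three controllers, restricting which machines may be picked.
    run juju enable-ha -n 3 --constraints "$HA_CONSTRAINTS"
}

enable_ha_constrained
```

Setting `DRY_RUN=0` would run the command for real against the current controller.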

Witold Krecicki (wpk) wrote :

Could you paste the output of juju show-controller and juju show-machines?


Witold Krecicki (wpk) wrote :

Also, try juju enable-ha -n 3 and then removing the down machine.
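
The sequence suggested here can be sketched as a small script. Both commands appear in this report; the `run` helper, the `recover_ha` function name, and the dry-run default are assumptions added for illustration, and the behaviour described applies to the Juju version discussed in this thread.

```shell
# Dry-run helper (illustrative): prints the command unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

recover_ha() {
    down_machine="$1"   # ID of the machine shown as "down", e.g. 0

    # Re-assert the desired controller count first; stop if that fails.
    run juju enable-ha -n 3 || return 1

    # Then retry the removal that was previously refused.
    run juju remove-machine "$down_machine" --force
}

recover_ha 0
```

With the default dry run this only prints the two commands in order; set `DRY_RUN=0` to execute them.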


John A Meinel (jameinel) wrote :

I walked him through that last night. This bug is essentially that the docs
don't guide you through that well.

John
=:->


Tim Penhey (thumper) wrote :

Well, if enable-ha got the system into a situation where there were three active working HA machines in the cluster, then 'remove-machine 0' *should* just do the right thing.

Changed in juju:
status: New → Triaged
importance: Undecided → High
Joseph Phillips (manadart) wrote :

Mongo peer-group logic has been overhauled since this report.

Changed in juju:
status: Triaged → Won't Fix
importance: High → Low