ha tear down causes last controller to be unusable

Bug #1971627 reported by Heather Lanigan
Affects: Canonical Juju
Status: Fix Released
Importance: High
Assigned to: Ian Booth
Milestone: 2.9.33

Bug Description

Found while running (cd tests ; ./main.sh -v controller), reproducible outside of the test as well:

juju 2.9.29, lxd 5.1, jammy/focal host

juju bootstrap localhost
juju enable-ha
(wait for high availability to be established)
juju remove-machine -m controller 2
juju remove-machine -m controller 1

juju commands start returning:
ERROR not master and slaveOk=false

Investigation shows that the sole remaining controller is marked as SECONDARY rather than PRIMARY in the MongoDB replica set. Still looking at the logs for details. It seems to happen as the second machine is removed: juju show-controller lists machine 0 as the primary until commands start to fail.
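
One plausible reading, not confirmed in this report, is the usual MongoDB majority rule: a member can only be (or stay) PRIMARY while it can reach a strict majority of the voting members listed in the replica set config. The "failed to propagate to a majority" error in the logs is consistent with that. A minimal sketch of the arithmetic in Python (the function names are made up for illustration):

```python
def majority(configured_voters: int) -> int:
    """Votes needed to elect or keep a primary: strictly more than half."""
    return configured_voters // 2 + 1


def can_hold_primary(configured_voters: int, reachable_voters: int) -> bool:
    """True if the reachable voters form a majority of the configured voters."""
    return reachable_voters >= majority(configured_voters)


# Healthy HA: three voters configured, all three reachable.
print(can_hold_primary(3, 3))

# Assumed failure case: the config still lists two voters (machines 0 and 2),
# but machine 2 has already gone away, so only machine 0 is reachable.
print(can_hold_primary(2, 1))

# After a forced reconfig down to a single member.
print(can_hold_primary(1, 1))
```

If this is what happens, it would also explain the timing aspect: the result depends on whether the reconfig that drops a member commits before that member becomes unreachable.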

There is a timing aspect to this issue.

Recovery steps:

1. Log in to the remaining controller's MongoDB: https://discourse.charmhub.io/t/login-into-mongodb/309

2. Follow the steps here: https://www.mongodb.com/docs/manual/tutorial/reconfigure-replica-set-with-unavailable-members/

juju:SECONDARY> cfg = rs.conf()
juju:SECONDARY> cfg.members = [cfg.members[0]]
juju:SECONDARY> rs.reconfig(cfg, {force : true})
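
Note that the commands above keep cfg.members[0], which assumes the surviving controller is the first entry in the member list. A rough sketch in Python of picking the survivor by its juju-machine-id tag instead is shown here. It works on a plain dict shaped like the output of rs.conf() and does not talk to MongoDB. The assumptions are that each member carries the tag under "tags" (the config in this bug shows "juju-machine-id" : "2", but the exact document shape is not given here), and the function name, _id values and host addresses are hypothetical.

```python
import copy


def keep_only_machines(cfg, machine_ids):
    """Return a copy of a replica set config keeping only the members whose
    "juju-machine-id" tag is in machine_ids. The input is left untouched."""
    wanted = {str(m) for m in machine_ids}
    trimmed = copy.deepcopy(cfg)
    trimmed["members"] = [
        member
        for member in trimmed["members"]
        if member.get("tags", {}).get("juju-machine-id") in wanted
    ]
    if not trimmed["members"]:
        # Refuse to produce an empty config; that is never a valid target.
        raise ValueError("no replica set member matches %s" % sorted(wanted))
    return trimmed


# Hypothetical config after machine 1 was removed cleanly but machine 2 was not.
example_cfg = {
    "_id": "juju",
    "members": [
        {"_id": 1, "host": "10.0.0.10:37017", "tags": {"juju-machine-id": "0"}},
        {"_id": 3, "host": "10.0.0.12:37017", "tags": {"juju-machine-id": "2"}},
    ],
}

trimmed = keep_only_machines(example_cfg, ["0"])
print([m["tags"]["juju-machine-id"] for m in trimmed["members"]])
```

Selecting by tag rather than by position avoids dropping the wrong member if the survivor is not listed first. The trimmed document would still have to be applied with a forced reconfig as in the commands above.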

Not reproducible on AWS so far, nor on every LXD configuration.

Tags: enable-ha
Heather Lanigan (hmlanigan) wrote:

machine-2: 19:52:25 ERROR juju.worker.dependency "machiner" manifold worker returned unexpected error: machine-2 failed to set machine to dead: machine 2 is still a voting controller member
machine-1: 19:52:25 ERROR juju.worker.dependency "machiner" manifold worker returned unexpected error: machine-1 failed to set machine to dead: machine 1 is still a voting controller member
machine-0: 19:52:26 ERROR juju.worker.peergrouper failed to remove dying controller as a controller after removing its vote: controller 1 cannot be removed as it is the last controller
machine-0: 19:52:26 ERROR juju.worker.peergrouper failed to remove dying controller as a controller after removing its vote: controller 2 cannot be removed as it is the last controller
machine-1: 19:52:26 ERROR juju.worker.dependency "is-primary-controller-flag" manifold worker returned unexpected error: connection is shut down
machine-1: 19:52:26 ERROR juju.cmd.jujud.runner fatal "1-container-watcher": worker "1-container-watcher" exited: connection is shut down
machine-0: 19:52:26 ERROR juju.cmd.jujud.runner fatal "0-container-watcher": worker "0-container-watcher" exited: connection is shut down
machine-2: 19:52:26 ERROR juju.cmd.jujud.runner fatal "2-container-watcher": connection is shut down
machine-1: 19:52:26 ERROR juju.worker.dependency "is-primary-controller-flag" manifold worker returned unexpected error: permission denied (unauthorized access)
machine-2: 19:52:27 ERROR juju.worker.dependency "is-primary-controller-flag" manifold worker returned unexpected error: permission denied (unauthorized access)

machine-0: 19:52:37 ERROR juju.worker.peergrouper cannot set replicaset: cannot remove member 2 from replicaset: Reconfig finished but failed to propagate to a majority :: caused by :: Current config with {version: 4, term: 2} has not yet propagated to a majority of nodes :: caused by :: operation was interrupted

machine-0: 19:52:37 ERROR juju.worker.dependency "peer-grouper" manifold worker returned unexpected error: cannot get controller ids: reading controller info: cannot get controllers document: not master and slaveOk=false

"juju-machine-id" : "2" is the mongo replica set member I had to remove from the config in the recovery steps above.

Ian Booth (wallyworld)
Changed in juju:
milestone: none → 2.9.33
assignee: nobody → Ian Booth (wallyworld)
importance: Undecided → High
status: New → In Progress
Ian Booth (wallyworld)
Changed in juju:
status: In Progress → Fix Committed
Changed in juju:
status: Fix Committed → Fix Released