> use presence
> db.presence.pings.find()
{ "_id" : "d557e179-59b0-4f88-86f3-18a80e23b3ae:1502973210", "slot" : NumberLong(1502973210), "alive" : { "0" : NumberLong("2190433320960") } }

# Add 30s/60s/90s to the 'slot' and insert documents whose 'alive' values are non-numeric
> db.presence.pings.insert({"_id": "d557e179-59b0-4f88-86f3-18a80e23b3ae:1502973240", "slot" : NumberLong(1502973240), "alive" : { "0" : "a"}})
> db.presence.pings.insert({"_id": "d557e179-59b0-4f88-86f3-18a80e23b3ae:1502973270", "slot" : NumberLong(1502973270), "alive" : { "0" : "a"}})
> db.presence.pings.insert({"_id": "d557e179-59b0-4f88-86f3-18a80e23b3ae:1502973300", "slot" : NumberLong(1502973300), "alive" : { "0" : "a"}})

# Watch 'juju status' to see that machine-0 is eventually recorded as 'down'
$ watch --color juju status --color

# Once it is down, issue a request to enable-ha
$ juju enable-ha
adding machines: 1, 2, 3
demoting machines: 0

# Note that it is demoting the one machine that is currently the controller.
# If you then watch debug-log, you should see messages like:
machine-0: 16:33:50 ERROR juju.worker.peergrouper cannot set replicaset: This node, 10.67.99.75:37017, with _id 1 is not electable under the new configuration version 5 for replica set juju

$ juju status
# shows all the machines as up and happy

$ juju show-controller
test-ha2:
  details:
    ...
  controller-machines:
    "1":
      instance-id: juju-23b3ae-1
      ha-status: ha-pending
    "2":
      instance-id: juju-23b3ae-2
      ha-status: ha-pending
    "3":
      instance-id: juju-23b3ae-3
      ha-status: ha-pending

Which shows that HA hasn't actually taken effect.
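For reference, the slot arithmetic behind those three inserts can be sketched in Python (a sketch, assuming presence pings use 30-second slots keyed as '<uuid>:<slot>', which matches the documents above):

```python
# Sketch: build the three "poison" ping documents inserted above.
# Assumption: slots advance in 30-second steps from the last real ping.
MODEL_UUID = "d557e179-59b0-4f88-86f3-18a80e23b3ae"
BASE_SLOT = 1502973210  # slot seen in the existing ping document


def poison_pings(base_slot, count=3, slot_seconds=30):
    """Return documents for the next `count` slots whose 'alive'
    values are deliberately non-numeric, so the presence reader
    cannot treat the machine as alive."""
    docs = []
    for i in range(1, count + 1):
        slot = base_slot + i * slot_seconds
        docs.append({
            "_id": "%s:%d" % (MODEL_UUID, slot),
            "slot": slot,
            "alive": {"0": "a"},  # non-numeric on purpose
        })
    return docs


for d in poison_pings(BASE_SLOT):
    print(d["_id"], d["slot"])
```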
> rs.status()
{
    "set" : "juju",
    "date" : ISODate("2017-08-17T12:20:42.485Z"),
    "myState" : 1,
    "term" : NumberLong(1),
    "heartbeatIntervalMillis" : NumberLong(2000),
    "members" : [
        {
            "_id" : 1,
            "name" : "10.67.99.75:37017",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "uptime" : 535,
            "optime" : {
                "ts" : Timestamp(1502972442, 1),
                "t" : NumberLong(1)
            },
            "optimeDate" : ISODate("2017-08-17T12:20:42Z"),
            "electionTime" : Timestamp(1502971909, 2),
            "electionDate" : ISODate("2017-08-17T12:11:49Z"),
            "configVersion" : 4,
            "self" : true
        },
        {
            "_id" : 2,
            "name" : "10.67.99.24:37017",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 204,
            "optime" : {
                "ts" : Timestamp(1502972440, 6),
                "t" : NumberLong(1)
            },
            "optimeDate" : ISODate("2017-08-17T12:20:40Z"),
            "lastHeartbeat" : ISODate("2017-08-17T12:20:42.011Z"),
            "lastHeartbeatRecv" : ISODate("2017-08-17T12:20:41.057Z"),
            "pingMs" : NumberLong(0),
            "syncingTo" : "10.67.99.75:37017",
            "configVersion" : 4
        },
        {
            "_id" : 3,
            "name" : "10.67.99.87:37017",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 186,
            "optime" : {
                "ts" : Timestamp(1502972440, 6),
                "t" : NumberLong(1)
            },
            "optimeDate" : ISODate("2017-08-17T12:20:40Z"),
            "lastHeartbeat" : ISODate("2017-08-17T12:20:42.069Z"),
            "lastHeartbeatRecv" : ISODate("2017-08-17T12:20:41.080Z"),
            "pingMs" : NumberLong(0),
            "syncingTo" : "10.67.99.24:37017",
            "configVersion" : 4
        },
        {
            "_id" : 4,
            "name" : "10.67.99.53:37017",
            ...

# Shows that machine 0 (_id 1 above) is currently the primary, and that the other nodes have been added to the set.

At this point, I did try running:

> rs.stepDown()

which is supposed to wait up to 10 seconds for a secondary to catch up, and then promote that machine to primary. In my testing, I was never able to get rs.stepDown() to believe a SECONDARY was sufficiently up to date.
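By optime alone the secondaries look only two seconds behind, which is what made the stepDown refusals surprising. A rough sketch of the catch-up comparison, using the timestamps from the rs.status() output above (hypothetical helper, not MongoDB's actual implementation):

```python
# Sketch (hypothetical helper, not MongoDB's implementation): mimic the
# catch-up check rs.stepDown() performs, using member entries shaped
# like the rs.status() output above (optime reduced to its seconds part).
def caught_up_secondaries(members, lag_secs=10):
    """Return names of secondaries within lag_secs of the primary's optime."""
    primary = next(m for m in members if m["stateStr"] == "PRIMARY")
    primary_ts = primary["optime"]["ts"]
    return [m["name"] for m in members
            if m["stateStr"] == "SECONDARY"
            and primary_ts - m["optime"]["ts"] <= lag_secs]


# Timestamps taken from the rs.status() paste above.
members = [
    {"name": "10.67.99.75:37017", "stateStr": "PRIMARY",
     "optime": {"ts": 1502972442}},
    {"name": "10.67.99.24:37017", "stateStr": "SECONDARY",
     "optime": {"ts": 1502972440}},
    {"name": "10.67.99.87:37017", "stateStr": "SECONDARY",
     "optime": {"ts": 1502972440}},
]
print(caught_up_secondaries(members))  # both secondaries are only 2s behind
```

On paper both secondaries qualify, so pure replication lag does not explain the refusal.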
I'm guessing the issue is that 'jujud' is still running and still generating writes (things like log data), so the primary is never quiet long enough for a secondary to ever be fully up to date. One option would be to stop jujud here. Instead, I tried something else, following https://docs.mongodb.com/manual/reference/command/replSetStepDown/#dbcmd.replSetStepDown:

> db.runCommand({replSetStepDown: 10, secondaryCatchUpPeriodSecs: 5, force: true})

My current connection died, and Juju itself appeared to get restarted; but when I got back in, machine-0 still had PRIMARY on the Mongo replica set (presumably because it still had a newer replica state than everyone else?). Looking at the logs, and running 'juju enable-ha' again, machine 0 still had the vote but didn't want it, and machines 1-3 wanted the vote but didn't have it (ha-pending). And we did still have:

machine-2: 16:48:10 ERROR juju.worker.peergrouper cannot set replicaset: This node, 10.67.99.75:37017, with _id 1 is not electable under the new configuration version 5 for replica set juju

So instead, I just stopped Mongo on machine-0:

$ sudo service juju-db stop

However, it turned out that none of the SECONDARY members were viable voting candidates under the current rs.conf().
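The "not electable" message can be predicted from rs.conf() alone: a member with priority 0 can never become primary, and MongoDB additionally requires non-voting members to have priority 0. A small sketch of that check (hypothetical helper; the member values are illustrative, shaped like rs.conf() member entries for this controller):

```python
# Sketch (hypothetical helper): given rs.conf()-style member entries,
# list which members are electable, i.e. could ever win a primary
# election. A member needs priority > 0 and a vote to win.
def electable(members):
    return [m["host"] for m in members
            if m.get("priority", 1) > 0 and m.get("votes", 1) > 0]


# Illustrative member entries matching the situation described here:
# only machine-0 (_id 1) carries a vote and a non-zero priority.
conf_members = [
    {"_id": 1, "host": "10.67.99.75:37017", "priority": 1, "votes": 1},
    {"_id": 2, "host": "10.67.99.24:37017", "priority": 0, "votes": 0},
    {"_id": 3, "host": "10.67.99.87:37017", "priority": 0, "votes": 0},
    {"_id": 4, "host": "10.67.99.53:37017", "priority": 0, "votes": 0},
]
print(electable(conf_members))  # only machine-0 is electable
```

With machine-0's mongod stopped, that leaves no electable member at all.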
Specifically, you ended up with:

juju:SECONDARY> rs.status()
{
    "set" : "juju",
    "date" : ISODate("2017-08-17T12:53:20.080Z"),
    "myState" : 2,
    "term" : NumberLong(2),
    "heartbeatIntervalMillis" : NumberLong(2000),
    "members" : [
        {
            "_id" : 1,
            "name" : "10.67.99.75:37017",
            "health" : 0,
            "state" : 8,
            "stateStr" : "(not reachable/healthy)",
            "uptime" : 0,
            "optime" : {
                "ts" : Timestamp(0, 0),
                "t" : NumberLong(-1)
            },
            "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
            "lastHeartbeat" : ISODate("2017-08-17T12:53:18.731Z"),
            "lastHeartbeatRecv" : ISODate("2017-08-17T12:52:01.257Z"),
            "pingMs" : NumberLong(0),
            "lastHeartbeatMessage" : "Connection refused",
            "configVersion" : -1
        },
        {
            "_id" : 2,
            "name" : "10.67.99.24:37017",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 2143,
            "optime" : {
                "ts" : Timestamp(1502974318, 2),
                "t" : NumberLong(2)
            },
            "optimeDate" : ISODate("2017-08-17T12:51:58Z"),
            "lastHeartbeat" : ISODate("2017-08-17T12:53:18.054Z"),
            "lastHeartbeatRecv" : ISODate("2017-08-17T12:53:18.054Z"),
            "pingMs" : NumberLong(0),
            "configVersion" : 4
        },
        ...

But *all* the secondaries were non-voting. Looking at rs.conf(), only machine-0 (_id 1) had "votes" : 1; all the others had "priority" : 0 and "votes" : 0. So the system couldn't elect any secondary, because none of them were able to vote. (Which might actually explain why rs.stepDown() never worked.) I *think* the answer here is that we should enforce that whichever machine is *currently* the primary cannot have its voting rights taken away in the first step.
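That proposed invariant could be sketched as a guard in the reconfiguration step (hypothetical Python, not Juju's actual peergrouper code): never strip the current primary's vote unless some other member already votes.

```python
# Sketch (hypothetical; not Juju's peergrouper logic): refuse to strip
# the current primary's vote unless some other member already votes,
# deferring the primary's demotion to a later reconfiguration pass.
def enforce_primary_vote(desired, primary_id):
    """Adjust a desired rs.conf() member list so it stays electable."""
    other_voters = any(m["_id"] != primary_id and m.get("votes", 0) > 0
                       for m in desired)
    if other_voters:
        return desired  # safe: someone else can win an election
    # Unsafe: keep the primary voting (and electable) for this step.
    return [dict(m, votes=1, priority=1) if m["_id"] == primary_id else m
            for m in desired]


# The config the peergrouper tried to apply: demote machine-0 (_id 1)
# while machines 1-3 are still non-voting (ha-pending).
desired = [
    {"_id": 1, "host": "10.67.99.75:37017", "votes": 0, "priority": 0},
    {"_id": 2, "host": "10.67.99.24:37017", "votes": 0, "priority": 0},
    {"_id": 3, "host": "10.67.99.87:37017", "votes": 0, "priority": 0},
    {"_id": 4, "host": "10.67.99.53:37017", "votes": 0, "priority": 0},
]
print(enforce_primary_vote(desired, primary_id=1)[0])
```

Under this guard the first reconfiguration pass would leave machine-0 voting and electable, and only a later pass, once another member holds a vote, would demote it.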