Mongo unable to join ReplicaSet (ha-pending)

Bug #1704072 reported by John A Meinel
This bug affects 3 people
Affects         Status        Importance  Assigned to    Milestone
Canonical Juju  Fix Released  High        John A Meinel
2.2             Fix Released  High        John A Meinel

Bug Description

As originally reported on https://bugs.launchpad.net/juju/2.2/+bug/1701275

See especially:
https://bugs.launchpad.net/juju/2.2/+bug/1701275/comments/6
https://bugs.launchpad.net/juju/2.2/+bug/1701275/comments/21
https://bugs.launchpad.net/juju/2.2/+bug/1701275/comments/24
https://bugs.launchpad.net/juju/2.2/+bug/1701275/comments/29
https://bugs.launchpad.net/juju/2.2/+bug/1701275/comments/30

One possibility that I may have encountered is that we traditionally try to use the machine id as the mongo replica-set id. Something seems to have changed such that 'machine-0' no longer gets id=0 and instead gets id=1.

That then means that if the next machine to be added to the replica set is 'machine-1', it, too, wants id=1, and you get a collision.

That would seem to match what I saw locally, and what was reported with
https://bugs.launchpad.net/juju/2.2/+bug/1701275/comments/30

I wonder if it is a case of Mongo 3.2.X starting with id=0, and now Mongo 3.2.Y is forcing our first entry to be id=1 somehow.
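
As an illustration of that hypothesis, here is a minimal sketch of what the collision would look like from the mongo shell (the address and ids are hypothetical, not taken from an actual reproduction):

 > rs.conf().members
 [ { "_id" : 1, "host" : "10.0.0.1:37017" } ]
 # machine-0 ended up with _id 1 rather than 0; if machine-1 is then also given _id 1,
 # the reconfig is rejected, because member _id values must be unique within a replica set.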

John A Meinel (jameinel)
Changed in juju:
status: Triaged → In Progress
assignee: nobody → John A Meinel (jameinel)
Revision history for this message
Alvaro Uria (aluria) wrote :

rs.conf shared on bug 1704075

tags: added: 4010
Revision history for this message
John A Meinel (jameinel) wrote :

I'm on holiday for 1.5 weeks. If nobody picks this up in the meantime, I'll try to pick it up once I get back.

Changed in juju:
assignee: John A Meinel (jameinel) → nobody
status: In Progress → Triaged
tags: added: eda
tags: added: cdo-qa cdo-qa-blocker
tags: added: foundations-engine
John A Meinel (jameinel)
Changed in juju:
assignee: nobody → John A Meinel (jameinel)
Revision history for this message
John A Meinel (jameinel) wrote :

One of the ways I was able to reproduce this is by getting Juju into a state where it wants to change the replicaset to exclude whoever the current Mongo primary is.

The specific steps I used were:
 $ juju bootstrap lxd
 $ juju switch controller
 $ juju ssh 0
 # connect to Mongo, inject bad data into the db.presence.pings table so that the presence data is
 # forced to say that machine 0 is dead
 # I did that with:
 $$ agent=$(cd /var/lib/juju/agents; echo machine-*)
 $$ pw=$(sudo grep statepassword /var/lib/juju/agents/${agent}/agent.conf | cut '-d ' -sf2)
 $$ /usr/lib/juju/mongo3.2/bin/mongo --ssl -u ${agent} -p $pw --authenticationDatabase admin --sslAllowInvalidHostnames --sslAllowInvalidCertificates localhost:37017/juju

Revision history for this message
John A Meinel (jameinel) wrote :

(hit post too soon)

 > use presence
 > db.presence.pings.find()
 { "_id" : "d557e179-59b0-4f88-86f3-18a80e23b3ae:1502973210", "slot" : NumberLong(1502973210), "alive" : { "0" : NumberLong("2190433320960") } }
 # Add 30s/60s/90s to the 'slot' and inject values with non-numerics
 > db.presence.pings.insert({"_id": "d557e179-59b0-4f88-86f3-18a80e23b3ae:1502973240", "slot" : NumberLong(1502973240), "alive" : { "0" : "a" } })
 > db.presence.pings.insert({"_id": "d557e179-59b0-4f88-86f3-18a80e23b3ae:1502973270", "slot" : NumberLong(1502973270), "alive" : { "0" : "a" } })
 > db.presence.pings.insert({"_id": "d557e179-59b0-4f88-86f3-18a80e23b3ae:1502973300", "slot" : NumberLong(1502973300), "alive" : { "0" : "a" } })

# Watch 'juju status' to see that machine-0 is eventually recorded as 'down'
 $ watch --color juju status --color
# Once it is down, issue a request to enable-ha
 $ juju enable-ha
 adding machines: 1, 2, 3
 demoting machines: 0

# You can see that it is demoting the one machine that is currently the controller.
# If you then watch debug-log you should see messages like:
machine-0: 16:33:50 ERROR juju.worker.peergrouper cannot set replicaset: This node, 10.67.99.75:37017, with _id 1 is not electable under the new configuration version 5 for replica set juju
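# (The reconfig is rejected because it is issued on the current primary while the proposed
# configuration would make that same node non-electable, which is what demoting machine 0 does.)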

 $ juju status # shows all the machines as up and happy
 $ juju show-controller
 test-ha2:
   details:
 ...
   controller-machines:
     "1":
       instance-id: juju-23b3ae-1
       ha-status: ha-pending
     "2":
       instance-id: juju-23b3ae-2
       ha-status: ha-pending
     "3":
       instance-id: juju-23b3ae-3
       ha-status: ha-pending

This shows that HA hasn't actually taken effect.

 > rs.status()
 {
        "set" : "juju",
        "date" : ISODate("2017-08-17T12:20:42.485Z"),
        "myState" : 1,
        "term" : NumberLong(1),
        "heartbeatIntervalMillis" : NumberLong(2000),
        "members" : [
                {
                        "_id" : 1,
                        "name" : "10.67.99.75:37017",
                        "health" : 1,
                        "state" : 1,
                        "stateStr" : "PRIMARY",
                        "uptime" : 535,
                        "optime" : {
                                "ts" : Timestamp(1502972442, 1),
                                "t" : NumberLong(1)
                        },
                        "optimeDate" : ISODate("2017-08-17T12:20:42Z"),
                        "electionTime" : Timestamp(1502971909, 2),
                        "electionDate" : ISODate("2017-08-17T12:11:49Z"),
                        "configVersion" : 4,
                        "self" : true
                },
                {
                        "_id" : 2,
                        "name" : "10.67.99.24:37017",
                        "health" : 1,
                        "state" : 2,
                        "stateStr" : "SECONDARY",
                        "uptime" : 204,
                        "optime" : {
                                "ts" : Timestamp(1502972440, 6),
                                "t" : NumberLong(1)
                        },
                        "optimeDate" : ISODate("2017-08-17T12:20:40Z"),...


Revision history for this message
John A Meinel (jameinel) wrote :

I went in and did a manual 'rs.reconfig()' to set all of the machines to "votes": 1.
At that point I could do 'rs.stepDown()' on the current PRIMARY, and it happily transferred the vote to a different machine (in this case machine-3).
However, because I now had an even number of voting machines, the peergrouper logic broke: it assumes that it is starting with an odd number of voting machines and only works pair-wise to keep the number odd. (It may replace one machine with another, or add 2 machines, but it never removes/adds a single machine.)

I think the issue is that it ended up with 4 machines that kind-of wanted to be voting, and couldn't work out how to get to an odd number, along with the fact that you cannot directly demote the current primary out of the replicaset.
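
For reference, the manual recovery described above looks roughly like this from the mongo shell (a sketch rather than a verbatim transcript; the exact member indexes depend on the real rs.conf() output, and any member that currently has priority 0 would also need a non-zero priority to be electable):

 > cfg = rs.conf()
 > for (var i = 0; i < cfg.members.length; i++) { cfg.members[i].votes = 1 }
 > rs.reconfig(cfg)
 # then, on the current PRIMARY, hand over to another voting member:
 > rs.stepDown()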

Revision history for this message
John A Meinel (jameinel) wrote :

One thing that might be valid is for us to never demote whichever node is the current primary, and then follow the reconfigure with the equivalent of rs.stepDown() when we know that we do want the primary removed from the configuration, so that once Mongo allows us to connect again it will let us step it down.

John A Meinel (jameinel)
Changed in juju:
status: Triaged → In Progress
Revision history for this message
John A Meinel (jameinel) wrote :

My original thought about using machine ids was wrong. We instead use a loop that checks what the 'max' id in the current list is and then increments from there. Going off:
https://bugs.launchpad.net/juju/2.2/+bug/1701275/comments/30

Something was causing us to not notice that '1' was already in use; still investigating that part.

Revision history for this message
John A Meinel (jameinel) wrote :

The function that assigns replicaset ids is here:
https://github.com/juju/replicaset/blob/master/replicaset.go#L290

From what I read of that code, it takes a list of members to set and compares it with the current config to figure out what Ids might already be in use.

It starts with 'max == 0', then goes through the existing config.Members to find their ids, comparing each with max.
It then goes through the members being passed in and, for any member that *doesn't* have an Id already set, sets it to max++.

So one *possibility* would be that we somehow ended up with a CurrentConfig that has a member with replicaset Id 0 (thus never raising 'max' from its starting value), while the members list we are passing in *also* has a member with Id = 1 and other members without their Ids set.

I think a potential fix here is to also iterate over the 'members' list and treat those ids as values that we need to avoid.

I haven't found any way (yet) to actually trigger this configuration; it's only by code inspection that I've found anything that would match comment 30 from bug #1701275.

Something like this:
diff --git a/replicaset.go b/replicaset.go
index ed9cc48..7440435 100644
--- a/replicaset.go
+++ b/replicaset.go
@@ -307,6 +307,12 @@ func Set(session *mgo.Session, members []Member) error {
                        max = m.Id
                }
        }
+        // Also check if any of the members being passed in already have an Id that we would be reusing.
+        for _, m := range members {
+                if m.Id > max {
+                        max = m.Id
+                }
+        }

        for x, m := range members {
                if id, ok := ids[m.Address]; ok {
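
As a hedged illustration of the scenario this guards against (the ids here are hypothetical): if the current config contains a member with replicaset Id 0 and the incoming members list mixes an entry already set to Id 1 with an entry whose Id is unset, the old code leaves max at 0 and hands the unset entry Id 1 as well; with the extra loop, max becomes 1 and the unset entry gets Id 2, so after a successful reconfig the ids stay unique:

 > rs.conf().members.map(function(m) { return m._id })
 [ 0, 1, 2 ]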

Revision history for this message
John A Meinel (jameinel) wrote :

https://github.com/juju/replicaset/pull/4
  and
https://github.com/jameinel/juju/tree/2.2-replicaset-primary-1704072

Are possible fixes for these bugs.

They both need a fair bit of test updates and manual testing before we can be confident that they concretely solve the various issues, but they at least seem to be on the right track.

Changed in juju:
milestone: 2.2.3 → 2.3-alpha1
Revision history for this message
John A Meinel (jameinel) wrote :

https://github.com/juju/juju/pull/7849 is merging replicaset into core.

Changed in juju:
status: In Progress → Fix Committed
Changed in juju:
status: Fix Committed → In Progress
Ante Karamatić (ivoks)
tags: added: cpe-onsite
Changed in juju:
milestone: 2.3-beta1 → 2.3-beta2
Revision history for this message
Anastasia (anastasia-macmood) wrote :

I suspect that the fix for this has already been ported into 2.3 (develop) as part of one of the larger merges we have undertaken recently.

Changed in juju:
status: In Progress → Fix Committed
Changed in juju:
milestone: 2.3-beta2 → 2.3-beta1
Changed in juju:
status: Fix Committed → Fix Released