Mongo replica-set not reestablished after losing leader
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Canonical Juju | Won't Fix | High | Unassigned |
2.9 | Won't Fix | High | Ian Booth |
3.1 | Won't Fix | High | Unassigned |
3.2 | Won't Fix | High | Unassigned |
Bug Description
Using Juju 3.2, I established HA, then removed machine 0, which was leader for both MongoDB and Dqlite.
I then re-ran enable-ha. Status took a very long time to quiesce, and eventually Dqlite settled its cluster:
- ID: 3297041220608546238
Address: 10.246.27.193:17666
Role: 2
- ID: 114623700027091
Address: 10.246.27.31:17666
Role: 0
- ID: 179389882111230348
Address: 10.246.27.194:17666
Role: 0
- ID: 160760743712558
Address: 10.246.27.27:17666
Role: 0
As expected, the original node is marked as a stand-by, and the three healthy nodes are all voters.
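For reference, here is a minimal sketch of reading that cluster membership with the go-dqlite client library. The node-store path and the surrounding wiring are assumptions for illustration, not how juju itself is configured:

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/canonical/go-dqlite/client"
)

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// The store path here is an assumption; juju keeps its own node
	// store under the controller's data directory.
	store, err := client.NewYamlNodeStore("/var/lib/juju/dqlite/cluster.yaml")
	if err != nil {
		panic(err)
	}
	leader, err := client.FindLeader(ctx, store)
	if err != nil {
		panic(err)
	}
	defer leader.Close()

	nodes, err := leader.Cluster(ctx)
	if err != nil {
		panic(err)
	}
	for _, n := range nodes {
		// In go-dqlite, Role 0 is a voter; non-zero roles do not vote.
		fmt.Printf("ID=%d Address=%s Role=%d\n", n.ID, n.Address, n.Role)
	}
}
```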
But Mongo cannot update its replica-set.
Controllers show repeated attempts to contact the old leader:
https:/
Having gone down to 1 voter, machine 1 cannot update the replica-set, because it cannot get a quorum:
https:/
Machine 3 (the one added by re-establishing HA) keeps removing machine 2's vote after the desired peer group is calculated, in order to maintain an odd number of voters.
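A rough sketch of that odd-voter invariant, purely illustrative and not juju's actual peergrouper code (which voter gets demoted here is arbitrary; juju's real selection logic differs):

```go
package main

import (
	"fmt"
	"sort"
)

// enforceOddVoters demotes one voter whenever the voting set has even
// size, so that the voter count stays odd and elections cannot tie.
func enforceOddVoters(voters map[string]bool) {
	var ids []string
	for id, voting := range voters {
		if voting {
			ids = append(ids, id)
		}
	}
	if len(ids)%2 == 1 {
		return // already odd, nothing to do
	}
	sort.Strings(ids)
	// Demote the highest-numbered voter, mirroring what machine 3
	// appears to be doing to machine 2's vote here.
	voters[ids[len(ids)-1]] = false
}

func main() {
	voters := map[string]bool{"0": true, "1": true, "2": true, "3": true}
	enforceOddVoters(voters) // four voters -> demote one
	fmt.Println(voters)
}
```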
The peer-grouper runs successfully, but the replica-set it arrives at is a different one:
https:/
rs.status() on machines 1 and 2 is the same:
https:/
I can't connect to Mongo on machine 3:
https:/
Status says we're all healthy:
https:/
So it appears that machine 3 has a Mongo segmented from the rest, with no way to rejoin.
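For reference, this is roughly how each controller's view of the replica-set can be compared programmatically, assuming juju's replicaset helper package and mgo driver; the dial address and the absence of TLS/auth setup are simplifications:

```go
package main

import (
	"fmt"

	"github.com/juju/mgo/v3"
	"github.com/juju/replicaset/v3"
)

func main() {
	// Juju's mongod conventionally listens on port 37017; a real
	// connection also needs the controller's TLS and auth settings.
	session, err := mgo.Dial("localhost:37017")
	if err != nil {
		panic(err)
	}
	defer session.Close()

	// CurrentStatus wraps rs.status(); running this against each
	// controller shows whose view of the replica-set diverges.
	status, err := replicaset.CurrentStatus(session)
	if err != nil {
		panic(err)
	}
	for _, m := range status.Members {
		fmt.Printf("id=%d addr=%s state=%v healthy=%v\n",
			m.Id, m.Address, m.State, m.Healthy)
	}
}
```

Run against machines 1, 2, and 3 in turn, this would show machine 3's divergent membership directly.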
description: updated
Changed in juju:
status: New → Triaged
importance: Undecided → High
summary: Mongo replica-set not reestablished after looding leader → Mongo replica-set not reestablished after losing leader
Joe and I dug into this for a bit today. It looks like what is happening is that we start with HA 3, and machine-0, machine-1, and machine-2 are all happily in the replicaset.
However, when we 'juju remove-machine 0', the machine goes into dying, we notice that machine 0 needs to be removed from the replicaset, and so we kick off the process of having machine 0 lose its primary status (which it does; in this case the primary moved to machine 1).
At that point, we notice that we can remove machine 0 safely, and that ultimately we want to go to HA 2, where we have only 1 voter and another, non-voting backup.
So we end up going from:
{
0: true,
1: true,
2: true,
}
to
{
1: true,
2: false,
}
However, Mongo refuses that change, because we are changing 2 voters simultaneously (machine 0 is going from voting to non-voting and being removed, and machine-2 is going to non-voting).
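To make that concrete, here is a small hypothetical helper that counts voting changes between the two configurations, showing why this transition trips the rule:

```go
package main

import "fmt"

// votingChanges counts how many members change vote status between the
// old and desired configurations, where removing a voter counts too.
func votingChanges(prev, next map[string]bool) int {
	changes := 0
	for id, wasVoting := range prev {
		nowVoting, present := next[id]
		switch {
		case !present && wasVoting:
			changes++ // a voter is being removed
		case present && wasVoting != nowVoting:
			changes++ // an existing member's vote is flipping
		}
	}
	for id, nowVoting := range next {
		if _, present := prev[id]; !present && nowVoting {
			changes++ // a brand-new voter is being added
		}
	}
	return changes
}

func main() {
	prev := map[string]bool{"0": true, "1": true, "2": true}
	next := map[string]bool{"1": true, "2": false}
	// Prints 2: machine 0's removal plus machine 2's demotion, which is
	// one voting change too many for a single reconfig.
	fmt.Println(votingChanges(prev, next))
}
```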
If we logged into the mongo shell and directly changed only machine-0 so that it was no longer in the replicaset, the rest of the peergrouper worker kicked in and we ended up at the stable desired target. (To prove this, we could try the opposite: only downgrade machine-2 to non-voting and see whether the peergrouper then successfully removes machine-0.)
I think our 'only do one change at a time' logic is missing something about handling removals.
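A sketch of what handling removals under that rule might look like, treating the demotion that precedes a voter's removal as that pass's single change; planNextStep is hypothetical, not juju's actual code:

```go
package main

import "fmt"

// planNextStep applies at most one voting change per pass: first demote
// a voter that should stop voting or be removed, then (on a later pass)
// drop already non-voting members, which is not a voting change.
func planNextStep(prev, desired map[string]bool) map[string]bool {
	next := make(map[string]bool, len(prev))
	for id, v := range prev {
		next[id] = v
	}
	for id, wasVoting := range prev {
		wantVoting, keep := desired[id]
		if wasVoting && (!keep || !wantVoting) {
			next[id] = false
			return next // one voting change per pass
		}
	}
	for id := range prev {
		if _, keep := desired[id]; !keep {
			delete(next, id)
			return next
		}
	}
	return next
}

func main() {
	current := map[string]bool{"0": true, "1": true, "2": true}
	desired := map[string]bool{"1": true, "2": false}
	// Converges to the desired config over a few passes, never flipping
	// more than one vote per reconfig.
	for i := 0; i < 5; i++ {
		fmt.Println(current)
		current = planNextStep(current, desired)
	}
}
```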
The other piece is that when we tried to issue the removal and downgrade directly on the mongodb command line, we got an error from mongo. For whatever reason, the peergrouper is not reporting an error in the logs, so somehow we are suppressing it.
It may be that we are using 'force' and mongo isn't erroring but also isn't respecting our request, or there is some code path where we fail to notice the response from mongo.
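One way to check that theory is to surface the raw reconfig outcome. This sketch uses mgo's Session.Run to issue replSetReconfig directly and print both the error and the server response; the port, the missing auth/TLS setup, and the empty config document are all assumptions:

```go
package main

import (
	"fmt"

	"github.com/juju/mgo/v3"
	"github.com/juju/mgo/v3/bson"
)

// reconfig issues a raw replSetReconfig and prints both the error and
// the server's response, so a rejected-but-forced change cannot pass
// silently. The cfg document is hypothetical; juju builds it from the
// desired peer group.
func reconfig(session *mgo.Session, cfg bson.M, force bool) error {
	var res bson.M
	err := session.Run(bson.D{
		{Name: "replSetReconfig", Value: cfg},
		{Name: "force", Value: force},
	}, &res)
	fmt.Printf("reconfig err=%v response=%v\n", err, res)
	return err
}

func main() {
	session, err := mgo.Dial("localhost:37017") // juju's conventional mongod port
	if err != nil {
		panic(err)
	}
	defer session.Close()
	var cfg bson.M // in practice: the current config with exactly one change applied
	_ = reconfig(session, cfg, true)
}
```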