Reproduced on 3.0
Status: https://pastebin.canonical.com/p/Mw4SmjHfP8/
Force removal of the primary (0), IP 10.246.27.32. It does a step-down, and calculates a peer-group change with a single voter to maintain an odd number. It reports success, *but* it is confused as to who the primary is. 10.246.27.45 (machine 1) is reported as self=true. https://pastebin.canonical.com/p/98PVCrsjbx/
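For reference, the "single voter to maintain an odd number" arithmetic can be sketched as follows. This is only an illustration of the odd-voter rule, not Juju's peer-group code: the function names are hypothetical, and picking the lowest remaining machine id as the voter is an arbitrary assumption.

```python
def desired_voter_count(available: int) -> int:
    """Return the largest odd number that does not exceed `available`."""
    if available <= 0:
        return 0
    return available if available % 2 == 1 else available - 1


def plan_voters(members: dict[str, str], removed: set[str]) -> dict[str, bool]:
    """Map each remaining machine id to whether it keeps its vote.

    Assumption: votes go to the lowest machine ids first. The real
    selection logic may well differ.
    """
    remaining = sorted(m for m in members if m not in removed)
    keep = desired_voter_count(len(remaining))
    return {m: i < keep for i, m in enumerate(remaining)}


# Machine ids and addresses as they appear in this report.
members = {"0": "10.246.27.32", "1": "10.246.27.45", "2": "10.246.27.136"}
plan = plan_voters(members, removed={"0"})
print(plan)
```

With machine 0 removed, two members remain, so only one of them can vote if the voter count is to stay odd.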
Despite reporting success, the replica-set does not change. https://pastebin.canonical.com/p/FB8ntXsrn2/
Over on machine 1, which is reported as the primary when you log into Mongo, it seems to think that it is machine 0. https://pastebin.canonical.com/p/d94BpdkpKN/
It is trying to remove member 1 (machine 0) but keeps failing, despite being primary. https://pastebin.canonical.com/p/3rYK5JZJGc/
It certainly says PRIMARY when you connect directly to it. https://pastebin.canonical.com/p/gwbzRd34Nd/
I think we're somehow using sessions with crossed wires.
At this point machine 0 is still not gone. If you force delete the container you get into the situation we observed 2 days ago. https://pastebin.canonical.com/p/G2jdnf2hzs/
And no matter how many times you try to force delete machine 0, even with the container gone, it will not go away.
Restart the whole container under machine 1 to see what happens. It reports success changing the replicaset, but it thinks it's machine 2. https://pastebin.canonical.com/p/tNvrBBVNRH/
Tried bouncing the container for machine 2 as well. No improvement.
All the while we keep reporting the same replica-set.
2023-04-20 09:29:04 DEBUG juju.replicaset replicaset.go:669 current replicaset config: {
  Name: juju,
  Version: 3,
  Term: 11,
  Protocol Version: 1,
  Members: {
    {1 "10.246.27.32:37017" juju-machine-id:0 voting},
    {2 "10.246.27.45:37017" juju-machine-id:1 voting},
    {3 "10.246.27.136:37017" juju-machine-id:2 voting},
  },
}
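A quick way to confirm that a removed machine is still listed is to scan the member lines of that log entry. The sketch below assumes each member is printed as {id "host:port" juju-machine-id:N voting}, as in the log in this report. The regular expression and the function names are hypothetical, and the tolerance for a hyphenated role such as "non-voting" is a guess.

```python
import re

# Matches one member entry, e.g. {1 "10.246.27.32:37017" juju-machine-id:0 voting}
MEMBER_RE = re.compile(
    r'\{(?P<id>\d+)\s+"(?P<addr>[^"]+)"\s+'
    r'juju-machine-id:(?P<machine>\d+)\s+(?P<role>[\w-]+)\}'
)


def parse_members(log_text: str) -> list[dict[str, str]]:
    """Extract every replica-set member entry found in the log text."""
    return [m.groupdict() for m in MEMBER_RE.finditer(log_text)]


def stale_members(log_text: str, removed: set[str]) -> list[dict[str, str]]:
    """Return members whose machine id should already have been removed."""
    return [m for m in parse_members(log_text) if m["machine"] in removed]


sample = '''
Members: {
  {1 "10.246.27.32:37017" juju-machine-id:0 voting},
  {2 "10.246.27.45:37017" juju-machine-id:1 voting},
  {3 "10.246.27.136:37017" juju-machine-id:2 voting},
},
'''

stale = stale_members(sample, removed={"0"})
for member in stale:
    print("still present:", member["addr"], "machine", member["machine"])
```

Run against the config reported here, this flags machine 0 as still present and voting, which matches the observation that the replica-set never changes.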
As an added bonus in this case, Raft gets borked and we have no singular controller lease, so running enable-ha again adds a machine that can never be provisioned.
An altogether miserable state. https://pastebin.canonical.com/p/VPbggMVwdZ/