2023-04-18 12:04:36 |
Joseph Phillips |
bug |
|
|
added bug |
2023-04-18 12:04:55 |
Joseph Phillips |
description |
Using Juju 3.2, I established HA, then removed machine 0, which was leader for both MongoDB and Dqlite.
I then re-ran enable-ha. Status took a very long time to quiesce, and eventually Dqlite settled its cluster:
- ID: 3297041220608546238
  Address: 10.246.27.193:17666
  Role: 2
- ID: 11462370002709127048
  Address: 10.246.27.31:17666
  Role: 0
- ID: 179389882111230348
  Address: 10.246.27.194:17666
  Role: 0
- ID: 16076074371255822966
  Address: 10.246.27.27:17666
  Role: 0
As expected, the original node is marked as a stand-by. 3 healthy nodes are all voters.
But Mongo can not update its replica-set.
Controllers show repeated attempts to contact the old leader:
https://pastebin.canonical.com/p/XHXjsmkpS4/
Having gone down to 1 voter, machine 1 can not update the replica-set, because it can't get a quorum:
https://pastebin.canonical.com/p/XKcKqmKXY4/
Machine 3 (the one added by reestablishing HA): Keeps removing the vote from 2 after the desired peer group is calculated in order to maintain an odd voter group:
https://pastebin.canonical.com/p/VP3VqRycnW/
rs.status() on machine 1 and 2 is the same:
https://pastebin.canonical.com/p/DpKDzvKhqY/
I can't connect to Mongo on 3:
https://pastebin.canonical.com/p/HpmTHFZWCX/
Status says we're all healthy:
https://pastebin.canonical.com/p/NjDRkSnrDX/
So it appears that machine 3 has a Mongo segmented from the rest. |
Using Juju 3.2, I established HA, then removed machine 0, which was leader for both MongoDB and Dqlite.
I then re-ran enable-ha. Status took a very long time to quiesce, and eventually Dqlite settled its cluster:
- ID: 3297041220608546238
  Address: 10.246.27.193:17666
  Role: 2
- ID: 11462370002709127048
  Address: 10.246.27.31:17666
  Role: 0
- ID: 179389882111230348
  Address: 10.246.27.194:17666
  Role: 0
- ID: 16076074371255822966
  Address: 10.246.27.27:17666
  Role: 0
As expected, the original node is marked as a stand-by. 3 healthy nodes are all voters.
But Mongo can not update its replica-set.
Controllers show repeated attempts to contact the old leader:
https://pastebin.canonical.com/p/XHXjsmkpS4/
Having gone down to 1 voter, machine 1 can not update the replica-set, because it can't get a quorum:
https://pastebin.canonical.com/p/XKcKqmKXY4/
Machine 3 (the one added by reestablishing HA): Keeps removing the vote from 2 after the desired peer group is calculated in order to maintain an odd voter group:
https://pastebin.canonical.com/p/VP3VqRycnW/
rs.status() on machine 1 and 2 is the same:
https://pastebin.canonical.com/p/DpKDzvKhqY/
I can't connect to Mongo on 3:
https://pastebin.canonical.com/p/HpmTHFZWCX/
Status says we're all healthy:
https://pastebin.canonical.com/p/NjDRkSnrDX/
So it appears that machine 3 has a Mongo segmented from the rest with no way to get back. |
|
2023-04-18 12:06:45 |
Joseph Phillips |
description |
Using Juju 3.2, I established HA, then removed machine 0, which was leader for both MongoDB and Dqlite.
I then re-ran enable-ha. Status took a very long time to quiesce, and eventually Dqlite settled its cluster:
- ID: 3297041220608546238
  Address: 10.246.27.193:17666
  Role: 2
- ID: 11462370002709127048
  Address: 10.246.27.31:17666
  Role: 0
- ID: 179389882111230348
  Address: 10.246.27.194:17666
  Role: 0
- ID: 16076074371255822966
  Address: 10.246.27.27:17666
  Role: 0
As expected, the original node is marked as a stand-by. 3 healthy nodes are all voters.
But Mongo can not update its replica-set.
Controllers show repeated attempts to contact the old leader:
https://pastebin.canonical.com/p/XHXjsmkpS4/
Having gone down to 1 voter, machine 1 can not update the replica-set, because it can't get a quorum:
https://pastebin.canonical.com/p/XKcKqmKXY4/
Machine 3 (the one added by reestablishing HA): Keeps removing the vote from 2 after the desired peer group is calculated in order to maintain an odd voter group:
https://pastebin.canonical.com/p/VP3VqRycnW/
rs.status() on machine 1 and 2 is the same:
https://pastebin.canonical.com/p/DpKDzvKhqY/
I can't connect to Mongo on 3:
https://pastebin.canonical.com/p/HpmTHFZWCX/
Status says we're all healthy:
https://pastebin.canonical.com/p/NjDRkSnrDX/
So it appears that machine 3 has a Mongo segmented from the rest with no way to get back. |
Using Juju 3.2, I established HA, then removed machine 0, which was leader for both MongoDB and Dqlite.
I then re-ran enable-ha. Status took a very long time to quiesce, and eventually Dqlite settled its cluster:
- ID: 3297041220608546238
  Address: 10.246.27.193:17666
  Role: 2
- ID: 11462370002709127048
  Address: 10.246.27.31:17666
  Role: 0
- ID: 179389882111230348
  Address: 10.246.27.194:17666
  Role: 0
- ID: 16076074371255822966
  Address: 10.246.27.27:17666
  Role: 0
As expected, the original node is marked as a stand-by. 3 healthy nodes are all voters.
But Mongo can not update its replica-set.
Controllers show repeated attempts to contact the old leader:
https://pastebin.canonical.com/p/XHXjsmkpS4/
Having gone down to 1 voter, machine 1 can not update the replica-set, because it can't get a quorum:
https://pastebin.canonical.com/p/XKcKqmKXY4/
Machine 3 (the one added by reestablishing HA): Keeps removing the vote from 2 after the desired peer group is calculated in order to maintain an odd voter group.
The peer-grouper runs successfully, but it is a different replica set:
https://pastebin.canonical.com/p/VP3VqRycnW/
rs.status() on machine 1 and 2 are the same:
https://pastebin.canonical.com/p/DpKDzvKhqY/
I can't connect to Mongo on 3:
https://pastebin.canonical.com/p/HpmTHFZWCX/
Status says we're all healthy:
https://pastebin.canonical.com/p/NjDRkSnrDX/
So it appears that machine 3 has a Mongo segmented from the rest with no way to get back. |
|
2023-04-18 12:06:59 |
Joseph Phillips |
juju: status |
New |
Triaged |
|
2023-04-18 12:07:02 |
Joseph Phillips |
juju: importance |
Undecided |
High |
|
2023-04-18 12:12:02 |
Joseph Phillips |
summary |
Mongo replica-set not reestablished after looding leader |
Mongo replica-set not reestablished after losing leader |
|
2023-04-18 16:59:39 |
John A Meinel |
nominated for series |
|
juju/3.1 |
|
2023-04-18 16:59:39 |
John A Meinel |
bug task added |
|
juju/3.1 |
|
2023-04-18 16:59:39 |
John A Meinel |
nominated for series |
|
juju/2.9 |
|
2023-04-18 16:59:39 |
John A Meinel |
bug task added |
|
juju/2.9 |
|
2023-04-18 16:59:39 |
John A Meinel |
nominated for series |
|
juju/3.2 |
|
2023-04-18 16:59:39 |
John A Meinel |
bug task added |
|
juju/3.2 |
|
2023-04-18 16:59:48 |
John A Meinel |
juju/2.9: milestone |
|
3.2.0 |
|
2023-04-18 16:59:57 |
John A Meinel |
juju/2.9: milestone |
3.2.0 |
2.9.43 |
|
2023-04-18 17:00:02 |
John A Meinel |
juju/3.1: milestone |
|
3.1.3 |
|
2023-04-18 17:00:06 |
John A Meinel |
juju/3.2: milestone |
|
3.2.0 |
|
2023-04-18 17:00:08 |
John A Meinel |
juju/2.9: status |
New |
Triaged |
|
2023-04-18 17:00:10 |
John A Meinel |
juju/3.1: status |
New |
Triaged |
|
2023-04-18 17:00:11 |
John A Meinel |
juju/3.2: status |
New |
Triaged |
|
2023-04-18 17:00:13 |
John A Meinel |
juju/2.9: importance |
Undecided |
High |
|
2023-04-18 17:00:14 |
John A Meinel |
juju/3.1: importance |
Undecided |
High |
|
2023-04-18 17:00:15 |
John A Meinel |
juju/3.2: importance |
Undecided |
High |
|
2023-04-19 05:13:24 |
Ian Booth |
juju/2.9: assignee |
|
Ian Booth (wallyworld) |
|
2023-05-26 09:44:15 |
Canonical Juju QA Bot |
juju/3.2: milestone |
3.2.0 |
3.2.1 |
|
2023-06-02 10:18:51 |
Canonical Juju QA Bot |
juju/2.9: milestone |
2.9.43 |
2.9.44 |
|
2023-06-13 12:34:06 |
Canonical Juju QA Bot |
juju/3.1: milestone |
3.1.3 |
3.1.4 |
|
2023-06-15 10:48:59 |
Canonical Juju QA Bot |
juju/3.1: milestone |
3.1.4 |
3.1.5 |
|
2023-06-19 13:32:51 |
Joseph Phillips |
juju/2.9: milestone |
2.9.44 |
|
|
2023-06-19 13:32:54 |
Joseph Phillips |
juju/3.1: milestone |
3.1.5 |
|
|
2023-06-19 13:32:56 |
Joseph Phillips |
juju/3.2: milestone |
3.2.1 |
|
|
2023-12-04 13:07:28 |
Joseph Phillips |
juju/3.1: status |
Triaged |
Won't Fix |
|
2023-12-04 13:07:31 |
Joseph Phillips |
juju/3.2: status |
Triaged |
Won't Fix |
|
2023-12-04 13:07:48 |
Joseph Phillips |
juju: status |
Triaged |
Won't Fix |
|
2023-12-04 13:07:51 |
Joseph Phillips |
juju/2.9: status |
Triaged |
Won't Fix |
|