Canonical Juju

Activity log for bug #2016868

Date	Who	What changed	Old value	New value	Message
2023-04-18 12:04:36	Joseph Phillips	bug			added bug
2023-04-18 12:04:55	Joseph Phillips	description	Using Juju 3.2, I established HA, then removed machine 0, which was leader for both MongoDB and Dqlite. I then re-ran enable-ha. Status took a very long time to quiesce, and eventually Dqlite settled its cluster: - ID: 3297041220608546238 Address: 10.246.27.193:17666 Role: 2 - ID: 11462370002709127048 Address: 10.246.27.31:17666 Role: 0 - ID: 179389882111230348 Address: 10.246.27.194:17666 Role: 0 - ID: 16076074371255822966 Address: 10.246.27.27:17666 Role: 0 As expected, the original node is marked as a stand-by. 3 healthy nodes are all voters. But Mongo can not update its replica-set. Controllers show repeated attempts to contact the old leader: https://pastebin.canonical.com/p/XHXjsmkpS4/ Having gone down to 1 voter, machine 1 can not update the replica-set, because it can't get a quorum: https://pastebin.canonical.com/p/XKcKqmKXY4/ Machine 3 (the one added by reestablishing HA): Keeps removing the vote from 2 after the desired peer group is calculated in order to maintain an odd voter group: https://pastebin.canonical.com/p/VP3VqRycnW/ rs.status() on machine 1 and 2 is the same: https://pastebin.canonical.com/p/DpKDzvKhqY/ I can't connect to Mongo on 3: https://pastebin.canonical.com/p/HpmTHFZWCX/ Status says we're all healthy: https://pastebin.canonical.com/p/NjDRkSnrDX/ So it appears that machine 3 has a Mongo segmented from the rest.	Using Juju 3.2, I established HA, then removed machine 0, which was leader for both MongoDB and Dqlite. I then re-ran enable-ha. Status took a very long time to quiesce, and eventually Dqlite settled its cluster: - ID: 3297041220608546238 Address: 10.246.27.193:17666 Role: 2 - ID: 11462370002709127048 Address: 10.246.27.31:17666 Role: 0 - ID: 179389882111230348 Address: 10.246.27.194:17666 Role: 0 - ID: 16076074371255822966 Address: 10.246.27.27:17666 Role: 0 As expected, the original node is marked as a stand-by. 3 healthy nodes are all voters. 
But Mongo can not update its replica-set. Controllers show repeated attempts to contact the old leader: https://pastebin.canonical.com/p/XHXjsmkpS4/ Having gone down to 1 voter, machine 1 can not update the replica-set, because it can't get a quorum: https://pastebin.canonical.com/p/XKcKqmKXY4/ Machine 3 (the one added by reestablishing HA): Keeps removing the vote from 2 after the desired peer group is calculated in order to maintain an odd voter group: https://pastebin.canonical.com/p/VP3VqRycnW/ rs.status() on machine 1 and 2 is the same: https://pastebin.canonical.com/p/DpKDzvKhqY/ I can't connect to Mongo on 3: https://pastebin.canonical.com/p/HpmTHFZWCX/ Status says we're all healthy: https://pastebin.canonical.com/p/NjDRkSnrDX/ So it appears that machine 3 has a Mongo segmented from the rest with no way to get back.
2023-04-18 12:06:45	Joseph Phillips	description	Using Juju 3.2, I established HA, then removed machine 0, which was leader for both MongoDB and Dqlite. I then re-ran enable-ha. Status took a very long time to quiesce, and eventually Dqlite settled its cluster: - ID: 3297041220608546238 Address: 10.246.27.193:17666 Role: 2 - ID: 11462370002709127048 Address: 10.246.27.31:17666 Role: 0 - ID: 179389882111230348 Address: 10.246.27.194:17666 Role: 0 - ID: 16076074371255822966 Address: 10.246.27.27:17666 Role: 0 As expected, the original node is marked as a stand-by. 3 healthy nodes are all voters. But Mongo can not update its replica-set. Controllers show repeated attempts to contact the old leader: https://pastebin.canonical.com/p/XHXjsmkpS4/ Having gone down to 1 voter, machine 1 can not update the replica-set, because it can't get a quorum: https://pastebin.canonical.com/p/XKcKqmKXY4/ Machine 3 (the one added by reestablishing HA): Keeps removing the vote from 2 after the desired peer group is calculated in order to maintain an odd voter group: https://pastebin.canonical.com/p/VP3VqRycnW/ rs.status() on machine 1 and 2 is the same: https://pastebin.canonical.com/p/DpKDzvKhqY/ I can't connect to Mongo on 3: https://pastebin.canonical.com/p/HpmTHFZWCX/ Status says we're all healthy: https://pastebin.canonical.com/p/NjDRkSnrDX/ So it appears that machine 3 has a Mongo segmented from the rest with no way to get back.	Using Juju 3.2, I established HA, then removed machine 0, which was leader for both MongoDB and Dqlite. I then re-ran enable-ha. Status took a very long time to quiesce, and eventually Dqlite settled its cluster: - ID: 3297041220608546238 Address: 10.246.27.193:17666 Role: 2 - ID: 11462370002709127048 Address: 10.246.27.31:17666 Role: 0 - ID: 179389882111230348 Address: 10.246.27.194:17666 Role: 0 - ID: 16076074371255822966 Address: 10.246.27.27:17666 Role: 0 As expected, the original node is marked as a stand-by. 3 healthy nodes are all voters. 
But Mongo can not update its replica-set.
Controllers show repeated attempts to contact the old leader:
https://pastebin.canonical.com/p/XHXjsmkpS4/
Having gone down to 1 voter, machine 1 can not update the replica-set, because it can't get a quorum:
https://pastebin.canonical.com/p/XKcKqmKXY4/
Machine 3 (the one added by reestablishing HA):
Keeps removing the vote from 2 after the desired peer group is calculated in order to maintain an odd voter group
The peer-grouper runs successfully, but it is a different replica set:
https://pastebin.canonical.com/p/VP3VqRycnW/
rs.status() on machine 1 and 2 are the same:
https://pastebin.canonical.com/p/DpKDzvKhqY/
I can't connect to Mongo on 3:
https://pastebin.canonical.com/p/HpmTHFZWCX/
Status says we're all healthy:
https://pastebin.canonical.com/p/NjDRkSnrDX/
So it appears that machine 3 has a Mongo segmented from the rest with no way to get back.
2023-04-18 12:06:59	Joseph Phillips	juju: status	New	Triaged
2023-04-18 12:07:02	Joseph Phillips	juju: importance	Undecided	High
2023-04-18 12:12:02	Joseph Phillips	summary	Mongo replica-set not reestablished after looding leader	Mongo replica-set not reestablished after losing leader
2023-04-18 16:59:39	John A Meinel	nominated for series		juju/3.1
2023-04-18 16:59:39	John A Meinel	bug task added		juju/3.1
2023-04-18 16:59:39	John A Meinel	nominated for series		juju/2.9
2023-04-18 16:59:39	John A Meinel	bug task added		juju/2.9
2023-04-18 16:59:39	John A Meinel	nominated for series		juju/3.2
2023-04-18 16:59:39	John A Meinel	bug task added		juju/3.2
2023-04-18 16:59:48	John A Meinel	juju/2.9: milestone		3.2.0
2023-04-18 16:59:57	John A Meinel	juju/2.9: milestone	3.2.0	2.9.43
2023-04-18 17:00:02	John A Meinel	juju/3.1: milestone		3.1.3
2023-04-18 17:00:06	John A Meinel	juju/3.2: milestone		3.2.0
2023-04-18 17:00:08	John A Meinel	juju/2.9: status	New	Triaged
2023-04-18 17:00:10	John A Meinel	juju/3.1: status	New	Triaged
2023-04-18 17:00:11	John A Meinel	juju/3.2: status	New	Triaged
2023-04-18 17:00:13	John A Meinel	juju/2.9: importance	Undecided	High
2023-04-18 17:00:14	John A Meinel	juju/3.1: importance	Undecided	High
2023-04-18 17:00:15	John A Meinel	juju/3.2: importance	Undecided	High
2023-04-19 05:13:24	Ian Booth	juju/2.9: assignee		Ian Booth (wallyworld)
2023-05-26 09:44:15	Canonical Juju QA Bot	juju/3.2: milestone	3.2.0	3.2.1
2023-06-02 10:18:51	Canonical Juju QA Bot	juju/2.9: milestone	2.9.43	2.9.44
2023-06-13 12:34:06	Canonical Juju QA Bot	juju/3.1: milestone	3.1.3	3.1.4
2023-06-15 10:48:59	Canonical Juju QA Bot	juju/3.1: milestone	3.1.4	3.1.5
2023-06-19 13:32:51	Joseph Phillips	juju/2.9: milestone	2.9.44
2023-06-19 13:32:54	Joseph Phillips	juju/3.1: milestone	3.1.5
2023-06-19 13:32:56	Joseph Phillips	juju/3.2: milestone	3.2.1
2023-12-04 13:07:28	Joseph Phillips	juju/3.1: status	Triaged	Won't Fix
2023-12-04 13:07:31	Joseph Phillips	juju/3.2: status	Triaged	Won't Fix
2023-12-04 13:07:48	Joseph Phillips	juju: status	Triaged	Won't Fix
2023-12-04 13:07:51	Joseph Phillips	juju/2.9: status	Triaged	Won't Fix