Activity log for bug #2016868

Date Who What changed Old value New value Message
2023-04-18 12:04:36 Joseph Phillips bug added bug
2023-04-18 12:04:55 Joseph Phillips description
Old value:
Using Juju 3.2, I established HA, then removed machine 0, which was leader for both MongoDB and Dqlite. I then re-ran enable-ha.
Status took a very long time to quiesce, and eventually Dqlite settled its cluster:
- ID: 3297041220608546238
  Address: 10.246.27.193:17666
  Role: 2
- ID: 11462370002709127048
  Address: 10.246.27.31:17666
  Role: 0
- ID: 179389882111230348
  Address: 10.246.27.194:17666
  Role: 0
- ID: 16076074371255822966
  Address: 10.246.27.27:17666
  Role: 0
As expected, the original node is marked as a stand-by. 3 healthy nodes are all voters.
But Mongo can not update its replica-set.
Controllers show repeated attempts to contact the old leader: https://pastebin.canonical.com/p/XHXjsmkpS4/
Having gone down to 1 voter, machine 1 can not update the replica-set, because it can't get a quorum: https://pastebin.canonical.com/p/XKcKqmKXY4/
Machine 3 (the one added by reestablishing HA): Keeps removing the vote from 2 after the desired peer group is calculated in order to maintain an odd voter group: https://pastebin.canonical.com/p/VP3VqRycnW/
rs.status() on machine 1 and 2 is the same: https://pastebin.canonical.com/p/DpKDzvKhqY/
I can't connect to Mongo on 3: https://pastebin.canonical.com/p/HpmTHFZWCX/
Status says we're all healthy: https://pastebin.canonical.com/p/NjDRkSnrDX/
So it appears that machine 3 has a Mongo segmented from the rest.
New value:
Using Juju 3.2, I established HA, then removed machine 0, which was leader for both MongoDB and Dqlite. I then re-ran enable-ha.
Status took a very long time to quiesce, and eventually Dqlite settled its cluster:
- ID: 3297041220608546238
  Address: 10.246.27.193:17666
  Role: 2
- ID: 11462370002709127048
  Address: 10.246.27.31:17666
  Role: 0
- ID: 179389882111230348
  Address: 10.246.27.194:17666
  Role: 0
- ID: 16076074371255822966
  Address: 10.246.27.27:17666
  Role: 0
As expected, the original node is marked as a stand-by. 3 healthy nodes are all voters.
But Mongo can not update its replica-set.
Controllers show repeated attempts to contact the old leader: https://pastebin.canonical.com/p/XHXjsmkpS4/
Having gone down to 1 voter, machine 1 can not update the replica-set, because it can't get a quorum: https://pastebin.canonical.com/p/XKcKqmKXY4/
Machine 3 (the one added by reestablishing HA): Keeps removing the vote from 2 after the desired peer group is calculated in order to maintain an odd voter group: https://pastebin.canonical.com/p/VP3VqRycnW/
rs.status() on machine 1 and 2 is the same: https://pastebin.canonical.com/p/DpKDzvKhqY/
I can't connect to Mongo on 3: https://pastebin.canonical.com/p/HpmTHFZWCX/
Status says we're all healthy: https://pastebin.canonical.com/p/NjDRkSnrDX/
So it appears that machine 3 has a Mongo segmented from the rest with no way to get back.
2023-04-18 12:06:45 Joseph Phillips description
Old value:
Using Juju 3.2, I established HA, then removed machine 0, which was leader for both MongoDB and Dqlite. I then re-ran enable-ha.
Status took a very long time to quiesce, and eventually Dqlite settled its cluster:
- ID: 3297041220608546238
  Address: 10.246.27.193:17666
  Role: 2
- ID: 11462370002709127048
  Address: 10.246.27.31:17666
  Role: 0
- ID: 179389882111230348
  Address: 10.246.27.194:17666
  Role: 0
- ID: 16076074371255822966
  Address: 10.246.27.27:17666
  Role: 0
As expected, the original node is marked as a stand-by. 3 healthy nodes are all voters.
But Mongo can not update its replica-set.
Controllers show repeated attempts to contact the old leader: https://pastebin.canonical.com/p/XHXjsmkpS4/
Having gone down to 1 voter, machine 1 can not update the replica-set, because it can't get a quorum: https://pastebin.canonical.com/p/XKcKqmKXY4/
Machine 3 (the one added by reestablishing HA): Keeps removing the vote from 2 after the desired peer group is calculated in order to maintain an odd voter group: https://pastebin.canonical.com/p/VP3VqRycnW/
rs.status() on machine 1 and 2 is the same: https://pastebin.canonical.com/p/DpKDzvKhqY/
I can't connect to Mongo on 3: https://pastebin.canonical.com/p/HpmTHFZWCX/
Status says we're all healthy: https://pastebin.canonical.com/p/NjDRkSnrDX/
So it appears that machine 3 has a Mongo segmented from the rest with no way to get back.
New value:
Using Juju 3.2, I established HA, then removed machine 0, which was leader for both MongoDB and Dqlite. I then re-ran enable-ha.
Status took a very long time to quiesce, and eventually Dqlite settled its cluster:
- ID: 3297041220608546238
  Address: 10.246.27.193:17666
  Role: 2
- ID: 11462370002709127048
  Address: 10.246.27.31:17666
  Role: 0
- ID: 179389882111230348
  Address: 10.246.27.194:17666
  Role: 0
- ID: 16076074371255822966
  Address: 10.246.27.27:17666
  Role: 0
As expected, the original node is marked as a stand-by. 3 healthy nodes are all voters.
But Mongo can not update its replica-set.
Controllers show repeated attempts to contact the old leader: https://pastebin.canonical.com/p/XHXjsmkpS4/
Having gone down to 1 voter, machine 1 can not update the replica-set, because it can't get a quorum: https://pastebin.canonical.com/p/XKcKqmKXY4/
Machine 3 (the one added by reestablishing HA): Keeps removing the vote from 2 after the desired peer group is calculated in order to maintain an odd voter group
The peer-grouper runs successfully, but it is a different replica set: https://pastebin.canonical.com/p/VP3VqRycnW/
rs.status() on machine 1 and 2 are the same: https://pastebin.canonical.com/p/DpKDzvKhqY/
I can't connect to Mongo on 3: https://pastebin.canonical.com/p/HpmTHFZWCX/
Status says we're all healthy: https://pastebin.canonical.com/p/NjDRkSnrDX/
So it appears that machine 3 has a Mongo segmented from the rest with no way to get back.
2023-04-18 12:06:59 Joseph Phillips juju: status New Triaged
2023-04-18 12:07:02 Joseph Phillips juju: importance Undecided High
2023-04-18 12:12:02 Joseph Phillips summary Mongo replica-set not reestablished after looding leader Mongo replica-set not reestablished after losing leader
2023-04-18 16:59:39 John A Meinel nominated for series juju/3.1
2023-04-18 16:59:39 John A Meinel bug task added juju/3.1
2023-04-18 16:59:39 John A Meinel nominated for series juju/2.9
2023-04-18 16:59:39 John A Meinel bug task added juju/2.9
2023-04-18 16:59:39 John A Meinel nominated for series juju/3.2
2023-04-18 16:59:39 John A Meinel bug task added juju/3.2
2023-04-18 16:59:48 John A Meinel juju/2.9: milestone 3.2.0
2023-04-18 16:59:57 John A Meinel juju/2.9: milestone 3.2.0 2.9.43
2023-04-18 17:00:02 John A Meinel juju/3.1: milestone 3.1.3
2023-04-18 17:00:06 John A Meinel juju/3.2: milestone 3.2.0
2023-04-18 17:00:08 John A Meinel juju/2.9: status New Triaged
2023-04-18 17:00:10 John A Meinel juju/3.1: status New Triaged
2023-04-18 17:00:11 John A Meinel juju/3.2: status New Triaged
2023-04-18 17:00:13 John A Meinel juju/2.9: importance Undecided High
2023-04-18 17:00:14 John A Meinel juju/3.1: importance Undecided High
2023-04-18 17:00:15 John A Meinel juju/3.2: importance Undecided High
2023-04-19 05:13:24 Ian Booth juju/2.9: assignee Ian Booth (wallyworld)
2023-05-26 09:44:15 Canonical Juju QA Bot juju/3.2: milestone 3.2.0 3.2.1
2023-06-02 10:18:51 Canonical Juju QA Bot juju/2.9: milestone 2.9.43 2.9.44
2023-06-13 12:34:06 Canonical Juju QA Bot juju/3.1: milestone 3.1.3 3.1.4
2023-06-15 10:48:59 Canonical Juju QA Bot juju/3.1: milestone 3.1.4 3.1.5
2023-06-19 13:32:51 Joseph Phillips juju/2.9: milestone 2.9.44
2023-06-19 13:32:54 Joseph Phillips juju/3.1: milestone 3.1.5
2023-06-19 13:32:56 Joseph Phillips juju/3.2: milestone 3.2.1
2023-12-04 13:07:28 Joseph Phillips juju/3.1: status Triaged Won't Fix
2023-12-04 13:07:31 Joseph Phillips juju/3.2: status Triaged Won't Fix
2023-12-04 13:07:48 Joseph Phillips juju: status Triaged Won't Fix
2023-12-04 13:07:51 Joseph Phillips juju/2.9: status Triaged Won't Fix