juju upgrade-model with HA controllers needs to confirm that Mongo replicaset is good before starting

Bug #1855956 reported by Xav Paice on 2019-12-11
This bug affects 3 people
Affects: juju
Importance: High
Assigned to: Tim Penhey

Bug Description

In a situation where we have 3 Juju controllers, and the resulting MongoDB replicaset, we should verify that rs.status() is healthy before starting the upgrade, and fail if it is not, so that the problem can be resolved first.

This caused problems because:
- hosts 2 and 3 were out of date by months
- the upgrade started and restarted juju-db on all 3 machines.
- host 2, while unable to contact hosts 1 and 3, became the PRIMARY
- other hosts failed to replicate
- the outdated database contained on host 2 was then the one that Juju connected to, resulting in loss of the changes since the last good replica.
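
As an illustration of the kind of pre-upgrade check being asked for, here is a minimal Python sketch. It is not Juju's actual fix. The function name replicaset_problems and the max_lag threshold are hypothetical. It assumes the input is a dict shaped like the output of MongoDB's replSetGetStatus command (what rs.status() returns), using the members, name, health, stateStr and optimeDate fields.

```python
from datetime import datetime, timedelta


def replicaset_problems(status, max_lag=timedelta(minutes=1)):
    """Return a list of problems found in a replSetGetStatus-style dict.

    An empty list means the replicaset looks healthy enough to upgrade.
    Hypothetical helper for illustration; max_lag is an assumed threshold.
    """
    members = status.get("members", [])
    problems = []

    # A healthy replicaset has exactly one PRIMARY.
    primaries = [m for m in members if m.get("stateStr") == "PRIMARY"]
    if len(primaries) != 1:
        problems.append(f"expected exactly 1 PRIMARY, found {len(primaries)}")
    primary_optime = primaries[0].get("optimeDate") if len(primaries) == 1 else None

    for m in members:
        name = m.get("name", "<unknown>")
        # health is 1 when the member is reachable, 0 otherwise.
        if m.get("health") != 1:
            problems.append(f"{name}: not reachable or not healthy")
            continue
        # This sketch treats any state other than PRIMARY/SECONDARY as a problem.
        if m.get("stateStr") not in ("PRIMARY", "SECONDARY"):
            problems.append(f"{name}: unexpected state {m.get('stateStr')!r}")
            continue
        optime = m.get("optimeDate")
        if primary_optime is None or optime is None:
            continue  # nothing to compare against
        # Flag members whose last applied operation is far from the PRIMARY's,
        # which is the "out of date by months" case described above.
        lag = abs(primary_optime - optime)
        if lag > max_lag:
            problems.append(f"{name}: optime differs from PRIMARY by {lag}")

    return problems


if __name__ == "__main__":
    now = datetime(2019, 12, 11, 12, 0, 0)
    stale = now - timedelta(days=90)
    status = {
        "members": [
            {"name": "host1", "health": 1, "stateStr": "PRIMARY", "optimeDate": now},
            {"name": "host2", "health": 1, "stateStr": "SECONDARY", "optimeDate": stale},
            {"name": "host3", "health": 1, "stateStr": "SECONDARY", "optimeDate": stale},
        ]
    }
    for problem in replicaset_problems(status):
        print(problem)
```

With a live deployment the dict would come from the database itself, for example via PyMongo's client.admin.command("replSetGetStatus"). An upgrade command could then refuse to start whenever the returned list is non-empty.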

This particular case was also affected by https://bugs.launchpad.net/juju/+bug/1820327 which could have caused the problems connecting to the other two mongo instances (we don't know at this point).

https://pastebin.canonical.com/p/Xv5TYFXT8w/ has some logs - unfortunately the machine agent logs have rotated out already and we weren't fast enough to collect them.

Tim Penhey (thumper) on 2019-12-11
Changed in juju:
status: New → Triaged
importance: Undecided → High
milestone: none → 2.7.1
tags: added: upgrade-controller
Tim Penhey (thumper) wrote :
Changed in juju:
status: Triaged → In Progress
assignee: nobody → Tim Penhey (thumper)
Tim Penhey (thumper) on 2019-12-17
Changed in juju:
status: In Progress → Fix Committed
Changed in juju:
status: Fix Committed → Fix Released