juju controllers couldn't elect raft leader
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
Canonical Juju | Won't Fix | High | Joseph Phillips | |
Bug Description
Version: juju 2.8.6
Story:
Prior to an OpenStack upgrade, the juju controllers did not agree on the leadership of a mysql application. Two of the three controllers believed one unit was the app leader, and the third controller believed another unit was the app leader.
Using the output of the faulty controller (we did not know of the fault at the time), we paused mysql on the "non-leader" mysql units and ran a prepare series-upgrade to bionic on the unit we took to be the leader. That unit tried to 'leader-set' something in one of its hooks and failed because it was not the leader according to the other two controllers.
The juju team asked us to check mongodb, where the replication status on all three units agreed on which unit was the app leader (the one that 2 of the 3 controllers marked with an asterisk in juju status). The juju team first believed leadership was flipping, but `juju status --debug` indicated that the faulty leadership information kept coming from the same controller.
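The comparison described above (three controllers, two agreeing on one leader and one dissenting) can be sketched as a simple majority tally. This is only an illustration and not a juju tool: the function name `split_leader_views`, the controller names, and the unit names are hypothetical, and the per-controller leader values are assumed to have been collected by hand from each controller's status output.

```python
from collections import Counter


def split_leader_views(views):
    """Return (majority_leader, dissenting_controllers) for one application.

    views maps a controller name to the unit that controller reports as
    leader. All names here are hypothetical placeholders.
    """
    counts = Counter(views.values())
    leader, votes = counts.most_common(1)[0]
    # Without a strict majority there is no view to trust.
    if votes <= len(views) // 2:
        return None, sorted(views)
    dissenters = sorted(c for c, unit in views.items() if unit != leader)
    return leader, dissenters


# Hypothetical data shaped like the situation in this report.
views = {
    "controller-0": "mysql/0",
    "controller-1": "mysql/0",
    "controller-2": "mysql/1",
}
print(split_leader_views(views))
```

A check like this before starting the series upgrade would have flagged the dissenting controller, whose output we had relied on.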
The juju team had us stop the two agreeing controllers, leave active the controller that reported the unit being upgraded as the leader, cancel the prepare series-upgrade, reset the db to remove the upgradeSeriesLock on that unit, and update the machine status so that the machines appeared not to have started the series upgrade.
Next, the juju team had us try to remove the unit that was stuck trying to finish the prepare series-upgrade and deploy a replacement unit. At this point juju would no longer elect a leader for this application.
Ultimately, the juju team had us stop all three controllers, load a specially prepared raft-log binary at /var/lib/
Changed in juju: | |
importance: | Undecided → High |
milestone: | none → 2.9-next |
status: | New → Triaged |
assignee: | nobody → Simon Richardson (simonrichardson) |
Changed in juju: | |
assignee: | Simon Richardson (simonrichardson) → Joseph Phillips (manadart) |
https://github.com/hashicorp/raft/pull/474