Make it possible to unpin leadership

Bug #1890072 reported by James Troup
Affects         | Status        | Importance | Assigned to | Milestone
Canonical Juju  | Fix Released  | Medium     | Ian Booth   |

Bug Description

We recently ended up in a situation where a mysql application had no leader in Juju, at least according to 'juju status'. As far as I can tell, this happened because:

 * 'series-upgrade-prepare' was re-run on one of the units in error (it was already upgraded)
 * mysql leadership was pinned to this unit by Juju
 * [months pass]
 * as part of some maintenance work, we removed this unit
 * At the time of removal, Juju was 2.7.6, which does not have the following commit:
  - https://github.com/juju/juju/commit/e30353430d672578c8e2371a8ecd3a086322ce75

Discovering the pinning was hard enough (that's LP #1890070), but once we had discovered it there was no obvious way to undo it and we ended up doing the following:

 * Shut down the machine agent on all controllers
 * Hack the pinning out of /var/lib/juju/raft/snapshots/*/state.bin on all controllers
 * Update the CRC and size in /var/lib/juju/raft/snapshots/*/meta.json on all controllers
 * rm /var/lib/juju/raft/logs on all controllers
 * Start up the machine agent on all controllers
 * Juju restored the raft cluster from the snapshot, without the pinning

This triggered a new election for mysql (and all other applications) which a live unit won.

Obviously, that's insane and I don't want anyone else to have to do it. TL;DR: please add a CLI way to undo leadership pins. Thanks.

Tags: leadership
Ian Booth (wallyworld) wrote :

With the commit referenced in the bug, there's an internal revoke leadership lease operation that is done on the raft FSM. It may be that what's needed here is to expose that via the CLI. This would force the election of a new leader.

tags: added: leadership
Changed in juju:
milestone: none → 2.8-next
status: New → Triaged
importance: Undecided → Medium
Ian Booth (wallyworld) wrote :

Juju 2.9 has a new juju_revoke_lease bash script (an introspection endpoint) which can be run after SSHing to a controller. This can be used to revoke leadership on a specified unit.

Show the current leases:
$ juju_leases
Revoke a lease:
$ juju_revoke_lease -m modeluuid -l appname
Show the leases again to see the lease allocated to a new unit:
$ juju_leases
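
The sequence above could be wrapped in a small shell function, sketched here under some assumptions: it is run on a controller where juju_leases and juju_revoke_lease are already on the PATH, and it uses only the -m and -l options shown in this comment. The function name revoke_leadership and the DRY_RUN variable are hypothetical, added purely for illustration; they are not part of Juju.

```shell
#!/usr/bin/env bash
# Sketch only: wraps the juju_leases / juju_revoke_lease steps from the comment.
# revoke_leadership and DRY_RUN are hypothetical names, not part of Juju.

revoke_leadership() {
    local model_uuid="$1" app_name="$2"

    # Both the model UUID and the application name are required.
    if [ -z "$model_uuid" ] || [ -z "$app_name" ]; then
        echo "usage: revoke_leadership MODEL_UUID APP_NAME" >&2
        return 2
    fi

    # With DRY_RUN=1, only print the revoke command; change nothing.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "juju_revoke_lease -m $model_uuid -l $app_name"
        return 0
    fi

    juju_leases                                        # leases before the revoke
    juju_revoke_lease -m "$model_uuid" -l "$app_name"  # revoke the application's lease
    juju_leases                                        # leases after; a new unit should hold it
}
```

Running it with DRY_RUN=1 first prints the revoke command without executing it, which seems prudent given how disruptive a revoke can be.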

We don't want to expose this as a juju CLI command, as it's potentially dangerous to use ad hoc.

Changed in juju:
milestone: 2.8-next → 2.9-rc6
status: Triaged → Fix Committed
assignee: nobody → Ian Booth (wallyworld)
Changed in juju:
status: Fix Committed → Fix Released