Guarantee leadership revoked from departing unit before departed hooks run

Bug #1532085 reported by Stuart Bishop
This bug affects 4 people
Affects: Canonical Juju
Status: Triaged
Importance: Low
Assigned to: Unassigned
Milestone: none

Bug Description

If a lead unit is destroyed, it will make detrimental decisions when its relation-departed hooks are run if it still thinks it is the leader.

We must guarantee a new leader is elected before running departure hooks on the departing unit.

A special case will be needed if the service is being destroyed or if the leader is the sole remaining unit.

This may be worked around if the departing unit can detect that it is being destroyed (not currently possible), but charm authors would need to remember to guard for this special case.
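A minimal sketch of the guard a charm author would have to remember to write, assuming the unit could somehow learn that it is being destroyed. The `unit_is_dying` flag is hypothetical (the report notes this is not currently possible), and `on_peer_departed` and its return values are illustrative names only. The `is-leader --format=json` call is the standard hook tool; `run` is injectable so the logic can be exercised outside a hook context.

```python
import json
import subprocess


def is_leader(run=subprocess.check_output):
    """Ask Juju whether this unit currently holds leadership."""
    # is-leader prints "true" or "false" when asked for JSON output.
    return json.loads(run(["is-leader", "--format=json"])) is True


def on_peer_departed(departed_unit, unit_is_dying, leader_check=is_leader):
    """Decide what a relation-departed hook should do for departed_unit.

    unit_is_dying is a hypothetical input: a unit cannot currently
    detect its own destruction, which is the gap this bug describes.
    """
    if unit_is_dying:
        # A dying unit must not take leader-only decisions.
        return "ignore"
    if not leader_check():
        return "ignore"
    return "failover-from:" + departed_unit
```

Without the first check, a departing leader falls through to the leader-only branch for every peer it sees leaving.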

As an example, if the lead unit in a PostgreSQL service is destroyed it will trigger a failover every time it sees the master unit departing the peer relation (appointing one of the remaining peers as master, which will shortly also depart from the leader's perspective). The surviving units cannot tell that these failovers are bogus, because they are unaware who the leader is and that it is being destroyed.
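The cascade described above can be illustrated with a small simulation. This is a toy model, not the PostgreSQL charm's code: `simulate_departure` and the rule of promoting the first remaining peer are assumptions made for illustration.

```python
def simulate_departure(peers, master, still_leader):
    """Return the masters appointed while the departing unit watches peers leave.

    peers is the order in which the departing unit sees relation-departed
    fire; still_leader says whether it still believes it is the leader.
    """
    failovers = []
    remaining = list(peers)
    while remaining:
        departed = remaining.pop(0)  # relation-departed fires for this peer
        if still_leader and departed == master and remaining:
            # Bogus failover: the appointed peer will "depart" next.
            master = remaining[0]
            failovers.append(master)
    return failovers
```

With peers `["pg/1", "pg/2", "pg/3"]` and `pg/1` as master, a unit that still believes it is the leader appoints `pg/2` and then `pg/3`; a unit that has already lost leadership appoints nobody.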

tags: added: leadership
Changed in juju-core:
status: New → Triaged
importance: Undecided → Critical
importance: Critical → High
Changed in juju:
status: New → Triaged
importance: Undecided → High
milestone: none → 2.0.0
no longer affects: juju-core
Changed in juju:
milestone: 2.0.0 → 2.0.1
Curtis Hovey (sinzui)
Changed in juju:
milestone: 2.0.1 → none
Anastasia (anastasia-macmood) wrote :

Implementing this functionality will also minimise the perceived leaderless interval, as per Bug #1668238.

Stuart Bishop (stub) wrote :

Fixing Bug #1777841 may make this easier. If a unit reliably knows it is being destroyed, is-leader can start returning false immediately and the jujud can stop renewing the lease. Per Bug #1668238, it is acceptable for there to be a period without a leader.

Tim Penhey (thumper) wrote :

I'm not sure what actually happens if we don't have a leader for a particular application. In theory, I think everything should be fine. The first unit that wants to be the leader can be.

I think the subtlety here is that if there is no leader and a unit asks if it is the leader, it will be told yes. So if we remove leadership from a unit, and then that same unit asks if it is the leader, it may well reclaim the leadership.
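A toy model of that subtlety, assuming the claim-on-ask behaviour described above. `ToyLeadership` and its methods are hypothetical names and do not reflect Juju's actual lease implementation.

```python
class ToyLeadership:
    """Single-application leadership where asking can also claim."""

    def __init__(self):
        self.holder = None
        self.dying = set()

    def is_leader(self, unit):
        """Answer the question, claiming leadership if nobody holds it."""
        if unit in self.dying:
            return False  # a dying unit is never told yes
        if self.holder is None:
            self.holder = unit
        return self.holder == unit

    def revoke(self, unit):
        """Remove leadership without recording why."""
        if self.holder == unit:
            self.holder = None

    def mark_dying(self, unit):
        """Record that the unit is being destroyed, then revoke."""
        self.dying.add(unit)
        self.revoke(unit)
```

In this model, calling `revoke("app/0")` and then `is_leader("app/0")` hands leadership straight back to `app/0`. Only after `mark_dying("app/0")` does the question return false, leaving the vacancy for another unit.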

I agree that the is-leader hook tool should start returning false if the unit is being destroyed.

Tim Penhey (thumper) wrote :

In fact, perhaps we should have an explicit revocation of leadership when the unit is being destroyed. That way another unit could claim it earlier. Combined with is-leader returning false when the unit is being destroyed, this should reduce the leaderless window.

John A Meinel (jameinel) wrote :

Our general guarantee is that we will never have >1 leader, not that we will always have a leader. Our other common constraint is that when you get a lease, we guarantee not to give the lease to someone else until your time has expired. We *could* do revocation, but that potentially leads to >1 leader for at least some time.

I think Stub's request that we just demote the leader without necessarily allowing another leader is fine.
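A rough sketch of "demote without handing over", assuming a single lease with a fixed duration. `Lease` and its methods are hypothetical and greatly simplified; times are plain numbers supplied by the caller so the behaviour is deterministic.

```python
class Lease:
    """Toy single-holder lease that is never reassigned before expiry."""

    def __init__(self, duration):
        self.duration = duration
        self.holder = None
        self.expires = 0.0
        self.demoted = False

    def claim(self, unit, now):
        """Grant or renew the lease; refuse while another lease is live."""
        live = self.holder is not None and now < self.expires
        if live and (self.holder != unit or self.demoted):
            # Someone else's time has not expired, or the holder was
            # demoted and may no longer renew.
            return False
        self.holder, self.expires, self.demoted = unit, now + self.duration, False
        return True

    def demote(self, unit):
        """Stop the holder acting as leader without freeing the lease."""
        if self.holder == unit:
            self.demoted = True

    def is_leader(self, unit, now):
        return self.holder == unit and now < self.expires and not self.demoted
```

After `demote`, the old holder is told it is not the leader, yet no other unit can claim until the original expiry. That keeps the "never >1 leader" guarantee at the cost of a leaderless window.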

On Tue, Jul 10, 2018 at 3:08 AM, Tim Penhey <email address hidden>
wrote:

> In fact, perhaps we have an explicit revocation of leadership when the
> unit is being destroyed. That way another unit could claim it earlier.
> Used in addition to the is-leader returning false when the unit is being
> destroyed should reduce the leaderless window.

Trent Lloyd (lathiat) wrote :

In practice, a lot of charms seem to generally make the assumption that there is in fact a leader at all times (particularly clustered charms, e.g. hacluster, percona, ceph-mon).

Such charms often gate various hook changes on is_leader where the effects are cluster-wide, so as not to act on them multiple times. One example is making changes to the pacemaker CIB resources in the hacluster charm: Pacemaker itself synchronizes this across the cluster, so it only needs to be done on one node.

In many cases they won't "re-do" those changes if re-elected leader (although some charms do in practice "re-do" such things by iterating over all relations in some hooks, mostly as a workaround for having no other way to detect whether something the relations care about, e.g. a configuration change, will affect them).

Trent Lloyd (lathiat) wrote :

Bugs #1469731 and #1532085 appear to be effectively duplicates.

Canonical Juju QA Bot (juju-qa-bot) wrote :

This bug has not been updated in 2 years, so we're marking it Low importance. If you believe this is incorrect, please update the importance.

Changed in juju:
importance: High → Low
tags: added: expirebugs-bot