ERROR cannot write leadership settings: cannot write settings: not the leader

Bug #1760138 reported by Chris Gregan on 2018-03-30
This bug affects 1 person
Affects: OpenStack ceph-mon charm (Importance: Undecided, Assigned to: Unassigned)
Affects: juju (Importance: Medium, Assigned to: Unassigned)

Bug Description

2018-03-30 01:56:50 DEBUG worker.uniter.jujuc server.go:181 running hook tool "leader-set"
2018-03-30 01:57:01 DEBUG juju.worker.uniter.remotestate watcher.go:467 got a relation units change: {44 {map[ceph-radosgw/0:{1}] []}}
2018-03-30 01:57:26 DEBUG juju.worker.uniter.remotestate watcher.go:467 got a relation units change: {44 {map[ceph-radosgw/1:{0}] []}}
2018-03-30 01:58:27 INFO juju.worker.leadership tracker.go:223 ceph-mon leadership for ceph-mon/1 denied
2018-03-30 01:58:27 DEBUG juju.worker.leadership tracker.go:237 notifying ceph-mon/1 ticket of impending loss of ceph-mon leadership
2018-03-30 01:58:27 DEBUG juju.worker.leadership tracker.go:276 ceph-mon/1 is not ceph-mon leader
2018-03-30 01:58:27 DEBUG juju.worker.leadership tracker.go:230 ceph-mon/1 waiting for ceph-mon leadership release
2018-03-30 01:58:27 DEBUG juju.worker.uniter.remotestate watcher.go:445 got leadership change for ceph-mon/1: minion
2018-03-30 01:58:27 DEBUG config-changed ERROR cannot write leadership settings: cannot write settings: not the leader
2018-03-30 01:58:27 DEBUG config-changed Traceback (most recent call last):
2018-03-30 01:58:27 DEBUG config-changed File "/var/lib/juju/agents/unit-ceph-mon-1/charm/hooks/config-changed", line 725, in <module>
2018-03-30 01:58:27 DEBUG config-changed hooks.execute(sys.argv)
2018-03-30 01:58:27 DEBUG config-changed File "/var/lib/juju/agents/unit-ceph-mon-1/charm/hooks/charmhelpers/core/hookenv.py", line 800, in execute
2018-03-30 01:58:27 DEBUG config-changed self._hooks[hook_name]()
2018-03-30 01:58:27 DEBUG config-changed File "/var/lib/juju/agents/unit-ceph-mon-1/charm/hooks/charmhelpers/contrib/hardening/harden.py", line 79, in _harden_inner2
2018-03-30 01:58:27 DEBUG config-changed return f(*args, **kwargs)
2018-03-30 01:58:27 DEBUG config-changed File "/var/lib/juju/agents/unit-ceph-mon-1/charm/hooks/config-changed", line 214, in config_changed
2018-03-30 01:58:27 DEBUG config-changed leader_set(opts)
2018-03-30 01:58:27 DEBUG config-changed File "/var/lib/juju/agents/unit-ceph-mon-1/charm/hooks/charmhelpers/core/hookenv.py", line 929, in inner_translate_exc2
2018-03-30 01:58:27 DEBUG config-changed return f(*args, **kwargs)
2018-03-30 01:58:27 DEBUG config-changed File "/var/lib/juju/agents/unit-ceph-mon-1/charm/hooks/charmhelpers/core/hookenv.py", line 982, in leader_set
2018-03-30 01:58:27 DEBUG config-changed subprocess.check_call(cmd)
2018-03-30 01:58:27 DEBUG config-changed File "/usr/lib/python3.5/subprocess.py", line 581, in check_call
2018-03-30 01:58:27 DEBUG config-changed raise CalledProcessError(retcode, cmd)
2018-03-30 01:58:27 DEBUG config-changed subprocess.CalledProcessError: Command '['leader-set', 'fsid=9a557a74-33bd-11e8-aedb-00163ef9463f', 'monitor-secret=AQBemb1ajb3qLRAAlPxv893HqNK1oos2V1jUzg==']' returned non-zero exit status 1
2018-03-30 01:58:27 ERROR juju.worker.uniter.operation runhook.go:114 hook "config-changed" failed: exit status 1

Frode Nordahl (fnordahl) wrote :

This looks like a bug or race in juju itself.

2018-03-30 01:49:16 DEBUG juju.worker.dependency engine.go:486 "leadership-tracker" manifold worker started
2018-03-30 01:49:16 DEBUG juju.worker.leadership tracker.go:131 ceph-mon/1 making initial claim for ceph-mon leadership
2018-03-30 01:49:59 INFO juju.worker.leadership tracker.go:200 ceph-mon/1 promoted to leadership of ceph-mon
2018-03-30 01:54:26 DEBUG juju.worker.uniter.operation executor.go:60 running operation accept leadership
2018-03-30 01:54:26 DEBUG juju.worker.uniter.operation executor.go:91 preparing operation "accept leadership"
2018-03-30 01:54:26 DEBUG juju.worker.uniter.operation executor.go:91 committing operation "accept leadership"
2018-03-30 01:58:27 INFO juju.worker.leadership tracker.go:223 ceph-mon leadership for ceph-mon/1 denied
2018-03-30 01:58:27 DEBUG juju.worker.leadership tracker.go:237 notifying ceph-mon/1 ticket of impending loss of ceph-mon leadership
2018-03-30 01:58:27 DEBUG juju.worker.leadership tracker.go:276 ceph-mon/1 is not ceph-mon leader
2018-03-30 01:58:27 DEBUG juju.worker.leadership tracker.go:230 ceph-mon/1 waiting for ceph-mon leadership release
2018-03-30 01:58:27 DEBUG juju.worker.uniter.remotestate watcher.go:445 got leadership change for ceph-mon/1: minion

However, just before the Traceback the charm executes is-leader in the hookenv and gets a positive answer from Juju.

2018-03-30 01:58:27 DEBUG config-changed ERROR cannot write leadership settings: cannot write settings: not the leader
2018-03-30 01:58:27 DEBUG config-changed Traceback (most recent call last):
2018-03-30 01:58:27 DEBUG config-changed File "/var/lib/juju/agents/unit-ceph-mon-1/charm/hooks/config-changed", line 725, in <module>
2018-03-30 01:58:27 DEBUG config-changed hooks.execute(sys.argv)
2018-03-30 01:58:27 DEBUG config-changed File "/var/lib/juju/agents/unit-ceph-mon-1/charm/hooks/charmhelpers/core/hookenv.py", line 800, in execute
2018-03-30 01:58:27 DEBUG config-changed self._hooks[hook_name]()
2018-03-30 01:58:27 DEBUG config-changed File "/var/lib/juju/agents/unit-ceph-mon-1/charm/hooks/charmhelpers/contrib/hardening/harden.py", line 79, in _harden_inner2
2018-03-30 01:58:27 DEBUG config-changed return f(*args, **kwargs)
2018-03-30 01:58:27 DEBUG config-changed File "/var/lib/juju/agents/unit-ceph-mon-1/charm/hooks/config-changed", line 214, in config_changed
2018-03-30 01:58:27 DEBUG config-changed leader_set(opts)
2018-03-30 01:58:27 DEBUG config-changed File "/var/lib/juju/agents/unit-ceph-mon-1/charm/hooks/charmhelpers/core/hookenv.py", line 929, in inner_translate_exc2
2018-03-30 01:58:27 DEBUG config-changed return f(*args, **kwargs)
2018-03-30 01:58:27 DEBUG config-changed File "/var/lib/juju/agents/unit-ceph-mon-1/charm/hooks/charmhelpers/core/hookenv.py", line 982, in leader_set
2018-03-30 01:58:27 DEBUG config-changed subprocess.check_call(cmd)
2018-03-30 01:58:27 DEBUG config-changed File "/usr/lib/python3.5/subprocess.py", line 581, in check_call
2018-03-30 01:58:27 DEBUG config-changed raise CalledProcessError(retcode, cmd)
2018-03-30 01:58:27 DEBUG config-changed subprocess.CalledProcessError: Command '['le...


Changed in charm-ceph-mon:
status: New → Invalid
John A Meinel (jameinel) wrote :

This seems to be some sort of load issue, as this is what I'm seeing:
$ grep -rnI "is-leader\|leader-set" .
./unit-ceph-mon-1.log:1196:2018-03-30 01:54:36 DEBUG worker.uniter.jujuc server.go:181 running hook tool "is-leader"
./unit-ceph-mon-1.log:1206:2018-03-30 01:56:50 DEBUG worker.uniter.jujuc server.go:181 running hook tool "leader-set"
./unit-ceph-mon-1.log:1230:2018-03-30 01:58:27 DEBUG config-changed subprocess.CalledProcessError: Command '['leader-set', 'fsid=9a557a74-33bd-11e8-aedb-00163ef9463f', 'monitor-secret=AQBemb1ajb3qLRAAlPxv893HqNK1oos2V1jUzg==']' returned non-zero exit status 1

It isn't clear how long "is-leader" is taking to return, but we can see that the time between "is-leader" and "leader-set" being called is a full 2m14s, and that "leader-set" does not return nonzero for another 1m37s.
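Checking that arithmetic against the timestamps in the grep output above (a small illustrative calculation, not part of the charm):

```python
# Elapsed times between the hook-tool invocations seen in unit-ceph-mon-1.log.
from datetime import datetime

fmt = '%Y-%m-%d %H:%M:%S'
is_leader_t = datetime.strptime('2018-03-30 01:54:36', fmt)   # "is-leader" runs
leader_set_t = datetime.strptime('2018-03-30 01:56:50', fmt)  # "leader-set" runs
failure_t = datetime.strptime('2018-03-30 01:58:27', fmt)     # leader-set fails

print(leader_set_t - is_leader_t)  # gap between is-leader and leader-set
print(failure_t - leader_set_t)    # time until leader-set returns nonzero
```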

When we nominate a unit as leader, we give it a lease that the Juju agent refreshes every 30s and that lasts for one minute. If we cannot get a "yes, I'm still alive" refresh within that minute, we hand leadership over to someone else that is responding.
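A minimal sketch of how those two intervals interact (illustrative only, not Juju source; names and structure are made up, timings are the ones stated above):

```python
# Leadership holds only while the last successful refresh is within the lease.
REFRESH_INTERVAL = 30  # the leader's agent re-claims every 30 seconds
LEASE_DURATION = 60    # each claim is valid for one minute

def still_leader(last_refresh_time, now):
    """True while the most recent successful refresh is inside the lease."""
    return (now - last_refresh_time) <= LEASE_DURATION

# If the controller is too loaded to process refreshes, two missed 30-second
# refreshes cross the 60-second boundary and leadership passes to another unit.
assert still_leader(0, 59)       # within the lease: still leader
assert not still_leader(0, 61)   # lease expired: leadership revoked
```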

Now, is the problem that the jujud controller itself is so overloaded that it can't respond to requests in a timely manner?

AFAICT the code in question is:
    if is_leader():
        if not config('no-bootstrap'):
            if not leader_get('fsid') or not leader_get('monitor-secret'):
                if config('fsid'):
                    fsid = config('fsid')
                else:
                    fsid = "{}".format(uuid.uuid1())
                if config('monitor-secret'):
                    mon_secret = config('monitor-secret')
                else:
                    mon_secret = "{}".format(ceph.generate_monitor_secret())
                status_set('maintenance', 'Creating FSID and Monitor Secret')
                opts = {
                    'fsid': fsid,
                    'monitor-secret': mon_secret,
                }
                log("Settings for the cluster are: {}".format(opts))
                leader_set(opts)
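One defensive pattern for the window discussed here would be to re-check leadership immediately before leader_set() and treat a failure as "leadership lost mid-hook" rather than crashing the hook. This is a hypothetical sketch, not charm code: `is_leader`, `leader_set`, and `log` stand in for the charmhelpers.core.hookenv functions, and the wrapper itself is invented for illustration.

```python
import subprocess

def safe_leader_set(settings, is_leader, leader_set, log):
    """Attempt leader_set(settings); return False instead of raising if
    leadership was lost between the is_leader() check and the call."""
    if not is_leader():
        log('Lost leadership before leader-set; deferring to the new leader')
        return False
    try:
        leader_set(settings)
        return True
    except subprocess.CalledProcessError:
        # The lease can expire between the is_leader() check and the
        # leader-set invocation, exactly as seen in this bug's traceback.
        log('leader-set failed: leadership lost mid-hook')
        return False
```

The hook would then skip the rest of the leader-only work when this returns False, and pick it up again on a later leader-settings-changed or config-changed event.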

$ grep -rnI "is-leader\|leader-set\|leader-get\|config-get" .
./unit-ceph-mon-1.log:1196:2018-03-30 01:54:36 DEBUG worker.uniter.jujuc server.go:181 running hook tool "is-leader"
./unit-ceph-mon-1.log:1200:2018-03-30 01:56:46 DEBUG worker.uniter.jujuc server.go:181 running hook tool "leader-get"
./unit-ceph-mon-1.log:1201:2018-03-30 01:56:46 DEBUG worker.uniter.jujuc server.go:181 running hook tool "config-get"
./unit-ceph-mon-1.log:1202:2018-03-30 01:56:46 DEBUG worker.uniter.jujuc server.go:181 running hook tool "config-get"
./unit-ceph-mon-1.log:1206:2018-03-30 01:56:50 DEBUG worker.uniter.jujuc server.go:181 running hook tool "leader-set"
./unit-ceph-mon-1.log:1230:2018-03-30 01:58:27 DEBUG config-changed subprocess.CalledProcessError: Command '['leader-set', 'fsid=9a557a74-33bd-11e8-aedb-00163ef9463f', 'monitor-secret=AQBemb1ajb3qLRAAlPxv893HqNK1oos2V1jUzg==']' returned non-zero exit status 1

That seems to say leader-get and config-get are both very fast (sub-second), and it is no more than 10s between is-leader and the subsequent commands.
But maybe "is-leader" itself is swamped for some reason. Just
$ grep -rnI "running hook...


Chris Gregan (cgregan) wrote :
tags: added: cdo-qa foundation-engine
Alex Kavanagh (ajkavanagh) wrote :

Thanks John for doing the additional analysis; I was also heading in the direction of a gap between is_leader() and leader_set().

The code that you quoted looks like it ought to complete within 30 seconds (the only call out is very quick), and this code (the only other leader_set()) doesn't have any paths that should take longer than 30 seconds before it calls leader_set():

    if not is_leader():
        log('Deferring leader-setting updates to the leader unit')
        return

    curr_fsid = leader_get('fsid')
    curr_secret = leader_get('monitor-secret')
    for relid in relation_ids('bootstrap-source'):
        for unit in related_units(relid=relid):
            mon_secret = relation_get('monitor-secret', unit, relid)
            fsid = relation_get('fsid', unit, relid)

            if not (mon_secret and fsid):
                log('Relation data is not ready as the fsid or the '
                    'monitor-secret are missing from the relation: '
                    'mon_secret = {} and fsid = {} '.format(mon_secret, fsid))
                continue

            if not (curr_fsid or curr_secret):
                curr_fsid = fsid
                curr_secret = mon_secret
            else:
                # The fsids and secrets need to match or the local monitors
                # will fail to join the mon cluster. If they don't,
                # bail because something needs to be investigated.
                assert curr_fsid == fsid, \
                    "bootstrap fsid '{}' != current fsid '{}'".format(
                        fsid, curr_fsid)
                assert curr_secret == mon_secret, \
                    "bootstrap secret '{}' != current secret '{}'".format(
                        mon_secret, curr_secret)

            opts = {
                'fsid': fsid,
                'monitor-secret': mon_secret,
            }

            log('Updating leader settings for fsid and monitor-secret '
                'from remote relation data: {}'.format(opts))
            leader_set(opts)

This code is ONLY used when upgrading from the 'ceph'-only charm, which isn't the case here, so we can discount it. It does look like this code *could* call leader_set() for every unit in the 'bootstrap-source' relation, which may be a problem.

So the only is_leader() ... leader_set() code path is the one that was quoted, and the only call out that isn't Juju-related is ceph.generate_monitor_secret(), which is very quick.

So, I'm wondering if the charm does manage to call leader_set() in time, but the controller doesn't 'see' the leader_set until several minutes later?

Alex Kavanagh (ajkavanagh) wrote :

Sorry, I meant to comment on https://bugs.launchpad.net/charm-ceph-mon/+bug/1768241 -- but it also seems relevant here!

John A Meinel (jameinel) on 2018-05-03
tags: added: leadership
Changed in juju:
importance: Undecided → Medium
status: New → Triaged
John A Meinel (jameinel) wrote :

So we've noted several performance issues around leadership that have been fixed in or around 2.5.1 (moving to raft for leadership, and fixing performance issues in some of the lease update operations as the number of leases gets large).
Given those fixes, it is likely that *this* bug was the "is-leader => leader-set" window taking longer than the 30s leadership refresh, causing the unit to lose leadership in that time, and thus we believe these issues to be fixed.
Not directly in 2.5.8, but as a close approximation.

Changed in juju:
milestone: none → 2.5.8
status: Triaged → Fix Released