ERROR cannot write leadership settings: cannot write settings: not the leader

Bug #1760138 reported by Chris Gregan
This bug affects 1 person
Affects             Status         Importance   Assigned to   Milestone
Canonical Juju      Fix Released   Medium       Unassigned    2.5.8
Ceph Monitor Charm  Invalid        Undecided    Unassigned

Bug Description

2018-03-30 01:56:50 DEBUG worker.uniter.jujuc server.go:181 running hook tool "leader-set"
2018-03-30 01:57:01 DEBUG juju.worker.uniter.remotestate watcher.go:467 got a relation units change: {44 {map[ceph-radosgw/0:{1}] []}}
2018-03-30 01:57:26 DEBUG juju.worker.uniter.remotestate watcher.go:467 got a relation units change: {44 {map[ceph-radosgw/1:{0}] []}}
2018-03-30 01:58:27 INFO juju.worker.leadership tracker.go:223 ceph-mon leadership for ceph-mon/1 denied
2018-03-30 01:58:27 DEBUG juju.worker.leadership tracker.go:237 notifying ceph-mon/1 ticket of impending loss of ceph-mon leadership
2018-03-30 01:58:27 DEBUG juju.worker.leadership tracker.go:276 ceph-mon/1 is not ceph-mon leader
2018-03-30 01:58:27 DEBUG juju.worker.leadership tracker.go:230 ceph-mon/1 waiting for ceph-mon leadership release
2018-03-30 01:58:27 DEBUG juju.worker.uniter.remotestate watcher.go:445 got leadership change for ceph-mon/1: minion
2018-03-30 01:58:27 DEBUG config-changed ERROR cannot write leadership settings: cannot write settings: not the leader
2018-03-30 01:58:27 DEBUG config-changed Traceback (most recent call last):
2018-03-30 01:58:27 DEBUG config-changed File "/var/lib/juju/agents/unit-ceph-mon-1/charm/hooks/config-changed", line 725, in <module>
2018-03-30 01:58:27 DEBUG config-changed hooks.execute(sys.argv)
2018-03-30 01:58:27 DEBUG config-changed File "/var/lib/juju/agents/unit-ceph-mon-1/charm/hooks/charmhelpers/core/hookenv.py", line 800, in execute
2018-03-30 01:58:27 DEBUG config-changed self._hooks[hook_name]()
2018-03-30 01:58:27 DEBUG config-changed File "/var/lib/juju/agents/unit-ceph-mon-1/charm/hooks/charmhelpers/contrib/hardening/harden.py", line 79, in _harden_inner2
2018-03-30 01:58:27 DEBUG config-changed return f(*args, **kwargs)
2018-03-30 01:58:27 DEBUG config-changed File "/var/lib/juju/agents/unit-ceph-mon-1/charm/hooks/config-changed", line 214, in config_changed
2018-03-30 01:58:27 DEBUG config-changed leader_set(opts)
2018-03-30 01:58:27 DEBUG config-changed File "/var/lib/juju/agents/unit-ceph-mon-1/charm/hooks/charmhelpers/core/hookenv.py", line 929, in inner_translate_exc2
2018-03-30 01:58:27 DEBUG config-changed return f(*args, **kwargs)
2018-03-30 01:58:27 DEBUG config-changed File "/var/lib/juju/agents/unit-ceph-mon-1/charm/hooks/charmhelpers/core/hookenv.py", line 982, in leader_set
2018-03-30 01:58:27 DEBUG config-changed subprocess.check_call(cmd)
2018-03-30 01:58:27 DEBUG config-changed File "/usr/lib/python3.5/subprocess.py", line 581, in check_call
2018-03-30 01:58:27 DEBUG config-changed raise CalledProcessError(retcode, cmd)
2018-03-30 01:58:27 DEBUG config-changed subprocess.CalledProcessError: Command '['leader-set', 'fsid=9a557a74-33bd-11e8-aedb-00163ef9463f', 'monitor-secret=AQBemb1ajb3qLRAAlPxv893HqNK1oos2V1jUzg==']' returned non-zero exit status 1
2018-03-30 01:58:27 ERROR juju.worker.uniter.operation runhook.go:114 hook "config-changed" failed: exit status 1

Chris Gregan (cgregan) wrote :

Frode Nordahl (fnordahl) wrote :

This looks like a bug or race in juju itself.

2018-03-30 01:49:16 DEBUG juju.worker.dependency engine.go:486 "leadership-tracker" manifold worker started
2018-03-30 01:49:16 DEBUG juju.worker.leadership tracker.go:131 ceph-mon/1 making initial claim for ceph-mon leadership
2018-03-30 01:49:59 INFO juju.worker.leadership tracker.go:200 ceph-mon/1 promoted to leadership of ceph-mon
2018-03-30 01:54:26 DEBUG juju.worker.uniter.operation executor.go:60 running operation accept leadership
2018-03-30 01:54:26 DEBUG juju.worker.uniter.operation executor.go:91 preparing operation "accept leadership"
2018-03-30 01:54:26 DEBUG juju.worker.uniter.operation executor.go:91 committing operation "accept leadership"
2018-03-30 01:58:27 INFO juju.worker.leadership tracker.go:223 ceph-mon leadership for ceph-mon/1 denied
2018-03-30 01:58:27 DEBUG juju.worker.leadership tracker.go:237 notifying ceph-mon/1 ticket of impending loss of ceph-mon leadership
2018-03-30 01:58:27 DEBUG juju.worker.leadership tracker.go:276 ceph-mon/1 is not ceph-mon leader
2018-03-30 01:58:27 DEBUG juju.worker.leadership tracker.go:230 ceph-mon/1 waiting for ceph-mon leadership release
2018-03-30 01:58:27 DEBUG juju.worker.uniter.remotestate watcher.go:445 got leadership change for ceph-mon/1: minion

However, just before the traceback the charm executes is-leader via hookenv and gets a positive answer from Juju.

2018-03-30 01:58:27 DEBUG config-changed ERROR cannot write leadership settings: cannot write settings: not the leader
2018-03-30 01:58:27 DEBUG config-changed Traceback (most recent call last):
2018-03-30 01:58:27 DEBUG config-changed File "/var/lib/juju/agents/unit-ceph-mon-1/charm/hooks/config-changed", line 725, in <module>
2018-03-30 01:58:27 DEBUG config-changed hooks.execute(sys.argv)
2018-03-30 01:58:27 DEBUG config-changed File "/var/lib/juju/agents/unit-ceph-mon-1/charm/hooks/charmhelpers/core/hookenv.py", line 800, in execute
2018-03-30 01:58:27 DEBUG config-changed self._hooks[hook_name]()
2018-03-30 01:58:27 DEBUG config-changed File "/var/lib/juju/agents/unit-ceph-mon-1/charm/hooks/charmhelpers/contrib/hardening/harden.py", line 79, in _harden_inner2
2018-03-30 01:58:27 DEBUG config-changed return f(*args, **kwargs)
2018-03-30 01:58:27 DEBUG config-changed File "/var/lib/juju/agents/unit-ceph-mon-1/charm/hooks/config-changed", line 214, in config_changed
2018-03-30 01:58:27 DEBUG config-changed leader_set(opts)
2018-03-30 01:58:27 DEBUG config-changed File "/var/lib/juju/agents/unit-ceph-mon-1/charm/hooks/charmhelpers/core/hookenv.py", line 929, in inner_translate_exc2
2018-03-30 01:58:27 DEBUG config-changed return f(*args, **kwargs)
2018-03-30 01:58:27 DEBUG config-changed File "/var/lib/juju/agents/unit-ceph-mon-1/charm/hooks/charmhelpers/core/hookenv.py", line 982, in leader_set
2018-03-30 01:58:27 DEBUG config-changed subprocess.check_call(cmd)
2018-03-30 01:58:27 DEBUG config-changed File "/usr/lib/python3.5/subprocess.py", line 581, in check_call
2018-03-30 01:58:27 DEBUG config-changed raise CalledProcessError(retcode, cmd)
2018-03-30 01:58:27 DEBUG config-changed subprocess.CalledProcessError: Command '['le...


Changed in charm-ceph-mon:
status: New → Invalid
John A Meinel (jameinel) wrote :

This seems to be some sort of load issue, as this is what I'm seeing:
$ grep -rnI "is-leader\|leader-set" .
./unit-ceph-mon-1.log:1196:2018-03-30 01:54:36 DEBUG worker.uniter.jujuc server.go:181 running hook tool "is-leader"
./unit-ceph-mon-1.log:1206:2018-03-30 01:56:50 DEBUG worker.uniter.jujuc server.go:181 running hook tool "leader-set"
./unit-ceph-mon-1.log:1230:2018-03-30 01:58:27 DEBUG config-changed subprocess.CalledProcessError: Command '['leader-set', 'fsid=9a557a74-33bd-11e8-aedb-00163ef9463f', 'monitor-secret=AQBemb1ajb3qLRAAlPxv893HqNK1oos2V1jUzg==']' returned non-zero exit status 1

It isn't clear how long "is-leader" took to return, but the gap between "is-leader" and "leader-set" being called is a full 2m14s, and "leader-set" doesn't return nonzero for another 1m37s after that.

When we nominate a node as leader, we give it a lease that lasts for 1 minute and that the Juju agent refreshes every 30s. If we cannot get a "yes, I'm still alive" refresh within that minute, we hand leadership over to someone else that is responding.
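
For illustration, the lease arithmetic just described, as a minimal sketch (not Juju's actual implementation; the 30s/1-minute constants are the ones stated above):

    # Sketch only: the lease timing described above, not Juju's code.
    LEASE_DURATION = 60    # seconds a leadership lease stays valid
    REFRESH_INTERVAL = 30  # seconds between the agent's renewal attempts

    def lease_valid(seconds_since_last_refresh):
        # Leadership holds only while the last successful refresh is
        # less than LEASE_DURATION old.
        return seconds_since_last_refresh < LEASE_DURATION

    # Two consecutive missed renewals are enough to exhaust the lease,
    assert LEASE_DURATION // REFRESH_INTERVAL == 2
    # and in this bug 2m14s passed between is-leader and leader-set:
    assert not lease_valid(134)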

Now, is the problem that the Jujud controller itself is so overloaded that it can't respond to requests in a timely manner?

AFAICT the code in question is:
    if is_leader():
        if not config('no-bootstrap'):
            if not leader_get('fsid') or not leader_get('monitor-secret'):
                if config('fsid'):
                    fsid = config('fsid')
                else:
                    fsid = "{}".format(uuid.uuid1())
                if config('monitor-secret'):
                    mon_secret = config('monitor-secret')
                else:
                    mon_secret = "{}".format(ceph.generate_monitor_secret())
                status_set('maintenance', 'Creating FSID and Monitor Secret')
                opts = {
                    'fsid': fsid,
                    'monitor-secret': mon_secret,
                }
                log("Settings for the cluster are: {}".format(opts))
                leader_set(opts)

$ grep -rnI "is-leader\|leader-set\|leader-get\|config-get" .
./unit-ceph-mon-1.log:1196:2018-03-30 01:54:36 DEBUG worker.uniter.jujuc server.go:181 running hook tool "is-leader"
./unit-ceph-mon-1.log:1200:2018-03-30 01:56:46 DEBUG worker.uniter.jujuc server.go:181 running hook tool "leader-get"
./unit-ceph-mon-1.log:1201:2018-03-30 01:56:46 DEBUG worker.uniter.jujuc server.go:181 running hook tool "config-get"
./unit-ceph-mon-1.log:1202:2018-03-30 01:56:46 DEBUG worker.uniter.jujuc server.go:181 running hook tool "config-get"
./unit-ceph-mon-1.log:1206:2018-03-30 01:56:50 DEBUG worker.uniter.jujuc server.go:181 running hook tool "leader-set"
./unit-ceph-mon-1.log:1230:2018-03-30 01:58:27 DEBUG config-changed subprocess.CalledProcessError: Command '['leader-set', 'fsid=9a557a74-33bd-11e8-aedb-00163ef9463f', 'monitor-secret=AQBemb1ajb3qLRAAlPxv893HqNK1oos2V1jUzg==']' returned non-zero exit status 1

That seems to say leader-get and config-get are both very fast (sub-second), and it is no more than 10s between is-leader returning and the subsequent commands.
But maybe "is-leader" itself is swamped for some reason. Just
$ grep -rnI "running hook...
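
As a quick sketch, that timestamp-gap analysis can be automated; this assumes the standard unit-log line format shown in the grep output above:

    # Sketch: print the gaps between hook-tool invocations in a unit log.
    # Assumes lines like:
    #   2018-03-30 01:54:36 DEBUG worker.uniter.jujuc ... hook tool "is-leader"
    import re
    from datetime import datetime

    TOOL = re.compile(r'(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) .*'
                      r'running hook tool "([\w-]+)"')

    def tool_call_gaps(path):
        prev = None
        for line in open(path):
            m = TOOL.search(line)
            if not m:
                continue
            ts = datetime.strptime(m.group(1), '%Y-%m-%d %H:%M:%S')
            if prev:
                print('{} -> {}: {:.0f}s'.format(
                    prev[1], m.group(2), (ts - prev[0]).total_seconds()))
            prev = (ts, m.group(2))

    tool_call_gaps('unit-ceph-mon-1.log')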


Chris Gregan (cgregan) wrote :
tags: added: cdo-qa foundation-engine

Alex Kavanagh (ajkavanagh) wrote :

Thanks John for doing the additional analysis; I was also heading in the direction of a gap between is_leader() and leader_set().

The code that you quoted looks like it ought to complete within 30 seconds (the only call-out is very quick), and the following code (the only other leader_set()) has no path that should take longer than 30 seconds to complete before calling leader_set():

    if not is_leader():
        log('Deferring leader-setting updates to the leader unit')
        return

    curr_fsid = leader_get('fsid')
    curr_secret = leader_get('monitor-secret')
    for relid in relation_ids('bootstrap-source'):
        for unit in related_units(relid=relid):
            mon_secret = relation_get('monitor-secret', unit, relid)
            fsid = relation_get('fsid', unit, relid)

            if not (mon_secret and fsid):
                log('Relation data is not ready as the fsid or the '
                    'monitor-secret are missing from the relation: '
                    'mon_secret = {} and fsid = {} '.format(mon_secret, fsid))
                continue

            if not (curr_fsid or curr_secret):
                curr_fsid = fsid
                curr_secret = mon_secret
            else:
                # The fsids and secrets need to match or the local monitors
                # will fail to join the mon cluster. If they don't,
                # bail because something needs to be investigated.
                assert curr_fsid == fsid, \
                    "bootstrap fsid '{}' != current fsid '{}'".format(
                        fsid, curr_fsid)
                assert curr_secret == mon_secret, \
                    "bootstrap secret '{}' != current secret '{}'".format(
                        mon_secret, curr_secret)

            opts = {
                'fsid': fsid,
                'monitor-secret': mon_secret,
            }

            log('Updating leader settings for fsid and monitor-secret '
                'from remote relation data: {}'.format(opts))
            leader_set(opts)

This code is ONLY used when upgrading from the old 'ceph'-only charm, which isn't the case here, so we can discount it. It does look like this code *could* call leader_set() for every unit in the 'bootstrap-source' relation, which may be a problem.
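
(As a sketch only, not the charm's actual code: those per-unit writes could be coalesced into a single leader_set() after the loop, using the same charmhelpers functions quoted above.)

    # Sketch: accumulate the values in the loop and write once at the end,
    # instead of calling leader_set() once per unit on the relation.
    from charmhelpers.core.hookenv import (
        leader_set, relation_get, relation_ids, related_units)

    opts = {}
    for relid in relation_ids('bootstrap-source'):
        for unit in related_units(relid=relid):
            mon_secret = relation_get('monitor-secret', unit, relid)
            fsid = relation_get('fsid', unit, relid)
            if mon_secret and fsid:
                # The real code also asserts that every unit agrees on
                # fsid/monitor-secret before accepting the values.
                opts = {'fsid': fsid, 'monitor-secret': mon_secret}
    if opts:
        leader_set(opts)  # one write, however many units were seen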

So the only is_leader() ... leader_set() code path is the one that was quoted, and the only call-out that isn't Juju-related, ceph.generate_monitor_secret(), is very quick.

So, I'm wondering if the charm does manage to call leader_set() in time, but the controller doesn't 'see' the leader_set until several minutes later?
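
If the controller really can be that slow, the charm side can at least fail more gracefully. A minimal sketch of a guard (a hypothetical helper, not part of charmhelpers) that re-checks is_leader() right before writing and treats a lapsed lease as a soft failure:

    # Hypothetical helper, not in charmhelpers: re-check leadership
    # immediately before writing, and treat a lost lease as a soft
    # failure so whichever unit is leader now can do the write instead.
    import subprocess
    from charmhelpers.core.hookenv import is_leader, leader_set, log

    def leader_set_if_still_leader(settings):
        if not is_leader():
            log('Leadership lost before leader_set; deferring to new leader')
            return False
        try:
            leader_set(settings)
            return True
        except subprocess.CalledProcessError:
            # The lease can still lapse between is_leader() and
            # leader_set(); that window is exactly the race in this bug.
            log('leader_set failed; leadership was lost mid-hook')
            return False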

Alex Kavanagh (ajkavanagh) wrote :

Sorry, I meant to comment on https://bugs.launchpad.net/charm-ceph-mon/+bug/1768241 -- but it also seems relevant here!

John A Meinel (jameinel)
tags: added: leadership
Changed in juju:
importance: Undecided → Medium
status: New → Triaged
John A Meinel (jameinel) wrote :

So we've noted several performance issues around leadership that were fixed in or around 2.5.1 (moving leadership to raft, and fixing performance problems in some of the lease-update operations as the number of leases grows large).
Given those fixes, it is likely that *this* bug came down to the "is-leader => leader-set" window taking longer than 30s to refresh leadership, causing the unit to lose it in that time, and we therefore believe those issues to be fixed.
Not directly in 2.5.8, but as a close approximation.

Changed in juju:
milestone: none → 2.5.8
status: Triaged → Fix Released