Unit falls out of quorum into leader-settings-changed loop

Bug #1768241 reported by Michael Skalka on 2018-05-01
This bug affects 3 people
Affects: OpenStack ceph-mon charm
Status: Triaged
Importance: High
Assigned to: Alex Kavanagh

Bug Description

Ceph-mon charm rev 24

Had a Ceph-Mon cluster in quorum and healthy on Friday. Sometime the following Sunday the unit fell out of quorum, following what looks like a leader election. The charm fired off a 'config-changed' hook, then started to loop in 'leader-settings-changed' for a number of days.

`ceph status` on the stuck mon reports healthy, all three mons in quorum, all OSDs and PGs healthy.

Michael Skalka (mskalka) wrote :

Some status logs while the crashdump churns:

ceph-mon/0 (the troublesome unit) https://pastebin.canonical.com/p/3rQf8CD9sv/

ceph-mon/2 (the new leader) https://pastebin.canonical.com/p/PcMYCpJWnB/

Michael Skalka (mskalka) wrote :

In attempting to debug what was happening, I fired off the `leader-settings-changed` hook in the debug-hooks environment. After approximately four hours the same hook was still running. It looks like it's looping through every OSD in the cluster, taking ~15 minutes per OSD to do... something:

...
Reading state information... Done
The following package was automatically installed and is no longer required:
  libfreetype6
Use 'sudo apt autoremove' to remove it.
0 upgraded, 0 newly installed, 0 to remove and 22 not upgraded.
exported keyring for client.radosgw.gateway
obtained 'cephx.groups.objects'
value stored
obtained 'cephx.groups.objects'
value stored
obtained 'cephx.groups.objects'
value stored
obtained 'cephx.groups.objects'
value stored
obtained 'cephx.groups.objects'
value stored
obtained 'cephx.groups.objects'
value stored
obtained 'cephx.groups.objects'
value stored
obtained 'cephx.groups.objects'
value stored
obtained 'cephx.groups.objects'
value stored
obtained 'cephx.groups.objects'
value stored
obtained 'cephx.groups.objects'
value stored
exported keyring for client.nova-compute-kvm
...
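
The shape of the log above (one keyring export per client, then repeated "obtained ... / value stored" pairs) suggests the hook is re-walking every related client and access group on each invocation. Below is a minimal, hypothetical sketch of that shape; export_keyring(), sync_group() and leader_settings_changed() are invented names, not the charm's actual functions. The point is only that one slow mon call per item multiplies into hours when the handler starts from scratch every time:

# Hypothetical sketch of the looping shape seen above; NOT the ceph-mon
# charm's actual code. Helper names are invented for illustration.
import subprocess

def export_keyring(client):
    # One synchronous mon call per related client
    # ("exported keyring for client.radosgw.gateway", ...).
    return subprocess.check_output(['ceph', 'auth', 'get', client])

def sync_group(group_key):
    # The kind of read-modify-write the "obtained ... / value stored" pairs
    # suggest: fetch a stored group definition, then write it back.
    current = subprocess.check_output(
        ['ceph', 'config-key', 'get', group_key]).decode()
    subprocess.check_output(
        ['ceph', 'config-key', 'put', group_key, current])

def leader_settings_changed(clients, groups):
    # If every hook run re-walks all clients and all groups, and each mon
    # call is slow while the cluster is busy, runtime grows as
    # len(clients) * len(groups), so a single hook can run for hours.
    for client in clients:
        export_keyring(client)
        for group_key in groups:
            sync_group(group_key)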

Changed in charm-ceph-mon:
status: New → Triaged
importance: Undecided → High
assignee: nobody → Alex Kavanagh (ajkavanagh)
John A Meinel (jameinel) wrote :

I don't seem to see the unit-ceph-mon-0 log file, which is a bit surprising.

However, if I grep around for "resolver loop error", there seem to be two possible causes:
./5/var/log/juju/unit-filebeat-53.log:15079:2018-04-19 20:14:05 ERROR juju.worker.uniter agent.go:28 resolver loop error: could not acquire lock: cancelled acquiring mutex

./5/var/log/juju/unit-keystone-ldap-citrix-1.log:4180:2018-04-19 20:21:42 ERROR juju.worker.uniter agent.go:28 resolver loop error: preparing operation "run update-status hook": connection is shut down

The former message happens in the logs about 10x more often.
"cancelled acquiring mutex" is triggered in github.com/juju/mutex. Most likely in mutex_flock.go:
func acquire(spec Spec, timeout <-chan time.Time) (Releaser, error) {
    done := make(chan struct{})
    defer close(done)
    select {
    case result := <-acquireFlock(spec.Name, done):
        // The flock attempt finished first: either we hold the lock or it failed.
        if result.err != nil {
            return nil, errors.Trace(result.err)
        }
        return result.m, nil
    case <-timeout:
        return nil, ErrTimeout
    case <-spec.Cancel:
        // Cancellation wins: this is the path that yields "cancelled acquiring mutex".
        return nil, ErrCancelled
    }
}

Looking at how we're using the lock in github.com/juju/juju's worker/uniter/uniter.go, we see:
func (u *Uniter) acquireExecutionLock() (mutex.Releaser, error) {
    // We want to make sure we don't block forever when locking, but take the
    // Uniter's catacomb into account.
    spec := mutex.Spec{
        Name:   u.hookLockName,
        Clock:  u.clock,
        Delay:  250 * time.Millisecond,
        Cancel: u.catacomb.Dying(),
    }
    logger.Debugf("acquire lock %q for uniter hook execution", u.hookLockName)
    releaser, err := mutex.Acquire(spec)
    if err != nil {
        return nil, errors.Trace(err)
    }
    logger.Debugf("lock %q acquired", u.hookLockName)
    return releaser, nil
}

u.catacomb.Dying() should only be firing if we are shutting down this worker for some reason.

Again digging into the log files for "resolver loop error" reveals:
grep -rnI -C 2 "resolver loop error" .

./5/var/log/juju/unit-filebeat-53.log-15077-2018-04-19 20:14:05 ERROR juju.api monitor.go:59 health ping timed out after 30s
./5/var/log/juju/unit-filebeat-53.log-15078-2018-04-19 20:14:05 ERROR juju.worker.dependency engine.go:551 "api-caller" manifold worker returned unexpected error: api connection broken unexpectedly
./5/var/log/juju/unit-filebeat-53.log:15079:2018-04-19 20:14:05 ERROR juju.worker.uniter agent.go:28 resolver loop error: could not acquire lock: cancelled acquiring mutex
./5/var/log/juju/unit-filebeat-53.log-15080-2018-04-19 20:14:05 ERROR juju.worker.uniter agent.go:31 updating agent status: connection is shut down

./5/var/log/juju/unit-nrpe-57.log-583-2018-04-19 20:14:04 ERROR juju.api monitor.go:59 health ping timed out after 30s
./5/var/log/juju/unit-nrpe-57.log-584-2018-04-19 20:14:04 ERROR juju.worker.dependency engine.go:551 "api-caller" manifold worker returned unexpected error: api connection broken unexpectedly
./5/var/log/juju/unit-nrpe-57.log:585:2018-04-19 20:14:04 ERROR juju.worker.uniter agent.go:28 resolver loop error: could not acquire lock: cancelled acquiring mutex
./5/var/log/juju/unit-nrpe-57.log-586-2018-04-19 20:14:04 ERROR juju.worker.uniter agent.go:31 updating ag...

Read more...

Alex Kavanagh (ajkavanagh) wrote :

Unfortunately, the crashdump file doesn't show when the leader changed; there's no leader-elected hook for it in the logs. mysql, on the other hand, is trying to do something weird with leadership every 5 minutes. However, there are no mysql charm unit logs in the crashdump:

tar xJvf juju-crashdump-c16a6243-ad52-4eb6-838c-5fbf0fd71c6b.tar.xz | grep mysql
0/lxd/6/var/log/mysql.log
0/lxd/6/var/log/juju/unit-hacluster-mysql-1.log
0/lxd/6/var/log/mysql.err
0/lxd/6/var/log/mysql/
0/lxd/6/var/log/mysql/mysql-bin.index
0/lxd/6/var/log/mysql/error.log
1/lxd/7/var/log/mysql.log
1/lxd/7/var/log/juju/unit-hacluster-mysql-0.log
1/lxd/7/var/log/mysql.err
1/lxd/7/var/log/mysql/
1/lxd/7/var/log/mysql/mysql-bin.000001
1/lxd/7/var/log/mysql/mysql-bin.index
1/lxd/7/var/log/mysql/error.log
12/lxd/7/var/log/mysql.log
12/lxd/7/var/log/juju/unit-hacluster-mysql-2.log
12/lxd/7/var/log/mysql.err
12/lxd/7/var/log/mysql/
12/lxd/7/var/log/mysql/mysql-bin.000001
12/lxd/7/var/log/mysql/mysql-bin.index
12/lxd/7/var/log/mysql/error.log
hacluster-mysql_0
hacluster-mysql_1
hacluster-mysql_2
mysql_0
mysql_1
mysql_2

And ceph-mon/0 is missing:
tar xJvf juju-crashdump-c16a6243-ad52-4eb6-838c-5fbf0fd71c6b.tar.xz | grep ceph-mon
0/lxd/0/var/log/ceph/ceph-mon.juju-33f5dc-0-lxd-0.log.3.gz
0/lxd/0/var/log/ceph/ceph-mon.juju-33f5dc-0-lxd-0.log.6.gz
0/lxd/0/var/log/ceph/ceph-mon.juju-33f5dc-0-lxd-0.log.4.gz
0/lxd/0/var/log/ceph/ceph-mon.juju-33f5dc-0-lxd-0.log.7.gz
0/lxd/0/var/log/ceph/ceph-mon.juju-33f5dc-0-lxd-0.log.5.gz
0/lxd/0/var/log/ceph/ceph-mon.juju-33f5dc-0-lxd-0.log.2.gz
0/lxd/0/var/log/ceph/ceph-mon.juju-33f5dc-0-lxd-0.log.1.gz
0/lxd/0/var/lib/charm/ceph-mon/
0/lxd/0/var/lib/charm/ceph-mon/ceph.conf
1/lxd/0/var/log/ceph/ceph-mon.juju-33f5dc-1-lxd-0.log.5.gz
1/lxd/0/var/log/ceph/ceph-mon.juju-33f5dc-1-lxd-0.log.3.gz
1/lxd/0/var/log/ceph/ceph-mon.juju-33f5dc-1-lxd-0.log.2.gz
1/lxd/0/var/log/ceph/ceph-mon.juju-33f5dc-1-lxd-0.log.7.gz
1/lxd/0/var/log/ceph/ceph-mon.juju-33f5dc-1-lxd-0.log.1.gz
1/lxd/0/var/log/ceph/ceph-mon.juju-33f5dc-1-lxd-0.log.6.gz
1/lxd/0/var/log/ceph/ceph-mon.juju-33f5dc-1-lxd-0.log
1/lxd/0/var/log/ceph/ceph-mon.juju-33f5dc-1-lxd-0.log.4.gz
1/lxd/0/var/log/juju/unit-ceph-mon-1.log
1/lxd/0/var/lib/charm/ceph-mon/
1/lxd/0/var/lib/charm/ceph-mon/ceph.conf
12/lxd/0/var/log/ceph/ceph-mon.juju-33f5dc-12-lxd-0.log.6.gz
12/lxd/0/var/log/ceph/ceph-mon.juju-33f5dc-12-lxd-0.log
12/lxd/0/var/log/ceph/ceph-mon.juju-33f5dc-12-lxd-0.log.3.gz
12/lxd/0/var/log/ceph/ceph-mon.juju-33f5dc-12-lxd-0.log.1.gz
12/lxd/0/var/log/ceph/ceph-mon.juju-33f5dc-12-lxd-0.log.7.gz
12/lxd/0/var/log/ceph/ceph-mon.juju-33f5dc-12-lxd-0.log.2.gz
12/lxd/0/var/log/ceph/ceph-mon.juju-33f5dc-12-lxd-0.log.5.gz
12/lxd/0/var/log/ceph/ceph-mon.juju-33f5dc-12-lxd-0.log.4.gz
12/lxd/0/var/log/juju/unit-ceph-mon-2.log
12/lxd/0/var/lib/charm/ceph-mon/
12/lxd/0/var/lib/charm/ceph-mon/ceph.conf
ceph-mon_0
ceph-mon_1
ceph-mon_2

I think, sadly, the juju crashdump utility didn't pick them up because the log file is huge.

(The crashdump was shared via Google Drive as it was too big for Launchpad.)

One thing that is dying is mysql (from debug_log.txt):

unit-mysql-2: 04:51:49 DEBUG unit.mysql/2.leader-elected ERROR cannot write leadership ...

Read more...

Corey Bryant (corey.bryant) wrote :

Hi Michael,

Can you add any controller logs to this bug, per John's comments regarding the unresponsive controller, to help correlate the controller status with his findings?

It might also be interesting to see what the actual failure is on mysql/2, mentioned in Alex's traceback above, for the failing 'leader-set leader-ip=10.14.244.133' call.

Thanks,
Corey

Michael Skalka (mskalka) wrote :

Missing ceph-mon-0.log

Michael Skalka (mskalka) wrote :

Controller model juju-crashdump

Alex Kavanagh (ajkavanagh) wrote :

This is my comment from: https://bugs.launchpad.net/charm-ceph-mon/+bug/1760138/comments/5

The relevance is:

"So the only is_leader() ... leader_set() code path is the one that was quoted, and the only call out, that isn't Juju related, is "ceph.generate_monitor_secret()" which is very quick.

So, I'm wondering if the charm does manage to call leader_set() in time, but the controller doesn't 'see' the leader_set until several minutes later?"

I'll also take a look at the log files/crash-dump.
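
For reference, here is a minimal sketch of the is_leader() -> leader_set() path being described. is_leader(), leader_get() and leader_set() are the charmhelpers wrappers around the Juju hook tools; the 'monitor-secret' key and the generate_monitor_secret() stub are placeholders for illustration, not the charm's exact code:

# Hedged sketch of the code path discussed above, not the charm's exact code.
import base64
import os

from charmhelpers.core.hookenv import is_leader, leader_get, leader_set

def generate_monitor_secret():
    # Stand-in for ceph.generate_monitor_secret(); the real helper is quick,
    # which is the point of the quoted comment.
    return base64.b64encode(os.urandom(16)).decode()

def maybe_publish_monitor_secret():
    # Only the current leader may publish shared leader settings.
    if not is_leader():
        return
    if leader_get('monitor-secret'):
        return
    # leader_set() itself returns promptly; the open question above is how
    # long the controller takes to propagate the new value, since that is
    # what the other units wait on in leader-settings-changed.
    leader_set({'monitor-secret': generate_monitor_secret()})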

Sandor Zeestraten (szeestraten) wrote :

Hitting the same issue here upgrading from rev. 9 to 24. Just keeps looping.

$ juju status ceph-mon
Model Controller Cloud/Region Version
openstack prodcont1 prodmaas 2.3.4

App Version Status Scale Charm Store Rev OS Notes
ceph-mon 10.2.6 maintenance 3 ceph-mon jujucharms 24 ubuntu
ceph-mon-nrpe active 3 nrpe jujucharms 41 ubuntu

Unit Workload Agent Machine Public address Ports Message
ceph-mon/0 maintenance executing 3 10.aa.b.213 (leader-settings-changed) Bootstrapping MON cluster
  ceph-mon-nrpe/0 active idle 10.aa.b.213 ready
ceph-mon/1* active idle 4 10.aa.b.109 Unit is ready and clustered
  ceph-mon-nrpe/1* active idle 10.aa.b.109 ready
ceph-mon/2 maintenance executing 5 10.aa.b.89 (leader-settings-changed) Bootstrapping MON cluster
  ceph-mon-nrpe/2 active idle 10.aa.b.89 ready

Correction for #9: the issue cleared after about an hour of retrying.
