Unit falls out of quorum into leader-settings-changed loop

Bug #1768241 reported by Michael Skalka
This bug affects 5 people
Affects: Ceph Monitor Charm
Status: Expired
Importance: High
Assigned to: Unassigned

Bug Description

Ceph-mon charm rev 24

We had a ceph-mon cluster in quorum and healthy on Friday. Sometime the following Sunday one unit fell out of quorum, following what looks like a leader election. The charm fired a 'config-changed' hook, then started looping in 'leader-settings-changed' for a number of days.

`ceph status` on the stuck mon reports healthy, all three mons in quorum, all OSDs and PGs healthy.

Revision history for this message
Michael Skalka (mskalka) wrote :

Some status logs while the crashdump churns:

ceph-mon/0 (the troublesome unit) https://pastebin.canonical.com/p/3rQf8CD9sv/

ceph-mon/2 (the new leader) https://pastebin.canonical.com/p/PcMYCpJWnB/

Revision history for this message
Michael Skalka (mskalka) wrote :

While attempting to debug what was happening, I fired off the `leader-settings-changed` hook in the debug-hooks environment. After approximately four hours the same hook was still running. It looks like it is looping through every OSD in the cluster, taking ~15 minutes per OSD to do... something:

...
Reading state information... Done
The following package was automatically installed and is no longer required:
  libfreetype6
Use 'sudo apt autoremove' to remove it.
0 upgraded, 0 newly installed, 0 to remove and 22 not upgraded.
exported keyring for client.radosgw.gateway
obtained 'cephx.groups.objects'
value stored
obtained 'cephx.groups.objects'
value stored
obtained 'cephx.groups.objects'
value stored
obtained 'cephx.groups.objects'
value stored
obtained 'cephx.groups.objects'
value stored
obtained 'cephx.groups.objects'
value stored
obtained 'cephx.groups.objects'
value stored
obtained 'cephx.groups.objects'
value stored
obtained 'cephx.groups.objects'
value stored
obtained 'cephx.groups.objects'
value stored
obtained 'cephx.groups.objects'
value stored
exported keyring for client.nova-compute-kvm
...

Changed in charm-ceph-mon:
status: New → Triaged
importance: Undecided → High
assignee: nobody → Alex Kavanagh (ajkavanagh)
Revision history for this message
John A Meinel (jameinel) wrote :

I don't see the unit-ceph-mon-0 log file, which is a bit surprising.

However, if I grep around for "resolver loop error", there seem to be two possible causes:
./5/var/log/juju/unit-filebeat-53.log:15079:2018-04-19 20:14:05 ERROR juju.worker.uniter agent.go:28 resolver loop error: could not acquire lock: cancelled acquiring mutex

./5/var/log/juju/unit-keystone-ldap-citrix-1.log:4180:2018-04-19 20:21:42 ERROR juju.worker.uniter agent.go:28 resolver loop error: preparing operation "run update-status hook": connection is shut down

The former message happens in the logs about 10x more often.
"cancelled acquiring mutex" is triggered in github.com/juju/mutex. Most likely in mutex_flock.go:
func acquire(spec Spec, timeout <-chan time.Time) (Releaser, error) {
    done := make(chan struct{})
    defer close(done)
    select {
    case result := <-acquireFlock(spec.Name, done):
        if result.err != nil {
            return nil, errors.Trace(result.err)
        }
        return result.m, nil
    case <-timeout:
        return nil, ErrTimeout
    case <-spec.Cancel:
        return nil, ErrCancelled
    }
}

Looking at how we're using the lock in github.com/juju/juju's worker/uniter/uniter.go, we see:
func (u *Uniter) acquireExecutionLock() (mutex.Releaser, error) {
    // We want to make sure we don't block forever when locking, but take the
    // Uniter's catacomb into account.
    spec := mutex.Spec{
        Name:   u.hookLockName,
        Clock:  u.clock,
        Delay:  250 * time.Millisecond,
        Cancel: u.catacomb.Dying(),
    }
    logger.Debugf("acquire lock %q for uniter hook execution", u.hookLockName)
    releaser, err := mutex.Acquire(spec)
    if err != nil {
        return nil, errors.Trace(err)
    }
    logger.Debugf("lock %q acquired", u.hookLockName)
    return releaser, nil
}

u.catacomb.Dying() should only be firing if we are shutting down this worker for some reason.
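
To make that failure mode concrete, here is a minimal, self-contained sketch (illustrative only, not taken from the Juju sources; the acquire helper and channel names are made up) of the same select pattern: when the channel standing in for u.catacomb.Dying() is closed while the lock is still contended, the caller gets the "cancelled" error rather than waiting out the timeout.

package main

import (
    "errors"
    "fmt"
    "time"
)

var errCancelled = errors.New("cancelled acquiring mutex")

// acquire mirrors the select in mutex_flock.go quoted above: whichever of the
// three channels fires first decides whether we get the lock, a timeout, or a
// cancellation error.
func acquire(locked <-chan struct{}, timeout <-chan time.Time, cancel <-chan struct{}) error {
    select {
    case <-locked:
        return nil
    case <-timeout:
        return errors.New("timed out acquiring mutex")
    case <-cancel:
        return errCancelled
    }
}

func main() {
    locked := make(chan struct{}) // never closed: another hook holds the lock
    cancel := make(chan struct{}) // stands in for u.catacomb.Dying()

    // Simulate the uniter worker being torn down (e.g. after its API
    // connection breaks) while a hook is still waiting for the lock.
    go func() {
        time.Sleep(500 * time.Millisecond)
        close(cancel)
    }()

    fmt.Println(acquire(locked, time.After(10*time.Second), cancel))
    // Prints: cancelled acquiring mutex
}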

Again digging into the log files for "resolver loop error" reveals:
grep -rnI -C 2 "resolver loop error" .

./5/var/log/juju/unit-filebeat-53.log-15077-2018-04-19 20:14:05 ERROR juju.api monitor.go:59 health ping timed out after 30s
./5/var/log/juju/unit-filebeat-53.log-15078-2018-04-19 20:14:05 ERROR juju.worker.dependency engine.go:551 "api-caller" manifold worker returned unexpected error: api connection broken unexpectedly
./5/var/log/juju/unit-filebeat-53.log:15079:2018-04-19 20:14:05 ERROR juju.worker.uniter agent.go:28 resolver loop error: could not acquire lock: cancelled acquiring mutex
./5/var/log/juju/unit-filebeat-53.log-15080-2018-04-19 20:14:05 ERROR juju.worker.uniter agent.go:31 updating agent status: connection is shut down

./5/var/log/juju/unit-nrpe-57.log-583-2018-04-19 20:14:04 ERROR juju.api monitor.go:59 health ping timed out after 30s
./5/var/log/juju/unit-nrpe-57.log-584-2018-04-19 20:14:04 ERROR juju.worker.dependency engine.go:551 "api-caller" manifold worker returned unexpected error: api connection broken unexpectedly
./5/var/log/juju/unit-nrpe-57.log:585:2018-04-19 20:14:04 ERROR juju.worker.uniter agent.go:28 resolver loop error: could not acquire lock: cancelled acquiring mutex
./5/var/log/juju/unit-nrpe-57.log-586-2018-04-19 20:14:04 ERROR juju.worker.uniter agent.go:31 updating ag...


Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

Unfortunately, the crashdump file doesn't show when the leader changed; there's no leader-elected hook for it in the logs. mysql, on the other hand, is trying to do something weird with leadership every 5 minutes. However, there are no mysql charm unit logs in the crashdump:

tar xJvf juju-crashdump-c16a6243-ad52-4eb6-838c-5fbf0fd71c6b.tar.xz | grep mysql
0/lxd/6/var/log/mysql.log
0/lxd/6/var/log/juju/unit-hacluster-mysql-1.log
0/lxd/6/var/log/mysql.err
0/lxd/6/var/log/mysql/
0/lxd/6/var/log/mysql/mysql-bin.index
0/lxd/6/var/log/mysql/error.log
1/lxd/7/var/log/mysql.log
1/lxd/7/var/log/juju/unit-hacluster-mysql-0.log
1/lxd/7/var/log/mysql.err
1/lxd/7/var/log/mysql/
1/lxd/7/var/log/mysql/mysql-bin.000001
1/lxd/7/var/log/mysql/mysql-bin.index
1/lxd/7/var/log/mysql/error.log
12/lxd/7/var/log/mysql.log
12/lxd/7/var/log/juju/unit-hacluster-mysql-2.log
12/lxd/7/var/log/mysql.err
12/lxd/7/var/log/mysql/
12/lxd/7/var/log/mysql/mysql-bin.000001
12/lxd/7/var/log/mysql/mysql-bin.index
12/lxd/7/var/log/mysql/error.log
hacluster-mysql_0
hacluster-mysql_1
hacluster-mysql_2
mysql_0
mysql_1
mysql_2

And ceph-mon/0 is missing:
tar xJvf juju-crashdump-c16a6243-ad52-4eb6-838c-5fbf0fd71c6b.tar.xz | grep ceph-mon
0/lxd/0/var/log/ceph/ceph-mon.juju-33f5dc-0-lxd-0.log.3.gz
0/lxd/0/var/log/ceph/ceph-mon.juju-33f5dc-0-lxd-0.log.6.gz
0/lxd/0/var/log/ceph/ceph-mon.juju-33f5dc-0-lxd-0.log.4.gz
0/lxd/0/var/log/ceph/ceph-mon.juju-33f5dc-0-lxd-0.log.7.gz
0/lxd/0/var/log/ceph/ceph-mon.juju-33f5dc-0-lxd-0.log.5.gz
0/lxd/0/var/log/ceph/ceph-mon.juju-33f5dc-0-lxd-0.log.2.gz
0/lxd/0/var/log/ceph/ceph-mon.juju-33f5dc-0-lxd-0.log.1.gz
0/lxd/0/var/lib/charm/ceph-mon/
0/lxd/0/var/lib/charm/ceph-mon/ceph.conf
1/lxd/0/var/log/ceph/ceph-mon.juju-33f5dc-1-lxd-0.log.5.gz
1/lxd/0/var/log/ceph/ceph-mon.juju-33f5dc-1-lxd-0.log.3.gz
1/lxd/0/var/log/ceph/ceph-mon.juju-33f5dc-1-lxd-0.log.2.gz
1/lxd/0/var/log/ceph/ceph-mon.juju-33f5dc-1-lxd-0.log.7.gz
1/lxd/0/var/log/ceph/ceph-mon.juju-33f5dc-1-lxd-0.log.1.gz
1/lxd/0/var/log/ceph/ceph-mon.juju-33f5dc-1-lxd-0.log.6.gz
1/lxd/0/var/log/ceph/ceph-mon.juju-33f5dc-1-lxd-0.log
1/lxd/0/var/log/ceph/ceph-mon.juju-33f5dc-1-lxd-0.log.4.gz
1/lxd/0/var/log/juju/unit-ceph-mon-1.log
1/lxd/0/var/lib/charm/ceph-mon/
1/lxd/0/var/lib/charm/ceph-mon/ceph.conf
12/lxd/0/var/log/ceph/ceph-mon.juju-33f5dc-12-lxd-0.log.6.gz
12/lxd/0/var/log/ceph/ceph-mon.juju-33f5dc-12-lxd-0.log
12/lxd/0/var/log/ceph/ceph-mon.juju-33f5dc-12-lxd-0.log.3.gz
12/lxd/0/var/log/ceph/ceph-mon.juju-33f5dc-12-lxd-0.log.1.gz
12/lxd/0/var/log/ceph/ceph-mon.juju-33f5dc-12-lxd-0.log.7.gz
12/lxd/0/var/log/ceph/ceph-mon.juju-33f5dc-12-lxd-0.log.2.gz
12/lxd/0/var/log/ceph/ceph-mon.juju-33f5dc-12-lxd-0.log.5.gz
12/lxd/0/var/log/ceph/ceph-mon.juju-33f5dc-12-lxd-0.log.4.gz
12/lxd/0/var/log/juju/unit-ceph-mon-2.log
12/lxd/0/var/lib/charm/ceph-mon/
12/lxd/0/var/lib/charm/ceph-mon/ceph.conf
ceph-mon_0
ceph-mon_1
ceph-mon_2

I think, sadly, the juju-crashdump utility didn't pick them up because it's huge.

(The crashdump was shared via Google Drive as it was too big for Launchpad.)

One thing that is dying is mysql (from debug_log.txt):

unit-mysql-2: 04:51:49 DEBUG unit.mysql/2.leader-elected ERROR cannot write leadership ...


Revision history for this message
Corey Bryant (corey.bryant) wrote :

Hi Michael,

Can you add any controller logs to this bug, per John's comments regarding the unresponsive controller, to help correlate the controller status with his findings?

It might also be interesting to see what the actual failure is on mysql/2, mentioned in the traceback above from Alex, for the failing 'leader-set leader-ip=10.14.244.133' call.

Thanks,
Corey

Revision history for this message
Michael Skalka (mskalka) wrote :

Missing ceph-mon-0.log

Revision history for this message
Michael Skalka (mskalka) wrote :

Controller model juju-crashdump

Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

This is my comment from: https://bugs.launchpad.net/charm-ceph-mon/+bug/1760138/comments/5

The relevance is:

"So the only is_leader() ... leader_set() code path is the one that was quoted, and the only call out, that isn't Juju related, is "ceph.generate_monitor_secret()" which is very quick.

So, I'm wondering if the charm does manage to call leader_set() in time, but the controller doesn't 'see' the leader_set until several minutes later?"

I'll also take a look at the log files/crash-dump.

Revision history for this message
Sandor Zeestraten (szeestraten) wrote :

Hitting the same issue here upgrading from rev. 9 to 24. Just keeps looping.

$ juju status ceph-mon
Model Controller Cloud/Region Version
openstack prodcont1 prodmaas 2.3.4

App Version Status Scale Charm Store Rev OS Notes
ceph-mon 10.2.6 maintenance 3 ceph-mon jujucharms 24 ubuntu
ceph-mon-nrpe active 3 nrpe jujucharms 41 ubuntu

Unit Workload Agent Machine Public address Ports Message
ceph-mon/0 maintenance executing 3 10.aa.b.213 (leader-settings-changed) Bootstrapping MON cluster
  ceph-mon-nrpe/0 active idle 10.aa.b.213 ready
ceph-mon/1* active idle 4 10.aa.b.109 Unit is ready and clustered
  ceph-mon-nrpe/1* active idle 10.aa.b.109 ready
ceph-mon/2 maintenance executing 5 10.aa.b.89 (leader-settings-changed) Bootstrapping MON cluster
  ceph-mon-nrpe/2 active idle 10.aa.b.89 ready

Revision history for this message
Sandor Zeestraten (szeestraten) wrote :

Correction to comment #9: the issue cleared after about an hour of retrying.

Changed in charm-ceph-mon:
assignee: Alex Kavanagh (ajkavanagh) → nobody
Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

Is this still occurring for anyone? It's a relatively old bug, and there's been quite a lot of work done in this area of the charm since the bug was filed. Setting to incomplete; please set back to Triaged if it's still a concern or has happened since the last update to the bug. Thanks.

Changed in charm-ceph-mon:
status: Triaged → Incomplete
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for OpenStack ceph-mon charm because there has been no activity for 60 days.]

Changed in charm-ceph-mon:
status: Incomplete → Expired
Revision history for this message
Alireza Nasri (sysnasri) wrote :

I have got the same issue at the moment.

Revision history for this message
Aurelien Lourot (aurelien-lourot) wrote :

Hi Alireza, which version of the ceph-mon charm are you using?
