Juju Charms Collection
cinder package

Juju 1.25.3 - infinite loop on cluster-relation-changed - cinder/apache services constantly restarted

Bug #1561927 reported by Alvaro Uria on 2016-03-25

This bug affects 9 people

Affects	Status	Importance	Assigned to	Milestone
cinder (Juju Charms Collection)	Fix Released	High	Edward Hope-Morley	Juju Charms Collection 17.01

Bug Description

LSB: Ubuntu 14.04.4 LTS
openstack: cloud:trusty-liberty
cinder packages: 2:7.0.1-0ubuntu1~cloud0
cinder charm: lp:charms/trusty/cinder;revno=106
Juju: 1.25.3.1
num_units: 3
related to hacluster charm (which remains idle)

Symptoms:
All three units constantly run cluster-relation-changed, which restarts all cinder upstart jobs as well as apache2.
If one unit is stopped, the other two stop looping. Once the stopped unit is restarted and marked resolved with juju resolved, the loop starts again on all three.

Temporary workaround to end the loop (applied only on cinder/1):
"""
@hooks.hook('cluster-relation-changed',
            'cluster-relation-departed')
@restart_on_change(restart_map(), stopstart=True)
def cluster_changed():
    # Temporarily disabled to break the loop; restore once units settle.
    # check_db_initialised()
    # CONFIGS.write_all()
    pass
"""

This temporary workaround was applied at 13:08 (see the attached 20160325-unit-cinder-1.log). Once all three units had settled, at 13:09, I rolled cluster_changed() back to the original code (uncommenting check_db_initialised and CONFIGS.write_all).

Please let me know if you need further details.

Alvaro Uria (aluria) wrote on 2016-03-25:

20160325-unit-cinder-1.log (185.9 KiB, application/octet-stream)

Alvaro Uria (aluria) wrote on 2016-03-25:

juju status and status-history (3.7 KiB, text/plain)

The cinder-0 attachment shows juju status while all three units are in the loop.

juju status-history cinder/0 shows the state transitions as I stopped the peer units (cinder/0 shows active, as it stops looping) or the cinder/0 unit itself (cinder/0 shows an error state).

description:

updated

Alvaro Uria (aluria) wrote on 2016-04-01:

Hi,

This is happening on three different ha+liberty+juju 1.25.3 deployments.

I replaced the body of cluster_changed() with "pass" until the units settled (less than a minute), then restored the original cluster_changed() code.

Cheers,
-Alvaro.

Jill Rouleau (jillrouleau) wrote on 2016-04-05:

This behaviour manifests in these clouds every few days. Changing cluster_changed() to "pass" and then restoring the original code resolves things temporarily, but the problem always comes back. Are there additional logs, diagnostics, or troubleshooting steps we can provide?

Robert Clark (returntoreptar) wrote on 2016-04-06:

This behavior is manifesting in my deployment, and as of right now I have not applied any fixes to it. I would be happy to provide any logs.

Alvaro Uria (aluria) wrote on 2016-04-18:

Hi,

We're seeing this behaviour every week on different clouds. Would it be OK to permanently leave "pass" in the cluster_changed hook?

Thank you,
-Alvaro.

James Page (james-page) on 2016-04-18

Changed in cinder (Juju Charms Collection):
milestone:	none → 16.04
assignee:	nobody → James Page (james-page)

James Page (james-page) on 2016-04-22

Changed in cinder (Juju Charms Collection):
milestone:	16.04 → 16.07

Junien F (axino) wrote on 2016-05-03:

Hi,

I'm impacted by this bug as well, with 3 cinder units. I believe it is caused by the following code:
https://paste.ubuntu.com/16202202/

Each unit sets CINDER_DB_INIT_RKEY and CINDER_DB_INIT_ECHO_RKEY in the relation, which in turn triggers the "cluster-relation-changed" hook on the other units, which change these keys in the relation again, and so on.
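To make this feedback loop concrete, here is a toy Python model (not charm code) of the ping-pong described above. It assumes that Juju fires cluster-relation-changed on the peers only when a unit's relation settings actually change, and it counts every settings change as one service restart; the function `simulate` and the values it writes are hypothetical. When each hook run publishes a fresh value (such as a new nonce), the hooks never quiesce; when units publish a stable value, the cluster settles after one change per unit.

```python
from collections import deque
from itertools import count


def simulate(units, write_fresh_value, max_events=50):
    """Toy model of peer hooks; returns (hook runs, restarts, still_looping)."""
    settings = {u: None for u in units}  # each unit's side of the peer relation
    nonce = count()                      # source of always-new values
    pending = deque(units)               # hook runs waiting to execute
    runs = restarts = 0
    while pending and runs < max_events:
        unit = pending.popleft()
        runs += 1
        # Hypothetical hook body: publish a value to the peer relation.
        new_value = next(nonce) if write_fresh_value else "db-initialised"
        if settings[unit] != new_value:
            settings[unit] = new_value
            restarts += 1  # stand-in for restarting the services
            # Assumed Juju behaviour: a real change notifies every other peer.
            pending.extend(p for p in units if p != unit)
    return runs, restarts, bool(pending)


peers = ["cinder/0", "cinder/1", "cinder/2"]
print(simulate(peers, write_fresh_value=True))   # (50, 50, True): capped, still looping
print(simulate(peers, write_fresh_value=False))  # (9, 3, False): settles
```

The `max_events` cap exists only so the looping case terminates; in a real deployment nothing stops it, which matches the constant restarts reported here.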

I'm not sure what the purpose of these settings is, but I suspect the best solution is to use Juju's leader mechanism and have only the leader instruct its peers to restart.

Thank you !

James Page (james-page) on 2016-08-01

Changed in cinder (Juju Charms Collection):
milestone:	16.07 → 16.10

Edward Hope-Morley (hopem) wrote on 2016-11-23:

I have just upgraded cinder to 16.10 and am seeing all cinder processes restarted every 10s.

Changed in cinder (Juju Charms Collection):
importance:	Undecided → High
milestone:	16.10 → 17.01

Edward Hope-Morley (hopem) on 2016-11-23

tags:

added: openstack sts

Edward Hope-Morley (hopem) on 2016-11-24

Changed in cinder (Juju Charms Collection):
assignee:	James Page (james-page) → Edward Hope-Morley (hopem)

Edward Hope-Morley (hopem) wrote on 2016-11-25:

OK, I think I've found the problem: it looks like the DB init check code does not tolerate a leader switch. I'll have a patch up shortly.
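One way to make a DB-init handshake tolerate a leader switch is sketched below as a toy Python model; it illustrates the idea under stated assumptions and is not the code merged for this bug. It assumes the initialisation marker lives in Juju leader settings (readable by every unit and kept across a change of leader), that the current leader writes it only if it is not already set, and that each unit remembers the last marker it acted on. The class `Cluster`, its methods and the `db-init-id` key are hypothetical names.

```python
import uuid


class Cluster:
    """Toy model of the DB-init handshake; all names are hypothetical."""

    def __init__(self, units):
        self.leader_settings = {}                # mimics Juju leader data
        self.handled = {u: None for u in units}  # last init id each unit acted on
        self.restarts = {u: 0 for u in units}

    def shared_db_changed(self, leader):
        # Guard: announce initialisation once per cluster, however often
        # shared-db re-runs and whichever unit is currently leader.
        if "db-init-id" not in self.leader_settings:
            init_id = f"{leader}-{uuid.uuid4()}"
            self.leader_settings["db-init-id"] = init_id
            self.handled[leader] = init_id  # the leader ran the migration itself

    def cluster_changed(self, unit):
        # Restart only for an init id this unit has not handled yet.
        init_id = self.leader_settings.get("db-init-id")
        if init_id and init_id != self.handled[unit]:
            self.restarts[unit] += 1
            self.handled[unit] = init_id


cluster = Cluster(["cinder/0", "cinder/1", "cinder/2"])
cluster.shared_db_changed(leader="cinder/0")
for unit in cluster.handled:
    cluster.cluster_changed(unit)
cluster.shared_db_changed(leader="cinder/1")  # leader switch + shared-db re-run
for unit in cluster.handled:
    cluster.cluster_changed(unit)
print(cluster.restarts)  # {'cinder/0': 0, 'cinder/1': 1, 'cinder/2': 1}
```

In this toy model, dropping the `"db-init-id" not in self.leader_settings` guard makes the second `shared_db_changed` call publish a new id, so every peer restarts again after the leader switch.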

OpenStack Infra (hudson-openstack) wrote on 2016-11-25: Fix proposed to charm-cinder (master)

#10

Fix proposed to branch: master
Review: https://review.openstack.org/402954

Changed in cinder (Juju Charms Collection):
status:	New → In Progress

OpenStack Infra (hudson-openstack) wrote on 2016-12-06: Fix merged to charm-cinder (master)

#11

Reviewed: https://review.openstack.org/402954
Committed: https://git.openstack.org/cgit/openstack/charm-cinder/commit/?id=1e1000a0892046b158e104025b04d3cf53a2a1b8
Submitter: Jenkins
Branch: master

commit 1e1000a0892046b158e104025b04d3cf53a2a1b8
Author: Edward Hope-Morley <email address hidden>
Date: Fri Nov 25 16:20:05 2016 +0000

Fix cluster relation unnecessary service restarts

    The logic introduced in commit 619ce065 to formalise database
    initialisation did not support the leader switching and re-runs
    of the shared-db relation. This resulted in extraneous service
    restarts. We avoid this by adding some extra logic around this
    code.

Change-Id: If988331e552da930eff868abded323014fd50f04
Closes-Bug: 1561927

Changed in cinder (Juju Charms Collection):
status:	In Progress → Fix Committed

Edward Hope-Morley (hopem) on 2016-12-07

tags:

added: backport-potential

OpenStack Infra (hudson-openstack) wrote on 2016-12-07: Fix proposed to charm-cinder (stable/16.10)

#12

Fix proposed to branch: stable/16.10
Review: https://review.openstack.org/408050

OpenStack Infra (hudson-openstack) wrote on 2016-12-12: Fix merged to charm-cinder (stable/16.10)

#13

Reviewed: https://review.openstack.org/408050
Committed: https://git.openstack.org/cgit/openstack/charm-cinder/commit/?id=0f7358b258dd21a8a9567175ed298c1ffacc556d
Submitter: Jenkins
Branch: stable/16.10

commit 0f7358b258dd21a8a9567175ed298c1ffacc556d
Author: Edward Hope-Morley <email address hidden>
Date: Fri Nov 25 16:20:05 2016 +0000

Fix cluster relation unnecessary service restarts

    Closes-Bug: 1561927
    (cherry picked from commit 1e1000a0892046b158e104025b04d3cf53a2a1b8)
    Change-Id: If988331e552da930eff868abded323014fd50f04

Edward Hope-Morley (hopem) on 2017-02-16

Changed in cinder (Juju Charms Collection):
status:	Fix Committed → Fix Released
