New deploys of nova-compute charm sometimes go into a relation-changed loop

Bug #1415763 reported by Paul Gear
This bug affects 2 people

Affects: nova-compute (Juju Charms Collection)
Status: In Progress
Importance: Medium
Assigned to: Edward Hope-Morley

Bug Description

On a new deploy of the nova-compute charm, 2 out of 5 deploys have resulted in nova-compute going into a loop running two different relation-changed hooks. I'll attach a log of the ceilometer subordinate charm on the same host, showing the nova-compute cycling between these two hooks. I've also confirmed that the two hooks run successfully using debug-hooks.

I suspect a race condition between nova-compute and one of the other OpenStack components. I'll also attach our juju status brief output, showing the units with failures.

tags: added: openstack
JuanJo Ciarlante (jjo)
tags: added: canonical-bootstack
Revision history for this message
Edward Hope-Morley (hopem) wrote :

I wonder if this is the same issue we saw with percona-cluster bug 1389670.

If your relations are spinning, could you please try the following:

Get relid:

    juju run --unit nova-compute/0 "relation-ids shared-db"

Then:

    juju run --unit nova-compute/0 "relation-get -r <relid> - mysql/0" > 1
    juju run --unit nova-compute/0 "relation-get -r <relid> - mysql/0" > 2
    juju run --unit nova-compute/0 "relation-get -r <relid> - mysql/0" > 3
    juju run --unit nova-compute/0 "relation-get -r <relid> - mysql/0" > 4

Then diff 1 2, diff 2 3, etc. and paste the output. If it looks like the settings are changing/toggling on each run, then it is likely the same issue we are seeing with Percona and would require the same fix.
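
Spelled out, that comparison step is just:

    diff 1 2
    diff 2 3
    diff 3 4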

Also, you can actually do without the shared-db relation on nova-compute unless you are using nova-network.

Revision history for this message
Paul Gear (paulgear) wrote :

Thanks Edward. Next time it comes up, I'll make sure I gather those.

Revision history for this message
Paul Gear (paulgear) wrote :

I've encountered this issue again, and unfortunately I'm not getting past square one: the first juju run command you mentioned above gives the message "ERROR command timed out" after about 3 minutes. More than 30 minutes after attempting it, "juju-run nova-compute/2 relation-ids shared-db" and "juju-run nova-compute/2 df" (which I ran to find out whether the problem lay with the relation-ids part or with juju run in general) are still polling a domain socket that lsof is unable to identify as anything other than "socket". Is there some further troubleshooting I can do to work out what's going on with juju on this node?
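
One possible starting point (a sketch using standard Linux process tooling; <pid> is a placeholder for the PIDs found by the first command):

    pgrep -af juju-run     # find the stuck invocations and their PIDs
    sudo lsof -p <pid>     # list open descriptors, including the unidentified socket
    sudo strace -p <pid>   # show which syscall the process is blocked in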

Revision history for this message
Paul Gear (paulgear) wrote :

A further note: the relation that seemed to be experiencing the most churn was the nova-cloud-controller <-> nova-compute one; I tried a deploy without the mysql <-> nova-compute relation present and still encountered this issue.

Revision history for this message
Paul Gear (paulgear) wrote :

I downgraded to juju 1.20.14 from trusty-updates and this issue still occurs.

Revision history for this message
Paul Gear (paulgear) wrote :

Correction: 1.20.11

Revision history for this message
Paul Gear (paulgear) wrote :

I've been attempting to debug this all day, and I don't believe it's a bug in nova-compute. On machine zero, the following hooks are running constantly:

    root 20082 29682 47 06:41 ? 00:00:03 /usr/bin/python /var/lib/juju/agents/unit-nova-cloud-controller-0/charm/hooks/shared-db-relation-changed
    root 21307  2397 45 06:41 ? 00:00:02 /usr/bin/python /var/lib/juju/agents/unit-keystone-0/charm/hooks/shared-db-relation-changed
    root 22246 15530 71 06:41 ? 00:00:02 /usr/bin/python /var/lib/juju/agents/unit-neutron-api-0/charm/hooks/shared-db-relation-changed
    root 22451 16225 89 06:41 ? 00:00:03 /usr/bin/python /var/lib/juju/agents/unit-glance-0/charm/hooks/shared-db-relation-changed
    root 24414 14705 99 06:41 ? 00:00:01 /usr/bin/python /var/lib/juju/agents/unit-cinder-0/charm/hooks/shared-db-relation-changed

As an example, the keystone hook takes nearly 40 seconds to run, during which time it produces over 1500 lines of logging data. In about 96 minutes since that unit's log file was created, that hook has run 138 times. Figures for the other units:

unit-nova-cloud-controller-0: 516 times
unit-neutron-api-0: 736 times
unit-glance-0: 581 times
unit-cinder-0: 701 times
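
Counts like these can be gathered with something along these lines (a sketch; the log path and match pattern are assumptions, and it assumes the hook name appears on a fixed number of log lines per run):

    for u in keystone-0 nova-cloud-controller-0 neutron-api-0 glance-0 cinder-0; do
        printf 'unit-%s: ' "$u"
        sudo grep -c 'shared-db-relation-changed' "/var/log/juju/unit-$u.log"
    done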

It seems to me this is an issue either with juju itself, or with the mysql charm.

tags: added: cts
Changed in nova-compute (Juju Charms Collection):
status: New → In Progress
importance: Undecided → Medium
assignee: nobody → Edward Hope-Morley (hopem)
milestone: none → 15.04
tags: added: backport-potential
Revision history for this message
Edward Hope-Morley (hopem) wrote :

Ok so, firstly, I think the root cause here is the same as bug 1389670, since the mysql and percona-cluster charms share the same logic for determining and distributing allowed_units on the shared-db relation. It is currently unnecessarily noisy, and I believe you could hit this problem with any charm that uses the shared-db relation. So I am going to couple this bug with bug 1389670, which I am fixing first. Ultimately, I will be moving the duplicated code into charm-helpers.contrib so that both share the same common code.
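
For illustration only (hypothetical names, not the actual charm or charm-helpers code): the kind of noise described above arises when a relation value is rebuilt from an unordered collection on every hook run, so the serialized setting can differ even though the membership is unchanged, and each apparent change re-fires relation-changed on every related unit. Emitting the value deterministically keeps it stable:

    # Illustrative Python sketch only; these names are hypothetical.
    units = {'nova-cloud-controller/0', 'glance/0', 'cinder/0'}

    # Problematic: set iteration order is not guaranteed, so the value
    # written to the relation may differ from run to run, re-firing
    # shared-db-relation-changed on every related unit.
    allowed_units = ' '.join(units)

    # Stable: sorting first means identical membership always serializes
    # to an identical value, so no spurious hook runs are triggered.
    allowed_units = ' '.join(sorted(units))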
