subordinate juju agent does not start when principal is very busy

Bug #1717590 reported by Jason Hobbs
Affects                          Status        Importance  Assigned to      Milestone
Canonical Juju                   Fix Released  Medium      Andrew Wilkins
OpenStack HA Cluster Charm       Invalid       Undecided   Unassigned
OpenStack Keystone Charm         Triaged       Medium      Unassigned
OpenStack Percona Cluster Charm  Fix Released  Critical    David Ames
Telegraf Charm                   Invalid       Undecided   Unassigned

Bug Description

When a principal is very busy immediately after deploy, it is possible that its subordinates never get a Juju agent started.

=====

juju version: 2.2.4
percona-cluster charm: cs:percona-cluster-254

=Description=
This is a bundle using percona-cluster and hacluster:
http://paste.ubuntu.com/25542493/

It gets stuck, so far for about an hour, in a state where hacluster-pxc is not set up because the hacluster unit on the mysql/0 host never gets a chance to run its hooks:
http://paste.ubuntu.com/25542486/

The mysql/0 unit continuously fires hooks; here's its log:
http://paste.ubuntu.com/25542506/

=To Reproduce=
Deploy the bundle shown above. I don't know for sure yet whether it happens every time.

=Partial Workaround=
I have been able to work around this for a manual deployment by stopping the jujud-unit-mysql-0 service, letting the hacluster charm do its thing, then restarting the service. Unfortunately, this is difficult to automate.

tags: added: foundations-engine
tags: added: cdo-qa
tags: added: cdo-qa-blocker
description: updated
description: updated
Revision history for this message
Liam Young (gnuoy) wrote :

Hi Jason.
    To narrow down the issue, would it be possible to test without telegraf? Looking at the mysql hook executions from your pastebin, I see ~60 telegraf units with a relation to mysql ( http://paste.ubuntu.com/25563847/ )

Revision history for this message
Jason Hobbs (jason-hobbs) wrote : Re: [Bug 1717590] Re: percona-cluster unit constantly fires hooks, blocks hacluster unit from being setup

Liam,

I tested again without telegraf and didn't hit this issue - hacluster-pxc was able to come up on its own, without any intervention, in a reasonable amount of time - about 90 minutes after the deploy started, whereas before it was hitting a 4-hour time limit and failing.

Jason

Ryan Beisner (1chb1n)
tags: added: uosci
Revision history for this message
Ryan Beisner (1chb1n) wrote :

Per the attached logs, one of the hacluster-pxc units appears not to have gotten a juju agent:

  hacluster-pxc/0* waiting allocating 10.245.209.34 agent initializing

no longer affects: prometheus-charm
Changed in charm-hacluster:
status: New → Incomplete
Changed in charm-percona-cluster:
status: New → Incomplete
status: Incomplete → Invalid
Changed in charm-hacluster:
status: Incomplete → Invalid
Changed in telegraf-charm:
status: New → Invalid
Revision history for this message
Ryan Beisner (1chb1n) wrote :

This appears to be a system resource/sizing issue, and/or a tooling issue.

Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

No, sorry, that's exactly the problem. Its agent doesn't get initialized until I disable the mysqld unit on that machine.

Changed in telegraf-charm:
status: Invalid → New
Changed in charm-percona-cluster:
status: Invalid → New
Changed in charm-hacluster:
status: Invalid → New
Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

Once mysqld is disabled on the machine, the hacluster-pxc unit gets a chance to run, and moves past "agent initializing" to get set up.

Revision history for this message
Ryan Beisner (1chb1n) wrote :

The mysql/* units are all in active/ready state.

The problem unit is in a waiting/allocating machine/agent state. That is entirely at the level of the underlying system and tooling (Ubuntu, LXD, cloud-init, Juju, et al.). That is outside the purview of the charm/charmer.

Whether that is a system resource overcommit, or a tooling bug, I'm not certain.

Changed in telegraf-charm:
status: New → Incomplete
Changed in charm-percona-cluster:
status: New → Incomplete
Changed in charm-hacluster:
status: New → Incomplete
Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

fyi everybody on telegraf:

scope = scopes.GLOBAL
https://github.com/juju-solutions/interface-mysql-root/blob/master/requires.py#L19-L26

https://git.launchpad.net/telegraf-charm/tree/hooks/relations/mysql-root/requires.py?h=built#n26

https://jujucharms.com/docs/2.2/developer-layers-interfaces#global
"All connected applications and units for this relation will share a single conversation. The same data will be broadcast to every remote unit, and retrieved data will be aggregated across all remote units and is expected to either eventually agree or be set by a single leader."

https://git.launchpad.net/telegraf-charm/tree/metadata.yaml?h=built#n18
"requires":
  "mysql":
    "interface": "mysql-root"

Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

I'm not convinced it's not a charm issue. If mysql is being clustered using hacluster, should it not wait to join relations with other applications until its hacluster is done being set up?

Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

Dmitrii, sorry, I don't fully get your comment. Is there some action to be taken based on that information?

Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

I'm not sure what context this is in, but I see the same services causing the mysql unit to do the same work over and over, blocking hacluster-pxc from getting a chance to do anything:

keystone for example:
jason@inire:~/Dropbox/jason stuff$ grep "Grant exists for host '192.168.33.154' on db 'keystone'" mysql0.txt | wc -l
184

First at:
2017-09-15 18:16:45 DEBUG juju-log shared-db:63: Grant exists for host '192.168.33.154' on db 'keystone'

all the way through to:
2017-09-15 19:20:55 DEBUG juju-log db-admin:153: Grant exists for host '192.168.33.154' on db 'keystone'

same for cinder:
jason@inire:~/Dropbox/jason stuff$ grep "Grant exists for host '192.168.33.140' on db 'cinder'" mysql0.txt | wc -l
225

and many more..

To me that seems like a charm issue. Why are the same operations being repeated over and over before the service is actually ready, preventing the service from ever becoming ready?

Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

Jason, I suspect that having a non-container-scoped relation will generate a lot of unnecessary traffic.

I think we might need to change telegraf's mysql relation scope to "container".

"requires":
  "mysql":
    "interface": "mysql-root"
     "scope": "container"

I pasted the "reactive" framework side of it above, which isn't exactly right.

Relation scope is a Juju-level concept, not a reactive-library-level one:

https://github.com/juju/juju/blob/juju-2.2.4/apiserver/uniter/subordinaterelationwatcher.go#L27-L33
// newSubordinateRelationsWatcher creates a watcher that will notify
// about relation lifecycle events for subordinateApp, but filtered to
// be relevant to a unit deployed to a container with the
// principalName app. Global relations will be included, but only
// container-scoped relations for the principal application will be
// emitted - other container-scoped relations will be filtered out.

https://jujucharms.com/docs/2.2/authors-subordinate-applications
"Principal application: A traditional application or charm in whose container subordinate applications will execute.

Subordinate application/charm: An application designed for and deployed to the running container of another application unit.

Container relation: A scope:container relationship. While modeled identically to traditional, scope: global, relationships, juju only implements the relationship between the units belonging to the same container."

So, we need to make sure that telegraf, while being a subordinate charm ("subordinate": "true" in metadata.yaml), also has a proper relation scope for the mysql (and pgsql) relation, so that mysql-relation-{joined,changed} events are not fired for other telegraf units that do not collect data from mysql and do not reside on the same host.

Just in case: telegraf collects metrics from mysql, so a single agent only needs to reside on the same host and talk to that particular mysql instance: https://github.com/influxdata/telegraf/tree/master/plugins/inputs/mysql

Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

OK Dmitrii, that makes sense. I think what you're saying is that telegraf only talks to mysql to get information to report on, not to use it as a database, so only the telegraf units running in the same containers as mysql need to join the relation with mysql.

Revision history for this message
Ryan Beisner (1chb1n) wrote :

Hi @jason-hobbs - I think there are 3 things happening here.

Dmitrii's comments #10 + #14 represent an engineering assessment that the telegraf charm looks to have a scale/efficiency and/or potential design issue. That should likely be a separate bug. @dmitriis, can you raise that bug?

We agree there is a separate bug re: #11 (LP 1718247). Indeed that condition could be handled more gracefully, and we're talking about a way to guard that. But that won't cause a lower-level juju agent not to initialize.

While those ^ may be compounding this, the lack of a juju agent on one of the units in question is the essence of this bug as reported and observed. If all agents are good and this issue still occurs, that's naturally a different position.

From a Juju model perspective, the principal (mysql, aka pxc) is all good from an agent and application state perspective, which means the juju agents for the subordinates should be clear to init as well.

However, one of the subordinates doesn't have a juju agent, so the charm code isn't a factor at the time the condition is observed.

Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

Ryan, thanks for the responses.

I've stated this before, but the reason there is no agent for hacluster-pxc/0 is that the mysqld unit on that machine is continuously firing hooks, which blocks the hacluster-pxc/0 agent from ever being set up: a hook needs to fire to do that, Juju only allows one hook to run at a time on a machine, and mysqld/0 is constantly firing hooks. Disabling the mysqld service on that machine gets us out of that situation by giving the hacluster-pxc unit a chance to fire hooks and get set up.

Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

The mysql charm code is very much a factor in that situation, in that it's servicing hooks hundreds of times for the same services, preventing the hacluster-pxc agent from ever getting a chance to initialize itself. I do think bug 1718247 would address that situation.

Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

Can someone from the juju team confirm my theory that the mysqld unit servicing hooks constantly can block the hacluster-pxc unit on the same machine from progressing past "agent initializing"? I haven't found any clear documentation on this.

Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

Jason,

> "running on the same containers as mysql need to join the relation with mysql"

Yes - and, being generic, Juju doesn't tie the "container scope of a relation" to "charms placed in the same system container". So, in practice, this will work both in-container and outside a container, with the subordinate and principal charms simply residing on the same logical machine.

Just wanted to go through all the cases, because there are three terms here related to "scope" and "container" in one way or another, which makes this confusing:

1. scope (as in reactive charm library "communication scopes" https://jujucharms.com/docs/2.2/developer-layers-interfaces#communication-scopes)
2. (container) scope (as in Juju relation scope https://jujucharms.com/docs/2.2/authors-subordinate-applications#terms)
3. container (lxd)

Revision history for this message
Ryan Beisner (1chb1n) wrote :

Apologies, I could have been clearer about which charm code I was referring to: the unit which has no agent is the hacluster charm, and since that charm has no agent up, that charm code is not in effect.

I do think there is a Juju bug here. Juju needs to be able to spawn subordinate units even when a principal charm application is very busy (for whatever reason).

Thanks, all for the good info and input here btw.

summary: - percona-cluster unit constantly fires hooks, blocks hacluster unit from
- being setup
+ subordinate juju agent does not start when principle is very busy
David Ames (thedac)
Changed in charm-percona-cluster:
status: Incomplete → In Progress
importance: Undecided → Critical
assignee: nobody → David Ames (thedac)
milestone: none → 17.11
Paul Gear (paulgear)
summary: - subordinate juju agent does not start when principle is very busy
+ subordinate juju agent does not start when principal is very busy
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-percona-cluster (master)

Reviewed: https://review.openstack.org/507187
Committed: https://git.openstack.org/cgit/openstack/charm-percona-cluster/commit/?id=fb95b22add55e0254523c9a6a6f84c832931c20c
Submitter: Jenkins
Branch: master

commit fb95b22add55e0254523c9a6a6f84c832931c20c
Author: David Ames <email address hidden>
Date: Mon Sep 25 15:59:31 2017 +0000

    Update all clients once and only once

    This change solves two problems. First, one-off problems waiting
    for the cluster to complete. Second, running update_shared_db_rels
    too often.

    Update clients will get executed only once as soon as
    leader_node_is_ready() or client_node_is_ready() returns True.
    Subsequent client requests will be handled by normal
    *db-relation-changed hooks.

    Co-Authored-By: Liam Young <email address hidden>

    Partial-Bug: #1717590
    Change-Id: I7004218fe4750427bb0d3a957c13c4acfa02e1cd

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-percona-cluster (stable/17.08)

Fix proposed to branch: stable/17.08
Review: https://review.openstack.org/507553

David Ames (thedac)
Changed in charm-percona-cluster:
status: In Progress → Fix Committed
Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

We had a run last night using the cs:~openstack-charmers-next/percona-cluster charm that actually completed deployment before timing out; hacluster-pxc units were all up about 45 minutes after deploy, where previously they wouldn't all come up even after 3 hours. The mysql unit log looks much quieter too.

Ryan Beisner (1chb1n)
description: updated
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-percona-cluster (stable/17.08)

Reviewed: https://review.openstack.org/507553
Committed: https://git.openstack.org/cgit/openstack/charm-percona-cluster/commit/?id=c8e8adc1e69fcabeebb0901175e52d8bd1c68d42
Submitter: Jenkins
Branch: stable/17.08

commit c8e8adc1e69fcabeebb0901175e52d8bd1c68d42
Author: David Ames <email address hidden>
Date: Mon Sep 25 15:59:31 2017 +0000

    Update all clients once and only once

    This change solves two problems. First, one-off problems waiting
    for the cluster to complete. Second, running update_shared_db_rels
    too often.

    Update clients will get executed only once as soon as
    leader_node_is_ready() or client_node_is_ready() returns True.
    Subsequent client requests will be handled by normal
    *db-relation-changed hooks.

    Co-Authored-By: Liam Young <email address hidden>

    Partial-Bug: #1717590
    Change-Id: I7004218fe4750427bb0d3a957c13c4acfa02e1cd
    (cherry picked from commit fb95b22add55e0254523c9a6a6f84c832931c20c)

Revision history for this message
James Page (james-page) wrote :

Landed into 17.08 stable - marking fix released and closing out hacluster bug task.

Changed in charm-percona-cluster:
status: Fix Committed → Fix Released
Changed in charm-hacluster:
status: Incomplete → Invalid
Revision history for this message
John A Meinel (jameinel) wrote :

So, to read through this: if a hook never exits, then it would definitely cause starvation, because only one hook is allowed to run at a time. If a hook runs, exits, and then something causes it to run again, I would expect that to still provide an opportunity for a different application on the same machine/container to acquire the hook lock and run.

That said, subordinates and principals do share a bit more exclusivity. I'm pretty sure a subordinate doesn't get set up before the principal it is associated with, but I'm guessing at that. There may be a particular point that we wait for (though again, guessing, something like the install hook having completed would have been enough).

It sounds like this was fixed without changes needed from Juju, though I'm wondering if there was something you were looking for from our side? Just worried about a way to avoid lock starvation?

Changed in juju:
status: New → Incomplete
Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

We're still running into this. These logs show the hacluster unit waiting 5 minutes to acquire the machine lock, while the mysql unit acquires and releases it about 32 times during the same period. It doesn't seem like the lock has any queuing properties.

http://paste.ubuntu.com/25824352/

In this case, the mysql unit can't actually make any progress until the hacluster unit gets a chance to run its hook, because it's waiting on hacluster to finish setting up.

Changed in juju:
status: Incomplete → New
Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

Here's another example from the same deployment. The keystone unit on this machine has been hogging the machine lock for over an hour, preventing subordinates like hacluster-keystone (which needs to finish setting up before keystone can really do anything) from getting the lock. You can see the subordinates on this machine have all been waiting over an hour for the lock:

http://paste.ubuntu.com/25824403/

Tim Penhey (thumper)
Changed in juju:
status: New → Triaged
importance: Undecided → Medium
Revision history for this message
Tim Penhey (thumper) wrote :

Yes, this is the case. The lock is taken for any particular hook execution. The lock is machine-wide, and impacts all agents on that machine.

Unfortunately Go doesn't have any machine-wide, OS-scheduled semaphores, so the Juju codebase polls at a 250ms interval to attempt to grab the lock, which is implemented as a flock on Linux.
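
To make the starvation mechanism concrete, here is a minimal sketch of that polling pattern; the lock path and function are illustrative only, not Juju's actual code:

    # Illustrative sketch of a polled flock-based machine lock.
    import fcntl
    import os
    import time

    LOCK_PATH = "/tmp/machine-lock"  # hypothetical path, for illustration

    def run_hook_under_machine_lock(unit_name, hook_fn, poll_interval=0.25):
        fd = os.open(LOCK_PATH, os.O_CREAT | os.O_RDWR)
        try:
            # Poll for the exclusive lock. There is no queue: whichever
            # waiter happens to retry first after a release wins, so a unit
            # that immediately re-queues hooks can starve its neighbours.
            while True:
                try:
                    fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
                    break
                except BlockingIOError:
                    time.sleep(poll_interval)
            print(f"{unit_name}: acquired machine lock, running hook")
            hook_fn()
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
            os.close(fd)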

The units are very busy, and this is normal behaviour.

There are several things on the current and future roadmaps to look to address the amount of hook calls.
 * application data bags - to reduce the synchronisation between peers of common application config
 * application goal state - to reduce rework inside a charm, give the charm a preview into what is coming
 * (future) charm v2 - the consolidation of hook calls, so a single hook execution may relate to multiple events.

Revision history for this message
Tim Penhey (thumper) wrote :

There is one thing juju could do, and that is to add a sleep for half a second once the hook execution lock is released. This would allow other units on the machine to acquire the lock and progress, but I'm not sure this would result in a faster final deployment.
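
A minimal sketch of that suggestion, as a variant of the release path in the polling example above (the half-second value is the proposal here, not an existing Juju setting):

    import fcntl
    import time

    def release_machine_lock(fd, pause=0.5):
        # Drop the flock, then back off briefly so other unit agents on the
        # machine get a window to acquire it before this agent re-enters
        # the polling loop for its next queued hook.
        fcntl.flock(fd, fcntl.LOCK_UN)
        time.sleep(pause)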

Revision history for this message
Jason Hobbs (jason-hobbs) wrote : Re: [Bug 1717590] Re: subordinate juju agent does not start when principal is very busy

+1 I was thinking the same thing. I think it would result in faster
deployments - I think the busy units are running hooks that end up saying
"we can't do anything because we're not clustered" and will be able to
progress past that faster if the unit responsible for clustering gets to
run its hooks faster.

Revision history for this message
James Page (james-page) wrote :

keystone lacks the 'if vip and not clustered, defer' optimisation at the moment, which would help here; however, it won't avoid the work having to be done at some point, so I don't think it will have a huge impact on overall deployment performance.
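
For illustration, the guard being described amounts to something like the following in a hooks-based charm; the helper names (config, is_clustered, log, configure_shared_db) are stand-ins, not the keystone charm's actual functions:

    # Hypothetical sketch of the 'if vip and not clustered, defer' guard.
    def shared_db_changed(config, is_clustered, log, configure_shared_db):
        if config('vip') and not is_clustered():
            # A VIP means this unit is meant to sit behind hacluster; until
            # the cluster has formed, defer rather than redo the relation
            # work on every hook invocation.
            log('VIP set but cluster not complete; deferring shared-db work')
            return
        configure_shared_db()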

Changed in charm-keystone:
status: New → Triaged
importance: Undecided → Medium
milestone: none → 18.02
Tim Penhey (thumper)
Changed in juju:
milestone: none → 2.3-rc1
assignee: nobody → Andrew Wilkins (axwalk)
status: Triaged → In Progress
Revision history for this message
Andrew Wilkins (axwalk) wrote :

https://github.com/juju/mutex/pull/4 will make the hook execution lock fair. Once that lands I'll update Juju to get this into 2.3.
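
To illustrate the behavioural difference "fair" implies (first-come, first-served hand-off rather than a free-for-all of pollers), a minimal in-process sketch; this is not how juju/mutex is implemented:

    # In-process illustration of a fair (FIFO) lock; not juju/mutex itself.
    import threading
    from collections import deque

    class FairLock:
        def __init__(self):
            self._cond = threading.Condition()
            self._queue = deque()   # waiter ids, in arrival order
            self._holder = None

        def acquire(self, waiter_id):
            with self._cond:
                self._queue.append(waiter_id)
                # Block until we are at the head of the queue and the lock
                # is free, so long-waiting units cannot be overtaken.
                while self._queue[0] != waiter_id or self._holder is not None:
                    self._cond.wait()
                self._queue.popleft()
                self._holder = waiter_id

        def release(self):
            with self._cond:
                self._holder = None
                self._cond.notify_all()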

Tim Penhey (thumper)
Changed in juju:
milestone: 2.3-rc1 → 2.3-beta3
Revision history for this message
Andrew Wilkins (axwalk) wrote :

juju/mutex changes have landed, https://github.com/juju/juju/pull/8039 updates Juju to bring it in for 2.3-beta3.

Tim Penhey (thumper)
Changed in juju:
status: In Progress → Fix Committed
Revision history for this message
Andrew Wilkins (axwalk) wrote :

Unfortunately the fix had to be backed out just before the beta3 release process started, due to a bug in the new code. It'll be in rc1 instead.

Changed in juju:
milestone: 2.3-beta3 → 2.3-rc1
Changed in juju:
status: Fix Committed → Fix Released
Ryan Beisner (1chb1n)
Changed in charm-keystone:
milestone: 18.02 → 18.05
David Ames (thedac)
Changed in charm-keystone:
milestone: 18.05 → 18.08
James Page (james-page)
Changed in charm-keystone:
milestone: 18.08 → 18.11
David Ames (thedac)
Changed in charm-keystone:
milestone: 18.11 → 19.04
David Ames (thedac)
Changed in charm-keystone:
milestone: 19.04 → 19.07
David Ames (thedac)
Changed in charm-keystone:
milestone: 19.07 → 19.10
David Ames (thedac)
Changed in charm-keystone:
milestone: 19.10 → 20.01
James Page (james-page)
Changed in charm-keystone:
milestone: 20.01 → 20.05
no longer affects: charm-telegraf
Jeremy Lounder (jldev)
Changed in charm-telegraf:
status: New → Invalid
David Ames (thedac)
Changed in charm-keystone:
milestone: 20.05 → 20.08
James Page (james-page)
Changed in charm-keystone:
milestone: 20.08 → none