Comment 33 for bug 1717590

Jason Hobbs (jason-hobbs) wrote : Re: [Bug 1717590] Re: subordinate juju agent does not start when principal is very busy

+1, I was thinking the same thing. I think it would result in faster
deployments: the busy units are running hooks that end up saying
"we can't do anything because we're not clustered", and they will be able
to progress past that sooner if the unit responsible for clustering gets
to run its hooks sooner.
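
For illustration only, here is a rough Python sketch of the sleep-after-release idea quoted below. It is not juju's code (juju is written in Go); the function names, hook counts, and timings are made up for the example, and a plain threading.Lock stands in for the hook execution lock. Without the pause, a unit that immediately asks for the lock again may simply win it back, since such locks are generally not fair.

```python
import threading
import time


def run_hooks(name, lock, hooks, log, yield_seconds):
    """Run `hooks` fake hooks for one unit, pausing after each lock release.

    Hypothetical helper: `lock` stands in for the per-machine hook
    execution lock, and appending to `log` stands in for running a hook.
    """
    for _ in range(hooks):
        with lock:
            log.append(name)
            time.sleep(0.01)  # pretend the hook takes a little while
        # The proposed change: pause outside the lock so that another
        # unit already waiting on it gets a chance to acquire it.
        time.sleep(yield_seconds)


def simulate(yield_seconds, busy_hooks=5, waiting_hooks=1):
    """Return the order in which a busy principal and a subordinate ran hooks."""
    lock = threading.Lock()
    log = []
    busy = threading.Thread(
        target=run_hooks,
        args=("mysql/0", lock, busy_hooks, log, yield_seconds),
    )
    waiting = threading.Thread(
        target=run_hooks,
        args=("hacluster/0", lock, waiting_hooks, log, yield_seconds),
    )
    busy.start()
    time.sleep(0.002)  # let the busy unit take the lock first
    waiting.start()
    busy.join()
    waiting.join()
    return log


if __name__ == "__main__":
    # 0.05 s is an arbitrary value for the demo; the proposal was 0.5 s.
    print(simulate(yield_seconds=0.05))
```

With the pause in place, the subordinate's single hook should land somewhere in the middle of the principal's run rather than after all of it, which is the behaviour I'd expect to unblock clustering sooner.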

On Mon, Oct 30, 2017 at 3:55 PM, Tim Penhey <email address hidden>
wrote:

> There is one thing juju could do, and that is to add a sleep for half a
> second once the hook execution lock is released. This would allow other
> units on the machine to acquire the lock and progress, but I'm not sure
> this would result in a faster final deployment.
>
> https://bugs.launchpad.net/bugs/1717590
>
> Title:
> subordinate juju agent does not start when principal is very busy
>
> Status in OpenStack hacluster charm:
> Invalid
> Status in OpenStack keystone charm:
> New
> Status in OpenStack percona-cluster charm:
> Fix Released
> Status in juju:
> Triaged
> Status in Telegraf Charm:
> Incomplete
>
> Bug description:
> When a principal is very busy immediately after deploy, it is possible
> that subordinates might never get a Juju agent started.
>
>
> =====
>
> juju version: 2.2.4
> percona-cluster charm: cs:percona-cluster-254
>
> =Description=
> This is a bundle using percona-cluster and hacluster:
> http://paste.ubuntu.com/25542493/
>
> It gets stuck, so far for about an hour, in a state where hacluster-pxc
> is not set up because the hacluster unit on the mysql/0 host never gets to
> run its hooks to set it up:
> http://paste.ubuntu.com/25542486/
>
> The mysql/0 unit fires continuously; here is the log for it:
> http://paste.ubuntu.com/25542506/
>
> =To Reproduce=
> Deploy the bundle shown above. I don't know for sure yet whether it
> happens every time.
>
> =Partial Workaround=
> I have been able to work around this for a manual deployment by stopping
> the jujud-unit-mysql-0 service, letting the hacluster charm do its thing,
> then restarting the service. Unfortunately, this is difficult to automate.