"juju status" slows down non-linearly with new units/services/etc

Bug #1097015 reported by Ryan Finnie
Affects: pyjuju
Status: Triaged
Importance: Low
Assigned to: Unassigned
Milestone: none

Bug Description

We have a large-ish Juju environment (with MAAS): 33 machines, 10 primary services. Previously, "juju status" on this environment took about 10 seconds. After adding a simple subordinate charm to all primary services, it now takes 1 minute 47 seconds for "juju status" to complete.

Revision history for this message
Sidnei da Silva (sidnei) wrote :

FYI, I've confirmed a slowdown, though not as noticeable. On my environment with 13 instances, juju status takes ~7.5s with no subordinates. Each extra subordinate adds 0.8 to 1s to the time juju status takes to run; with 7 subordinates, juju status takes 12s+.

Tom Haddon (mthaddon)
tags: added: canonical-webops
Revision history for this message
Kapil Thangavelu (hazmat) wrote :

Moving to the remote websocket API turns this into an O(n) op. I'm a little
unclear whether this can be pushed forward for pyjuju.

Revision history for this message
Mark Mims (mark-mims) wrote :

Known problem

Try limiting the scope of your status call; it makes life a little more bearable in a large environment:

    juju status <service_name>

Also, maybe consider more, smaller environments :) /me runs
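
For what it's worth, a concrete example of the scoping suggestion above (the service name "mysql" is a stand-in; only the juju status <service_name> form comes from this thread):

    time juju status            # full environment: slow at scale
    time juju status mysql      # scoped to one service: far less topology to walk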

Changed in juju:
status: New → Confirmed
Revision history for this message
Paul Collins (pjdc) wrote :

Unfortunately this environment is as small as it can be; there are no disconnected clumps of unrelated services.

We've worked around it in a couple ways. We're learning to limit scope where we can, and we've added a wrapper so that a bare "juju status" will read from a cached copy if it's considered recent enough.

FWIW, we've deployed more subordinates since the original report and the run time is now up to around 6 minutes.
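
A minimal sketch of the kind of wrapper described above (the script name, cache path, and maximum age are arbitrary choices for this example, not the actual script in use):

    #!/bin/sh
    # juju-status-cached: serve a recent cached copy for a bare "juju status";
    # anything with arguments (a scoped status) goes straight to juju.
    CACHE="$HOME/.juju-status-cache"
    MAX_AGE=300   # seconds a cached copy is considered "recent enough"

    if [ "$#" -eq 0 ] && [ -f "$CACHE" ] && \
       [ $(( $(date +%s) - $(stat -c %Y "$CACHE") )) -lt "$MAX_AGE" ]; then
        cat "$CACHE"                    # bare call, cache is fresh: reuse it
    elif [ "$#" -eq 0 ]; then
        juju status | tee "$CACHE"      # bare call, cache stale or missing: refresh it
    else
        juju status "$@"                # scoped call: always query the environment
    fi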

Revision history for this message
Mark Mims (mark-mims) wrote :

On Wed, Jan 30, 2013 at 10:41:28PM -0000, Paul Collins wrote:
> Unfortunately this environment is as small as it can be; there are no
> disconnected clumps of unrelated services.
>
> We've worked around it in a couple ways. We're learning to limit scope
> where we can, and we've added a wrapper so that a bare "juju status"
> will read from a cached copy if it's considered recent enough.
Please consider upstreaming this into juju-jitsu?

Revision history for this message
Martin Packman (gz) wrote :

Caused by juju doing operations that require client-to-bootstrap-node communication for every machine/unit/relation. Kapil landed a change a while ago that reduced the provider API roundtrips from one per machine to one in total:

<https://code.launchpad.net/~hazmat/juju/stat-potato/+merge/118258>

Unfortunately this didn't help much, as the majority of the time is now spent in ZooKeeper communication, which still has this issue.

As the new Go version of juju replaces the underlying infrastructure with something else, it shouldn't suffer from the same issue. However, we're due to do some scale testing to see exactly how it behaves for larger deployments.
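
To illustrate the difference described above, here is a rough sketch of per-machine roundtrips versus one batched call; the endpoints are invented purely for this example and are not juju's or any provider's real API:

    API=https://provider.example/api     # invented endpoint, illustration only

    # one roundtrip per machine: wall-clock time grows with environment size
    for m in $(seq 1 33); do
        curl -s "$API/machines/$m" > /dev/null
    done

    # one batched roundtrip for the whole environment, regardless of size
    curl -s "$API/machines" > /dev/null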

Changed in juju:
importance: Undecided → High
Haw Loeung (hloeung)
tags: added: canonical-webops-juju
removed: canonical-webops
Curtis Hovey (sinzui)
Changed in juju:
status: Confirmed → Triaged
importance: High → Low