Juju status slow on large model

Bug #1865172 reported by Tim Penhey
Affects: Canonical Juju
Status: Fix Released
Importance: High
Assigned to: Tim Penhey
Milestone: 2.7.5

Bug Description

The code that fetches the network interfaces does a database query to load the space name for every interface.

Line 620 in apiserver/facades/client/client/status.go

Revision history for this message
John A Meinel (jameinel) wrote :

The code already does model.AllSubnets() at the beginning, but as it iterates all of the machine interfaces it then calls interface.subnet.SpaceName(), which does a DB query to resolve the SpaceID on the subnet doc into the SpaceName in the space doc.
We could do the same caching of subnet CIDR to SpaceName. This is dangerous in the long term if we ever want to support multiple Networks inside a model, but it would solve the immediate problem. (There is already a State.AllSpaces() that would let us load it in a single pass.)
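
A minimal sketch of that caching, using illustrative stand-in types rather than the real state API:

    package main

    import "fmt"

    // Illustrative stand-ins for the state documents; the real Juju types differ.
    type space struct{ id, name string }
    type nic struct{ subnetSpaceID string }

    // buildSpaceNames turns one bulk read of spaces (e.g. State.AllSpaces) into an
    // in-memory lookup, so the per-interface loop never touches the DB again.
    func buildSpaceNames(spaces []space) map[string]string {
        names := make(map[string]string, len(spaces))
        for _, sp := range spaces {
            names[sp.id] = sp.name
        }
        return names
    }

    func main() {
        spaces := []space{{id: "0", name: "alpha"}, {id: "1", name: "db"}}
        nics := []nic{{subnetSpaceID: "1"}, {subnetSpaceID: "0"}}

        names := buildSpaceNames(spaces) // one query's worth of data, read once

        for _, n := range nics {
            // Map lookup instead of a per-interface SpaceName() query.
            fmt.Println(names[n.subnetSpaceID])
        }
    }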

Looking at it a different way, Status also doesn't filter down to just the machines we care about: if you do 'juju status ubuntu/0' it will read all the machine interfaces on all machines to build the map, and then filter it down to just the instances we care about.

So ideally we would figure out the machines we care about, then use those ids to load all the interfaces we care about, build the set of the subnet ids, use that to load just the subnets we care about, and use that to load just the spaces that we care about.
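
A rough sketch of that narrowing step, again with stand-in types rather than the real state API: filter the interfaces by the requested machines first, so only the subnets and spaces they reference ever get loaded.

    package main

    import "fmt"

    // Illustrative stand-ins; the real state documents and query helpers differ.
    type iface struct{ machineID, subnetID string }

    // subnetIDsFor collects the distinct subnet IDs referenced by the interfaces
    // of only the machines the status call actually asked about.
    func subnetIDsFor(ifaces []iface, wanted map[string]bool) map[string]bool {
        ids := make(map[string]bool)
        for _, i := range ifaces {
            if wanted[i.machineID] {
                ids[i.subnetID] = true
            }
        }
        return ids
    }

    func main() {
        // e.g. 'juju status ubuntu/0' only cares about machine "0".
        wanted := map[string]bool{"0": true}
        ifaces := []iface{
            {machineID: "0", subnetID: "s1"},
            {machineID: "7", subnetID: "s2"},
        }

        // Only "s1" survives, so only its subnet and space need loading.
        fmt.Println(subnetIDsFor(ifaces, wanted))
    }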

Revision history for this message
Joseph Phillips (manadart) wrote :

Looks like this blew out with the space ID changes.

Previously, subnet.SpaceName would access the local doc. Now that it is an ID, we go to Mongo to look up the name.

The Backend interface already implements SpaceLookup, so to prevent this, we just retrieve SpaceInfos before the loop, then look each one up with SpaceInfos.GetByID inside.
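
A rough sketch of that shape, with minimal stand-ins for SpaceInfos and GetByID (the real core/network types carry more fields and differ in detail):

    package main

    import "fmt"

    // Minimal stand-ins for the lookup described above.
    type spaceInfo struct{ id, name string }
    type spaceInfos []spaceInfo

    // GetByID returns the space with the given ID, or nil if it is unknown.
    func (s spaceInfos) GetByID(id string) *spaceInfo {
        for i := range s {
            if s[i].id == id {
                return &s[i]
            }
        }
        return nil
    }

    func main() {
        // Retrieved once, before iterating the machine interfaces.
        infos := spaceInfos{{id: "0", name: "alpha"}, {id: "2", name: "db"}}

        for _, subnetSpaceID := range []string{"2", "0", "2"} {
            // In-memory lookup inside the loop; no per-interface Mongo read.
            if sp := infos.GetByID(subnetSpaceID); sp != nil {
                fmt.Println(sp.name)
            }
        }
    }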

Revision history for this message
John A Meinel (jameinel) wrote :

https://github.com/juju/juju/pull/11260 possible fix (needs testing)

Revision history for this message
John A Meinel (jameinel) wrote : Re: [Bug 1865172] Re: Juju status slow on large model

This wasn't quite sufficient. We have the same problem during Application.EndpointBindings, because it loads all spaces for every application.

John
=:->


Revision history for this message
Richard Harding (rharding) wrote :

From John:

Here's the list of things that are obviously scaling incorrectly:

fetchAllApplicationsAndUnits reads all spaces for each application to map the binding space ID to the space name (cache the space names and pass in the lookup map)

fetchAllApplicationsAndUnits reads the charm for each application one-by-one. (read the charms in bulk, and then use that map lookup)

fetchNetworkInterface could share the spaceIDtoSpaceName map

fetchRelations iterates remoteApplications (status.go:832)

FullStatus -> modelStatus -> model.Config() reads settings{e}

FullStatus -> reads statuses for the model even though we've already read all statuses

makeMachineStatus -> reads instanceData and constraints once per Machine (and instanceData *multiple* times for the same Machine): InstanceNames, HardwareCharacteristics, and CharmProfiles all read instanceData (see the sketch after this list)

processApplication calls Application.Charm which rereads Charm

processUnits
    Unit.publicAddress rereads the Machine object
    Unit.openedPorts rereads openedPorts (we've batch read openedPorts already)
    Unit.AllAddresses reads cloudContainer information per unit

processRelations
    reads relationStatus for each relation
    storage information is a separate API call that can't share the cache
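
To illustrate the per-machine item above (instanceData read multiple times for the same Machine), here is a minimal sketch of reading the document once and letting the accessors share it; the types are stand-ins, not the real state API:

    package main

    import "fmt"

    // Illustrative stand-in for the per-machine provisioning document; the real
    // instanceData doc and machine accessors differ.
    type instanceData struct {
        instanceName  string
        hardware      string
        charmProfiles []string
    }

    // machineStatus carries instanceData fetched once (ideally via one bulk read
    // for all machines), so the accessors below stop re-reading the same doc.
    type machineStatus struct {
        data instanceData
    }

    func (m machineStatus) InstanceName() string    { return m.data.instanceName }
    func (m machineStatus) Hardware() string        { return m.data.hardware }
    func (m machineStatus) CharmProfiles() []string { return m.data.charmProfiles }

    func main() {
        // One read per machine ...
        m := machineStatus{data: instanceData{
            instanceName: "juju-1234-0",
            hardware:     "cores=2 mem=4G",
        }}
        // ... reused by every accessor instead of three separate queries.
        fmt.Println(m.InstanceName(), m.Hardware(), m.CharmProfiles())
    }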

Revision history for this message
Canonical Juju QA Bot (juju-qa-bot) wrote :

One PR has landed already that deals with the per-unit additional queries.

I have another to propose as soon as the 2.7.4 release is out that addresses the per-machine additional queries.

Changed in juju:
status: Triaged → In Progress
milestone: 2.7.4 → 2.7.5
Revision history for this message
Tim Penhey (thumper) wrote :

And this is how I found out that I was still logged in as the bot.

/me sighs.

Revision history for this message
Tim Penhey (thumper) wrote :

While I've not fixed everything, addressing the per-unit and per-machine additional queries should speed this up significantly.

Changed in juju:
status: In Progress → Fix Committed
Changed in juju:
status: Fix Committed → Fix Released