reporting messages can slow down operations greatly

Bug #1606999 reported by Scott Moser
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
MAAS
Confirmed
High
Unassigned

Bug Description

When looking into bug 1604962 I investigated cloud-init and curtin logs and looked at timestamps.
In Adams scenario shown at
  https://bugs.launchpad.net/maas/+bug/1604962/comments/16
posting a message back to maas was taking up to 7 seconds.

cloud-init and curtin during an installation can be expected to post dozens of messages. if each of those took just 5 seconds the group of only 12 would make an installation take 60 seconds longer than it needed to.

This can be considered a "client" problem (curtin and cloud-init) in some respects. These clients could definitely background their posting of data so that they can go on. However, if they do that at some point they probably should verify that all messages were correctly posted, so its possible that backgrounding the posting wouldn't actually help.

Adam's system I think was a "orange box", with 10 clients all installing. That does not seem like enough load to account for 5+ second posts.

Related bugs:
  * bug 1604962: node set to "failed deployment" for no visible reason

Revision history for this message
Ryan Harper (raharper) wrote : Re: [Bug 1606999] [NEW] reporting messages can slow down operations greatly

In my add reporting to vmtests branch, I can reproduce this by adding a
time.sleep(1) in our in-tree webhook server.

XenialBasic takes ~65 seconds to install

Adding the sleep to the webserver on each post, total time is 160 seconds.

On Wed, Jul 27, 2016 at 12:21 PM, Scott Moser <email address hidden> wrote:

> Public bug reported:
>
> When looking into bug 1604962 I investigated cloud-init and curtin logs
> and looked at timestamps.
> In Adams scenario shown at
> https://bugs.launchpad.net/maas/+bug/1604962/comments/16
> posting a message back to maas was taking up to 7 seconds.
>
> cloud-init and curtin during an installation can be expected to post
> dozens of messages. if each of those took just 5 seconds the group of
> only 12 would make an installation take 60 seconds longer than it needed
> to.
>
> This can be considered a "client" problem (curtin and cloud-init) in
> some respects. These clients could definitely background their posting
> of data so that they can go on. However, if they do that at some point
> they probably should verify that all messages were correctly posted, so
> its possible that backgrounding the posting wouldn't actually help.
>
> Adam's system I think was a "orange box", with 10 clients all
> installing. That does not seem like enough load to account for 5+
> second posts.
>
> Related bugs:
> * bug 1604962: node set to "failed deployment" for no visible reason
>
> ** Affects: cloud-init
> Importance: Undecided
> Status: New
>
> ** Affects: curtin
> Importance: Undecided
> Status: New
>
> ** Affects: maas
> Importance: Undecided
> Status: New
>
> ** Also affects: cloud-init
> Importance: Undecided
> Status: New
>
> ** Also affects: curtin
> Importance: Undecided
> Status: New
>
> --
> You received this bug notification because you are subscribed to curtin.
> Matching subscriptions: curtin-bugs-all
> https://bugs.launchpad.net/bugs/1606999
>
> Title:
> reporting messages can slow down operations greatly
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/cloud-init/+bug/1606999/+subscriptions
>

Changed in curtin:
importance: Undecided → High
status: New → Confirmed
Revision history for this message
Gavin Panella (allenap) wrote :

MAAS should definitely be quicker here. Can you find out what was being POSTed back to MAAS? It may be that a particular processing task was to blame, e.g. evaluating tags.

Changed in maas:
status: New → Incomplete
importance: Undecided → High
Revision history for this message
Ryan Harper (raharper) wrote : Re: [Bug 1606999] Re: reporting messages can slow down operations greatly

AFAIK, just the various event points mostly:

{
 "origin": "curtin",
 "timestamp": 1440688425.6038516,
 "event_type": "start",
 "name": "cmd-install",
 "description": "curtin command install",
 "level": "INFO"
}

Then only larger post is at the end where we do a postfiles and send the
entire curtin log file, but still in the kilobyte range.

The scenario involves an orange box with multiple clients (installs)
talking to a single maas; so possibly high load on the maas instance doing
other things as well.

On Thu, Jul 28, 2016 at 6:07 AM, Gavin Panella <email address hidden>
wrote:

> MAAS should definitely be quicker here. Can you find out what was being
> POSTed back to MAAS? It may be that a particular processing task was to
> blame, e.g. evaluating tags.
>
> ** Changed in: maas
> Status: New => Incomplete
>
> ** Changed in: maas
> Importance: Undecided => High
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1606999
>
> Title:
> reporting messages can slow down operations greatly
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/cloud-init/+bug/1606999/+subscriptions
>

Revision history for this message
Blake Rouse (blake-rouse) wrote :

Another bug in MAAS was making the POST process slow because the CPU was
maxed out. I don't think MAAS genuinely takes a long time.

On Thu, Jul 28, 2016 at 1:48 PM, Ryan Harper <email address hidden>
wrote:

> AFAIK, just the various event points mostly:
>
> {
> "origin": "curtin",
> "timestamp": 1440688425.6038516,
> "event_type": "start",
> "name": "cmd-install",
> "description": "curtin command install",
> "level": "INFO"
> }
>
> Then only larger post is at the end where we do a postfiles and send the
> entire curtin log file, but still in the kilobyte range.
>
> The scenario involves an orange box with multiple clients (installs)
> talking to a single maas; so possibly high load on the maas instance doing
> other things as well.
>
>
> On Thu, Jul 28, 2016 at 6:07 AM, Gavin Panella <
> <email address hidden>>
> wrote:
>
> > MAAS should definitely be quicker here. Can you find out what was being
> > POSTed back to MAAS? It may be that a particular processing task was to
> > blame, e.g. evaluating tags.
> >
> > ** Changed in: maas
> > Status: New => Incomplete
> >
> > ** Changed in: maas
> > Importance: Undecided => High
> >
> > --
> > You received this bug notification because you are subscribed to the bug
> > report.
> > https://bugs.launchpad.net/bugs/1606999
> >
> > Title:
> > reporting messages can slow down operations greatly
> >
> > To manage notifications about this bug go to:
> > https://bugs.launchpad.net/cloud-init/+bug/1606999/+subscriptions
> >
>
> --
> You received this bug notification because you are subscribed to MAAS.
> https://bugs.launchpad.net/bugs/1606999
>
> Title:
> reporting messages can slow down operations greatly
>
> Status in cloud-init:
> New
> Status in curtin:
> Confirmed
> Status in MAAS:
> Incomplete
>
> Bug description:
> When looking into bug 1604962 I investigated cloud-init and curtin logs
> and looked at timestamps.
> In Adams scenario shown at
> https://bugs.launchpad.net/maas/+bug/1604962/comments/16
> posting a message back to maas was taking up to 7 seconds.
>
> cloud-init and curtin during an installation can be expected to post
> dozens of messages. if each of those took just 5 seconds the group of
> only 12 would make an installation take 60 seconds longer than it
> needed to.
>
> This can be considered a "client" problem (curtin and cloud-init) in
> some respects. These clients could definitely background their posting
> of data so that they can go on. However, if they do that at some
> point they probably should verify that all messages were correctly
> posted, so its possible that backgrounding the posting wouldn't
> actually help.
>
> Adam's system I think was a "orange box", with 10 clients all
> installing. That does not seem like enough load to account for 5+
> second posts.
>
> Related bugs:
> * bug 1604962: node set to "failed deployment" for no visible reason
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/cloud-init/+bug/1606999/+subscriptions
>

Revision history for this message
Scott Moser (smoser) wrote :

I marked this back to 'confirmed' on maas and deleted the curtin and cloud-init tasks. curtin and cloud-init are reporting events as they're instructed to do.

Ryan provided the information that Gavin was after. The posts are not large at all, on the order of small kilobytes at *most*.

Changed in maas:
status: Incomplete → Confirmed
no longer affects: cloud-init
no longer affects: curtin
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.