Improve reliability of gate's npm-run-test

Bug #1587698 reported by Matt Borland
This bug affects 2 people
Affects: OpenStack Dashboard (Horizon)
Status: Fix Released
Importance: Critical
Assigned to: Matt Borland

Bug Description

The npm-run-test job in the gate has shown itself to be less reliable lately. In particular it tends to hang until the timeout, after successfully completing the tests.

There have been similar failures in the past due to a variety of reasons. Most likely there's a memory problem with the reloading of modules over and over again in Chrome. One factor that had affected this was the loading of modules that were too 'high' in the hierarchy. For example, instead of using 'horizon.app.core.images' a test would just load 'horizon.app.core' which would load ALL dependent modules, then have to destroy them.

Ideally we localize the tests so that each one loads only the modules it needs.
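
As a rough illustration of the point above, here is a toy model (plain Python, not AngularJS) of what gets instantiated when a test loads a module. Only 'horizon.app.core' and 'horizon.app.core.images' come from this report; the other module names and the shape of the graph are made up for the example, and this is not how Horizon or AngularJS actually resolve dependencies.

```python
# Toy dependency graph. Apart from 'horizon.app.core' and
# 'horizon.app.core.images', every name here is hypothetical.
DEPENDENCIES = {
    "horizon.app.core": [
        "horizon.app.core.images",
        "example.flavors",
        "example.api",
    ],
    "horizon.app.core.images": ["example.api"],
    "example.flavors": ["example.api"],
    "example.api": [],
}


def modules_loaded(root, graph=DEPENDENCIES):
    """Return every module that must be set up when a test loads `root`."""
    loaded = set()
    stack = [root]
    while stack:
        name = stack.pop()
        if name in loaded:
            continue  # already visited via another dependency path
        loaded.add(name)
        stack.extend(graph[name])
    return loaded


narrow = modules_loaded("horizon.app.core.images")
broad = modules_loaded("horizon.app.core")
print(sorted(narrow))
print(sorted(broad))
```

In this sketch the narrow load touches a strict subset of what the broad load touches, and that set is built and torn down again for every test that asks for it.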

Another option would be to reduce the number of dependencies within app.core, for example by moving resource registrations out, since they are not really needed as part of the core registrations (where common features, especially APIs, are placed).

This bug will remain in effect until npm-run-test appears to be fully stabilized.

Tags: angularjs
Changed in horizon:
assignee: nobody → Matt Borland (palecrow)
status: New → In Progress
Richard Jones (r1chardj0n3s) wrote :

Do we have a citation for the memory / module loading assertion?

Matt Borland (palecrow) wrote : Re: [Bug 1587698] Re: Improve reliability of gate's npm-run-test

I'd noticed it in a patch Cindy introduced a while ago, where she introduced
the exact opposite...generalizing the modules in tests to use higher level
modules...and that patch sustained a noticeably high rate of failure
compared with its contemporaries.

It's an anecdotal theory at best...I'm just sad this is a problem to begin
with...it's not as though the failures are intermittent in any other
environment. :P

Up for any suggestions, it seems fated that my patch approved last Monday
will never go through...it makes me sad.

Thanks!

Matt

On Tuesday, May 31, 2016, Richard Jones <email address hidden> wrote:

> Do we have a citation for the memory / module loading assertion?

OpenStack Infra (hudson-openstack) wrote : Change abandoned on horizon (master)

Change abandoned by Matt Borland (<email address hidden>) on branch: master
Review: https://review.openstack.org/323570

Matt Borland (palecrow)
tags: added: angularjs
Matt Borland (palecrow) wrote :

Krotscheck has found the problem is due to low open file ulimit settings. He has a patch here:

https://review.openstack.org/#/c/324735/
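
For anyone wanting to check the open-file limit on their own machine, a minimal sketch using Python's standard `resource` module (Unix only) follows. It does not reproduce the linked patch; the helper name `raise_soft_nofile_limit` and the target of 4096 are assumptions picked for illustration.

```python
import resource


def raise_soft_nofile_limit(target):
    """Try to raise the soft open-file limit to `target`; return the soft limit in effect."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Never ask for more than the hard limit allows.
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    # Nothing to do if the soft limit is unlimited or already high enough.
    if soft == resource.RLIM_INFINITY or soft >= target:
        return soft
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    except (ValueError, OSError):
        # Some platforms reject the request; keep the existing limit.
        return soft
    return target


# 4096 is an arbitrary example value, not the value used in the gate.
current = raise_soft_nofile_limit(4096)
print("soft open-file limit:", current)
```

The change only applies to the current process and its children, so it would need to run before the test runner and browser are launched.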

Changed in horizon:
importance: Undecided → Critical
Matt Borland (palecrow) wrote :

The above patch merged. With luck this will close this ticket.

Changed in horizon:
status: In Progress → Fix Committed
Matt Borland (palecrow) wrote :

So far, the prognosis is that the above change to up the ulimit on open files *helped* somewhat with our failure rate, but hasn't reduced it to zero by any stretch.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to horizon (master)

Fix proposed to branch: master
Review: https://review.openstack.org/346944

OpenStack Infra (hudson-openstack) wrote : Change abandoned on horizon (master)

Change abandoned by Matt Borland (<email address hidden>) on branch: master
Review: https://review.openstack.org/346944

Changed in horizon:
status: Fix Committed → Confirmed
Rob Cresswell (robcresswell-deactivatedaccount) wrote :

I think we can close this now. It doesn't appear to be a current issue.

Changed in horizon:
milestone: none → newton-rc1
status: Confirmed → Fix Released