Unclear indication in dashboard when test setup fails

Bug #1269782 reported by Mark Brown
This bug affects 1 person
Affects: LAVA Server
Status: In Progress
Importance: Medium
Assigned to: Unassigned
Milestone: (none)

Bug Description

Looking at the image report for TC2 LSK at https://validation.linaro.org/dashboard/image-reports/linux-linaro-lsk-vexpress-tc2, I see that several tests, such as gator, are showing as '-', indicating that they have not been run, but there is no indication as to why. Looking at the output from the job, this is due to failures starting the testsuites, for example:

 + lava-install-packages bootchart lsb-release pybootchartgui
 0% [Working] Err http://ports.ubuntu.com saucy InRelease

  Err http://ppa.launchpad.net saucy InRelease

  Err http://ppa.launchpad.net saucy InRelease

 0% [Working] Err http://ports.ubuntu.com saucy Release.gpg
  Could not resolve 'ports.ubuntu.com'
  Err http://ppa.launchpad.net saucy Release.gpg
  Could not resolve 'ppa.launchpad.net'
 0% [Working] Err http://ppa.launchpad.net saucy Release.gpg
  Could not resolve 'ppa.launchpad.net'

I would expect this to show up as red on the dashboard or to have a separate line item there for test prerequisites showing that some of the testsuites failed to set up, or perhaps the test setup should be included as a test within the test list for reporting. As things stand it was not at all clear to me looking at the dashboard that any attempt had been made to run the tests.

The fact that the failures happened is a separate issue to the fact that this isn't reported clearly.

Neil Williams (codehelp) wrote : Re: [Bug 1269782] [NEW] Unclear indication in dashboard when test setup fails

On Thu, 16 Jan 2014 12:17:21 -0000
Mark Brown <email address hidden> wrote:

> Public bug reported:
>
> Looking at the image report for TC2 LSK at
> https://validation.linaro.org/dashboard/image-reports/linux-linaro-lsk-
> vexpress-tc2 I see that several tests such as gator are showing as '-'
> indicating that they have not been run but there is no indication as
> to why.

There are two reasons why:

0: The relevant lava_test_shell sections were not included in the JSON
for the job

1: The test didn't get into that part of the test suite

Only the second reason would need to be indicated as a failure. When
the job JSON changes to not include that bit of YAML, that is not a
test failure.

Unfortunately, there is currently no way to reliably identify the first
case because the name of the test does not have to relate to the
filename of the YAML file.

The filter cannot tell the difference - there are simply no results
which match the filter.
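
To illustrate the gap Neil describes, here is a minimal sketch, assuming a v1-style JSON job of the kind in use at the time (the values and the YAML metadata name below are hypothetical): the job JSON names a YAML file, but the test_id that ends up in the results comes from the metadata inside that YAML, so there is no reliable way to map a missing dashboard entry back to a particular file in the submission.

    # Hypothetical fragment of a submitted job, shown as a Python dict.
    job = {
        "actions": [
            {
                "command": "lava_test_shell",
                "parameters": {
                    "testdef_repos": [
                        {
                            "git-repo": "git://git.linaro.org/qa/test-definitions.git",
                            "testdef": "ubuntu/gator.yaml",
                        }
                    ]
                },
            }
        ]
    }

    # The test_id reported in the result bundle comes from the YAML's own
    # metadata (hypothetical value here), not from the filename above.
    test_id_in_results = "gator-validation"

    submitted_files = [
        repo["testdef"]
        for action in job["actions"]
        if action["command"] == "lava_test_shell"
        for repo in action["parameters"]["testdef_repos"]
    ]
    print(submitted_files)       # ['ubuntu/gator.yaml']
    print(test_id_in_results)    # 'gator-validation' - nothing ties this back to the file

Given only the results, the filter sees 'gator-validation' (or nothing at all) and cannot tell whether 'ubuntu/gator.yaml' was ever part of the job.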

> I would expect this to show up as red on the dashboard or to have a
> separate line item there for test prerequisites showing that some of
> the testsuites failed to set up, or perhaps the test setup should be
> included as a test within the test list for reporting. As things stand
> it was not at all clear to me looking at the dashboard that any
> attempt had been made to run the tests.

Quite possibly because the filter could be supporting a historical test
which applied to previous runs but has now been removed from subsequent
runs.

Filters are not tied directly to the job submission, only to the test
results and test results can be generated by a number of different job
submissions, some of which may or may not include all of the tests used
in the other submissions. e.g. this allows one test to be run across a
variety of platforms (where the other tests would not be supportable)
whilst collating all of the results from all platforms in one filter.

It may be possible to collate the lava_test_shell data in the result
bundle in such a way as to create a list of test definitions for that
job and then annotate each test_id if the job failed.

This could help the filter distinguish between tests which were not
requested and tests which were requested but failed to run.

--

Neil Williams
=============
http://www.linux.codehelp.co.uk/
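
A minimal sketch of the annotation Neil suggests above, assuming the result bundle could carry the list of test definitions actually requested by the job (all names below are hypothetical, not real lava-server code):

    def annotate(filter_tests, requested, with_results):
        """Classify each test_id the image report expects to show.

        filter_tests : test_ids the filter/report is looking for
        requested    : test_ids listed in the job submission
        with_results : test_ids that actually appear in the result bundle
        """
        status = {}
        for test_id in filter_tests:
            if test_id in with_results:
                status[test_id] = "ran"
            elif test_id in requested:
                status[test_id] = "requested but produced no results"  # could show as red
            else:
                status[test_id] = "not part of this submission"        # keep the '-'
        return status

    print(annotate(
        filter_tests={"gator", "bootchart", "pwrmgmt"},
        requested={"gator", "bootchart"},
        with_results={"bootchart"},
    ))

Today the bundle effectively only provides the equivalent of 'with_results', which is why the dashboard can only show '-'.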

Mark Brown (broonie) wrote :

| Unfortunately, there is currently no way to reliably identify the first
| case because the name of the test does not have to relate to the
| filename of the YAML file.

| The filter cannot tell the difference - there are simply no results
| which match the filter.

As a user of this stuff I have no idea what a "filter" is...

| It may be possible to collate the lava_test_shell data in the result
| bundle in such a way as to create a list of test definitions for that
| job and then annotate each test_id if the job failed.

| This could help the filter distinguish between tests which were not
| requested and tests which were requested but failed to run.

...I think the above matches what I'd thought was going on here. I had thought that we were scheduling a bunch of existing testsuites that LAVA knows about to run.

Alan Bennett (akbennett) wrote : Re: [Bug 1269782] Re: Unclear indication in dashboard when test setup fails

On 16 January 2014 11:31, Mark Brown <email address hidden> wrote:

> | Unfortunately, there is currently no way to reliably identify the first
> | case because the name of the test does not have to relate to the
> | filename of the YAML file.
>
> | The filter cannot tell the difference - there are simply no results
> | which match the filter.
>
> As a user of this stuff I have no idea what a "filter" is...
>

FWIW:
https://validation.linaro.org/static/docs/filters-reports.html?highlight=filter

>
> | It may be possible to collate the lava_test_shell data in the result
> | bundle in such a way as to create a list of test definitions for that
> | job and then annotate each test_id if the job failed.
>
> | This could help the filter distinguish between tests which were not
> | requested and tests which were requested but failed to run.
>
> ...I think the above matches what I'd thought was going on here. I had
> thought that we were scheduling a bunch of existing testsuites that LAVA
> knows about to run.
>

--

Alan Bennett, Engineering Manager, Linaro LAVA Team
Linaro.org <http://www.linaro.org/> | Open source software for ARM SoCs
Follow Linaro: Facebook <http://www.facebook.com/pages/Linaro> | Twitter <http://twitter.com/#%21/linaroorg> | Blog <http://www.linaro.org/linaro-blog/>
irc: akbennett | ...


Mark Brown (broonie) wrote :

OK, having peered at the job definitions further, I see what you mean about unclear data... shouldn't the results bundle just always include at least the fact that the lava_test_shell was in there, with separate markup for skipped and flunked tests (perhaps a convention of marking the test setup as a test would be enough here)?

Neil Williams (codehelp) wrote :

On Thu, 16 Jan 2014 18:31:38 -0000
Mark Brown <email address hidden> wrote:

> | It may be possible to collate the lava_test_shell data in the result
> | bundle in such a way as to create a list of test definitions for
> | that job and then annotate each test_id if the job failed.
>
> | This could help the filter distinguish between tests which were not
> | requested and tests which were requested but failed to run.
>
> ...I think the above matches what I'd thought was going on here. I had
> thought that we were scheduling a bunch of existing testsuites that
> LAVA knows about to run.

Yes, but each scheduling operation is open to anyone to re-use
testsuites in different LAVA jobs. Therefore, more results can exist
for the filter to match. (Filters are the database queries behind the
image reports - the filter collates the test suite results into sets
which provide the data for the reports.)

Your scheduled tests are not necessarily the only submissions which
would match any particular filter - anyone is free to copy your
definitions and re-use them (in whole or in part, modified in some
ways) in their own tests. Results are collated into bundle streams,
many of which are public access - so these modified tests can be run by
anyone in LAVA and submitted to any public bundle stream, the filter
then picks up all matches, possibly including tests scheduled by
someone else.

Therefore, LAVA cannot assume that the same scheduled job always
contains the same tests; most do not. Even if it could, users can
change the set of tests submitted for any one job at any time.

Your specific bunch of existing testsuites may not change (often or at
all) but most sets change frequently. There will always be situations
where some tests expected to be in any one filter will simply be
omitted from the submission by the user.

(BTW, worth filing a bug against the documentation which Alan linked in
his reply to make this clear.)

--

Neil Williams
=============
http://www.linux.codehelp.co.uk/
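
As a rough conceptual sketch of the filter behaviour described above (this is an illustration, not the real lava-server query code): a filter matches on attributes of the results themselves - bundle stream, test name and, optionally, device type - and never on which job or which submitter produced them.

    # Hypothetical, simplified view of test runs pulled from result bundles.
    test_runs = [
        {"stream": "/anonymous/lsk/", "device_type": "vexpress-tc2",
         "test_id": "gator", "submitter": "mark"},
        {"stream": "/anonymous/lsk/", "device_type": "vexpress-tc2",
         "test_id": "gator", "submitter": "someone-else"},
        {"stream": "/private/team/x/", "device_type": "panda",
         "test_id": "gator", "submitter": "mark"},
    ]

    def filter_matches(runs, stream, test_id, device_type=None):
        """Collate every result matching the filter, whoever submitted the job."""
        return [
            run for run in runs
            if run["stream"] == stream
            and run["test_id"] == test_id
            and (device_type is None or run["device_type"] == device_type)
        ]

    # The first two runs match, even though only one came from Mark's own job.
    print(filter_matches(test_runs, "/anonymous/lsk/", "gator", "vexpress-tc2"))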

Mark Brown (broonie) wrote :

| Yes, but each scheduling operation is open to anyone to re-use
| testsuites in different LAVA jobs. Therefore, more results can exist
| for the filter to match. (Filters are the database queries behind the
| image reports - the filter collates the test suite results into sets
| which provide the data for the reports.)

I think the biggest UX issue here is that I'm having a hard time matching this, and therefore the conclusions you draw from it, with what I'm looking at in the UI. What I'm doing is going to an image report like:

https://validation.linaro.org/dashboard/image-reports/linux-linaro-lsk-vexpress-tc2

There I can see a list of builds which, if I click through, are linked to specific jobs that LAVA ran. This means that LAVA knows exactly what testsuites were run in that job (since it was what ran them), which in turn means that it should be able to tell me if some of them generated errors and bombed out during their setup phase.

More generally LAVA is the one running jobs so it really ought to know what testsuites it tried to run.

| Your specific bunch of existing testsuites may not change (often or at
| all) but most sets change frequently. There will always be situations
| where some tests expected to be in any one filter will simply be
| omitted from the submission by the user.

Sure, but that doesn't mean that if a testsuite is run and then fails during the environment setup, that information should be discarded.

If I were doing this by searching for results of a given testsuite what you're saying would be a bit easier to relate to but the UI I'm going through shows results organised by job.

Mark Brown (broonie) wrote :

Looking at the filters documentation isn't particularly enlightening, by the way - the filters being used to generate the image report aren't visible in the UI, so I can't inspect them. This is one of the reasons why I said I didn't know what a filter is: it's not obvious that the image report is built from a filter or filters, and the "Filters" section of the image report page looks like a way of filtering the results in the image report.

In any case I would have expected that if we were filtering on results from testsuite X then a failure in setup for testsuite X would match.

Neil Williams (codehelp) wrote :

On Fri, 17 Jan 2014 11:31:14 -0000
Mark Brown <email address hidden> wrote:

> | Yes, but each scheduling operation is open to anyone to re-use
> | testsuites in different LAVA jobs. Therefore, more results can exist
> | for the filter to match. (Filters are the database queries behind
> | the image reports - the filter collates the test suite results into
> | sets which provide the data for the reports.)
>
> I think the biggest UX issue I'm having here is that I'm having a hard
> time matching this and therefore the conclusions you draw from it with
> what I'm looking at in the UI. What I'm doing is going to an image
> report like:
>
> https://validation.linaro.org/dashboard/image-reports/linux-linaro-lsk-
> vexpress-tc2
>
> There I can see a list of builds which if I click through are linked
> to specific jobs that LAVA ran. This means that LAVA knows exactly
> what testsuites were run in that job (since it was what ran them)
> which in turn means that it should be able to tell me if some of them
> generated errors and bombed out during their setup phase.
>
> More generally LAVA is the one running jobs so it really ought to know
> what testsuites it tried to run.

During the job execution, yes, it does know this and the fix for this
bug is to make this clearer in the final image reports, possibly by
retaining the original list of test definitions passed to the job in
the final result bundle so that the filter and then the image report
can indicate the difference between a test definition which was not
submitted as part of the job and a test definition which was submitted
but failed to provide any results.

> | Your specific bunch of existing testsuites may not change (often or
> | at all) but most sets change frequently. There will always be
> | situations where some tests expected to be in any one filter will
> | simply be omitted from the submission by the user.
>
> Sure, but that doesn't mean that if a testsuite is run and then fails
> during the environment setup then that information should be
> discarded.
>
> If I were doing this by searching for results of a given testsuite
> what you're saying would be a bit easier to relate to but the UI I'm
> going through shows results organised by job.

Under the hood, the filter is indeed searching for results of a given
testsuite, with extra layers for particular device types and particular
bundle streams. The testsuite then matches a bundle which contains
details of the job.

This is not obvious, so that is another part of the bug.

--

Neil Williams
=============
http://www.linux.codehelp.co.uk/

Mark Brown (broonie) wrote :

| During the job execution, yes, it does know this and the fix for this
| bug is to make this clearer in the final image reports, possibly by
| retaining the original list of test definitions passed to the job in
| the final result bundle so that the filter and then the image report
| can indicate the difference between a test definition which was not
| submitted as part of the job and a test definition which was submitted
| but failed to provide any results.

Yes, that would definitely resolve the issue.

Neil Williams (codehelp) wrote :

On Fri, 17 Jan 2014 12:02:29 -0000
Mark Brown <email address hidden> wrote:

> Looking at the filters documentation isn't particularly enlightening
> by the way - the filters being used to generate the image report
> aren't visible in the UI so I can't inspect them.

This is changed in Image Reports 2.0, e.g.

https://validation.linaro.org/dashboard/image-charts/leg-java-armv8-akbtest

links directly to

https://validation.linaro.org/dashboard/filters/~frobware/leg-openjdk-results

which links directly to the relevant result bundles.

> This is one of the
> reasons why I said I didn't know what a filter is, it's not obvious
> that the image report is built from a filter or filters, the
> "Filters" section of the image report page looks like a way of
> filtering the results in the image report.
>
> In any case I would have expected that if we were filtering on results
> from testsuite X then a failure in setup for testsuite X would match.

The bug is that the result bundle omits data from test runs which didn't
run. We need to include empty test_run metadata for every test
definition in the original JSON submission so that the image reports
can tell you if the test was not submitted or not executed. Not
executed would be a failure or warning. Not submitted would be the
existing blank entry.

--

Neil Williams
=============
http://www.linux.codehelp.co.uk/
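
A sketch of what the extra bundle data could look like under the proposal above (the structure is illustrative, not the actual result bundle schema): every test definition named in the submitted JSON gets a test_run entry, even when it never produced results, so the image report can tell "not executed" apart from "not submitted".

    # Hypothetical result bundle fragment: one definition ran, one was submitted
    # but never produced results (e.g. lava-install-packages failed).
    bundle = {
        "test_runs": [
            {
                "test_id": "bootchart",
                "test_results": [{"test_case_id": "boot-time", "result": "pass"}],
            },
            {
                "test_id": "gator",
                "test_results": [],                       # empty placeholder
                "attributes": {"status": "not executed"}, # flag for the report
            },
        ]
    }

    for run in bundle["test_runs"]:
        if run["test_results"]:
            print(run["test_id"], "->", len(run["test_results"]), "result(s)")
        else:
            print(run["test_id"], "-> submitted but not executed (failure/warning)")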

Alan Bennett (akbennett)
Changed in lava-server:
assignee: nobody → Neil Williams (codehelp)
Mark Brown (broonie) wrote :

On 17 January 2014 12:49, Neil Williams <email address hidden> wrote:

> On Fri, 17 Jan 2014 12:02:29 -0000
> Mark Brown <email address hidden> wrote:
>
> > Looking at the filters documentation isn't particularly enlightening
> > by the way - the filters being used to generate the image report
> > aren't visible in the UI so I can't inspect them.
>
> This is changed in Image Reports 2.0, e.g.
>
> https://validation.linaro.org/dashboard/image-charts/leg-java-
> armv8-akbtest
>
> links directly to
>
> https://validation.linaro.org/dashboard/filters/~frobware/leg-openjdk-
> results
>
> which links directly to the relevant result bundles.

Yes, that's much clearer.

Changed in lava-server:
status: New → Confirmed
importance: Undecided → Medium
Neil Williams (codehelp) wrote :

Just a clarification note: during job execution, LAVA does not know how many individual test results are within any one test definition. This is because LAVA allows custom scripts which can themselves call lava-test-case, potentially within a loop; e.g. tests which process an upstream test suite may report more test results for version 1.1 than for version 0.9. So the 9/10 in the image report data table is all that LAVA can know - if the loop would have run 18/20 with a new version of the upstream suite or after a change to the custom script, this cannot be spotted.

So this bug is limited to counting the number of test definitions (which come directly from the JSON and map as test runs in the bundles) and not the number of test results (which are contained in the YAML).

I've got an initial fix which calculates the total number of test definitions contained in the JSON, includes that into the bundle attributes and displays this on the bundle detail page. Once this is in, we can integrate that into the image report views.
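
A minimal sketch of that counting step, assuming a v1-style job JSON with lava_test_shell actions (structure and names are illustrative): it counts test definitions only, since the number of individual lava-test-case results inside each definition cannot be known in advance.

    def count_test_definitions(job):
        """Count testdef entries across all lava_test_shell actions in a job."""
        total = 0
        for action in job.get("actions", []):
            if action.get("command") == "lava_test_shell":
                total += len(action.get("parameters", {}).get("testdef_repos", []))
        return total

    job = {
        "actions": [
            {"command": "deploy_linaro_image", "parameters": {}},
            {"command": "lava_test_shell",
             "parameters": {"testdef_repos": [{"testdef": "ubuntu/gator.yaml"},
                                              {"testdef": "ubuntu/bootchart.yaml"}]}},
        ]
    }
    # The total would be stored as a bundle attribute and shown on the bundle
    # detail page; here it simply prints 2.
    print(count_test_definitions(job))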

Neil Williams (codehelp)
Changed in lava-server:
status: Confirmed → In Progress
Milosz Wasilewski (mwasilew) wrote :

One note here about lava-android-test - there is at least one test there (blackbox) that produces more than one result (in LAVA naming convention). In the bundle it appears as a bunch of lava-test-shells:
https://validation.linaro.org/dashboard/streams/anonymous/mwasilew/bundles/9c878290ea3a727bfca636b35f64af9a100a3f8e/

Neil Williams (codehelp) wrote :

lava-android-test is, effectively, a custom script called from the lava_test_shell in that the rest of LAVA has no idea how many lava-test-case calls are entailed, so it cannot make any judgements. The support here would be simply to show 'blackbox' as not being executed.

It's a different question of whether lava-android-test should be merged into lava-dispatcher during the refactoring. If it does, then it would be ported to the new dispatcher object pipeline and therefore be able to describe the test runs and the test results within each run.
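
For illustration, the kind of custom script being described might look roughly like this (a sketch only; the script and test names are hypothetical, though lava-test-case is the standard lava_test_shell helper): the number of lava-test-case calls is only known at run time, so all LAVA can usefully record beforehand is whether the definition was executed at all.

    #!/usr/bin/env python
    # Hypothetical custom script invoked from a lava_test_shell 'run' step.
    import subprocess

    def run_one(name):
        # Stand-in for the real per-test work; always "passes" in this sketch.
        return True

    # Discovered at run time, so the result count can vary between versions.
    discovered_tests = ["monkey", "bluetooth", "wifi"]
    for name in discovered_tests:
        result = "pass" if run_one(name) else "fail"
        subprocess.call(["lava-test-case", name, "--result", result])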

Neil Williams (codehelp)
Changed in lava-server:
assignee: Neil Williams (codehelp) → nobody