Auto Package Testing

Don't run tests that never pass (or don't run tests with force-badtest)

Bug #1903913 reported by Balint Reczey on 2020-11-11

This bug affects 1 person

Affects: Auto Package Testing | Status: New | Importance: Undecided | Assigned to: Unassigned

Bug Description

Running tests whose results are ignored anyway wastes resources and adds delay to proposed migration.

I propose not running tests whose results will certainly be ignored. Instead, skip those tests and possibly test the whole release pocket of the archive in an additional queue.

I've collected some statistics to back this proposal in https://code.launchpad.net/~ubuntu-core-dev/+git/autopkgtest-db-reports :

$ make
sqlite3 autopkgtest.db < arch-speed.sql 2>&1 | tee arch-speed.report
Average test run time in seconds per architecture (in Groovy):
amd64|346.0
arm64|927.0
armhf|694.0
i386|371.0
ppc64el|409.0
s390x|315.0
Total time needed to run each test once on the latest version (in hours, per architecture, in Groovy):
amd64|1284.0
arm64|3331.0
armhf|2506.0
i386|849.0
ppc64el|1482.0
s390x|1122.0
sqlite3 autopkgtest.db < pass-analysis.sql 2>&1 | tee pass-analysis.report
Number of packages with tests in Focal and Groovy
amd64|14271
arm64|14099
armhf|14145
i386|13693
ppc64el|14173
s390x|14108
Number of packages in Focal and Groovy not passing a single test
amd64|989
arm64|1118
armhf|1326
i386|2318
ppc64el|1180
s390x|1311

Around 5-10% of packages with tests never passed a single test, and skipping them would take roughly that share of the load off the CI infrastructure. I can give a better estimate if needed.
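The percentages can be recomputed from the two reports above. The following Python sketch is an illustration, not part of the autopkgtest-db-reports repository. It divides the never-passing counts by the package counts. It then scales the per-architecture hours by that share, assuming (an assumption the reports do not measure) that never-passing tests take about as long as an average test:

```python
# Figures copied from pass-analysis.report above (Focal and Groovy).
packages_with_tests = {
    "amd64": 14271, "arm64": 14099, "armhf": 14145,
    "i386": 13693, "ppc64el": 14173, "s390x": 14108,
}
never_passing = {
    "amd64": 989, "arm64": 1118, "armhf": 1326,
    "i386": 2318, "ppc64el": 1180, "s390x": 1311,
}
# Hours to run each test once on the latest version, from arch-speed.report.
hours_per_full_run = {
    "amd64": 1284.0, "arm64": 3331.0, "armhf": 2506.0,
    "i386": 849.0, "ppc64el": 1482.0, "s390x": 1122.0,
}


def never_passing_share(arch: str) -> float:
    """Fraction of packages with tests that never passed on `arch`."""
    return never_passing[arch] / packages_with_tests[arch]


def estimated_hours_saved(arch: str) -> float:
    """Rough hours saved per full run. ASSUMPTION: never-passing tests
    take the same average time as all other tests."""
    return never_passing_share(arch) * hours_per_full_run[arch]


for arch in packages_with_tests:
    print(f"{arch:8} {never_passing_share(arch):6.1%} "
          f"~{estimated_hours_saved(arch):5.0f} h")
```

On these figures the share ranges from about 7% (amd64) to about 9.4% (armhf) on every architecture except i386, which is close to 17%. The hour estimates are only as good as the equal-run-time assumption.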


Iain Lane (laney) wrote on 2020-11-12:

What is the proposal? I don't find 'surely ignored' to be descriptive enough.

Is it: if the result is going to be hinted away anyway, do not queue the tests?

I could get on board with that probably. The history *can* be useful, but if we eventually couple this with 'baseline' retesting then we will get to keep some of the benefits of that.


Balint Reczey (rbalint) wrote on 2020-11-12:

The discussion on IRC about this bug's details started here:

https://irclogs.ubuntu.com/2020/11/11/%23ubuntu-release.html#t21:04

2020-11-11 22:03:59     rbalint vorlon, Laney, juliank, doko about 5-10% of load reduction would be the result of skipping never passing tests LP: #1903913 
2020-11-11 22:04:01     ubot5   Launchpad bug 1903913 in Auto Package Testing "Don't run tests that never pass (or don't run tests with force-badtest)" [Undecided,New] https://launchpad.net/bugs/1903913
2020-11-11 22:05:06     juliank rbalint: we have to try them eventually otherwise we won't notice once they start working
2020-11-11 22:05:48     rbalint juliank, please see the bug, also there is not much to lose by not noticing that quickly
2020-11-11 22:06:47     juliank rbalint: force-badtest ignoring is ok IMO, though we should try running the tests at least once a month or so
2020-11-11 22:07:01     rbalint juliank, yes, i'm proposing that
2020-11-11 22:07:06     juliank Well if we run them at least once a month, we can ignore all always failed tests
2020-11-11 22:07:27     juliank Without having to rely on continuous baseline retesting
2020-11-11 22:07:32     rbalint juliank, yes
2020-11-11 22:07:56     juliank Just have britney only schedule the test for an always-failed if it had no results for a month, essentially
2020-11-11 22:08:03     rbalint retesting everything is ~1200h on amd64 and ~3000h on arm64
2020-11-11 22:08:20     rbalint juliank, yes, and having a separate queue for that
2020-11-11 22:08:31     juliank rbalint: not a separate queue, no
2020-11-11 22:08:36     rbalint juliank, why?
2020-11-11 22:08:44     rbalint then which queue
2020-11-11 22:08:48     juliank The separate queue for the baseline retest
2020-11-11 22:09:02     juliank But I'm talking about the normal tests
2020-11-11 22:09:09     juliank e.g. foo has always failed
2020-11-11 22:09:25     juliank Bar triggers foo gets scheduled if we haven't run foo tests in a month
2020-11-11 22:09:58     juliank This avoids having to deal with the complicated continuous baseline retesting bits and extra queues
2020-11-11 22:10:35     juliank As an intermediate solution until we have baseline retesting, at which point we don't need that :(
2020-11-11 22:10:39     juliank Um :)
2020-11-11 22:10:40     rbalint if this is easier to implement, i like that
2020-11-11 22:10:54     rbalint bileto can always skip those
2020-11-11 22:11:01     juliank Depends on whether britney has the data for it
2020-11-11 22:11:26     rbalint juliank, imo the force-badtest data is good for that
2020-11-11 22:11:43     juliank That one is easy I guess
2020-11-11 22:12:06     rbalint do we have a plan? ;-)
2020-11-11 22:12:18     juliank But also doing that for any test that is always failed and not run in a month would be more effective.
2020-11-11 22:12:34     juliank The question here is whether britney knows when a test last ran
2020-11-11 22:12:51     juliank Or whether it only knows that the test always failed
2020-11-11 22:12:54     rbalint autopkgtest.db does not seem to have dates
2020-11-11 22:13:03     juliank I think it downloads the database from autopkgtest
2020-11-11 22:13:09     juliank Hmm
2020-11-11 22:13:27     juliank I'd expect there to be a timestamp given that we see dates in the history
2020-11-11 22:14:22     rbalint juliank, hm, it is encoded in run_id
2020-11-11 22:14:50     rbalint so that would work :-)
2020-11-11 22:15:33     juliank Now write a merge proposal :)
2020-11-11 22:16:09     juliank Also I have to have another look at britney triggers
2020-11-11 22:16:22     rbalint i hope someone more familiar with britney would do it ;-)
...
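For the monthly rule, britney needs to know when a test last ran, and rbalint notes that this is encoded in run_id. Below is a minimal sketch of extracting it. It assumes, without confirmation in this thread, that run_id starts with a YYYYmmdd_HHMMSS timestamp such as the hypothetical 20201111_220359_3e1a7@. Both function names are illustrative only:

```python
from datetime import datetime, timedelta, timezone


def run_id_timestamp(run_id: str) -> datetime:
    """Extract the start time from a run_id.

    ASSUMPTION: run_id begins with "YYYYmmdd_HHMMSS" (15 characters),
    e.g. "20201111_220359_3e1a7@"; times are treated as UTC.
    """
    stamp = run_id[:15]
    return datetime.strptime(stamp, "%Y%m%d_%H%M%S").replace(tzinfo=timezone.utc)


def ran_within(run_id: str, days: int, now: datetime) -> bool:
    """True if the run started less than `days` days before `now`."""
    return now - run_id_timestamp(run_id) < timedelta(days=days)
```

For example, with now = datetime(2020, 11, 12, tzinfo=timezone.utc), ran_within("20201111_220359_3e1a7@", 30, now) is True, while a run from September 2020 would fall outside the 30-day window.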


Balint Reczey (rbalint) wrote on 2020-11-12:

As I understand it, Julian's proposal is:

When collecting the triggered tests to run, skip tests that are marked as force-badtest (or force-reset-test) unless they haven't been run in a month.

IMO the monthly check can be implemented later than the skipping, since we can easily find such tests and schedule them with a script outside of britney.
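The script that runs outside britney could be quite small. In this Python sketch, the Result record shape and the stale_never_passing helper are hypothetical, because the real autopkgtest.db schema is not described in this bug. The script selects (package, architecture) pairs that have never passed and were last run more than 30 days ago, so they can be queued for a re-run:

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Result(NamedTuple):
    """One test result (hypothetical shape, not autopkgtest.db's schema)."""
    package: str
    arch: str
    ran_at: datetime
    passed: bool


def stale_never_passing(results: Iterable[Result], now: datetime,
                        max_age: timedelta = timedelta(days=30)
                        ) -> List[Tuple[str, str]]:
    """Return sorted (package, arch) pairs that never passed and whose
    most recent run is older than max_age."""
    ever_passed: Dict[Tuple[str, str], bool] = defaultdict(bool)
    last_run: Dict[Tuple[str, str], datetime] = {}
    for r in results:
        key = (r.package, r.arch)
        ever_passed[key] = ever_passed[key] or r.passed
        if key not in last_run or r.ran_at > last_run[key]:
            last_run[key] = r.ran_at
    return sorted(key for key, ts in last_run.items()
                  if not ever_passed[key] and now - ts > max_age)
```

Keeping this outside britney matches the suggestion above: the skipping can land first, and a periodic job like this can cover the "run at least once a month" requirement until britney tracks it itself.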


Julian Andres Klode (juliank) wrote on 2020-11-12:

tl;dr:

- skip force-badtest
- skip always failed tests unless the last run was over a month ago

Eventually we should reach the point where we don't run them at all as part of normal migration testing, only as part of continuous baseline retesting, so that only baseline retests can invalidate a force-reset-test; that saves the largest possible number of test runs.
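The two tl;dr rules can be combined into a single scheduling predicate. should_schedule below is a hypothetical helper, not britney's actual code. The 30-day window stands in for the "once a month or so" from the IRC discussion, and force-badtest is skipped unconditionally, as the tl;dr states:

```python
from datetime import datetime, timedelta
from typing import Optional

# "At least once a month or so", per the IRC discussion.
RETRY_AFTER = timedelta(days=30)


def should_schedule(has_force_badtest: bool, always_failed: bool,
                    last_run: Optional[datetime], now: datetime) -> bool:
    """Decide whether a triggered test should be queued (hypothetical)."""
    if has_force_badtest:
        return False  # the result would be ignored anyway
    if always_failed:
        # Re-run occasionally so we notice once the test starts passing.
        return last_run is None or now - last_run > RETRY_AFTER
    return True
```

Under continuous baseline retesting, the always_failed branch could simply return False, since baseline retests would be the only runs able to invalidate a force-reset-test.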
