Don't run tests that never pass (or don't run tests with force-badtest)

Bug #1903913 reported by Balint Reczey
This bug affects 1 person
Affects: Auto Package Testing
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

Running tests whose results are ignored anyway wastes resources and adds delay to proposed migration.

I propose not running the tests whose results are surely ignored. Instead, skip those tests and possibly perform whole-archive testing of the release pocket in an additional queue.

I've collected some statistics to back this proposal in https://code.launchpad.net/~ubuntu-core-dev/+git/autopkgtest-db-reports :

$ make
sqlite3 autopkgtest.db < arch-speed.sql 2>&1 | tee arch-speed.report
Average test run time in seconds per architecture (in Groovy):
amd64|346.0
arm64|927.0
armhf|694.0
i386|371.0
ppc64el|409.0
s390x|315.0
Total time needed to run each test once on the latest version (in hours, per architecture, in Groovy):
amd64|1284.0
arm64|3331.0
armhf|2506.0
i386|849.0
ppc64el|1482.0
s390x|1122.0
sqlite3 autopkgtest.db < pass-analysis.sql 2>&1 | tee pass-analysis.report
Number of packages with tests in Focal and Groovy
amd64|14271
arm64|14099
armhf|14145
i386|13693
ppc64el|14173
s390x|14108
Number of packages in Focal and Groovy not passing a single test
amd64|989
arm64|1118
armhf|1326
i386|2318
ppc64el|1180
s390x|1311

Around 5-10% of tests never passed, and this is roughly the percentage of load we would take off the CI infrastructure by skipping them. I can give a better estimate if needed.
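The per-architecture share behind that estimate can be recomputed from the two pass-analysis tables above. The following is only a sketch: the numbers are copied from the report, while the dictionary and function names are made up for illustration and are not part of autopkgtest-db-reports.

```python
# Figures copied from the pass-analysis report above (Focal and Groovy).
# Names are illustrative only; they do not come from the report scripts.
PACKAGES_WITH_TESTS = {
    "amd64": 14271, "arm64": 14099, "armhf": 14145,
    "i386": 13693, "ppc64el": 14173, "s390x": 14108,
}
PACKAGES_NEVER_PASSING = {
    "amd64": 989, "arm64": 1118, "armhf": 1326,
    "i386": 2318, "ppc64el": 1180, "s390x": 1311,
}


def never_passing_share(arch: str) -> float:
    """Percentage of packages with tests that never passed on this architecture."""
    return 100.0 * PACKAGES_NEVER_PASSING[arch] / PACKAGES_WITH_TESTS[arch]


if __name__ == "__main__":
    for arch in PACKAGES_WITH_TESTS:
        print(f"{arch}: {never_passing_share(arch):.1f}%")
```

With these figures the share runs from about 6.9% on amd64 to about 16.9% on i386. This counts packages, not run time, so it is only a rough proxy for the load that skipping would save.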

Iain Lane (laney) wrote :

What is the proposal? I don't find 'surely ignored' to be descriptive enough.

Is it: if the result is going to be hinted away anyway, do not queue the tests?

I could get on board with that probably. The history *can* be useful, but if we eventually couple this with 'baseline' retesting then we will get to keep some of the benefits of that.

Balint Reczey (rbalint) wrote :

The discussion on IRC about this bug's details started here:

https://irclogs.ubuntu.com/2020/11/11/%23ubuntu-release.html#t21:04

2020-11-11 22:03:59 rbalint vorlon, Laney, juliank, doko about 5-10% of load reduction would be the result of skipping never passing tests LP: #1903913
2020-11-11 22:04:01 ubot5 Launchpad bug 1903913 in Auto Package Testing "Don't run tests that never pass (or don't run tests with force-badtest)" [Undecided,New] https://launchpad.net/bugs/1903913
2020-11-11 22:05:06 juliank rbalint: we have to try them eventually otherwise we won't notice once they start working
2020-11-11 22:05:48 rbalint juliank, please see the bug, also there is not much to lose not noticing that quickly
2020-11-11 22:06:47 juliank rbalint: force-badtest ignoring is ok IMO, though we should try running the tests at least once a month or so
2020-11-11 22:07:01 rbalint juliank, yes, i'm proposing that
2020-11-11 22:07:06 juliank Well if we run them at least once a month, we can ignore all always failed tests
2020-11-11 22:07:27 juliank Without having to rely on continuous baseline retesting
2020-11-11 22:07:32 rbalint juliank, yes
2020-11-11 22:07:56 juliank Just have britney only schedule the test for an always-failed if it had no results for a month, essentially
2020-11-11 22:08:03 rbalint retesting everything is ~1200h on amd64 and ~3000h on arm64
2020-11-11 22:08:20 rbalint juliank, yes, and having a separate queue for that
2020-11-11 22:08:31 juliank rbalint: not a separate queue, no
2020-11-11 22:08:36 rbalint juliank, why?
2020-11-11 22:08:44 rbalint then which queue
2020-11-11 22:08:48 juliank The separate queue for the baseline retest
2020-11-11 22:09:02 juliank But I'm talking about the normal tests
2020-11-11 22:09:09 juliank e.g. foo has always failed
2020-11-11 22:09:25 juliank Bar triggers foo gets scheduled if we haven't run foo tests in a month
2020-11-11 22:09:58 juliank This avoids having to deal with the complicated continuous baseline retesting bits and extra queues
2020-11-11 22:10:35 juliank As an intermediate solution until we have baseline retesting, at which point we don't need that :(
2020-11-11 22:10:39 juliank Um :)
2020-11-11 22:10:40 rbalint if this is easier to implement, i like that
2020-11-11 22:10:54 rbalint bileto can always skip those
2020-11-11 22:11:01 juliank Depends on whether britney has the data for it
2020-11-11 22:11:26 rbalint juliank, imo the force-badtest data is good for that
2020-11-11 22:11:43 juliank That one is easy I guess
2020-11-11 22:12:06 rbalint do we have a plan? ;-)
2020-11-11 22:12:18 juliank But also doing that for any test that is always failed and not run in a month would be more effective.
2020-11-11 22:12:34 juliank The question here is whether britney knows when a test last ran
2020-11-11 22:12:51 juliank Or whether it only knows that the test always failed
2020-11-11 22:12:54 rbalint autopkgtest.db does not seem to have dates
2020-11-11 22:13:03 juliank I think it downloads the database from autopkgtest
2020-11-11 22:13:09 juliank Hmm
2020-...

Balint Reczey (rbalint) wrote :

As I understand it, Julian's proposal is:

When collecting the triggered tests to run, skip tests which are marked as force-badtest (or force-reset-test) unless they haven't been run in a month.

IMO the monthly check can be implemented later than the skipping, since we can easily find and schedule those tests with a script from outside of britney.

Julian Andres Klode (juliank) wrote :

tl;dr:

- skip force-badtest
- skip always failed tests unless the last run was over a month ago

Eventually we should get to the point where we don't run them at all as part of normal migration testing, but only with continuous baseline retesting -> only baseline retests will invalidate the force-reset-test; that saves the maximum amount of tests possible.
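The two tl;dr rules above could be sketched roughly as follows. This is not britney code: the `TestHistory` record, its fields, `should_schedule` and the 30-day reading of "a month" are all assumptions made for illustration, and the IRC discussion leaves open whether britney knows when a test last ran.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Assumption: "a month" is taken to mean 30 days.
RETRY_INTERVAL = timedelta(days=30)


@dataclass
class TestHistory:
    """Hypothetical per-test record; britney's real data model may differ."""
    force_badtest: bool            # a force-badtest hint applies to this test
    ever_passed: bool              # at least one recorded pass
    last_run: Optional[datetime]   # None if there is no recorded run


def should_schedule(history: TestHistory, now: datetime) -> bool:
    """Decide whether a triggered test should be queued."""
    # Rule 1: the result would be hinted away anyway, so do not queue it.
    if history.force_badtest:
        return False
    # Rule 2: an always-failed test is only retried if its last run
    # was more than RETRY_INTERVAL ago (or if it has never run).
    if not history.ever_passed:
        if history.last_run is None:
            return True
        return now - history.last_run > RETRY_INTERVAL
    # Everything else is scheduled as usual.
    return True
```

For example, an always-failed test last run ten days ago would be skipped, while the same test last run forty days ago would be queued again.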
