Scale out excess queued tests to public cloud instances
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Auto Package Testing | New | Undecided | Unassigned |
Bug Description
Autopkgtest queues frequently[1] grow to thousands of queued tests per architecture. This slows down the migration of packages to the release pocket. It also causes transitions to become entangled: while the ongoing transitions wait for their tests to complete, newly arriving packages are rebuilt against them.
The slow feedback on regressions also makes it harder to act quickly on the packages that introduce them. As a data point, when I land glibc it takes around a week for all of its tests to run.
The autopkgtest infrastructure [2] allows external workers, and out of the 6 architectures we run autopkgtests on, 4 are available in big public clouds. With preemptible/spot instances, running tests on external resources can cost even less than with on-demand instances. Scaling out the arm* tests would be hugely beneficial, since the instances for those architectures are the least able to keep up with the load.
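To make the scale-out idea concrete, here is a minimal Python sketch of how extra cloud workers could be sized per architecture. It is not how the autopkgtest infrastructure works today: `ArchQueue`, `extra_cloud_workers` and `scale_out_cost` are hypothetical names, the queue sizes, test durations and prices are made-up inputs, and the sizing rule assumes tests are independent and parallelise perfectly.

```python
import math
from dataclasses import dataclass


@dataclass
class ArchQueue:
    """Snapshot of one architecture's queue (all fields are hypothetical inputs)."""
    arch: str
    queued_tests: int
    avg_test_minutes: float  # assumed mean wall-clock time per test
    owned_workers: int       # workers already available on owned infrastructure


def extra_cloud_workers(q: ArchQueue, target_hours: float, cloud_capable: bool) -> int:
    """Estimate how many extra cloud workers would drain the queue within
    target_hours, assuming tests are independent and parallelise perfectly."""
    if not cloud_capable:
        # Architectures not offered by public clouds cannot be scaled out.
        return 0
    worker_hours = q.queued_tests * q.avg_test_minutes / 60
    needed = math.ceil(worker_hours / target_hours)
    # Owned workers already cover part of the load; never return a negative count.
    return max(0, needed - q.owned_workers)


def scale_out_cost(workers: int, hours: float, hourly_price: float) -> float:
    """Cost of running `workers` instances for `hours` at a flat hourly price
    (e.g. a spot/preemptible price vs. an on-demand price)."""
    return workers * hours * hourly_price


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    arm64 = ArchQueue("arm64", queued_tests=3000, avg_test_minutes=12.0, owned_workers=20)
    extra = extra_cloud_workers(arm64, target_hours=24, cloud_capable=True)
    # 3000 tests * 12 min = 600 worker-hours; 600 / 24 h = 25 workers; 25 - 20 owned = 5
    print(extra)
    print(scale_out_cost(extra, 24, hourly_price=0.05))
```

Calling `scale_out_cost` with a spot price and with an on-demand price for the same worker count shows the kind of saving mentioned above, while `target_hours` is the knob that trades cloud spend against how long developers wait for test results.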
The autopkgtests don't create any artifact that ends up in the archive or in released software, so there is very little reason to run them on 'trusted' or owned infrastructure [3].
The autopkgtest load will keep increasing, and IMO scaling out to public clouds is the most cost-efficient way to keep up with it. The amount of scale-out can and should be tuned to increase the productivity of the _people_ working with the tests.
[1] Any public graphs on that?
[2] https:/
[3] Yes, false negative/positive tests or trojans in the artifact tarballs are still attack surfaces, but those can be mitigated.