cron.germinate is very slow
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
Launchpad itself | Fix Released | High | Colin Watson | |
Bug Description
cron.germinate currently takes on the order of ten minutes, and really doesn't have a particularly good excuse for taking so long. Fixing this would probably be enough to let us move back to 30-minute publisher cycles for Ubuntu (as we used to have with dak, back in the dawn of time), which would increase our velocity and make some of us very happy.
The main reason it's so slow is that it runs germinate as a separate process once for each of eight flavours (Ubuntu, Kubuntu, etc., each with its own seed collection) on each of five architectures. There are a few inefficiencies inherent in this approach, but the most significant is that it has to expand dependencies and build-dependencies of the seeds that are common to all the flavours eight times as often as necessary. Since the build-dependency chain in particular of the base system winds its way through a good fraction of main, this winds up being rather a lot of duplicated work.
I've been working on this problem for some time now, and have just released germinate 2.0 to support solving it properly. The most important change here is that germinate can now process multiple seed collections for a single architecture in a single instance of the core Germinator class, which allows reusing the expansions of common seeds. While I could technically have extended the command-line interface further for this, I felt that the command-line interface was already far too complicated, and decided instead to export a public, documented, and stable Python interface which can be used for this purpose.
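As a rough illustration of why processing several seed collections in one instance helps, here is a toy Python sketch. It does not use python-germinate's interface: `SharedExpander`, `run_per_flavour`, `run_shared`, the package names, and the flavour names are all hypothetical, and real seed expansion (seed inheritance, build-dependencies, alternatives) is far more involved. It only shows that a cache shared across flavours expands a common seed once instead of once per flavour.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical toy archive: package -> direct dependencies.
DEPENDS: Dict[str, List[str]] = {
    "base-files": ["libc"],
    "bash": ["libc"],
    "libc": [],
    "desktop-a": ["toolkit-a"],
    "toolkit-a": ["libc"],
    "desktop-b": ["toolkit-b"],
    "toolkit-b": ["libc"],
}

# Hypothetical seed collections: flavour -> seed name -> seeded packages.
# The "minimal" seed is identical in both flavours.
FLAVOURS: Dict[str, Dict[str, List[str]]] = {
    "flavour-a": {"minimal": ["base-files", "bash"], "desktop": ["desktop-a"]},
    "flavour-b": {"minimal": ["base-files", "bash"], "desktop": ["desktop-b"]},
}


class SharedExpander:
    """Expands seeds to their dependency closure, caching identical seeds."""

    def __init__(self, depends: Dict[str, List[str]]) -> None:
        self.depends = depends
        self.cache: Dict[Tuple[str, Tuple[str, ...]], FrozenSet[str]] = {}
        self.expansions = 0  # number of closures actually computed

    def expand(self, seed_name: str, packages: List[str]) -> FrozenSet[str]:
        key = (seed_name, tuple(packages))
        if key not in self.cache:
            self.expansions += 1
            closure = set()
            stack = list(packages)
            while stack:
                pkg = stack.pop()
                if pkg in closure:
                    continue
                closure.add(pkg)
                stack.extend(self.depends.get(pkg, []))
            self.cache[key] = frozenset(closure)
        return self.cache[key]


def run_per_flavour(flavours, depends):
    """One fresh expander per flavour, like one process per flavour."""
    results, total = {}, 0
    for flavour, seeds in flavours.items():
        expander = SharedExpander(depends)
        results[flavour] = {s: expander.expand(s, p) for s, p in seeds.items()}
        total += expander.expansions
    return results, total


def run_shared(flavours, depends):
    """One expander for all flavours, so common seeds are expanded once."""
    expander = SharedExpander(depends)
    results = {
        flavour: {s: expander.expand(s, p) for s, p in seeds.items()}
        for flavour, seeds in flavours.items()
    }
    return results, expander.expansions
```

With this toy data both functions return the same closures, but the per-flavour version computes four expansions and the shared version three, because "minimal" is only expanded once. The saving grows with the number of flavours and with the size of the common seeds' dependency chains.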
In parallel with this, I have been writing a Python replacement for most of cron.germinate to ensure that the interface is sufficient to meet Launchpad's needs. It runs in about three minutes on my laptop and under five minutes on mawson (I haven't yet timed it on ftpmaster).
The deployment steps should be as follows:
* Get germinate 2.1 into the Launchpad PPA. I'm preparing backports that should be usable for this.
* Get germinate 2.1 deployed on relevant datacentre machines. This does not require any Launchpad changes, and so should be done early; also, there was a bug fix in 2.0 that changes the output in a few cases (I've manually verified that the changes are correct), and so it will be simpler to deploy this first so that we can more easily check that changes in the output due to the Python rewrite are harmless.
* Change launchpad-
* Land my branch.
Related branches
- Jeroen T. Vermeulen (community): Approve
Diff: 1226 lines (+950/-135), 10 files modified
cronscripts/generate-extra-overrides.py (+18/-0)
cronscripts/publishing/cron.germinate (+5/-126)
database/schema/security.cfg (+4/-0)
lib/lp/archivepublisher/config.py (+3/-0)
lib/lp/archivepublisher/scripts/generate_extra_overrides.py (+339/-0)
lib/lp/archivepublisher/tests/publisher-config.txt (+7/-0)
lib/lp/archivepublisher/tests/test_generate_extra_overrides.py (+567/-0)
lib/lp/soyuz/scripts/tests/germinate-test-data/mock-bin/germinate (+0/-5)
lib/lp/soyuz/scripts/tests/germinate-test-data/mock-lp-root/cronscripts/generate-extra-overrides.py (+5/-0)
lib/lp/soyuz/scripts/tests/test_cron_germinate.py (+2/-4)
- Gavin Panella (community): Approve
Diff: 26 lines (+7/-1), 2 files modified
debian/changelog (+6/-0)
debian/control (+1/-1)
Changed in launchpad: | |
status: | New → Triaged |
importance: | Undecided → High |
description: | updated |
Changed in launchpad: | |
status: | Triaged → In Progress |
description: | updated |
tags: | added: qa-ok removed: qa-needstesting |
As a matter of interest (although it would be a separate bug, and is lower priority), a good item of future work would be to move the process of generating extra overrides to before apt-ftparchive runs, and have it read the state of the archive from the Launchpad database rather than from the published archive on disk. This would fix the extremely long-standing problem that germinate output is itself an input to the archive state, and so there are some changes that require multiple publisher runs to publish completely.
I've tried to design python-germinate's archive interface so that this should be possible, although I haven't yet actually used it this way in practice.