BuildQueue and BuildPackageJob are irrelevant to the problem. The problem area is seen in this query:
SELECT COUNT(*)
FROM
BinaryPackageBuild
JOIN PackageBuild ON BinaryPackageBuild.package_build = PackageBuild.id
JOIN BuildFarmJob ON PackageBuild.build_farm_job = BuildFarmJob.id
JOIN Archive ON PackageBuild.archive = Archive.id
JOIN DistroArchSeries
ON BinaryPackageBuild.distro_arch_series = DistroArchSeries.id
JOIN Processor ON DistroArchSeries.processorfamily = Processor.family
WHERE
BuildFarmJob.status = 0
AND Processor.id = 5
AND Archive.enabled = true
AND Archive.require_virtualized = false;
One quick note - we should be using IS TRUE and IS FALSE instead of = TRUE and =FALSE to avoid confusion with SQL's three value boolean logic.
The sequential scan on Archive could be fixed with an index, although this isn't a major contributer to the main problem:
CREATE INDEX archive__require_virtualized__idx
ON Archive(require_virtualized);
We also can make use of an index on BuildFarmJob.status:
CREATE INDEX buildfarmjob__status__idx
ON BuildFarmJob(status);
When Archive and PackageBuild are joined, we see the estimated row count of the selected PackageBuild rows off by 30x. This gets magnified by the number of Archive rows, taking a few seconds and resulting in nearly 1 million rows in the working set instead of the estimated 5.5k. This isn't the core problem though - this is just an artifact that Archive #1 (the main Ubuntu archive) has a comparitively huge number of packages, but the PostgreSQL planner doesn't know this when estimating how many rows will match in PackageBuild.
BuildQueue and BuildPackageJob are irrelevant to the problem. The problem area is seen in this query:
SELECT COUNT(*) geBuild ild.package_ build = PackageBuild.id build_farm_ job = BuildFarmJob.id archive = Archive.id ild.distro_ arch_series = DistroArchSeries.id s.processorfami ly = Processor.family b.status = 0 require_ virtualized = false;
FROM
BinaryPacka
JOIN PackageBuild ON BinaryPackageBu
JOIN BuildFarmJob ON PackageBuild.
JOIN Archive ON PackageBuild.
JOIN DistroArchSeries
ON BinaryPackageBu
JOIN Processor ON DistroArchSerie
WHERE
BuildFarmJo
AND Processor.id = 5
AND Archive.enabled = true
AND Archive.
Query plan is at http:// paste.ubuntu. com/444486/
One quick note - we should be using IS TRUE and IS FALSE instead of = TRUE and =FALSE to avoid confusion with SQL's three value boolean logic.
The sequential scan on Archive could be fixed with an index, although this isn't a major contributer to the main problem:
CREATE INDEX archive_ _require_ virtualized_ _idx require_ virtualized) ;
ON Archive(
We also can make use of an index on BuildFarmJob. status:
CREATE INDEX buildfarmjob_ _status_ _idx status) ;
ON BuildFarmJob(
When Archive and PackageBuild are joined, we see the estimated row count of the selected PackageBuild rows off by 30x. This gets magnified by the number of Archive rows, taking a few seconds and resulting in nearly 1 million rows in the working set instead of the estimated 5.5k. This isn't the core problem though - this is just an artifact that Archive #1 (the main Ubuntu archive) has a comparitively huge number of packages, but the PostgreSQL planner doesn't know this when estimating how many rows will match in PackageBuild.
(To be continued...)