Archive:+repository-size timeout retrieving many hundreds of package sizes
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Launchpad itself |
Fix Released
|
Critical
|
Steve Kowalik |
Bug Description
Summary
=======
We see timeouts on Archive:
The page returns content like:
Number of packages:
195 source packages (417.6 MiB)
695 binary packages (786.7 MiB)
Repository size:
1.2 GiB (58.84%) of 2.0 GiB
but generating this content requires reading rows from the database that are often cold: the library file content rows are only needed when checking the repository size or when checking the quota during uploads. The current query can be further optimised but the best performance we can expect is about 2ms per file that is currently in the PPA because the library file content table cannot be clustered to sensibly increase locality of reference for archive lookups, so a PPA with 100 packages built on 3 architectures with (say) 4 binaries from each package (lib, -dev, bin, docs) will have 1200 distinct files, which is 2400ms of lookup time for the binaries alone; on top of that we pay the same 2ms per row for the other tables when they are cold (and we have data suggesting this happens multiple times a day).
Possible solutions
==================
* We could more directly model the files-in-a-ppa (would require generalising the librarian to use more tables) and thus permit clustering on the archive.
* We could cache the size in the archive itself which would open us up to skew but remove the need to access all these rows at all.
Details
=======
3 SELECT DISTINCT LibraryFileCont
GET: 3 Robots: 0 Local: 3
2 https:/
OOPS-1948AX197, OOPS-1948BB197
1 https:/
OOPS-1948DT521
Long queries to determine file size -
explain analyze SELECT DISTINCT LibraryFileCont
FROM BinaryPackageFile,
BinaryPack
LibraryFil
LibraryFil
WHERE BinaryPackagePu
AND BinaryPackagePu
AND BinaryPackagePu
AND BinaryPackageFi
AND LibraryFileAlia
-------
HashAggregate (cost=32489.
-> Nested Loop (cost=0.
-> Nested Loop (cost=0.
-> Nested Loop (cost=0.
-> Index Scan using libraryfilealia
-> Index Scan using libraryfilecont
Total runtime: 9478.267 ms
(14 rows)
Time: 9484.623 ms
Related branches
- William Grant (community): Approve (code)
-
Diff: 200 lines (+26/-50)1 file modifiedlib/lp/soyuz/model/archive.py (+26/-50)
description: | updated |
description: | updated |
summary: |
- Archive:+repository-size timeout + Archive:+repository-size timeout retrieving many hundreds of package + sizes |
description: | updated |
tags: |
added: qa-ok removed: qa-needstesting |
Changed in launchpad: | |
status: | Fix Committed → Fix Released |
Hot
explain analyze SELECT DISTINCT LibraryFileCont ent.datecreated ,
LibraryFileCo ntent.filesize,
LibraryFileCo ntent.id,
LibraryFileCo ntent.md5,
LibraryFileCo ntent.sha1 agePublishingHi story, eAlias, eContent blishingHistory .archive = 14516 blishingHistory .dateremoved IS NULL blishingHistory .binarypackager elease = BinaryPackageFi le.binarypackag erelease le.libraryfile = LibraryFileAlias.id s.content = LibraryFileCont ent.id;
QUERY PLAN ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ----- 87..32504. 58 rows=1471 width=94) (actual time=125. 946..126. 417 rows=596 loops=1) 00..32471. 48 rows=1471 width=94) (actual time=0.780..119.488 rows=1664 loops=1) 00..29693. 36 rows=1891 width=4) (actual time=0.597..91.107 rows=1664 loops=1) 00..25785. 40 rows=1891 width=4) (actual time=0.438..57.097 rows=1664 loops=1)
-> Index Scan using securebinarypac kagepublishingh istory_ _archive_ _status_ _idx on binarypackagepu blishinghistory (cost=0. 00..13471. 19 rows=1891 width=4) (actual time=0.209..35.282 rows=1664 loops=1)
Index Cond: (archive = 14516)
Filter: (dateremoved IS NULL)
-> Index Scan using binarypackagefi le_binarypackag e_idx on binarypackagefile (cost=0.00..6.50 rows=1 width=8) (actual time=0.011..0.011 rows=1 loops=1664)
Index Cond: (binarypackagef ile.binarypacka gerelease = binarypackagepu blishinghistory .binarypackager elease) s_pkey on libraryfilealias (cost=0.00..2.05 rows=1 width=8) (actual time=0.018..0.019 rows=1 loops=1664)
Index Cond: (libraryfileali as.id = binarypackagefi le.libraryfile) ent_pkey on libraryfilecontent (cost=0.00..1.46 rows=1 width=94) (actual time=0.014..0.015 rows=1 loops=1664)
Index Cond: (libraryfilecon tent.id = libraryfilealia s.content)
FROM BinaryPackageFile,
BinaryPack
LibraryFil
LibraryFil
WHERE BinaryPackagePu
AND BinaryPackagePu
AND BinaryPackagePu
AND BinaryPackageFi
AND LibraryFileAlia
-------
HashAggregate (cost=32489.
-> Nested Loop (cost=0.
-> Nested Loop (cost=0.
-> Nested Loop (cost=0.
-> Index Scan using libraryfilealia
-> Index Scan using libraryfilecont
Total runtime: 126.734 ms
(14 rows)
Time: 133.656 ms