Activity log for bug #739070

Date Who What changed Old value New value Message
2011-03-21 00:22:49 Robert Collins bug added bug
2011-05-03 03:06:48 Robert Collins description 4 SELECT DISTINCT LibraryFileContent.datecreated, LibraryFileContent.filesize, LibraryFileContent.i ... ackageFile.libraryfile = LibraryFileAlias.id AND LibraryFileAlias.content = LibraryFileContent.id: GET: 4 Robots: 0 Local: 4 2 https://launchpad.net/%7Efalk-t-j/+archive/lucid/+repository-size (Archive:+repository-size) OOPS-1904B1758, OOPS-1904K1614 1 https://launchpad.net/%7Ekubuntu-ppa/+archive/backports/+repository-size (Archive:+repository-size) OOPS-1904F1620 1 https://launchpad.net/%7Eubuntu-mozilla-daily/+archive/ppa/+repository-size (Archive:+repository-size) OOPS-1904E1676 3 SELECT DISTINCT LibraryFileContent.datecreated, LibraryFileContent.filesize, LibraryFileContent.i ... ackageFile.libraryfile = LibraryFileAlias.id AND LibraryFileAlias.content = LibraryFileContent.id: GET: 3 Robots: 0 Local: 3 2 https://launchpad.net/%7Enilarimogard/+archive/webupd8/+repository-size (Archive:+repository-size) OOPS-1948AX197, OOPS-1948BB197 1 https://launchpad.net/%7Eubuntu-mozilla-daily/+archive/ppa/+repository-size (Archive:+repository-size) OOPS-1948DT521
2011-05-03 03:08:27 Robert Collins description 3 SELECT DISTINCT LibraryFileContent.datecreated, LibraryFileContent.filesize, LibraryFileContent.i ... ackageFile.libraryfile = LibraryFileAlias.id AND LibraryFileAlias.content = LibraryFileContent.id: GET: 3 Robots: 0 Local: 3 2 https://launchpad.net/%7Enilarimogard/+archive/webupd8/+repository-size (Archive:+repository-size) OOPS-1948AX197, OOPS-1948BB197 1 https://launchpad.net/%7Eubuntu-mozilla-daily/+archive/ppa/+repository-size (Archive:+repository-size) OOPS-1948DT521   3 SELECT DISTINCT LibraryFileContent.datecreated, LibraryFileContent.filesize, LibraryFileContent.i ... ackageFile.libraryfile = LibraryFileAlias.id AND LibraryFileAlias.content = LibraryFileContent.id:    GET: 3 Robots: 0 Local: 3       2 https://launchpad.net/%7Enilarimogard/+archive/webupd8/+repository-size (Archive:+repository-size)        OOPS-1948AX197, OOPS-1948BB197       1 https://launchpad.net/%7Eubuntu-mozilla-daily/+archive/ppa/+repository-size (Archive:+repository-size)        OOPS-1948DT521 Long queries to determine file size - explain analyze SELECT DISTINCT LibraryFileContent.datecreated, LibraryFileContent.filesize, LibraryFileContent.id, LibraryFileContent.md5, LibraryFileContent.sha1 FROM BinaryPackageFile, BinaryPackagePublishingHistory, LibraryFileAlias, LibraryFileContent WHERE BinaryPackagePublishingHistory.archive = 14516 AND BinaryPackagePublishingHistory.dateremoved IS NULL AND BinaryPackagePublishingHistory.binarypackagerelease = BinaryPackageFile.binarypackagerelease AND BinaryPackageFile.libraryfile = LibraryFileAlias.id AND LibraryFileAlias.content = LibraryFileContent.id; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- HashAggregate (cost=32489.87..32504.58 rows=1471 width=94) (actual time=9477.455..9477.931 rows=596 loops=1) -> Nested Loop (cost=0.00..32471.48 rows=1471 width=94) (actual time=112.213..9470.864 rows=1664 loops=1) -> Nested Loop (cost=0.00..29693.36 rows=1891 width=4) (actual time=93.160..7408.108 rows=1664 loops=1) -> Nested Loop (cost=0.00..25785.40 rows=1891 width=4) (actual time=79.839..4577.662 rows=1664 loops=1) -> Index Scan using securebinarypackagepublishinghistory__archive__status__idx on binarypackagepublishinghistory (cost=0.00..13471.19 rows=1891 width=4) (actual time=23.987..3300.864 rows=1664 loops=1) Index Cond: (archive = 14516) Filter: (dateremoved IS NULL) -> Index Scan using binarypackagefile_binarypackage_idx on binarypackagefile (cost=0.00..6.50 rows=1 width=8) (actual time=0.765..0.765 rows=1 loops=1664) Index Cond: (binarypackagefile.binarypackagerelease = binarypackagepublishinghistory.binarypackagerelease) -> Index Scan using libraryfilealias_pkey on libraryfilealias (cost=0.00..2.05 rows=1 width=8) (actual time=1.699..1.699 rows=1 loops=1664) Index Cond: (libraryfilealias.id = binarypackagefile.libraryfile) -> Index Scan using libraryfilecontent_pkey on libraryfilecontent (cost=0.00..1.46 rows=1 width=94) (actual time=1.237..1.238 rows=1 loops=1664) Index Cond: (libraryfilecontent.id = libraryfilealias.content) Total runtime: 9478.267 ms (14 rows) Time: 9484.623 ms
2011-05-03 05:20:24 Robert Collins summary Archive:+repository-size timeout Archive:+repository-size timeout retrieving many hundreds of package sizes
2011-05-03 05:38:15 Robert Collins description   3 SELECT DISTINCT LibraryFileContent.datecreated, LibraryFileContent.filesize, LibraryFileContent.i ... ackageFile.libraryfile = LibraryFileAlias.id AND LibraryFileAlias.content = LibraryFileContent.id:    GET: 3 Robots: 0 Local: 3       2 https://launchpad.net/%7Enilarimogard/+archive/webupd8/+repository-size (Archive:+repository-size)        OOPS-1948AX197, OOPS-1948BB197       1 https://launchpad.net/%7Eubuntu-mozilla-daily/+archive/ppa/+repository-size (Archive:+repository-size)        OOPS-1948DT521 Long queries to determine file size - explain analyze SELECT DISTINCT LibraryFileContent.datecreated, LibraryFileContent.filesize, LibraryFileContent.id, LibraryFileContent.md5, LibraryFileContent.sha1 FROM BinaryPackageFile, BinaryPackagePublishingHistory, LibraryFileAlias, LibraryFileContent WHERE BinaryPackagePublishingHistory.archive = 14516 AND BinaryPackagePublishingHistory.dateremoved IS NULL AND BinaryPackagePublishingHistory.binarypackagerelease = BinaryPackageFile.binarypackagerelease AND BinaryPackageFile.libraryfile = LibraryFileAlias.id AND LibraryFileAlias.content = LibraryFileContent.id; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- HashAggregate (cost=32489.87..32504.58 rows=1471 width=94) (actual time=9477.455..9477.931 rows=596 loops=1) -> Nested Loop (cost=0.00..32471.48 rows=1471 width=94) (actual time=112.213..9470.864 rows=1664 loops=1) -> Nested Loop (cost=0.00..29693.36 rows=1891 width=4) (actual time=93.160..7408.108 rows=1664 loops=1) -> Nested Loop (cost=0.00..25785.40 rows=1891 width=4) (actual time=79.839..4577.662 rows=1664 loops=1) -> Index Scan using securebinarypackagepublishinghistory__archive__status__idx on binarypackagepublishinghistory (cost=0.00..13471.19 rows=1891 width=4) (actual time=23.987..3300.864 rows=1664 loops=1) Index Cond: (archive = 14516) Filter: (dateremoved IS NULL) -> Index Scan using binarypackagefile_binarypackage_idx on binarypackagefile (cost=0.00..6.50 rows=1 width=8) (actual time=0.765..0.765 rows=1 loops=1664) Index Cond: (binarypackagefile.binarypackagerelease = binarypackagepublishinghistory.binarypackagerelease) -> Index Scan using libraryfilealias_pkey on libraryfilealias (cost=0.00..2.05 rows=1 width=8) (actual time=1.699..1.699 rows=1 loops=1664) Index Cond: (libraryfilealias.id = binarypackagefile.libraryfile) -> Index Scan using libraryfilecontent_pkey on libraryfilecontent (cost=0.00..1.46 rows=1 width=94) (actual time=1.237..1.238 rows=1 loops=1664) Index Cond: (libraryfilecontent.id = libraryfilealias.content) Total runtime: 9478.267 ms (14 rows) Time: 9484.623 ms Summary ======= We see timeouts on Archive:+repository-size determining the size of the archive. The page returns content like: Number of packages: 195 source packages (417.6 MiB) 695 binary packages (786.7 MiB) Repository size: 1.2 GiB (58.84%) of 2.0 GiB but generating this content requires reading rows from the database that are often cold: the library file content rows are only needed when checking the repository size or when checking the quota during uploads. The current query can be further optimised but the best performance we can expect is about 2ms per file that is currently in the PPA because the library file content table cannot be clustered to sensibly increase locality of reference for archive lookups, so a PPA with 100 packages built on 3 architectures with (say) 4 binaries from each package (lib, -dev, bin, docs) will have 1200 distinct files, which is 2400ms of lookup time for the binaries alone; on top of that we pay the same 2ms per row for the other tables when they are cold (and we have data suggesting this happens multiple times a day). Possible solutions ================== * We could more directly model the files-in-a-ppa (would require generalising the librarian to use more tables) and thus permit clustering on the archive. * We could cache the size in the archive itself which would open us up to skew but remove the need to access all these rows at all. Details =======   3 SELECT DISTINCT LibraryFileContent.datecreated, LibraryFileContent.filesize, LibraryFileContent.i ... ackageFile.libraryfile = LibraryFileAlias.id AND LibraryFileAlias.content = LibraryFileContent.id:    GET: 3 Robots: 0 Local: 3       2 https://launchpad.net/%7Enilarimogard/+archive/webupd8/+repository-size (Archive:+repository-size)        OOPS-1948AX197, OOPS-1948BB197       1 https://launchpad.net/%7Eubuntu-mozilla-daily/+archive/ppa/+repository-size (Archive:+repository-size)        OOPS-1948DT521 Long queries to determine file size - explain analyze SELECT DISTINCT LibraryFileContent.datecreated,                 LibraryFileContent.filesize,                 LibraryFileContent.id,                 LibraryFileContent.md5,                 LibraryFileContent.sha1 FROM BinaryPackageFile,      BinaryPackagePublishingHistory,      LibraryFileAlias,      LibraryFileContent WHERE BinaryPackagePublishingHistory.archive = 14516   AND BinaryPackagePublishingHistory.dateremoved IS NULL   AND BinaryPackagePublishingHistory.binarypackagerelease = BinaryPackageFile.binarypackagerelease   AND BinaryPackageFile.libraryfile = LibraryFileAlias.id   AND LibraryFileAlias.content = LibraryFileContent.id;                                                                                                            QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  HashAggregate (cost=32489.87..32504.58 rows=1471 width=94) (actual time=9477.455..9477.931 rows=596 loops=1)    -> Nested Loop (cost=0.00..32471.48 rows=1471 width=94) (actual time=112.213..9470.864 rows=1664 loops=1)          -> Nested Loop (cost=0.00..29693.36 rows=1891 width=4) (actual time=93.160..7408.108 rows=1664 loops=1)                -> Nested Loop (cost=0.00..25785.40 rows=1891 width=4) (actual time=79.839..4577.662 rows=1664 loops=1)                      -> Index Scan using securebinarypackagepublishinghistory__archive__status__idx on binarypackagepublishinghistory (cost=0.00..13471.19 rows=1891 width=4) (actual time=23.987..3300.864 rows=1664 loops=1)                            Index Cond: (archive = 14516)                            Filter: (dateremoved IS NULL)                      -> Index Scan using binarypackagefile_binarypackage_idx on binarypackagefile (cost=0.00..6.50 rows=1 width=8) (actual time=0.765..0.765 rows=1 loops=1664)                            Index Cond: (binarypackagefile.binarypackagerelease = binarypackagepublishinghistory.binarypackagerelease)                -> Index Scan using libraryfilealias_pkey on libraryfilealias (cost=0.00..2.05 rows=1 width=8) (actual time=1.699..1.699 rows=1 loops=1664)                      Index Cond: (libraryfilealias.id = binarypackagefile.libraryfile)          -> Index Scan using libraryfilecontent_pkey on libraryfilecontent (cost=0.00..1.46 rows=1 width=94) (actual time=1.237..1.238 rows=1 loops=1664)                Index Cond: (libraryfilecontent.id = libraryfilealias.content)  Total runtime: 9478.267 ms (14 rows) Time: 9484.623 ms
2013-03-19 06:34:02 Steve Kowalik branch linked lp:~stevenk/launchpad/archive-repo-size
2013-03-20 01:50:33 Launchpad QA Bot launchpad: assignee Steve Kowalik (stevenk)
2013-03-20 01:50:34 Launchpad QA Bot tags timeout qa-needstesting timeout
2013-03-20 01:50:35 Launchpad QA Bot launchpad: status Triaged Fix Committed
2013-03-20 02:58:26 Steve Kowalik tags qa-needstesting timeout qa-ok timeout
2013-03-22 06:41:07 Steve Kowalik launchpad: status Fix Committed Fix Released