2011-03-21 00:22:49 |
Robert Collins |
bug |
|
|
added bug |
2011-05-03 03:06:48 |
Robert Collins |
description |
4 SELECT DISTINCT LibraryFileContent.datecreated, LibraryFileContent.filesize, LibraryFileContent.i ... ackageFile.libraryfile = LibraryFileAlias.id AND LibraryFileAlias.content = LibraryFileContent.id:
GET: 4 Robots: 0 Local: 4
2 https://launchpad.net/%7Efalk-t-j/+archive/lucid/+repository-size (Archive:+repository-size)
OOPS-1904B1758, OOPS-1904K1614
1 https://launchpad.net/%7Ekubuntu-ppa/+archive/backports/+repository-size (Archive:+repository-size)
OOPS-1904F1620
1 https://launchpad.net/%7Eubuntu-mozilla-daily/+archive/ppa/+repository-size (Archive:+repository-size)
OOPS-1904E1676 |
3 SELECT DISTINCT LibraryFileContent.datecreated, LibraryFileContent.filesize, LibraryFileContent.i ... ackageFile.libraryfile = LibraryFileAlias.id AND LibraryFileAlias.content = LibraryFileContent.id:
GET: 3 Robots: 0 Local: 3
2 https://launchpad.net/%7Enilarimogard/+archive/webupd8/+repository-size (Archive:+repository-size)
OOPS-1948AX197, OOPS-1948BB197
1 https://launchpad.net/%7Eubuntu-mozilla-daily/+archive/ppa/+repository-size (Archive:+repository-size)
OOPS-1948DT521 |
|
2011-05-03 03:08:27 |
Robert Collins |
description |
3 SELECT DISTINCT LibraryFileContent.datecreated, LibraryFileContent.filesize, LibraryFileContent.i ... ackageFile.libraryfile = LibraryFileAlias.id AND LibraryFileAlias.content = LibraryFileContent.id:
GET: 3 Robots: 0 Local: 3
2 https://launchpad.net/%7Enilarimogard/+archive/webupd8/+repository-size (Archive:+repository-size)
OOPS-1948AX197, OOPS-1948BB197
1 https://launchpad.net/%7Eubuntu-mozilla-daily/+archive/ppa/+repository-size (Archive:+repository-size)
OOPS-1948DT521 |
3 SELECT DISTINCT LibraryFileContent.datecreated, LibraryFileContent.filesize, LibraryFileContent.i ... ackageFile.libraryfile = LibraryFileAlias.id AND LibraryFileAlias.content = LibraryFileContent.id:
GET: 3 Robots: 0 Local: 3
2 https://launchpad.net/%7Enilarimogard/+archive/webupd8/+repository-size (Archive:+repository-size)
OOPS-1948AX197, OOPS-1948BB197
1 https://launchpad.net/%7Eubuntu-mozilla-daily/+archive/ppa/+repository-size (Archive:+repository-size)
OOPS-1948DT521
Long queries to determine file size -
explain analyze SELECT DISTINCT LibraryFileContent.datecreated,
LibraryFileContent.filesize,
LibraryFileContent.id,
LibraryFileContent.md5,
LibraryFileContent.sha1
FROM BinaryPackageFile,
BinaryPackagePublishingHistory,
LibraryFileAlias,
LibraryFileContent
WHERE BinaryPackagePublishingHistory.archive = 14516
AND BinaryPackagePublishingHistory.dateremoved IS NULL
AND BinaryPackagePublishingHistory.binarypackagerelease = BinaryPackageFile.binarypackagerelease
AND BinaryPackageFile.libraryfile = LibraryFileAlias.id
AND LibraryFileAlias.content = LibraryFileContent.id;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=32489.87..32504.58 rows=1471 width=94) (actual time=9477.455..9477.931 rows=596 loops=1)
-> Nested Loop (cost=0.00..32471.48 rows=1471 width=94) (actual time=112.213..9470.864 rows=1664 loops=1)
-> Nested Loop (cost=0.00..29693.36 rows=1891 width=4) (actual time=93.160..7408.108 rows=1664 loops=1)
-> Nested Loop (cost=0.00..25785.40 rows=1891 width=4) (actual time=79.839..4577.662 rows=1664 loops=1)
-> Index Scan using securebinarypackagepublishinghistory__archive__status__idx on binarypackagepublishinghistory (cost=0.00..13471.19 rows=1891 width=4) (actual time=23.987..3300.864 rows=1664 loops=1)
Index Cond: (archive = 14516)
Filter: (dateremoved IS NULL)
-> Index Scan using binarypackagefile_binarypackage_idx on binarypackagefile (cost=0.00..6.50 rows=1 width=8) (actual time=0.765..0.765 rows=1 loops=1664)
Index Cond: (binarypackagefile.binarypackagerelease = binarypackagepublishinghistory.binarypackagerelease)
-> Index Scan using libraryfilealias_pkey on libraryfilealias (cost=0.00..2.05 rows=1 width=8) (actual time=1.699..1.699 rows=1 loops=1664)
Index Cond: (libraryfilealias.id = binarypackagefile.libraryfile)
-> Index Scan using libraryfilecontent_pkey on libraryfilecontent (cost=0.00..1.46 rows=1 width=94) (actual time=1.237..1.238 rows=1 loops=1664)
Index Cond: (libraryfilecontent.id = libraryfilealias.content)
Total runtime: 9478.267 ms
(14 rows)
Time: 9484.623 ms |
|
2011-05-03 05:20:24 |
Robert Collins |
summary |
Archive:+repository-size timeout |
Archive:+repository-size timeout retrieving many hundreds of package sizes |
|
2011-05-03 05:38:15 |
Robert Collins |
description |
3 SELECT DISTINCT LibraryFileContent.datecreated, LibraryFileContent.filesize, LibraryFileContent.i ... ackageFile.libraryfile = LibraryFileAlias.id AND LibraryFileAlias.content = LibraryFileContent.id:
GET: 3 Robots: 0 Local: 3
2 https://launchpad.net/%7Enilarimogard/+archive/webupd8/+repository-size (Archive:+repository-size)
OOPS-1948AX197, OOPS-1948BB197
1 https://launchpad.net/%7Eubuntu-mozilla-daily/+archive/ppa/+repository-size (Archive:+repository-size)
OOPS-1948DT521
Long queries to determine file size -
explain analyze SELECT DISTINCT LibraryFileContent.datecreated,
LibraryFileContent.filesize,
LibraryFileContent.id,
LibraryFileContent.md5,
LibraryFileContent.sha1
FROM BinaryPackageFile,
BinaryPackagePublishingHistory,
LibraryFileAlias,
LibraryFileContent
WHERE BinaryPackagePublishingHistory.archive = 14516
AND BinaryPackagePublishingHistory.dateremoved IS NULL
AND BinaryPackagePublishingHistory.binarypackagerelease = BinaryPackageFile.binarypackagerelease
AND BinaryPackageFile.libraryfile = LibraryFileAlias.id
AND LibraryFileAlias.content = LibraryFileContent.id;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=32489.87..32504.58 rows=1471 width=94) (actual time=9477.455..9477.931 rows=596 loops=1)
-> Nested Loop (cost=0.00..32471.48 rows=1471 width=94) (actual time=112.213..9470.864 rows=1664 loops=1)
-> Nested Loop (cost=0.00..29693.36 rows=1891 width=4) (actual time=93.160..7408.108 rows=1664 loops=1)
-> Nested Loop (cost=0.00..25785.40 rows=1891 width=4) (actual time=79.839..4577.662 rows=1664 loops=1)
-> Index Scan using securebinarypackagepublishinghistory__archive__status__idx on binarypackagepublishinghistory (cost=0.00..13471.19 rows=1891 width=4) (actual time=23.987..3300.864 rows=1664 loops=1)
Index Cond: (archive = 14516)
Filter: (dateremoved IS NULL)
-> Index Scan using binarypackagefile_binarypackage_idx on binarypackagefile (cost=0.00..6.50 rows=1 width=8) (actual time=0.765..0.765 rows=1 loops=1664)
Index Cond: (binarypackagefile.binarypackagerelease = binarypackagepublishinghistory.binarypackagerelease)
-> Index Scan using libraryfilealias_pkey on libraryfilealias (cost=0.00..2.05 rows=1 width=8) (actual time=1.699..1.699 rows=1 loops=1664)
Index Cond: (libraryfilealias.id = binarypackagefile.libraryfile)
-> Index Scan using libraryfilecontent_pkey on libraryfilecontent (cost=0.00..1.46 rows=1 width=94) (actual time=1.237..1.238 rows=1 loops=1664)
Index Cond: (libraryfilecontent.id = libraryfilealias.content)
Total runtime: 9478.267 ms
(14 rows)
Time: 9484.623 ms |
Summary
=======
We see timeouts on Archive:+repository-size determining the size of the archive.
The page returns content like:
Number of packages:
195 source packages (417.6 MiB)
695 binary packages (786.7 MiB)
Repository size:
1.2 GiB (58.84%) of 2.0 GiB
but generating this content requires reading rows from the database that are often cold: the library file content rows are only needed when checking the repository size or when checking the quota during uploads. The current query can be further optimised but the best performance we can expect is about 2ms per file that is currently in the PPA because the library file content table cannot be clustered to sensibly increase locality of reference for archive lookups, so a PPA with 100 packages built on 3 architectures with (say) 4 binaries from each package (lib, -dev, bin, docs) will have 1200 distinct files, which is 2400ms of lookup time for the binaries alone; on top of that we pay the same 2ms per row for the other tables when they are cold (and we have data suggesting this happens multiple times a day).
Possible solutions
==================
* We could more directly model the files-in-a-ppa (would require generalising the librarian to use more tables) and thus permit clustering on the archive.
* We could cache the size in the archive itself which would open us up to skew but remove the need to access all these rows at all.
Details
=======
3 SELECT DISTINCT LibraryFileContent.datecreated, LibraryFileContent.filesize, LibraryFileContent.i ... ackageFile.libraryfile = LibraryFileAlias.id AND LibraryFileAlias.content = LibraryFileContent.id:
GET: 3 Robots: 0 Local: 3
2 https://launchpad.net/%7Enilarimogard/+archive/webupd8/+repository-size (Archive:+repository-size)
OOPS-1948AX197, OOPS-1948BB197
1 https://launchpad.net/%7Eubuntu-mozilla-daily/+archive/ppa/+repository-size (Archive:+repository-size)
OOPS-1948DT521
Long queries to determine file size -
explain analyze SELECT DISTINCT LibraryFileContent.datecreated,
LibraryFileContent.filesize,
LibraryFileContent.id,
LibraryFileContent.md5,
LibraryFileContent.sha1
FROM BinaryPackageFile,
BinaryPackagePublishingHistory,
LibraryFileAlias,
LibraryFileContent
WHERE BinaryPackagePublishingHistory.archive = 14516
AND BinaryPackagePublishingHistory.dateremoved IS NULL
AND BinaryPackagePublishingHistory.binarypackagerelease = BinaryPackageFile.binarypackagerelease
AND BinaryPackageFile.libraryfile = LibraryFileAlias.id
AND LibraryFileAlias.content = LibraryFileContent.id;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=32489.87..32504.58 rows=1471 width=94) (actual time=9477.455..9477.931 rows=596 loops=1)
-> Nested Loop (cost=0.00..32471.48 rows=1471 width=94) (actual time=112.213..9470.864 rows=1664 loops=1)
-> Nested Loop (cost=0.00..29693.36 rows=1891 width=4) (actual time=93.160..7408.108 rows=1664 loops=1)
-> Nested Loop (cost=0.00..25785.40 rows=1891 width=4) (actual time=79.839..4577.662 rows=1664 loops=1)
-> Index Scan using securebinarypackagepublishinghistory__archive__status__idx on binarypackagepublishinghistory (cost=0.00..13471.19 rows=1891 width=4) (actual time=23.987..3300.864 rows=1664 loops=1)
Index Cond: (archive = 14516)
Filter: (dateremoved IS NULL)
-> Index Scan using binarypackagefile_binarypackage_idx on binarypackagefile (cost=0.00..6.50 rows=1 width=8) (actual time=0.765..0.765 rows=1 loops=1664)
Index Cond: (binarypackagefile.binarypackagerelease = binarypackagepublishinghistory.binarypackagerelease)
-> Index Scan using libraryfilealias_pkey on libraryfilealias (cost=0.00..2.05 rows=1 width=8) (actual time=1.699..1.699 rows=1 loops=1664)
Index Cond: (libraryfilealias.id = binarypackagefile.libraryfile)
-> Index Scan using libraryfilecontent_pkey on libraryfilecontent (cost=0.00..1.46 rows=1 width=94) (actual time=1.237..1.238 rows=1 loops=1664)
Index Cond: (libraryfilecontent.id = libraryfilealias.content)
Total runtime: 9478.267 ms
(14 rows)
Time: 9484.623 ms |
|
2013-03-19 06:34:02 |
Steve Kowalik |
branch linked |
|
lp:~stevenk/launchpad/archive-repo-size |
|
2013-03-20 01:50:33 |
Launchpad QA Bot |
launchpad: assignee |
|
Steve Kowalik (stevenk) |
|
2013-03-20 01:50:34 |
Launchpad QA Bot |
tags |
timeout |
qa-needstesting timeout |
|
2013-03-20 01:50:35 |
Launchpad QA Bot |
launchpad: status |
Triaged |
Fix Committed |
|
2013-03-20 02:58:26 |
Steve Kowalik |
tags |
qa-needstesting timeout |
qa-ok timeout |
|
2013-03-22 06:41:07 |
Steve Kowalik |
launchpad: status |
Fix Committed |
Fix Released |
|