Archive:+repository-size timeout retrieving many hundreds of package sizes

Bug #739070 reported by Robert Collins
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Launchpad itself
Fix Released
Critical
Steve Kowalik

Bug Description

Summary
=======

We see timeouts on Archive:+repository-size determining the size of the archive.

The page returns content like:
Number of packages:
195 source packages (417.6 MiB)
695 binary packages (786.7 MiB)
Repository size:
1.2 GiB (58.84%) of 2.0 GiB

but generating this content requires reading rows from the database that are often cold: the library file content rows are only needed when checking the repository size or when checking the quota during uploads. The current query can be further optimised but the best performance we can expect is about 2ms per file that is currently in the PPA because the library file content table cannot be clustered to sensibly increase locality of reference for archive lookups, so a PPA with 100 packages built on 3 architectures with (say) 4 binaries from each package (lib, -dev, bin, docs) will have 1200 distinct files, which is 2400ms of lookup time for the binaries alone; on top of that we pay the same 2ms per row for the other tables when they are cold (and we have data suggesting this happens multiple times a day).

Possible solutions
==================

* We could more directly model the files-in-a-ppa (would require generalising the librarian to use more tables) and thus permit clustering on the archive.

* We could cache the size in the archive itself which would open us up to skew but remove the need to access all these rows at all.

Details
=======

  3 SELECT DISTINCT LibraryFileContent.datecreated, LibraryFileContent.filesize, LibraryFileContent.i ... ackageFile.libraryfile = LibraryFileAlias.id AND LibraryFileAlias.content = LibraryFileContent.id:
   GET: 3 Robots: 0 Local: 3
      2 https://launchpad.net/%7Enilarimogard/+archive/webupd8/+repository-size (Archive:+repository-size)
       OOPS-1948AX197, OOPS-1948BB197
      1 https://launchpad.net/%7Eubuntu-mozilla-daily/+archive/ppa/+repository-size (Archive:+repository-size)
       OOPS-1948DT521

Long queries to determine file size -

explain analyze SELECT DISTINCT LibraryFileContent.datecreated,
                LibraryFileContent.filesize,
                LibraryFileContent.id,
                LibraryFileContent.md5,
                LibraryFileContent.sha1
FROM BinaryPackageFile,
     BinaryPackagePublishingHistory,
     LibraryFileAlias,
     LibraryFileContent
WHERE BinaryPackagePublishingHistory.archive = 14516
  AND BinaryPackagePublishingHistory.dateremoved IS NULL
  AND BinaryPackagePublishingHistory.binarypackagerelease = BinaryPackageFile.binarypackagerelease
  AND BinaryPackageFile.libraryfile = LibraryFileAlias.id
  AND LibraryFileAlias.content = LibraryFileContent.id;
                                                                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate (cost=32489.87..32504.58 rows=1471 width=94) (actual time=9477.455..9477.931 rows=596 loops=1)
   -> Nested Loop (cost=0.00..32471.48 rows=1471 width=94) (actual time=112.213..9470.864 rows=1664 loops=1)
         -> Nested Loop (cost=0.00..29693.36 rows=1891 width=4) (actual time=93.160..7408.108 rows=1664 loops=1)
               -> Nested Loop (cost=0.00..25785.40 rows=1891 width=4) (actual time=79.839..4577.662 rows=1664 loops=1)
                     -> Index Scan using securebinarypackagepublishinghistory__archive__status__idx on binarypackagepublishinghistory (cost=0.00..13471.19 rows=1891 width=4) (actual time=23.987..3300.864 rows=1664 loops=1)
                           Index Cond: (archive = 14516)
                           Filter: (dateremoved IS NULL)
                     -> Index Scan using binarypackagefile_binarypackage_idx on binarypackagefile (cost=0.00..6.50 rows=1 width=8) (actual time=0.765..0.765 rows=1 loops=1664)
                           Index Cond: (binarypackagefile.binarypackagerelease = binarypackagepublishinghistory.binarypackagerelease)
               -> Index Scan using libraryfilealias_pkey on libraryfilealias (cost=0.00..2.05 rows=1 width=8) (actual time=1.699..1.699 rows=1 loops=1664)
                     Index Cond: (libraryfilealias.id = binarypackagefile.libraryfile)
         -> Index Scan using libraryfilecontent_pkey on libraryfilecontent (cost=0.00..1.46 rows=1 width=94) (actual time=1.237..1.238 rows=1 loops=1664)
               Index Cond: (libraryfilecontent.id = libraryfilealias.content)
 Total runtime: 9478.267 ms
(14 rows)

Time: 9484.623 ms

Tags: qa-ok timeout

Related branches

description: updated
description: updated
Revision history for this message
Robert Collins (lifeless) wrote :

Hot

 explain analyze SELECT DISTINCT LibraryFileContent.datecreated,
                LibraryFileContent.filesize,
                LibraryFileContent.id,
                LibraryFileContent.md5,
                LibraryFileContent.sha1
FROM BinaryPackageFile,
     BinaryPackagePublishingHistory,
     LibraryFileAlias,
     LibraryFileContent
WHERE BinaryPackagePublishingHistory.archive = 14516
  AND BinaryPackagePublishingHistory.dateremoved IS NULL
  AND BinaryPackagePublishingHistory.binarypackagerelease = BinaryPackageFile.binarypackagerelease
  AND BinaryPackageFile.libraryfile = LibraryFileAlias.id
  AND LibraryFileAlias.content = LibraryFileContent.id;
                                                                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate (cost=32489.87..32504.58 rows=1471 width=94) (actual time=125.946..126.417 rows=596 loops=1)
   -> Nested Loop (cost=0.00..32471.48 rows=1471 width=94) (actual time=0.780..119.488 rows=1664 loops=1)
         -> Nested Loop (cost=0.00..29693.36 rows=1891 width=4) (actual time=0.597..91.107 rows=1664 loops=1)
               -> Nested Loop (cost=0.00..25785.40 rows=1891 width=4) (actual time=0.438..57.097 rows=1664 loops=1)
                     -> Index Scan using securebinarypackagepublishinghistory__archive__status__idx on binarypackagepublishinghistory (cost=0.00..13471.19 rows=1891 width=4) (actual time=0.209..35.282 rows=1664 loops=1)
                           Index Cond: (archive = 14516)
                           Filter: (dateremoved IS NULL)
                     -> Index Scan using binarypackagefile_binarypackage_idx on binarypackagefile (cost=0.00..6.50 rows=1 width=8) (actual time=0.011..0.011 rows=1 loops=1664)
                           Index Cond: (binarypackagefile.binarypackagerelease = binarypackagepublishinghistory.binarypackagerelease)
               -> Index Scan using libraryfilealias_pkey on libraryfilealias (cost=0.00..2.05 rows=1 width=8) (actual time=0.018..0.019 rows=1 loops=1664)
                     Index Cond: (libraryfilealias.id = binarypackagefile.libraryfile)
         -> Index Scan using libraryfilecontent_pkey on libraryfilecontent (cost=0.00..1.46 rows=1 width=94) (actual time=0.014..0.015 rows=1 loops=1664)
               Index Cond: (libraryfilecontent.id = libraryfilealias.content)
 Total runtime: 126.734 ms
(14 rows)

Time: 133.656 ms

Revision history for this message
Robert Collins (lifeless) wrote :

A variant that examines slightly less rows

explain analyze SELECT DISTINCT LibraryFileContent.datecreated,
                LibraryFileContent.filesize,
                LibraryFileContent.id,
                LibraryFileContent.md5,
                LibraryFileContent.sha1
FROM BinaryPackageFile,
     BinaryPackagePublishingHistory,
     LibraryFileAlias,
     LibraryFileContent
WHERE BinaryPackagePublishingHistory.archive = 14516
  AND BinaryPackagePublishingHistory.dateremoved IS NULL
  AND BinaryPackagePublishingHistory.binarypackagerelease = BinaryPackageFile.binarypackagerelease
  AND BinaryPackageFile.libraryfile = LibraryFileAlias.id
  AND LibraryFileAlias.content = LibraryFileContent.id AND BinaryPackagePublishingHistory.status IN (2);
                                                                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate (cost=6614.28..6617.22 rows=294 width=94) (actual time=61.875..62.380 rows=596 loops=1)
   -> Nested Loop (cost=0.00..6610.60 rows=294 width=94) (actual time=0.094..58.355 rows=1642 loops=1)
         -> Nested Loop (cost=0.00..6055.27 rows=378 width=4) (actual time=0.076..43.243 rows=1642 loops=1)
               -> Nested Loop (cost=0.00..5274.09 rows=378 width=4) (actual time=0.057..27.212 rows=1642 loops=1)
                     -> Index Scan using securebinarypackagepublishinghistory__archive__status__idx on binarypackagepublishinghistory (cost=0.00..2735.80 rows=378 width=4) (actual time=0.036..13.610 rows=1642 loops=1)
                           Index Cond: ((archive = 14516) AND (status = 2))
                           Filter: (dateremoved IS NULL)
                     -> Index Scan using binarypackagefile_binarypackage_idx on binarypackagefile (cost=0.00..6.70 rows=1 width=8) (actual time=0.006..0.007 rows=1 loops=1642)
                           Index Cond: (binarypackagefile.binarypackagerelease = binarypackagepublishinghistory.binarypackagerelease)
               -> Index Scan using libraryfilealias_pkey on libraryfilealias (cost=0.00..2.05 rows=1 width=8) (actual time=0.008..0.008 rows=1 loops=1642)
                     Index Cond: (libraryfilealias.id = binarypackagefile.libraryfile)
         -> Index Scan using libraryfilecontent_pkey on libraryfilecontent (cost=0.00..1.46 rows=1 width=94) (actual time=0.007..0.007 rows=1 loops=1642)
               Index Cond: (libraryfilecontent.id = libraryfilealias.content)
 Total runtime: 62.705 ms

Revision history for this message
Robert Collins (lifeless) wrote :

my WAG - we're reading all rows for that archive, not relevant rows. That archive on staging is 6.5K rows which is not huge - though if its spread out over the db that is potentially 6.5K pages to read.

Revision history for this message
Robert Collins (lifeless) wrote :

another possibility is the loop style

explain analyze SELECT DISTINCT LibraryFileContent.datecreated,
                LibraryFileContent.filesize,
                LibraryFileContent.id,
                LibraryFileContent.md5,
                LibraryFileContent.sha1
FROM BinaryPackageFile,
     LibraryFileAlias,
     LibraryFileContent
WHERE binarypackagerelease in (select distinct binarypackagerelease from BinaryPackagePublishingHistory where BinaryPackagePublishingHistory.archive = 14516 AND BinaryPackagePublishingHistory.dateremoved IS NULL)
    AND BinaryPackageFile.libraryfile = LibraryFileAlias.id
  AND LibraryFileAlias.content = LibraryFileContent.id;

Revision history for this message
Robert Collins (lifeless) wrote :

explain analyze SELECT sum(filesize) from (select distinct on (libraryfile) filesize
FROM BinaryPackageFile,
     LibraryFileAlias,
     LibraryFileContent
WHERE binarypackagerelease in (select distinct binarypackagerelease from BinaryPackagePublishingHistory where BinaryPackagePublishingHistory.archive = 14516 AND BinaryPackagePublishingHistory.dateremoved IS NULL)
    AND BinaryPackageFile.libraryfile = LibraryFileAlias.id
  AND LibraryFileAlias.content = LibraryFileContent.id) as _tmp;

Revision history for this message
Robert Collins (lifeless) wrote :
Download full text (3.9 KiB)

cold query:

explain analyze SELECT sum(filesize) from (select distinct on (libraryfile) filesize
launchpad_standalone_1(# FROM BinaryPackageFile,
launchpad_standalone_1(# LibraryFileAlias,
launchpad_standalone_1(# LibraryFileContent
launchpad_standalone_1(# WHERE binarypackagerelease in (select distinct binarypackagerelease from BinaryPackagePublishingHistory where BinaryPackagePublishingHistory.archive = 14516 AND BinaryPackagePublishingHistory.dateremoved IS NULL)
launchpad_standalone_1(# AND BinaryPackageFile.libraryfile = LibraryFileAlias.id
launchpad_standalone_1(# AND LibraryFileAlias.content = LibraryFileContent.id) as _tmp;
                                                                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate (cost=13903.12..13903.13 rows=1 width=8) (actual time=6648.067..6648.068 rows=1 loops=1)
   -> Unique (cost=13900.40..13901.18 rows=155 width=12) (actual time=6646.089..6647.376 rows=695 loops=1)
         -> Sort (cost=13900.40..13900.79 rows=155 width=12) (actual time=6646.081..6646.504 rows=695 loops=1)
               Sort Key: binarypackagefile.libraryfile
               Sort Method: quicksort Memory: 57kB
               -> Nested Loop (cost=11753.52..13894.76 rows=155 width=12) (actual time=2671.928..6643.791 rows=695 loops=1)
                     -> Nested Loop (cost=11753.52..13662.76 rows=200 width=8) (actual time=2665.323..4892.766 rows=695 loops=1)
                           -> Nested Loop (cost=11753.52..13331.84 rows=200 width=4) (actual time=2661.632..3879.262 rows=695 loops=1)
                                 -> HashAggregate (cost=11753.52..11757.28 rows=376 width=4) (actual time=2648.004..2649.444 rows=695 loops=1)
                                       -> Bitmap Heap Scan on binarypackagepublishinghistory (cost=94.28..11748.42 rows=2039 width=4) (actual time=66.914..2644.608 rows=1804 loops=1)
                                             Recheck Cond: (archive = 14516)
                                             Filter: (dateremoved IS NULL)
                                             -> Bitmap Index Scan on securebinarypackagepublishinghistory__archive__status__idx (cost=0.00..93.77 rows=7082 width=0) (actual time=55.909..55.909 rows=7720 loops=1)
                                                   Index Cond: (archive = 14516)
                                 -> Index Scan using binarypackagefile_binarypackage_idx on binarypackagefile (cost=0.00..4.17 rows=1 width=8) (actual time=1.764..1.765 rows=1 loops=695)
                                       Index Cond: (binarypackagefile.binarypackagerelease = binarypackagepublishinghistory.binarypackagerelease)
                           -> Index Scan using libraryfilealias_pkey on libraryfilealias (cost=0.00..1.64 rows=1 width=8) (actual time=1.438..1.454 rows=1 loops=695)
        ...

Read more...

Revision history for this message
Robert Collins (lifeless) wrote :

Based on this I suspect an index will help but only by about 1.8 seconds - we'll still need the 1-2K rows from BPPH and thats going to be 20% of the IO

Revision history for this message
Robert Collins (lifeless) wrote :

libraryfilecontent has 14M rows of data, so we're looking at index pages plus a table page for every row referenced.

Its possible we have table bloat happening;

(qastaging:)
 explain analyze select sum(filesize) from libraryfilecontent where id in (SELECT trunc(random() * 13000000 + 1) FROM generate_series(1,1000));
                                                                  QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate (cost=1170606.21..1170606.22 rows=1 width=8) (actual time=10274.794..10274.794 rows=1 loops=1)
   -> Hash Join (cost=39.50..1135723.70 rows=13953003 width=8) (actual time=5.543..10273.940 rows=182 loops=1)
         Hash Cond: ((libraryfilecontent.id)::double precision = (trunc(((random() * 13000000::double precision) + 1::double precision))))
         -> Seq Scan on libraryfilecontent (cost=0.00..368269.03 rows=13953003 width=12) (actual time=0.034..5751.479 rows=13952977 loops=1)
         -> Hash (cost=37.00..37.00 rows=200 width=8) (actual time=3.128..3.128 rows=1000 loops=1)
               -> HashAggregate (cost=35.00..37.00 rows=200 width=8) (actual time=2.116..2.497 rows=1000 loops=1)
                     -> Function Scan on generate_series (cost=0.00..22.50 rows=1000 width=0) (actual time=0.256..1.234 rows=1000 loops=1)
 Total runtime: 10274.957 ms
(8 rows)

Time: 10306.695 ms

Revision history for this message
Robert Collins (lifeless) wrote :

bah, seq scan, not comparable

Revision history for this message
Robert Collins (lifeless) wrote :

SELECT trunc(random() * 13000000 + 1)::integer into temp table ids FROM generate_series(1,1000);
explain analyze select sum(filesize) from libraryfilecontent , ids where libraryfilecontent.id=ids.trunc;

Revision history for this message
Robert Collins (lifeless) wrote :

And confirmed : bringing back 1K rows from LFC, at random is 2.8 seconds alone.

SELECT trunc(random() * 13000000 + 1)::integer into temp table ids FROM generate_series(1,1000);
SELECT
Time: 167.148 ms
launchpad_standalone_1=# explain analyze select sum(filesize) from libraryfilecontent , ids where libraryfilecontent.id=ids.trunc;
                                                                         QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate (cost=11275.84..11275.85 rows=1 width=8) (actual time=2816.682..2816.683 rows=1 loops=1)
   -> Nested Loop (cost=0.00..11269.84 rows=2400 width=8) (actual time=16.099..2815.581 rows=154 loops=1)
         -> Seq Scan on ids (cost=0.00..34.00 rows=2400 width=4) (actual time=0.019..1.692 rows=1000 loops=1)
         -> Index Scan using libraryfilecontent_pkey on libraryfilecontent (cost=0.00..4.67 rows=1 width=12) (actual time=2.810..2.811 rows=0 loops=1000)
               Index Cond: (libraryfilecontent.id = ids.trunc)
 Total runtime: 2816.746 ms
(6 rows)

Time: 2818.917 ms

summary: - Archive:+repository-size timeout
+ Archive:+repository-size timeout retrieving many hundreds of package
+ sizes
description: updated
Revision history for this message
Launchpad QA Bot (lpqabot) wrote :
Changed in launchpad:
assignee: nobody → Steve Kowalik (stevenk)
tags: added: qa-needstesting
Changed in launchpad:
status: Triaged → Fix Committed
Steve Kowalik (stevenk)
tags: added: qa-ok
removed: qa-needstesting
Steve Kowalik (stevenk)
Changed in launchpad:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.