Comment 4 for bug 816870

Revision history for this message
William Grant (wgrant) wrote :

OOPS-0a167283b8de82115790872c75a0f611 is a recent example. This page has a lot of issues, of which I'll describe just a few:

Firstly, DistributionPackageSearchView.initialize() calls Distribution.has_published_binaries, which should be sub-ms but takes >1s on production. https://pastebin.canonical.com/67456/ reveals that it's doing a merge join from DistroSeries to DistroArchSeries, using distroseries__distribution__id__key. This will, of course, cause older DistroSeries to appear first. So when it subsequently does a nested loop to find active publications for the distroseries, the first few iterations find none, since all the publications are obsolete. This wouldn't be a problem if the BPPH lookup was indexed, but the query doesn't specify the archive, so it can't use binarypackagepublishinghistory__archive__distroarchseries__stat. If I restrict to just the primary archive, the index used and the query is lightning fast. If I restrict the archive to primary+partner, the index is no longer used.

Secondly, and the primary cause of timeouts on production today, DistributionPackageSearchView.has_exact_matches says "return self.exact_matches.count() > 0", which is going to be terribly slow. It in fact means to say "return not self.exact_matches.is_empty()".

Thirdly, the search queries are pretty bad. Distribution.searchBinaries with exact_match=True creates a monstrous join across 8 tables to filter DistributionSourcePackageCache on archive and binary name, when it has archive and binary name columns which eliminate the need for any joins at all. With exact_match=False it's probably even worse.