Retrieving an Archive:EntryResource causes excessive repeated database queries

Bug #474876 reported by James Westby on 2009-11-04
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Launchpad itself
High
Muharem Hrnjadovic

Bug Description

The changes_file_url property of a SourcePackagePublishingHistory can be
expensive to compute. It calls getChangesFilesForSources, which means
that it has to query for SourcePackagePublishingHistory again, with lots of
joins. This can add up as you loop over the publications.

Please either make it less expensive to compute, or export it as a method,
rather than a property.

Thanks,

James

Related branches

James Westby (james-w) wrote :

OOPS 1404D3213 by the way.

Julian Edwards (julian-edwards) wrote :

Hmm that getChangesFilesForSources() code should be a very quick query. Is it timing out a lot or is it sometimes quick? I have a suspicion that it can use a slave store rather than the main store, which might improve things.

Changed in soyuz:
status: New → Incomplete

On Thu Nov 05 08:57:27 UTC 2009 Julian Edwards wrote:
> Hmm that getChangesFilesForSources() code should be a very quick query.
> Is it timing out a lot or is it sometimes quick? I have a suspicion
> that it can use a slave store rather than the main store, which might
> improve things.

It's not always timing out.

The OOPS I posted shows 74 repeated queries for a window of 75, with a
couple of long ones. It appears that doing the queries 75 times can
sometimes hit some DB contention, and so push the overall time over the
timeout threshold.

Thanks,

James

That OOPS is for an Archive entry, not a publication. It looks spurious, although if you get these a lot I'd be worried. It's not a particularly high number of queries but we should probably optimise it anyway.

Julian Edwards (julian-edwards) wrote :

I'll update the description if you're ok with that?

On Thu Nov 05 14:58:13 UTC 2009 Julian Edwards wrote:
> I'll update the description if you're ok with that?

That's fine with me.

Thanks,

James

summary: - getPublishedSources can timeout due to getChangesFilesForSources
+ Retrieving an Archive:EntryResource causes excessive repeated database
+ queries
Changed in soyuz:
status: Incomplete → Triaged
importance: Undecided → High
tags: added: api oops
Changed in soyuz:
milestone: none → 3.1.11
Changed in soyuz:
milestone: 3.1.11 → 3.1.12
James Westby (james-w) wrote :

Hi,

This is making the bzr importer rather painful to work with at the moment.
It seems to be tripping over it all the time now.

Thanks,

James

Muharem Hrnjadovic (al-maisan) wrote :

Hmm .. one way to optimize this would be to write a method with a slimmed-down query that does just what is needed to get the changes file LFA and avoids joining on `SourcePackagePublishingHistory`.

Changed in soyuz:
assignee: nobody → Muharem Hrnjadovic (al-maisan)
Changed in soyuz:
status: Triaged → In Progress
Muharem Hrnjadovic (al-maisan) wrote :

Hi there!

I prepared a branch that optimizes the database query behind the
'changes_file_url' property.

It stands to reason that it is more optimal because it avoids the
SourcePackagePublishingHistory and the SourcePackageRelease
tables (with 758673 and 459354 rows respectively).

I am not sure how to test/prove that it is more optimal however.

How about landing it and have James use it (via edge). If all goes
well we can close this bug.

----

What follows are some details on the queries..

The old query (http://pastebin.ubuntu.com/342524/) joins on the following
tables:

    LibraryFileAlias, LibraryFileContent, PackageUpload, PackageUploadSource,
    SourcePackagePublishingHistory, SourcePackageRelease

In contrast, the new query (http://pastebin.ubuntu.com/342528/) only uses
these tables:

    LibraryFileAlias, PackageUpload, PackageUploadSource

Kees Cook (kees) wrote :

Is there a portable way to write our scripts to use changes_file_url in either form? I.e. so we don't have to wait for it to break before changing it to changes_file_url()?

Kees Cook wrote:
> Is there a portable way to write our scripts to use changes_file_url in
> either form? I.e. so we don't have to wait for it to break before
> changing it to changes_file_url()?

I am afraid I can't think of one. The new code will become available on
edge first.

As long as you stay on launchpad.net the 'changes_file_url' will continue
to be available (until the next official LP rollout) so you can change
over to edge and the changesFileUrl() method at your own pace.

Best regards

--
Muharem Hrnjadovic <email address hidden>
Public key id : B2BBFCFC
Key fingerprint : A5A3 CC67 2B87 D641 103F 5602 219F 6B60 B2BB FCFC

Kees Cook (kees) wrote :

I think I've found a possible approach:

    # Handle transition to method
    import types
    if type(item.changes_file_url) == types.MethodType:
        src_changes = item.changes_file_url()
    else:
        src_changes = item.changes_file_url

--
Kees Cook
Ubuntu Security Team

As Muharem says, the best approach is to stay on lpnet unless you really need
to use edge, so you get some time to switch over. We will always warn of
these sorts of API changes as well.

FWIW the API will get versioning support next year which I hope will fix these
sorts of problems.

Changed in soyuz:
status: In Progress → Fix Committed
Changed in soyuz:
milestone: 3.1.12 → 10.01
Changed in soyuz:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers