Launchpad itself

Including binaries when copying pkgs with lots of binaries oopses

Bug #447138 reported by Michael Nelson on 2009-10-09

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	Launchpad itself	Fix Released	Medium	Michael Nelson	Launchpad itself 3.1.10

Bug Description

OOPS-1378C1323

Nicolas experienced the above OOPS while trying to copy openoffice.org with binaries included, and mentioned that someone in the security team may also have hit this recently.

It seems that we create and (re)fetch each BinaryPackagePublishingHistory record individually. I'm hoping we can batch this instead, to reduce both the number of queries being executed (~150 repetitions of many single queries in the above OOPS) and the time spent instantiating and populating the Storm object representations.

Related branches

lp:~michael.nelson/launchpad/copy-binaries-timeout

Merged into lp:launchpad

Muharem Hrnjadovic (community): Approve on 2009-10-29

Julian Edwards (julian-edwards) on 2009-10-09

Changed in soyuz:
status:	New → Triaged
importance:	Undecided → Medium
tags:	added: oops
tags:	added: tech-debt

Cody A.W. Somerville (cody-somerville) on 2009-10-09

tags:	added: oem-services

Michael Nelson (michael.nelson) on 2009-10-14

Changed in soyuz:
assignee:	nobody → Michael Nelson (michael.nelson)
status:	Triaged → In Progress
milestone:	none → 3.1.10

Michael Nelson (michael.nelson) wrote on 2009-10-14:

In terms of a pre-implementation discussion, here's what I'm planning:

1. Look at adding a PublishingSet.copyBinariesTo(binaries, series, pocket, archive) method so that certain queries are evaluated once rather than once per binary. For example, according to the OOPS the biggest time-saver is the PPA component override to main: we currently look up the 'main' component before creating each binary.

2. The next worst offender is the repeated inserts. Can we batch these with Storm, and would there be much of a benefit? Hrm, no: it seems storm.store.Store only has a singular add method, so there's not much to do here as far as I can see.

3. The third worst offender is a select on SecureBinaryPackagePublishingHistory, and I'm not sure where it is coming from. At first I thought it was an artifact of the SBPPH insert, but there's another query listed for that. This one only selects 3 fields on the SBPPH.

4. The fourth is fetching the corresponding BPPH: at the end of PublishingSet.newBinaryPublication() we call BinaryPackagePublishingHistory.get(pub.id). With PublishingSet.copyBinariesTo(), this could become a single query that fetches all the corresponding BPPHs in one hit.

5. Refactor SPPH.getBuiltBinaries() to remove the code that currently iterates over the results to get the BPPH for each unique BPR. Since the callsites all use the related BPRs anyway, it could instead return a result set of unique (BPPH, BPR) tuples directly from the database. (I'm assuming this contributes to the non-SQL time.)
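Points (1), (2) and (4) can be illustrated with a minimal sketch. It uses an in-memory SQLite database as a stand-in for the Soyuz tables. The `component` and `binary_pub` tables and the `copy_naive`/`copy_batched` functions are hypothetical simplifications. They are not Launchpad's schema, and they are not the `PublishingSet.copyBinariesTo()` that eventually landed.

The batched version makes three changes:
- It hoists the invariant 'main' lookup out of the loop.
- It keeps one insert per binary, as point (2) expects.
- It re-fetches all new rows with a single `IN (...)` query.

Statements are counted with the standard library's `sqlite3.Connection.set_trace_callback`.

```python
import sqlite3


def make_db():
    """Build a tiny in-memory stand-in for the component/publishing tables.

    The schema is a hypothetical simplification, not Launchpad's real one.
    isolation_level=None keeps SQLite in autocommit mode, so no implicit
    BEGIN statements are added to the statement count.
    """
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.executescript(
        """
        CREATE TABLE component (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE binary_pub (
            id INTEGER PRIMARY KEY, binary_name TEXT, component_id INTEGER);
        INSERT INTO component (name) VALUES ('main'), ('universe');
        """
    )
    return conn


def count_statements(conn, func, *args):
    """Run func(conn, *args); return (result, number of SQL statements run)."""
    statements = []
    conn.set_trace_callback(statements.append)
    try:
        result = func(conn, *args)
    finally:
        conn.set_trace_callback(None)
    return result, len(statements)


def copy_naive(conn, names):
    """Per-binary pattern: look up 'main', insert, then re-fetch, every time."""
    rows = []
    for name in names:
        (main_id,) = conn.execute(
            "SELECT id FROM component WHERE name = ?", ("main",)).fetchone()
        cur = conn.execute(
            "INSERT INTO binary_pub (binary_name, component_id) VALUES (?, ?)",
            (name, main_id))
        rows.append(conn.execute(
            "SELECT id, binary_name, component_id FROM binary_pub WHERE id = ?",
            (cur.lastrowid,)).fetchone())
    return rows


def copy_batched(conn, names):
    """Hoist the invariant lookup, insert per row, re-fetch all rows at once."""
    (main_id,) = conn.execute(
        "SELECT id FROM component WHERE name = ?", ("main",)).fetchone()
    new_ids = []
    for name in names:
        cur = conn.execute(
            "INSERT INTO binary_pub (binary_name, component_id) VALUES (?, ?)",
            (name, main_id))
        new_ids.append(cur.lastrowid)
    if not new_ids:
        return []
    placeholders = ", ".join("?" * len(new_ids))
    return conn.execute(
        "SELECT id, binary_name, component_id FROM binary_pub "
        f"WHERE id IN ({placeholders}) ORDER BY id", new_ids).fetchall()


names = [f"pkg{i}" for i in range(150)]
naive_rows, naive_count = count_statements(make_db(), copy_naive, names)
batched_rows, batched_count = count_statements(make_db(), copy_batched, names)
print(naive_count, batched_count)
```

With 150 binaries, the naive loop issues three statements per binary. The batched version issues the 150 inserts plus two queries. Both return the same rows, so the saving comes only from removing repeated lookups and fetches.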

Michael Nelson (michael.nelson) on 2009-10-16

Changed in soyuz:
status:	In Progress → Triaged

Kees Cook (kees) wrote on 2009-10-21:

I no longer see OOPSes; it's just failing with really large packages (linux-source-2.6.15 on dapper, linux on hardy):

Failed to sync: HTTP Error 503: Service Unavailable
{'status': '503', 'content-length': '7319', 'via': '1.1 wildcard.edge.launchpad.net', 'x-powered-by': 'Zope (www.zope.org), Python (www.python.org)', 'server': 'zope.server.http (HTTP)', 'retry-after': '900', 'connection': 'close', 'date': 'Wed, 21 Oct 2009 21:03:37 GMT', 'content-type': 'text/html;charset=utf-8'}

Failed to sync: HTTP Error 502: Bad Gateway
{'status': '502', 'content-length': '921', 'accept-ranges': 'bytes', 'server': 'Apache/2.2.8 (Ubuntu) mod_ssl/2.2.8 OpenSSL/0.9.8g', 'last-modified': 'Wed, 23 Sep 2009 21:37:39 GMT', 'etag': '"24eaac-399-4744586254ec0"', 'date': 'Wed, 21 Oct 2009 20:57:53 GMT', 'content-type': 'text/html'}

Retrying and retrying...

Kees Cook (kees) wrote on 2009-10-21:

(It fails at around 19 seconds; I assume this is the LP max-query-time timeout.)

Julian Edwards (julian-edwards) wrote on 2009-10-22:

Kees, if it fails without an OOPS, that indicates something is wrong with an appserver. I think we had some trouble with them yesterday; is this still happening?

If you get this sort of thing again, your best course of action is to find a LOSA.

Michael Nelson (michael.nelson) wrote on 2009-10-29:

The attached branch deals with (1) and (4) above. We can do a separate branch dealing with (5) too if/when needed.

Diogo Matsubara (matsubara) wrote on 2009-10-30: Bug fixed by a commit

Fixed in devel r9811 <http://bazaar.launchpad.net/~launchpad-pqm/launchpad/devel/revision/9811>

Changed in soyuz:
status:	Triaged → Fix Committed

Michael Nelson (michael.nelson) wrote on 2009-10-30:

Tested on dogfood before landing:

Setup:
* Disabled redirect-after-POST so that the query stats on the page cover only the actual package copy plus page generation.

Copied openoffice.org 1:3.1.0-3ubuntu2~jaunty1 via the UI from
https://dogfood.launchpad.net/~openoffice-pkgs/+archive/ppa/+copy-packages
At least 2073 queries issued in 11.09 seconds

Patched and re-copied (after deleting the original copy):
At least 1172 queries issued in 9.23 seconds
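The dogfood figures can be turned into relative reductions with a quick calculation. The page reports "at least" N queries, so both query counts are lower bounds.

```python
def reduction(before, after):
    """Return (absolute drop, percentage drop) going from before to after."""
    drop = before - after
    return drop, 100.0 * drop / before


queries = reduction(2073, 1172)   # query counts from the dogfood test above
seconds = reduction(11.09, 9.23)  # elapsed seconds from the same test

print(f"queries: -{queries[0]} ({queries[1]:.1f}%)")
print(f"time:    -{seconds[0]:.2f}s ({seconds[1]:.1f}%)")
```

The query count falls by roughly 43%, while elapsed time falls by only about 17%. That suggests much of the remaining time is presumably spent elsewhere, for example in the non-SQL work mentioned in point 5.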

Michael Nelson (michael.nelson) on 2009-11-06

Changed in soyuz:
status:	Fix Committed → Fix Released

Kees Cook (kees) wrote on 2009-12-04:

This is not fixed for me...

Publishing linux-source-2.6.15 2.6.15-55.81 to dapper-security ...
Failed to sync (took 18s): HTTP Error 503: Service Unavailable
{'status': '503', 'content-length': '7410', 'via': '1.1 wildcard.edge.launchpad.net', 'x-powered-by': 'Zope (www.zope.org), Python (www.python.org)', 'server': 'zope.server.http (HTTP)', 'retry-after': '900', 'connection': 'close', 'date': 'Fri, 04 Dec 2009 17:45:16 GMT', 'content-type': 'text/html;charset=utf-8'}
Locating linux ...
Publishing linux 2.6.24-26.64 to hardy-security ...
Failed to sync (took 20s): HTTP Error 503: Service Unavailable
{'status': '503', 'content-length': '7409', 'via': '1.1 wildcard.edge.launchpad.net', 'x-powered-by': 'Zope (www.zope.org), Python (www.python.org)', 'server': 'zope.server.http (HTTP)', 'retry-after': '900', 'connection': 'close', 'date': 'Fri, 04 Dec 2009 17:45:52 GMT', 'content-type': 'text/html;charset=utf-8'}

Kees Cook (kees) wrote on 2009-12-04:

Should I open a new bug report?

Michael Nelson (michael.nelson) wrote on 2009-12-04:

#10

Hi Kees, yes please, with a link back to this one. We didn't implement all the possible improvements with this fix, but the details of those other improvements are listed above.

Nicolas Valcarcel (nvalcarcel) wrote on 2010-02-02:

#11

I hit it again:
Publishing openoffice.org 1:2.4.1-1ubuntu2.2 to hardy-security ...
Failed to sync (took 5s): HTTP Error 400: Bad Request
{'status': '400', 'content-length': '88', 'via': '1.1 wildcard.launchpad.net', 'x-powered-by': 'Zope (www.zope.org), Python (www.python.org)', 'server': 'zope.server.http (HTTP)', 'connection': 'close', 'date': 'Tue, 02 Feb 2010 21:08:01 GMT', 'content-type': 'text/plain'}

Kees: did you report this new bug?

Nicolas Valcarcel (nvalcarcel) wrote on 2010-02-02:

#12

It seems that this is a different issue: the workaround isn't working anymore. I used to change include_binaries=True to include_binaries=False in syncSource and was able to get the package synced, but that no longer works.

William Grant (wgrant) wrote on 2010-02-02:

#13

A 400 is not a timeout, so this is a rather different issue. If you catch the exception and check its 'content' attribute, you will see a description of the issue.
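The distinction William draws can be turned into a small client-side helper. It retries 502/503 responses, honouring the `retry-after` header seen in the dumps above. For anything else, such as a 400, it stops and reports the response body.

This is a minimal sketch under a few assumptions:
- The exception exposes the `response` (dict-like, lower-cased headers including `status`) and `content` attributes discussed in this thread.
- The 60-second fallback delay is an arbitrary choice.
- `triage_sync_error` and `FakeHTTPError` are hypothetical names, not part of launchpadlib.

```python
def triage_sync_error(error, default_delay=60):
    """Classify an HTTPError-like exception raised by a syncSource() call.

    Assumes `error.response` is a dict-like of lower-cased headers that
    includes 'status', and `error.content` is the body as bytes or str.
    Returns ("retry", seconds) or ("fail", message).
    """
    status = int(error.response.get("status", 0))
    if status in (502, 503):
        # Appserver trouble or a timeout: worth retrying later.
        try:
            return "retry", int(error.response.get("retry-after", default_delay))
        except ValueError:
            # Retry-After may also be an HTTP date; not handled in this sketch.
            return "retry", default_delay
    body = error.content
    if isinstance(body, bytes):
        body = body.decode("utf-8", "replace")
    # Anything else (e.g. 400 Bad Request) will fail the same way on retry;
    # the body carries the explanation.
    return "fail", f"HTTP {status}: {body.strip()}"


class FakeHTTPError(Exception):
    """Hypothetical stand-in with the same two attributes, for local testing."""

    def __init__(self, response, content):
        super().__init__(response.get("status"))
        self.response = response
        self.content = content


# Against the real API the call site would look roughly like this
# (syncSource arguments elided; HTTPError comes from lazr.restfulclient):
#   try:
#       archive.syncSource(...)
#   except HTTPError as error:
#       action, detail = triage_sync_error(error)
```

Separating the two cases avoids the blind "retrying and retrying" loop for errors that a retry cannot fix.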

Nicolas Valcarcel (nvalcarcel) wrote on 2010-02-03:

#14

What I just posted is HTTPError.response; should I print HTTPError.content?

Nicolas Valcarcel (nvalcarcel) wrote on 2010-02-03:

#15

I just printed it and got:
openoffice.org 1:2.4.1-1ubuntu2.2 in hardy (binaries conflicting with the existing ones)
But I don't have openoffice.org in the P3A I'm copying to.

Julian Edwards (julian-edwards) wrote on 2010-02-03:

#16

The package must have existed there at some point and then been deleted. You can't resurrect the same file names like that, because clients may have downloaded the old files and would be confused by the new ones.
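Julian's point can be illustrated with a hypothetical check that treats every filename the archive has ever published as taken, including deleted ones. This is not Launchpad's actual copy checker. The function name, the history mapping, its status strings and the example .deb filenames are all made up for illustration.

```python
def find_filename_conflicts(candidate_files, published_history):
    """Return {filename: status} for candidates the archive has ever published.

    `published_history` is a hypothetical mapping of filename -> latest
    status ("published", "superseded", "deleted", ...). Deleted entries
    still block the copy, because clients may already hold a file with
    that name and would not expect its contents to change.
    """
    return {
        name: published_history[name]
        for name in candidate_files
        if name in published_history
    }


# Illustrative data: the files were published once and later deleted.
history = {
    "openoffice.org-core_2.4.1-1ubuntu2.2_i386.deb": "deleted",
    "openoffice.org-writer_2.4.1-1ubuntu2.2_i386.deb": "deleted",
}
candidates = [
    "openoffice.org-core_2.4.1-1ubuntu2.2_i386.deb",
    "openoffice.org-writer_2.4.1-1ubuntu2.2_i386.deb",
    "openoffice.org-calc_2.4.1-1ubuntu2.2_i386.deb",
]
conflicts = find_filename_conflicts(candidates, history)
if conflicts:
    print("Refusing copy; previously published:", sorted(conflicts))
```

This is why the copy can be rejected even when the package no longer appears in the target P3A: the deleted publications still reserve their file names.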

Kees Cook (kees) wrote on 2010-02-23:

#17

I've opened bug 526645 for the timeouts the security team is still seeing.
