Including binaries when copying pkgs with lots of binaries oopses
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Launchpad itself | Fix Released | Medium | Michael Nelson | 3.1.10
Bug Description
OOPS-1378C1323
Nicolas experienced the above oops while trying to copy openoffice.org with binaries included - and mentioned that someone on the security team also experienced this recently.
It seems that we create and (re)fetch each BinaryPackagePublishingHistory individually.
Related branches
- Muharem Hrnjadovic (community): Approve on 2009-10-29
Diff: 196 lines, 3 files modified
- lib/lp/soyuz/interfaces/publishing.py (+17/-0)
- lib/lp/soyuz/model/publishing.py (+69/-26)
- lib/lp/soyuz/scripts/packagecopier.py (+6/-25)
Changed in soyuz:
status: New → Triaged
importance: Undecided → Medium
tags: added: oops
tags: added: tech-debt
tags: added: oem-services
Changed in soyuz:
assignee: nobody → Michael Nelson (michael.nelson)
status: Triaged → In Progress
milestone: none → 3.1.10
Michael Nelson (michael.nelson) wrote : | #1
In terms of a pre-implementation discussion, here's what I'm planning:
1. Look at adding a PublishingSet.copyBinariesTo(binaries, series, pocket, archive) that ensures certain queries are not evaluated for each binary. For example, the biggest time-saver according to the oops is from the PPA component override to main - we look up the 'main' component before creating each binary. (A rough sketch of this idea follows the list.)
2. The next worst offender is the repeated inserts - can we batch these with Storm? Would there be much of a benefit? Hrm, no: it seems storm.store.Store only has a singular add() method. Not much to do here, AFAICS.
3. The third worst offender is a select on SecureBinaryPackagePublishingHistory - and I'm not sure where this is coming from. At first I thought it was an artifact of the SBPPH insert, but there's another query listed for that. This one only selects 3 fields on the SBPPH.
4. The fourth is getting the corresponding BPPH - at the end of PackageSet.newBinaryPublication() we have BinaryPackagePublishingHistory.get(pub.id). This could also be a single query to get all the corresponding BPPHs in one hit if we have PublishingSet.copyBinariesTo().
5. Refactor SPPH.getBuiltBinaries(), getting rid of the code that currently iterates the results to get the BPPH for unique BPRs. We could instead return a result set of unique (BPPH, BPR) tuples directly from the database, as the call sites all use the related BPRs anyway. (I'm assuming this is contributing to the non-SQL time.)
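A minimal sketch of idea (1) above, using hypothetical helper names (get_main_component, make_binary_publication) rather than Launchpad's actual internals; the point is only that the component override is resolved once, outside the per-binary loop:

```python
# Sketch only: hoist the per-binary 'main' component lookup out of the loop.
def copy_binaries_to(binaries, distroseries, pocket, archive):
    main_component = get_main_component()  # hypothetical helper, queried once
    publications = []
    for binary in binaries:
        # PPA copies override every binary to 'main'.
        component = main_component if archive.is_ppa else binary.component
        publications.append(
            make_binary_publication(  # hypothetical helper
                binary, distroseries, pocket, archive, component))
    # Returning the new publications directly would also address (4):
    # no per-row re-fetch via BinaryPackagePublishingHistory.get(pub.id).
    return publications
```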
Changed in soyuz:
status: In Progress → Triaged
Kees Cook (kees) wrote : | #2 |
I no longer see OOPSes; it's just failing with really large packages (linux-
Failed to sync: HTTP Error 503: Service Unavailable
{'status': '503', 'content-length': '7319', 'via': '1.1 wildcard.
Failed to sync: HTTP Error 502: Bad Gateway
{'status': '502', 'content-length': '921', 'accept-ranges': 'bytes', 'server': 'Apache/2.2.8 (Ubuntu) mod_ssl/2.2.8 OpenSSL/0.9.8g', 'last-modified': 'Wed, 23 Sep 2009 21:37:39 GMT', 'etag': '"24eaac-
retrying and retrying....
Kees Cook (kees) wrote : | #3 |
(Fails around 19 seconds; I assume this is the LP max-query-time limit being hit.)
Julian Edwards (julian-edwards) wrote : | #4 |
Kees, if it fails without an oops it indicates that something's not right with an appserver. I think we had some trouble with them yesterday; is this still happening?
If you get this sort of thing again, your best action is to find a LOSA.
Michael Nelson (michael.nelson) wrote : | #5 |
The attached branch deals with (1) and (4) above. We can do a separate branch dealing with (5) too if/when needed.
Fixed in devel r9811.
Changed in soyuz:
status: Triaged → Fix Committed
Michael Nelson (michael.nelson) wrote : | #7 |
Tested on dogfood before landing.
Setup:
* Disabled redirect-after-POST so that the query stats on the page are from the actual package copy + page generation.
Via the UI, copied openoffice.org 1:3.1.0-
https:/
At least 2073 queries issued in 11.09 seconds.
Patched and re-copied (after deleting the original copy):
At least 1172 queries issued in 9.23 seconds.
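The query counts above come from Launchpad's own page statistics. As a rough illustration of how a similar count could be taken with Storm, here is a sketch using Storm's tracer hook; the CountingTracer class is invented for this example:

```python
# Sketch: count the SQL statements Storm issues while some work runs.
# install_tracer/remove_all_tracers and the connection_raw_execute hook
# are Storm's tracer API; CountingTracer itself is illustrative.
from storm.tracer import install_tracer, remove_all_tracers

class CountingTracer(object):
    def __init__(self):
        self.count = 0

    def connection_raw_execute(self, connection, raw_cursor,
                               statement, params):
        self.count += 1  # called once per statement executed

tracer = CountingTracer()
install_tracer(tracer)
# ... perform the package copy under test ...
remove_all_tracers()
print("At least %d queries issued" % tracer.count)
```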
Changed in soyuz:
status: Fix Committed → Fix Released
Kees Cook (kees) wrote : | #8 |
This is not fixed for me...
Publishing linux-source-2.6.15 2.6.15-55.81 to dapper-security ...
Failed to sync (took 18s): HTTP Error 503: Service Unavailable
{'status': '503', 'content-length': '7410', 'via': '1.1 wildcard.
Locating linux ...
Publishing linux 2.6.24-26.64 to hardy-security ...
Failed to sync (took 20s): HTTP Error 503: Service Unavailable
{'status': '503', 'content-length': '7409', 'via': '1.1 wildcard.
Kees Cook (kees) wrote : | #9 |
Should I open a new bug report?
Michael Nelson (michael.nelson) wrote : | #10 |
Hi Kees, yes please - linking back to this one. We didn't implement all the possible improvements with this fix, but the details of those other improvements are listed above.
Nicolas Valcarcel (nvalcarcel) wrote : | #11 |
I hit it again:
Publishing openoffice.org 1:2.4.1-1ubuntu2.2 to hardy-security ...
Failed to sync (took 5s): HTTP Error 400: Bad Request
{'status': '400', 'content-length': '88', 'via': '1.1 wildcard.
Kees: did you report this new bug?
Nicolas Valcarcel (nvalcarcel) wrote : | #12 |
It seems that it is a different issue; the workaround isn't working anymore. I used to change include_binaries.
William Grant (wgrant) wrote : | #13 |
A 400 is not a timeout -- this is a rather different issue. If you can catch the exception and check the 'content' attribute, you will see a description of the issue.
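A minimal sketch of what William suggests, assuming launchpadlib/lazr.restfulclient, whose HTTPError exposes the response headers as .response and the body as .content; the copy call itself is a hypothetical stand-in:

```python
# Sketch: surface the server's explanation when a copy request fails.
from lazr.restfulclient.errors import HTTPError

try:
    do_copy()  # hypothetical stand-in for the script's copy request
except HTTPError as error:
    print(error.response)  # headers/status only, as pasted above
    print(error.content)   # the server's description of the failure
```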
Nicolas Valcarcel (nvalcarcel) wrote : | #14 |
What I just posted is the HTTPError.response; should I print HTTPError.content?
Nicolas Valcarcel (nvalcarcel) wrote : | #15 |
Just printed it and got:
openoffice.org 1:2.4.1-1ubuntu2.2 in hardy (binaries conflicting with the existing ones)
But I don't have openoffice.org in the P3A I'm copying to.
Julian Edwards (julian-edwards) wrote : | #16 |
The package will have existed there once and been deleted. You can't resurrect the same file names like that, as clients may have downloaded the old files and would be confused by the new ones.
Kees Cook (kees) wrote : | #17 |
I've opened bug 526645 for the timeouts the security team is still seeing.
5. Refactor SPPH.getBuiltBi naries( ) getting rid of the code that currently iterates the results to get the BPPH for unique BPRs. We could instead return a result set of unique (BPPH, BPR) tuples directly from the database as the callsites all use the related BPRs anyway. (I'm assuming this is contributing to the non-sql time).