Archive:EntryResource:syncSource timeouts

Bug #641338 reported by Kees Cook
28
This bug affects 3 people
Affects Status Importance Assigned to Milestone
Launchpad itself
Won't Fix
Critical
Unassigned

Bug Description

https://lp-oops.canonical.com/oops.py/?oopsid=1724L1218

https://api.launchpad.net/1.0/ubuntu/+archive/primary

Branch: launchpad-rev-9769
Revno: 9769
SQL time: 12393 ms
Non-sql time: 2790 ms
Total time: 15183 ms
Statement Count: 1061

Biggest issue here is the sql time. @ 123ms each on average they aren't teribly fast or slow - but there are 1000 of them.

1 197 383 1 382 SQL-launchpad-main-master SELECT LibraryFileAlias.content, LibraryFileAlias.date_created, LibraryFileAlias.expires, LibraryFileAlias.filename, LibraryFileAlias.hits, LibraryFileAlias.id, LibraryFileAlias.last_accessed, LibraryFileAlias.mimetype, LibraryFileAlias.restricted FROM LibraryFileAlias WHERE LibraryFileAlias.id = %s LIMIT 1
2 186 631 3 628 SQL-launchpad-main-master SELECT BinaryPackageFile.binarypackagerelease, BinaryPackageFile.filetype, BinaryPackageFile.id, BinaryPackageFile.libraryfile FROM BinaryPackageFile WHERE BinaryPackageFile.binarypackagerelease = %s ORDER BY BinaryPackageFile.libraryfile
3 140 242 1 241 SQL-launchpad-main-master SELECT %s FROM Component, ComponentSelection WHERE Component.id = %s AND ComponentSelection.distroseries = 93 AND Component.id = ComponentSelection.component AND (1=1)
4 139 262 1 261 SQL-launchpad-main-master SELECT DistroArchSeries.architecturetag, DistroArchSeries.distroseries, DistroArchSeries.enabled, DistroArchSeries.id, DistroArchSeries.official, DistroArchSeries.owner, DistroArchSeries.package_count, DistroArchSeries.processorfamily, DistroArchSeries.supports_virtualized FROM DistroArchSeries WHERE DistroArchSeries.distroseries = %s ORDER BY DistroArchSeries.architecturetag

This is indicative of very procedural code rather than set based code.

Revision history for this message
Julian Edwards (julian-edwards) wrote :

Rob, I need to talk to you about this and work out the best long-term strategy. We keep making tweaks to bring it under the timeouts and then it breaks when they get lowered again.

tags: added: oops timeout
Changed in soyuz:
status: New → Triaged
importance: Undecided → High
Revision history for this message
Robert Collins (lifeless) wrote :

Hi Julian, can you give me a quick rundown of whats happening - 152 seconds is much higher than any timeout Launchpad's appserver environment has had, so I presume that the timeout is happening in some other service?

If it is an LP timeout, can we please attach an OOPS and the pageid to this report, so that we can get a trace of it.

tags: removed: oops
Revision history for this message
Robert Collins (lifeless) wrote :

If it is an LP service issue we should be able to make a (temporary) exception to it as soon as we land a timeout featureflag knob (which I've been meaning to do)

Revision history for this message
Julian Edwards (julian-edwards) wrote :

Where did you get 152 seconds from Rob?

Revision history for this message
Kees Cook (kees) wrote :

(I would also note that no OOPS is available from these 503 errors for some reason)

Revision history for this message
Robert Collins (lifeless) wrote : Re: [Bug 641338] Re: timeouts too low for security unembargo

"Failed to sync (took 152s)"

LP API OOPses are (AIUI) signaled via an X- HTTP header; turning on
lplib/httplib debugging may show that.

Revision history for this message
Robert Collins (lifeless) wrote : Re: timeouts too low for security unembargo

I can't see an oops in the headers, perhaps its in the body? I've seen API's with timeouts return a html :P

Revision history for this message
Kees Cook (kees) wrote : Re: [Bug 641338] Re: timeouts too low for security unembargo

Nothing in the body either, sorry. I didn't see one in it, so I didn't add
it to the bug because it was long :)

Revision history for this message
Robert Collins (lifeless) wrote :

What was in the body?

Revision history for this message
Robert Collins (lifeless) wrote : Re: timeouts too low for security unembargo

Looking at the previous bug it was in 'Archive:EntryResource'. Dunno if thats relevant here.

Revision history for this message
Robert Collins (lifeless) wrote :

And in the previous bug, its death by a thousand cuts:

SQL time: 15597 ms
Non-sql time: 4683 ms
Total time: 20280 ms
Statement Count: 1661

It should be < 50; package size isn't related - we don't actually copy content around AIUI, so this is basically just inefficient code.

Lets get it efficient, and then think about long term. I don't see any reason why an efficient API shouldn't be able to do the unembargo in < 500ms.

Revision history for this message
Julian Edwards (julian-edwards) wrote : Re: [Bug 641338] Re: timeouts too low for security unembargo

On Monday 20 September 2010 01:51:32 Robert Collins wrote:
> And in the previous bug, its death by a thousand cuts:
>
>
> SQL time: 15597 ms
> Non-sql time: 4683 ms
> Total time: 20280 ms
> Statement Count: 1661
>
> It should be < 50; package size isn't related - we don't actually copy
> content around AIUI, so this is basically just inefficient code.
>
> Lets get it efficient, and then think about long term. I don't see any
> reason why an efficient API shouldn't be able to do the unembargo in <
> 500ms.

The main reason is that the copy checker has to do a LOT of work. It issues a
lot of queries.

I would look at how to make that better first.

lib/lp/soyuz/scripts/packagecopier.py

Revision history for this message
Marc Deslauriers (mdeslaur) wrote : Re: timeouts too low for security unembargo

I am unable to unembargo some security updates this morning because of this issue.

Revision history for this message
Robert Collins (lifeless) wrote : Re: [Bug 641338] Re: timeouts too low for security unembargo

Did you get an OOPs? What was in the body of the error?

As I'm not in the security team I cannot even attempt this let alone
see whats going on...

Revision history for this message
Marc Deslauriers (mdeslaur) wrote : Re: timeouts too low for security unembargo

Yes, I got OOPSES.

Here is one:
https://lp-oops.canonical.com/oops.py/?oopsid=1724L1218

I was getting 503 errors. The bodies were the standard launchpad oops page.

summary: - timeouts too low for security unembargo
+ Archive:EntryResource:syncSource timeouts too low for security unembargo
summary: - Archive:EntryResource:syncSource timeouts too low for security unembargo
+ Archive:EntryResource:syncSource timeouts
description: updated
Revision history for this message
Julian Edwards (julian-edwards) wrote : Re: [Bug 641338] Re: Archive:EntryResource:syncSource timeouts

On Monday 20 September 2010 23:49:00 you wrote:
> + This is indicative of very procedural code rather than set based code.

Rob, check comment 12 :-)

Revision history for this message
Robert Collins (lifeless) wrote :

Sure; I've had my head up BugTask:+index all day, and its equally bad. Will be describing the horror later:)

tags: added: pg83
Revision history for this message
Jamie Strandboge (jdstrand) wrote :

Talking to StevenK, he suggested that a possible workaround until the proper fix can be found might be to increase timeout for the security team. Currently, using the API to unembargo sometimes doesn't work for several large packages (kernels, openjdk). I don't know if this can be done on a per package, per team, or per page basis (or some combination thereof), but this causes regular pain.

Revision history for this message
Robert Collins (lifeless) wrote :

We can't physically increase the timeout beyond 20 seconds - and even 20 seconds is likely to hit coordination issues with other bits of infrastructure. We should 'just' fix this - I haven't had time to look into it deeper even though I strongly desire to do so.

Changed in launchpad:
importance: High → Critical
Revision history for this message
Micah Gersten (micahg) wrote :

This seems to have gotten worse:
https://lp-oops.canonical.com/?oopsid=OOPS-2006CM76
https://lp-oops.canonical.com/?oopsid=OOPS-2006AX82

The first one, I had a similar one last week, the second one, last week the same thing worked.

Revision history for this message
Julian Edwards (julian-edwards) wrote :

In the "devel" version of the API we plan on making the call to syncSource() and syncSources() asynchronous so that they return immediately and the actual work is done in the background.

This will fix all the timeout issues you currently have, but we're obviously going to break the API for devel. Please let me know if you have any concerns, I plan on starting this work in the next few days.

Revision history for this message
Robert Collins (lifeless) wrote :

On Tue, Jul 12, 2011 at 2:23 AM, Julian Edwards
<email address hidden> wrote:
> In the "devel" version of the API we plan on making the call to
> syncSource() and syncSources() asynchronous so that they return
> immediately and the actual work is done in the background.
>
> This will fix all the timeout issues you currently have, but we're
> obviously going to break the API for devel.  Please let me know if you
> have any concerns, I plan on starting this work in the next few days.

Perhaps we can use a new method name with the new behaviour, to make
migrating easier? I don't mind if do/don't, just raising it as a
possibility, cause it seems pretty straight forward to do - and will
mean that the maintenance of syncSource on 1.0 will be easier

Revision history for this message
Julian Edwards (julian-edwards) wrote :

On Tuesday 12 July 2011 01:24:39 you wrote:
> On Tue, Jul 12, 2011 at 2:23 AM, Julian Edwards
>
> <email address hidden> wrote:
> > In the "devel" version of the API we plan on making the call to
> > syncSource() and syncSources() asynchronous so that they return
> > immediately and the actual work is done in the background.
> >
> > This will fix all the timeout issues you currently have, but we're
> > obviously going to break the API for devel. Please let me know if you
> > have any concerns, I plan on starting this work in the next few days.
>
> Perhaps we can use a new method name with the new behaviour, to make
> migrating easier? I don't mind if do/don't, just raising it as a
> possibility, cause it seems pretty straight forward to do - and will
> mean that the maintenance of syncSource on 1.0 will be easier

We could, but I'd rather not have yet another method name when I want
synchronous copying to simply die.

However, I think I have no choice as it doesn't look like we have a way of
figuring out the api version in use inside the method code.

Revision history for this message
William Grant (wgrant) wrote :

On 12/07/11 19:15, Julian Edwards wrote:
> We could, but I'd rather not have yet another method name when I want
> synchronous copying to simply die.
>
> However, I think I have no choice as it doesn't look like we have a way of
> figuring out the api version in use inside the method code.

You export two Python methods under the same webservice name, but
restrict them to disjoint sets of versions.

But you could possibly take this opportunity to rename it to something
like 'copyPackage'.

Revision history for this message
Julian Edwards (julian-edwards) wrote :

On Tuesday 12 July 2011 11:44:54 you wrote:
> On 12/07/11 19:15, Julian Edwards wrote:
> > We could, but I'd rather not have yet another method name when I want
> > synchronous copying to simply die.
> >
> > However, I think I have no choice as it doesn't look like we have a way
> > of figuring out the api version in use inside the method code.
>
> You export two Python methods under the same webservice name, but
> restrict them to disjoint sets of versions.

That was what I had feared.

> But you could possibly take this opportunity to rename it to something
> like 'copyPackage'.

I had considered that.

But then we'd have a syncSource() that copies packages, and a copyPackage()
that syncs sources.

GNARGH.

William Grant (wgrant)
tags: added: package-copies
Revision history for this message
Colin Watson (cjwatson) wrote :

copyPackage has existed for ages now, and the security team is using it, so from the Ubuntu point of view I don't think we much care about this any more. But I suppose it's still technically critical due to being a timeout.

FWIW, I'm told that some users still prefer syncSource despite its problems, because the synchronous nature is apparently helpful in scripts:

  http://irclogs.ubuntu.com/2012/10/16/%23launchpad-dev.html#t18:19

Revision history for this message
William Grant (wgrant) wrote :

syncSource was replaced by copyPackage years ago. It's fundamentally difficult to make performant, and it's not worth spending time on this deprecated API.

Changed in launchpad:
status: Triaged → Won't Fix
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.