named_get's do not support custom batches via slice

Bug #487522 reported by Michael Vogt on 2009-11-24
This bug affects 1 person
Affects: Launchpad itself
Importance: High
Assigned to: Unassigned

Bug Description

While working on the launchpadlib-based changelog extract tool, I ran into the issue that I need to query for all published source packages to populate the initial changelogs (this is needed only once; from that point on it's just a call to getPublishedSources with created_since_date).

I read on the launchpadlib API page that slicing is the solution for big queries like this, but that does not seem to work for me. I attach a small test program that demonstrates what I have in mind; the full source can be found at "lp:~mvo/+junk/lp-changelogs-crawler". The script works nicely if I limit the search to certain packages, but it no longer works when I ask it for all source packages.

Any hints or workarounds are welcome.

Note: oopsid=OOPS-1424S588 shows that the default batch size (50) is being used, but the request still times out; hence the attempt to request a custom batch via a slice.

Michael Vogt (mvo) wrote :

The output of the script for me is:

small query
https://api.staging.launchpad.net/beta/ubuntu/+archive/primary/+sourcepub/831491
https://api.staging.launchpad.net/beta/ubuntu/+archive/primary/+sourcepub/590326
https://api.staging.launchpad.net/beta/ubuntu/+archive/primary/+sourcepub/443952
https://api.staging.launchpad.net/beta/ubuntu/+archive/primary/+sourcepub/380205
https://api.staging.launchpad.net/beta/ubuntu/+archive/primary/+sourcepub/282482
https://api.staging.launchpad.net/beta/ubuntu/+archive/primary/+sourcepub/226566
https://api.staging.launchpad.net/beta/ubuntu/+archive/primary/+sourcepub/58979
big query
Traceback (most recent call last):
  File "/tmp/launchpad-fail.py", line 65, in <module>
    c.get_changelogs()
  File "/tmp/launchpad-fail.py", line 57, in get_changelogs
    changed = archive.getPublishedSources(status="Published")[0:10]
  File "/usr/lib/python2.6/dist-packages/lazr/restfulclient/resource.py", line 460, in __call__
    url, in_representation, http_method, extra_headers=extra_headers)
  File "/usr/lib/python2.6/dist-packages/lazr/restfulclient/_browser.py", line 204, in _request
    raise HTTPError(response, content)
lazr.restfulclient.errors.HTTPError: HTTP Error 503: Service Unavailable

I *assume* 503 in this case means a (server) timeout.

Michael Nelson (michael.nelson) wrote :

<noodles> mvo: it could be that slicing only works on properties (the example in the docs is launchpad.bugs[:10])
<noodles> So doing getPublishedSources()[:10] might not be applying the slice to the server-side query, but rather to the result that never gets back to you :/ (but that's just a guess).
<mvo> that sounds plausible, should I still file a bug then?
<noodles> Yes please, either way it *should* still be possible to batch a collection returned by a method like getPublishedSources().

Michael Nelson (michael.nelson) wrote :

Hrm, looking at the code, getPublishedSources is definitely exposed with the @operation_returns_collection_of() decorator, which according to the lazr.restful source, ensures that a CollectionField is the return type. And the method itself returns an SQL object SelectResults (well, storm compatibility version), so it *should* support batching.

Checking in a console, it *seems* to be tagged correctly:

{{{
In [6]: IArchive['getPublishedSources'].queryTaggedValue('lazr.restful.exported')
Out[6]:
{'as': 'getPublishedSources',
 'call_with': {},
 'params': {'created_since_date': <zope.schema._field.Datetime object at 0x9e1d54c>,
            'distroseries': <lazr.restful.fields.Reference object at 0x9e1d4cc>,
            'exact_match': <zope.schema._bootstrapfields.Bool object at 0x9e1d52c>,
            'name': <zope.schema._bootstrapfields.TextLine object at 0x9e1d3ec>,
            'pocket': <zope.schema._field.Choice object at 0x9e1d4ac>,
            'status': <zope.schema._field.Choice object at 0x9e1d44c>,
            'version': <zope.schema._bootstrapfields.TextLine object at 0x9e1d48c>},
 'return_type': <lazr.restful.fields.CollectionField object at 0x9e1d5ac>,
 'type': 'read_operation'}
}}}

I'll ask Leonard if he can take a quick read of this info and see if we're missing something.

Michael Nelson (michael.nelson) wrote :

<noodles775> Hi leonardr ! When you get a chance, could you read through the comments on bug 487522 and let me know if there's something obvious we missed?
<mup> Bug #487522: getPublishedSources() does not support batch operations <Soyuz:New> <https://launchpad.net/bugs/487522>
<bigjools> oO
<bigjools> james_w has been using that for many months
<leonardr> "getPublishedSources()[:10] might not be applying the slice to the server-side query, but rather to the result that never gets back to you"
<leonardr> that's correct
<leonardr> a python slice is applied to its lhs, and the lhs is not being calculated
<leonardr> you need to forgo the syntactic sugar
<leonardr> let me find the right syntax
<noodles775> leonardr: but https://help.launchpad.net/API/launchpadlib#Collections seems to imply that it works for other CollectionFields?
<noodles775> OK, thanks!
<noodles775> bigjools: without any args? (ie. ubuntu.getPublishedSources() ;) ).
<leonardr> noodles: a lhs like "launchpad.bugs" is resolved without going to the server
<bigjools> noodles775: he uses published_since_date IIRC
<leonardr> launchpadlib doesn't go to the server until it sees the slice
<leonardr> but if you call a named operation, it goes to the server immediately
<noodles775> I see. Would it be possible for it to behave similarly for named operations?
<noodles775> (at least, if it's followed directly by a slice?)
<leonardr> in theory, yes. the function call would return some kind of 'deferred' object
<leonardr> noodles775: try passing "ws.start" and "ws.size" parameters into the named operation
<leonardr> i know that if you send those parameters lazr.restful will respect them, but lazr.restfulclient might reject them because they're not found in the wadl
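
The evaluation order leonardr describes can be illustrated with a small, self-contained sketch. Nothing here is launchpadlib code: FakeServer, LazyCollection and named_operation are hypothetical stand-ins, and the default batch of 50 is taken from the bug description. It only shows that a slice on a lazy collection shapes the request, while a slice on a named operation's return value is applied after the request has already been sent.

```python
class FakeServer(object):
    """Hypothetical stand-in for the web service; records every request."""

    def __init__(self, total):
        self.total = total
        self.requests = []

    def fetch(self, start, size):
        # Remember which batch was asked for, then return that batch.
        self.requests.append((start, size))
        return list(range(start, min(start + size, self.total)))


class LazyCollection(object):
    """Behaves like launchpad.bugs: nothing is fetched until it is sliced."""

    def __init__(self, server):
        self.server = server

    def __getitem__(self, key):
        # The slice bounds become the batch that is requested.
        start = key.start or 0
        return self.server.fetch(start, key.stop - start)


def named_operation(server, default_batch=50):
    """Behaves like getPublishedSources(): the request is sent on call."""
    return server.fetch(0, default_batch)


server = FakeServer(total=1000)
lazy = LazyCollection(server)[0:10]    # slice is seen before any request
eager = named_operation(server)[0:10]  # request is sent, then list is sliced
print(server.requests)
```

Both expressions yield the same ten items, but the recorded requests differ: the first asks the server for 10 entries, the second for the default 50. If the default request times out, the slice never gets a chance to run.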

Leonard Richardson (leonardr) wrote :

I believe lazr.restfulclient will pass through unknown arguments, so what I suggested in IRC should work. If it doesn't, the simplest solution is to make lazr.restfulclient pass through arguments that start with "ws." without checking them against the WADL.

Michael Nelson (michael.nelson) wrote :

I tried the suggestion of passing ws.size, but as hinted at by leonardr, it gets rejected as an unrecognized parameter:

{{{
        kwargs = {'ws.size': 10}
        changed = archive.getPublishedSources(status="Published", **kwargs)
}}}

results in:

{{{
Traceback (most recent call last):
  File "/home/michael/tmp/launchpad-fail.py", line 66, in <module>
    c.get_changelogs()
  File "/home/michael/tmp/launchpad-fail.py", line 58, in get_changelogs
    changed = archive.getPublishedSources(status="Published", **kwargs)
  File "/home/michael/canonical/lp-sourcedeps/eggs/lazr.restfulclient-0.9.10-py2.5.egg/lazr/restfulclient/resource.py", line 465, in __call__
    url = self.wadl_method.build_request_url(**args)
  File "/home/michael/canonical/lp-sourcedeps/eggs/wadllib-1.1.4-py2.5.egg/wadllib/application.py", line 582, in build_request_url
    return self.request.build_url(param_values, **kw_param_values)
  File "/home/michael/canonical/lp-sourcedeps/eggs/wadllib-1.1.4-py2.5.egg/wadllib/application.py", line 695, in build_url
    self.query_params, param_values, **kw_param_values)
  File "/home/michael/canonical/lp-sourcedeps/eggs/wadllib-1.1.4-py2.5.egg/wadllib/application.py", line 179, in validate_param_values
    % "', '".join(param_values.keys()))
ValueError: Unrecognized parameter(s): 'ws.size'
}}}

Leonard says:

<leonardr> noodles: ok, the simplest solution is to change lazr.restfulclient and wadllib to let ws.* params through
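
A minimal sketch of what letting ws.* params through could look like. This is not the wadllib implementation: the function below is a hypothetical, simplified validator (its signature and the passthrough_prefix argument are assumptions), written only to show the idea of skipping the WADL check for names that start with "ws.".

```python
def validate_param_values(known_params, passthrough_prefix="ws.", **values):
    """Reject unknown parameters, except those with the pass-through prefix.

    known_params stands in for the parameter names declared in the WADL.
    """
    unknown = [name for name in values
               if name not in known_params
               and not name.startswith(passthrough_prefix)]
    if unknown:
        raise ValueError("Unrecognized parameter(s): '%s'"
                         % "', '".join(sorted(unknown)))
    return values


# Parameter names assumed to be declared for the named operation.
known = set(["status", "name", "pocket"])

# 'ws.size' is not a valid Python identifier, so it is passed via **kwargs.
kwargs = {"ws.size": 10}
accepted = validate_param_values(known, status="Published", **kwargs)
print(sorted(accepted.items()))
```

With this check, 'ws.size' is accepted alongside the declared parameters, while a genuinely unknown name still raises ValueError.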

Leonard Richardson (leonardr) wrote :

A better long-term solution would be to have a method invocation return a 'deferred'-type object when the method's return type is a list. As happens with collections today, this object would only go over to the server once the user sliced it or otherwise looked inside the list. For methods that do not have list return types, invoking the method would go directly to the server.
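
As a rough sketch of that idea, assuming the server honours "ws.start" and "ws.size" as described in the IRC discussion: the class and helper names below (DeferredCollection, fake_send, get_published_sources) are hypothetical and not part of lazr.restfulclient. Calling the method only records the arguments; the request is built once the result is sliced or indexed.

```python
class DeferredCollection(object):
    """Hypothetical deferred result: remembers the call, fetches on access."""

    def __init__(self, send_request, **params):
        self._send_request = send_request
        self._params = params

    def __getitem__(self, key):
        if isinstance(key, slice):
            if key.step not in (None, 1) or key.stop is None:
                raise ValueError("only bounded, contiguous slices supported")
            start = key.start or 0
            size = key.stop - start
        else:
            start, size = key, 1
        if start < 0 or size < 0:
            raise ValueError("negative indexes are not handled in this sketch")
        # Translate the slice into batching parameters for the server.
        params = dict(self._params)
        params["ws.start"] = start
        params["ws.size"] = size
        result = self._send_request(**params)
        return result if isinstance(key, slice) else result[0]


sent = []


def fake_send(**params):
    """Stand-in for the HTTP layer; records parameters, returns one batch."""
    sent.append(params)
    start, size = params["ws.start"], params["ws.size"]
    return list(range(1000))[start:start + size]


def get_published_sources(**params):
    # Invoking the method does not contact the server yet.
    return DeferredCollection(fake_send, **params)


deferred = get_published_sources(status="Published")
requests_before_slice = len(sent)
first_ten = deferred[0:10]
print(sent)
```

No request is recorded until the slice is applied, and the single request that follows carries the batch size taken from the slice. Methods without a list return type would skip this wrapper and contact the server directly, as the comment describes.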

Gary Poster (gary) wrote :

We have generated a wadllib patch that allows any keyword args through for Michael to try, to see if this unblocks him. If so, we will make a lazr.restfulclient bug for this, and then proceed with a branch and a release. I'll also then create a bug for the better solution that Leonard describes, and link it to this one. That latter change will not be immediately scheduled.

affects: soyuz → launchpad-foundations
Leonard Richardson (leonardr) wrote :

Here's the wadllib patch, with test:

http://pastebin.ubuntu.com/327000/

Let me know if it helps.

Michael Vogt (mvo) wrote :

Thanks, Leonard, for the patch. I applied it to my local installation and used ws.size = 10 (just like in comment 7). Passing that parameter works: the data reaches the server (I verified that with httplib2.debuglevel = 2). However, I still get a 503. When I catch the exception from getPublishedSources(), I can see in e.content that the title of the HTML page says "Timeout".

Muharem is looking at the timeout (as part of bug 487597), so I'll generalize the title/description of this bug.

summary: - getPublishedSources() does not support batch operations
+ named_get's do not support custom batches via slice
description: updated
Gary Poster (gary) on 2010-03-16
Changed in launchpad-foundations:
status: New → Triaged
importance: Undecided → High
Robert Collins (lifeless) wrote :

@gary is the patch to lazr.restfulclient in trunk / released? If so it seems like this can be closed...
