HTTP Error 503: Service Unavailable when retrieving big collection of messages

Bug #270792 reported by Markus Korn
6
Affects Status Importance Assigned to Milestone
Launchpad itself
Fix Released
High
Eleanor Berger

Bug Description

When I would like to do any operation on huge objects I get a 503 error. All this error looks similar to

In [93]: b = launchpad.bugs[1]

In [94]: b.messages.total_size
---------------------------------------------------------------------------
<class 'launchpadlib.errors.HTTPError'> Traceback (most recent call last)

/media/disk-1/launchpad-vfs/trunk/<ipython console> in <module>()

/media/disk-1/lp_api/launnchpadlib/trunk/launchpadlib/resource.py in __getattr__(self=<launchpadlib.resource.Collection object at 0x10cffd0>, attr='total_size')
    281 """Try to retrive a named operation or parameter of the given name."""
    282 try:
--> 283 return self.lp_get_parameter(attr)
        self.lp_get_parameter = <bound method Collection.lp_get_parameter of <launchpadlib.resource.Collection object at 0x10cffd0>>
        attr = 'total_size'
    284 except KeyError:
    285 pass

/media/disk-1/lp_api/launnchpadlib/trunk/launchpadlib/resource.py in lp_get_parameter(self=<launchpadlib.resource.Collection object at 0x10cffd0>, param_name='total_size')
    183 representation, if the parameter is a link.
    184 """
--> 185 self._ensure_representation()
        self._ensure_representation = <bound method Collection._ensure_representation of <launchpadlib.resource.Collection object at 0x10cffd0>>
    186 for suffix in ['_link', '_collection_link']:
    187 param = self._wadl_resource.get_parameter(

/media/disk-1/lp_api/launnchpadlib/trunk/launchpadlib/resource.py in _ensure_representation(self=<launchpadlib.resource.Collection object at 0x10cffd0>)
    302 if self._wadl_resource.representation is None:
    303 # Get a representation of the linked resource.
--> 304 representation = self._root._browser.get(self._wadl_resource)
        representation = undefined
        self._root._browser.get = <bound method Browser.get of <launchpadlib._browser.Browser object at 0x10cf350>>
        self._wadl_resource = <launchpadlib.wadllib.application.Resource object at 0x10cfe90>
    305 self.__dict__['_wadl_resource'] = self._wadl_resource.bind(
    306 representation, self.JSON_MEDIA_TYPE)

/media/disk-1/lp_api/launnchpadlib/trunk/launchpadlib/_browser.py in get(self=<launchpadlib._browser.Browser object at 0x10cf350>, resource_or_uri=<launchpadlib.wadllib.application.Resource object at 0x10cfe90>, return_response=False)
    219 method = resource_or_uri.get_method('get')
    220 url = method.build_request_url()
--> 221 response, content = self._request(url)
        response = undefined
        content = undefined
        self._request = <bound method Browser._request of <launchpadlib._browser.Browser object at 0x10cf350>>
        url = u'https://api.edge.launchpad.net/beta/bugs/1/messages'
    222 if return_response:
    223 return (response, content)

/media/disk-1/lp_api/launnchpadlib/trunk/launchpadlib/_browser.py in _request(self=<launchpadlib._browser.Browser object at 0x10cf350>, url=u'https://api.edge.launchpad.net/beta/bugs/1/messages', data=None, method='GET', media_type='application/json', extra_headers=None)
    209 # Turn non-2xx responses into exceptions.
    210 if response.status // 100 != 2:
--> 211 raise HTTPError(response, content)
        global HTTPError = <class 'launchpadlib.errors.HTTPError'>
        response = {'status': '503', 'x-lazr-oopsid': 'OOPS-990EB49', 'via': '1.1 wildcard.edge.launchpad.net', 'x-powered-by': 'Zope (www.zope.org), Python (www.python.org)', 'vary': 'Accept-Encoding', 'content-length': '149', 'server': 'zope.server.http (HTTP)', 'connection': 'close', 'date': 'Tue, 16 Sep 2008 07:34:58 GMT', 'content-type': 'text/plain'}
        content = "'SELECT COUNT(*) FROM Bug, BugMessage WHERE BugMessage.message = %s AND Bug.id = BugMessage.bug', [<storm.variables.IntVariable object at 0x8daeb18>]"
    212 return response, content
    213

<class 'launchpadlib.errors.HTTPError'>: HTTP Error 503: Service Unavailable

In [95]:

I get the same error for b.messages.self_link. IMO this is a very serious bug, why does self_link depend on the messages itself? this also indicates that calling len(b.messages) is the same as b.messages.total_size

Markus

Examples of OOPS like this, both on edge and staging: OOPS-990S7, OOPS-990S18, OOPS-990EA50

Revision history for this message
Leonard Richardson (leonardr) wrote :

This is a Launchpad problem. You're seeing a timeout. Are you using staging or edge?

Revision history for this message
Markus Korn (thekorn) wrote :

In this example I used edge, but it is the same on staging

Revision history for this message
Leonard Richardson (leonardr) wrote :

I can duplicate on both staging and edge.

Changed in launchpad:
importance: Undecided → High
status: New → Triaged
description: updated
Revision history for this message
Leonard Richardson (leonardr) wrote :

After investigation I've determined that this problem is caused by excess database queries from an inefficient canonical_url implementation for bug messages.

I'm giving this to the bugs team. Let me know if you need any framework support while making canonical_url more efficient.

Changed in launchpad-foundations:
assignee: nobody → bjornt
Changed in malone:
assignee: bjornt → intellectronica
status: Triaged → In Progress
Revision history for this message
Eleanor Berger (intellectronica) wrote :

RF 7063

Changed in malone:
milestone: none → 2.1.10
status: In Progress → Fix Committed
Revision history for this message
Christian Reis (kiko) wrote :

Markus, can you confirm the issue you were having originally was fixed? Looking forward to the next set of bugs of course ;)

Revision history for this message
Markus Korn (thekorn) wrote :

I'm sorry, but this still happens for me on staging (added an extract of the traceback):

In [33]: x = launchpad.bugs[1]

In [38]: list(x.messages)
---------------------------------------------------------------------------
<class 'launchpadlib.errors.HTTPError'> Traceback (most recent call last)

[...]
--> 211 raise HTTPError(response, content)
        global HTTPError = <class 'launchpadlib.errors.HTTPError'>
        response = {'status': '503', 'x-lazr-oopsid': 'OOPS-998S71', 'via': '1.1 wildcard.staging.launchpad.net', 'x-powered-by': 'Zope (www.zope.org), Python (www.python.org)', 'content-length': '1280', 'server': 'zope.server.http (HTTP)', 'connection': 'close', 'date': 'Wed, 24 Sep 2008 17:44:45 GMT', 'content-type': 'text/plain'}
        content = '\'SELECT Message.datecreated, Message.distribution, Message.id, Message.owner, Message.parent, Message.raw, Message.rfc822msgid, Message.subject, "_prejoin1".account, "_prejoin1".addressline1, "_prejoin1".addressline2, "_prejoin1".city, "_prejoin1".country, "_prejoin1".creation_comment, "_prejoin1".creation_rationale, "_prejoin1".datecreated, "_prejoin1".defaultmembershipperiod, "_prejoin1".defaultrenewalperiod, "_prejoin1".displayname, "_prejoin1".hide_email_addresses, "_prejoin1".homepage_content, "_prejoin1".icon, "_prejoin1".id, "_prejoin1".logo, "_prejoin1".mailing_list_auto_subscribe_policy, "_prejoin1".merged, "_prejoin1".mugshot, "_prejoin1".name, "_prejoin1".organization, "_prejoin1".personal_standing, "_prejoin1".personal_standing_reason, "_prejoin1".phone, "_prejoin1".postcode, "_prejoin1".province, "_prejoin1".registrant, "_prejoin1".renewal_policy, "_prejoin1".subscriptionpolicy, "_prejoin1".teamdescription, "_prejoin1".teamowner, "_prejoin1".verbose_bugnotifications, "_prejoin1".visibility FROM BugMessage, Message LEFT JOIN Person AS "_prejoin1" ON Message.owner = "_prejoin1".id WHERE BugMessage.bug = %s AND Message.id = BugMessage.message AND (1=1) ORDER BY Message.datecreated, Message.id\', [<storm.variables.IntVariable object at 0x2aaaad811578>]'
    212 return response, content
    213

<class 'launchpadlib.errors.HTTPError'>: HTTP Error 503: Service Unavailable

In [39]:

Markus

Revision history for this message
Christian Reis (kiko) wrote :

Very weird then. But I'm curious as to why that's failing: that query is very fast on staging and production. Does it happen reproducibly? Does it take a long time to return?

Revision history for this message
Markus Korn (thekorn) wrote :

Yes, I get a 503 everytime I try to get a list of all comments of bug 1, but there seems to be different values in HTTPError.content, and yes, it takes a long time, about 37 seconds.

Some Examples (run on staging)

In [27]: x = launchpad.bugs[1]

In [28]: def test():
   ....: try:
   ....: _ = list(x.messages)
   ....: except Exception, e:
   ....: print e.content
   ....:
   ....:

In [29]: time test()
'SELECT COUNT(*) FROM Bug, BugMessage WHERE BugMessage.message = %s AND Bug.id = BugMessage.bug', [<storm.variables.IntVariable object at 0x8d556e0>]
CPU times: user 0.02 s, sys: 0.00 s, total: 0.02 s
Wall time: 37.48

In [30]: time test()
'SELECT Message.datecreated, Message.distribution, Message.id, Message.owner, Message.parent, Message.raw, Message.rfc822msgid, Message.subject, "_prejoin1".account, "_prejoin1".addressline1, "_prejoin1".addressline2, "_prejoin1".city, "_prejoin1".country, "_prejoin1".creation_comment, "_prejoin1".creation_rationale, "_prejoin1".datecreated, "_prejoin1".defaultmembershipperiod, "_prejoin1".defaultrenewalperiod, "_prejoin1".displayname, "_prejoin1".hide_email_addresses, "_prejoin1".homepage_content, "_prejoin1".icon, "_prejoin1".id, "_prejoin1".logo, "_prejoin1".mailing_list_auto_subscribe_policy, "_prejoin1".merged, "_prejoin1".mugshot, "_prejoin1".name, "_prejoin1".organization, "_prejoin1".personal_standing, "_prejoin1".personal_standing_reason, "_prejoin1".phone, "_prejoin1".postcode, "_prejoin1".province, "_prejoin1".registrant, "_prejoin1".renewal_policy, "_prejoin1".subscriptionpolicy, "_prejoin1".teamdescription, "_prejoin1".teamowner, "_prejoin1".verbose_bugnotifications, "_prejoin1".visibility FROM BugMessage, Message LEFT JOIN Person AS "_prejoin1" ON Message.owner = "_prejoin1".id WHERE BugMessage.bug = %s AND Message.id = BugMessage.message AND (1=1) ORDER BY Message.datecreated, Message.id', [<storm.variables.IntVariable object at 0x8e50f50>]
CPU times: user 0.01 s, sys: 0.00 s, total: 0.01 s
Wall time: 37.98

In [31]: time test()
'SELECT COUNT(*) FROM Bug, BugMessage WHERE BugMessage.message = %s AND Bug.id = BugMessage.bug', [<storm.variables.IntVariable object at 0x2aaab121ad70>]
CPU times: user 0.01 s, sys: 0.00 s, total: 0.01 s
Wall time: 37.68

In [32]: time test()
'SELECT COUNT(*) FROM Bug, BugMessage WHERE BugMessage.message = %s AND Bug.id = BugMessage.bug', [<storm.variables.IntVariable object at 0x2aaabb7470c8>]
CPU times: user 0.01 s, sys: 0.00 s, total: 0.01 s
Wall time: 37.50

Markus

Revision history for this message
Eleanor Berger (intellectronica) wrote :

RF 7144

Changed in malone:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.