httplib.IncompleteRead: IncompleteRead received from launchpad

Bug #545401 reported by Bryce Harrington
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
launchpadlib
Fix Released
High
Unassigned

Bug Description

I've been getting this intermittently from launchpad:

Date: Tue, 23 Mar 2010 09:32:13 -0700
From: Cron Daemon <email address hidden>
To: <email address hidden>
Subject: Cron <bryce@chideok> /home/bryce/src/Arsenal/arsenal/hooks/data-bugtotals

Traceback (most recent call last):
  File "/home/bryce/src/Arsenal/arsenal/scripts//collect-team-bugtotals", line 43, in <module>
    if len(list(tasks)) == 0:
  File "/usr/lib/pymodules/python2.6/lazr/restfulclient/resource.py", line 682, in __iter__
    unicode(self._root._browser.get(URI(next_link))))
  File "/usr/lib/pymodules/python2.6/lazr/restfulclient/_browser.py", line 227, in get
    response, content = self._request(url, extra_headers=headers)
  File "/usr/lib/pymodules/python2.6/lazr/restfulclient/_browser.py", line 214, in _request
    str(url), method=method, body=data, headers=headers)
  File "/usr/lib/pymodules/python2.6/httplib2/__init__.py", line 1044, in request
    (response, new_content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
  File "/usr/lib/pymodules/python2.6/lazr/restfulclient/_browser.py", line 115, in _request
    redirections, cachekey)
  File "/usr/lib/pymodules/python2.6/httplib2/__init__.py", line 872, in _request
    (response, content) = self._conn_request(conn, request_uri, method, body, headers)
  File "/usr/lib/pymodules/python2.6/lazr/restfulclient/_browser.py", line 120, in _conn_request
    conn, request_uri, method, body, headers)
  File "/usr/lib/pymodules/python2.6/httplib2/__init__.py", line 854, in _conn_request
    content = response.read()
  File "/usr/lib/python2.6/httplib.py", line 525, in read
    s = self._safe_read(self.length)
  File "/usr/lib/python2.6/httplib.py", line 617, in _safe_read
    raise IncompleteRead(''.join(s), amt)
httplib.IncompleteRead: IncompleteRead(15687 bytes read, 49694 more expected)

This particular error involved this chunk of code:

 tags = tag_set['tags'].split(' ')

        tasks = s.searchTasks(tags = tags, tags_combinator = "All")
        if len(list(tasks)) == 0:
            # If there are no bug_tasks, don't bother recording them
            continue
        statistics[n+'-total'] = len(list(tasks))

So I'm not really doing anything except firing a query and getting a count of the results. (If there is a more bandwidth-friendly way of doing this, I'd love to know.)

What is this IncompleteRead exception? I assume it's just some issue internally in launchpad that can't be avoided from my end, except for exception handling. Should I just immediately retry when hitting this error? Or sleep for some period?

But I really hate having to lace my code with catches to handle all the various types of exceptions launchpad emits. Is there something that can be done within the launchpadapi code to handle these exceptions more gracefully for me?

Some other times I hit this particular exception recently with this and another script:
Date: Fri, 19 Mar 2010 16:08:14 -0700
Date: Fri, 19 Mar 2010 17:01:14 -0700
Date: Sun, 21 Mar 2010 16:36:22 -0700
Date: Mon, 22 Mar 2010 12:45:55 -0700
Date: Mon, 22 Mar 2010 14:01:17 -0700
Date: Mon, 22 Mar 2010 16:02:47 -0700
Date: Mon, 22 Mar 2010 17:02:54 -0700
Date: Mon, 22 Mar 2010 18:02:47 -0700
Date: Mon, 22 Mar 2010 18:05:50 -0700
Date: Tue, 23 Mar 2010 09:32:13 -0700

affects: launchpad → launchpad-foundations
Gary Poster (gary)
Changed in launchpad-foundations:
status: New → Triaged
importance: Undecided → High
milestone: none → 10.04
Revision history for this message
Robert Collins (lifeless) wrote :

It means the http response said 'x bytes' but the script only read 'y' bytes before the connection was interrupted. A network issue on your end could easily cause this.

Does it still happen?

Changed in launchpad:
status: Triaged → Incomplete
Revision history for this message
Bryce Harrington (bryce) wrote :

It does still happen.

No, your guess that it's a network issue is incorrect. The same issue will occur every run of the script for days, with the network functioning properly.

I dug into the code further to isolate what code was faulting. It is in a chunk of code which uses cached data that happens to be truncated. Clearing the cache file and re-running the script will update the cache properly and restore the script to functioning.

As to why the cache is corrupted, I doubt it's network problems - presumably the code would not cache incomplete network data. In any case I'm on fibre and the network is quite robust. Rather, I would guess it's because the machine has kernel lockups periodically, every couple weeks, and I suppose if it was writing a cache file when it froze, only a portion of the data would be written.

Anyway, failures happen. The real question is why doesn't it detect that the cache is invalid and just request an update from launchpad?

As a workaround I've set a cronjob to delete the cache every so often, and reboot the machine weekly. Eventually I'll move the scripts somewhere more stable.

Changed in launchpad:
status: Incomplete → New
Curtis Hovey (sinzui)
Changed in launchpad:
status: New → Triaged
Revision history for this message
Bryce Harrington (bryce) wrote :

I recently took over maintenance of Pedro's launchpadlib scripts (completely unrelated coding), which run on cranberry (completely different network). Reviewing the logs of those scripts, they also periodically hit these IncompleteRead errors.

Revision history for this message
Bryce Harrington (bryce) wrote :

Also, I ditched the machine with the kernel lockups and am running the scripts on a completely stable machine, yet still see the cache corruption issue.

Checking with other launchpadlib users (Brad, Brian), they say they see cache problems too, and also are working around it by deleting all the cache files when the problem occurs.

Revision history for this message
Bryce Harrington (bryce) wrote :

Deleting the cache was rather brute force. I ended up making a script that tests JSON validation for each file in cache and delete any that throw json errors. Bit more surgical than just deleting the whole cache, and presumably less intensive on launchpad.

From looking at the codebase, it seems to me a try could be wrapped around the code that reads from the cache, and if it hits this bug then act as if the cached file didn't exist, and re-query it.

Revision history for this message
Bryce Harrington (bryce) wrote :
William Grant (wgrant)
affects: launchpad → launchpadlib
Changed in launchpadlib:
milestone: 10.04 → none
status: Triaged → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.