LRUCache.cleanup raises KeyError

Bug #420738 reported by Aaron Bentley
50
This bug affects 6 people
Affects Status Importance Assigned to Milestone
Bazaar
Incomplete
Low
Unassigned
loggerhead
Fix Released
Critical
Max Kanat-Alexander

Bug Description

Loggerhead produced the following error:
INF [20090828-20:16:31.044] [1158543696] loggerhead: Starting to process http://bazaar.launchpad.net/%7Eubuntu-mythtv/mythtv/mythtv-trunk-022/annotate/head%3A/debian/libmyth-0.22-0.files
INF [20090828-20:16:31.164] [1158543696] lp-loggerhead: Using branch: ~ubuntu-mythtv/mythtv/mythtv-trunk-022
INF [20090828-20:16:31.164] [1158543696] lp-loggerhead: branch_url: http://bazaar.launchpad.net/00/00/38/da
ERR [20090828-20:16:31.179] [1158543696] root: Traceback (most recent call last):
ERR [20090828-20:16:31.180] [1158543696] root: File "/srv/codebrowse.launchpad.net/production/launchpad-rev-8312/eggs/Paste-1.7.2-py2.4.egg/paste/httpserver.py", line 1062, in process_request_in_thread
ERR [20090828-20:16:31.180] [1158543696] root: self.finish_request(request, client_address)
ERR [20090828-20:16:31.180] [1158543696] root: File "/usr/lib/python2.4/SocketServer.py", line 254, in finish_request
ERR [20090828-20:16:31.180] [1158543696] root: self.RequestHandlerClass(request, client_address, self)
ERR [20090828-20:16:31.180] [1158543696] root: File "/usr/lib/python2.4/SocketServer.py", line 521, in __init__
ERR [20090828-20:16:31.180] [1158543696] root: self.handle()
ERR [20090828-20:16:31.180] [1158543696] root: File "/srv/codebrowse.launchpad.net/production/launchpad-rev-8312/eggs/Paste-1.7.2-py2.4.egg/paste/httpserver.py", line 436, in handle
ERR [20090828-20:16:31.181] [1158543696] root: BaseHTTPRequestHandler.handle(self)
ERR [20090828-20:16:31.181] [1158543696] root: File "/usr/lib/python2.4/BaseHTTPServer.py", line 316, in handle
ERR [20090828-20:16:31.181] [1158543696] root: self.handle_one_request()
ERR [20090828-20:16:31.181] [1158543696] root: File "/srv/codebrowse.launchpad.net/production/launchpad-rev-8312/eggs/Paste-1.7.2-py2.4.egg/paste/httpserver.py", line 431, in handle_one_request
ERR [20090828-20:16:31.181] [1158543696] root: self.wsgi_execute()
ERR [20090828-20:16:31.181] [1158543696] root: File "/srv/codebrowse.launchpad.net/production/launchpad-rev-8312/eggs/Paste-1.7.2-py2.4.egg/paste/httpserver.py", line 287, in wsgi_execute
ERR [20090828-20:16:31.181] [1158543696] root: self.wsgi_start_response)
ERR [20090828-20:16:31.181] [1158543696] root: File "/srv/codebrowse.launchpad.net/production/launchpad-rev-8312/sourcecode/launchpad-loggerhead/launchpad_loggerhead/debug.py", line 106, in wrapped_application
ERR [20090828-20:16:31.181] [1158543696] root: result = application(environ, response_hook)
ERR [20090828-20:16:31.181] [1158543696] root: File "sourcecode/launchpad-loggerhead/start-loggerhead.py", line 150, in wrapped
ERR [20090828-20:16:31.181] [1158543696] root: return app(environ, start_response)
ERR [20090828-20:16:31.181] [1158543696] root: File "/srv/codebrowse.launchpad.net/production/launchpad-rev-8312/sourcecode/launchpad-loggerhead/launchpad_loggerhead/debug.py", line 45, in wrapped
ERR [20090828-20:16:31.182] [1158543696] root: return app(environ, start_response)
ERR [20090828-20:16:31.182] [1158543696] root: File "/srv/codebrowse.launchpad.net/production/launchpad-rev-8312/eggs/Paste-1.7.2-py2.4.egg/paste/translogger.py", line 68, in __call__
ERR [20090828-20:16:31.182] [1158543696] root: return self.application(environ, replacement_start_response)
ERR [20090828-20:16:31.182] [1158543696] root: File "/srv/codebrowse.launchpad.net/production/launchpad-rev-8312/eggs/PasteDeploy-1.3.3-py2.4.egg/paste/deploy/config.py", line 285, in __call__
ERR [20090828-20:16:31.182] [1158543696] root: return self.app(environ, start_response)
ERR [20090828-20:16:31.182] [1158543696] root: File "sourcecode/launchpad-loggerhead/start-loggerhead.py", line 134, in wrapped
ERR [20090828-20:16:31.182] [1158543696] root: return app(environ, start_response)
ERR [20090828-20:16:31.182] [1158543696] root: File "/srv/codebrowse.launchpad.net/production/launchpad-rev-8312/sourcecode/launchpad-loggerhead/launchpad_loggerhead/session.py", line 53, in __call__
ERR [20090828-20:16:31.182] [1158543696] root: return self.cookie_handler(environ, start_response)
ERR [20090828-20:16:31.182] [1158543696] root: File "/srv/codebrowse.launchpad.net/production/launchpad-rev-8312/eggs/Paste-1.7.2-py2.4.egg/paste/auth/cookie.py", line 305, in __call__
ERR [20090828-20:16:31.182] [1158543696] root: return self.application(environ, response_hook)
ERR [20090828-20:16:31.182] [1158543696] root: File "/srv/codebrowse.launchpad.net/production/launchpad-rev-8312/sourcecode/launchpad-loggerhead/launchpad_loggerhead/session.py", line 73, in _process
ERR [20090828-20:16:31.183] [1158543696] root: return self.application(environ, response_hook)
ERR [20090828-20:16:31.183] [1158543696] root: File "/srv/codebrowse.launchpad.net/production/launchpad-rev-8312/eggs/Paste-1.7.2-py2.4.egg/paste/httpexceptions.py", line 636, in __call__
ERR [20090828-20:16:31.183] [1158543696] root: return self.application(environ, start_response)
ERR [20090828-20:16:31.183] [1158543696] root: File "/srv/codebrowse.launchpad.net/production/launchpad-rev-8312/sourcecode/launchpad-loggerhead/launchpad_loggerhead/app.py", line 230, in __call__
ERR [20090828-20:16:31.183] [1158543696] root: return view.app(environ, start_response)
ERR [20090828-20:16:31.183] [1158543696] root: File "/srv/codebrowse.launchpad.net/production/launchpad-rev-8312/lib/loggerhead/apps/branch.py", line 165, in app
ERR [20090828-20:16:31.183] [1158543696] root: return c(environ, start_response)
ERR [20090828-20:16:31.183] [1158543696] root: File "/srv/codebrowse.launchpad.net/production/launchpad-rev-8312/lib/loggerhead/controllers/__init__.py", line 93, in __call__
ERR [20090828-20:16:31.183] [1158543696] root: vals.update(self.get_values(path, kwargs, headers))
ERR [20090828-20:16:31.183] [1158543696] root: File "/srv/codebrowse.launchpad.net/production/launchpad-rev-8312/lib/loggerhead/controllers/annotate_ui.py", line 101, in get_values
ERR [20090828-20:16:31.183] [1158543696] root: history = self._history
ERR [20090828-20:16:31.183] [1158543696] root: File "/srv/codebrowse.launchpad.net/production/launchpad-rev-8312/lib/loggerhead/controllers/__init__.py", line 65, in _history
ERR [20090828-20:16:31.184] [1158543696] root: self.__history = self._history_callable()
ERR [20090828-20:16:31.184] [1158543696] root: File "/srv/codebrowse.launchpad.net/production/launchpad-rev-8312/lib/loggerhead/apps/branch.py", line 81, in get_history
ERR [20090828-20:16:31.184] [1158543696] root: revinfo_disk_cache=revinfo_disk_cache, cache_key=self.friendly_name)
ERR [20090828-20:16:31.184] [1158543696] root: File "/srv/codebrowse.launchpad.net/production/launchpad-rev-8312/lib/loggerhead/history.py", line 271, in __init__
ERR [20090828-20:16:31.184] [1158543696] root: self._load_whole_history_data(caches, cache_key)
ERR [20090828-20:16:31.184] [1158543696] root: File "/srv/codebrowse.launchpad.net/production/launchpad-rev-8312/lib/loggerhead/history.py", line 232, in _load_whole_history_data
ERR [20090828-20:16:31.184] [1158543696] root: update_missed_caches()
ERR [20090828-20:16:31.184] [1158543696] root: File "/srv/codebrowse.launchpad.net/production/launchpad-rev-8312/lib/loggerhead/history.py", line 227, in update_missed_caches
ERR [20090828-20:16:31.184] [1158543696] root: cache.set(cache_key, self.last_revid, self._rev_info)
ERR [20090828-20:16:31.184] [1158543696] root: File "/srv/codebrowse.launchpad.net/production/launchpad-rev-8312/lib/loggerhead/history.py", line 187, in set
ERR [20090828-20:16:31.184] [1158543696] root: self._cache[key] = (revid, data)
ERR [20090828-20:16:31.184] [1158543696] root: File "/srv/codebrowse.launchpad.net/production/launchpad-rev-8312/eggs/bzr-1.17-py2.4-linux-x86_64.egg/bzrlib/lru_cache.py", line 217, in __setitem__
ERR [20090828-20:16:31.185] [1158543696] root: self.add(key, value, cleanup=None)
ERR [20090828-20:16:31.185] [1158543696] root: File "/srv/codebrowse.launchpad.net/production/launchpad-rev-8312/eggs/bzr-1.17-py2.4-linux-x86_64.egg/bzrlib/lru_cache.py", line 177, in add
ERR [20090828-20:16:31.185] [1158543696] root: self.cleanup()
ERR [20090828-20:16:31.185] [1158543696] root: File "/srv/codebrowse.launchpad.net/production/launchpad-rev-8312/eggs/bzr-1.17-py2.4-linux-x86_64.egg/bzrlib/lru_cache.py", line 213, in cleanup
ERR [20090828-20:16:31.185] [1158543696] root: self._remove_lru()
ERR [20090828-20:16:31.185] [1158543696] root: File "/srv/codebrowse.launchpad.net/production/launchpad-rev-8312/eggs/bzr-1.17-py2.4-linux-x86_64.egg/bzrlib/lru_cache.py", line 272, in _remove_lru
ERR [20090828-20:16:31.185] [1158543696] root: self._remove_node(self._least_recently_used)
ERR [20090828-20:16:31.185] [1158543696] root: File "/srv/codebrowse.launchpad.net/production/launchpad-rev-8312/eggs/bzr-1.17-py2.4-linux-x86_64.egg/bzrlib/lru_cache.py", line 260, in _remove_node
ERR [20090828-20:16:31.185] [1158543696] root: node_next = self._cache[node.next_key]
ERR [20090828-20:16:31.185] [1158543696] root: KeyError: '~jtv/launchpad/bug-416434'

This might be an instance of lp:396838, but it occurs with bzr 1.17, and the bug was fixed in that release.

Related branches

Aaron Bentley (abentley)
description: updated
Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

Ah, I've sort of seen this one before, but was completely unable to reproduce it.

I'm reasonably sure it's due to an exception being raised somewhere in the LRUCache code that damages the data structure so that it fails in this way. So it's similar to bug 396838, but different in the details.

Martin Pool (mbp)
Changed in bzr:
status: New → Confirmed
importance: Undecided → Low
Revision history for this message
Vincent Ladeuil (vila) wrote :

John, I think you fixed that, am I wrong ?

Revision history for this message
John A Meinel (jameinel) wrote :

I fixed the one that Michael refers to (bug #396838). It is unclear how to reproduce this one. As Michael says "its different in the details".

In the bug I fixed, it was an exception being raised as part of cleanup() [which like all destructor type code, really shouldn't be raising... :)]. So I trapped stuff during cleanup() and make sure to preserve the data structure.

The traceback here doesn't include any information about how to reproduce, so I don't really know what needs to be done. I'm guessing there is, indeed, a bug in the code wrt invariants being preserved in the face of exceptions.

However, it looks like we'd have to find it by inspection, which can certainly be tricky.

Revision history for this message
John A Meinel (jameinel) wrote :

Aaron-

I just ran into something like this in my multi-threaded testing of parts of bzrlib.

Suffice it to say that LRUCache is *not* thread safe. It is probably close, but it doesn't have atomic operations. So stuff like:

# Remove from the double-linked list
node.next.prev = node.prev
node.prev.next = node.next

Has the potential to switch threads in between those two statements, and you end up with a corrupted double-linked list.

At the moment, I don't have plans to make LRUCache threadsafe. You could do it with 'one-big-lock' on the datastructure, I suppose.

I'd probably rather write a C/Pyrex implementation that would perform approx as fast as a regular dict, and then look into making *that* threadsafe. Note that python dicts themselves aren't threadsafe anyway, as python used the GIL to avoid putting a lock on every object. (Or spinlocking the refcount, etc.)

Revision history for this message
Matt Nordhoff (mnordhoff) wrote :

This just took out Launchpad's Loggerhead instance (see bug #517418), leaving it returning nothing but Internal Server Errors, so it seems somewhat more urgent...

Revision history for this message
Robert Collins (lifeless) wrote : Re: [Bug 420738] Re: LRUCache.cleanup raises KeyError

On Fri, 2010-02-05 at 06:57 +0000, Matt Nordhoff wrote:
> This just took out Launchpad's Loggerhead instance (see bug #517418),
> leaving it returning nothing but Internal Server Errors, so it seems
> somewhat more urgent...

Its probably a loggerhead bug: using the same LRUCache from different
threads, without synchronisation.

-Rob

Changed in loggerhead:
status: New → Confirmed
importance: Undecided → Critical
Changed in bzr:
status: Confirmed → Incomplete
Revision history for this message
Max Kanat-Alexander (mkanat) wrote :

This is one of the root causes of the frequent codebrowse crashes, though it's a separate bug from bug 513044. I'm going to assume that the Everything Solved retainer covers fixing this, since it's a pretty critical stability issue.

Changed in loggerhead:
assignee: nobody → Max Kanat-Alexander (mkanat)
Changed in loggerhead:
status: Confirmed → Fix Committed
Changed in loggerhead:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.