Invalid cache access

Bug #695087 reported by Siegfried Gevatter on 2010-12-28
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Zeitgeist Framework
Triaged
High
Siegfried Gevatter

Bug Description

alagos@alagos-desktop:~$ gnome-activity-journal
** Message: pygobject_register_sinkfunc is deprecated (GstObject)
/usr/share/gnome-activity-journal/src/common.py:747: DeprecationWarning: object.__new__() takes no parameters
  GIO_FILES[subj] = object.__new__(classtype, *args, **kwargs)
/usr/share/gnome-activity-journal/src/activity_widgets.py:303: GtkWarning: gtk_box_pack: assertion `child->parent == NULL' failed
  self.pack_end(hbox)
Error from Zeitgeist engine: org.freedesktop.DBus.Python.KeyError: Traceback (most recent call last):
  File "/usr/lib/pymodules/python2.6/dbus/service.py", line 702, in _message_cb
    retval = candidate_method(self, *args, **keywords)
  File "/usr/share/zeitgeist/_zeitgeist/engine/remote.py", line 253, in FindEvents
    event_templates, storage_state, num_events, result_type, sender))
  File "/usr/share/zeitgeist/_zeitgeist/engine/main.py", line 395, in find_events
    return self._find_events(1, *args)
  File "/usr/share/zeitgeist/_zeitgeist/engine/main.py", line 382, in _find_events
    result = self.get_events(ids=[row[0] for row in result], sender=sender)
  File "/usr/share/zeitgeist/_zeitgeist/engine/main.py", line 187, in get_events
    event = self._get_event_from_row(row)
  File "/usr/share/zeitgeist/_zeitgeist/engine/main.py", line 152, in _get_event_from_row
    setattr(event, field, getattr(self, "_" + field).value(row[field]))
  File "/usr/share/zeitgeist/_zeitgeist/engine/sql.py", line 430, in value
    return self._inv_dict[id]
KeyError: 1

Changed in zeitgeist:
assignee: nobody → Siegfried Gevatter (rainct)
importance: Undecided → High
status: New → Triaged
Siegfried Gevatter (rainct) wrote :

I think I have a clue on what's going on, but before we fix this, I guess I'd be good to have some documentation on why we even have the cache. For this purpose, I've tried removing it from the engine (code at lp:~zeitgeist/zeitgeist/no-table-cache, but it's not ready for merging) and I'm attaching some test results.

It looks like the cache at best gives a very little speed benefit, so I'm not sure it's worth keeping it.

Siegfried Gevatter (rainct) wrote :

In the previous benchmark, /tmp/zeitgeist-trunk is trunk and ../ is the modified branch.

I'm also attaching a comparison of running GNOME Activity Journal, but the data here is too random to get anything out of it, so ignore it. I'm just attaching it because it took some time to put it in a nice table :P.

Markus Korn (thekorn) wrote :

@Siegfried, what exactly are you benchmarking here? Eg. how many different (relevant) values are you using? How do the graphs look if you have some hundred different actors and mimetypes and a wide range of used manifestations and interpretations?

Siegfried Gevatter (rainct) wrote :

That one is using your benchmark tool, so it's just getting stuff by ID so what it benchmarks is any time difference thanks to retrieving the events with only the IDs and then replacing them with the proper URI values in Python code.

What's missing to make it fair is a benchmark also for the query construction part, although I don't expect it to have much more of a difference.

Markus Korn (thekorn) wrote :

Ok, my concern is: this script only uses one manifestation and one interpretation, so dict lookup and sql query (including sqlite's internal caching) might have the same speed.
What I'm curious to know is: is there if difference if we have many different of these values, at which point are sql queries slower than our cache lookups?

Siegfried Gevatter (rainct) wrote :

I've just added a new benchmark test to lp:~zeitgeist/zeitgeist/no-table-cache, test/cache-test.py, which inserts 1000 events with a different interpretation each (could as well -and more likely in the real world- be 1000 different mimetypes or actors, but the code is basically the same for all of them so I just tried with interpretations) and then retrieves them one by one filtering by their interpretations.

This benchmark shows us that the cache is indeed justified, so now we know it's worth to fix it :).

Times (in seconds) with current trunk and having removed the cache:
 - Insertion: 0.22 0.34
 - Recovery: 0.19 1.90

Here's the same with 100 events and interpretations instead of 1000:
 - Insertion: 0.02 0.032
 - Recovery: 0.02 0.035

And with 10 events/interpretations:
 - Insertion: 0.0025 0.0035
 - Recovery: 0.0021 0.0022

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers