Invalid cache access

Bug #695087 reported by Siegfried Gevatter
Affects: Zeitgeist Framework
Status: Triaged
Importance: High
Assigned to: Siegfried Gevatter

Bug Description

alagos@alagos-desktop:~$ gnome-activity-journal
** Message: pygobject_register_sinkfunc is deprecated (GstObject)
/usr/share/gnome-activity-journal/src/common.py:747: DeprecationWarning: object.__new__() takes no parameters
  GIO_FILES[subj] = object.__new__(classtype, *args, **kwargs)
/usr/share/gnome-activity-journal/src/activity_widgets.py:303: GtkWarning: gtk_box_pack: assertion `child->parent == NULL' failed
  self.pack_end(hbox)
Error from Zeitgeist engine: org.freedesktop.DBus.Python.KeyError: Traceback (most recent call last):
  File "/usr/lib/pymodules/python2.6/dbus/service.py", line 702, in _message_cb
    retval = candidate_method(self, *args, **keywords)
  File "/usr/share/zeitgeist/_zeitgeist/engine/remote.py", line 253, in FindEvents
    event_templates, storage_state, num_events, result_type, sender))
  File "/usr/share/zeitgeist/_zeitgeist/engine/main.py", line 395, in find_events
    return self._find_events(1, *args)
  File "/usr/share/zeitgeist/_zeitgeist/engine/main.py", line 382, in _find_events
    result = self.get_events(ids=[row[0] for row in result], sender=sender)
  File "/usr/share/zeitgeist/_zeitgeist/engine/main.py", line 187, in get_events
    event = self._get_event_from_row(row)
  File "/usr/share/zeitgeist/_zeitgeist/engine/main.py", line 152, in _get_event_from_row
    setattr(event, field, getattr(self, "_" + field).value(row[field]))
  File "/usr/share/zeitgeist/_zeitgeist/engine/sql.py", line 430, in value
    return self._inv_dict[id]
KeyError: 1
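
The failing frame, `return self._inv_dict[id]` in sql.py, points at an id-to-value lookup cache that has fallen out of sync with the database. The following is a minimal illustrative sketch of that pattern, not Zeitgeist's actual code; the class and attribute names beyond `_inv_dict` are assumptions:

```python
# Hypothetical sketch of a string <-> id lookup-table cache like the one
# the traceback points at. Only `_inv_dict` appears in the report; the
# rest of the names here are illustrative.

class TableLookup:
    """Caches rows of an (id, value) lookup table in two dicts."""

    def __init__(self):
        self._dict = {}      # value -> id
        self._inv_dict = {}  # id -> value

    def add(self, id_, value):
        self._dict[value] = id_
        self._inv_dict[id_] = value

    def value(self, id_):
        # The failure mode from the report: if a row references an id
        # that was never loaded into the cache (e.g. inserted by another
        # process, or added after the cache was populated), this raises.
        return self._inv_dict[id_]

cache = TableLookup()
cache.add(2, "text/plain")
print(cache.value(2))        # prints "text/plain"
try:
    cache.value(1)           # id 1 exists in the database, not the cache
except KeyError as e:
    print("KeyError:", e)    # mirrors "KeyError: 1" from the traceback
```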

Changed in zeitgeist:
assignee: nobody → Siegfried Gevatter (rainct)
importance: Undecided → High
status: New → Triaged
Revision history for this message
Siegfried Gevatter (rainct) wrote :

I think I have a clue about what's going on, but before we fix this, I guess it'd be good to have some documentation on why we even have the cache. For this purpose, I've tried removing it from the engine (code at lp:~zeitgeist/zeitgeist/no-table-cache, but it's not ready for merging) and I'm attaching some test results.

It looks like the cache gives very little speed benefit at best, so I'm not sure it's worth keeping.

Revision history for this message
Siegfried Gevatter (rainct) wrote :

In the previous benchmark, /tmp/zeitgeist-trunk is trunk and ../ is the modified branch.

I'm also attaching a comparison of running GNOME Activity Journal, but the data here is too random to get anything out of it, so ignore it. I'm just attaching it because it took some time to put it in a nice table :P.

Revision history for this message
Markus Korn (thekorn) wrote :

@Siegfried, what exactly are you benchmarking here? E.g., how many different (relevant) values are you using? How do the graphs look if you have a few hundred different actors and mimetypes and a wide range of used manifestations and interpretations?

Revision history for this message
Siegfried Gevatter (rainct) wrote :

That one uses your benchmark tool, so it just fetches events by ID; what it measures is the time difference from retrieving the events with only the IDs and then replacing them with the proper URI values in Python code.

What's missing to make it fair is a benchmark for the query construction part too, although I don't expect that to make much of a difference.

Revision history for this message
Markus Korn (thekorn) wrote :

OK, my concern is: this script only uses one manifestation and one interpretation, so the dict lookup and the SQL query (including SQLite's internal caching) might have the same speed.
What I'm curious to know is: is there a difference if we have many different values, and at which point do SQL queries become slower than our cache lookups?

Revision history for this message
Siegfried Gevatter (rainct) wrote :

I've just added a new benchmark test to lp:~zeitgeist/zeitgeist/no-table-cache, test/cache-test.py. It inserts 1000 events, each with a different interpretation (it could just as well, and more likely in the real world, be 1000 different mimetypes or actors, but the code is basically the same for all of them, so I just tried interpretations), and then retrieves them one by one, filtering by interpretation.
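
For readers without the branch handy, the idea of that benchmark can be sketched in a self-contained way. The schema and names below are illustrative, not Zeitgeist's actual tables; it compares resolving each event's interpretation via a per-lookup SQL query (the no-cache approach) against a dict loaded once up front (the cache approach):

```python
# Illustrative benchmark: N events, each with a distinct interpretation,
# resolved either with one SQL query per event or via an in-memory dict.
# Table and column names are assumptions for the sake of the sketch.

import sqlite3
import time

N = 1000

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE interpretation (id INTEGER PRIMARY KEY, value TEXT)")
cur.execute("CREATE TABLE event (id INTEGER PRIMARY KEY, interpretation INTEGER)")
for i in range(N):
    cur.execute("INSERT INTO interpretation VALUES (?, ?)", (i, "interp-%d" % i))
    cur.execute("INSERT INTO event VALUES (?, ?)", (i, i))
conn.commit()

# Variant 1: one SQL lookup per event (roughly the no-cache branch).
t0 = time.perf_counter()
for i in range(N):
    cur.execute(
        "SELECT value FROM interpretation WHERE id = "
        "(SELECT interpretation FROM event WHERE id = ?)", (i,))
    cur.fetchone()
sql_time = time.perf_counter() - t0

# Variant 2: load the lookup table once, then resolve via a dict (the cache).
t0 = time.perf_counter()
inv_dict = dict(cur.execute("SELECT id, value FROM interpretation"))
for i in range(N):
    cur.execute("SELECT interpretation FROM event WHERE id = ?", (i,))
    inv_dict[cur.fetchone()[0]]
cache_time = time.perf_counter() - t0

print("per-query SQL: %.4fs, dict cache: %.4fs" % (sql_time, cache_time))
```

Absolute times will differ from the figures below, but the gap should widen with N in the same way.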

This benchmark shows that the cache is indeed justified, so now we know it's worth fixing :).

Times (in seconds), current trunk vs. with the cache removed:

1000 events/interpretations:
 - Insertion: 0.22 vs. 0.34
 - Retrieval: 0.19 vs. 1.90

100 events/interpretations:
 - Insertion: 0.02 vs. 0.032
 - Retrieval: 0.02 vs. 0.035

10 events/interpretations:
 - Insertion: 0.0025 vs. 0.0035
 - Retrieval: 0.0021 vs. 0.0022
