Query execution slow

Bug #844877 reported by Michal Hruby
10
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Zeitgeist Framework
In Progress
Undecided
Siegfried Gevatter

Bug Description

Synapse is using various not-so-complex find_events queries to display recent activities, and it seems that especially a query for "All" category is taking much longer than any other query (~750ms with a DB with 200k events).

Here's the output from zg with the debug_sql extension:

[DEBUG - root] # parameters: {'event_templates': [Event([dbus.Array([dbus.String(u''), '1315492708863', dbus.String(u''), dbus.String(u''), dbus.String(u''), ''], signature=dbus.Signature('s')), [Subject([dbus.String(u''), dbus.String(u''), dbus.String(u''), dbus.String(u''), dbus.String(u''), dbus.String(u''), dbus.String(u''), dbus.String(u'')]), Subject([dbus.String(u''), dbus.String(u'!http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#Folder'), dbus.String(u''), dbus.String(u''), dbus.String(u''), dbus.String(u''), dbus.String(u''), dbus.String(u'')]), Subject([dbus.String(u''), dbus.String(u'!http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#Software'), dbus.String(u''), dbus.String(u''), dbus.String(u''), dbus.String(u''), dbus.String(u''), dbus.String(u'')])], dbus.Array([], signature=dbus.Signature('y'))])],
 'max_events': dbus.UInt32(96L),
 'order': dbus.UInt32(2L),
 'return_mode': 1,
 'self': <_zeitgeist.engine.main.ZeitgeistEngine instance at 0x1b66248>,
 'sender': ':1.3604',
 'storage_state': dbus.UInt32(2L),
 'time_range': [1300977508860, 9223372036854775807]}

[DEBUG - root] # database size: {'actor': 129,
 'event': 192746,
 'interpretation': 68,
 'manifestation': 27,
 'mimetype': 353,
 'uri': 60265}
[DEBUG - root] # BEGIN SQL QUERY #
no pretty sql: SELECT id FROM event_view
    NATURAL JOIN (
     SELECT subj_id,
      max(timestamp) AS timestamp

     FROM event_view WHERE (+timestamp >= ? AND ((NOT (subj_interpretation = ?) AND NOT (subj_interpretation = ? OR subj_interpretation = ? OR subj_interpretation = ?))))
     GROUP BY subj_id)
    GROUP BY subj_id
    ORDER BY timestamp DESC
     LIMIT 96 : [1300977508860, 45, 46, 47, 43]
took 0.712120s
# END SQL QUERY
[DEBUG - zeitgeist.engine] Found 96 events IDs in 0.723284s

Revision history for this message
Seif Lotfy (seif) wrote : Re: [Zeitgeist] [Bug 844877] [NEW] Query execution slow
Download full text (5.6 KiB)

Wow that is VERY SLOW...
Maybe Markus can help us look into the DB structure again :)

On Thu, Sep 8, 2011 at 4:59 PM, Michal Hruby <email address hidden> wrote:

> Public bug reported:
>
> Synapse is using various not-so-complex find_events queries to display
> recent activities, and it seems that especially a query for "All"
> category is taking much longer than any other query (~750ms with a DB
> with 200k events).
>
> Here's the output from zg with the debug_sql extension:
>
> [DEBUG - root] # parameters: {'event_templates':
> [Event([dbus.Array([dbus.String(u''), '1315492708863', dbus.String(u''),
> dbus.String(u''), dbus.String(u''), ''], signature=dbus.Signature('s')),
> [Subject([dbus.String(u''), dbus.String(u''), dbus.String(u''),
> dbus.String(u''), dbus.String(u''), dbus.String(u''), dbus.String(u''),
> dbus.String(u'')]), Subject([dbus.String(u''), dbus.String(u'!
> http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#Folder'),
> dbus.String(u''), dbus.String(u''), dbus.String(u''), dbus.String(u''),
> dbus.String(u''), dbus.String(u'')]), Subject([dbus.String(u''),
> dbus.String(u'!
> http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#Software'),
> dbus.String(u''), dbus.String(u''), dbus.String(u''), dbus.String(u''),
> dbus.String(u''), dbus.String(u'')])], dbus.Array([],
> signature=dbus.Signature('y'))])],
> 'max_events': dbus.UInt32(96L),
> 'order': dbus.UInt32(2L),
> 'return_mode': 1,
> 'self': <_zeitgeist.engine.main.ZeitgeistEngine instance at 0x1b66248>,
> 'sender': ':1.3604',
> 'storage_state': dbus.UInt32(2L),
> 'time_range': [1300977508860, 9223372036854775807]}
>
> [DEBUG - root] # database size: {'actor': 129,
> 'event': 192746,
> 'interpretation': 68,
> 'manifestation': 27,
> 'mimetype': 353,
> 'uri': 60265}
> [DEBUG - root] # BEGIN SQL QUERY #
> no pretty sql: SELECT id FROM event_view
> NATURAL JOIN (
> SELECT subj_id,
> max(timestamp) AS timestamp
>
> FROM event_view WHERE (+timestamp
> >= ? AND ((NOT (subj_interpretation = ?) AND NOT (subj_interpretation = ? OR
> subj_interpretation = ? OR subj_interpretation = ?))))
> GROUP BY subj_id)
> GROUP BY subj_id
> ORDER BY timestamp DESC
> LIMIT 96 : [1300977508860, 45, 46, 47, 43]
> took 0.712120s
> # END SQL QUERY
> [DEBUG - zeitgeist.engine] Found 96 events IDs in 0.723284s
>
> ** Affects: zeitgeist
> Importance: Undecided
> Status: New
>
> --
> You received this bug notification because you are a member of Zeitgeist
> Framework Team, which is subscribed to Zeitgeist Framework.
> https://bugs.launchpad.net/bugs/844877
>
> Title:
> Query execution slow
>
> Status in Zeitgeist Framework:
> New
>
> Bug description:
> Synapse is using various not-so-complex find_events queries to display
> recent activities, and it seems that especially a query for "All"
> category is taking much longer than any other query (~750ms with a DB
> with 200k events...

Read more...

Revision history for this message
Markus Korn (thekorn) wrote :

It is not *very* slow, it just seems to be at bit slower than it should be.
From reading the query plan the reason seems to be that two temp b-trees are created, which is *slow*

sele order from deta
---- ------------- ---- ----
1 0 0 SCAN TABLE event USING INDEX event_subj_id (~125000 rows)
0 0 1 SCAN SUBQUERY 1 (~100 rows)
0 1 0 SEARCH TABLE event USING INDEX event_subj_id (subj_id=?) (~2 rows)
0 0 0 USE TEMP B-TREE FOR GROUP BY
0 0 0 USE TEMP B-TREE FOR ORDER BY

It seems like all other queries are also using this temp b-trees (at least the ones I tried)
Right now I don't see a way to optimize it.

Revision history for this message
Michal Hruby (mhr3) wrote :

Ok, I've got another VERY bad one... See attachement

Revision history for this message
Michal Hruby (mhr3) wrote :

As a note the result is pretty much still the same after seif's fix to get rid of the "AND subj_current_uri = ..." part.

Revision history for this message
Michal Hruby (mhr3) wrote :

And more research:

<mhr3> i just tried two queries in `sqlite3`
 SELECT subj_id FROM event_view WHERE subj_uri IN ('application://gedit.desktop', 'application://firefox.desktop', 'application://devhelp.desktop', 'application://google-chrome.desktop', 'application://dfeet.desktop', 'http://twitter.com/', 'file:///home/miso/projects/checkout/gnome/glib/glib/gquark.h', 'http://www.google.cz/search?sourceid=chrome&ie=UTF-8&q=youtube+darky+rom+10.2') GROUP BY subj_id;
<mhr3> and
 SELECT id FROM uri WHERE value IN ('application://gedit.desktop', 'application://firefox.desktop', 'application://devhelp.desktop', 'application://google-chrome.desktop', 'application://dfeet.desktop', 'http://twitter.com/', 'file:///home/miso/projects/checkout/gnome/glib/glib/gquark.h', 'http://www.google.cz/search?sourceid=chrome&ie=UTF-8&q=youtube+darky+rom+10.2');
<mhr3> the second one is much much much faster (also there's no visible difference between "... OR value = '...' OR ..." and "IN ('...', ...)")

Revision history for this message
Michal Hruby (mhr3) wrote :

And concluding:

SELECT subj_id, max(timestamp) AS timestamp
   FROM event_view
   WHERE (((subj_uri = 'application://gedit.desktop')
           OR (subj_uri = 'application://devhelp.desktop')
           OR (subj_uri = 'application://google-chrome.desktop') OR ...
   GROUP BY subj_id;

executes in 20seconds here.

On the other hand:
SELECT subj_id, max(timestamp) FROM event_view WHERE subj_id IN
   (SELECT id
      FROM uri
      WHERE (((value = 'application://gedit.desktop')
              OR (value = 'application://devhelp.desktop')
              OR (value = 'application://google-chrome.desktop') OR ...)
   GROUP BY subj_id;

executes immediately. Of course results are equivalent.

Revision history for this message
Seif Lotfy (seif) wrote : Re: [Bug 844877] Re: Query execution slow
Download full text (3.6 KiB)

Very interesting. seems to me like the view might be a bit corrupt then...
not sure...
I think Markus would know more about that

On Sun, Sep 18, 2011 at 11:09 PM, Michal Hruby <email address hidden> wrote:

> And concluding:
>
> SELECT subj_id, max(timestamp) AS timestamp
> FROM event_view
> WHERE (((subj_uri = 'application://gedit.desktop')
> OR (subj_uri = 'application://devhelp.desktop')
> OR (subj_uri = 'application://google-chrome.desktop') OR ...
> GROUP BY subj_id;
>
> executes in 20seconds here.
>
> On the other hand:
> SELECT subj_id, max(timestamp) FROM event_view WHERE subj_id IN
> (SELECT id
> FROM uri
> WHERE (((value = 'application://gedit.desktop')
> OR (value = 'application://devhelp.desktop')
> OR (value = 'application://google-chrome.desktop') OR ...)
> GROUP BY subj_id;
>
> executes immediately. Of course results are equivalent.
>
> --
> You received this bug notification because you are subscribed to The
> Zeitgeist Project.
> https://bugs.launchpad.net/bugs/844877
>
> Title:
> Query execution slow
>
> Status in Zeitgeist Framework:
> New
>
> Bug description:
> Synapse is using various not-so-complex find_events queries to display
> recent activities, and it seems that especially a query for "All"
> category is taking much longer than any other query (~750ms with a DB
> with 200k events).
>
> Here's the output from zg with the debug_sql extension:
>
> [DEBUG - root] # parameters: {'event_templates':
> [Event([dbus.Array([dbus.String(u''), '1315492708863', dbus.String(u''),
> dbus.String(u''), dbus.String(u''), ''], signature=dbus.Signature('s')),
> [Subject([dbus.String(u''), dbus.String(u''), dbus.String(u''),
> dbus.String(u''), dbus.String(u''), dbus.String(u''), dbus.String(u''),
> dbus.String(u'')]), Subject([dbus.String(u''), dbus.String(u'!
> http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#Folder'),
> dbus.String(u''), dbus.String(u''), dbus.String(u''), dbus.String(u''),
> dbus.String(u''), dbus.String(u'')]), Subject([dbus.String(u''),
> dbus.String(u'!
> http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#Software'),
> dbus.String(u''), dbus.String(u''), dbus.String(u''), dbus.String(u''),
> dbus.String(u''), dbus.String(u'')])], dbus.Array([],
> signature=dbus.Signature('y'))])],
> 'max_events': dbus.UInt32(96L),
> 'order': dbus.UInt32(2L),
> 'return_mode': 1,
> 'self': <_zeitgeist.engine.main.ZeitgeistEngine instance at 0x1b66248>,
> 'sender': ':1.3604',
> 'storage_state': dbus.UInt32(2L),
> 'time_range': [1300977508860, 9223372036854775807]}
>
> [DEBUG - root] # database size: {'actor': 129,
> 'event': 192746,
> 'interpretation': 68,
> 'manifestation': 27,
> 'mimetype': 353,
> 'uri': 60265}
> [DEBUG - root] # BEGIN SQL QUERY #
> no pretty sql: SELECT id FROM event_view
> NATURAL JOIN (
> SELECT subj_id,
> max(timestamp) AS timestamp
>
> FROM event_view WHERE (+timestamp
> >= ? AND ((NOT (subj_interpretation = ?) AND NOT (subj_interpretation ...

Read more...

Revision history for this message
Siegfried Gevatter (rainct) wrote :

Looks like SQLite's optimizer can't handle the subqueries in event_view, so we should restrict their usage in WHERE clauses.

Bluebird revision 271 fixes this (please confirm).

Revision history for this message
Siegfried Gevatter (rainct) wrote :

Created https://bugs.launchpad.net/zeitgeist/+bug/910190 for the DISTINCT changes.

Changed in zeitgeist:
milestone: none → 0.9.1
status: New → Triaged
Revision history for this message
Michal Hruby (mhr3) wrote :

The original issue is still not fixed though (comments 3-8 should have been in a separate bug).

Revision history for this message
Siegfried Gevatter (rainct) wrote :
Changed in zeitgeist:
assignee: nobody → Siegfried Gevatter (rainct)
status: Triaged → In Progress
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.