Query execution slow

Bug #844877 reported by Michal Hruby
10
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Zeitgeist Framework
In Progress
Undecided
Siegfried Gevatter

Bug Description

Synapse is using various not-so-complex find_events queries to display recent activities, and it seems that especially a query for "All" category is taking much longer than any other query (~750ms with a DB with 200k events).

Here's the output from zg with the debug_sql extension:

[DEBUG - root] # parameters: {'event_templates': [Event([dbus.Array([dbus.String(u''), '1315492708863', dbus.String(u''), dbus.String(u''), dbus.String(u''), ''], signature=dbus.Signature('s')), [Subject([dbus.String(u''), dbus.String(u''), dbus.String(u''), dbus.String(u''), dbus.String(u''), dbus.String(u''), dbus.String(u''), dbus.String(u'')]), Subject([dbus.String(u''), dbus.String(u'!http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#Folder'), dbus.String(u''), dbus.String(u''), dbus.String(u''), dbus.String(u''), dbus.String(u''), dbus.String(u'')]), Subject([dbus.String(u''), dbus.String(u'!http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#Software'), dbus.String(u''), dbus.String(u''), dbus.String(u''), dbus.String(u''), dbus.String(u''), dbus.String(u'')])], dbus.Array([], signature=dbus.Signature('y'))])],
 'max_events': dbus.UInt32(96L),
 'order': dbus.UInt32(2L),
 'return_mode': 1,
 'self': <_zeitgeist.engine.main.ZeitgeistEngine instance at 0x1b66248>,
 'sender': ':1.3604',
 'storage_state': dbus.UInt32(2L),
 'time_range': [1300977508860, 9223372036854775807]}

[DEBUG - root] # database size: {'actor': 129,
 'event': 192746,
 'interpretation': 68,
 'manifestation': 27,
 'mimetype': 353,
 'uri': 60265}
[DEBUG - root] # BEGIN SQL QUERY #
no pretty sql: SELECT id FROM event_view
    NATURAL JOIN (
     SELECT subj_id,
      max(timestamp) AS timestamp

     FROM event_view WHERE (+timestamp >= ? AND ((NOT (subj_interpretation = ?) AND NOT (subj_interpretation = ? OR subj_interpretation = ? OR subj_interpretation = ?))))
     GROUP BY subj_id)
    GROUP BY subj_id
    ORDER BY timestamp DESC
     LIMIT 96 : [1300977508860, 45, 46, 47, 43]
took 0.712120s
# END SQL QUERY
[DEBUG - zeitgeist.engine] Found 96 events IDs in 0.723284s

Revision history for this message
Seif Lotfy (seif) wrote : Re: [Zeitgeist] [Bug 844877] [NEW] Query execution slow
Download full text (5.6 KiB)

Wow that is VERY SLOW...
Maybe Markus can help us look into the DB structure again :)

On Thu, Sep 8, 2011 at 4:59 PM, Michal Hruby <email address hidden> wrote:

> Public bug reported:
>
> Synapse is using various not-so-complex find_events queries to display
> recent activities, and it seems that especially a query for "All"
> category is taking much longer than any other query (~750ms with a DB
> with 200k events).
>
> Here's the output from zg with the debug_sql extension:
>
> [DEBUG - root] # parameters: {'event_templates':
> [Event([dbus.Array([dbus.String(u''), '1315492708863', dbus.String(u''),
> dbus.String(u''), dbus.String(u''), ''], signature=dbus.Signature('s')),
> [Subject([dbus.String(u''), dbus.String(u''), dbus.String(u''),
> dbus.String(u''), dbus.String(u''), dbus.String(u''), dbus.String(u''),
> dbus.String(u'')]), Subject([dbus.String(u''), dbus.String(u'!
> http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#Folder'),
> dbus.String(u''), dbus.String(u''), dbus.String(u''), dbus.String(u''),
> dbus.String(u''), dbus.String(u'')]), Subject([dbus.String(u''),
> dbus.String(u'!
> http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#Software'),
> dbus.String(u''), dbus.String(u''), dbus.String(u''), dbus.String(u''),
> dbus.String(u''), dbus.String(u'')])], dbus.Array([],
> signature=dbus.Signature('y'))])],
> 'max_events': dbus.UInt32(96L),
> 'order': dbus.UInt32(2L),
> 'return_mode': 1,
> 'self': <_zeitgeist.engine.main.ZeitgeistEngine instance at 0x1b66248>,
> 'sender': ':1.3604',
> 'storage_state': dbus.UInt32(2L),
> 'time_range': [1300977508860, 9223372036854775807]}
>
> [DEBUG - root] # database size: {'actor': 129,
> 'event': 192746,
> 'interpretation': 68,
> 'manifestation': 27,
> 'mimetype': 353,
> 'uri': 60265}
> [DEBUG - root] # BEGIN SQL QUERY #
> no pretty sql: SELECT id FROM event_view
> NATURAL JOIN (
> SELECT subj_id,
> max(timestamp) AS timestamp
>
> FROM event_view WHERE (+timestamp
> >= ? AND ((NOT (subj_interpretation = ?) AND NOT (subj_interpretation = ? OR
> subj_interpretation = ? OR subj_interpretation = ?))))
> GROUP BY subj_id)
> GROUP BY subj_id
> ORDER BY timestamp DESC
> LIMIT 96 : [1300977508860, 45, 46, 47, 43]
> took 0.712120s
> # END SQL QUERY
> [DEBUG - zeitgeist.engine] Found 96 events IDs in 0.723284s
>
> ** Affects: zeitgeist
> Importance: Undecided
> Status: New
>
> --
> You received this bug notification because you are a member of Zeitgeist
> Framework Team, which is subscribed to Zeitgeist Framework.
> https://bugs.launchpad.net/bugs/844877
>
> Title:
> Query execution slow
>
> Status in Zeitgeist Framework:
> New
>
> Bug description:
> Synapse is using various not-so-complex find_events queries to display
> recent activities, and it seems that especially a query for "All"
> category is taking much longer than any other query (~750ms with a DB
> with 200k events...

Read more...

Revision history for this message
Markus Korn (thekorn) wrote :

It is not *very* slow, it just seems to be at bit slower than it should be.
From reading the query plan the reason seems to be that two temp b-trees are created, which is *slow*

sele order from deta
---- ------------- ---- ----
1 0 0 SCAN TABLE event USING INDEX event_subj_id (~125000 rows)
0 0 1 SCAN SUBQUERY 1 (~100 rows)
0 1 0 SEARCH TABLE event USING INDEX event_subj_id (subj_id=?) (~2 rows)
0 0 0 USE TEMP B-TREE FOR GROUP BY
0 0 0 USE TEMP B-TREE FOR ORDER BY

It seems like all other queries are also using this temp b-trees (at least the ones I tried)
Right now I don't see a way to optimize it.

Revision history for this message
Michal Hruby (mhr3) wrote :

Ok, I've got another VERY bad one... See attachement

Revision history for this message
Michal Hruby (mhr3) wrote :

As a note the result is pretty much still the same after seif's fix to get rid of the "AND subj_current_uri = ..." part.

Revision history for this message
Michal Hruby (mhr3) wrote :

And more research:

<mhr3> i just tried two queries in `sqlite3`
 SELECT subj_id FROM event_view WHERE subj_uri IN ('application://gedit.desktop', 'application://firefox.desktop', 'application://devhelp.desktop', 'application://google-chrome.desktop', 'application://dfeet.desktop', 'http://twitter.com/', 'file:///home/miso/projects/checkout/gnome/glib/glib/gquark.h', 'http://www.google.cz/search?sourceid=chrome&ie=UTF-8&q=youtube+darky+rom+10.2') GROUP BY subj_id;
<mhr3> and
 SELECT id FROM uri WHERE value IN ('application://gedit.desktop', 'application://firefox.desktop', 'application://devhelp.desktop', 'application://google-chrome.desktop', 'application://dfeet.desktop', 'http://twitter.com/', 'file:///home/miso/projects/checkout/gnome/glib/glib/gquark.h', 'http://www.google.cz/search?sourceid=chrome&ie=UTF-8&q=youtube+darky+rom+10.2');
<mhr3> the second one is much much much faster (also there's no visible difference between "... OR value = '...' OR ..." and "IN ('...', ...)")

Revision history for this message
Michal Hruby (mhr3) wrote :

And concluding:

SELECT subj_id, max(timestamp) AS timestamp
   FROM event_view
   WHERE (((subj_uri = 'application://gedit.desktop')
           OR (subj_uri = 'application://devhelp.desktop')
           OR (subj_uri = 'application://google-chrome.desktop') OR ...
   GROUP BY subj_id;

executes in 20seconds here.

On the other hand:
SELECT subj_id, max(timestamp) FROM event_view WHERE subj_id IN
   (SELECT id
      FROM uri
      WHERE (((value = 'application://gedit.desktop')
              OR (value = 'application://devhelp.desktop')
              OR (value = 'application://google-chrome.desktop') OR ...)
   GROUP BY subj_id;

executes immediately. Of course results are equivalent.

Revision history for this message
Seif Lotfy (seif) wrote : Re: [Bug 844877] Re: Query execution slow
Download full text (3.6 KiB)

Very interesting. seems to me like the view might be a bit corrupt then...
not sure...
I think Markus would know more about that

On Sun, Sep 18, 2011 at 11:09 PM, Michal Hruby <email address hidden> wrote:

> And concluding:
>
> SELECT subj_id, max(timestamp) AS timestamp
> FROM event_view
> WHERE (((subj_uri = 'application://gedit.desktop')
> OR (subj_uri = 'application://devhelp.desktop')
> OR (subj_uri = 'application://google-chrome.desktop') OR ...
> GROUP BY subj_id;
>
> executes in 20seconds here.
>
> On the other hand:
> SELECT subj_id, max(timestamp) FROM event_view WHERE subj_id IN
> (SELECT id
> FROM uri
> WHERE (((value = 'application://gedit.desktop')
> OR (value = 'application://devhelp.desktop')
> OR (value = 'application://google-chrome.desktop') OR ...)
> GROUP BY subj_id;
>
> executes immediately. Of course results are equivalent.
>
> --
> You received this bug notification because you are subscribed to The
> Zeitgeist Project.
> https://bugs.launchpad.net/bugs/844877
>
> Title:
> Query execution slow
>
> Status in Zeitgeist Framework:
> New
>
> Bug description:
> Synapse is using various not-so-complex find_events queries to display
> recent activities, and it seems that especially a query for "All"
> category is taking much longer than any other query (~750ms with a DB
> with 200k events).
>
> Here's the output from zg with the debug_sql extension:
>
> [DEBUG - root] # parameters: {'event_templates':
> [Event([dbus.Array([dbus.String(u''), '1315492708863', dbus.String(u''),
> dbus.String(u''), dbus.String(u''), ''], signature=dbus.Signature('s')),
> [Subject([dbus.String(u''), dbus.String(u''), dbus.String(u''),
> dbus.String(u''), dbus.String(u''), dbus.String(u''), dbus.String(u''),
> dbus.String(u'')]), Subject([dbus.String(u''), dbus.String(u'!
> http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#Folder'),
> dbus.String(u''), dbus.String(u''), dbus.String(u''), dbus.String(u''),
> dbus.String(u''), dbus.String(u'')]), Subject([dbus.String(u''),
> dbus.String(u'!
> http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#Software'),
> dbus.String(u''), dbus.String(u''), dbus.String(u''), dbus.String(u''),
> dbus.String(u''), dbus.String(u'')])], dbus.Array([],
> signature=dbus.Signature('y'))])],
> 'max_events': dbus.UInt32(96L),
> 'order': dbus.UInt32(2L),
> 'return_mode': 1,
> 'self': <_zeitgeist.engine.main.ZeitgeistEngine instance at 0x1b66248>,
> 'sender': ':1.3604',
> 'storage_state': dbus.UInt32(2L),
> 'time_range': [1300977508860, 9223372036854775807]}
>
> [DEBUG - root] # database size: {'actor': 129,
> 'event': 192746,
> 'interpretation': 68,
> 'manifestation': 27,
> 'mimetype': 353,
> 'uri': 60265}
> [DEBUG - root] # BEGIN SQL QUERY #
> no pretty sql: SELECT id FROM event_view
> NATURAL JOIN (
> SELECT subj_id,
> max(timestamp) AS timestamp
>
> FROM event_view WHERE (+timestamp
> >= ? AND ((NOT (subj_interpretation = ?) AND NOT (subj_interpretation ...

Read more...

Revision history for this message
Siegfried Gevatter (rainct) wrote :

Looks like SQLite's optimizer can't handle the subqueries in event_view, so we should restrict their usage in WHERE clauses.

Bluebird revision 271 fixes this (please confirm).

Revision history for this message
Siegfried Gevatter (rainct) wrote :

Created https://bugs.launchpad.net/zeitgeist/+bug/910190 for the DISTINCT changes.

Changed in zeitgeist:
milestone: none → 0.9.1
status: New → Triaged
Revision history for this message
Michal Hruby (mhr3) wrote :

The original issue is still not fixed though (comments 3-8 should have been in a separate bug).

Revision history for this message
Siegfried Gevatter (rainct) wrote :
Changed in zeitgeist:
assignee: nobody → Siegfried Gevatter (rainct)
status: Triaged → In Progress
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers

Bug attachments