Zim

Improve performance of tags plugin by removing filtered model

Bug #791509 reported by maacruz
This bug affects 2 people
Affects   Status         Importance   Assigned to   Milestone
Zim       Fix Released   Medium       Unassigned

Bug Description

Performance of tags is terrible on a big notebook (about 300 pages, with only 4 tags used on a few pages).
Each tag operation (selecting a tag in the tag cloud) takes about 10 seconds to complete at 100% CPU on a low-power 400 MHz computer (N810 device).
Profiling shows about 5,500 cursor executes for each operation, which take the lion's share of the CPU time.

Profile data:
Wed Jun 1 19:21:26 2011 profile-tags

         372336 function calls (371074 primitive calls) in 153.164 CPU seconds

   Ordered by: cumulative time
   List reduced from 1426 to 20 due to restriction <20>

   ncalls tottime percall cumtime percall filename:lineno(function)
        1 0.020 0.020 153.183 153.183 {execfile}
        1 0.015 0.015 153.163 153.163 zim.py:3(<module>)
        1 0.044 0.044 151.905 151.905 __init__.py:145(main)
        1 0.002 0.002 138.920 138.920 __init__.py:451(main)
        1 28.659 28.659 136.030 136.030 {gtk._gtk.main}
    23/12 0.015 0.001 105.128 8.761 {method 'emit' of 'gobject.GObject' objects}
        5 0.009 0.002 99.494 19.899 tags.py:674(_update)
        4 0.003 0.001 99.474 24.868 tags.py:691(<lambda>)
        5 0.001 0.000 99.320 19.864 tags.py:755(on_cloud_selection_changed)
        5 0.001 0.000 99.317 19.863 tags.py:556(set_tag_filter)
        4 5.833 1.458 99.315 24.829 {method 'refilter' of 'gtk.TreeModelFilter' objects}
    26213 9.813 0.000 86.792 0.003 tags.py:403(_get_iter)
    22138 66.922 0.003 66.922 0.003 {method 'execute' of 'sqlite3.Cursor' objects}
     7842 0.851 0.000 63.793 0.008 pageindex.py:296(on_iter_children)
    11879 5.162 0.000 60.562 0.005 index.py:1171(list_pages)
     7104 1.308 0.000 18.868 0.003 pageindex.py:287(on_iter_next)
     1986 0.681 0.000 14.848 0.007 index.py:1242(list_all_pages)
     1883 1.869 0.001 10.677 0.006 index.py:1045(lookup_id)
        1 0.018 0.018 8.884 8.884 __init__.py:295(__init__)
    11386 0.667 0.000 7.280 0.001 pageindex.py:193(on_get_iter)
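
For anyone wanting to collect a comparable report, the following is a minimal sketch using the standard cProfile and pstats modules. It is written for Python 3, whereas the profile above came from a Python 2 run (note the execfile entry), and busy_work is a hypothetical stand-in for the code under test, not anything from Zim.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, limit=20, **kwargs):
    """Run func under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Same ordering and row limit as the report above.
    stats.sort_stats("cumulative").print_stats(limit)
    return result, out.getvalue()


def busy_work(n):
    # Hypothetical workload; replace with the operation being measured.
    return sum(i * i for i in range(n))


result, report = profile_call(busy_work, 10000)
print(report)
```

Sorting by cumulative time is what makes the chain from refilter down to the sqlite3 cursor calls visible in one listing.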

maacruz (maacruz) wrote :

Jaap Karssenberg (jaap.karssenberg) wrote : Re: [Bug 791509] Re: tags: terrible performance

Can you also add zim version, python version, gtk version and platform ?

Thanks,

Jaap

Jaap Karssenberg (jaap.karssenberg) wrote : Re: tags: terrible performance

The issue is caused by using a TreeModelFilter to filter out data from the tag tree in the side pane. Fixing this means refactoring that code so it no longer uses the tree filter class and does the filtering itself.

maacruz (maacruz) wrote :

Yes, I have followed the problem up to model.refilter() in TagsPageTreeView.set_tag_filter()
But I have observed that the tag tree shows the pages repeated several times. Let's suppose I have a notebook like this:
Page 1
 |
 |- Page 2
 |   |
 |   |- Page 21
 |
 |- Page 3
then the pane shows:
>Page 1
    >Page 2
        Page 21
    Page 3
>Page 2
    Page 21
Page 3
Page 21

As you can see, the number of lines in the tree is multiplied depending on the notebook's tree depth. So a 300-page notebook can generate several thousand lines in the pane, making it not only slow but also hard to use.
A more efficient approach would be to use the same tree the index tab uses and not show lower branches and leaves.
What piece of code populates the tag tree?
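
The row multiplication described above can be checked with a small sketch. It assumes, as in the example, that every page is listed as a top-level row with its whole subtree expanded beneath it; the PARENTS mapping and both function names are made up for illustration and are not part of Zim.

```python
# Hypothetical notebook from the example: page -> parent (None = top level).
PARENTS = {
    "Page 1": None,
    "Page 2": "Page 1",
    "Page 21": "Page 2",
    "Page 3": "Page 1",
}


def depth(page, parents):
    """Number of pages on the path from the top level down to `page`."""
    d = 0
    while page is not None:
        d += 1
        page = parents[page]
    return d


def expanded_rows(parents):
    """Rows shown when every page is a top-level row with its subtree expanded.

    Each page appears once in its own right and once under each ancestor,
    so it contributes depth(page) rows.
    """
    return sum(depth(p, parents) for p in parents)


print(len(PARENTS), "pages ->", expanded_rows(PARENTS), "rows")
```

For the four-page example this gives 8 rows, matching the pane listing above; the total grows with the sum of page depths rather than with the page count alone.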

maacruz (maacruz) wrote :

Thinking again about it, the tags tree pane has to show the pages, not trees.

Jaap Karssenberg (jaap.karssenberg) wrote : Re: [Bug 791509] Re: tags: terrible performance

Let's not discuss the visual behavior of the tag tree - it looks as it was
designed to look.

The tree is not really populated. Instead, our custom tree model runs SELECT
queries directly to provide the info needed by the treeview. This works well
(without the filter) because the treeview only shows a limited set of pages
at any time, resulting in a small number of SQL queries, and we do not need to
keep the rest of the tree in memory.

The problem is that it goes wrong when we call the filter and it wants to
check all pages at once. The fix would be to embed the filtering in the SQL
queries directly, so we never have to filter all pages but retrieve filtered
data from the database directly.

This is something I would like to look into but is fairly complicated (which
is why we didn't do it immediately when we committed the plugin).
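
To illustrate the difference between filtering row by row and embedding the filter in the query, here is a rough sketch using sqlite3. The schema (pages, tags, tagsources) and both function names are assumptions made for this example; they are not Zim's actual index tables or API.

```python
import sqlite3

# Assumed, simplified schema: NOT the real Zim index layout.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pages (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tagsources (page INTEGER, tag INTEGER);
""")
conn.executemany("INSERT INTO pages VALUES (?, ?)",
                 [(i, "Page %d" % i) for i in range(1, 301)])
conn.executemany("INSERT INTO tags VALUES (?, ?)", [(1, "todo"), (2, "home")])
conn.executemany("INSERT INTO tagsources VALUES (?, ?)",
                 [(3, 1), (7, 1), (7, 2), (42, 2)])


def filter_per_row(conn, tag):
    """Filter-model style: one extra query per page to decide visibility."""
    page_ids = [r[0] for r in conn.execute("SELECT id FROM pages ORDER BY id")]
    queries = 1
    visible = []
    for page_id in page_ids:
        row = conn.execute(
            "SELECT 1 FROM tagsources JOIN tags ON tags.id = tagsources.tag "
            "WHERE tagsources.page = ? AND tags.name = ?",
            (page_id, tag)).fetchone()
        queries += 1
        if row:
            visible.append(page_id)
    return visible, queries


def filter_in_sql(conn, tag):
    """Filter embedded in the query: only matching pages are returned."""
    rows = conn.execute(
        "SELECT pages.id FROM pages "
        "JOIN tagsources ON tagsources.page = pages.id "
        "JOIN tags ON tags.id = tagsources.tag "
        "WHERE tags.name = ? ORDER BY pages.id", (tag,)).fetchall()
    return [r[0] for r in rows], 1


print(filter_per_row(conn, "todo"))  # same pages, one query per page
print(filter_in_sql(conn, "todo"))   # same pages, a single query
```

Both functions return the same pages, but the per-row version issues a query for every page in the notebook, which is the pattern that makes a full refilter expensive.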

tags: added: tags
Changed in zim:
status: New → Confirmed
importance: Undecided → Medium
summary: - tags: terrible performance
+ Improve performance of tags plugin by removing filtered model
Changed in zim:
status: Confirmed → In Progress
Jaap Karssenberg (jaap.karssenberg) wrote :

rev 843

Changed in zim:
status: In Progress → Fix Committed
Changed in zim:
status: Fix Committed → Fix Released