Zim

Improve performance of tags plugin by removing filtered model

Bug #791509 reported by maacruz on 2011-06-01

This bug affects 2 people

Affects		Status	Importance	Assigned to	Milestone
	Zim	Fix Released	Medium	Unassigned

Bug Description

Performance of tags is terrible on a big notebook (about 300 pages, although only 4 tags are used, on a few pages).
Each tag operation (selecting a tag in the tag cloud) takes about 10 seconds to complete at 100% CPU on a low-power 400 MHz computer (N810 device).
Profiling shows about 5500 cursor executes per operation, which take the lion's share of the CPU time.

Profile data:
Wed Jun 1 19:21:26 2011 profile-tags

372336 function calls (371074 primitive calls) in 153.164 CPU seconds

Ordered by: cumulative time
List reduced from 1426 to 20 due to restriction <20>

   ncalls tottime percall cumtime percall filename:lineno(function)
        1 0.020 0.020 153.183 153.183 {execfile}
        1 0.015 0.015 153.163 153.163 zim.py:3(<module>)
        1 0.044 0.044 151.905 151.905 __init__.py:145(main)
        1 0.002 0.002 138.920 138.920 __init__.py:451(main)
        1 28.659 28.659 136.030 136.030 {gtk._gtk.main}
    23/12 0.015 0.001 105.128 8.761 {method 'emit' of 'gobject.GObject' objects}
        5 0.009 0.002 99.494 19.899 tags.py:674(_update)
        4 0.003 0.001 99.474 24.868 tags.py:691(<lambda>)
        5 0.001 0.000 99.320 19.864 tags.py:755(on_cloud_selection_changed)
        5 0.001 0.000 99.317 19.863 tags.py:556(set_tag_filter)
        4 5.833 1.458 99.315 24.829 {method 'refilter' of 'gtk.TreeModelFilter' objects}
    26213 9.813 0.000 86.792 0.003 tags.py:403(_get_iter)
    22138 66.922 0.003 66.922 0.003 {method 'execute' of 'sqlite3.Cursor' objects}
     7842 0.851 0.000 63.793 0.008 pageindex.py:296(on_iter_children)
    11879 5.162 0.000 60.562 0.005 index.py:1171(list_pages)
     7104 1.308 0.000 18.868 0.003 pageindex.py:287(on_iter_next)
     1986 0.681 0.000 14.848 0.007 index.py:1242(list_all_pages)
     1883 1.869 0.001 10.677 0.006 index.py:1045(lookup_id)
        1 0.018 0.018 8.884 8.884 __init__.py:295(__init__)
    11386 0.667 0.000 7.280 0.001 pageindex.py:193(on_get_iter)
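
For anyone who wants to collect comparable numbers, a report ordered by cumulative time and limited to 20 entries can be produced with the standard cProfile and pstats modules. This is only a minimal sketch, assuming Python 3 (the profile above comes from a Python 2 run); slow_operation and profile_top are hypothetical names, and the workload is a stand-in rather than anything taken from Zim.

```python
import cProfile
import io
import pstats


def slow_operation():
    # Stand-in workload (assumption); in the report above the profiled
    # code was the whole Zim session.
    return sum(i * i for i in range(100_000))


def profile_top(func, limit=20):
    """Run func under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # "cumulative" gives the "Ordered by: cumulative time" listing;
    # limit restricts the listing to that many entries.
    stats.sort_stats("cumulative").print_stats(limit)
    return result, buffer.getvalue()


result, report = profile_top(slow_operation)
print(report)
```

The cumtime column is the one to read first: it includes time spent in callees, which is how refilter and the sqlite3 execute calls stand out in the data above.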

Tags:


maacruz (maacruz) wrote on 2011-06-01:

profiling data (71.7 KiB, application/octet-stream)


Jaap Karssenberg (jaap.karssenberg) wrote on 2011-06-04: Re: [Bug 791509] Re: tags: terrible performance

Can you also add the Zim version, Python version, GTK version and platform?

Thanks,

Jaap


Jaap Karssenberg (jaap.karssenberg) wrote on 2011-06-06: Re: tags: terrible performance

The issue is due to using a TreeFilter (gtk.TreeModelFilter) to filter out data from the tag tree in the side pane. Fixing this would mean we have to refactor that code to not use the tree filter class, but do the filtering ourselves.


maacruz (maacruz) wrote on 2011-06-10:

Yes, I have traced the problem to model.refilter() in TagsPageTreeView.set_tag_filter().
But I have also observed that the tag tree shows pages repeated several times. Suppose I have a notebook like this:
Page 1
 |- Page 2
 |   |- Page 21
 |- Page 3
then the pane shows:
>Page 1
    >Page 2
        Page 21
    Page 3
>Page 2
    Page 21
Page 3
Page 21

As you can see, the number of lines in the tree is multiplied depending on the notebook's tree depth. So a 300-page notebook can generate several thousand lines in the panel, making it not only slow but also hard to use.
A more efficient approach would be to use the same tree the index tab uses and not show lower branches and leaves.
What piece of code populates the tag tree?
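
The row multiplication described above can be illustrated with a small sketch. It is not Zim's code: NOTEBOOK, subtree_rows and pane_rows are hypothetical names, and the only assumption is the behavior observed in the pane, namely that every page is listed as a top-level entry together with its full subtree.

```python
# Hypothetical notebook from the example above: page name -> child pages.
NOTEBOOK = {
    "Page 1": ["Page 2", "Page 3"],
    "Page 2": ["Page 21"],
    "Page 3": [],
    "Page 21": [],
}


def subtree_rows(tree, page, indent=0):
    """Return (indent, name) rows for one page and everything below it."""
    rows = [(indent, page)]
    for child in tree[page]:
        rows.extend(subtree_rows(tree, child, indent + 1))
    return rows


def pane_rows(tree, top_level_order):
    """List every page as a top-level entry, each with its full subtree."""
    rows = []
    for page in top_level_order:
        rows.extend(subtree_rows(tree, page))
    return rows


rows = pane_rows(NOTEBOOK, ["Page 1", "Page 2", "Page 3", "Page 21"])
for indent, page in rows:
    print("    " * indent + page)
print(len(rows), "rows for", len(NOTEBOOK), "pages")
```

Under that assumption a page at depth d (top level counted as 0) appears d + 1 times, so the total number of rows is the sum of d + 1 over all pages, which grows quickly for deep notebooks.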


maacruz (maacruz) wrote on 2011-06-10:

Thinking about it again, the tag tree pane has to show pages, not trees.


Jaap Karssenberg (jaap.karssenberg) wrote on 2011-06-14: Re: [Bug 791509] Re: tags: terrible performance

Let's not discuss the visual behavior of the tag tree - it looks the way it
was designed to look.

The tree is not really populated. Instead our custom tree model does SELECT
queries directly to give the info needed by the treeview. This works well
(without the filter) because the treeview only shows a limited set of pages
at any time, resulting in a small number of SQL queries, and we do not need
to keep the rest of the tree in memory.

The problem is that it goes wrong when we call the filter and it wants to
check all pages at once. The fix would be to embed the filtering in the SQL
queries directly, so we never have to filter all pages but retrieve filtered
data from the database directly.

This is something I would like to look into but is fairly complicated (which
is why we didn't do it immediately when we committed the plugin).
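
The difference between the two approaches can be sketched with sqlite3. This is a rough illustration only: the schema (pages, tags, tagsources), the helper names and the sample data are all assumptions made for the example and are not taken from the Zim code base.

```python
import sqlite3

# Hypothetical schema and data (assumption): 300 pages, one tag on 3 pages.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pages (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tagsources (source INTEGER, tag INTEGER);
""")
conn.executemany("INSERT INTO pages VALUES (?, ?)",
                 [(i, "Page %d" % i) for i in range(1, 301)])
conn.execute("INSERT INTO tags VALUES (1, 'todo')")
conn.executemany("INSERT INTO tagsources VALUES (?, 1)",
                 [(5,), (42,), (170,)])

counter = {"queries": 0}


def run(sql, args=()):
    """Execute one SELECT and count it."""
    counter["queries"] += 1
    return conn.execute(sql, args).fetchall()


def filter_per_page(tag):
    """Filter outside the database: one extra query for every page."""
    matches = []
    for page_id, name in run("SELECT id, name FROM pages ORDER BY id"):
        hit = run("SELECT 1 FROM tagsources s JOIN tags t ON t.id = s.tag "
                  "WHERE s.source = ? AND t.name = ?", (page_id, tag))
        if hit:
            matches.append(name)
    return matches


def filter_in_sql(tag):
    """Filter inside the query: only matching pages are ever returned."""
    rows = run("SELECT p.name FROM pages p "
               "JOIN tagsources s ON s.source = p.id "
               "JOIN tags t ON t.id = s.tag "
               "WHERE t.name = ? ORDER BY p.id", (tag,))
    return [name for (name,) in rows]


counter["queries"] = 0
slow = filter_per_page("todo")
slow_queries = counter["queries"]

counter["queries"] = 0
fast = filter_in_sql("todo")
fast_queries = counter["queries"]

print(slow == fast, slow_queries, fast_queries)
```

Both functions return the same pages, but the per-page variant issues one query per page plus one for the listing, while the SQL variant issues a single query regardless of notebook size.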

Jaap Karssenberg (jaap.karssenberg) on 2011-08-24

tags:	added: tags
Changed in zim:
status:	New → Confirmed
importance:	Undecided → Medium
summary:	- tags: terrible performance
	+ Improve performance of tags plugin by removing filtered model

Jaap Karssenberg (jaap.karssenberg) on 2017-03-07

Changed in zim:
status:	Confirmed → In Progress


Jaap Karssenberg (jaap.karssenberg) wrote on 2017-04-11:

rev 843

Changed in zim:
status:	In Progress → Fix Committed

Jaap Karssenberg (jaap.karssenberg) on 2017-04-28

Changed in zim:
status:	Fix Committed → Fix Released
