GTG

backend_localfile should save tasks after time interval instead of modification

Bug #907676 reported by Izidor Matušov
This bug affects 1 person
Affects: GTG
Status: Fix Released
Importance: Medium
Assigned to: Diego Garcia Gangl
Milestone: 0.4

Bug Description

Right now, backend_localfile saves its XML right after every modification. It would make more sense to save the XML a certain time interval after the last modification.

Use case:
When starting GTG, the XML is saved every time a task is modified. Because the modified signal is sent quite often, GTG spends much of its time saving the file, which is slow unless you own an SSD.

Instead of that, it could run a timer, e.g. a 30-second one. Every time a task is modified, the timer is reset. When the timer finishes with no further modification, the changes are saved.

It should be a huge performance improvement.
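
A minimal sketch of that debounce, assuming GTG's PyGTK-era gobject main loop and a hypothetical save_xml() method on the backend (the class and method names here are made up for illustration):

<code>
import gobject

class LocalFileBackend(object):
    """Sketch only: debounce XML saves with a resettable timer."""

    SAVE_DELAY_MS = 30 * 1000  # quiet period before writing, in milliseconds

    def __init__(self):
        self._save_timer = None

    def on_task_modified(self, task):
        # Each modification cancels any pending save and restarts the
        # countdown, so the XML is written once modifications stop.
        if self._save_timer is not None:
            gobject.source_remove(self._save_timer)
        self._save_timer = gobject.timeout_add(self.SAVE_DELAY_MS,
                                               self._save_now)

    def _save_now(self):
        self._save_timer = None
        self.save_xml()  # assumed existing serialization method
        return False     # returning False stops the timeout
</code>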

Revision history for this message
Eduardo Pérez Ureta (edpeur) wrote :

Even better would be to use a scalable backend like SQLite instead of using XML as a database.

Revision history for this message
Izidor Matušov (izidor) wrote :

Eduardo> I've created a prototype of a SQLite backend. Caveat: I compared a prototype to an optimized system, so the results are a little skewed.

With the current system, you have to save your data after every modification (that's the reason for this bug), and committing after each change is really slow. With my data, the XML backend took about half the time of the SQLite backend. I also violated the SQLite library's rules by disabling thread checking, because every modification is stored by a different thread -- re-opening SQLite each time would be even slower.
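
(For reference, disabling that check with Python's sqlite3 module looks like the following; this is a sketch of the idea, not the prototype's actual code, and the file name is made up.)

<code>
import sqlite3

# check_same_thread=False lets the connection be used from threads other
# than the one that created it; by default sqlite3 raises an error.
conn = sqlite3.connect("gtg_tasks.db", check_same_thread=False)
</code>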

Yes, we could spend an awful lot of time optimizing the system for SQLite, but it would be a waste of our time. GTG just needs to serialize data to disk in some format - XML, JSON, or plain text would be enough for that. SQLite adds too much overhead which we don't need at all.

Result: SQLite isn't a scalable backend for our use case.

You can play with my prototype here: lp:~gtg-user/gtg/sqlite

Revision history for this message
Eduardo Pérez Ureta (edpeur) wrote :

Are you designing GTG to be scalable to many tasks, or just a few tasks?
If you would like GTG to be fast when there are many tasks in the database, an XML file must be avoided, as writing a large file is slow. If you use a SQL database like SQLite, there will be no scalability problems, as inserting or updating a single task does not write the entire database.
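
For illustration, updating a single task in SQLite rewrites only the pages holding that row, not the whole file (the table and column names below are hypothetical):

<code>
import sqlite3

conn = sqlite3.connect("gtg_tasks.db")
# Only the pages containing this row (plus the journal) are written;
# the rest of the database file stays untouched on disk.
conn.execute("UPDATE tasks SET title = ? WHERE id = ?", ("New title", 42))
conn.commit()
</code>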

Revision history for this message
Izidor Matušov (izidor) wrote :

We store all tasks in memory; the XML file is used just for persistence. We don't need any additional features of SQL. If we use SQLite, it dumps the whole database to disk when committing as well. We don't want to use full-featured databases like MySQL or PostgreSQL -- they would introduce too many requirements for a desktop application.

As you can see, it wasn't a problem to create another backend where GTG stores data. If you would like GTG to use an SQL database, please contribute (backend) code and other patches; I would be happy to merge them into the trunk. Because my GTG time is limited (it's just my hobby), I am not going to work on it and will spend the time on other bugs instead.

Revision history for this message
Eduardo Pérez Ureta (edpeur) wrote :

I thought you wanted GTG to be scalable to many tasks instead of just a small set of tasks. You should state on the home page that this software is not meant to be used for many tasks. I was looking for a task manager that can handle many tasks and I will have to continue searching.

Xuan (Sean) Hu (huxuan)
Changed in gtg:
assignee: nobody → Xuan Hu (huxuan)
Revision history for this message
Izidor Matušov (izidor) wrote :

I made a simple benchmark to check whether this bug is still critical now (rev 1181).

Scenario:
Start GTG, delete 50 random tasks and close GTG

You can automate it by adding the following code at the end of the __init__() method in GTG/gtk/browser/browser.py:
<code>
        # Run the test case once the main loop becomes idle.
        gobject.idle_add(self.test_case)

    def test_case(self):
        # Delete the first 50 tasks of the active tree.
        for tid in self.activetree.get_all_nodes()[:50]:
            if self.req.has_task(tid):
                self.req.delete_task(tid, recursive=True)
        gtk.main_iteration()
        gobject.idle_add(gtk.main_quit)
</code>

Current time to complete the test case: 23 seconds
Time to complete the test case without savexml at all: 12 seconds (just starting GTG with my tasks takes 10 seconds)

Deleting 50 tasks results in 73 calls to savexml. As you know, working with a disk is very slow.

This bug is still critical.

Revision history for this message
Xuan (Sean) Hu (huxuan) wrote :

Hi, Izidor. I tried to add the test_case in rev 1185 and ran it with './scripts/debug.sh -s bryce', but got an error from liblarch.

The message looks like "Exception: Trying to remove node 1861@1 with no iterator".

Any suggestions?

Revision history for this message
Izidor Matušov (izidor) wrote :

After a discussion and further investigation, we should change the internal interface of synchronization services/backends. Instead of saving after every task, the backend should receive, in a single call, the tasks which were modified/added and the tasks which were removed. The backend can then decide whether it is worth making a bulk request to the server or saving the file at that moment.
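
A minimal sketch of what such a batched interface could look like (class, method, and attribute names are hypothetical, not GTG's actual API):

<code>
class SyncBackend(object):
    """Sketch only: the whole change set arrives in a single call."""

    def set_changes(self, modified_tasks, removed_tids):
        # modified_tasks: tasks added or changed since the last call
        # removed_tids: ids of tasks deleted since the last call
        raise NotImplementedError

class LocalFileBackend(SyncBackend):
    def __init__(self):
        self.pending_changes = []
        self.pending_removals = []

    def set_changes(self, modified_tasks, removed_tids):
        # The backend, not the caller, decides when a write is worth it;
        # here it saves only once enough changes have accumulated.
        self.pending_changes.extend(modified_tasks)
        self.pending_removals.extend(removed_tids)
        if len(self.pending_changes) + len(self.pending_removals) >= 50:
            self.save_xml()  # assumed existing serialization method
            del self.pending_changes[:]
            del self.pending_removals[:]
</code>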

Changed in gtg:
assignee: Xuan Hu (huxuan) → Izidor Matušov (izidor)
tags: removed: love
Revision history for this message
Izidor Matušov (izidor) wrote :

This change would require important modifications to GTG's infrastructure, so it can be solved later, after releasing GTG 0.3.

Changed in gtg:
milestone: 0.3 → 0.4
assignee: Izidor Matušov (izidor) → nobody
importance: Critical → Medium
Changed in gtg:
assignee: nobody → Narendra Joshi (narendraj9)
Jeff Fortin Tam (kiddo)
Changed in gtg:
status: Confirmed → Fix Committed
assignee: Narendra Joshi (narendraj9) → Diego Garcia Gangl (dnicolas)
Jeff Fortin Tam (kiddo)
Changed in gtg:
status: Fix Committed → Fix Released