GTG

directory backend

Bug #625025 reported by Luca Invernizzi
This bug affects 2 people
Affects: GTG
Status: Won't Fix
Importance: Undecided
Assigned to: Unassigned
Milestone: none

Bug Description

Several people have been asking for a way to sync GTG that doesn't use external services, for privacy reasons.
CouchDB is a viable solution, but syncing CouchDB without Ubuntu One is not simple.
Localfile is not a solution because it doesn't check whether the file has changed (maybe we should state that).
Moreover, you have to merge localfiles to sync two task sets, which is not a nice thing.

A solution could be a directory backend, where each file is a task. That is better for syncing and for keeping the tasks in a revision control system (like bzr).

Marking this as "love" as it's not difficult to do (it just has to go along the lines of the RTM backend) and needs the developer to understand only the Backend class (not all of GTG). Ask me if you're interested.

Tags: backends love
Changed in gtg:
status: New → Triaged
importance: Undecided → Wishlist
milestone: none → 0.4
Izidor Matušov (izidor) wrote :

Luca> Do you think it is still a good idea to build this plugin? In other words, would there be people using this backend?

What format do you think a task file should be in? XML? JSON? Something else? I suppose that the name of a task file would be the id of the task (plus a suffix like ".xml"). Should all files be in a single directory? Should tasks that belong together, like parents and children, be in the same file, or each in its own file?

Do you have notes on this old bug somewhere?

Changed in gtg:
status: Triaged → Incomplete
milestone: 0.4 → 0.3
Izidor Matušov (izidor)
Changed in gtg:
milestone: 0.3 → 0.3.2
Ted (tedks) wrote :

I'd like to work on this, since I've hit the point where I have serious problems with the flat-XML backend, and this proposal will accommodate syncing using existing tools.

I propose the following format:

backend_directory/ # a directory under gtg's local directory
   projects.xml
   tags.xml
   <project_id>/ # directory for each project
      <task_id>/ # directory for each task
         task.xml # one file per task per directory -- corresponds to the xml element task in backend_localfile
         <subtask_id> # recursive; same as task_id

This structure allows gtg to update the time on task directories it needs to mark for synchronization, prompting programs based on rsync to copy those directories. It also scales to an arbitrary number of tasks, because there's never a reason to read the directory, much like maildir.
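
A minimal sketch of the "update the time on a task directory" step, in Python. The helper names `save_task` and `mark_for_sync` are hypothetical and not part of GTG, and whether a given sync tool actually reacts to a directory timestamp is an assumption that is not verified here.

```python
import os
import time
from pathlib import Path
from typing import Optional


def save_task(root: Path, project_id: str, task_id: str, xml_text: str) -> Path:
    """Write task.xml under root/<project_id>/<task_id>/ and return that directory."""
    task_dir = root / project_id / task_id
    task_dir.mkdir(parents=True, exist_ok=True)
    (task_dir / "task.xml").write_text(xml_text, encoding="utf-8")
    return task_dir


def mark_for_sync(task_dir: Path, when: Optional[float] = None) -> None:
    """Set the directory's access and modification times (defaults to now)."""
    stamp = time.time() if when is None else when
    os.utime(task_dir, (stamp, stamp))
```

Calling `mark_for_sync(save_task(root, "project1", "task1", "<task/>"))` would write the task and then bump the timestamp of its directory.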

Does that sound good? (Feel free to email or IM me about this, my contact info is on launchpad.)

Izidor Matušov (izidor) wrote :

You've made a good point! A few comments:
  * Instead of having directories and subdirectories, I would suggest just having a directory full of XML files. Reasons:
     - one task can have multiple parents
     - the user can change parents/children easily, and it's more convenient to just update an XML file than to move directories
     - common file systems can handle several thousand files in the same directory, so there is no limitation there
  * You would spend more disk space and inodes. The minimum size of a file on ext3 is 4 kB; a regular task file would be smaller. Nowadays disks are cheap, so it shouldn't be a problem.

If you want to work on it now, contact me and I can help you to get started. If not, I am planning to organize an online hackathon dedicated to creating backends, you may want to join.

Changed in gtg:
status: Incomplete → Triaged
Ted (tedks) wrote :

I think for the directory backend to be useful, it has to have a 1:1 mapping of tasks:files. Otherwise, we're just recreating the file backend.

I'm also not sure why or how one task could have multiple parents. I've been using GTD for maybe a year now, and I've never discovered that feature. I've only ever organized tasks in a strict hierarchy. However, continuing the maildir metaphor, it's easy to have one email in multiple folders, so it shouldn't be difficult to have one task in a number of places. In the worst case we can symlink.
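
As a rough illustration of the symlink fallback, a task directory could be exposed under a second parent like this. The function name `add_extra_parent` is hypothetical, and the sketch assumes a file system and platform where unprivileged symlinks are available.

```python
import os
from pathlib import Path


def add_extra_parent(task_dir: Path, parent_dir: Path) -> Path:
    """Make task_dir also appear inside parent_dir via a relative symlink."""
    link = parent_dir / task_dir.name
    # A relative target keeps the link valid if the whole tree is moved or synced.
    target = os.path.relpath(task_dir, start=parent_dir)
    os.symlink(target, link, target_is_directory=True)
    return link
```

The task keeps one real directory, and every additional parent only holds a link to it.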

As for file size, I've hit the scalability limit of backend_localfile at probably 200 tasks. That would be less than a megabyte on disk. I don't think the 4KB limit matters.
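
The arithmetic behind that estimate, as a small sketch. It assumes every task file occupies exactly one 4 kB block and ignores the space taken by directories and inodes.

```python
BLOCK_SIZE = 4 * 1024  # bytes; the 4 kB minimum per file discussed above
TASKS = 200            # roughly where backend_localfile stopped scaling


def disk_usage(tasks, block_size=BLOCK_SIZE):
    """Bytes used if each task file fits in exactly one block."""
    return tasks * block_size


total = disk_usage(TASKS)
print(total, "bytes =", total / 1024, "KiB")  # 819200 bytes = 800.0 KiB
```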

I'd love to start working on it now; I'd be down to participate in the hackathon.

Izidor Matušov (izidor) wrote :

Obviously, I didn't explain my reasons clearly enough. Let's have an example:
a
--> b
----> c
--> d

I suggest that the backend create an FS structure like this:
<dir>/
  a.xml
  b.xml
  c.xml
  d.xml

instead of
<dir>/
  a/
   task.xml
   b/
    task.xml
    c/
     task.xml
   d/
    task.xml

Relationships between tasks would be described in the XML file of a task, not in the FS. Another advantage is that you can reuse code and write the element <task /> into a standalone XML file.
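
A minimal sketch of the flat layout for the a/b/c/d example above, in Python. The `<subtask>` child element and the `id` attribute are assumptions made for illustration, not a confirmed GTG file format, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def write_task_file(directory: Path, task_id: str, children=()) -> Path:
    """Write <directory>/<task_id>.xml holding one standalone <task> element."""
    task = ET.Element("task", {"id": task_id})
    for child_id in children:
        # The relationship lives in the XML, not in the file system.
        ET.SubElement(task, "subtask").text = child_id
    path = directory / f"{task_id}.xml"
    ET.ElementTree(task).write(str(path), encoding="utf-8", xml_declaration=True)
    return path


def read_children(path: Path) -> list:
    """Return the child task ids recorded in a task file."""
    root = ET.parse(str(path)).getroot()
    return [el.text for el in root.findall("subtask")]
```

Reparenting a task then means rewriting one or two small XML files rather than moving directories.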

Ted (tedks) wrote : Re: [Bug 625025] Re: directory backend

On Wed, 2012-02-29 at 20:16 +0000, Izidor Matušov wrote:
> Relationships between tasks would be described in the XML file of a
> task, not in the FS. Another advantage is that you can reuse code and
> write the element <task /> into a standalone XML file.
>

My current biggest single problem in the localfile backend is the
fragility of parent-child relationships. Almost every time I start gtg,
I have to manually rework the hierarchy of tasks. Directories offer a
simple way to make the parent-child relationship far more explicit. We
need to store the subtasks in the task.xml file anyway, so that we never
need to readdir() a potentially large directory.

Frankly, I also don't see why it would be worth it to implement a
directory backend unless the backend reflects the task structure. Having
each task have its own directory also makes it easier for external tools
(like rsync) to do syncing.
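
A rough sketch of how the nested layout could be traversed without ever listing a directory: each task.xml names its subtasks, so the child paths are built directly from those ids. The `<subtask>` element and the helper names are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def make_task(task_dir: Path, subtask_ids=()) -> None:
    """Create <task_dir>/task.xml listing the ids of its subtasks."""
    task_dir.mkdir(parents=True, exist_ok=True)
    task = ET.Element("task", {"id": task_dir.name})
    for sid in subtask_ids:
        ET.SubElement(task, "subtask").text = sid
    ET.ElementTree(task).write(str(task_dir / "task.xml"), encoding="utf-8")


def walk(task_dir: Path):
    """Yield task ids depth-first, opening only known paths (no directory listing)."""
    root = ET.parse(str(task_dir / "task.xml")).getroot()
    yield root.get("id")
    for sub in root.findall("subtask"):
        yield from walk(task_dir / sub.text)
```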

Izidor Matušov (izidor) wrote :

The fragility of relationships is a problem we are working on. Further information on this problem would be helpful: which version of GTG do you use? How often do you lose relationships? What do you do? (e.g. "Every 4th time I start GTG I lose such-and-such relationships.")

Backends are designed to just execute commands from the liblarch library, which stores and manipulates tasks (and is the reason for the relationship problems). Having another backend probably won't solve your problem.

I see potential problems with directories. You can effectively use rsync on a bunch of files as well. Contributing to Open Source should be fun, so choose the way you like and go for it! It's up to you how you implement it.

Ted (tedks) wrote :

It's disappointing that I won't be able to fix the relationship problem with this backend project, but I'd still like to work on it.

My main concern about having lots of files in one directory is that running fstat or readdir on the directory will then take longer. Breaking it up into subdirectories makes sense and is unsurprising.

When is this backend hackfest going to be (if you don't know exactly, a rough idea will do)? What classes/files should I look at (I'm guessing genericbackend.py and similar)? My plan is to work on this in my free time and have it done by the end of March at the latest.

Izidor Matušov (izidor) wrote :

I've just sent an email about the hackathon. It will be in April/May.

You want to reimplement GTG/backends/backend_localfile.py. You will probably be interested in the branch lp:~gtg-user/gtg/sqlite, where I tried to use SQLite instead of an XML file as a backend.

Izidor Matušov (izidor)
Changed in gtg:
milestone: 0.3.2 → 0.4
Izidor Matušov (izidor)
Changed in gtg:
status: Triaged → Won't Fix
milestone: 0.4 → none
importance: Wishlist → Undecided