couchdb plugin to share the same notebook on multiple computers

Reported by Raphaël Hertzog on 2010-01-11
This bug affects 5 people
Affects: Zim
Importance: Wishlist
Assigned to: Unassigned

Bug Description

I would like to see a couchdb based plugin that would store the notebook in a couchdb database. That way it would be possible to edit the same notebook on multiple computers and have them synced automatically (i.e. without having to manually sync through a VCS repository like bzr).

Fortunately there's a python-couchdb binding (you might want to consider python-desktopcouch too, Zim could then be Ubuntu-one powered).

Especially the use of python-desktopcouch looks very promising. Seems to me this means we can actually re-use UbuntuOne interface made for TomBoy :)

Just have to figure out how they handle the data source exactly. Is the couchdb interface syncing notes that are also stored locally? Or are they using couchdb as the primary storage? I would prefer only using it for syncing, but need to figure out how to merge changes correctly.

On Mon, 11 Jan 2010, Jaap Karssenberg wrote:
> Especially the use of python-desktopcouch looks very promising. Seems to
> me this means we can actually re-use UbuntuOne interface made for TomBoy
> :)
>
> Just have to figure out how they handle the data source exactly. Is the
> couchdb interface syncing notes that are also stored locally? Or are
> they using couchdb as the primary storage? I would prefer only using
> it for syncing, but need to figure out how to merge changes correctly.

Given that python-desktopcouch is an interface to a local couchdb
instance, I believe that couchdb is supposed to be the primary storage.

In tomboy, it's likely that it is not and it only syncs from/to there
given that it has been added afterwards.
I don't have any strong opinion but if couchdb is the primary storage,
it's important to be able to export it as a regular notebook directory.

I don't think you will be able to sync a tomboy couchdb database in Zim
and vice-versa if that's what you meant by "re-use UbuntuOne interface
made for TomBoy".

Cheers,
--
Raphaël Hertzog

Changed in zim:
status: New → Incomplete
importance: Undecided → Wishlist

For a moment I thought that putting "note" records into couchdb would automatically hook into the web interface for editing notes in Ubuntu One, but it seems that tomboy actually connects with Ubuntu One using their own REST interface. So I'm not sure; I need to investigate.

More generally, we should probably look into a generic sync function (one that can also sync to other directories). Then using couchdb can build on that function. This is much like what TomBoy seems to do.

* Keep "last synced version" property, either sequential id or timestamp
* If there is a conflict because a page was modified in both places just move the oldest out of the way to a timestamped backup file - maybe prompt the user first.
* For bonus points the version control could be used to attempt merging changes in a sane way - but adds a lot of complexity to get it right.
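To make the list above a bit more concrete, here is a rough sketch of the decision logic. It is not actual zim code: `PageState`, `plan_sync`, `backup_name` and `resolve_conflict` are made-up names, and it assumes the "last synced version" is kept as a timestamp rather than a sequential id.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class PageState:
    """Hypothetical per-page sync record; not part of the zim API."""
    name: str
    mtime: float  # last modification time, seconds since the epoch


def plan_sync(local, remote, last_synced):
    """Decide what to do with one page: 'noop', 'push', 'pull' or 'conflict'."""
    local_changed = local.mtime > last_synced
    remote_changed = remote.mtime > last_synced
    if local_changed and remote_changed:
        return "conflict"  # modified in both places since the last sync
    if local_changed:
        return "push"
    if remote_changed:
        return "pull"
    return "noop"


def backup_name(page, when):
    """Name of the timestamped backup file for a copy that is moved aside."""
    stamp = datetime.fromtimestamp(when, tz=timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{page.name}.conflict-{stamp}"


def resolve_conflict(local, remote):
    """Keep the newest copy; return (winner, backup name for the older copy)."""
    if local.mtime >= remote.mtime:
        winner, loser = local, remote
    else:
        winner, loser = remote, local
    return winner, backup_name(loser, loser.mtime)
```

Prompting the user first would simply mean asking before `resolve_conflict` is applied. Comparing timestamps across machines only works if the clocks roughly agree, which is one argument for the sequential id variant.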

If we can do this between directories we can mount remote filesystems (or have gio do it for us). Same framework would also work if you use couchdb records instead of files to sync to. Best way to implement this is to build the syncing on top of the notebook API, so implement syncing on the level of syncing between notebook objects. Then you write a storage backend for the couchdb API. So you could either natively run on couchdb, or (as I would prefer), sync between a couchdb driven notebook and a local copy.

In summary
* Implement basic syncing algorithm for Notebook objects
* Implement store subclass for couchdb
* Add GUI for syncing

P.S. of course you can already sync with ubuntu one using the shared directory...

Raphaël Hertzog (hertzog) wrote :

On Tue, 12 Jan 2010, Jaap Karssenberg wrote:
> For a moment I thought that putting "note" records into couchdb would
> automatically hook into the web interface for editing notes in Ubuntu
> One, but it seems that tomboy actually connects with Ubuntu One using
> their own REST interface. So I'm not sure; I need to investigate.

AFAIK that's right, but if you want it to work properly you have to follow
that structure:
http://www.freedesktop.org/wiki/Specifications/desktopcouch/note

Which means that we need to store HTML and not wiki markup, and IMO that's
not what we want.

OTOH, we could write an HTML output and store the wiki markup in a zim
specific key. It would be difficult to merge back changes made in the HTML
though.

> More generally, we should probably look into a generic sync function
> (one that can also sync to other directories). Then using couchdb can build on
> that function. This is much like what TomBoy seems to do.

I don't see what we gain from such a feature, I rarely need to manually
sync between two Zim directories. And if I really need that, I'm prepared
for it and use a version control system.

> * Keep "last synced version" property, either sequential id or timestamp
> * If there is a conflict because a page was modified in both places just move the oldest out of the way to a timestamped backup file - maybe prompt the user first.

In theory couchdb does this for us precisely.

> * For bonus points the version control could be used to attempt merging changes in a sane way - but adds a lot of complexity to get it right.

That doesn't work well if both directories do not share a common ancestry.

> P.S. of course you can already sync with ubuntu one using the shared
> directory...

Indeed, but I'm using neither Ubuntu nor UbuntuOne, and since I prefer
keeping data on my own servers, I have set up my own couchdb server. The
shared directory is something specific to ubuntu/canonical as far as I
understood.

Cheers,
--
Raphaël Hertzog

On Tue, Jan 12, 2010 at 7:13 PM, Raphael Hertzog <email address hidden> wrote:
>> * Keep "last synced version" property, either sequential id or timestamp
>> * If there is a conflict because a page was modified in both places just move the oldest out of the way to a timestamped backup file - maybe prompt the user first.
>
> In theory couchdb does this for us precisely.

I guess this works when you use couchdb as the native storage. But I
guess you will have to check client side for remote changes and deal
with them if you still have uncommitted changes yourself. Or am I
missing a couchdb feature?

On Tue, 12 Jan 2010, Jaap Karssenberg wrote:
> On Tue, Jan 12, 2010 at 7:13 PM, Raphael Hertzog <email address hidden> wrote:
> >> * Keep "last synced version" property, either sequential id or timestamp
> >> * If there is a conflict because a page was modified in both places just move the oldest out of the way to a timestamped backup file - maybe prompt the user first.
> >
> > In theory couchdb does this for us precisely.
>
> I guess this works when you use couchdb as the native storage. But I
> guess you will have to check client side for remote changes and deal
> with them if you still have uncommitted changes yourself. Or am I
> missing a couchdb feature?

http://books.couchdb.org/relax/reference/conflict-management

You deal with conflicts after they have happened, not in real time.
The DB stores all the conflicting versions and, from time to time, you have
to go over the known conflicts and resolve them. They call that "Eventual
Consistency".
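As an illustration of that "resolve later" step: when a document is fetched with the `conflicts=true` query parameter, couchdb adds a `_conflicts` list holding the losing revisions. The sketch below only inspects documents that were already fetched that way; the helper names are made up and the sample documents in the usage note are invented.

```python
def conflicting_revisions(doc):
    """Return (winning_rev, losing_revs) for a doc fetched with ?conflicts=true."""
    # _conflicts is absent when the document has no unresolved conflicts.
    return doc["_rev"], list(doc.get("_conflicts", []))


def needs_resolution(docs):
    """Return the ids of all documents that still carry unresolved conflicts."""
    return [doc["_id"] for doc in docs if doc.get("_conflicts")]
```

For example, given `{"_id": "Home", "_rev": "3-aaa", "_conflicts": ["3-bbb"]}`, the first helper returns `("3-aaa", ["3-bbb"])`. A client would then show both revisions to the user and delete the one that is not kept.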

Cheers,
--
Raphaël Hertzog

Ok, now I get it. Pre-requisite for this is that we always have a local couchdb to store updates immediately. If I understand correctly this is the idea behind desktop-couchdb. So using couchdb as the primary storage will work best then.

So the shopping list becomes:
1) store class to interface with couchdb
    a) get / set pages
    b) get / set attachments
        - for interface with zim need to fake a directory with attachments, or use a tmp directory to access them
2) user interface to configure zim to use couchdb
3) user interface to resolve conflicts
    - show some warning for pages that contain conflicts
    - dialog to compare conflicting pages and resolve manually
    - some way to get a list of conflicts (popup when opening a notebook with conflicts)
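Item 1 of the list could look roughly like the sketch below. This is only a stand-in that keeps everything in dicts instead of talking to couchdb, and the class and method names (`DictStore`, `get_page`, `export_attachments`, ...) are assumptions for illustration, not the real zim store interface.

```python
import os
import tempfile


class DictStore:
    """Hypothetical store keeping pages and attachments in memory.

    A couchdb-backed store would expose the same operations but read and
    write documents in the database instead of these dicts.
    """

    def __init__(self):
        self._pages = {}        # page name -> wiki text
        self._attachments = {}  # (page name, file name) -> bytes

    def get_page(self, name):
        return self._pages.get(name)

    def set_page(self, name, text):
        self._pages[name] = text

    def get_attachment(self, page, filename):
        return self._attachments[(page, filename)]

    def set_attachment(self, page, filename, data):
        self._attachments[(page, filename)] = data

    def export_attachments(self, page, root=None):
        """Write the attachments of one page to a tmp directory; return its path."""
        folder = tempfile.mkdtemp(prefix="zim-attachments-", dir=root)
        for (owner, filename), data in self._attachments.items():
            if owner == page:
                with open(os.path.join(folder, filename), "wb") as fh:
                    fh.write(data)
        return folder
```

`export_attachments` covers the "use a tmp directory to access them" option: the rest of zim gets a real directory to work with, while the store remains the source of truth.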

Raphaël Hertzog (hertzog) wrote :

On Wed, 13 Jan 2010, Jaap Karssenberg wrote:
> Ok, now I get it. Pre-requisite for this is that we always have a local
> couchdb to store updates immediately. If I understand correctly this is
> the idea behind desktop-couchdb.

Right.

> So the shopping list becomes:

Looks OK from here.

> 1) store class to interface with couchdb
> a) get / set pages
> b) get / set attachments
> - for interface with zim need to fake a directory with attachments, or use a tmp directory to access them

I think a tmp directory in .cache/zim/<notebook name>/ is ok for this
purpose.

Potential problem: I'm not quite sure how well binary blobs are supported in the
python-couchdb/python-desktop-couch interface.

> 2) user interface to configure zim to use couchdb
> 3) user interface to resolve conflicts
> - show some warning for pages that contain conflicts
> - dialog to compare conflicting pages and resolve manually
> - some way to get a list of conflicts (popup when opening a notebook with conflicts)

Maybe some integration with the index view to attract the attention of the
user on conflicting pages.

Cheers,
--
Raphaël Hertzog

On Wed, Jan 13, 2010 at 2:05 PM, Raphael Hertzog <email address hidden> wrote:
> Potential problem: I'm not quite sure how well binary blobs are supported in the
> python-couchdb/python-desktop-couch interface.

If binary is a problem we can always wrap non-text data in base64
encoding. Adds a little bit of overhead, but I assume default usage
for attachments is relatively small files.
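A minimal sketch of that wrapping, using only the standard library (the two function names are made up for this example):

```python
import base64


def wrap_attachment(data):
    """Encode raw bytes as an ASCII string that fits in a JSON document."""
    return base64.b64encode(data).decode("ascii")


def unwrap_attachment(text):
    """Decode a string produced by wrap_attachment back into bytes."""
    return base64.b64decode(text.encode("ascii"))
```

The overhead is about one third: every 3 bytes of input become 4 characters of output, so a 300 byte attachment is stored as a 400 character string.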

> Maybe some integration with the index view to attract the attention of the
> user on conflicting pages.

I was thinking to add a bar to the top of the window with an alert and
a button to open a dialog showing all conflicts. E.g. gedit uses a
similar interface element to alert for encoding issues in a file, and
nautilus uses such a message bar for special directories like Trash.

So work flow would be
1) bar appears warning that there are conflicts
2) button in the bar shows a dialog with a page / file list
3) clicking on a page shows the page in the main window and the
conflicting version in a separate window
    - the separate window is read-only and has a button "discard" or "resolved"
    - the separate window has a list with conflicting versions to go
through (if multiple versions are conflicting)
    - if feasible highlight changes in the conflicting version
    - user should update latest version to include changes - or not -
and then discard the conflicting version
4) clicking on an attachment ...

How to resolve conflicts in attachments?
Probably should open a (tmp) directory with all conflicting versions in
a file manager to allow arbitrary editing.
But how to select which one to keep after editing?
Or a simplified dialog with some file manager functions.

--
P.S. I'm also drafting some thoughts on syncing in a more general
context. I think the same conflict resolution mechanism we are
discussing here could also be used to resolve conflicts from version
control. Allowing better support for merging remote versions in the
version control plugin.

I considered doing file merging as well - but as pointed out above, it
makes more sense to use version control for this in the first place.
(Let the VCS try to merge changes - if it fails allow manual conflict
resolution.) Also there is a "custom tool" feature in the works that
would allow people to specify e.g. their own rsync command - no need
to have specific plugins for that.

On Thu, 14 Jan 2010, Jaap Karssenberg wrote:
> If binary is a problem we can always wrap non-text data in base64
> encoding. Adds a little bit of overhead, but I assume default usage
> for attachments is relatively small files.

Right.

> I was thinking to add a bar to the top of the window with an alert and
> a button to open a dialog showing all conflicts. E.g. gedit uses a
> similar interface element to alert for encoding issues in a file, and
> nautilus uses such a message bar for special directories like Trash.

Would be fine too.

> How to resolve conflicts in attachments?
> Probably should open a (tmp) directory with all conflicting versions in
> a file manager to allow arbitrary editing.
> But how to select which one to keep after editing?
> Or a simplified dialog with some file manager functions.

I think just allowing the user to pick one is ok. And he should be able to
save the other variants in case he wants to do some manual merge later on.

> P.S. I'm also drafting some thoughts on syncing in a more general
> context. I think the same conflict resolution mechanism we are
> discussing here could also be used to resolve conflicts from version
> control. Allowing better support for merging remote versions in the
> version control plugin.

Indeed, all good VCS allow you to retrieve 3 variants of the conflicting
file (common ancestor, variant A, variant B). With couchdb we might not
have the common ancestor though.

Quick question: if I write a new store class, how can I manually test it?
Is there a config entry that indicates which class is used for a given
notebook?

Cheers,
--
Raphaël Hertzog

On Thu, Jan 14, 2010 at 7:38 PM, Raphael Hertzog <email address hidden> wrote:
> Quick question: if I write a new store class, how can I manually test it?
> Is there a config entry that indicates which class is used for a given
> notebook?

Afraid that is not yet implemented. In Notebook.__init__ you see one line

    self.add_store(Path(':'), 'files') # set root

If you change 'files' to the name of your store module you can test it.

Adding the config setting is not much more work, but I can do that as well.

Regards,

Jaap

Changed in zim:
status: Incomplete → Confirmed
tags: added: couchdb syncing

Digging this thread from the past.

I am currently implementing a couchdb store class. It's the first time I use both bzr and launchpad, so I haven't had time to set up my account and my branch. I will do it ASAP =]

I have a remark about document versions: one shouldn't use couchdb's _rev to store revisions of one's documents.

_rev effectively stores revisions, but it is relevant only to couchdb, which uses it to resolve conflicts upon replication with other couchdbs. When you compact your database, you lose all the _revs that are not part of a conflict, so you can't rely on them to store document versions.

My idea is to store one document per version of a page. Each of these documents would be identified by a couchdb-generated id that ensures uniqueness among any of your databases. It would also contain the version number.

By using one document per version, we can share all of the versions in the databases, and detect conflicts easily.
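A small sketch of how that could look. The field names (`page`, `version`, `text`) are invented here since the doc format has not been documented yet, and `uuid.uuid4()` stands in for the couchdb-generated id. A conflict then shows up as two documents claiming the same version of the same page.

```python
import uuid
from collections import defaultdict


def new_version_doc(page, version, text):
    """Build one document for one version of a page (hypothetical layout)."""
    # uuid4 is a local stand-in for an id generated by couchdb.
    return {"_id": uuid.uuid4().hex, "page": page, "version": version, "text": text}


def find_conflicts(docs):
    """Return {(page, version): [doc ids]} for versions that exist more than once."""
    seen = defaultdict(list)
    for doc in docs:
        seen[(doc["page"], doc["version"])].append(doc["_id"])
    return {key: ids for key, ids in seen.items() if len(ids) > 1}
```

If the laptop and the desktop both create version 2 of "Home" while offline, replication brings both documents into every database, and `find_conflicts` reports `("Home", 2)` with the two ids.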

Also, I understand that everyone would like to see this implemented as a plugin rather than as a core storage method. I am also a proponent of "plain text if possible". But storing pages in couchDB has the advantage of not dealing with the system's means to detect changes; all the changes are already in the db, and can be livestreamed with a friend for realtime collaboration.

Anyway, thank you for Zim, it is very good software that I am happy to use!

On Sat, Apr 28, 2012 at 11:18 PM, Matthieu Rakotojaona
<email address hidden> wrote:
> My idea is to store one document per version of a page. Each of these
> documents would be identified by a couchdb-generated id that ensures
> uniqueness among any of your databases. It would also contain the version
> number.
>
> By using one document per version, we can share all of the versions in
> the databases, and detect conflicts easily.

From my point of view storing old versions in the database as well
should be optional. Some people may want to use it, but it will be
much less efficient than storing versions in a dedicated version
control system like bazaar, git, etc. The main purpose of couchdb is
syncing.

> Also, I understand that everyone would like to see this implemented as a
> plugin rather than as a core storage method. I am also a proponent of "plain
> text if possible". But storing pages in couchDB has the advantage of not
> dealing with the system's means to detect changes; all the changes are
> already in the db, and can be livestreamed with a friend for realtime
> collaboration.

Maybe first create a version that only uses couchdb, then we can add a
"mirroring" version on top of it which stores data both in the db and
syncs to the filesystem.

Btw. do you also intend to build conflict resolution into zim? If so
please have a look at the open bounty for that topic and check the
wiki page with details.

-- Jaap

> From my point of view storing old versions in the database as well
> should be optional. Some people may want to use it, but it will be
> much less efficient than storing versions in a dedicated version
> control system like bazaar, git, etc. The main purpose of couchdb is
> syncing.

Very true. Yet the CouchDB doc format I plan to use (I'm going to have to document it) is made for replication, and extending it to manage versions wouldn't be very hard. But that is still another matter.

> Btw. do you also intend to build conflict resolution into zim? If so
> please have a look at the open bounty for that topic and check the
> wiki page with details.

Conflict resolution is a very interesting problem. I'd love to tackle it, but I'm afraid I do not have that much time. But I'll share anything I can think about =]

On Thu, May 3, 2012 at 10:24 PM, Matthieu Rakotojaona
<email address hidden> wrote:
>> Btw. do you also intend to build conflict resolution into zim? If so
>> please have a look at the open bounty for that topic and check the
>> wiki page with details.
>
> Conflict resolution is a very interesting problem. I'd love to tackle
> it, but I'm afraid I do not have that much time. But I'll share anything
> I can think about =]

If I understand correctly couchdb already has a mechanism to deal with
conflicting versions. It just flags one version as a conflict but
stores it anyway. What would be needed in zim is an interface to make
the user aware of the existence of a conflicting version and allow the
user to inspect it and decide how to resolve.

See http://www.zim-wiki.org/wiki/doku.php?id=resolving_syncing_conflicts
for notes.

Regards,

Jaap

David Austin (lpdf) wrote :

> It just flags one version as a conflict but
> stores it anyway.

According to ISO 9000 rules, the best way to do this (from a document control standpoint) is to check before doing any edits whether anyone is currently editing the page (in which case it is in read-only mode), and if so to give you the option to notify them that you'd like to make an edit (ideally zim sends the notification automatically from within the editor interface). When the current editor confirms they are done, you get a notification that the wiki page is editable.

In fact, I think this would be much easier to implement, and it would eliminate the need to leave it up to someone to decide what to do with the edits in the queue (as if one person had a greater right to that determination than another).

Is this method an inconvenience? I'd say it's LESS of an inconvenience than ending up with incompatible simultaneous edits that must be resolved before additional edits can be made.

It's also my preference for how to do it (admittedly, I know this means nothing to anyone else, but I thought I'd mention it).

I use Zim on 3 PCs and sync the notebooks folder via Ubuntu One. It has worked without problems so far. I haven't tested simultaneous editing by 2 users, though.
