Monitoring Create/Move/Copy Files events

Bug #602211 reported by Seif Lotfy
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
Unity | Triaged | Low | Unassigned |
Zeitgeist Datahub | Confirmed | Undecided | Unassigned |
Zeitgeist Framework | Fix Released | Wishlist | Seif Lotfy |
unity-2d | Invalid | Low | Unassigned |
unity-lens-files | Triaged | Low | Unassigned |
unity (Ubuntu) | Confirmed | Undecided | Unassigned |
unity-lens-files (Ubuntu) | Triaged | Undecided | Unassigned |

Bug Description

An issue we are facing at the moment is that people lose track of their files in a timeline if the file was moved around or renamed. I would propose using taskview or patching nautilus to actually grab those events and either:
1) Modify the uris in the uris table
2) Create a new table with | new_id | old_uri_id | event | to map uris to their actual ids and the event that allowed the change. This would allow us to track a history of renaming or moving a file. It will look a bit like the following:

9 | 9 | 48124 # CREATE EVENT
12 | 9 | 48126 # MOVE EVENT

In other words, the last row means uri 12 was moved from uri 9 with event 48126.
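
A minimal sketch of option 2, assuming an SQLite table. The table name uri_history and the helper names make_db and history are hypothetical and not part of Zeitgeist; the sketch also assumes each uri id appears at most once as new_id.

```python
import sqlite3


def make_db():
    """Create an in-memory database with the hypothetical uri_history table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE uri_history ("
        " new_id INTEGER NOT NULL,"      # uri id after the change
        " old_uri_id INTEGER NOT NULL,"  # uri id before the change
        " event INTEGER NOT NULL)"       # id of the event that caused it
    )
    return db


def history(db, uri_id):
    """Walk back from uri_id to its origin, newest change first."""
    chain = []
    current = uri_id
    while True:
        row = db.execute(
            "SELECT old_uri_id, event FROM uri_history WHERE new_id = ?",
            (current,),
        ).fetchone()
        if row is None:
            break
        old_id, event = row
        chain.append((current, old_id, event))
        if old_id == current:  # a CREATE row points at itself
            break
        current = old_id
    return chain


db = make_db()
db.executemany(
    "INSERT INTO uri_history VALUES (?, ?, ?)",
    [(9, 9, 48124), (12, 9, 48126)],  # the two rows from the example above
)
print(history(db, 12))
```

Walking the chain this way costs one lookup per rename, which is the price of keeping the full history instead of overwriting the uri.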

UPDATE:

3) Create a changable_uri table that is a map of the uri table. It gets updated upon move and rename.
We then add new result types that allow you to ask for either pureSubject or adaptedSubject. Depending on which one is chosen, we then use the corresponding table in the join of the find_events_query :)

Revision history for this message
Siegfried Gevatter (rainct) wrote :

How would it make sense? We care about events, not files.

Revision history for this message
Seif Lotfy (seif) wrote : Re: [Bug 602211] Re: Monitoring for new files

We will be able to track move/copy/delete :)

On Wed, Jul 7, 2010 at 12:39 AM, Siegfried Gevatter <email address hidden> wrote:
> How would it make sense? We care about events, not files.
>
> --
> Monitoring for new files
> https://bugs.launchpad.net/bugs/602211
> You received this bug notification because you are a direct subscriber
> of the bug.
>
> Status in Zeitgeist Framework: New
>
> Bug description:
> I  was thinking on how tracker monitors new files to index. Turns out they monitor directories using inotify. Recurse XDG dirs and single iterate $HOME
> Does this make sense to us?
>
> To unsubscribe from this bug, go to:
> https://bugs.launchpad.net/zeitgeist/+bug/602211/+subscribe
>

--
This is me doing some advertisement for my blog http://seilo.geekyogre.com

Revision history for this message
Siegfried Gevatter (rainct) wrote :

2010/7/7 Seif Lotfy <email address hidden>:
> we will be able to track move/copy/ delete :)

OK, that'd be nice. My impression is that inotify sucks, though, or
has this changed?

Revision history for this message
Mikkel Kamstrup Erlandsen (kamstrup) wrote : Re: Monitoring for new files

Inotify still sucks, rest assured :-)

A watch on a dir will only tell us when some unspecified thing happens to some unspecified file in that directory. We could install monitors directly on the last N (e.g. 100) actual file inodes perhaps, but I am reluctant to do that - what happens when we aren't running? Do you wanna stat() all the last N files on startup (eeeeek!)?

In any case this could be done entirely in an extension.

As I said before I think FANotify and/or btrfs is our only hope here - or actually using Tracker directly to monitor these events. Tracker is probably the most advanced file monitor out there these days.

Revision history for this message
Seif Lotfy (seif) wrote : Re: [Bug 602211] Re: Monitoring for new files

If we go with Tracker then:
1) we will need to use their uuid stuff
2) new dependency?

Revision history for this message
Mikkel Kamstrup Erlandsen (kamstrup) wrote : Re: Monitoring for new files

Seif: Quoting myself: "In any case this could be done entirely in an extension" - so no new dependency unless we want it in the main source tree.

Can you explain 2) - I can't see why this need be the case...

Revision history for this message
Seif Lotfy (seif) wrote : Re: [Bug 602211] Re: Monitoring for new files

Sorry did not see that :)

Revision history for this message
Seif Lotfy (seif) wrote : Re: Monitoring for new files

AFAIK tracker just monitors the xdg directories recursively and the home directory. Maybe an extension could do that for us.

Revision history for this message
Markus Korn (thekorn) wrote :

I think no zeitgeist product should ever directly watch out for such file changes/creations/etc., BUT why not add a dataprovider to the datahub which watches out for tracker dbus signals of modifications/creations/etc. and converts these signals to zg events?
So if a user has tracker installed we get this type of events for `free`.

Revision history for this message
Markus Korn (thekorn) wrote :

Oh, sorry, hit the send button too early: that's what I'm doing for maemo

Revision history for this message
Seif Lotfy (seif) wrote :

Markus, all Tracker does is use inotify on the xdg directories recursively, as I said. So I agree half way with you here. We can write a dataprovider with the new datahub that monitors those directories and sends us the events from there :) No need to depend on Tracker. And it is an easy hack... What do you think?

Changed in zeitgeist:
importance: Undecided → Wishlist
assignee: nobody → Michal Hruby (mhr3)
status: New → Opinion
status: Opinion → New
Revision history for this message
Mikkel Kamstrup Erlandsen (kamstrup) wrote :

I am still not convinced on this approach... I think file monitoring is exactly what we want to avoid in ZG...

If you insist on doing it, then I still don't know why you want to use inotify directly and not an abstraction layer like gio?

Revision history for this message
Seif Lotfy (seif) wrote : Re: [Bug 602211] Re: Monitoring for new files

If it's going to be in the DataHub or a DataProvider then GIO by all means...
Again, I am not 100% convinced myself,
but I see Tracker doing it and not having many issues. So it won't be in
Zeitgeist directly...

Revision history for this message
Mikkel Kamstrup Erlandsen (kamstrup) wrote : Re: Monitoring for new files

Seif, what makes you believe Tracker doesn't have issues? I can tell you for sure that they poured a LOT of development effort into making file monitoring as light as possible. It won't work well if you don't really take care in how you set all the monitors up.

Revision history for this message
Michal Hruby (mhr3) wrote :

I also don't think this is a good approach, for one the notifications often fire very soon - it might not even be possible to get mimetype of the file (not to mention .part-type files), and I'm not so sure the API enables you to determine that an operation is a move/copy. You'd probably just see a new file somewhere and a while later another file disappearing.

IMO it'd be better to wait for this year's TaskView API to mature, and we'd just write a data provider which will use its DBus service.

Revision history for this message
Seif Lotfy (seif) wrote : Re: [Bug 602211] Re: Monitoring for new files

Yeah, I would say the best solution is to wait for TaskView.

Revision history for this message
Seif Lotfy (seif) wrote : Re: Monitoring for new files

Guys
http://ssickert.wordpress.com/2010/08/19/taskview-status-update/
So it's out and the API seems to stand...
Now:
1) We can create an extension that communicates with taskview to detect copy/paste and some create
2) We use http://github.com/ssickert/nautilus-taskview and modify it for our own good.
I don't want to appear like an asshole, but if taskview is not adopted by gnome properly I intend to use the code from my second solution...

Seif Lotfy (seif)
Changed in zeitgeist:
milestone: none → 0.7
Revision history for this message
Seif Lotfy (seif) wrote :

The more I think about it, the more I see this bug has no place here. I agree with Markus now that the bug should be handled, if at all, by the Zeitgeist Datahub, or by Nautilus or similar sending us the info directly. Thus I am marking it as invalid.

Changed in zeitgeist:
status: New → Invalid
Changed in zeitgeist-datahub:
status: New → Incomplete
Revision history for this message
Manish Sinha (मनीष सिन्हा) (manishsinha) wrote :

Just wanted to know: if this bug is Invalid, why has it been assigned a Milestone? Do you people still want to work on it in the future?

Revision history for this message
Mikkel Kamstrup Erlandsen (kamstrup) wrote :

Marking as Triaged instead of Invalid.

As far as I know; it's not that we don't want to have a solution here, it's just that with the current software platform (from kernel, libs, to Python) it's simply not feasible to do in general.

I will not be ready to accept anything that has even the *slightest* chance of bogging down your system by thrashing the disk. - So if anyone intends to have a stab at this, please do discuss the ideas here before wasting a lot of time :-)

Changed in zeitgeist:
status: Invalid → Triaged
Revision history for this message
Seif Lotfy (seif) wrote :

I changed it to Confirmed. We still have no solution or ideas how we want to solve the issue. We only know how we don't want to solve it.

Changed in zeitgeist:
status: Triaged → Confirmed
Revision history for this message
S. Sickert (s-sickert-deactivatedaccount) wrote :

If you have issues with libtaskview, please contact me. I'm reworking some bits to provide some information, which is essential for zeitgeist:

You can find the improved D-Bus API here:

http://github.com/ssickert/TaskView/blob/master/spec/Generic.xml
http://github.com/ssickert/TaskView/blob/master/spec/IO.xml

Seif Lotfy (seif)
Changed in zeitgeist:
assignee: Michal Hruby (mhr3) → nobody
Changed in zeitgeist-datahub:
assignee: nobody → S. Sickert (s-sickert)
Seif Lotfy (seif)
summary: - Monitoring for new files
+ Monitoring Create/Move/Copy Files events
Seif Lotfy (seif)
description: updated
Revision history for this message
Siegfried Gevatter (rainct) wrote :

1) Modify the uris in the uris table
-1, events are immutable by definition, they aren't supposed to change

Revision history for this message
Seif Lotfy (seif) wrote : Re: [Bug 602211] Re: Monitoring Create/Move/Copy Files events

That is why I proposed solution 2, which in my opinion covers everything :)

Revision history for this message
S. Sickert (s-sickert-deactivatedaccount) wrote :

Reading through all the comments - it seems to me that there are two problems to tackle:

1) Determine which files are interesting for the user - this can be done with the Taskview API

2) Keep track of these files - this must be done with inotify, as there are many programs which don't notify anybody about file changes (e.g. mv, batch-renamer, ...). There is still the problem of "offline" changes (made while we are not running), but I don't see any solution except scanning the whole filesystem, which is also very suboptimal.

Of course there is a bit of redundancy between 1 and 2 (e.g. moving already known files with nautilus), which must be filtered out.

Revision history for this message
Seif Lotfy (seif) wrote :

Nautilus can tell us when a file is being renamed or moved from within
nautilus.

On Sun, Nov 7, 2010 at 7:27 PM, S. Sickert <email address hidden> wrote:

> Reading through all the comments - it seems to me that there are two
> problems to tackle:
>
> 1) Determine which files are interesting for the user - this can be done
> with the Taskview API
>
> 2) Keep track of these files - this must be done with inotify, as there
> are many programs which don't notify anybody about file changes. (e.g
> mv, batch-renamer, ...). There is still the problem of "offline" (we are
> not running) changes, but I don't see any solution except scanning the
> whole filesystems, which is also very suboptimal.
>

We can keep track of files as long as applications tell us if they are
moving them around. Currently only nautilus and the command line can undertake
such activities. But how many normal users use the command line?
There is NO WAY we will use inotify, so the best solution right now is to get
the info:
1) moving and copying over nautilus -> taskview -> zeitgeist
2) renaming should be done over nautilus -> zeitgeist

Revision history for this message
S. Sickert (s-sickert-deactivatedaccount) wrote :

I agree that 95% of users never use the command line, so that's not a big problem.

AFAIK renaming in nautilus is just moving the file.

Revision history for this message
Seif Lotfy (seif) wrote :

awesome :)

Revision history for this message
Mikkel Kamstrup Erlandsen (kamstrup) wrote : Re: [Zeitgeist] [Bug 602211] Re: Monitoring Create/Move/Copy Files events

Keep an LRU cache[1] of inotify watches (or GFileMonitors more likely)
on the last 350 used files. Log some relevant event when they change.

This could be done either from an extension or a DP. The tricky part
is probably making sure that we don't log a lot of dupe entries (I'm
not sure the DataSourceRegistry is enough here).

[1]: Reviving my old impl from the bzr history would probably be
sanest - there are some very bad Python lru cache implementations
floating around the interwebs. Caveat emptor.
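
To illustrate the idea (this is not the old implementation from the bzr history), here is a rough sketch of an LRU cache of watches. The class name WatchCache and the add_watch / remove_watch callbacks are hypothetical; in practice they would wrap whatever creates and cancels an inotify watch or a GFileMonitor.

```python
from collections import OrderedDict


class WatchCache:
    """Keep watches on the most recently used files, evicting the oldest."""

    def __init__(self, add_watch, remove_watch, max_size=350):
        self._add_watch = add_watch        # hypothetical: path -> watch handle
        self._remove_watch = remove_watch  # hypothetical: releases a handle
        self._max_size = max_size
        self._watches = OrderedDict()      # path -> handle, least recent first

    def touch(self, path):
        """Record that path was just used, watching it if necessary."""
        if path in self._watches:
            # Already watched: only refresh its position in the LRU order.
            self._watches.move_to_end(path)
            return
        self._watches[path] = self._add_watch(path)
        if len(self._watches) > self._max_size:
            # Drop the least recently used watch to stay within the limit.
            _, handle = self._watches.popitem(last=False)
            self._remove_watch(handle)

    def watched(self):
        """Return watched paths, least recently used first."""
        return list(self._watches)
```

Bounding the number of watches keeps the cost fixed no matter how many files the user touches; it does nothing for changes made while the daemon is not running.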

Seif Lotfy (seif)
description: updated
Revision history for this message
Seif Lotfy (seif) wrote :

OK I think we can just add 1 column that links to the new uri in the uri table so it becomes

id | uri | new_id

so if I have a uri xxx

1 | xxx | 1

if it's moved or renamed from xxx to yyy we have

1 | xxx | 2
2 | yyy | 2

OR we can do it the other way round by having the new column reference to its old_uri

1 | xxx | 1
2 | yyy | 1

ideas anyone ?

Revision history for this message
Siegfried Gevatter (rainct) wrote : Re: [Bug 602211] Re: Monitoring Create/Move/Copy Files events

2010/11/9 Seif Lotfy <email address hidden>:
> OK I think we can just add 1 column that links to the new uri in the uri
> table so it becomes

Keep in mind that the same URI can be reused for different things,
though. I may rename foo.txt to bar.txt and create a new foo.txt.

Revision history for this message
Seif Lotfy (seif) wrote :

I am aware of that issue and am brainstorming to solve it.
For that we will need to have 2 more columns.
Let's try your example:

event with foo.txt

uri_id | value | new_uri_id | old_uri_id |
---------------------------------------------------------
  1 | foo.txt | -1 | -1

event "renaming/moving" foo.txt to bar.txt

uri_id | value | new_uri_id | old_uri_id |
---------------------------------------------------------
  1 | foo.txt | 2 | -1
  2 | bar.txt | -1 | 1

event "creating" new foo.txt.

uri_id | value | new_uri_id | old_uri_id |
---------------------------------------------------------
  1 | foo.txt | -1 | -1
  2 | bar.txt | -1 | 1

This implies that foo.txt is new, since it has no old uri nor does it have a
new one.
However, bar.txt is linked to foo.txt since the old foo.txt is its origin.
This will allow me to search bar.txt and get foo.txt as a result (which is
not good for future events), but foo.txt is not linked to bar.txt.

We need more time for that issue.

Seif Lotfy (seif)
Changed in zeitgeist:
assignee: nobody → Seif Lotfy (seif)
Changed in zeitgeist-datahub:
status: Incomplete → New
Revision history for this message
S. Sickert (s-sickert-deactivatedaccount) wrote :

New db_layout:

uri_id (p) | lookup_id (f) | new_uri_id | old_uri_id |
---------------------------------------------------------
1 | 1234 | 2 | -1
2 | 1235 | -1 | 1
3 | 1234 | -1 | -1

Lookup Table: id <-> uri:

lookup_id (p) | value
---------------
1234 | foo.txt
1235 | bar.txt

(p) - primary key
(f) - foreign key

This solves the previously mentioned problems, but breaks compatibility with the old db schema.
I don't know how SQLite actually optimises the old schema, but I think a good DBMS should do this already.
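
A rough sketch of how this layout could be queried, assuming SQLite. The table names uri and lookup and the helper current_name are placeholders chosen for the example; only the column names come from the layout above.

```python
import sqlite3


def make_db():
    """Build the example layout: two incarnations of foo.txt and one bar.txt."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE lookup (lookup_id INTEGER PRIMARY KEY, value TEXT);
        CREATE TABLE uri (
            uri_id INTEGER PRIMARY KEY,
            lookup_id INTEGER REFERENCES lookup(lookup_id),
            new_uri_id INTEGER DEFAULT -1,
            old_uri_id INTEGER DEFAULT -1
        );
        INSERT INTO lookup VALUES (1234, 'foo.txt'), (1235, 'bar.txt');
        INSERT INTO uri VALUES
            (1, 1234, 2, -1), (2, 1235, -1, 1), (3, 1234, -1, -1);
    """)
    return db


def current_name(db, uri_id):
    """Follow new_uri_id links to the latest incarnation and return its name."""
    while True:
        new_id, value = db.execute(
            "SELECT u.new_uri_id, l.value FROM uri u"
            " JOIN lookup l ON l.lookup_id = u.lookup_id"
            " WHERE u.uri_id = ?",
            (uri_id,),
        ).fetchone()
        if new_id == -1:  # -1 marks "not moved any further"
            return value
        uri_id = new_id
```

Because uri_id 1 and uri_id 3 are separate rows that share lookup_id 1234, the old foo.txt resolves to bar.txt while the newly created foo.txt stays foo.txt.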

Seif Lotfy (seif)
Changed in zeitgeist-datahub:
status: New → Confirmed
Revision history for this message
Seif Lotfy (seif) wrote :

OK, what would happen if we use our uri table as our lookup table and add the above table as the uri_tracking_table?

Another thought would be to actually have the normal uri_table and a new update_uri_table
where we link an id to its new id

so moving foo to bar will give us
============================

id | value
---------------
1 | foo.txt
2 | bar.txt

id | new_value_id
-----------------------
1 | 2

Now in case we move bar.txt to lol.txt, we just update the DB again by finding all rows with new_value_id = 2, changing them to 3, and adding a new row...

id | value
---------------
1 | foo.txt
2 | bar.txt
3 | lol.txt

id | new_value_id
-----------------------
1 | 3
2 | 3

So if the user asks for raw events we do what we always did.
If the user asks for updated events (updated subject_uris) we will need to join once... It comes at a cost in performance, but I think we can handle it.

We need to actually sprint on that issue.

I marked this as affecting Unity, since when I move a file I don't see it in Unity anymore.

Changed in unity:
status: New → Triaged
importance: Undecided → Low
Changed in unity-place-files:
importance: Undecided → Low
status: New → Triaged
Revision history for this message
Siegfried Gevatter (rainct) wrote :

 > foo.txt gets renamed to nice.txt, then to stuff.txt, there's a new foo.txt,
> stuff.txt gets renamed to omg!!.

renames
---------------
old_uri | new_uri | timestamp
foo.txt omg!! 1234
nice.txt omg!! 2500
stuff.txt omg!! 5000

events with timestamp < 1234 use "foo.txt" and find "omg!!"
in the table, same for the others. the new "foo.txt" has timestamp > 1234
and isn't in renames; it wasn't renamed.

> Finally foo.txt is renamed to why? and
> omg!! gets back to foo.txt

renames
---------------
old_uri | new_uri | timestamp
nice.txt foo.txt 2500
stuff.txt foo.txt 5000
foo.txt omg!! 6000
omg!! foo.txt 7000

an event with timestamp 2200 has nice.txt which is mapped to
foo.txt, events pointing to omg!! older than 7000 are mapped to foo.txt
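
One possible reading of the first renames table, sketched with SQLite. This is an interpretation rather than a confirmed design: it assumes an event's URI maps to the new_uri of the earliest rename of that URI that happened after the event, and it only covers the first scenario. The helper names are hypothetical.

```python
import sqlite3


def make_db():
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE renames (old_uri TEXT, new_uri TEXT, timestamp INTEGER)"
    )
    return db


def record_rename(db, old_uri, new_uri, timestamp):
    """Rows that resolved to old_uri now resolve to new_uri; add the new row."""
    db.execute(
        "UPDATE renames SET new_uri = ? WHERE new_uri = ?", (new_uri, old_uri)
    )
    db.execute(
        "INSERT INTO renames VALUES (?, ?, ?)", (old_uri, new_uri, timestamp)
    )


def resolve(db, uri, event_timestamp):
    """Map an event's uri to its current name, or return it unchanged."""
    row = db.execute(
        "SELECT new_uri FROM renames WHERE old_uri = ? AND timestamp > ?"
        " ORDER BY timestamp LIMIT 1",
        (uri, event_timestamp),
    ).fetchone()
    return row[0] if row else uri
```

Under this reading, an event logged on foo.txt before timestamp 1234 resolves to omg!!, while an event on the new foo.txt (logged after 1234) finds no matching row and stays foo.txt.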

Revision history for this message
Seif Lotfy (seif) wrote :

I don't understand the solution, tbh.
However looking at your table i got an idea

Scenario:
* foo.txt gets renamed to nice.txt
* nice.txt gets renamed to stuff.txt
* create new foo.txt,
* stuff.txt gets renamed to omg!!

uri
=========
id | value
---------------
1 | foo.txt
2 | nice.txt
3 | stuff.txt
4 | omg!!

and

uri_change_map
=============
new_id | old_id | timestamp
----------------------------------
2 | 1 | 2500
3 | 2 | 5000
4 | 3 | 10000

This will allow us to track how files were moved around

Example:
------------
* find events for subject_uri = nice.txt
find all events where (subj_id = 2 and timestamp > 2500) or (subj_id = 1 and timestamp <= 2500)

* find events for subject_uri = omg!!
find all events where (subj_id = 4 and timestamp > 10000) or (subj_id = 3 and timestamp > 5000 and timestamp <= 10000) or (subj_id = 2 and timestamp > 2500 and timestamp <= 5000) or (subj_id = 1 and timestamp <= 2500)
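
A sketch of how such conditions could be derived from uri_change_map by walking the chain backwards, assuming SQLite. The helper id_ranges is hypothetical, and bounding each window on both sides is an assumption made here so that a later reuse of an id would not match older events.

```python
import sqlite3


def make_db():
    """Load the uri and uri_change_map tables from the scenario above."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE uri (id INTEGER PRIMARY KEY, value TEXT);
        CREATE TABLE uri_change_map (
            new_id INTEGER, old_id INTEGER, timestamp INTEGER
        );
        INSERT INTO uri VALUES
            (1, 'foo.txt'), (2, 'nice.txt'), (3, 'stuff.txt'), (4, 'omg!!');
        INSERT INTO uri_change_map VALUES
            (2, 1, 2500), (3, 2, 5000), (4, 3, 10000);
    """)
    return db


def id_ranges(db, uri_id):
    """Return (subj_id, after, until) windows, newest first.

    An event matches a window if its subject id equals subj_id and
    after < timestamp <= until; None leaves that side open.
    """
    ranges = []
    until = None
    while True:
        # Only consider changes older than the window already covered,
        # so the walk always moves back in time and terminates.
        row = db.execute(
            "SELECT old_id, timestamp FROM uri_change_map"
            " WHERE new_id = ? AND (? IS NULL OR timestamp < ?)"
            " ORDER BY timestamp DESC LIMIT 1",
            (uri_id, until, until),
        ).fetchone()
        if row is None:
            ranges.append((uri_id, None, until))
            return ranges
        old_id, ts = row
        ranges.append((uri_id, ts, until))
        uri_id, until = old_id, ts
```

Each window translates directly into one parenthesised term of the WHERE clause, so the query grows by one term per rename in the file's history.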

Revision history for this message
Mikkel Kamstrup Erlandsen (kamstrup) wrote :

All of these clever remapping schemes scare me a bit to be honest.
They all seem to imply a non-negligible impact on our query time, and
a considerable amount of logic.

A simpler and more performant scheme, which requires an API break (!),
is to simply add an extra field on our Subject structure
"current_uri". In the DB we represent this as an extra column on the
'event' table which holds the URI id of the current location of the
subject.

Doing it like this would require that the file monitor lives inside
Zeitgeist - and not as an extension, but as a core component, since it
requires a special db structure. The file monitor would then directly
modify the 'current_uri' column of the 'event' table on file events.
This breaks our "events are immutable" invariant - which i'm otherwise
very fond of... So there are many drawbacks to this approach.

Despite all the drawbacks something inside me tells me that something
like this is the right solution.

I very much dislike adding advanced stuff like a file monitor as a
core component, but we could make up for this, by having it be like a
stub, that didn't do anything by default, but would have to be
instrumented by an extension or something... So it requires a fair
deal of thought architecture-wise...
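
A minimal sketch of the extra column, assuming SQLite. The column name subj_current_uri and all helper names are made up for illustration and are not taken from the Zeitgeist schema.

```python
import sqlite3


def make_db():
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE uri (id INTEGER PRIMARY KEY, value TEXT UNIQUE);
        CREATE TABLE event (
            id INTEGER PRIMARY KEY,
            timestamp INTEGER,
            subj_uri INTEGER REFERENCES uri(id),         -- as logged
            subj_current_uri INTEGER REFERENCES uri(id)  -- follows moves
        );
    """)
    return db


def uri_id(db, value):
    """Return the id for a uri, inserting it on first use."""
    db.execute("INSERT OR IGNORE INTO uri (value) VALUES (?)", (value,))
    return db.execute(
        "SELECT id FROM uri WHERE value = ?", (value,)
    ).fetchone()[0]


def log_event(db, timestamp, uri):
    uid = uri_id(db, uri)
    db.execute(
        "INSERT INTO event (timestamp, subj_uri, subj_current_uri)"
        " VALUES (?, ?, ?)",
        (timestamp, uid, uid),
    )


def handle_move(db, old_uri, new_uri):
    """Point every event currently at old_uri to new_uri instead."""
    db.execute(
        "UPDATE event SET subj_current_uri = ? WHERE subj_current_uri = ?",
        (uri_id(db, new_uri), uri_id(db, old_uri)),
    )


def events_for_current(db, uri):
    """Ids of events whose subject currently lives at uri."""
    return [
        row[0]
        for row in db.execute(
            "SELECT e.id FROM event e JOIN uri u ON u.id = e.subj_current_uri"
            " WHERE u.value = ? ORDER BY e.id",
            (uri,),
        )
    ]
```

The original subj_uri column stays untouched, so raw queries still see the event as it was logged; only the added column is mutable. A file created later under the old name is not confused with the moved one, because the update runs once at move time.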

Revision history for this message
Seif Lotfy (seif) wrote :

TBH I am not fond of the idea of having the file monitor inside Zeitgeist.
Zeitgeist is about events first. Also this means we will need to rely on
inotify and this is a BIG NO from me again. Unless we have fanotify I don't
even think we should look at the solution being inside Zeitgeist TBH.
I however do like the idea of extending the event table as a fallback
solution for now, although it really does break the idea that an event
is immutable. I am trying to convince myself that this is not an issue
though.

So I want to propose a new solution that builds on top of Mikkel's idea.
* change the event table to include current_subject_uri
* add logic that handles MOVE/COPY/RENAME events
* listen to MOVE/COPY/RENAME events from outside sources such as nautilus
(by patching, ssickert has a patch)

What do you think?

Revision history for this message
Seif Lotfy (seif) wrote :

To sum it up again, my new suggestion is:
* change the event table to include current_subject_uri
* add logic that handles MOVE/COPY/RENAME events
* listen to MOVE/COPY/RENAME events from outside sources such as Nautilus
(by patching; ssickert has a patch)
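
A minimal sketch of the first two points, assuming an SQLite-backed event table. The table layout, the column names `subj_uri` and `subj_current_uri`, and the helpers `insert_event` and `handle_move` are hypothetical and only illustrate the idea; they are not the actual Zeitgeist schema or API. Recording the move event with the old URI as its subject is likewise an assumption.

```python
import sqlite3

FOO = "file:///home/seif/foo"
BAR = "file:///home/seif/bar"


def create_schema(conn):
    # Hypothetical, simplified event table: one subject per event.
    conn.execute(
        "CREATE TABLE event ("
        " id INTEGER PRIMARY KEY,"
        " interpretation TEXT,"
        " subj_uri TEXT,"          # URI at the time of the event (snapshot)
        " subj_current_uri TEXT)"  # where the subject lives now
    )


def insert_event(conn, interpretation, uri, current_uri=None):
    # A new event starts out with current URI == original URI.
    if current_uri is None:
        current_uri = uri
    cur = conn.execute(
        "INSERT INTO event (interpretation, subj_uri, subj_current_uri)"
        " VALUES (?, ?, ?)",
        (interpretation, uri, current_uri),
    )
    return cur.lastrowid


def handle_move(conn, old_uri, new_uri):
    # Re-point every event that currently refers to old_uri.
    # subj_uri is left untouched, so the snapshot part stays immutable.
    conn.execute(
        "UPDATE event SET subj_current_uri = ? WHERE subj_current_uri = ?",
        (new_uri, old_uri),
    )
    # Log the move itself (assumption: subject is the old location).
    return insert_event(conn, "MOVE", old_uri, new_uri)


conn = sqlite3.connect(":memory:")
create_schema(conn)
insert_event(conn, "CREATE", FOO)
insert_event(conn, "MODIFY", FOO)
handle_move(conn, FOO, BAR)  # e.g. triggered by a patched file manager

rows = conn.execute(
    "SELECT interpretation, subj_uri, subj_current_uri FROM event ORDER BY id"
).fetchall()
```

After the move, every row still carries its original `subj_uri`, while `subj_current_uri` points at the new location for all of them.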

Revision history for this message
Mikkel Kamstrup Erlandsen (kamstrup) wrote :

While we are at it - I think the solution should also cover queries that do not match files that have been deleted (until now we have been very focused on the case of moved files). So I guess this would simply boil down to subject_uri_current != ""?
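
A rough sketch of that idea, assuming a delete is recorded by blanking the current URI. The table layout, the column names and both helper functions are hypothetical, not actual Zeitgeist code.

```python
import sqlite3

NOTES = "file:///home/seif/notes.txt"
TODO = "file:///home/seif/todo.txt"

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified event table.
conn.execute(
    "CREATE TABLE event ("
    " id INTEGER PRIMARY KEY,"
    " subj_uri TEXT,"
    " subj_current_uri TEXT)"
)
conn.executemany(
    "INSERT INTO event (subj_uri, subj_current_uri) VALUES (?, ?)",
    [(NOTES, NOTES), (NOTES, NOTES), (TODO, TODO)],
)


def handle_delete(conn, uri):
    # Assumption: an empty current URI means "the file no longer exists".
    conn.execute(
        "UPDATE event SET subj_current_uri = '' WHERE subj_current_uri = ?",
        (uri,),
    )


def find_events_for_existing_files(conn):
    # Skip events whose subject has been deleted.
    rows = conn.execute(
        "SELECT id FROM event WHERE subj_current_uri != '' ORDER BY id"
    )
    return [row[0] for row in rows]


handle_delete(conn, NOTES)
remaining = find_events_for_existing_files(conn)
total = conn.execute("SELECT COUNT(*) FROM event").fetchone()[0]
```

The events for the deleted file stay in the log (`total` is unchanged); they are only filtered out of this particular query.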

Revision history for this message
Seif Lotfy (seif) wrote :

I think it is easy for clients to determine which subjects have been deleted by calling exists on the URI. Of course we can detect delete events from Nautilus too and change the "current_uri" accordingly.

But back to the solution for "moved" files. The solution proposed by Mikkel (modified by me) could make things messy.
Once a file is moved we have to query on both subject_current_uri and subject_uri, right?

So let's say I have a file that has been moved from /home/seif/foo to /home/seif/bar. This means that if I query for all events where subject_uri = /home/seif/foo, I get all results up to the point it was moved, right? And if I ask for /home/seif/bar I get all events with
subject_uri = /home/seif/foo as well as all those with subject_uri = /home/seif/bar ... ?

Revision history for this message
Siegfried Gevatter (rainct) wrote :

+1 to Mikkel on using it for deletes too, having two different systems
doesn't make sense.

Also, I'd keep uri and current_uri as separate things (current_uri
being a new parameter in event, that's why it's a nice array...), else
this is going to be a real mess.

Revision history for this message
Michal Hruby (mhr3) wrote :

I still don't understand why current_uri would be part of the event; isn't it by definition a property of the subject? That way you don't need to update dozens of events, just one subject entry...

Revision history for this message
Seif Lotfy (seif) wrote :

I completely agree with having current_uri as a new parameter; however, once you
use it you will need the logic I described before:

So let's say I have a file that has been moved from /home/seif/foo to
/home/seif/bar. This means that if I query for all events where subject_uri =
/home/seif/foo, I get all results up to the point it was moved, right? And
if I ask for /home/seif/bar I get all events with subject_uri =
/home/seif/foo as well as all those with subject_uri = /home/seif/bar ... ?

I don't mind using it for deletes. As I said, it's a Nautilus patch.


Revision history for this message
Siegfried Gevatter (rainct) wrote :

2010/11/15 Michal Hruby <email address hidden>:
> I still didn't understand why would the current_uri be part of event,
> isn't it by definition a property of the subject? That way you don't
> need to update dozens of events, just one subject entry...

A subject is part of an event, there is no global "subject entry"
since the subject represents a snapshot of an object at a particular
time instant (with some properties -eg. mimetype- which it has at that
instant).


Revision history for this message
Mikkel Kamstrup Erlandsen (kamstrup) wrote :

On 15 November 2010 21:08, Siegfried Gevatter <email address hidden> wrote:
> 2010/11/15 Michal Hruby <email address hidden>:
>> I still didn't understand why would the current_uri be part of event,
>> isn't it by definition a property of the subject? That way you don't
>> need to update dozens of events, just one subject entry...
>
> A subject is part of an event, there is no global "subject entry"
> since the subject represents a snapshot of an object at a particular
> time instant (with some properties -eg. mimetype- which it has at that
> instant).

I understand your confusion Michal. The deal is that while subjects
are conceptually disjoint from the event they are still stored
together with the event in the event table (as an optimization). Also
as Siegfried says - the event subject is a snapshot (like a normal log
statement) so it makes even more sense to store it together with the
event as we do.

Revision history for this message
Mikkel Kamstrup Erlandsen (kamstrup) wrote :

On 15 November 2010 21:07, Seif Lotfy <email address hidden> wrote:
> SNIP
> So lets say I have a file that has been moved from /home/seif/foo to
> /home/seif/bar. This means I query for all events where subject_uri =
> /home/seif/foo so I get all results until the point it was moved, right? And
> if I ask for /home/seif/bar I get all events with subject_uri =
> /home/seif/foo as well as all subject_uri = /home/seif/bar ... ?

We still do strict template matching. If you query for events with
subject_uri=bar then you won't get any events with subject_uri=foo,
regardless of the subject_current_uri.

OTOH, if you query for events with subject_current_uri=bar you will get
all events for subject_uri=foo too, because these events will have been
set to subject_current_uri=bar. Note that we are *not* being clever
about matching on both the subject_uri and subject_current_uri fields.
It's still strict matching on the subject_current_uri field that gives
you these results.
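
To make the two cases concrete, here is a small sketch using an in-memory SQLite table. The table layout, the column names `subj_uri` and `subj_current_uri`, and the `find_events` helper are hypothetical stand-ins for the real template matching; the data models a file that was used twice as foo, moved to bar, and then used once more.

```python
import sqlite3

FOO = "file:///home/seif/foo"
BAR = "file:///home/seif/bar"

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE event ("
    " id INTEGER PRIMARY KEY,"
    " subj_uri TEXT,"
    " subj_current_uri TEXT)"
)
conn.executemany(
    "INSERT INTO event (id, subj_uri, subj_current_uri) VALUES (?, ?, ?)",
    [
        (1, FOO, BAR),  # logged before the move, current URI updated since
        (2, FOO, BAR),  # logged before the move, current URI updated since
        (3, BAR, BAR),  # logged after the move
    ],
)

ALLOWED_FIELDS = ("subj_uri", "subj_current_uri")


def find_events(conn, **template):
    """Strict matching: every field given in the template must be equal."""
    for field in template:
        if field not in ALLOWED_FIELDS:
            raise ValueError("unknown template field: %s" % field)
    where = " AND ".join("%s = ?" % field for field in template) or "1"
    sql = "SELECT id FROM event WHERE %s ORDER BY id" % where
    return [row[0] for row in conn.execute(sql, tuple(template.values()))]


by_uri = find_events(conn, subj_uri=BAR)              # [3]
by_current = find_events(conn, subj_current_uri=BAR)  # [1, 2, 3]
by_old_uri = find_events(conn, subj_uri=FOO)          # [1, 2]
```

Neither query looks at the other column; the full history shows up under `subj_current_uri` only because the move already rewrote that column for the older events.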

Revision history for this message
Seif Lotfy (seif) wrote :

Agreed, I think I am overthinking it a bit. I am pretty OK with this solution,
but we shouldn't dive into it unless all of us are pretty comfortable with it.
What will break is also an issue for me..


Seif Lotfy (seif)
Changed in zeitgeist:
milestone: 0.7.0 → none
Seif Lotfy (seif)
Changed in zeitgeist:
milestone: none → 0.8.0
Changed in unity-place-files (Ubuntu):
status: New → Triaged
Revision history for this message
Seif Lotfy (seif) wrote :

So I added logic to handle MOVE_EVENTs here:
https://code.launchpad.net/~zeitgeist/zeitgeist/move-event/+merge/53132
Please review.

Changed in zeitgeist:
status: Confirmed → In Progress
Seif Lotfy (seif)
Changed in zeitgeist:
status: In Progress → Fix Committed
Changed in zeitgeist:
status: Fix Committed → Fix Released
Changed in unity-2d:
status: New → Triaged
Changed in unity-2d:
importance: Undecided → Low
Omer Akram (om26er)
Changed in unity-place-files (Ubuntu):
importance: Undecided → Low
Changed in unity-lens-files (Ubuntu):
status: New → Triaged
Omer Akram (om26er)
no longer affects: unity-place-files (Ubuntu)
Changed in unity (Ubuntu):
status: New → Confirmed
Changed in unity-2d:
status: Triaged → Invalid