What value should subject.origin have?

Bug #487321 reported by Siegfried Gevatter on 2009-11-23
This bug affects 1 person
Affects: Zeitgeist Framework
Status: Fix Released
Importance: High
Assigned to: Mikkel Kamstrup Erlandsen

Bug Description

>> + origin = info.get_uri().rpartition("/")[0]

WHY?
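For reference, here is what the quoted expression yields on a few sample URIs. The URIs are illustrative, and a plain string stands in for `info.get_uri()`. `str.rpartition("/")` keeps everything before the last slash, so the result never has a trailing slash, and a slash inside a query string is treated like a path separator:

```python
def naive_origin(uri: str) -> str:
    # The expression from the patch, applied to a plain string
    # instead of info.get_uri().
    return uri.rpartition("/")[0]


samples = [
    "file:///path/to/file",
    "http://youtube.com/aflkasflkasjf",
    "http://example.com/files/kamstrup/my.txt?encoding=utf-8&token=hd8763hf82",
    "http://example.com/search?q=a/b",
]

for uri in samples:
    print(f"{uri} -> {naive_origin(uri)}")
```

The last sample ends up as `http://example.com/search?q=a`, which is neither a folder nor a site.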


Changed in zeitgeist:
milestone: none → 0.3.0

WHY NOT?

REALLY GUYS? HERE? LOL

2009/11/24 Mikkel Kamstrup Erlandsen <email address hidden>

> WHY NOT?
>
> --
> Origin
> https://bugs.launchpad.net/bugs/487321
> You received this bug notification because you are subscribed to The
> Zeitgeist Project.
>
> Status in Zeitgeist Engine: New
>
> Bug description:
> >> + origin = info.get_uri().rpartition("/")[0]
>
> WHY?
>
>

This bug is mostly about exactly what we want to store in origin, and how to interpret it. I am deferring it to 0.3.1 because it's unrealistic that we get full clarity on this before the weekend.

Changed in zeitgeist:
milestone: 0.3.0 → 0.3.1
summary: - Origin
+ What value should subject.origin have?
Changed in zeitgeist:
assignee: nobody → Mikkel Kamstrup Erlandsen (kamstrup)
importance: Undecided → High
status: New → Triaged
Seif Lotfy (seif) wrote :

IMHO we should stick to using the origin as the domain of the subject:

/path/to/file ==> /path/to
youtube.com/aflkasflkasjf ==> aflkasflkasjf

This would let us find the most-used documents in a folder or the most-watched videos on youtube.com.
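A minimal sketch of the "most-used documents in a folder" query, assuming origin holds the containing folder or site. The `(subject_uri, origin)` pairs below are made-up stand-ins for logged events, not output from Zeitgeist, and `most_used_in` is a hypothetical helper:

```python
from collections import Counter

# Hypothetical (subject_uri, origin) pairs standing in for logged events.
events = [
    ("file:///home/user/notes/a.txt", "file:///home/user/notes"),
    ("file:///home/user/notes/b.txt", "file:///home/user/notes"),
    ("file:///home/user/notes/a.txt", "file:///home/user/notes"),
    ("http://youtube.com/watch1", "http://youtube.com"),
    ("http://youtube.com/watch2", "http://youtube.com"),
    ("http://youtube.com/watch1", "http://youtube.com"),
]


def most_used_in(origin, events, n=3):
    """Rank subjects by how often they were used under one origin."""
    counts = Counter(uri for uri, orig in events if orig == origin)
    return counts.most_common(n)


print(most_used_in("file:///home/user/notes", events))
print(most_used_in("http://youtube.com", events))
```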

Seif Lotfy (seif) wrote :

OK, let's keep origin as the physical location at which the subject resided. This can be closed if we all agree.

I think "Physical location at which the subject resided at the moment of interaction" is a good formulation. It is still incomplete, though: we need to define whether or not trailing slashes should be included, and details like that. I'd like to keep this bug open until all of this is thoroughly documented in our API docs.
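To see why trailing slashes need a rule: if one data provider stores `file:///home/user` and another stores `file:///home/user/`, a plain string comparison sees two different origins. One possible convention, assumed here purely for illustration, is to canonicalize the path of every origin to end in exactly one slash before storing or comparing it (`canonical_origin` is a hypothetical helper):

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_origin(origin: str) -> str:
    """Hypothetical convention: the path of an origin always ends in exactly one '/'."""
    parts = urlsplit(origin)
    path = parts.path.rstrip("/") + "/"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


print(canonical_origin("file:///home/user"))
print(canonical_origin("file:///home/user/"))
print(canonical_origin("http://youtube.com"))
```

Always stripping the slash instead would work just as well; what matters is that every data provider applies the same rule.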

Siegfried, I think you need to elaborate a bit in order to make that comment useful...

"Domain" seems to imply that it is the place where the subject is currently located. It also strikes me as a very vague and possibly ambiguous word here. Fx. the "domain" of an RDF property is the range in which it can take values (fx. the domian of the relation foo:hasTag is foo:TagObject).

Seif in comment #4 you mean "http://youtube.com/aflkasflkasjf ==> http://youtube.com" and not "...==> aflkasflkasjf", right?

The example is a bit too simple though. What about http://example.com/files/kamstrup/my.txt?encoding=utf-8&token=hd8763hf82 ? I'd say that it should map to http://example.com/files/kamstrup/.
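A sketch of the mapping proposed here: drop the query string and fragment, then the last path segment, and keep the trailing slash. It uses only `urllib.parse` from the standard library; the helper name `directory_origin` is hypothetical:

```python
from urllib.parse import urlsplit, urlunsplit


def directory_origin(uri: str) -> str:
    """Drop query and fragment, then the last path segment; keep the trailing slash."""
    parts = urlsplit(uri)
    directory = parts.path.rpartition("/")[0] + "/"
    return urlunsplit((parts.scheme, parts.netloc, directory, "", ""))


print(directory_origin(
    "http://example.com/files/kamstrup/my.txt?encoding=utf-8&token=hd8763hf82"))
```

Because the query is removed before the path is split, a slash inside the query string no longer affects the result.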

If we take an example where I download the file from the above URL, there are three interesting URIs to log:

 - The URL my browser window points to
 - The URL of the file I download
 - The local file:// URL that I save the file under

Unfortunately our current datamodel cannot contain all this information inside one event. We only have subject.uri and subject.origin. It would appear that we also need an event.origin..?
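To make the gap concrete, here is a sketch of one download event carrying all three URIs. The `Event` and `Subject` classes are simplified stand-ins, not the Zeitgeist datamodel classes; the `origin` field on `Event` is the hypothetical addition being asked about, and the browser page and local path are made up:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Subject:
    uri: str
    origin: Optional[str] = None


@dataclass
class Event:
    interpretation: str
    subjects: List[Subject] = field(default_factory=list)
    origin: Optional[str] = None  # hypothetical: not in the current data model


download = Event(
    interpretation="DOWNLOAD_EVENT",
    # The page the browser window showed (hypothetical URL).
    origin="http://example.com/files/kamstrup/",
    subjects=[
        Subject(
            # The local file the download was saved as (hypothetical path).
            uri="file:///home/kamstrup/Downloads/my.txt",
            # The URL of the file that was downloaded.
            origin="http://example.com/files/kamstrup/my.txt?encoding=utf-8&token=hd8763hf82",
        )
    ],
)
print(download)
```

Without the extra field on `Event`, one of the three URIs has nowhere to go.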

Seif Lotfy (seif) wrote :

2009/12/8 Mikkel Kamstrup Erlandsen <email address hidden>

> Siegfried, I think you need to elaborate a bit in order to make that
> comment useful...
>
> "Domain" seems to imply that it is the place where the subject is
> currently located. It also strikes me as a very vague and possibly
> ambiguous word here. Fx. the "domain" of an RDF property is the range
> in which it can take values (fx. the domain of the relation foo:hasTag
> is foo:TagObject).
>
> Seif in comment #4 you mean "http://youtube.com/aflkasflkasjf ==>
> http://youtube.com" and not "...==> aflkasflkasjf", right?
>

Sorry, my bad.

> The example is a bit too simple though. What about
> http://example.com/files/kamstrup/my.txt?encoding=utf-8&token=hd8763hf82
> ? I'd say that it should map to http://example.com/files/kamstrup/.
>

Nope, it will be example.com.

> If we take an example where I download the file from the above URL,
> there are three interesting URIs to log:
>
> - The URL my browser window points to
> - The URL of the file I download
> - The local file:// URL that I save the file under
>
> Unfortunately our current datamodel can not contain all this information
> inside one event. We only have subject.uri and subject.origin. It would
> appear that we also need an event.origin..?

There is no need for an event origin. If you want the origin of an event, you look
for the subject of the previous event... at least that is what I think. I
just woke up, so I need to think about it more.


Siegfried Gevatter (rainct) wrote :

2009/12/8 Mikkel Kamstrup Erlandsen <email address hidden>:
> If we take an example where I download the file from the above URL,
> there are three interesting URIs to log:
>
>  - The URL my browser window points to

That's what I consider as "origin", and why I'm asking for other stuff
to use a different name (be it "domain" or whatever else).

>  - The URL of the file I download

I think this is something Tracker is supposed to store.

>  - The local file:// URL that I save the file under
aka "uri"

Cheers,

--
Siegfried-Angel Gevatter Pujals (RainCT)
Free Software Developer 363DEAE3

> There is no need for event origin. If u want the origin of an event u look
> for the subject of the event before....

The client has no rigorous way to find out which event came "before". We can apply heuristics, but I don't like that.
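A small illustration of why "the subject of the event before" is only a heuristic. The `LoggedEvent` tuple, the `guess_origin` helper and all URIs are hypothetical; an unrelated mail check between two related document events is enough to make the guess wrong:

```python
from typing import List, NamedTuple, Optional


class LoggedEvent(NamedTuple):
    timestamp: int  # milliseconds since the epoch
    subject_uri: str


def guess_origin(log: List[LoggedEvent], event: LoggedEvent) -> Optional[str]:
    """Heuristic: take the subject of the latest event logged before `event`."""
    earlier = [e for e in log if e.timestamp < event.timestamp]
    if not earlier:
        return None
    return max(earlier, key=lambda e: e.timestamp).subject_uri


log = [
    LoggedEvent(1000, "file:///home/user/project.odt"),
    LoggedEvent(2000, "imap://mail.example.org/INBOX/42"),  # unrelated mail check
    LoggedEvent(3000, "file:///home/user/related.odt"),     # actually opened from project.odt
]

print(guess_origin(log, log[2]))  # the mail message, not project.odt
```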

>> there are three interesting URIs to log:
>>
>> - The URL my browser window points to
>
> That's what I consider as "origin", and why I'm asking for other stuff
> to use a different name (be it "domain" or whatever else).

Ok. Let's skip the finer details of the naming as long as we agree on the basic concepts :-)

>> - The URL of the file I download
>
> I think this is something Tracker is supposed to store.

So you don't consider the actual URL that I download part of the event? Imho that's more relevant to the event than the location of my browser window. But honestly I think both are quite relevant!

>> - The local file:// URL that I save the file under
> aka "uri"

 - as I said in the line following that :-)

Siegfried Gevatter (rainct) wrote :

2009/12/8 Mikkel Kamstrup Erlandsen <email address hidden>:
> So you don't consider the actual URL that I download part of the event?

No. If the user just downloaded a file, he will want to work with the
local version, not download it again. Knowing where the file came from
is also important (in case you want to give someone else the link or
you already removed the local file) so it's metadata worth keeping,
and we have a great metadata store called Tracker where we can put
that (and I think I read somewhere that Tracker/NEPOMUK/whatever
already planned to do that).


I am not sure Tracker's goal is to store metadata about non-existing items (fx. if I delete an image Tracker also forgets that the image had a width of 500px, but I could be wrong). Indeed I may also download a file to a directory Tracker is not even monitoring.

The origin URL of a file is certainly something that Tracker should be storing - I am not arguing with that. However there is a semantic difference in "I have this file with inode X and it was downloaded from http://bar/foo.txt" and the statement "at 12:34:47 today I started a download of http://bar/foo into ~/foo.txt".

The log statement remains valid regardless of the time and of what you do to your files afterwards. The metadata property of the downloaded file is not a perpetual truth. That's why I think that this info is *also* data we should log.

Maybe the trick is really to consider a download as two events. The click on the download link and the act of saving/choosing the designated local file. Consider:

First I have my browser on http://wikipedia.org and I click the download link:

  interp.=VISIT_EVENT, origin=http://wikipedia.org, subject_uri=http://bar/foo.txt

Then when the download finishes:

  interp.=DOWNLOAD_EVENT, origin=http://bar/foo.txt, subject_uri=file:///tmp/foo.txt

This makes it a bit harder for apps to figure out which website you had open when the download link was clicked, but it can be done rigorously with some clever templating in FindEventIds(). In words:

  Select 1 VISIT_EVENT event ordered by recency that has timestamp before download_event.timestamp, that has subject_uri=download_event.origin
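The selection described in words above can be sketched as a plain Python filter over an in-memory list. This is not the FindEventIds() API or its template matching, only the same selection logic; the `Ev` tuple, the helper name and the timestamps are hypothetical:

```python
from typing import List, NamedTuple, Optional


class Ev(NamedTuple):
    timestamp: int
    interpretation: str
    origin: str
    subject_uri: str


log = [
    Ev(1000, "VISIT_EVENT", "http://wikipedia.org", "http://bar/foo.txt"),
    Ev(5000, "DOWNLOAD_EVENT", "http://bar/foo.txt", "file:///tmp/foo.txt"),
]


def page_for_download(log: List[Ev], download: Ev) -> Optional[str]:
    """Origin of the most recent VISIT_EVENT before `download` whose
    subject_uri equals the download's origin."""
    candidates = [
        e for e in log
        if e.interpretation == "VISIT_EVENT"
        and e.timestamp < download.timestamp
        and e.subject_uri == download.origin
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda e: e.timestamp).origin


print(page_for_download(log, log[1]))  # http://wikipedia.org
```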

Anyway, just thinking out loud. There may be more elegant ways to do this.

Unless we can settle this issue right now, I think we should punt this to 0.3.2.

Siegfried Gevatter (rainct) wrote :

2009/12/8 Mikkel Kamstrup Erlandsen <email address hidden>:
> "Domain" seems to imply that it is the place where the subject is
> currently located. It also strikes me as a very vague and possibly
> ambiguous  word here.

Agreed, "root" would be a better name.

> If we take an example where I download the file from the above URL,
> there are three interesting URIs to log:

I don't remember how we came to discussing this, but let me explain my
position on this again.

If you really want "root" -let's call it like this for now- for
optimizing whatever you want to optimize (I'm still not sure what you
want it for...), then it should be an implementation detail which
isn't exposed in the API. It doesn't make any sense for it to be
there.

Completely unrelated, I propose that we add a new property to events,
called "origin". This property would cover "where does this event come
from, ie. what other subject triggered it".

So for example if you open project.odt and there click on
searchengine.com and search for elephants which brings you to
searchengine.com/elephants and there you click on
elephants.com/the-truth-about-them that would give you the events:
 - uri: file:///home/humanoid/project.odt, origin: None
 - uri: searchengine.com, origin: file:///home/humanoid/project.odt
 ... some other events because in the middle you talked to an IM contact ...
 - uri: searchengine.com/elephants, origin: searchengine.com
 ... some other events because you switched to another tab and looked
up photos of elephants there ...
 - uri: elephants.com/the-truth-about-them, origin: searchengine.com/elephants
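If data providers filled in such an event origin, the path to a page could be reconstructed by following origin links backwards. A sketch using the events listed above, assuming for simplicity one event per URI so that a dict from URI to origin is enough (`origin_chain` is a hypothetical helper):

```python
from typing import Dict, List, Optional


def origin_chain(uri: str, origin_of: Dict[str, Optional[str]]) -> List[str]:
    """Follow origin links back until an event without an origin is reached."""
    chain = [uri]
    seen = {uri}
    while origin_of.get(chain[-1]) is not None:
        nxt = origin_of[chain[-1]]
        if nxt in seen:  # guard against accidental cycles
            break
        chain.append(nxt)
        seen.add(nxt)
    return chain


origin_of = {
    "file:///home/humanoid/project.odt": None,
    "searchengine.com": "file:///home/humanoid/project.odt",
    "searchengine.com/elephants": "searchengine.com",
    "elephants.com/the-truth-about-them": "searchengine.com/elephants",
}

print(origin_chain("elephants.com/the-truth-about-them", origin_of))
```

The unrelated IM and tab-switching events in between do not affect the result, because the chain follows explicit links rather than timestamps.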

The case of downloads gets more complicated so let's not get into it
just right now.

Cheers,


Seif Lotfy (seif) wrote :

Yes, event origin should tell us which subject triggered it.
But again, this is going to be hard to implement. Imagine opening a doc,
then quickly checking mail, then opening something related to the doc (call it x).
The little mail check messes up the equation, plus there is no way to have
event.origin(x) = doc.
Just my opinion.


Siegfried Gevatter (rainct) wrote :

2010/1/7 Seif Lotfy <email address hidden>:
> yes event origin should tell us which subject triggered it
> but Again this is going to be a hard implementation. Imagine opening a doc,
> then quickly check mail, then open something related to the doc (call it x).

No, because "origin" would be something the data provider gives (in my
example above, I guess we can forget for now about knowing that
searchengine.com came from the ODT, but the other ones between Firefox
items only are certainly possible).


Seif Lotfy (seif) wrote :

We will then have to rely fully on external data providers for that.


Another use case I want to cover is IM messages: I am working on a Telepathy observer that logs chats to Zg. How would I model this with the Zg datamodel? Let's see...

So I observe that <email address hidden> sends me the message "Hello Dolly" via MSN. His real name is John Doe; I can get this from Telepathy as well. The formal identifier for the sender should be "mailto: <email address hidden>" using the standard IANA mailto URI scheme. I would tend to store this in subject.origin; the alternative would be event.actor.

It doesn't make much sense to store app://telepathy.desktop or app://empathy.desktop in event.actor because I have no way of knowing which app shows the message to the user - it could indeed be multiple apps...

It would be most convenient if I also stored the real name, "John Doe", somewhere with the event so that apps wouldn't have to talk to Telepathy to display real names. I could either prepend it to subject.text or somehow encode it in event.payload (fx. a JSON map of extra attributes).

event.timestamp = int(time.time()*1000)
event.interpretation = Interpretation.RECEIVE_EVENT
event.manifestation = Manifestation.WORLD_ACTIVITY ***
event.actor = ??
event.payload =
subject.interpretation = Interpretation.IM_MESSAGE
subject.manifestation = ??
subject.origin = "mailto:<email address hidden>"
subject.text = "Hello Dolly"
subject.mimetype = "text/plain"

***) I tentatively introduced the WORLD_ACTIVITY event manifestation because I think USER_NOTIFICATION would be wrong here
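The payload idea (a JSON map of extra attributes such as the real name) could look roughly like this. It assumes the payload is an opaque byte string and that UTF-8 JSON is the agreed encoding; neither is settled in this thread, and the helper names are hypothetical:

```python
import json


def encode_payload(extra: dict) -> bytes:
    # Hypothetical convention: payload carries a UTF-8 JSON object of extra attributes.
    return json.dumps(extra, sort_keys=True).encode("utf-8")


def decode_payload(payload: bytes) -> dict:
    # An empty payload means "no extra attributes".
    return json.loads(payload.decode("utf-8")) if payload else {}


payload = encode_payload({"real_name": "John Doe"})
print(payload)                               # b'{"real_name": "John Doe"}'
print(decode_payload(payload)["real_name"])  # John Doe
```

Apps that do not know the convention can ignore the payload, while those that do can show "John Doe" without talking to Telepathy.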

Anyway - it's just a very real use case which is relevant to this discussion...

I missed a subject.uri in my previous comment... I don't really know what to store in there..?

You forgot the subject URI.
I agree we shouldn't put telepathy.desktop in there.
This is a use case where Empathy can't even inform us what happened, since we
are hooked into Telepathy. Plus, a telepathy.desktop file does not exist.
Yet I see no alternative at the moment.
Let's think about it today in some sort of meeting...
When do you have time?


2010/1/8 Mikkel Kamstrup Erlandsen <email address hidden>:
> subject.origin = "mailto:<email address hidden>"

Good point.

I suppose finding out whether you are talking to John Doe because you
clicked on a link in some file isn't really doable, so storing the
mail there for IM and the web for browser activities (if you agree
with having that) sounds good to me. Can we find some description for
the "origin" field which covers both cases without having to list them
explicitly (ie. not having a different explanation of it for every
activity type)?

> It doesn't make much sense to store app://telepathy.desktop or
> app://empathy.desktop in event.actor because I have no way of knowing
> which app shows the message to the user

Uhm, that may be problematic.

Maybe we can just have something a la "genericapp://im" and let apps
look up what the default IM client is and show that. There may be
different clients for supporting different protocols though, so it's
not so easy.

> I missed a subject.uri in my previous comment... I don't really know
> what to store in there..?

IMHO it should be something which when clicked shows the logs of the
conversation.

I discussed this with Gerfried Fuchs (upstream developer for the IRC
client "Smuxi"), who wants to add Zeitgeist support to Smuxi, and he
said he'd think of a scheme for irc://, which we can then discuss with
other IM client developers.

--
Siegfried-Angel Gevatter Pujals (RainCT)
Free Software Developer 363DEAE3

Changed in zeitgeist:
milestone: 0.3.1 → 0.3.4
Michal Hruby (mhr3) wrote :

I think it makes sense to store in subject.origin the location where subject.uri can be found. Therefore web page visit events could set origin to either their referrer page or the domain (the latter only if the referrer is unavailable, since we can parse the domain from the uri anyway).
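A sketch of that rule for web visits: use the referrer when the browser provides one, otherwise fall back to the scheme and host of the visited URI. The function name and its `referrer` argument are hypothetical:

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def web_origin(uri: str, referrer: Optional[str] = None) -> str:
    """Prefer the referrer; otherwise fall back to scheme://host of the uri."""
    if referrer:
        return referrer
    parts = urlsplit(uri)
    return urlunsplit((parts.scheme, parts.netloc, "", "", ""))


print(web_origin("http://youtube.com/aflkasflkasjf"))
print(web_origin("http://elephants.com/the-truth-about-them",
                 referrer="http://searchengine.com/elephants"))
```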

As for Mikkel's IM use case, I don't think it's ZG's job to log every message, so there'd be only some kind of CONVERSATION_STARTED event, and I'd suggest this:
event.actor = "mailto:<email address hidden>"
subject.uri = # uri of conversation log *
subject.origin = # location where log is saved
subject.text = "Conversation with %s" % real_name
subject.mimetype = # mimetype of uri

* If there's no log, uri, origin and mimetype are empty.
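A sketch of those subject fields, including the empty-values rule from the footnote. The helper name, the dict representation, and the choice of the log's parent directory as its origin are assumptions for illustration:

```python
import mimetypes
from typing import Optional


def conversation_subject(real_name: str, log_uri: Optional[str] = None) -> dict:
    """Subject fields for a CONVERSATION_STARTED event; empty strings if there is no log."""
    if log_uri is None:
        uri = origin = mimetype = ""
    else:
        uri = log_uri
        origin = log_uri.rpartition("/")[0]  # location where the log is saved
        mimetype = mimetypes.guess_type(log_uri)[0] or ""
    return {
        "uri": uri,
        "origin": origin,
        "text": "Conversation with %s" % real_name,
        "mimetype": mimetype,
    }


print(conversation_subject("John Doe"))
print(conversation_subject("John Doe", "file:///home/user/logs/john.txt"))
```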

Siegfried Gevatter (rainct) wrote :

2010/5/14 Michal Hruby <email address hidden>:
> Therefore the web page visit event could have as origin set -- the domain
Completely agree.

> As for Mikkel's IM use case, I don't think it's ZG's job to log every message, so there'd be only some kind of CONVERSATION_STARTED event
Completely agree.

:)

Seif Lotfy (seif) wrote :

So let's say I opened a website and then a document.
Will the website be the origin of the document? If yes, I think this is
kind of wrong, since we are talking about a subject origin, as opposed to
what you are suggesting, which I would call an event origin...
Let's have an IRC meeting.
With the wildcards coming up we could rethink that...


Changed in zeitgeist:
milestone: 0.3.4 → 0.4.1
Seif Lotfy (seif) wrote :

OK, this issue has to be settled once and for all.
IMHO it makes sense to look at origin as the "domain" of the subject of the event in question. Maybe Subject.origin is a better interpretation than event.origin, but this would mean an API break.
The benefits of having origin follow the proposed schema are:
1) we can sort by domain (websites or folders)
2) we assist kamstrup and the unity team with their development
3) it will allow easier grouping on a UI level :)

Michal Hruby (mhr3) wrote :

I personally don't like that. You can have a simple method that extracts the domain from the uri; on the other hand, being able to build at least a partial tree of how you got to a uri is more useful.

Considering Siegfried's comment #23 and Michal's comment #24, and with a free interpretation of Seif's "domain" moniker in comment #27, I think we actually agree more or less on what to put in origin. I'll write a draft docstring and put up a branch for review.

Watch this space - we may actually be able to close this bug at long last ;-)

Changed in zeitgeist:
status: Triaged → Fix Released
Changed in zeitgeist:
milestone: 0.4.1 → 0.5.0