non-clear API: get_*_most_used*

Bug #498878 reported by Mikkel Kamstrup Erlandsen
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Zeitgeist Framework
Fix Released
Critical
Seif Lotfy

Bug Description

I just reviewed the latest API additions with the ZeitgeistClient.get_uris_most_used_with* and corresponding methods inside the engine. Sorry to be a party-spoiler, but I think there is some stuff that needs cleanup before we can roll 0.3.1.

The technical issues I found off the bat:

 1. Using the prefix get_* for all of these most-used queries seems a bit off since you are not retrieving a well known thing, but are querying and getting an unpredictable result. I'd prefer if we used find_everywhere instead.

 2. The method names also use the *_with_* word where we have used *_for_* everywhere else in the code hitherto.

 3. Why do these new methods return URIs and not Subject instances?

 4. The DBus method GetMostUsedWithSubjects has a misleading name. It doesn't take a list of subjects in the args.

 5. Why is the time_range arg. not the first arg like it is for FindEventIds()?

Then there's something about this API that just tells me it is wrong:

 I. Why is there no paging? The API allows me to do very broad queries, like "give me anything related to subjects of interpretation Document" which would presumably have huge result sets?

 II. Assuming there can be many results why doesn't it return a list of event ids, like FindEventIds()?

 III. Using "MostUsed" in the name more or less implies that which algorithm will be used. Do we want that? Is it better for forwards compatibility to use the more generic term "Related"?

All of these questions leads me to think that we really need an API like:

  FindRelatedEventIds(in (xx) time_range,
                                    in aE event_templates,
                                    in aE related_event_templates,
                                    in u storage_state,
                                    in u num_events,
                                    in u result_type
                                    out au related_event_ids)

The result_type could change the way results are ordered, much like our current FIndEventIds(), but not exactly the same switches. Maybe result_type could be:

  0 : Results ordered by relevancy (where the exact measure of relevancy is up to the engine)
  1 : Results ordered by recency

Related branches

Changed in zeitgeist:
milestone: none → 0.3.1
importance: Undecided → Critical
Revision history for this message
Mikkel Kamstrup Erlandsen (kamstrup) wrote :

After flaming each other on IRC (Ahhhh... it's always a joy fighting with you guys ;-P) we came more or less to the following API:

  FindRelatedSubjects(in (xx) time_range,
                                  in aE event_templates,
                                  in aE related_event_templates,
                                  in u storage_state,
                                  ?? in u max_subjects,
                                  ?? in u result_type,
                                  out aS related_subjects)

We where not conclusive about max_events and never really discussed whether or not a result_type arg would make sense. My personal take on these two issues:

 * max_events: Even though it does not make things lighter in the engine, I still think we want this arg. One day we may have another algo under the hood that returns more subjects.

 * result_type: Dunno. I am inclined to want it in even though we only have one value for it initially. It has proven mighty useful in FindEventIds(), and is a nice way to make things forward compatible.

Revision history for this message
Siegfried Gevatter (rainct) wrote :

Before releasing, note that those changes haven't been completely implemented yet!

 - Both max_results and result_type are missing. I agree that having them would be good.

 - The method is currently named FindRelated and does still return URIs. I'm still not convinced about returning subjects; I'll be on IRC if you want to talk about it.

Cheers

Revision history for this message
Seif Lotfy (seif) wrote :

I am thinking of returning events instead of subjects
as in the latest events of the subjects in that timeframe
also what does results type stand for ?

Revision history for this message
Mikkel Kamstrup Erlandsen (kamstrup) wrote :

@seif: I think this API is already in MASTER (and in fact released in 0.3.2, although marked experimental: http://zeitgeist-project.com/docs/0.3.2/dbus_api.html#_zeitgeist.engine.remote.RemoteInterface.FindRelatedUris). Or are we talking about changing it here?

I think that the two args max_results and result_type would still be useful to provide long-term flexibility of the API. The idea with result_type is exactly the same as the same arg. in FindEventIds(). To provide a means to request different sorting, grouping, etc. From the get-go we'd only have one result_type (namelyt what we have now), but we could add others in the future.

Revision history for this message
Seif Lotfy (seif) wrote :

First lets start with the return
will we return:
    - uris ?
    - latest subjects with the uris ?
    - latest events with the subjects ?
depending on that we will rename the method to on of:
    - FindRelatedURIs
    - FindRelatedSubjects
    - FindRelatedEvents

I would go with FindRelatedEvents
I fully agree with adding max_results however I dont understand what result_type is
Can you explain?

Revision history for this message
Seif Lotfy (seif) wrote :

ok scratch that
I could do the results_type too.
It is just a matter of naming now

Seif Lotfy (seif)
Changed in zeitgeist:
assignee: nobody → Seif Lotfy (seif)
milestone: 0.3.1 → 0.3.3
status: New → In Progress
Seif Lotfy (seif)
Changed in zeitgeist:
status: In Progress → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.