Some way to mirror the review database by a third party

Bug #1408353 reported by Stuart Langridge
This bug affects 1 person

Affects: Ratings and Reviews server
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

I would like to build a third-party app to allow people to "follow" reviews for particular packages or by particular people in the Ubuntu click store. There are a few ways that my app could get the information for this, and I'd like to discuss which is best and how it could be done.

== Canonical does it ==

Make the RnR API able to query the data in many different ways, so that the external service can use it however it wants. In practice this would probably mean putting all the reviews into a searchable system such as Elasticsearch or Solr and exposing the raw ES/Solr API to the public, which raises potential security concerns but is one approach. (Having RnR expose just the subset of searches that the third-party app needs is also a possibility, but since it is blocked on Canonical work it is not likely to scale well: every time my third-party app wants to query the data in a different way there'll be a long cycle time before Canonical provides it, and constant ongoing work for Canonical engineers.)
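To give a flavour of the "query it however you want" option, here is a sketch of an Elasticsearch Query DSL body an app might send if reviews were indexed in ES. The index layout and the field names `package_name`, `reviewer_username` and `date_created` are assumptions for illustration, not something RnR is known to provide.

```python
import json


def follow_query(package_name=None, reviewer_username=None, since=None):
    """Build an ES bool query for reviews of a package and/or by a reviewer.

    Field names are hypothetical; they assume reviews were indexed with
    package_name, reviewer_username and an ISO 8601 date_created.
    """
    filters = []
    if package_name is not None:
        filters.append({"term": {"package_name": package_name}})
    if reviewer_username is not None:
        filters.append({"term": {"reviewer_username": reviewer_username}})
    if since is not None:
        filters.append({"range": {"date_created": {"gte": since}}})
    # Newest reviews first; a bool query with no clauses matches everything.
    return {
        "query": {"bool": {"filter": filters}},
        "sort": [{"date_created": {"order": "desc"}}],
    }


body = follow_query(package_name="com.example.app", since="2015-01-01T00:00:00Z")
print(json.dumps(body, indent=2))
```

The security concern shows up here too: an endpoint that accepts arbitrary query bodies also accepts expensive ones, so a public deployment would probably need a read-only layer in front that restricts which query types are allowed.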

== I do it ==

If I can mirror the reviews database, then I can store it however I want and query it however I want, without any work falling on Canonical engineering at all. This requires a way of mirroring the database:

1. Download a database dump
Once a day, Canonical dumps the whole RnR database (or a subset of fields for each review, keeping some things internal for privacy reasons) somewhere it can be downloaded. These could either all be complete database dumps (keeping the last three or four around), or work like a backup, with a "full" dump once a week or month and an "incremental" dump for each day since the last full one, so that an external app downloads only the changes. This needs duplicate file storage for all the data but puts no extra pressure on the API, no matter how many third-party apps use the data. It does not allow returning different data depending on authentication credentials, which is not important for my app but may be for others.
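A minimal sketch of how a consumer might rebuild its mirror from the full-plus-incremental scheme. Nothing above specifies a dump format, so this assumes each dump is a list of review records with a unique `id`, and that reviews removed later (for example by moderation) appear in an incremental dump as a hypothetical `{"id": ..., "deleted": True}` tombstone.

```python
def apply_dumps(full_dump, incrementals):
    """Rebuild a local mirror from one full dump plus daily incremental dumps.

    Assumptions (hypothetical format): every record has a unique "id";
    incrementals are ordered oldest first; a record with "deleted": True
    is a tombstone meaning the review was removed.
    """
    mirror = {review["id"]: review for review in full_dump}
    for day in incrementals:
        for review in day:
            if review.get("deleted"):
                # Drop the review if we had it; ignore if we never saw it.
                mirror.pop(review["id"], None)
            else:
                # New or edited review: the later copy wins.
                mirror[review["id"]] = review
    return mirror


full = [{"id": 1, "rating": 4}, {"id": 2, "rating": 2}]
days = [[{"id": 2, "rating": 3}], [{"id": 3, "rating": 5}, {"id": 1, "deleted": True}]]
mirror = apply_dumps(full, days)
print(sorted(mirror))  # [2, 3]
```

Without some tombstone convention, an incremental dump can only add or update reviews, so a review removed after the last full dump would linger in every mirror until the next full one.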

2. Allow querying the API by time
Add a "since" parameter to http://reviews.staging.ubuntu.com/click/api/1.0/reviews/ so that one can request http://reviews.staging.ubuntu.com/click/api/1.0/reviews/?since=(timestamp) and get all reviews posted since that timestamp. This lets a third party update its copy more often and in smaller increments than downloading a DB dump, but it makes third-party apps clients of the API, so the API must be able to cope with that load. It does allow for authentication.
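A sketch of one polling step against the proposed parameter. The `since` parameter does not exist yet, and the `id` and `date_created` field names and ISO 8601 timestamps are assumptions; `fetch` is injected (URL in, list of review dicts out) so the logic can be exercised without network access.

```python
from urllib.parse import urlencode

REVIEWS_URL = "http://reviews.staging.ubuntu.com/click/api/1.0/reviews/"


def since_url(timestamp):
    """URL for reviews posted since `timestamp` (proposed `since` parameter)."""
    return REVIEWS_URL + "?" + urlencode({"since": timestamp})


def sync(mirror, last_seen, fetch):
    """Fetch reviews since `last_seen`, merge them into `mirror` by id,
    and return the new high-water mark.

    Assumes ISO 8601 timestamps in one fixed format, so that plain string
    comparison orders them correctly.
    """
    reviews = fetch(since_url(last_seen))
    for review in reviews:
        mirror[review["id"]] = review
    if reviews:
        last_seen = max(review["date_created"] for review in reviews)
    return last_seen
```

If `since` turns out to be inclusive, the newest review is fetched again on every poll; merging by `id` makes that harmless, whereas a strictly-later cursor could silently skip reviews that share a timestamp.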

I'm happy to hear other suggestions too.


Michael Nelson (michael.nelson) wrote : Re: [Bug 1408353] [NEW] Some way to mirror the review database by a third party

On Thu, Jan 8, 2015 at 3:24 AM, Stuart Langridge
<email address hidden> wrote:
> Public bug reported:
>
> I would like to build a third-party app to allow people to "follow"
> reviews for particular packages or by particular people in the Ubuntu
> click store. There are a few ways that my app could get the information
> for this, and I'd like to discuss which is best and how it could get
> done.

Hi Stuart. I don't see a problem with dumping all published reviews
(Martin?), but given that the source is open, would it be an option to
improve the API by adding feeds?

For example,
/api/1.0/feeds/developer/(?P<developer>)/
/api/1.0/feeds/app/(?P<app>)/

Yep, the downside would be that it'd be slightly more difficult to get
things you need landed, and you'd not have control over when that was
deployed. Benefit would be that everyone can just use it and/or
improve it in the one place.

Anyway, Fabien has been working on this app for the past while and may
have other ideas.
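The feed routes Michael suggests use Python's `(?P<name>...)` named-group syntax, as found in Django URL patterns, but leave the groups empty. A sketch of how they might match, assuming each group captures a single path segment (`[^/]+` is my assumption, as is the `resolve_feed` helper):

```python
import re

# Hypothetical completion of the suggested routes: each named group
# captures one path segment.
FEED_ROUTES = {
    "developer": re.compile(r"^/api/1\.0/feeds/developer/(?P<developer>[^/]+)/$"),
    "app": re.compile(r"^/api/1\.0/feeds/app/(?P<app>[^/]+)/$"),
}


def resolve_feed(path):
    """Return (feed kind, captured name) for a feed path, or None if no route matches."""
    for kind, pattern in FEED_ROUTES.items():
        match = pattern.match(path)
        if match:
            return kind, match.group(kind)
    return None


print(resolve_feed("/api/1.0/feeds/app/com.example.app/"))  # ('app', 'com.example.app')
```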

Stuart Langridge (sil) wrote :

I've added an extra query to the API to return reviews by reviewer_username and sent a merge proposal (MP).
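The `reviewer_username` query does on the server what a mirrored copy would let the app do locally. As a sketch, assuming mirrored reviews are dicts with `package_name` and `reviewer_username` keys (the second matches the query named above; the first is an assumption), the core of a "follow" feature is a filter along these lines:

```python
from typing import Iterable, Iterator


def followed(
    reviews: Iterable[dict],
    packages: Iterable[str] = (),
    reviewers: Iterable[str] = (),
) -> Iterator[dict]:
    """Yield reviews of any followed package or written by any followed reviewer."""
    wanted_packages = set(packages)
    wanted_reviewers = set(reviewers)
    for review in reviews:
        # .get() so records missing a field simply don't match.
        if (review.get("package_name") in wanted_packages
                or review.get("reviewer_username") in wanted_reviewers):
            yield review
```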

Stuart Langridge (sil) wrote :

Michael: I see that my merge proposal was rejected for being old. (Fine, it was also missing a couple of tests, but it wasn't looked at until ten months after it was proposed, and then was rejected for being old.) You see my concern about "every time my third-party app wants to query the data in a different way there'll be a long cycle time before Canonical provides it, and constant ongoing work for Canonical engineers"? When you say "it'd be slightly more difficult to get things you need landed"... if "slightly more difficult" means "wait nearly a year for a review" then I'd call that a little more than slightly. This is why I suggest that a much better way here would be to expose the back end and allow people to get at the data, at which point we can do what we want with it, rather than lobbying for an API change which will take a very long time to happen.

Colin Watson (cjwatson) wrote :

Not that I disagree with your basic point, but it's worth noting that Natalia did say "change status again if this MP is still current", which I'd count as an invitation to reopen it. (Also, the Launchpad timeout problem that caused pressure to have a short +activereviews list has been fixed.)

Michael Nelson (michael.nelson) wrote : Re: [Bug 1408353] Re: Some way to mirror the review database by a third party

Hey there Stuart,

On Sun, Nov 29, 2015 at 12:20 AM Stuart Langridge <
<email address hidden>> wrote:

> Michael: I see that my merge proposal was rejected for being old.

Yep - as were about 6 branches of mine that I'd not followed through to
landing (not for rnr, but other projects I was working in the same sweep).

> (Fine,
> it was also missing a couple of tests, but it wasn't looked at until ten
> months after it was proposed, and then was rejected for being old.) You
> see my concern about "every time my third-party app wants to query the
> data in a different way there'll be a long cycle time before Canonical
> provides it, and constant ongoing work for Canonical engineers"? When
> you say "it'd be slightly more difficult to get things you need
> landed"... if "slightly more difficult" means "wait nearly a year for a
> review" then I'd call that a little more than slightly.

Sorry Stuart - my fault - I should have been more explicit in my previous
reply that I needed to follow up with Fabien (who'd been working on and
maintaining the code-base) about the change. Unfortunately LP doesn't have
@username notifications, and I didn't check back.

> This is why I
> suggest that a much better way here would be to expose the back end and
> allow people to get at the data, at which point we can do what we want
> with it,

Yes, I'll do a manual notification to @Fabien (as he'll know whether the
rnr db has any sensitive info that would need to be excluded, such as
moderated reviews or whatever) and @beuno. Sorry I didn't do that the first
time.

> rather than lobbying for an API change which will take a very
> long time to happen.
>

I know it's no consolation to you, but it's the same for internal branches
- as above, my branches get rejected for the same reason if I don't follow
up and find someone to review/land. The difference is that it's much harder
for someone outside Canonical to get hold of people to follow up on a
branch and get it landed.

We really should switch to use *and maintain* a review queue of proposed
branches (which may be why Natalia was cleaning up old branches) so that
all branches are considered equally without needing to ping.
