URLs need to be (more) cacheable

Bug #681471 reported by Michael Nelson
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Ratings and Reviews server
Fix Released
Undecided
Michael Nelson
software-center (Ubuntu)
New
Undecided
auto-mvo-1

Bug Description

When initial discussions for RNR started, we talked about caching of URLs for reviews etc., and I think it was for that reason that mvo/barry went with the long urls with the language, distro, series in the URL.

But bug 673948 indicates that we have the following additional complexities:

1) for reviews of a particular package we'll need to batch the results rather than returning all the results (note, pagination could also be added to the urls without too much trouble /en/ubuntu/hardy/+binary/mypackage/page/1/ - so in this respect it may be worthwhile)
2) for review stats the client will need to request a set of stats based on what the client wants to display in the UI (ie. user searches for 'game') (eg. /en/ubuntu/hardy/+review-stats/package_names/foo,bar,blah,url,encoded,of,course/ - if the client always requests the same number of items in alphabetical order, we may get a decent cache hit rate?)

While we're there, it would be great to make the templates/tests independent of the urls (using reverse etc.)

Revision history for this message
Michael Nelson (michael.nelson) wrote :

While on the subject of URLs, I'm keen for the following additional changes - let me know if you disagree :)

1) Rename the app itself from reviewsapp ( a (django) application for reviews) -> appreviews ( (desktop) application reviews).
2) Change the root url from reviews -> appreviews
3) ReviewModeration plural resource would then be /appreviews/moderations/ (we can use the language, origin, distroseries as *optional* filtering - currently the url requires them)
4) ReviewModeration singular resource at /reviewsapp/moderations/{id}/ (to which we can redirect after a moderation approval/rejection)

Revision history for this message
Matthew Paul Thomas (mpt) wrote :

(2) You'll be able to rate not just applications, but any software, e.g. fonts, themes, developer libraries. So I think it would make more sense to leave it as just /reviews.
(3) At <https://wiki.ubuntu.com/SoftwareCenter/RatingsAndReviews#moderating> I suggested that the moderation interface appear at the application's top level, <https://developer.ubuntu.com/reviews>. Is there something more important that belongs at that top level?

Revision history for this message
Michael Nelson (michael.nelson) wrote : Re: [Bug 681471] Re: URLs need to be (more) cacheable

On Wed, Dec 1, 2010 at 11:10 AM, Matthew Paul Thomas <email address hidden> wrote:
> (2) You'll be able to rate not just applications, but any software, e.g. fonts, themes, developer libraries. So I think it would make more sense to leave it as just /reviews.

This first part of the path is so we can tell apache - anything that
starts with /reviews can be handled by the rnr-server - ie. it
identifies the project.

So we can leave it as /reviews, but if we later expose reviews via the
UI, we'll need a second path segmant with '/reviews', ie.
/reviews/reviews/3/ as the unique path to a review item. Similarly, a
moderation resource would be /reviews/moderations/2/. Reviews for a
specific package would be something like
/reviews/packages/apache/reviews/ etc.

Theoretically we can get rid of the initial /reviews completely *if*
we will never want to use the domain for any other applications (ie.
we can tell apache to send all requests to rnr-server - this is what
we do for software-center.ubuntu.com), or are happy to have
complicated rules for apache like, send everything that starts with
'/api' or '/reviews' or '/moderations' or '/packages' to the
rnr-server - which needs updating each time we add another path
(unlikely). I just assumed that with a domain like
developer.ubuntu.com we need a project identifier as the first path.

Maybe there are other options that others can recommend, but if we
don't know whether we'll want to use developer.ubuntu.com for other
apps, then I think we need a project identifier first, even if it is
'/reviews' with the above issues, or simply '/r/' (ie. /r/reviews/1/
or /r/moderations/1/ etc.)

> (3) At <https://wiki.ubuntu.com/SoftwareCenter/RatingsAndReviews#moderating> I suggested that the moderation interface appear at the application's top level, <https://developer.ubuntu.com/reviews>. Is there something more important that belongs at that top level?

Currently, as we won't have any other items exposed (yet), we can use
the top-level (/reviews or whatever - see above) as an overview of
relevant moderation info for the current user (bearing in mind that it
might change if we do expose other stuff such as reviews or software
via the web interface later), but not for a general URL representing a
filter-able collection of all moderations (which is what I meant above
- sorry if it wasn't clear).

Revision history for this message
David Owen (dsowen) wrote :

On Thu, 2010-12-02 at 10:50 +0000, Michael Nelson wrote:
> On Wed, Dec 1, 2010 at 11:10 AM, Matthew Paul Thomas <email address hidden> wrote:
> > (2) You'll be able to rate not just applications, but any software, e.g. fonts, themes, developer libraries. So I think it would make more sense to leave it as just /reviews.
>
> This first part of the path is so we can tell apache - anything that
> starts with /reviews can be handled by the rnr-server - ie. it
> identifies the project.

> Theoretically we can get rid of the initial /reviews completely *if*
> we will never want to use the domain for any other applications (ie.
> we can tell apache to send all requests to rnr-server....

> ... if we don't know whether we'll want to use developer.ubuntu.com for
> other apps, then I think we need a project identifier first, even if
> it is '/reviews' with the above issues, or simply '/r/' (ie.
> /r/reviews/1/ or /r/moderations/1/ etc.)

Hostnames are cheap. You could always add in
reviews.developer.ubuntu.com. This sometimes meshes better with
browsers' concepts of isolation (HTTP auth credentials, cookies, AJAX &
XSS, etc.). Then again, sometimes you *want* no isolation, so the path
method would be better.

Revision history for this message
Matthew Paul Thomas (mpt) wrote :

In my current sketches, developer.ubuntu.com:
* lets moderators moderate reviews - developer.ubuntu.com/reviews (or maybe that should be developer.ubuntu.com/moderate?)
* lets developers submit applications, track their submissions, and see installation stats (and eventually reviews) for their published applications - developer.ubuntu.com/status
* lets anyone browse and search a knowledge base about developing on Ubuntu - developer.ubuntu.com/kb.

It shouldn't be particularly common for moderators to be developers, so there's no particular need for them to stay signed in between the first application and the other two, but there's no particular need for them *not* to share credentials either.

Most of the time, it won't be relevant to a moderator what Ubuntu version or package they're moderating for: they'll just go to the moderation app's base URL and be presented with a global list of flagged reviews, most-flagged first. The main thing that they'll want to filter on is language, and some moderators will be multilingual, so that probably makes more sense as a query parameter (&lang=en+hu) than as a path.

Revision history for this message
Michael Nelson (michael.nelson) wrote :

On Fri, Dec 3, 2010 at 1:20 PM, Matthew Paul Thomas <email address hidden> wrote:
> In my current sketches, developer.ubuntu.com:
> * lets moderators moderate reviews - developer.ubuntu.com/reviews (or maybe that should be developer.ubuntu.com/moderate?)

Either of those would be fine (out of those options, I'd prefer
/reviews/ as it better encapsulates what the app is about -
reviews+moderation - even if it means urls like /reviews/reviews/2/,
/reviews/moderations/3/ etc.).

Going slightly off-topic (perhaps something for the Monday call), but
why do we want reviews+moderations to be at developer.ubuntu.com at
all? Wouldn't software-center.ubuntu.com be a better long-term
location (especially if the web app eventually exposes reviews via the
UI - for whom the target audience would not be developers, but even
just for the moderation functionality, which is not developer specific
like the other resources you've listed below.

> * lets developers submit applications, track their submissions, and see installation stats (and eventually reviews) for their published applications - developer.ubuntu.com/status
> * lets anyone browse and search a knowledge base about developing on Ubuntu - developer.ubuntu.com/kb.
>
> It shouldn't be particularly common for moderators to be developers, so
> there's no particular need for them to stay signed in between the first
> application and the other two, but there's no particular need for them
> *not* to share credentials either.
>
> Most of the time, it won't be relevant to a moderator what Ubuntu
> version or package they're moderating for: they'll just go to the
> moderation app's base URL and be presented with a global list of flagged
> reviews, most-flagged first. The main thing that they'll want to filter
> on is language, and some moderators will be multilingual, so that
> probably makes more sense as a query parameter (&lang=en+hu) than as a
> path.

Yep, I'm adding a branch that makes that transition now.

Cheers,

Revision history for this message
Matthew Paul Thomas (mpt) wrote :

Ok, /reviews it is. :-)

There are two read-write actions for reviews, that I can think of. The short-term one is moderation. The longer-term one is letting developers reply to reviews. Maybe I'm wrong, but it seems to me that they should reply in the same place as they see those reviews and average ratings and installation statistics for their applications, which should be the same place they submit those applications in the first place, which should be developer.ubuntu.com.

You're quite right that we should eventually publish reviews on the Web for the benefit of people who aren't running Ubuntu right now. But that will be a read-only action, so I think it has quite a bit less in common with moderating than replying does. And we already have a site for that, apt.ubuntu.com (currently being worked on at <https://launchpad.net/apturlredirector>). apt.ubuntu.com is not a great domain name, but I think software-center.ubuntu.com would be a bit worse for two reasons: (1) there's apparently a non-zero chance that USC will be renamed in the medium term, and (2) software-center.ubuntu.com is 12 characters longer and therefore quite a bit less tweetable.

Revision history for this message
Michael Nelson (michael.nelson) wrote :

Back to one of the main points of this bug, just recording the points from a discussion with mvo about:

2) for review stats the client will need to request a set of stats based on what the client wants to display in the UI (ie. user searches for 'game') (eg. /en/ubuntu/hardy/+review-stats/package_names/foo,bar,blah,url,encoded,of,course/ - if the client always requests the same number of items in alphabetical order, we may get a decent cache hit rate?)

mvo was saying that even with potentially 30k packages with reviews, it may still be better to simply return statistics for all packages at this (or some other) URL for the client to store locally. Together with strategies such as the client initially requesting the stats in a background process, subsequent requests (ie. following days) the client would just request the changes since the previous day. The responses for both of these requests could also be written to file each day on the server and served directly by apache (Note: computers that are only on every few days would need to either grab the complete file, or somehow grab subsequent updates?)

So instead of serving lots of small potentially-but-not-always-cached responses to USC clients (ie. users searching locally for 'classic games' or something), we could serve one-very-large-but-always-cached response to each client seldomly, and smaller diffs for daily changes (also heavily cached).

Changed in rnr-server:
status: New → In Progress
assignee: nobody → Michael Nelson (michael.nelson)
Revision history for this message
Michael Nelson (michael.nelson) wrote :
Download full text (6.9 KiB)

Regarding comment 8 above (getting review stats to the client), this was discussed on IRC as follows:

15:14 < noodles> achuni: hi! If you've time before the SC meeting, mvo and I were keen to get your input re: https://bugs.launchpad.net/rnr-server/+bug/681471/comments/8
15:14 < _mup_> Bug #681471: URLs need to be (more) cacheable <Ratings and Reviews server:New> <https://launchpad.net/bugs/681471>
15:17 < achuni> noodles: hm
15:17 < achuni> noodles: I guess it'll depend of what exactly "always cached" means :)
15:17 < achuni> noodles: as we'll need to expire the cache every so often
15:18 * noodles scans the text...
15:18 < achuni> how much would a reasonable cache expiry time be for this? 5 minutes? 1 hour?
15:18 < noodles> achuni: you mean if all the stats were stored locally with the client? mvo says it could be 1 day.
15:19 < achuni> noodles: would that be kind of sucky for people that rate an app and expect their opinion to appear at some point relatively soon?
15:19 < noodles> achuni: and by "always cached" there, the idea was that a cron would generate the stats to file (gzipped?) and so any request would always be cached (ie. coming from disk).
15:19 < noodles> achuni: yes.
15:20 < achuni> if it's generate the full set of stats once a day it would definitely work for us I think
15:20 < noodles> achuni: actually, I'm sure it wouldn't be hard for mvo to update the local value based on the new review, even if no one else would see it.
15:20 < achuni> also, would the diff approach be needed right from the start or could we add that on later?
15:20 < noodles> And their review will appear when reviews for an app are requested, just not in the stats seen by other people.
15:21 < achuni> noodles: right, that would make a lot of sense
15:22 < achuni> noodles: we should check more or less how much the full set of stats will occupy gzipped
15:22 < noodles> Yep. Say for 30k packages.
15:23 < achuni> right. this would be requested by the desktop app once a day?
15:23 < achuni> (more often would be pointless if we're caching it for that long)
15:24 < noodles> achuni: aiui yes. Although we did talk about different options (ie. client requests once per day, or client always requests with headers and we reply appropriately if it's been less than 24hrs etc.)
15:24 < noodles> s/always requests/requests on startup/
15:30 < mvo> hello noodles and achuni: maybe a full day is a bit long, but we can tune it and use etag in the client to ensure we don't re-request if nothing changes. I was thinking about something like 1h or 4h (or up to a day if it turns out to be a problem)
15:31 < mvo> we can add the "diff" approach later and could have a very simple schema like "prev-day", "pre-week", "pre-month", "all"
15:31 < mvo> even if the info is 3 days old we would go for 7 days and pay the (probably relatively small) price for info we already have
15:31 < mvo> but this schema is super simple
15:32 < mvo> that assume of course that the bulk of package reviews will not change frequently

15:33 < mvo> about stale information: it would only affect the overview page and the client can update the local stats when it notices that the "details" info is != the stats ...

Read more...

Revision history for this message
Michael Nelson (michael.nelson) wrote :

I'm still not 100% that this is the best way forward, but as a pre-implementation discussion for enabling the client to have all stats locally (as discussed above), we can:

1) Provide a command run a cron job with whatever frequency we like (4hrs?) that publishes the full statistics:
  * review_stats_all.json,
  * review_stats_changes_today.json
2) A second command for a cron job that runs daily publishing:
  * review_stats_changes_last_day.json
  * review_stats_changes_last_3_days.json
  * review_stats_changes_last_7_days.json

With these available, the client could always decide whether to grab the previous day, 3 days updates, 7 days or all the stats, depending how many days ago it last updated stats. I think that's a correct summary of mvo's idea?

Revision history for this message
Michael Nelson (michael.nelson) wrote :

Chatted with Lukasz about this and will instead just cache the django views - we can always configure squid to then cache the result for us. So in addition to:

 * /api/review-stats/

I'll add:

 * /api/review-stats/updates-last-[1,3,7]-days/

so we can always configure more options if we want simply in the url.conf.

Changed in rnr-server:
status: In Progress → Fix Committed
Changed in rnr-server:
milestone: none → 11.01
status: Fix Committed → Fix Released
affects: software-center → software-center (Ubuntu)
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.