Ubuntu
software-center package

Comment 9 for bug 681471

Michael Nelson (michael.nelson) wrote on 2010-12-07:

Regarding comment 8 above (getting review stats to the client), this was discussed on IRC as follows:

15:14 < noodles> achuni: hi! If you've time before the SC meeting, mvo and I were keen to get your input re: https://bugs.launchpad.net/rnr-server/+bug/681471/comments/8
15:14 < _mup_> Bug #681471: URLs need to be (more) cacheable <Ratings and Reviews server:New> <https://launchpad.net/bugs/681471>
15:17 < achuni> noodles: hm
15:17 < achuni> noodles: I guess it'll depend on what exactly "always cached" means :)
15:17 < achuni> noodles: as we'll need to expire the cache every so often
15:18 * noodles scans the text...
15:18 < achuni> how much would a reasonable cache expiry time be for this? 5 minutes? 1 hour?
15:18 < noodles> achuni: you mean if all the stats were stored locally with the client? mvo says it could be 1 day.
15:19 < achuni> noodles: would that be kind of sucky for people that rate an app and expect their opinion to appear at some point relatively soon?
15:19 < noodles> achuni: and by "always cached" there, the idea was that a cron would generate the stats to file (gzipped?) and so any request would always be cached (ie. coming from disk).
15:19 < noodles> achuni: yes.
15:20 < achuni> if it's generating the full set of stats once a day it would definitely work for us I think
15:20 < noodles> achuni: actually, I'm sure it wouldn't be hard for mvo to update the local value based on the new review, even if no one else would see it.
15:20 < achuni> also, would the diff approach be needed right from the start or could we add that on later?
15:20 < noodles> And their review will appear when reviews for an app are requested, just not in the stats seen by other people.
15:21 < achuni> noodles: right, that would make a lot of sense
15:22 < achuni> noodles: we should check more or less how much the full set of stats will occupy gzipped
15:22 < noodles> Yep. Say for 30k packages.
15:23 < achuni> right. this would be requested by the desktop app once a day?
15:23 < achuni> (more often would be pointless if we're caching it for that long)
15:24 < noodles> achuni: aiui yes. Although we did talk about different options (ie. client requests once per day, or client always requests with headers and we reply appropriately if it's been less than 24hrs etc.)
15:24 < noodles> s/always requests/requests on startup/
15:30 < mvo> hello noodles and achuni: maybe a full day is a bit long, but we can tune it and use etag in the client to ensure we don't re-request if nothing changes. I was thinking about something like 1h or 4h (or up to a day if it turns out to be a problem)
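
A minimal sketch of the etag idea mvo mentions here, assuming the stats file is served as one body and the client sends back the last ETag it saw. The names `make_etag` and `respond` are hypothetical, and the hashing scheme is only an illustration; in practice the web server may well handle this itself.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive a quoted ETag from the body (hypothetical scheme: SHA-1 hex digest)."""
    return '"' + hashlib.sha1(body).hexdigest() + '"'


def respond(body: bytes, if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Answer a conditional GET: 304 with no payload if the client's ETag still matches.

    Simplification: a real If-None-Match header may list several tags or "*";
    this sketch only compares against a single tag.
    """
    etag = make_etag(body)
    if if_none_match == etag:
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body
```

With this, a client that re-requests every 1h or 4h only pays for the full body when the generated stats have actually changed.
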
15:31 < mvo> we can add the "diff" approach later and could have a very simple schema like "prev-day", "pre-week", "pre-month", "all"
15:31 < mvo> even if the info is 3 days old we would go for 7 days and pay the (probably relatively small) price for info we already have
15:31 < mvo> but this schema is super simple
15:32 < mvo> that assumes of course that the bulk of package reviews will not change frequently
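
The window schema above could be sketched like this: the client picks the smallest window that covers the age of its local stats and falls back to "all" otherwise. The window names are the ones mvo lists; `pick_window`, the `WINDOWS` table, and the 30-day length for "pre-month" are assumptions made for illustration.

```python
# (max age in days, window name), smallest window first.
# The 30-day month is an assumption; the log does not define it.
WINDOWS = [
    (1, "prev-day"),
    (7, "pre-week"),
    (30, "pre-month"),
]


def pick_window(age_days: float) -> str:
    """Return the smallest window that still covers stats of the given age."""
    for max_days, name in WINDOWS:
        if age_days <= max_days:
            return name
    # Local copy is too old for any diff window: fetch the full set.
    return "all"
```

For example, `pick_window(3)` returns `"pre-week"`, which is the "3 days old, go for 7 days" case: the client re-downloads a few days of info it already has in exchange for a very simple schema.
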

15:33 < mvo> about stale information: it would only affect the overview page and the client can update the local stats when it notices that the "details" info is != the stats info
15:34 < mvo> but yeah, it's not ideal that there is a certain lag like this, but it seems like the alternative is not compelling either (that being a lot of http requests for the visible range plus the need to refresh that info too)
15:34 < achuni> mvo: requesting the full set of ratings once an hour could be a b/w hog for us
15:35 < achuni> I imagine *some* stat will have changed over the last hour as soon as rnr takes on any kind of momentum
15:36 < mvo> achuni: indeed, I just checked, a naive pkgname, nr1, nr2 gzip compressed is still ~150kb
15:36 < achuni> mvo: that's for 30k apps?
15:36 < mvo> achuni: so that sounds like we need a diff approach if we want to pull it off with this mechanism
15:36 < mvo> yep
15:36 < mvo> well, let me count
15:36 < mvo> I dumped my aptcache to disk
15:37 < mvo> 33,5k
15:37 < achuni> ah so even a better estimate than 30k :)
15:37 < mvo> that is without json overhead or anything like that, just "pkgname nr1 nr2"
15:37 < achuni> right
15:38 < mvo> I'm absolutely open to alternatives :) it just seems like we need the full thing at some point because s-c will want to rank searches based on popularity in the future
15:38 < achuni> right. Requesting the full thing once at least to bootstrap ratings makes sense
15:39 < achuni> after that, requesting "for the apps the user looks at" sounds nice, but seems to be hard to specify
15:40 < mvo> yeah, and we want to prefetch for scrolling etc
15:41 < mvo> my gut feeling is that a diff approach is both simpler and cheaper
15:41 < mvo> and it does not even have to be a diff, just a "this changed the last 1 day" and that just overrides what the client has
15:41 < mvo> so once it has the full set and knows when it updated last it can simply use the diffs to keep track
15:42 < mvo> (unless I missed something of course ;)
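
A rough sketch of the "changed in the last day just overrides what the client has" idea, assuming the naive "pkgname nr1 nr2" line format mentioned earlier. The function names are hypothetical, and the log does not say what nr1 and nr2 stand for, so they are kept as opaque integers.

```python
from typing import Dict, Tuple

Stats = Dict[str, Tuple[int, int]]


def parse_stats(text: str) -> Stats:
    """Parse whitespace-separated 'pkgname nr1 nr2' lines into a dict."""
    stats: Stats = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        pkgname, nr1, nr2 = line.split()
        stats[pkgname] = (int(nr1), int(nr2))
    return stats


def apply_changes(local: Stats, changed: Stats) -> Stats:
    """Return local stats with every changed entry overriding the old value."""
    merged = dict(local)
    merged.update(changed)
    return merged
```

Because the changed set replaces entries rather than patching them, applying the same set twice gives the same result, so the client does not need to track which changes it has already seen.
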
15:43 < noodles> We could still cache paginated versions of that request if we wanted to (ie. batches of 1000)
15:44 < achuni> mvo: diff approach sounds right
15:45 < achuni> noodles: that request = full set of stats?
15:45 < noodles> achuni: yep.
15:46 < achuni> noodles: still unless we provide the stats "in the order SC needs them" it would have to request the full set up front I think?
15:46 < achuni> mvo: dunno if it would make sense to paginate ratings ^
15:47 < noodles> achuni: not sure I follow - I just mean that /.../review-stats/ could return the first 1000, then /.../review-stats/page/1/ the next 1000 etc. (all cached)
15:47 < mvo> that would put the burden to get them all on the client?
15:48 < noodles> mvo: yeah, it would just be a compromise (if we needed it - I don't know). ie. to ensure requests are only ever <=100k
15:48 < achuni> noodles: what I say is that most of the time the client will need to just set to and request all 30 or so pages
15:49 < noodles> achuni: Right, as in (if we needed to) the client would request all pages until it has the whole set?
15:49 < achuni> yup, I'm not sure that makes more sense than just serving the full set all together
15:49 < noodles> achuni: but right, if apache is handling the....
15:49 < noodles> Yep.
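
For illustration, the "client requests all pages until it has the whole set" variant might look like the sketch below. It assumes page 0 is the unpaginated URL and that each page holds up to 1000 entries; `fetch_page` is a hypothetical callable standing in for the HTTP request, so no real URL is assumed.

```python
from typing import Callable, List


def fetch_all(fetch_page: Callable[[int], List[str]], page_size: int = 1000) -> List[str]:
    """Request pages 0, 1, 2, ... until a short (or empty) page signals the end."""
    entries: List[str] = []
    page = 0
    while True:
        batch = fetch_page(page)
        entries.extend(batch)
        if len(batch) < page_size:
            # A page that is not full must be the last one.
            return entries
        page += 1
```

This keeps every response small, but the client still ends up making roughly 30 requests to bootstrap, which is the burden mvo and achuni point out.
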
15:50 < achuni> noodles: mvo: would we gain much if we say "any pkg/app not mention has 0 votes"?
15:50 < achuni> not mention*ed*
15:50 < achuni> and then just list the apps with >0 votes
15:51 < achuni> it would be a temporary win I guess
15:51 < mvo> yes
15:52 < mvo> I think that is totally the right way
15:52 < noodles> achuni: afaics we already do that.
15:52 < mvo> as e.g. libwebkit-1.0 will not get that many votes
15:52 < noodles> (ie. we're currently only aggregating the Reviews that we have, but right, when we update to the cached totals/averages, we need to ensure we keep doing that)
15:52 < mvo> or libace-5.7.7
15:53 * achuni checks
15:53 < mvo> we have 12k packages with lib in the name
15:55 < achuni> noodles: right
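
A small sketch of the "any pkg/app not mentioned has 0 votes" convention: the server lists only packages with at least one vote, and the client treats a missing entry as zero. The function names and the choice of a plain vote count per package are assumptions for illustration.

```python
from typing import Dict


def compact(votes: Dict[str, int]) -> Dict[str, int]:
    """Server side: drop every package that has no votes before serving."""
    return {pkg: count for pkg, count in votes.items() if count > 0}


def votes_for(served: Dict[str, int], pkgname: str) -> int:
    """Client side: a package missing from the served set has 0 votes."""
    return served.get(pkgname, 0)
```

With around 12k library packages that are unlikely to collect votes, this shrinks the initial download considerably, though as achuni notes the saving is temporary as more packages get reviewed.
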
