Bug #888756 “dynamic bug listings do round trips for new batches” : Bugs : Launchpad itself

Aaron Bentley (abentley) on 2011-11-10

Changed in launchpad:
status:	New → In Progress
importance:	Undecided → High
assignee:	nobody → Aaron Bentley (abentley)

Robert Collins (lifeless) wrote on 2011-11-10: Re: [Bug 888756] [NEW] dynamic bug listings do round trips for new batches

#1

Uhm, let's not do this. This will increase server load, and most of the
time the user won't be following those links.

Aaron Bentley (abentley) wrote on 2011-11-11:

#2

I think most of the time the user *will* be following these links. Not all of them, but at least the next link. I say "I think", because I don't have the actual data. Do you?

Robert Collins (lifeless) wrote on 2011-11-11: Re: [Bug 888756] Re: dynamic bug listings do round trips for new batches

#3

On Sat, Nov 12, 2011 at 4:01 AM, Aaron Bentley <aaron@canonical.com> wrote:
> I think most of the time the user *will* be following these links.  Not
> all of them, but at least the next link.  I say "I think", because I
> don't have the actual data.  Do you?

next is plausible. We have enough data in our logs to run an analysis
if desired. I don't know at what % of times someone clicking next
would be enough to justify calculating and throwing away a search
result.

In the datacentre, getting https://bugs.launchpad.net/ubuntu/+bugs via
wget took 3.04 seconds to generate and 3.129s end to end. The primary reason
round trips are slow on bug searches is that the bug search batch
generation is slow. For me locally,
https://bugs.launchpad.net/launchpad/+bugs just took 2.2s to render
and 4.2s end to end. So the round trip component for me is ~2 seconds,
and the time spent on a service point doing the generation was 2.2
seconds. If there was eager loading then I could click next and get it
immediately *after another 4 seconds*. Clicking next twice in a row
would still be slow. And if I don't click next, then the computation
is wasted.

Google have made a few big presentations recently about eager loading
and discarding results to get a very very snappy UI. That's pretty
cool, but they (AFAIK) only do it on search - they don't do it for
Google Code bug reports, nor for Gmail. They are also doing it because
it brings them revenue - the more searches folk do the more time they
spend searching, according to their in-house research... and the more
adverts folk see. I don't know if the Google presentations have
influenced this bug, but if they have - I think the goal is great, but
I don't think the situations are similar.

If doing another batch was a cheap operation, I would have no concerns
about us having everyone that searches do a second batch automatically
(though I would suggest you just send the second immediately, because
/most/ of our search results don't deliver incrementally from the DB;
grabbing 150 items at once is much cheaper than two x 75).

https://devpad.canonical.com/~lpqateam/ppr/lpnet/latest-monthly-pageids.html
shows that
Person:+bugs was hit 470K times in the last month, with a mean render
time of 0.36 seconds (most persons have few related bugs). That's about
2 CPU days of time on our cluster. If next is rarely clicked, we'll be
doubling that; if it's often clicked, the overhead of eager-loading
(just next) will be inconsequential. So the impact ranges from 'does
not matter' to 'matters quite a lot'.

DistributionSourcePackage:+bugs - 114K @ 0.71s - about one CPU day
Product:+bugs - 53K @ 0.71s - 1/2 CPU day
SourcePackage:+bugs - 43K @ 0.35s - 0.17 CPU days
Distribution:+bugs - 20K @ 3.76s - 0.86 CPU days
MaloneApplication:+bugs - 13K @ 6s - 0.9 CPU days
DistroSeries:+bugs - 10K @ 1.18s - 0.13 CPU days
ProductSeries:+bugs - 5.5K @ 0.23s - 0.01 CPU days
ProjectGroup:+bugs - 4K @ 0.73s - 0.03 CPU days

TBH I had expected rather more page impressions on bug search. And
this doesn't discriminate between batch retrieval and surrounding UI
as yet. Nevertheless, summing the current figures: we spend 3.6 CPU
days a month processing +bugs, or ~10% of a CPU. This is shared
between appservers and DB, roughly 50-50.
If we increased that tenfold (all links), we'd be taking up an
additional 0.45 cores in our DB cluster, which is not very scalable at
the moment, and is very expensive to scale.
Doubling it (direction-of-travel eager loading only) would be much
more modest and I suspect we can support that without additional
hardware today if we need to.

However, I believe this is premature optimisation - we would be better
off investing in faster search generation than in doing the work
speculatively, *at this point*.

-Rob
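
The CPU-day figures in this comment follow from hits multiplied by mean render time. A minimal Python sketch of that arithmetic, using only the hit counts and mean render times quoted above. The names `PAGE_STATS`, `cpu_days` and `eager_load_overhead` are illustrative and not part of Launchpad, and the overhead model is an assumption: it treats a prefetched batch as costing the same as the displayed one, which the thread does not establish.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# (hits in the last month, mean render time in seconds), as quoted above.
PAGE_STATS = {
    "Person:+bugs": (470_000, 0.36),
    "DistributionSourcePackage:+bugs": (114_000, 0.71),
    "Product:+bugs": (53_000, 0.71),
    "SourcePackage:+bugs": (43_000, 0.35),
    "Distribution:+bugs": (20_000, 3.76),
    "MaloneApplication:+bugs": (13_000, 6.0),
    "DistroSeries:+bugs": (10_000, 1.18),
    "ProductSeries:+bugs": (5_500, 0.23),
    "ProjectGroup:+bugs": (4_000, 0.73),
}


def cpu_days(hits, mean_seconds):
    """CPU days spent rendering one page id over the month."""
    return hits * mean_seconds / SECONDS_PER_DAY


def eager_load_overhead(hits, mean_seconds, click_rate):
    """Extra CPU days if every view also precomputes the next batch.

    Assumption (not from the thread): a prefetched batch costs as much as
    the displayed batch, and only prefetches that are never clicked count
    as overhead.
    """
    return cpu_days(hits, mean_seconds) * (1.0 - click_rate)


if __name__ == "__main__":
    for page_id, (hits, mean_seconds) in PAGE_STATS.items():
        print(f"{page_id}: {cpu_days(hits, mean_seconds):.2f} CPU days")
```

Under this model a click rate near 0 roughly doubles the cost of a page id, and a click rate near 1 adds almost nothing, which is the range the comment describes.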

Aaron Bentley (abentley) wrote on 2011-11-11:

#4

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 11-11-11 01:57 PM, Robert Collins wrote:
> next is plausible. We have enough data in our logs to run an
> analysis if desired. I don't know at what % of times someone
> clicking next would be enough to justify calculating and throwing
> away a search result.

Next is a special case, because you can also just double the initial
batch size and hide half the results.  Or indeed, we could halve the
displayed batch size, because 75 rows is >4 screenfuls on a 1080p
monitor, and that's quite excessive if the bug listings are just as
responsive as scrolling.

> In the datacentre, getting https://bugs.launchpad.net/ubuntu/+bugs
> via wget - 3.04seconds to generate, 3.129s end to end - the primary
> reason round trips are slow on bug searches is that the bug search
> batch generation is slow.

If, as you later implied, half of that is appserver time, shipping
JSON across the wire might be cheaper.  My best-of-3 nova/+bug time is
2.50s, while best-of-3 nova/+bug/++model++ is 2.27.  However, there is
a lot of variation.

> For me locally. https://bugs.launchpad.net/launchpad/+bugs just
> took 2.2s to render and 4.2s end to end. So the round trip
> component for me is ~2 seconds, and the time spent on a service
> point doing the generates was 2.2 seconds. If there was eager
> loading then I could click next and get it immediately *after
> another 4 seconds*.

Ah, you mean that once the browser receives the page, it will
pre-fetch the results, and that will take another 4 seconds, eh?

This is true, but I assume the user will look at the results before
clicking next, so it will still be faster than otherwise.

This is also an argument in favour of shipping the next batch
simultaneously with the initial page load, per "Next is a special case..."

> Clicking next twice in a row would still be slow.

(I assume you mean clicking next once, and then clicking it again
before we could pre-fetch the new next result, i.e. the 3rd batch.)

True, but I don't think that's the common case.  As above, I assume
the user will look at the results before clicking "next".  So I'd
expect that only happens when the search is very poorly selected.  We
could also consider pre-fetching even more batches.

> And if I don't click next, then the computation is wasted.

You could also see it as an investment in reducing overall latency.

> I don't know if the google presentations have influenced this bug,
> but if they have - I think the goal is great, but I don't think the
> situations are similar.

No, they haven't influenced me.  I'm thinking more of Facebook and
other ajaxy sites.  Facebook's photo viewer, for example, is a joy to use.

> https://devpad.canonical.com/~lpqateam/ppr/lpnet/latest-monthly-pageids.html
> shows that
> Person:+bugs was hit 470K times in the last month, with a mean
> render time of 0.36 seconds (most persons have few related bugs).
> Thats about 2 CPU days of time on our cluster. If next is rarely
> clicked, we'll be doubling that; if its often clicked, the overhead
> of eager-loading (just next) will be inconsequential.

Just how few related bugs do people have?  If it's less than 75 on
average, there won't be any "next" to fetch.

> However, I believe this is premature optimisation

I believe that your round-trip-time is too slow, all by itself.  We
should be aiming for 100 ms where we can achieve it.

http://www.useit.com/alertbox/response-times.html

That's the kind of responsiveness that will make users love us.  I
know of no way we can achieve that response time without pre-fetching.
So I don't think this is premature.

> - we would be better off investing in faster search generation than
> in doing the work speculatively, *at this point*.

Both of these approaches can improve the user experience.  If we do
pre-fetching, then faster search generation becomes a way to reduce
initial load time and save money by reducing hardware requirements.

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk69jIkACgkQ0F+nu1YWqI1ZWgCfeODzXWFzPoy++3CQlreUaSaq
BN0AnRFaMDxT48goMdfSiL+cUzj6R2kU
=K5b+
-----END PGP SIGNATURE-----
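
The "double the initial batch size and hide half the results" idea from this comment can be sketched roughly as follows. This is only an illustration in Python, not how Launchpad implements its listings: `DoubleBatchNavigator` and the `search(offset, limit)` callable are hypothetical names, and the default batch size of 75 is taken from the thread.

```python
class DoubleBatchNavigator:
    """Fetch two batches per round trip and keep the second one hidden."""

    def __init__(self, search, batch_size=75):
        # `search(offset, limit)` is a hypothetical callable returning a list of rows.
        self.search = search
        self.batch_size = batch_size
        self._cache = {}      # offset -> rows for the batch starting there
        self.round_trips = 0  # number of calls made to `search`

    def get(self, offset):
        """Return the batch starting at `offset`, using the cache if possible."""
        if offset not in self._cache:
            # One query for 2 x batch_size rows instead of two separate queries.
            rows = self.search(offset, 2 * self.batch_size)
            self.round_trips += 1
            self._cache[offset] = rows[:self.batch_size]
            hidden = rows[self.batch_size:]
            if hidden:
                # The hidden half becomes the "next" batch, ready without a round trip.
                self._cache[offset + self.batch_size] = hidden
        return self._cache[offset]


if __name__ == "__main__":
    data = list(range(200))
    nav = DoubleBatchNavigator(lambda offset, limit: data[offset:offset + limit])
    nav.get(0)    # one round trip, rows 0-74 shown, rows 75-149 held back
    nav.get(75)   # served from the cache
    print(nav.round_trips)
```

The first "next" click is served locally; the click after that still needs a round trip, which is the "clicking next twice in a row" case discussed above.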

Martin Pool (mbp) wrote on 2011-11-16:

#5

On 12 November 2011 02:01, Aaron Bentley <email address hidden> wrote:
> I think most of the time the user *will* be following these links. Not
> all of them, but at least the next link. I say "I think", because I
> don't have the actual data. Do you?

I barely ever look at the second page of results: if my search results
weren't good enough I will either tweak my search or more likely do
another search through my mail or through google.

I guess you could look at the access logs to see how often people
actually proceed to the next page?

If people rarely click that link there is not much point doing work or
spending cpu time to preload it.

--
Martin

Aaron Bentley (abentley) wrote on 2011-11-16:

#6

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 11-11-16 03:36 AM, Martin Pool wrote:
> On 12 November 2011 02:01, Aaron Bentley <email address hidden>
> wrote:
>> I think most of the time the user *will* be following these
>> links. Not all of them, but at least the next link. I say "I
>> think", because I don't have the actual data. Do you?
>
> I barely ever look at the second page of results: if my search
> results weren't good enough I will either tweak my search or more
> likely do another search through my mail or through google.
>
> I guess you could look at the access logs to see how often people
> actually proceed to the next page?

That's what I was thinking of when I said "actual data".

> If people rarely click that link there is not much point doing work
> or spending cpu time to preload it.

I often click the link. Anecdotes are interesting, but they don't
prove anything.

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk7D3m4ACgkQ0F+nu1YWqI3euwCeNSO9DOGwtdb8xp31781yKEaB
CNAAnR1ck2OXaG1/Jsci3MgcplPvjHKm
=Q751
-----END PGP SIGNATURE-----

Martin Pool (mbp) wrote on 2011-11-16:

#7

>> I guess you could look at the access logs to see how often people
>> actually proceed to the next page?
>
> That's what I was thinking of when I said "actual data".

I was curious so I looked at recent access data:

In the window I looked at (current logs from soybean, all instances),
there were 30370 total requests for +bugs, and 390 of those had a
start= parameter. So naively it's about 1.3%.

Many of them are Googlebot, various other bots, or the
python-launchpad-bugs pseudo-api-client (which is scraping a huge
amount of data).

With them cut out it's down to 5887 requests in the window I looked
at, and only 67 requests from humans for pages after 0, so as it
happens that's still about 1.1%.

If those numbers are correct, then the 'next' link is rarely clicked
and it will not be worth preloading it.
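
A count like the one above could be reproduced along these lines. This is a hedged Python sketch, not the script actually used: it assumes the access log has already been parsed into `(url, user_agent)` pairs, and the `BOT_MARKERS` substrings and the function names are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical user-agent substrings used to drop non-human traffic.
BOT_MARKERS = ("bot", "crawler", "spider", "python-launchpad-bugs")


def is_bot(user_agent):
    """Crude check: does the user agent contain a known bot marker?"""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def next_page_share(requests, skip_bots=True):
    """Return (total +bugs requests, requests past the first batch, share).

    `requests` is an iterable of (url, user_agent) pairs.
    """
    total = later = 0
    for url, user_agent in requests:
        parts = urlsplit(url)
        if not parts.path.endswith("/+bugs"):
            continue
        if skip_bots and is_bot(user_agent):
            continue
        total += 1
        # A missing or empty start= parameter means the first batch.
        start = parse_qs(parts.query).get("start", ["0"])[0]
        if start != "0":
            later += 1
    share = later / total if total else 0.0
    return total, later, share


if __name__ == "__main__":
    sample = [
        ("https://bugs.launchpad.net/ubuntu/+bugs", "Mozilla/5.0"),
        ("https://bugs.launchpad.net/ubuntu/+bugs?start=75", "Mozilla/5.0"),
        ("https://bugs.launchpad.net/ubuntu/+bugs?start=75", "Googlebot/2.1"),
    ]
    print(next_page_share(sample))
```

The share returned here corresponds to the 67 out of 5887 figure once bots are excluded; how bots are recognised will change the result, so the marker list deserves checking against real logs.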

Launchpad QA Bot (lpqabot) wrote on 2011-11-23:

#8

Fixed in stable r14352 <http://bazaar.launchpad.net/~launchpad-pqm/launchpad/stable/revision/14352>.

tags:	added: qa-needstesting
Changed in launchpad:
status:	In Progress → Fix Committed

Aaron Bentley (abentley) on 2011-11-23

tags:	added: qa-ok
removed: qa-needstesting

Raphaël Badin (rvb) on 2011-11-24

Changed in launchpad:
status:	Fix Committed → Fix Released

Aaron Bentley (abentley) on 2011-11-24

tags:	added: bug-columns
