lightweight checkout slower than branch over hpss

Bug #368717 reported by James Westby
Affects: Bazaar
Status: Fix Released
Importance: Wishlist
Assigned to: Jelmer Vernooij
Milestone: 2.5b5

Bug Description

Hi,

Grabbing a lightweight checkout over hpss can now be slower than grabbing the
branch. I assume this is because "branch" has had a lot of work done on it and
streams all the data down, while "checkout --lightweight" grabs each text
individually.

Tested with lp:ubuntu-doc which is currently:

Standalone branch (format: rich-root-pack)
Location:
  branch root: sftp://bazaar.launchpad.net/%7Eubuntu-core-doc/ubuntu-doc/ubuntu-karmic/

Related branches:
  parent branch: /home/matt/ubuntu-doc/ubuntu-jaunty

Format:
       control: Meta directory format 1
        branch: Branch format 6
    repository: Packs containing knits with rich root support

Branch history:
       287 revisions
       391 days old
   first revision: Tue 2008-04-01 22:14:02 +0100
  latest revision: Tue 2009-04-28 08:30:16 +0100

Repository:
       369 revisions


Revision history for this message
James Westby (james-w) wrote :

Oh, and if I am correct about the cause I would consider it a *good* thing;
I'd just like the same love applied to lightweight checkouts so that they can
be even quicker.

Thanks,

James

Revision history for this message
Robert Collins (lifeless) wrote :

Different love is needed; lightweight checkouts have never been, and I doubt ever will be, suitable for use over high-latency connections.

Changed in bzr:
importance: Undecided → Wishlist
status: New → Confirmed
Revision history for this message
John A Meinel (jameinel) wrote :

Can you give the version of bzr you were testing? I agree with Robert that we probably won't spend a huge amount of time optimizing lightweight checkouts for high-latency connections (consider something more like stacked branches with minimal history instead).

Anyway, there *was* some work done recently by me to change the lightweight checkout fetch to batch things up a bit more.

The change was the 'group_keys_for_io' change, which was around bzr.dev @ 4039.3.7, landed in 4051, which should be in both 1.13 and 1.14.

Though reading the log file, it looks like at least some batching is going on. There are only 91 readv commands issued.
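The batching idea itself is simple. A rough Python sketch of it follows; this is not the actual group_keys_for_io code, just an illustration of grouping per-text reads into fewer readv round trips, with made-up data structures:

  # Illustration only -- not bzrlib's group_keys_for_io implementation.
  # The point: collect the (offset, length) ranges for many text keys and
  # turn them into a handful of readv requests instead of one per text.

  def batch_readv_requests(offsets_by_key, max_batch_bytes=4 * 1024 * 1024):
      """Yield batches of (key, offset, length) whose total size stays
      under max_batch_bytes, so each batch can be one readv round trip."""
      batch, batch_size = [], 0
      # Sorting by offset keeps each batch roughly contiguous in the pack file.
      for key, (offset, length) in sorted(offsets_by_key.items(),
                                          key=lambda item: item[1][0]):
          if batch and batch_size + length > max_batch_bytes:
              yield batch
              batch, batch_size = [], 0
          batch.append((key, offset, length))
          batch_size += length
      if batch:
          yield batch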

You are also in a somewhat lucky position, in that you have a lot of bandwidth, such as:
26.404 result: ('readv',)
30.315 4352629 body bytes read

^- 3.9s to download 4.15MiB == 1.06MiB/s

But fairly high latency:
26.148 hpss call w/readv: 'readv', '/~ubuntu-core-doc/ubuntu-doc/ubuntu-karmic/.bzr/repository/packs/2115d5e1ee8917c83744105928120b0f.pack'
26.149 13 bytes in readv request
26.232 result: ('readv',)
26.233 221 body bytes read

^- it takes 85ms to get a single request made.
So in that 85ms you can download almost 90KiB.

However, 91 round trips * 0.085s == 7.7s, which doesn't quite account for your 30s difference (not to mention that you are downloading far less content.)
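As a sanity check, those figures fall straight out of the quoted log lines:

  # Numbers taken from the log excerpts above.
  body_bytes = 4352629                  # "4352629 body bytes read"
  transfer_time = 30.315 - 26.404       # ~3.9s to read the response body
  bandwidth = body_bytes / transfer_time / (1024 * 1024)
  print("%.2f MiB/s" % bandwidth)       # ~1.06 MiB/s

  latency = 26.232 - 26.148             # ~85ms for a tiny readv request
  print("%.0f KiB per round trip" % (bandwidth * 1024 * latency))  # ~90 KiB

  round_trips = 91
  print("%.1f s of pure latency" % (round_trips * latency))        # ~7.6s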

Anyway, the new format should be a bit better about lightweight checkouts, especially with a patch like this:
  http://bzr.arbash-meinel.com/branches/bzr/brisbane/split_pack

(it changes the compressor logic to cluster all 'new' texts into their own set of groups)

Regardless of that branch, it would be interesting to see how the new format changes this effect.

Revision history for this message
John A Meinel (jameinel) wrote :

I should also note that the lightweight checkout seems to have to download most, if not *all*, of the remote history anyway:

$ (grep "body bytes read" light_checkout.log | sed -e 's/.*[[:space:]]\+\([[:digit:]]\+\).*/\1/' | tr '\n' '+'; echo 0 ) | bc
56879036

$ (grep "byte part read" heavy_checkout.log | sed -e 's/.*[[:space:]]\+\([[:digit:]]\+\).*/\1/' | tr '\n' '+'; echo 0 ) | bc
53278569
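The same totals can be computed without the sed/bc pipeline; an equivalent Python helper, assuming the same two log files and that the byte count immediately precedes the phrase, as in the excerpts above:

  import re

  def total_bytes(log_path, marker):
      """Sum the byte counts that precede `marker` on each matching line,
      e.g. marker='body bytes read' or 'byte part read'."""
      pattern = re.compile(r'(\d+)\s+' + re.escape(marker))
      total = 0
      with open(log_path) as log:
          for line in log:
              match = pattern.search(line)
              if match:
                  total += int(match.group(1))
      return total

  # total_bytes('light_checkout.log', 'body bytes read')  -> 56879036
  # total_bytes('heavy_checkout.log', 'byte part read')   -> 53278569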

So the 'lightweight' checkout is downloading about 3.5MB *more* than the heavy checkout. (My guess is that it is reading the remote indexes, but I'm not positive about that.)

I would guess that all the files in ~ubuntu-core-doc don't have enough history to actually have a fulltext in their delta chain (it can take up to 200 revisions).

One possible optimization would be a specific RPC for 'iter_files_bytes()', which is the API that gets the actual file content during checkout. A checkout is 93MB, and a tar.gz of just the checkout is 21MB. So I guess a theoretically optimal fetch could download ~21MB instead of 50MB.
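A minimal sketch of what the client side of such an RPC could look like; the verb name, the client.call() interface, and fetch_checkout_texts are all made up here to show the shape of the idea, not an existing bzrlib API:

  # Hypothetical sketch only -- no such RPC exists in bzrlib.
  def fetch_checkout_texts(client, repo_path, desired_files):
      """Ask the server for the full text of every file needed for a
      lightweight checkout in one request/response pair.

      client: assumed to expose call(verb, *args) returning an iterable.
      desired_files: list of (file_id, revision_id) pairs to materialize.
      """
      # Send every requested key in one body so the server can plan a
      # single pass over its pack files.
      body = '\n'.join('%s %s' % key for key in desired_files)
      # One round trip; the server streams back just the fulltexts,
      # ideally closer to the ~21MB tar.gz size than the ~50MB of
      # history fetched today.
      for file_id, content in client.call('Repository.iter_files_bytes',
                                          repo_path, body):
          yield file_id, content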

However, if the best we could possibly do is only 2x faster than getting everything, I'm not sure it is worth spending a lot of time trying to optimize for the lightweight version.

Note that other codebases would see different results; I think for 'python' it was 20MB versus 200MB for a lightweight checkout versus whole history, which is a more interesting case.

Jelmer Vernooij (jelmer)
Changed in bzr:
assignee: nobody → Jelmer Vernooij (jelmer)
Vincent Ladeuil (vila)
Changed in bzr:
milestone: none → 2.5b5
status: Confirmed → Fix Released