autopacking should be optional
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Bazaar | Invalid | Undecided | Unassigned |
Bug Description
Autopacking is a sensible default, but is not suitable for all use cases.
I maintain a pack-0.92 repository which stores revision data for a number of unrelated projects. One of the projects, which is primarily composed of a large number of binary files, has accumulated a revision history of about 2GB. Consequently, autopacking can result in moving large amounts of data around at unpredictable times. The situation is made worse by the fact that the repository is located on a file server with limited bandwidth; when the autopack operation kicks in, it can take 5 minutes to do a commit.
I would very much like to be able to disable autopacking, and run "bzr pack" as part of a regular maintenance script.
As a side note, I noticed that the 2GB autopack operation can kick in even when touching one of the smaller projects. My intuition may be wrong here, but that doesn't really feel right to me... is there any benefit to be had from packing unrelated projects into the same glob?
Changed in bzr:
status: New → Invalid
On Sat, 2008-01-19 at 01:13 +0000, Paul Pelzl wrote:
> Public bug reported:
>
> Autopacking is a sensible default, but is not suitable for all use
> cases.
>
> I maintain a pack-0.92 repository which stores revision data for a
> number of unrelated projects. One of the projects, which is primarily
> composed of a large number of binary files, has accumulated a revision
> history of about 2GB. Consequently, autopacking can result in moving
> large amounts of data around at unpredictable times. The situation is
> made worse by the fact that the repository is located on a file server
> with limited bandwidth; when the autopack operation kicks in, it can
> take 5 minutes to do a commit.
This happens at exponentially backed-off intervals based on the number of
revisions in a pack; taking pack size into account is something I considered,
but I couldn't find a satisfactory model for the various tradeoffs.
An important thing to note is that the 2GB of history will /not/ be
moved by many autopacks.
Specifically, your 2GB of data will be moved once from a single pack to
a 10-pack, then once from a 10-pack to a 100-pack, and once from a
100-pack to a 1000-pack. (In other words, any one piece of data is moved
roughly log10(commits since it was introduced) times.)
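If I understand the scheme correctly, Bazaar aims for one pack per decimal digit of the revision count, with each pack sized by that digit's place value; this is a sketch of that documented model, not code from bzrlib itself:

```python
def target_pack_distribution(total_revisions):
    """Return the pack sizes Bazaar's autopack aims for: one pack per
    decimal digit of the revision count, sized by that digit's place
    value.  A sketch of the documented model, not bzrlib's actual code."""
    sizes = []
    place = 1
    while total_revisions > 0:
        digit = total_revisions % 10
        # `digit` packs of `place` revisions each, largest packs first.
        sizes = [place] * digit + sizes
        total_revisions //= 10
        place *= 10
    return sizes

# 2432 revisions -> 2+4+3+2 = 11 packs:
print(target_pack_distribution(2432))
# [1000, 1000, 100, 100, 100, 100, 10, 10, 10, 1, 1]
```

Under this layout a revision only migrates when its pack is promoted to the next place value, which is why any one piece of data moves only about log10(commits) times.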
5 minutes seems like a long time to do the autopack of a smaller amount
of data; what protocol are you using? If you are using bzr+ssh the
autopack will occur on the server itself with no network bandwidth use.
~/.bzr.log will have details on which packs were combined by autopack.
> I would very much like to be able to disable autopacking, and run "bzr
> pack" as part of a regular maintenance script.
I have no objection to this. You could run bzr pack as part of a regular
maintenance script today: autopack will then do nothing, because the
repository is already more tightly packed than autopack aims for.
> As a side note, I noticed that the 2GB autopack operation can kick in
> even when touching one of the smaller projects. My intuition may be
> wrong here, but that doesn't really feel right to me... is there any
> benefit to be had from packing unrelated projects into the same glob?
Yes. You can't tell what's in a pack without reading the index, so if you
had (say) 16000 projects you would have to read up to 16000 indices to
find a given revision. Detecting 'unrelated' projects requires total-history
analysis, so it's much more I/O-expensive to keep projects separate, and
there is little benefit in having them separate, as access within any
given single index is efficient: fewer indices are better than more,
regardless of project count.
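The index-count argument can be put numerically. Assuming the digit-per-pack layout, the number of pack indices is the digit sum of the total revision count, regardless of how many projects share the repository (a sketch of the argument, not bzrlib code):

```python
def max_pack_indices(total_revisions):
    """Under the one-pack-per-decimal-digit model, the number of pack
    indices equals the digit sum of the revision count -- at most
    9 per decimal place, however many projects share the repository.
    An illustration of the scaling argument, not bzrlib's index code."""
    return sum(int(d) for d in str(total_revisions))

# 16000 projects of ~100 revisions each, combined in one repository:
print(max_pack_indices(16000 * 100))  # 7 indices for 1,600,000 revisions
# versus up to 16000 separate indices if every project had its own pack.
```

So a lookup in the combined repository touches a handful of indices, while per-project packs would scale the worst case with the project count.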
-Rob
--
GPG key available at: <http://www.robertcollins.net/keys.txt>.