upgrade from <RepositoryFormatKnit1> to the newest format eats all my ram

Bug #256757 reported by Ignas Mikalajūnas
Affects: Bazaar
Status: Fix Released
Importance: Critical
Assigned to: John A Meinel

Bug Description

I got the message:

Format <RepositoryFormatKnit1> for file:///home/ignas/src/schooltool/.bzr/ is deprecated - please use 'bzr upgrade' to get better performance
Repository checkout (format: dirstate-tags)

when running bzr info. So I followed the advice and ran:

bzr upgrade

which ate 1.5 GB of memory and got killed by the kernel:

Format <RepositoryFormatKnit1> for file:///home/ignas/src/schooltool/.bzr/ is deprecated - please use 'bzr upgrade' to get better performance
starting upgrade of file:///home/ignas/src/schooltool/
making backup of tree history
file:///home/ignas/src/schooltool/.bzr has been backed up to file:///home/ignas/src/schooltool/backup.bzr
if conversion fails, you can move this directory back to .bzr
if it succeeds, you can remove this directory if you wish
starting repository conversion
\ [================================================ ] Copying content into repository. 3/4Killed

Is there a way to at least disable the warning that I am getting after every single command?

Changed in bzr:
importance: Undecided → Critical
Ignas Mikalajūnas (ignas) wrote :

du -sh .bzr
155M

shared repository contains branches of:

https://code.launchpad.net/~schooltool-owners/schooltool/schooltool
https://code.launchpad.net/~schooltool-owners/schooltool/schooltool.lyceum.journal

mostly.

bzr info on schooltool/schooltool gives:

Repository checkout (format: dirstate-tags)
Location:
  repository checkout root: .
        checkout of branch: bzr+ssh://schooltool.org/var/local/bzr/schooltool/schooltool/trunk/
         shared repository: /home/ignas/src/schooltool

Related branches:
    push branch:
  parent branch: /home/ignas/st.bzr/trunk/schooltool
  submit branch: /home/ignas/src/schooltool/schooltool_buildout

Format:
       control: Meta directory format 1
  working tree: Working tree format 4
        branch: Branch format 6
    repository: Knit repository format 1
Server does not understand Bazaar network protocol 3, reconnecting. (Upgrade the server to avoid this.)

In the working tree:
      1276 unchanged
         0 modified
         0 added
         0 removed
         0 renamed
        15 unknown
       453 ignored
       285 versioned subdirectories

Branch history:
      2433 revisions
        20 committers
      1809 days old
   first revision: Fri 2003-09-05 10:36:15 +0000
  latest revision: Thu 2008-07-24 20:08:31 +0300

Repository:
     13042 revisions

shared repository is in ~/src/schooltool/

bzr branch ~/src/schooltool/schooltool ~/src/st_upgrade

eats all my memory too.

Ignas Mikalajūnas (ignas) wrote :

ok:

bzr branch ~/src/schooltool/schooltool ~/src/st_upgrade

apparently works, but uses up ~700 MB of RAM.

bzr info -v on the newly created branch gives:

Standalone tree (format: dirstate-tags)
Location:
  branch root: .

Related branches:
  parent branch: /home/ignas/src/schooltool/schooltool

Format:
       control: Meta directory format 1
  working tree: Working tree format 4
        branch: Branch format 6
    repository: Knit repository format 1

In the working tree:
      1276 unchanged
         0 modified
         0 added
         0 removed
         0 renamed
         0 unknown
         0 ignored
       285 versioned subdirectories

Branch history:
      2433 revisions
        20 committers
      1809 days old
   first revision: Fri 2003-09-05 10:36:15 +0000
  latest revision: Thu 2008-07-24 20:08:31 +0300

Repository:
      6409 revisions

My guess is that upgrading a shared repository of twice the size uses at least twice as much memory...
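
A rough check of this guess, as a minimal sketch: it assumes peak memory scales linearly with the number of revisions in the repository, which nothing in this report confirms. The function name and the linear model are illustrative only; the figures are the ones quoted above.

```python
# Figures quoted in this report; linear scaling is an assumption, not a measurement.
BRANCH_REVISIONS = 6409     # revisions in the standalone branch
BRANCH_PEAK_MB = 700        # approximate peak RAM seen for 'bzr branch'
SHARED_REVISIONS = 13042    # revisions in the shared repository


def estimate_peak_mb(revisions, ref_revisions=BRANCH_REVISIONS,
                     ref_peak_mb=BRANCH_PEAK_MB):
    """Scale the observed peak by revision count (hypothetical linear model)."""
    return ref_peak_mb * revisions / ref_revisions


estimate = estimate_peak_mb(SHARED_REVISIONS)
print("estimated peak for the shared repository: %.0f MB" % estimate)
```

Under that assumption the estimate comes out at roughly 1.4 GB, which is in the same range as the 1.5 GB at which the upgrade was killed.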

John A Meinel (jameinel) wrote :

Odd. I just accidentally branched both branches into a pack repository (effectively upgrading them), and I never really saw memory consumption above ~200MB.

Also, I'm a bit surprised about the 155MB number, as if I do both 'schooltool' and 'schooltool.lyceum.journal' I see more like 326 MB.

It *did* take a while to do the copying knit => pack (approx 30min for each branch.)

I'll try again, though, with a proper knit repository.

John A Meinel (jameinel) wrote :

So doing a bit more with the local repositories.

Branching pack => knit took ~5min and consumed 1GB of RAM (doing only the 'schooltool' branch.) When finished the knit repo was only 77MB.

Branching knit => pack (from the small repo) took 700MB of RAM, and the result repo was 169MB.

Something really strange is happening here, since 'du --apparent' on the knit repo is only 35MB, I would expect the pack repository to be on that order, not 2x the size of the knit repository.

On the same repo, doing "bzr upgrade" peaked at XXXXMB of ram, took XXXX, and the result was

When I used ^\ (Ctrl-\) to break in and see what was happening, it was doing:

            elif record.storage_kind == 'fulltext':
                self.add_lines(record.key, parents,
                    split_lines(record.get_bytes_as('fulltext')))

The specific issue is that the source repo was reading texts as lines, putting them together as a single large string, and then splitting them again into lines. Which is.... unfortunate.
At a minimum, it causes a 3x bloat for any given text. What worries me is that it seems to be caching all texts at the same time while doing this (hence the 700 MB number).

I'll try to dig a bit deeper to see where exactly the memory is being cached.
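
The 3x figure can be illustrated with a small sketch. This is not the bzrlib code: split_lines here is a local stand-in, and the helper names and the character-count accounting are assumptions made for the example. It only shows that a text which already exists as a list of lines ends up held three times when it is joined and split again.

```python
def split_lines(text):
    """Local stand-in for a line splitter that keeps line endings."""
    return text.splitlines(True)


def roundtrip_via_fulltext(lines):
    """Mimic the wasteful path: lines -> one big string -> lines again.

    Returns the re-split lines and the number of characters alive at the
    peak (source lines + joined fulltext + re-split lines).
    """
    fulltext = "".join(lines)          # second copy of the content
    resplit = split_lines(fulltext)    # third copy of the content
    peak_chars = (sum(len(l) for l in lines) + len(fulltext)
                  + sum(len(l) for l in resplit))
    return resplit, peak_chars


def pass_lines_through(lines):
    """Cheaper path: hand the existing list of lines on without copying."""
    return lines, sum(len(l) for l in lines)


source = ["line %d\n" % i for i in range(1000)]
resplit, wasteful_peak = roundtrip_via_fulltext(source)
same, cheap_peak = pass_lines_through(source)
print(wasteful_peak // cheap_peak)  # ratio of characters held at the peak
```

The round trip returns the same lines it started with, so the two extra copies buy nothing.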

Changed in bzr:
status: New → Triaged
John A Meinel (jameinel) wrote :

Sorry, I forgot to fill in my XXX: it took 700MB of RAM and 3 minutes, and the final repository was 169MB.

So the results are very similar to "bzr branch knit_repo pack_repo" which is what I expected as it should be using the same logic.

John A Meinel (jameinel) wrote :

I've decided to block 1.6-final on fixing this bug.
I'm not strictly concerned about the memory consumption (it is a problem, but not the most serious one). The general problem is the knit => pack fetching code is highly unoptimized, and the resultant pack repository is grossly bloated. And 'bzr pack' doesn't know (yet) how to re-optimize at the text-level. So you end up with a very sub-optimal result, and no way to improve it.

Changed in bzr:
assignee: nobody → jameinel
milestone: none → 1.6
John A Meinel (jameinel) wrote :

The attached patch seems to correct a logic bug. Specifically, the variable is "fetch_uses_deltas", but it is passed as the "include_delta_closure" parameter, which actually means "transmit all history and use full-texts". That seems to be the opposite of what "fetch_uses_deltas" means.

With the 'not' fix, memory consumption drops to 100MB. (Instead of 700+MB).
However, this only applies to bzr.dev, so it shouldn't actually be a regression in 1.6.
Also, the target repository is still grossly bloated (140MB).
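
The inverted flag can be sketched as follows. This is a toy stand-in, not the bzrlib implementation: the record kinds are simplified labels, and the fetch() helper with its 'fixed' switch is hypothetical. Only the parameter names fetch_uses_deltas and include_delta_closure come from the description above.

```python
def get_record_stream(keys, ordering, include_delta_closure):
    """Toy record stream (simplified; real records carry content).

    include_delta_closure=True means every record must be usable as a
    fulltext; False lets stored deltas be sent as they are.
    """
    kind = "fulltext" if include_delta_closure else "delta"
    return [(key, kind) for key in keys]


def fetch(keys, fetch_uses_deltas, fixed=True):
    """Request records; 'fixed' toggles the inversion described above."""
    if fixed:
        closure = not fetch_uses_deltas   # deltas wanted, so no fulltext closure
    else:
        closure = fetch_uses_deltas       # buggy: flag passed straight through
    return get_record_stream(keys, "unordered", closure)


print(fetch(["rev-1", "rev-2"], fetch_uses_deltas=True, fixed=False))
print(fetch(["rev-1", "rev-2"], fetch_uses_deltas=True, fixed=True))
```

In this model the unfixed call asks for fulltexts exactly when deltas were wanted, which matches the memory behaviour reported here.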

If I change the fetch order for pack_repo to also be "topological" then I get the expected target size of 35MB. I'm not as convinced that is the right fix, but it does make the fetch correct.
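
Why fetch order can change the target size is easier to see in a toy model. The sketch below assumes the target can store a text as a delta only if its basis (parent) text has already been inserted, and otherwise stores a fulltext. The sizes, key names, and helper functions are made up for illustration and are not taken from bzrlib.

```python
def topological_order(parents):
    """Return keys so that every parent precedes its children."""
    ordered, seen = [], set()

    def visit(key):
        if key in seen:
            return
        seen.add(key)
        for parent in parents[key]:
            visit(parent)
        ordered.append(key)

    for key in sorted(parents):
        visit(key)
    return ordered


def stored_size(order, parents, fulltext_size=100, delta_size=5):
    """Total size stored when a delta needs its basis already present."""
    present, total = set(), 0
    for key in order:
        basis = parents[key]
        if basis and all(p in present for p in basis):
            total += delta_size       # basis is available: store a delta
        else:
            total += fulltext_size    # no usable basis yet: store a fulltext
        present.add(key)
    return total


# A linear history r1 <- r2 <- r3 <- r4; each key maps to its parent list.
parents = {"r1": [], "r2": ["r1"], "r3": ["r2"], "r4": ["r3"]}
newest_first = ["r4", "r3", "r2", "r1"]

print(stored_size(newest_first, parents))
print(stored_size(topological_order(parents), parents))
```

With parents inserted first, only the root is stored as a fulltext; in the newest-first order every text is.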

Ignas, can you give us the 'bzr --version' you are using?

John A Meinel (jameinel) wrote :

This *does* affect bzr.1.6 and I have a better fix I'll be publishing.

Changed in bzr:
status: Triaged → Fix Committed
John A Meinel (jameinel) wrote :
Robert Collins (lifeless) wrote : Re: [Bug 256757] Re: upgrade from <RepositoryFormatKnit1> to the newest format eats all my ram

On Mon, 2008-08-18 at 18:35 +0000, John A Meinel wrote:
> The attached patch seems to correct a logic bug. Specifically the
> variable is "fetch_uses_deltas" but the variable passed is
> "include_delta_closure". Which actually means "and transmit all history
> and use full-texts". Which "fetch_uses_deltas" seems to have the
> opposite meaning.
>
> With the 'not' fix, memory consumption drops to 100MB. (Instead of 700+MB).
> However, this only applies to bzr.dev, so it shouldn't actually be a regression in 1.6.
> Also, the target repository is still grossly bloated (140MB).
>
> If I change the fetch order for pack_repo to also be "topological" then
> I get the expected target size of 35MB. I'm not as convinced that is the
> right fix, but it does make the fetch correct.

VF shouldn't be converting texts to full-text willy-nilly. It can't with
the uses_deltas set appropriately anyway; it has to have the basis text
locally to be able to convert via a fulltext.

-Rob
--
GPG key available at: <http://www.robertcollins.net/keys.txt>.

John A Meinel (jameinel) wrote : Re: [Bug 256757] Re: upgrade from <RepositoryFormatKnit1> to the newest format eats all my ram

Robert Collins wrote:
> VF shouldn't be converting texts to full-text willy-nilly. It can't with
> the uses_deltas set appropriately anyway; it has to have the basis text
> locally to be able to convert via a fulltext.
>
> -Rob

So the specific issue was that I wasn't setting the _use_deltas for all
objects. I was setting it for .texts, but I missed the .inventories. And, as
it turns out with this repo, the .inventories was bloating more than the rest.
 (This repo is about 6k small files, so a large inventory causes more bloat
than extra file fulltexts.)

Which is why my final fix set it to 'unordered'. (Rather than 'unsorted',
because get_record_stream() isn't supposed to know what 'unsorted' means. :)

John
=:->

Ignas Mikalajūnas (ignas) wrote :

Bzr 1.6rc4 from http://bazaar.launchpad.net/%7Ebzr/bzr/bzr.1.6/

works fine for me now. I have upgraded the shared repository successfully and it takes up even less space than it did before the upgrade.

Bzr trunk (http://bazaar-ng.org/bzr/bzr.dev/) still can't do the migration, so the problem is still there...

John A Meinel (jameinel) wrote :

Correct, I'm planning on merging in the fix from the 1.6 release branch into bzr.dev, and just haven't gotten there yet. It should happen sometime early today.

John A Meinel (jameinel)
Changed in bzr:
status: Fix Committed → Fix Released