bzr branch on large projects require vast amounts of memory

Bug #408531 reported by Jan Danielsson
This bug affects 2 people
Affects | Status       | Importance | Assigned to   | Milestone
Bazaar  | Fix Released | Medium     | John A Meinel | 2.1.0b4

Bug Description

First, see bug 408526. This is the same system, and the following commands were run just after the commit described there completed successfully:

$ cd ~/bazaar
$ ulimit -d
524288
$ bzr branch netbsd-5.0 mybranch
bzr: ERROR: exceptions.MemoryError:

Traceback (most recent call last):
  File "/usr/pkg/lib/python2.5/site-packages/bzrlib/commands.py", line 729, in exception_to_return_code
    return the_callable(*args, **kwargs)
  File "/usr/pkg/lib/python2.5/site-packages/bzrlib/commands.py", line 924, in run_bzr
    ret = run(*run_argv)
  File "/usr/pkg/lib/python2.5/site-packages/bzrlib/commands.py", line 560, in run_argv_aliases
    return self.run(**all_cmd_args)
  File "/usr/pkg/lib/python2.5/site-packages/bzrlib/builtins.py", line 1147, in run
    source_branch=br_from)
  File "/usr/pkg/lib/python2.5/site-packages/bzrlib/bzrdir.py", line 1178, in sprout
    result_repo.fetch(source_repository, fetch_spec=fetch_spec)
  File "/usr/pkg/lib/python2.5/site-packages/bzrlib/repository.py", line 1553, in fetch
    find_ghosts=find_ghosts, fetch_spec=fetch_spec)
  File "/usr/pkg/lib/python2.5/site-packages/bzrlib/decorators.py", line 192, in write_locked
    result = unbound(self, *args, **kwargs)
  File "/usr/pkg/lib/python2.5/site-packages/bzrlib/repository.py", line 3139, in fetch
    pb=pb, find_ghosts=find_ghosts)
  File "/usr/pkg/lib/python2.5/site-packages/bzrlib/fetch.py", line 82, in __init__
    self.__fetch()
  File "/usr/pkg/lib/python2.5/site-packages/bzrlib/fetch.py", line 108, in __fetch
    self._fetch_everything_for_search(search)
  File "/usr/pkg/lib/python2.5/site-packages/bzrlib/fetch.py", line 136, in _fetch_everything_for_search
    stream, from_format, [])
  File "/usr/pkg/lib/python2.5/site-packages/bzrlib/repository.py", line 4047, in insert_stream
    return self._locked_insert_stream(stream, src_format, is_resume)
  File "/usr/pkg/lib/python2.5/site-packages/bzrlib/repository.py", line 4089, in _locked_insert_stream
    self.target_repo.chk_bytes.insert_record_stream(substream)
  File "/usr/pkg/lib/python2.5/site-packages/bzrlib/groupcompress.py", line 1369, in insert_record_stream
    for _ in self._insert_record_stream(stream, random_id=False):
  File "/usr/pkg/lib/python2.5/site-packages/bzrlib/groupcompress.py", line 1423, in _insert_record_stream
    for record in stream:
  File "/usr/pkg/lib/python2.5/site-packages/bzrlib/repofmt/groupcompress_repo.py", line 932, in _filter_id_to_entry
    self._chk_id_roots, uninteresting_root_keys):
  File "/usr/pkg/lib/python2.5/site-packages/bzrlib/chk_map.py", line 1440, in iter_interesting_nodes
    bytes = record.get_bytes_as('fulltext')
  File "/usr/pkg/lib/python2.5/site-packages/bzrlib/groupcompress.py", line 419, in get_bytes_as
    self._manager._prepare_for_extract()
  File "/usr/pkg/lib/python2.5/site-packages/bzrlib/groupcompress.py", line 512, in _prepare_for_extract
    self._block._ensure_content(self._last_byte)
  File "/usr/pkg/lib/python2.5/site-packages/bzrlib/groupcompress.py", line 156, in _ensure_content
    self._z_content, num_bytes + _ZLIB_DECOMP_WINDOW)
MemoryError

bzr 1.16.1 on python 2.5.4 (netbsd4)
arguments: ['/usr/pkg/bin/bzr', 'branch', 'netbsd-5.0', 'mybranch']
encoding: '646', fsenc: '646', lang: None
plugins:
  bzrtools /usr/pkg/lib/python2.5/site-packages/bzrlib/plugins/bzrtools [1.16]
  launchpad /usr/pkg/lib/python2.5/site-packages/bzrlib/plugins/launchpad [1.16.1]
  netrc_credential_store /usr/pkg/lib/python2.5/site-packages/bzrlib/plugins/netrc_credential_store [1.16.1]
*** Bazaar has encountered an internal error.
    Please report a bug at https://bugs.launchpad.net/bzr/+filebug
    including this traceback, and a description of what you
    were doing when the error occurred.
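The last frames of the traceback show the MemoryError being raised in groupcompress's `_ensure_content`, while it zlib-decompresses a block's `_z_content` up to `num_bytes + _ZLIB_DECOMP_WINDOW` bytes. The general idea of decompressing only a bounded prefix of a zlib stream can be sketched with Python's standard `zlib` module. This is an illustration under stated assumptions, not bzrlib's code: `extract_prefix` and `DECOMP_WINDOW` are hypothetical names, and the window size is a guess. Bounding the output also cannot help when the requested prefix, or the compressed block itself, is already larger than the memory available.

```python
import zlib

# Hypothetical window size: the real value of bzrlib's _ZLIB_DECOMP_WINDOW
# is not shown in this report.
DECOMP_WINDOW = 32 * 1024


def extract_prefix(z_content: bytes, num_bytes: int, window: int = DECOMP_WINDOW) -> bytes:
    """Return the first num_bytes of the zlib stream z_content.

    Each decompress() call is capped at num_bytes + window bytes of output,
    so the decompressed buffer stays close to what the caller needs instead
    of growing to the size of the whole block.
    """
    d = zlib.decompressobj()
    out = d.decompress(z_content, num_bytes + window)
    # If the output cap was hit before num_bytes were produced, keep feeding
    # the compressed input that zlib left in unconsumed_tail.
    while len(out) < num_bytes and d.unconsumed_tail:
        out += d.decompress(d.unconsumed_tail, num_bytes + window - len(out))
    return out[:num_bytes]
```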

If it's running out of memory, it's using more than ~512MB RAM(!).
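For reference, `ulimit -d` in sh/bash reports the data-segment limit in 1024-byte units, so 524288 corresponds to 512 MiB, consistent with the ~512MB figure above. A minimal sketch, using only Python's Unix-only `resource` module, for checking the same limit from inside a process; `kib_to_mib` and `data_limit_mib` are hypothetical helper names, and which allocations actually count against `RLIMIT_DATA` varies by operating system.

```python
import resource


def kib_to_mib(kib: int) -> float:
    """Convert a `ulimit -d` value (in KiB) to MiB."""
    return kib / 1024


def data_limit_mib():
    """Return this process's soft RLIMIT_DATA in MiB, or None if unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_DATA)  # values are in bytes
    if soft == resource.RLIM_INFINITY:
        return None
    return soft / (1024 * 1024)


print(kib_to_mib(524288))  # 512.0
print(data_limit_mib())
```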

Tags: memory
Robert Collins (lifeless) wrote : Re: [Bug 408531] [NEW] bzr branch on large projects require vast amounts of memory

How many paths are in tree?
How many commits?
What's the largest file size?

-Rob

John A Meinel (jameinel) wrote :

Robert Collins wrote:
> How many paths are in tree?
> How many commits?
> What's the largest file size?
>
> -Rob
>

On IRC he was saying this was the initial commit of the NetBSD tree. So
after doing "cvs co ...", he ran "bzr commit" and then push/pull.

It is related to a couple of other bugs where he was:

1) Unable to commit w/ less than 512MB of memory (ulimit 512MB)
2) Unable to branch w/ 512MB of memory.

So for whatever reason, 'bzr branch' was taking more memory than 'bzr
commit'.
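One way to check whether `bzr branch` really peaks higher than `bzr commit` is to run each command as a child process and read that child's own peak resident set size. A minimal sketch using only the Python standard library (Python 3.9+, Unix only); `run_and_measure` is a hypothetical helper. `ru_maxrss` is reported in kilobytes on Linux and in bytes on macOS, so check getrusage(2) on other systems, and note that resident memory is not exactly what `ulimit -d` caps.

```python
import os
import sys


def run_and_measure(argv):
    """Spawn argv, wait for it, and return (exit_code, ru_maxrss) for that child only.

    os.wait4 returns the resource usage of the specific child it reaped,
    so measurements of separate commands do not mix.
    """
    pid = os.posix_spawnp(argv[0], argv, os.environ)
    _pid, status, rusage = os.wait4(pid, 0)
    return os.waitstatus_to_exitcode(status), rusage.ru_maxrss


if __name__ == "__main__":
    # Hypothetical usage on the tree from this report:
    #   run_and_measure(["bzr", "branch", "netbsd-5.0", "mybranch"])
    code, peak = run_and_measure([sys.executable, "-c", "x = bytearray(50 * 1024 * 1024)"])
    print(code, peak)
```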

John
=:->


Jan Danielsson (jan.m.danielsson) wrote :

Robert,

See bug 408526 -- everything I did is logged there. If you run the relevant commands to export the netbsd sources (as documented in that bug report), you'll get exactly what I was using.

But the quick answers are:
$ find . -type d | wc -l
    9336
One commit.
I don't believe there are any abnormally large files in the repository.
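Robert asked for the number of paths, whereas `find . -type d | wc -l` counts only directories (including `.` itself). A minimal sketch, standard library only, that reports directory and file counts plus the largest file in a single pass. `tree_stats` is a hypothetical helper, and skipping `.bzr` and `CVS` metadata directories is an assumption about what should not be counted; unlike the `find` command, the root directory itself is not included in the directory count.

```python
import os


def tree_stats(root, skip=(".bzr", "CVS")):
    """Return (n_dirs, n_files, (largest_size, largest_path)) for the tree under root."""
    n_dirs = n_files = 0
    largest = (0, None)
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune version-control metadata so os.walk does not descend into it.
        dirnames[:] = [d for d in dirnames if d not in skip]
        n_dirs += len(dirnames)
        for name in filenames:
            path = os.path.join(dirpath, name)
            n_files += 1
            # lstat: report a symlink's own size rather than following it.
            size = os.lstat(path).st_size
            if size > largest[0]:
                largest = (size, path)
    return n_dirs, n_files, largest
```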

Jan Danielsson (jan.m.danielsson) wrote :

I don't believe there _are_ any abnormally ...

Jan Danielsson (jan.m.danielsson) wrote :

I tried updating to bzr 1.17, and there's no change.

Jan Danielsson (jan.m.danielsson) wrote :

I've noticed that there are quite a few bug reports about memory usage, and a few of them get marked as dupes, referring to a bug about large files being read into memory.

I'm fairly certain that's not a problem in my case. There are no abnormally large files involved. But there are many files and many subdirectories.

Andrew Bennetts (spiv) wrote :

If there are no large files, then I think 2.1.0b3 will help. Hopefully it will approximately halve the memory bzr uses for you. Can you try it out and report the results?

John A Meinel (jameinel) wrote :

If this is specifically about large-memory consumption during "bzr branch", this has, indeed, been addressed in bzr-2.1.0b2 (and thus b3 as well).

*A* problem with a bug like "require lots of memory" is that there isn't a clear point when the bug can be considered closed. The old code didn't really grow without bounds, it just had a high bound (say 1GB to branch a Launchpad branch), and the new code has a lower bound (approx 512MB now).

That doesn't let us do the work in, say, 128MB, but it is *better*. It is a bit hard to give an explicit memory bound for an operation. Lower is always better, but I certainly think that if you are doing an operation on a large amount of data, it is reasonable to expect it to consume more resources (memory, CPU time, etc.).

I don't think there are many remaining "easy-to-trim" memory consumption changes to be made at this point. I'm sure more could be done, but we are certainly into the "effort-vs-benefit" level.

I'm tempted to mark this as fix released in bzr-2.1.0b2 and open a new bug if we want to continue the discussion.

Martin Pool (mbp) wrote : Re: [Bug 408531] Re: bzr branch on large projects require vast amounts of memory

2009/11/18 John A Meinel <email address hidden>:
> *A* problem with a bug like "require lots of memory" is that there isn't
> a clear point when the bug can be considered closed. The old code didn't
> really grow without bounds, it just had a high bound (say 1GB to branch
> a Launchpad branch), and the new code has a lower bound (approx 512MB
> now).

Right.

> I'm tempted to mark this as fix released in bzr-2.1.0b2 and open a new
> bug if we want to continue the discussion.

OK with me.

That has another possibly beneficial effect: you can see if anyone
using >2.1b2 *actually* complains about memory usage, or how many
people do. If nobody, then though it may not be the smallest it could
be, it would seem a low priority.

--
Martin <http://launchpad.net/~mbp/>

John A Meinel (jameinel) wrote :

Technically fixed in 2.1.0b2, but it isn't worth re-opening the milestone for just this.

Changed in bzr:
assignee: nobody → John A Meinel (jameinel)
importance: Undecided → Medium
milestone: none → 2.1.0b4
status: New → Fix Released