[master] bzr holds whole files in memory; raises MemoryError on large files

Bug #109114 reported by Martin Pool
This bug affects 60 people
Affects: Bazaar
  Status: Confirmed    Importance: High    Assigned to: Unassigned
  Milestone: Declined for 0.90 by Robert Collins; Declined for 1.8 by Robert Collins
Affects: Breezy
  Status: Triaged    Importance: Medium    Assigned to: Unassigned

Bug Description

Bazaar reads each source file completely into memory when committing, which means files comparable in size to the machine's virtual memory can't be committed. This currently gives a MemoryError.

We also store too many copies in memory. This is being worked on by John. However, even when that is fixed, we'll still be limited by size. Fixing this bug requires some sort of fragmentation capacity when large files are detected.

Related branches

Martin Pool (mbp)
Changed in bzr:
importance: Undecided → Medium
status: Unconfirmed → Confirmed
Revision history for this message
Niclas Lindgren (niclas-lindgren) wrote :

I think the file doesn't have to be comparable to the VM size: I have a 300MB binary file and Python is using up 2GB before crashing due to running out of memory.

The reason this is a problem for us is that we have lots of graphics stored in CVS today.

bzr: ERROR: exceptions.MemoryError:

Traceback (most recent call last):
  File "C:\Program Files\PyGTK\Python\Lib\site-packages\bzrlib\commands.py", line 817, in run_bzr_catch_errors
    return run_bzr(argv)
  File "C:\Program Files\PyGTK\Python\Lib\site-packages\bzrlib\commands.py", line 779, in run_bzr
    ret = run(*run_argv)
  File "C:\Program Files\PyGTK\Python\Lib\site-packages\bzrlib\commands.py", line 477, in run_argv_aliases
    return self.run(**all_cmd_args)
  File "C:\Program Files\PyGTK\Python\Lib\site-packages\bzrlib\builtins.py", line 2283, in run
    reporter=reporter, revprops=properties)
  File "C:\Program Files\PyGTK\Python\Lib\site-packages\bzrlib\decorators.py", line 165, in write_locked
    return unbound(self, *args, **kwargs)
  File "C:\Program Files\PyGTK\Python\Lib\site-packages\bzrlib\workingtree_4.py", line 246, in commit
    result = WorkingTree3.commit(self, message, revprops, *args, **kwargs)
  File "C:\Program Files\PyGTK\Python\Lib\site-packages\bzrlib\decorators.py", line 165, in write_locked
    return unbound(self, *args, **kwargs)
  File "C:\Program Files\PyGTK\Python\Lib\site-packages\bzrlib\mutabletree.py", line 207, in commit
    revprops=revprops, *args, **kwargs)
  File "C:\Program Files\PyGTK\Python\Lib\site-packages\bzrlib\commit.py", line 300, in commit
    self._update_builder_with_changes()
  File "C:\Program Files\PyGTK\Python\Lib\site-packages\bzrlib\commit.py", line 607, in _update_builder_with_changes
    self._populate_from_inventory(specific_files)
  File "C:\Program Files\PyGTK\Python\Lib\site-packages\bzrlib\commit.py", line 679, in _populate_from_inventory
    parent_id, definitely_changed, existing_ie)
  File "C:\Program Files\PyGTK\Python\Lib\site-packages\bzrlib\commit.py", line 731, in _record_entry
    path, self.work_tree)
  File "C:\Program Files\PyGTK\Python\Lib\site-packages\bzrlib\repository.py", line 2133, in record_entry_contents
    ie.snapshot(self._new_revision_id, path, previous_entries, tree, self)
  File "C:\Program Files\PyGTK\Python\Lib\site-packages\bzrlib\inventory.py", line 438, in snapshot
    work_tree, commit_builder)
  File "C:\Program Files\PyGTK\Python\Lib\site-packages\bzrlib\inventory.py", line 453, in _snapshot_into_revision
    self._snapshot_text(previous_entries, work_tree, commit_builder)
  File "C:\Program Files\PyGTK\Python\Lib\site-packages\bzrlib\inventory.py", line 724, in _snapshot_text
    self.file_id, file_parents, get_content_byte_lines, self.text_sha1, self.text_size)
  File "C:\Program Files\PyGTK\Python\Lib\site-packages\bzrlib\repository.py", line 2180, in modified_file_text
    self._add_text_to_weave(file_id, new_lines, file_parents.keys())
  File "C:\Program Files\PyGTK\Python\Lib\site-packages\bzrlib\repository.py", line 2196, in _add_text_to_weave
    versionedfile.add_lines(self._new_revision_id, parents, new_lines)
  File "C:\Program Files\PyGTK\Python\Lib\site-packages\bzrlib\versionedfile.py", line 148...


Revision history for this message
Martin Pool (mbp) wrote : robert writes

I think we all agree we should handle these situations better.

If you have some python skills I'd be delighted to guide you through
addressing this error.

Basically I think the right approach is to:
 * try the fast path
 * catch MemoryError
 * fall back to a slower file-based approach

This will work for most cases, and will address the number of copies
problem substantially, but we may still fall down on merge, which is
somewhat trickier to reduce memory usage on.
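A minimal sketch of that fast-path/fallback shape, purely illustrative (the store methods used here are placeholders, not real bzrlib API):

    # Hypothetical sketch of "try the fast path, catch MemoryError, fall back".
    # store.add_lines_in_memory / store.add_text_from_disk are placeholder
    # names, not actual bzrlib functions.
    def add_file_text(store, file_path):
        try:
            # Fast path: read the whole file into memory at once.
            with open(file_path, 'rb') as f:
                return store.add_lines_in_memory(f.readlines())
        except MemoryError:
            # Slow path: let the store stream the file from disk in chunks
            # instead of materialising every line at once.
            return store.add_text_from_disk(file_path)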

Revision history for this message
John A Meinel (jameinel) wrote : Re: commit holds whole files in memory

I just wanted to add something to this bug.

On IRC, Treeform pointed me to a text file which was only 80MB in size, but had 2.6M lines.

When trying to commit, I let it get to about 700MB (I only have 1GB of RAM) before killing it, because it made my machine unusable.

I'll try to run it on another machine and give the final peak vmstat.

But at a minimum, we seem to have almost 10 copies of it in RAM. We've argued about needing 3 for merge, but commit should not need 10. Especially since this is a plain "bzr init; bzr add; bzr commit". I could understand having 2 copies (say it is a string and we are splitting it into a list, etc).

But for "init; add; commit" we don't even need to run diff.

Revision history for this message
Treeform (starplant) wrote :

There are many huge files like this: an SQL database, or in my case a 3D model pack. I think there should be a way to specify that binary files be treated differently, with a simple copy-type approach.

After I updated, it ran using only 350MB on my machine; before, it was around 1GB. 80MB -> 350MB is still not good. But I also did not expect it to parse a 2-million-line file; I do expect some sort of binary switch.

Revision history for this message
Aaron Bentley (abentley) wrote : Re: [Bug 109114] Re: commit holds whole files in memory

If analysis shows there's no other way, I would support that. But I
would rather find out why this is happening, and fix it for all files,
not only binary ones.

Aaron

Revision history for this message
Emilis Dambauskas (emilis-d) wrote : Re: commit holds whole files in memory

Hi,

I was trying to fix bug #109115, which is related to this one and found out a couple of things:

1. If you need to reproduce this bug (#109114), but are too lazy to wait for GB-sized files to be processed, you can limit memory for the bzr process and use MB-sized files instead. See the bash script testcase_109114.sh (attached).

2. It is interesting to note that when you add & commit one big file, bzr needs 1x the file size in memory for:
    lines = tree.get_file(ie.file_id, path).readlines()
and then ~4x more memory when calling
    self._add_text_to_weave(...)
in bzrlib/repository.py CommitBuilder.record_entry_contents().
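In the same spirit as the attached shell testcase (not reproduced here), the process's address space can be capped from Python so that MB-sized files trigger the same MemoryError; this is a sketch for Unix-like systems, and the 100 MiB cap is an arbitrary illustration value:

    import resource
    import subprocess

    # Cap the child's virtual address space to 100 MiB (arbitrary), so even a
    # modest file pushes bzr past the limit and raises MemoryError.
    LIMIT = 100 * 1024 * 1024

    def run_bzr_with_limited_memory(args):
        def set_limit():
            resource.setrlimit(resource.RLIMIT_AS, (LIMIT, LIMIT))
        # preexec_fn runs in the child process, after fork and before exec.
        return subprocess.call(['bzr'] + args, preexec_fn=set_limit)

    run_bzr_with_limited_memory(['commit', '-m', 'big file'])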

Revision history for this message
Gustavo Rahal (gustavo-grahal) wrote :

Added the same comments to bug 269125, but noticed this one explains the actual problem so wanted to record my use case here...

I tried to commit a 270MB tar.gz. I didn't get to the point of an exception, but I noticed that memory consumption shown by "top" did not stop growing (I stopped it at about 60% of 1GB total RAM).

It's probably not the most common use case, but something to consider. I personally use git to back up important documents and wanted to try bzr.

Revision history for this message
Michael Nagel (nailor) wrote :

I have been hit by this bug too. Some ~40MB binary files cause bzr to suck up 1.5 gigabytes of RAM and then die.

Adding such files should work in the first place, but a somewhat nicer message (*mentioning the file causing the trouble*, so you can ignore it as a workaround) would be really nice!

Revision history for this message
Robert Collins (lifeless) wrote : Re: [Bug 109114] Re: commit holds whole files in memory


That's a reasonable point, though in fact we should be able to use a temp file on disk to work around memory limits for this - and we have longer-term facilities being designed at the moment to handle extremely large files gracefully.

-Rob

Revision history for this message
Lionel Dricot (ploum-deactivatedaccount) wrote : Re: commit holds whole files in memory

Confirming that this is still the case in the latest version. Trying to commit a ~30MB binary file results in the attached error.

Revision history for this message
Lionel Dricot (ploum-deactivatedaccount) wrote :

I want to add that I would consider this bug extremely severe: nearly all users will be impacted one day or another, and it makes bzr unusable for a lot of projects without letting you know when you start the project. One day you have to add a big file and you realize that you can't. And then you are in deep trouble...

Revision history for this message
John A Meinel (jameinel) wrote :

Just trying to add a little bit of debugging to this issue.

1) File.readlines() has a certain amount of memory overhead versus File.read().

To generate a 'large' file, I grabbed a dump from something else, and then copied it until I had a 100MB file. This meant I had a pure text file, but it also meant that the average line width is only 21 bytes.

A 'str' object in Python has 24 bytes of Python object overhead. (Not to mention any waste due to the allocator, etc.)

So for my particular test case, I have a 108MiB file with 5.1Mi lines. Because we use 'readlines()' we end up with:
  a) a 21MiB list object (a list with 5.1Mi * 4 bytes of references); note that on 64-bit this would double to 41MiB because each ref is 8 bytes
  b) 5.1Mi * 24 bytes of string overhead, or 122MiB of Python object overhead. Again, on 64-bit a 'str' object goes up to around 40 bytes each, which would be 204.6MiB (or ~2x the size of the *content*)
  c) 108MiB of actual file content
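A rough way to repeat that accounting with nothing but the standard library (the file name is a placeholder and exact figures vary by Python version and platform):

    import sys

    # Measure how much the readlines() representation costs compared to the
    # raw file content.
    with open('big.txt', 'rb') as f:       # placeholder path to a large text file
        lines = f.readlines()

    content = sum(len(l) for l in lines)             # bytes of actual content
    strings = sum(sys.getsizeof(l) for l in lines)   # content + per-string overhead
    refs = sys.getsizeof(lines)                      # the list of references itself

    print("content %dMiB, strings %dMiB, list %dMiB"
          % (content / 2**20, strings / 2**20, refs / 2**20))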

2) KnitVersionedFile._add

   a) The first thing this function does is:
              line_bytes = ''.join(lines)
      which obviously doubles memory consumption right there. Well, in my case
      it only grows by ~50%, because a single string removes all of the
      per-string overhead.

   b) After a bit, it then checks to see if the content ends in a newline. If
      it *doesn't* it then:
            if lines[-1][-1] != '\n':
                # copy the contents of lines.
                lines = lines[:]
                options.append('no-eol')
                lines[-1] = lines[-1] + '\n'
                line_bytes += '\n'
      This creates a new 'list' object (in my case, this is 21MiB), but it
      *also* generates a new string with " += '\n'", which will again have 2
      copies in memory.
      So we now have 2 large list objects, 1 copy of the text split across many
      str objects, and 2 copies as large string objects. (Note that it will
      quickly drop, since we are replacing the original string.)

   c) We then call '_record_to_data' which does:
        bytes = ''.join(chain(
            ["version %s %d %s\n" % (key[-1],
                                     len(lines),
                                     digest)],
            dense_lines or lines,
            ["end %s\n" % key[-1]]))
      This uses 'dense_lines or lines', so we shouldn't end up with an extra
      large list, but it does mean that we have yet one-more copy of the file
      content.

So as far as I can see, there is a minimum of 3 copies of the content in memory, not
to mention a bit of overhead here and there.
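For illustration only (this is not the change that actually landed in bzrlib), the extra full-text copy in (b) could be avoided by appending the newline during a single join rather than rebuilding the joined string afterwards:

    from itertools import chain

    def join_with_final_newline(lines):
        # Decide about the missing trailing newline before joining, so the
        # full text is only ever built once.
        no_eol = bool(lines) and not lines[-1].endswith('\n')
        if no_eol:
            text = ''.join(chain(lines, ['\n']))
        else:
            text = ''.join(lines)
        return text, no_eol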

I'll poke around a bit and see if I can't get rid of some of those internal
copies. We may decide to enable that code path only when the length of the file
is large, to avoid slowing down when committing lots of small content.

description: updated
Revision history for this message
Martin Pool (mbp) wrote :

For the sake of reducing duplicates, and until at least the commit case is fixed, I'm going to mark this bug as the master and point others to it. It's true there are other operations beyond commit that can fail. Having only commit work is not much of a fix if you can't get the changes out again.

summary: - commit holds whole files in memory
+ [master] commit holds whole files in memory
summary: - [master] commit holds whole files in memory
+ [master] bzr holds whole files in memory; raises MemoryError on large
+ files
Revision history for this message
Martin Pool (mbp) wrote : Re: [master] commit holds whole files in memory

See also bug 54624: we should at least warn people that adding large files may be unintended and/or may fail.

Revision history for this message
John A Meinel (jameinel) wrote :

So the two branches I worked on should land in bzr.dev. This brings the peak memory consumption for commit into a 2a-format repository down to 1x the file size + 2x the size of the compressed bytes + whatever bzrlib etc. overhead. (Down from somewhere around 4-5x for commits into 'pack-0.92' repositories.)

Revision history for this message
Andrea Corbellini (andrea.corbellini) wrote :

Oops, I pressed the wrong button.

Changed in bzr:
assignee: nobody → Andrea Corbellini (andrea-bs)
status: Confirmed → In Progress
assignee: Andrea Corbellini (andrea-bs) → John A Meinel (jameinel)
Changed in bzr:
assignee: John A Meinel (jameinel) → nobody
status: In Progress → Triaged
Revision history for this message
Robert Collins (lifeless) wrote :

Making wishlist: there are other wishlist bugs that will have more impact on more of our users than this one.

Changed in bzr:
importance: Medium → Wishlist
Martin Pool (mbp)
Changed in bzr:
status: Triaged → Confirmed
Revision history for this message
Jan Danielsson (jan.m.danielsson) wrote :

I may want to protest that my bug report has been marked a dupe and that it now refers to this bug. Or I want someone to tell me that they are in fact two sides of the same coin. :)

In my bug the problem was related to very many small files/directories. But this bug specifically mentions: "Fixing this bug requires some sort of fragmentation capacity when large files are detected.". No large files are involved in my case.

In my original bug report I included an entire user session which could be used to replicate the problem. It entails checking out the NetBSD source code, then running "bzr init", "bzr add" and "bzr commit". If you check the source tree, you'll find no large files at all. But there are _many_ directories and files.

With this information in mind -- is the duplicate marking still valid?

Revision history for this message
Martin Pool (mbp) wrote :

Jan, you're right, that does sound different, please dedupe it. (You should have permission - otherwise tell me the bug number.)

Revision history for this message
Andrew Bennetts (spiv) wrote :

Jan's report is bug 408526. I've deduped.

Revision history for this message
ANelson (anelson) wrote :

This issue is preventing me from using bazaar at all.

My team of Windows developers has an SVN repository with ~24,000 revisions, ~50k files, and a ~25GB repository size. I'm trying to migrate it to bzr with 'bzr svn-import'. This keeps running out of memory on my 64-bit Win7 box with 12GB of RAM.

Part of the problem is that the Windows binaries are always 32-bit, so they have a maximum address space of 4GB when run under a 64-bit OS (2GB under 32-bit Windows). However, jelmer on the #bzr IRC channel suggested I report this issue under this bug, since the primary reason it's a problem is bzr's seemingly insatiable thirst for memory.

To reproduce my problem, I've created a simple SVN repository consisting of a single large (100MB) binary file and one revision to that file which extends it to 250MB. This simulates the actual revision in our production repository that we are unable to convert. If you download this repository and try to import it on a Windows box with the bzr 2.1.0 binaries installed, you will find you run out of memory. I've compressed the SVN repository files into a 200MB 7-zip archive: http://apocryph.org/stuff/repos.7z

If I were having this problem with a 1GB+ file, I'd be a bit more understanding, but supposedly bzr requires about 3x the size of a file, which in my case would mean 750MB; clearly that's not the case, since memory is still being exhausted.

This memory usage issue combined with the 32-bit binaries is preventing me from using bzr. I really don't want to be stuck in merge hell with svn, but with both Mercurial and bzr suffering the same problem, it doesn't look good for me.

Is there any way this issue can be upgraded from 'wishlist' to something higher?

Revision history for this message
ANelson (anelson) wrote :

One more data point to (hopefully) motivate you to make this a higher priority: I decided to try to migrate my production SVN repository on a 64-bit Ubuntu box with 2GB of RAM, since that's probably closer to what you guys test with.

It hasn't run out of memory yet, but it's on revision ~7,000 out of ~30,000, and the bzr process is already using 2698m of virtual memory and 1.4g resident, according to top. My box is at ~95% physical memory utilization, along with 20% of the 6GB swap file, and climbing. At this point the paging is so bad that bzr has slowed to a crawl, so I'll have to kill it, but if I didn't, I suspect it would slowly and inexorably consume ever more memory.

While my SVN repository does have some large (100-400MB) binary files, the vast majority of the files, and the commits, are either source files or small binary files like import libraries or icons. At this point I have no viable way to migrate my SVN repository to bzr, meaning I can't use it.

Somebody please help! I don't particularly like svn but I'm stuck with it until bzr or mercurial overcome this resource limitation.

Revision history for this message
Martin Pool (mbp) wrote :

Hi ANelson,

It may help move this along if you can install the Meliae memory debugger and attach a dump file to this bug showing memory usage when it's bogging down. See http://jam-bazaar.blogspot.com/2009/11/memory-debugging-with-meliae.html

tags: added: performance
Changed in bzr:
importance: Wishlist → Medium
Revision history for this message
Christian Zambrano (czambran) wrote :

Martin,

I am experiencing a similar problem and I was hoping to provide the info you requested, but when I tried to use meliae I got the following error, which I will report:

*** OverflowError: long int too large to convert to int

Revision history for this message
Martin Pool (mbp) wrote : Re: [Bug 109114] Re: [master] bzr holds whole files in memory; raises MemoryError on large files


Can you please file a separate bug about that, including a backtrace
if possible.
--
Martin <http://launchpad.net/~mbp/>

Revision history for this message
Greg (gregspecialsource) wrote :

Is this issue still not resolved after 3 years?
I am trying to add and commit a 600MB data file and getting a bzr out-of-memory error.
Running Windows 7 64-bit, 6GB RAM, bzrlib 2.1.1 from Bazaar Explorer.

Revision history for this message
Andrew Voznytsa (andrew-voznytsa) wrote :

Yes, I did not see any noticeable movement with this issue. Time to move to
git?


Revision history for this message
Joerg Lange (jcl-gmx) wrote :

I have also been waiting for a resolution of this issue for a couple of years. For me, this is still a showstopper for using Bazaar. Is it not possible to provide a workaround now and resolve this issue properly later?

However, I can completely understand that it's not a "normal" use of a VCS to commit files hundreds of megabytes in size.


Revision history for this message
Martin Pool (mbp) wrote :

Memory usage has improved quite a bit in 2.2beta, so you might like to test that. Versioning files of hundreds of MB on a machine with a few GB of RAM should now be quite feasible, but we still don't consider it a core use case. bzr is for versioning source, not ISOs.

--
Martin

Revision history for this message
Jorgen Bodde (jorgb) wrote : Re: [Bug 109114] Re: [master] bzr holds whole files in memory; raises MemoryError on large files

I do not see why a version control system should dictate what the "core usage" of a tool is. I work in forensic software development and it's very normal to attach test files that are hundreds of MBs up to a gigabyte in size if they are really needed for the software.

BZR works great and I use it with great pleasure, but I am always using it with caution because I know of the memory limitation. This should not be the case. If all files are "kept" in memory before being stored, couldn't you just use a temporary file on disk and accept the slower check-in?

Another use case I have for BZR, if it could handle large files, is versioning my personal data and media. Because BZR conveniently keeps only one .bzr folder, my personal files are not cluttered. With one bzr command the missing files can be added and the deleted files removed. I love that. And I would store the repository on a USB disk to keep a versioned backup. But indeed, some files fail because they are too big.

With regards,
- Jorgen


Revision history for this message
Robert Collins (lifeless) wrote : Re: [Bug 109114] Re: [master] bzr holds whole files in memory; raises MemoryError on large files

It's a matter of priorities. As a technical challenge we'd love to
support versioning multi-GB files; we have discussed how (sharding the
files), but it's not on the current roadmap for Canonical-sponsored
work. If someone stepped up to do it, I'm sure the core team would
happily articulate the constraints that would apply to the system -
but it's not a trivial bit of work, and there are lots of interactions
with networking and performance (of more regularly sized files) to
consider.

-Rob

Revision history for this message
Jorgen Bodde (jorgb) wrote : Re: [Bug 109114] Re: [master] bzr holds whole files in memory; raises MemoryError on large files

If you state it like that, I truly understand why it's not fixed yet. It is great that Canonical sponsors the work done on BZR, and if support for huge files is not among their priorities (right now), then so be it.

However, a lot of people pick up BZR and want to use it professionally because it looks and feels like a mature product. Then they are set back when they hit this limitation and are forced to look for other tools. As the Subversion administrator at work I would love to also introduce BZR as an alternative, but as long as this limitation is in place I can't.

So, I do understand that Canonical does not consider it a high priority to put on the roadmap, but as the tool matures and is used more and more, it needs to be addressed at some point.

In any case, all the work done on BZR is greatly appreciated by me. It is
by far the easiest and most likeable version control tool around.

With regards,
- Jorgen

Revision history for this message
Goldorak (yaneric-roussel) wrote :

Even if this bug is not a priority, there should be at least a warning in the commit dialog if a file is too large for Bazaar.

This bug affects me even though I did not want to add a large file...

* I did a commit of a large file by mistake.
* I did an uncommit.
* From that point, I was unable to do new commits (out of memory).
* pack seems to be the problem.
* After that, Bazaar refused all my attempts to recover (pack always returns out of memory).
  ** I updated from 2.1.1 to 2.2.0 -> no difference
  ** I made a new branch (not bound) at the previous revision, then pack, push -> out of memory
  ** Even worse, another developer on another branch (same parent) got this error when attempting to commit (shared repository)
  ** I took yesterday's backup (the main shared repo) and made a new bound branch in my local shared repository; I got the same error again...

It looks like I will have to:
 * take the backup for the server
 * re-init a new shared repo on my local computer
 * manually redo all of the day's commits.

Setup:
 * shared repo available from a networked Windows share
 * bound branches on local developers' computers (with a shared repo locally as well)

Revision history for this message
Goldorak (yaneric-roussel) wrote :

More information for my previous post:

In fact, after the commit of the large file, I was able to do 2 or 3 small commits. After that I was not able to do any more commits. I did uncommit 2 or 3 times until the large file was uncommitted.

I also tried to use pack --clean-obsolete-packs (out of memory).

Is there a way to put the repo back in good shape?

Revision history for this message
Martin Pool (mbp) wrote : Re: [Bug 109114] Re: [master] bzr holds whole files in memory; raises MemoryError on large files

@Goldorak, you should be able to branch into a fresh repository from
the commit just before this occurred, and that will not copy the large
files. If you have multiple branches in the repository, branch each
of them across. Once you're happy with the result, move away the
repository with the unreasonably large files and put the new one in its
place.

Revision history for this message
Vernon Cole (vernondcole) wrote :

Implementation suggestion:
  (I have been bitten by this bug twice now, and have spent some time thinking about it.)
Large files are almost never source files, but Bazaar is designed to handle only source -- and compares files as if they were source - byte by byte - for changes. Large files are usually compressed media or compressed databases which can't be 'diffed' and probably can't be compressed further. So:

1) Assign some arbitrary size value beyond which files are considered "large". This limit will be a matter of some debate and will probably have to be user-settable, though 99% of users would never touch it.

2) When a file is beyond that size, do not attempt to compress or examine it. Simply make a fingerprint or CRC of some kind, and record that, along with a (local) version of the modification datetime. If the modification time changes, check the fingerprint for a mismatch. If there is a mismatch, copy the file as-is into the .bzr tree. (Supply an option for the maximum number of copies to retain.) A sketch of this idea follows the list below.

3) Using this tool, restore the video of President Mandela explaining the meaning of the term "ubuntu" to the distribution.
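A minimal sketch of step 2, purely illustrative rather than an actual bzrlib API; the size threshold, store layout, and function names are assumptions:

    import hashlib
    import os
    import shutil

    LARGE_FILE_THRESHOLD = 50 * 1024 * 1024  # arbitrary example cutoff

    def fingerprint(path, blocksize=1024 * 1024):
        # Hash the file in fixed-size chunks so it is never fully in memory.
        h = hashlib.sha1()
        with open(path, 'rb') as f:
            for block in iter(lambda: f.read(blocksize), b''):
                h.update(block)
        return h.hexdigest()

    def record_large_file(path, store_dir):
        # Copy a "large" file verbatim into the store, keyed by its fingerprint,
        # and remember the local modification time for cheap change detection.
        digest = fingerprint(path)
        dest = os.path.join(store_dir, digest)
        if not os.path.exists(dest):      # unchanged content is stored only once
            shutil.copyfile(path, dest)
        return digest, os.path.getmtime(path)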

I suggest that this would be fairly easy for a member of the team, but very difficult for an outsider.
--
Vernon

Revision history for this message
Greg (gregspecialsource) wrote :

Any workaround implementation that allows critical (large) binary files to be stored together with source code would be a massive improvement to usability.

The alternative is to force users to store critical files in two locations as well as manage the separation of them (i.e. ignoring portions of the file set between each versioning or archiving system).

It is highly desirable for a developer to 'pull' a whole buildable file-set in one go. The kinds of projects I work on have anywhere from MBs to GBs of critical binary files.

If we lose quality compression or diff on large files, so be it; that's pretty much how the old VCSs worked anyway. That behavior can be improved in the future.

So please, (wonderful BZR developers!) make it work now, and make it great later.

Martin Pool (mbp)
Changed in bzr:
status: Confirmed → In Progress
importance: Medium → High
assignee: nobody → Martin Pool (mbp)
Revision history for this message
Martin Pool (mbp) wrote :

Shannon recently added a feature to bzr 2.5 so that bzr will not implicitly add large files (20MB by default): https://code.launchpad.net/~weyrick/bzr/54624-warn-on-large-files/+merge/70691

I'm working on one specific case in https://bugs.launchpad.net/bzr/+bug/890085

Martin Pool (mbp)
Changed in bzr:
assignee: Martin Pool (mbp) → nobody
Vincent Ladeuil (vila)
Changed in bzr:
status: In Progress → Confirmed
Revision history for this message
alod (aleksander-lodwich) wrote :

Just recently (today) I tried to add a 300 MiB binary file to bzr and it failed despite plenty of RAM. So the bug is still present in 2.6b. This is a very serious bug. What confuses me is that I didn't see all the RAM being used; I received an "out of memory" error anyway.

Dist: 2.6b1, Win 7, 32-bit, standard install package with everything in it

Jelmer Vernooij (jelmer)
Changed in brz:
status: New → Triaged
importance: Undecided → Medium