Memory resource consumption issues prevent branch of very large svn repositories

Bug #191731 reported by Rexbard
This bug affects 16 people
Affects                    Status   Importance  Assigned to  Milestone
Bazaar Subversion Plugin   Triaged  High        Unassigned
bzr-svn (Ubuntu)           Triaged  Medium      Unassigned

Bug Description

I'm attempting to use bzr-svn to branch an SVN repository, which was once a CVS repository. There are over 23000 revisions in the SVN "branch". When left running, "bzr branch svn+http://..." dies after a few hours with either a generic "Killed" output, or a core dump in a somewhat random part of the bzr code. This is after consuming 3+Gb of memory, according to pmem.

I do not think this is the same as Bug #54253, as I am running with the Subversion patch identified as the fix in Bug #54253: http://svn.collab.net/viewvc/svn?view=rev&revision=28544. That modification allows the branch to progress much farther, but it still eventually dies.

I do not have much information at this time, but I can reliably reproduce the problem on our closed system if someone tells me what they'd like to see. The main content of the .bzr.log consists of minor variations of the following:

     added revision id {svn-v3-none:...}
     Auto-packing repository <bzrlib.repofmt.pack_repo.RepositoryPackCollection object at ...>, which has NN pack files, containing NNNN revisions into XX packs.
     u'branches/BASELINE-COMPATIBLE' copied from 'trunk':svn-v3-none:...
     unsupported dir property 'svn:ignore'
     The following two lines are reported many, many times:
          unsupported file property 'svn:keywords'
          unsupported file property 'svn:eol-style'

In an attempt to get what I need now, I did an initial -r1 branch, followed by consecutive pulls of 250 revisions each, i.e.:

   bzr branch -r1 svn+http://url
   bzr pull -r250 --remember "svn+http://url"
   bzr pull -r500
   bzr pull -r750
   ...

Even with steps of only 250 revisions, the memory consumption of a single pull approaches 2Gb at times (6000 revisions have been pulled this way so far).
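The chunked import above can be automated. What follows is a minimal Python sketch, assuming `bzr` (with bzr-svn) is on the PATH; the helper names `incremental_commands` and `run_all` and the default directory name `project` are hypothetical, and the script only issues the `bzr branch -rN` and `bzr pull -rN` forms already used in this report.

```python
import subprocess


def incremental_commands(url, last_rev, step, branch_dir="project", start_rev=1):
    """Build (argv, cwd) pairs for importing an svn branch in chunks.

    An initial 'bzr branch -r<start_rev>' is followed by 'bzr pull -r<N>'
    at each multiple of `step`, ending exactly at `last_rev`.
    """
    commands = [(["bzr", "branch", "-r%d" % start_rev, url, branch_dir], None)]
    rev = start_rev
    while rev < last_rev:
        # Advance to the next multiple of `step`, but never past last_rev.
        rev = min((rev // step + 1) * step, last_rev)
        argv = ["bzr", "pull", "-r%d" % rev]
        if len(commands) == 1:
            # First pull: record the svn URL, as in the steps above.
            argv += ["--remember", url]
        commands.append((argv, branch_dir))
    return commands


def run_all(commands):
    """Run each command in order; raises CalledProcessError on the first failure."""
    for argv, cwd in commands:
        print("running: " + " ".join(argv))
        subprocess.check_call(argv, cwd=cwd)
```

For example, `run_all(incremental_commands("svn+http://url", 23000, 250))` would reproduce the 250-revision steps. Building the command list separately from running it makes it easy to print and review the plan first, and after a failure the remaining entries can be run by slicing the list.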

System Setup:

Red Hat Enterprise Linux Server release 5 (Tikanga)
Bazaar (bzr) 1.2.0.dev.0
  Python interpreter: /usr/bin/python 2.4.3.final.0
  Python standard library: /usr/lib64/python2.4
bzr-svn 0.4.7
Subversion 1.4.6
  + subversion-1.4.0-metze-python-bindings.patch
     <http://samba.org/~metze/subversion-1.4.0-metze-python-bindings.patch>
  + subversion-1.4.6-r28544-memory-fix.patch
     <http://svn.collab.net/viewvc/svn?view=rev&revision=28544>

The SVN repository has the "trunk, branches, tags" structure. bzr-svn branch dies whether trying to branch the whole repository or just one svn branch.

Jelmer Vernooij (jelmer) wrote :

I can't reproduce this. Are you sure the memory leak patch is applied correctly?

Changed in bzr-svn:
status: New → Incomplete
Rexbard (john-klinger) wrote :

I checked the application of the memory leak patch before posting this bug, but there is always the possibility I missed something. So, I deleted svn, the svn libraries, the svn python bindings, bazaar and all its python plugins, and the ~/.bazaar home directory. I then rebuilt everything from scratch, with the metze patch and the one patch to the subversion bindings (r28544). Same results.

I then tried the same thing again, this time adding the "ugly workaround" that Lukáš Lalinský mentioned in Bug #54253, which patches __init__.py and logwalker.py. Still no joy.

It appears to get worse the further into the history the import gets. The last successful pull was "bzr pull -r9800". A subsequent "bzr pull -r10000" consumed 2Gb within 2 minutes. The last entry in the log when I killed the process was:

     Auto-packing repository <bzrlib.repofmt.pack_repo.RepositoryPackCollection object at 0x2aaab39dbdd0>, which has 37 pack files, containing 10000 revisions into 1 packs.

I just performed a pull of one additional revision: "bzr pull -r9801". Phase "0/2" completed in 4 minutes, peaking at about 1.1Gb of memory. Phase "1/2" completed within the next minute, not consuming much additional memory. There was only one modified file in that revision; no "auto-packing" output was in the logfile.

Rexbard (john-klinger) wrote :

This may be a bzr issue rather than a bzr-svn one.

The recently released bzr 1.2 (14 Feb 08) no longer consumes memory quite so readily. I cannot tell whether some of the memory issues were fixed in bzr 1.2, or whether bzr 1.2 simply calls a problematic routine in bzr-svn or the svn bindings less often.

The good news is that this may make it close to usable. I started with a new "branch -r1" into a new repository, then performed a 1000 revision pull. Previously, this would run out of memory after running for several hours. This time, it used only 300Mb of memory and took a shiny 6 minutes to complete.

A subsequent pull of 3000 more revisions, "bzr pull -r4000", topped out at 925Mb and took 30 minutes. Not great, but much better than 3Gb over many hours for a 250-revision pull with bzr 1.1.

I'll try a full branch of all 22k+ revisions when I return from the weekend and let you know how things go.

Rexbard (john-klinger) wrote :

Tried a branch of all revisions, "bzr branch svn+http://...", using bzr 1.2 and it still ran out of memory after 3.5 hrs and close to 9200 revision_ids were added. However, as I mentioned earlier, there is a workaround, as I am able to create a branch by pulling about 1-2k revisions at a time, "bzr branch -r1000 svn+http://... project; cd project; bzr pull -r2000; ...".

I still can't tell whether the problem is in bzr, bzr-svn, or the Subversion bindings. Let me know if there are any diagnostics or tests you'd like me to run.

The tracebacks when it dies are inconsistent, since they show where the process ran out of memory, not what is consuming it. Here are the tracebacks from the full branch attempt (a mix of console and .bzr.log output):

bzr: ERROR: bzrlib.errors.KnitCorrupt: Knit <bzrlib.knit._PackAccess object at 0x2aaab52e5190> corrupt: While reading {svn-v3-trunk0:3282bf91-d132-0410-b1ef-d2f8e70e1952:trunk:9168} got MemoryError()

13224.904 Traceback (most recent call last):
  File "/usr/lib64/python2.4/site-packages/bzrlib/commands.py", line 834, in run_bzr_catch_errors
    return run_bzr(argv)
  File "/usr/lib64/python2.4/site-packages/bzrlib/commands.py", line 790, in run_bzr
    ret = run(*run_argv)
  File "/usr/lib64/python2.4/site-packages/bzrlib/commands.py", line 492, in run_argv_aliases
    return self.run(**all_cmd_args)
  File "/usr/lib64/python2.4/site-packages/bzrlib/builtins.py", line 908, in run
    accelerator_tree=accelerator_tree)
  File "/usr/lib/python2.4/site-packages/bzrlib/plugins/svn/remote.py", line 75, in sprout
    result_repo.fetch(repo, revision_id=revision_id)
  File "/usr/lib64/python2.4/site-packages/bzrlib/repository.py", line 949, in fetch
    return inter.fetch(revision_id=revision_id, pb=pb, find_ghosts=find_ghosts)
  File "/usr/lib/python2.4/site-packages/bzrlib/plugins/svn/fetch.py", line 716, in fetch
    self._fetch_switch(needed, pb, lhs_parent)
  File "/usr/lib/python2.4/site-packages/bzrlib/plugins/svn/fetch.py", line 673, in _fetch_switch
    reporter.finish_report(pool)
  File "/usr/lib/python2.4/site-packages/bzrlib/plugins/svn/errors.py", line 110, in convert
    return unbound(*args, **kwargs)
  File "/usr/lib/python2.4/site-packages/bzrlib/plugins/svn/transport.py", line 271, in finish_report
    self._baton, pool)
  File "/usr//lib/svn-python/libsvn/ra.py", line 783, in svn_ra_reporter2_invoke_finish_report
    return apply(_ra.svn_ra_reporter2_invoke_finish_report, args)
  File "/usr/lib/python2.4/site-packages/bzrlib/plugins/svn/fetch.py", line 386, in close_file
    self._store_file(self.file_id, lines, self.file_parents)
  File "/usr/lib/python2.4/site-packages/bzrlib/plugins/svn/fetch.py", line 467, in _store_file
    file_weave.add_lines(self.revid, parents, lines)
  File "/usr/lib64/python2.4/site-packages/bzrlib/versionedfile.py", line 122, in add_lines
    left_matching_blocks, nostore_sha, random_id, check_content)
  File "/usr/lib64/python2.4/site-packages/bzrlib/knit.py", line 925, in _add_lines
    parent_texts, left_matching_blocks, nostore_sha, random_id)
  File "/usr/lib64/python2.4/site-packages/bzrlib/knit.py", line 993, in _add
    left_matching_blocks)
  File "/usr/l...


Jelmer Vernooij (jelmer) wrote :

This may be fixed in Subversion 1.5 (which doesn't require any patching). Any chance you can try it with that? I know there are at least several extra memory leaks fixed in 1.5 that aren't fixed in 1.4.

Rexbard (john-klinger) wrote :

I had not pursued Subversion trunk, since "bzr selftest svn" fails with it. At your request, I ignored that and tried anyway.

I left "bzr branch" running over the weekend (2.5 days) and, on coming back, had to hard-reboot the machine to regain access, losing the timing info. At the time of the reboot, bzr was hovering around 3.5Gb [only 2Gb real] with about 16k of 22k revisions completed. The last few significant .bzr.log entries were:

27401.959 added revision_id {svn-v3-trunk0:3282bf91-d132-0410-b1ef-d2f8e70e1952:trunk:16351}
27404.424 added revision_id {svn-v3-trunk0:3282bf91-d132-0410-b1ef-d2f8e70e1952:trunk:16352}
27404.426 Auto-packing repository <bzrlib.repofmt.pack_repo.RepositoryPackCollection object at 0x4bc1810>, which has 29 pack files, containing 15860 revisions into 20 packs.
27405.229 unsupported file property 'svn:keywords'
27405.230 unsupported file property 'svn:eol-style'
...
27405.248 unsupported file property 'svn:keywords'
27405.248 unsupported file property 'svn:eol-style'
27406.860 added revision_id {svn-v3-trunk0:3282bf91-d132-0410-b1ef-d2f8e70e1952:trunk:16353}
27412.450 added revision_id {svn-v3-trunk0:3282bf91-d132-0410-b1ef-d2f8e70e1952:trunk:16354}
27414.768 added revision_id {svn-v3-trunk0:3282bf91-d132-0410-b1ef-d2f8e70e1952:trunk:16355}
27420.107 added revision_id {svn-v3-trunk0:3282bf91-d132-0410-b1ef-d2f8e70e1952:trunk:16356}
27422.490 added revision_id {svn-v3-trunk0:3282bf91-d132-0410-b1ef-d2f8e70e1952:trunk:16357}
27425.005 added revision_id {svn-v3-trunk0:3282bf91-d132-0410-b1ef-d2f8e70e1952:trunk:16358}
27430.544 added revision_id {svn-v3-trunk0:3282bf91-d132-0410-b1ef-d2f8e70e1952:trunk:16359}
27432.983 added revision_id {svn-v3-trunk0:3282bf91-d132-0410-b1ef-d2f8e70e1952:trunk:16360}
27433.841 unsupported file property 'svn:keywords'
27433.842 unsupported file property 'svn:eol-style'
...
27464.286 unsupported file property 'svn:keywords'
27464.286 unsupported file property 'svn:eol-style'
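To judge progress from logs like the excerpt above, the timestamps on the `added revision_id` lines can be turned into an import rate. A minimal Python sketch, assuming the line format shown in this excerpt (a leading seconds counter, then `added revision_id {...:<revno>}`); `import_rate` is a hypothetical helper, not part of bzr.

```python
import re

# Matches .bzr.log lines such as:
# 27401.959 added revision_id {svn-v3-trunk0:<uuid>:trunk:16351}
ADDED = re.compile(r"^(\d+\.\d+) added revision_id \{.*:(\d+)\}\s*$")


def import_rate(log_lines):
    """Return (number of 'added revision_id' lines, seconds per added revision).

    The rate is None when fewer than two matching lines are found.
    """
    points = []
    for line in log_lines:
        m = ADDED.match(line)
        if m:
            points.append((float(m.group(1)), int(m.group(2))))
    if len(points) < 2:
        return len(points), None
    elapsed = points[-1][0] - points[0][0]
    return len(points), elapsed / (len(points) - 1)
```

For the ten `added revision_id` lines above (revisions 16351 to 16360, spanning 31.024 seconds), this gives roughly 3.4 seconds per revision. It can be pointed at the whole log with `import_rate(open(os.path.expanduser("~/.bzr.log")))`; comparing the rate at different points in a run may show whether imports slow down as the history grows.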

Jelmer Vernooij (jelmer) wrote :

This should've improved now that bzr-svn has its own bindings. Any chance you can try again?

Rolf Leggewie (r0lf) wrote :

Confirmed in bug 243939.

Changed in bzr-svn:
status: Incomplete → Confirmed
importance: Undecided → Medium
status: New → Confirmed
Rolf Leggewie (r0lf) wrote :

Same thing here for "bzr get svn+ssh://<email address hidden>/repo//gnucash/branches/aqbanking3". Unfortunately, I had to abort the process about 2/3 of the way when memory usage hit 2G.

Jelmer Vernooij (jelmer) wrote :

Rolf, what version of bzr-svn are you using?

Rolf Leggewie (r0lf) wrote :

The latest hardy version, 0.4.9-1

Jelmer Vernooij (jelmer)
Changed in bzr-svn:
importance: Undecided → Medium
milestone: none → 0.4.12
status: Confirmed → Triaged
Nicholas Allen (nick-allen) wrote :

I'm using the latest versions of bzr and bzr-svn (from launchpad branches) on Hardy and am still seeing this.

~/source/bzr-svn$ bzr plugins
launchpad
    Launchpad.net integration plugin for Bazaar.

svn 0.4.11exp0
    Support for Subversion branches

~/source/bzr-svn$ bzr version
Bazaar (bzr) 1.6b4
  Python interpreter: /usr/bin/python 2.5.2
  Python standard library: /usr/lib/python2.5
  bzrlib: /usr/lib/python2.5/site-packages/bzrlib
  Bazaar configuration: /home/nia/.bazaar
  Bazaar log file: /home/nia/.bzr.log

Jelmer Vernooij (jelmer)
Changed in bzr-svn:
milestone: 0.4.12 → 0.4.13
Jelmer Vernooij (jelmer) wrote :

I very much suspect this is actually a bzr issue, when lots of revisions are imported into a repository. Other people have reproduced similar behaviour importing large amounts of revisions using fastimport.

Jelmer Vernooij (jelmer) wrote :

Has anybody been able to reproduce this with 0.5.0~rc1 yet?

Ryo IGARASHI (rigarash) wrote :

Hi,

This morning (in Japan) I found that a bzr branch of the boost.org tree had finished successfully.
During branching, I noticed that memory usage has dropped drastically: it used to be about 8GB, and is now about 1GB.

Rexbard (john-klinger) wrote :

I tried again, with mixed results. In the end, I was unable to fully check out the repository with which I originally reported this Bug.

Running with bzr 1.10 and bzr-svn 0.5.0.rc.1.

The first few runs died after many hours (7, 8, 12) due to network dropouts.

I then created a local svn mirror and tried using bzr-svn on that repository. Some of these issues are already documented as bugs. Since I'm using the bzr-svn release candidate, I'll document them here anyway.

(1) "svn+http://" is reported as deprecated, but "http://" doesn't work [Bug #268553]
$ bzr branch svn+http://<host>/repoPath/trunk/subDirName localName
The svn+ syntax is deprecated, use http://<host>/repoPath/trunk/subDirName instead.

$ bzr branch http://<host>/repoPath/trunk/subDirName localName
bzr: ERROR: Invalid http response for http://<host>/repoPath/trunk/subDirName/.bzr/branch-format: Unable to handle http code 503: expected 200 or 404 for full response.

(2) Cannot resume an interrupted branch. This is tough when the branch dies after 12 hours of work [Bug #116148, Bug #125067]

(3) The system didn't crash from running out of memory, which is a definite improvement. However, it became unusable during the operation; I reniced the process a couple of times to regain some access to the system. I ran top in parallel, polling every minute. Maximum CPU was 100% and maximum memory was 1.3g [68.3%]. Attached is the top output for the last run.

(4) After about 8 hours [344 minutes of CPU], the branch operation ultimately failed with an AssertionError [No bug found]. I received the same error running under an "init-repo --rich-root-pack" and a straight branch into a new directory.

bzr: ERROR: exceptions.AssertionError: Tried registering <RevisionMetadata for revision 350, path trunk in repository '3282bf91-d132-0410-b1ef-d2f8e70e1952'> as parent while <RevisionMetadata for revision 355, path trunk in repository '3282bf91-d132-0410-b1ef-d2f8e70e1952'> already was parent for <RevisionMetadata for revision 356, path trunk in repository '3282bf91-d132-0410-b1ef-d2f8e70e1952'>

Enclosed is the bzr.log for the final run containing the Assert error. It is slightly sanitized for file and host names.
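Instead of polling top once a minute, as in item (3), the peak resident memory of a single bzr run can be recorded directly once it exits. A minimal Python sketch for Unix-like systems using the standard `resource` module; `peak_rss_mib` is a hypothetical helper. Note that `ru_maxrss` for `RUSAGE_CHILDREN` is the largest peak among all waited-for children of the calling process, so each measurement should run in a fresh interpreter.

```python
import resource
import subprocess
import sys


def peak_rss_mib(argv):
    """Run argv to completion; return (exit status, peak RSS in MiB).

    RUSAGE_CHILDREN's ru_maxrss is the largest peak resident set size among
    all terminated, waited-for children of this process, so measure one
    command per interpreter for a per-command figure.
    """
    status = subprocess.call(argv)
    maxrss = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    if sys.platform == "darwin":
        maxrss /= 1024.0  # macOS reports bytes; convert to KiB
    return status, maxrss / 1024.0  # ru_maxrss is now in KiB; convert to MiB
```

For example, `print(peak_rss_mib(["bzr", "pull", "-r4000"]))`. Unlike minute-by-minute polling, this cannot miss a short spike, but it reports only the maximum, not how memory grew over time.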

Rexbard (john-klinger) wrote :

Enclosed is the stderr for the Assert error.

Rexbard (john-klinger) wrote :

Enclosed is the 1-minute polling top for the Assert Error run.

Jelmer Vernooij (jelmer) wrote :

That particular assert is fixed in the current 0.5 branch; the fix will definitely be in 0.5 final.

Jelmer Vernooij (jelmer) wrote :

Importing OpenOffice.org now takes 3.3Gb of RAM, which could be less but is not unreasonable.

Jelmer Vernooij (jelmer)
Changed in bzr-svn:
milestone: 0.4.13 → 0.6.0
Adrian Wilkins (adrian-wilkins) wrote : Re: Memory problems prevent branch of very large svn repositories

This also affects repositories with single large revisions, as opposed to just repositories with lots of revisions.

Case :

Revision 3 of a repository adds 7 folders, each containing three large CSV table files (two of ~60MB and one of ~25MB). bzr consumes over 2.1GB of memory and slowly requests more data from the server. Eventually, the server times out and drops the connection.

(sorry, I can't give you access to the repository in question because it contains material that is not under an open license).
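The sizes above give a rough scale for this case. A back-of-the-envelope calculation in Python, assuming the approximate file sizes quoted and treating "over 2.1GB" as 2100MB:

```python
folders = 7
mb_per_folder = 2 * 60 + 25          # two ~60MB CSV files plus one ~25MB file
revision_mb = folders * mb_per_folder
print(revision_mb)                   # 1015 (about 1GB of file content)

observed_mb = 2100                   # "over 2.1GB" reported for bzr
ratio = observed_mb / revision_mb
print(round(ratio, 1))               # 2.1
```

So bzr's memory use was roughly twice the raw size of the files added in that single revision, which fits the observation that the problem is not limited to repositories with long histories.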

Jaco Vosloo (g-launchpad-jacovosloo-info) wrote :

A workaround for this is to disable the svn plugin and then use Bazaar in conjunction with SVN.

For Windows XP, run the following commands in a shell from the folder where you want the repository:
1. move "C:\Program Files\Bazaar\plugins\svn" "C:\Program Files\Bazaar\svnpluginbroken"
2. echo .svn/ >> .bzrignore
3. bzr init
4. bzr add
5. bzr commit -m Init

Lastly, tell SVN to ignore the .bzr folder if you want; otherwise, commit it back to SVN.

Jelmer Vernooij (jelmer) wrote :

Jaco, this isn't really a workaround as it doesn't get you any of the history.

Changed in bzr-svn:
assignee: nobody → Jelmer Vernooij (jelmer)
Jelmer Vernooij (jelmer)
Changed in bzr-svn:
assignee: Jelmer Vernooij (jelmer) → nobody
Jelmer Vernooij (jelmer)
Changed in bzr-svn:
milestone: 0.6.0 → 2.0.0
milestone: 2.0.0 → 1.1.0
milestone: 1.1.0 → 1.1.1
fredhoare (fredh-glenaffric) wrote :

I am trying to import a single project from an svn repository that hosts multiple projects. We have almost 400000 revisions across all those projects. Even limiting the number of revisions pulled to 200 results in huge memory usage, in my case in excess of 10GB. Memory usage seems to rise when bzr-svn is looking for tags. I was surprised to see that, even though I had told it to import only the first 500 revisions, it searched all the svn revisions when looking for tags.

Roel Van de Paar (roel11) wrote :

Requesting re-triage. This bug is major.

It took quite a while to find this bug on Launchpad because the symptom is so nondescript: the only output you see is "Killed". As 14 people are already listed here as affected, the total number of affected users is likely to be large.

I am getting this bug while trying to pull percona-server under FC17 running in vbox with 2Gb of memory assigned.

When I increased the memory assigned to the VM, the pull quickly consumed 4+Gb and eventually led to OOM errors in the host OS/vbox!

 PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
 1576 - 20 0 4576m 4.2g 3872 S 14.6 74.6 2:29.86 bzr

Could we have this looked at?

bzr version: 2.5.0
uname: 3.4.6-2.fc17.x86_64
vbox: 4.1.18 r78361 (host OS: Win7 HP SP1)

Changed in bzr-svn:
status: Triaged → Confirmed
Roel Van de Paar (roel11) wrote :

Found a workaround: add an extra 20Gb of swap space. Once main memory starts running out, the system will use swap, and this way I was at least able to finish my intended bzr branch. (Within a vbox session, adding swap space is quite easy: add an additional VDI and format it as swap. FC17 picked up the extra swap without any additional configuration.)

Jelmer Vernooij (jelmer) wrote :

Unfortunately my spare time is limited and this is a non-trivial problem. I don't have time to look at it, but patches are more than welcome if anybody else is interested in hacking on this.

Changed in bzr-svn:
status: Confirmed → Triaged
Changed in bzr-svn (Ubuntu):
status: Confirmed → Triaged
Changed in bzr-svn:
milestone: 1.1.1 → none
importance: Medium → High
Jelmer Vernooij (jelmer) wrote :

I'm happy to bump this to "Importance: High", but that won't actually change how quickly it's addressed.

Selene ToyKeeper (toykeeper) wrote :

FWIW, it's possible to work around this by using git-svn and then converting to bzr with fastimport or bzr-git.

In my tests (Bug 243939, from 2008), I found that git-svn used at least an order of magnitude less RAM than bzr-svn and ran in a fraction of the time. However, this workaround won't let you 'bzr pull' updates from upstream svn, so it may or may not be appropriate; it's only really useful for one-time, one-way conversions.

Andreas Mohr (andi) wrote :

Edited title to disambiguate it from memory CORRUPTION issues.

summary: - Memory problems prevent branch of very large svn repositories
+ Memory resource consumption issues prevent branch of very large svn
+ repositories