"Out of memory" error when pushing a large repository
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Dulwich | Fix Released | Medium | Jelmer Vernooij |
Bug Description
When pushing a large repository, Dulwich may (depending on repository size and available system memory) crash with the following message:
```
abort: out of memory
fatal: write error: Broken pipe
```
I observed this when using hg-git. A related issue has been posted for hg-git: <https:/
I'm not familiar enough with Dulwich's API to write a Dulwich-only script that reproduces the problem. However, if you need a large repository to test with, you can use <https:/
I tracked the crash to the write_pack_data() function in dulwich/pack.py. In particular, the following code triggers it for me:
```python
recency = list(objects)
# FIXME: Somehow limit delta depth
# FIXME: Make thin-pack optional (it's not used when cloning a pack)
# Build a list of objects ordered by the magic Linus heuristic
# This helps us find good objects to diff against us
magic = []
for obj, path in recency:
    ...
magic.sort()
# Build a map of objects and their index in magic - so we can find
# preceding objects to diff against
offs = {}
for i in range(len(magic)):
    ...
```
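The comments in that snippet describe ordering objects "by the magic Linus heuristic" so that similar objects sit next to each other and make good delta bases. Below is a minimal sketch of that kind of ordering, assuming a key of type, then path, then size in descending order (roughly how Git's own pack-objects groups delta candidates). `FakeObject`, `magic_order` and `index_map` are hypothetical names for illustration, not Dulwich API.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class FakeObject:
    """Hypothetical stand-in for a Git object (not a Dulwich class)."""
    type_num: int  # Git numbering: 1=commit, 2=tree, 3=blob, 4=tag
    raw: bytes


Pair = Tuple[FakeObject, Optional[str]]


def magic_order(objects: Iterable[Pair]) -> List[Pair]:
    """Sort (object, path) pairs by type, then path, then size (largest first)."""
    keyed = [
        ((obj.type_num, path or "", -len(obj.raw)), i, obj, path)
        for i, (obj, path) in enumerate(objects)
    ]
    # Sort on (key, unique index) only, so objects themselves are never compared.
    keyed.sort(key=lambda item: (item[0], item[1]))
    return [(obj, path) for _key, _i, obj, path in keyed]


def index_map(ordered: List[Pair]) -> dict:
    """Map id(object) -> position in the ordered list (the role `offs` plays)."""
    return {id(obj): i for i, (obj, _path) in enumerate(ordered)}


commit = FakeObject(1, b"tree ...")
small = FakeObject(3, b"v1")
large = FakeObject(3, b"version 2, longer")
ordered = magic_order([(small, "README"), (commit, None), (large, "README")])
offs = index_map(ordered)
```

Both structures hold a reference to every object in the pack at once, which is why building them for a large repository is expensive.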
Creating "recency" (the list copy of "objects", which is an ObjectStoreIter
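To illustrate why copying the object iterator into a list is costly, here is a minimal sketch using Python's `tracemalloc`. `fake_objects` is a hypothetical generator standing in for an object-store iterator; the sizes are arbitrary. Materializing the stream keeps every object alive at once, while iterating it directly keeps roughly one object alive at a time.

```python
import tracemalloc


def fake_objects(count, size):
    """Hypothetical lazy source of (raw_bytes, path) pairs."""
    for i in range(count):
        yield bytes(size), "file%d" % i


def peak_bytes(consume):
    """Return the peak traced allocation (in bytes) while running consume()."""
    tracemalloc.start()
    try:
        consume()
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def materialize():
    recency = list(fake_objects(1000, 10_000))  # all 1000 objects held at once
    return len(recency)


def stream():
    total = 0
    for raw, _path in fake_objects(1000, 10_000):  # one object held at a time
        total += len(raw)
    return total


list_peak = peak_bytes(materialize)
stream_peak = peak_bytes(stream)
print("list copy peak:", list_peak, "bytes; streaming peak:", stream_peak, "bytes")
```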
Amusingly, all that memory is gobbled up for no purpose. None of those lists or dicts are actually used, because the only code that used them is commented out:
```python
#for i in range(offs[
#    if i < 0 or i >= len(offs): continue
#    b = magic[i][4]
#    if b.type_num != orig_t: continue
#    base = b.as_raw_string()
#    delta = create_delta(base, raw)
#    if len(delta) < len(winner):
#        winner = delta
#        t = 6 if magic[i][2] == 1 else 7
```
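For context, that disabled loop would have tried each nearby object of the same type as a delta base and kept whichever representation was smallest. (In Git's pack format, type 6 is OFS_DELTA and type 7 is REF_DELTA, which is what the last line chooses between.) The sketch below mimics that search under stated assumptions: the window is the `window` objects immediately preceding the current one (the original `range(...)` call is truncated above), and `toy_delta` is a hypothetical stand-in that only stores a shared-prefix length plus the remaining bytes, unlike a real Git delta.

```python
from typing import List, Optional, Tuple


def toy_delta(base: bytes, target: bytes) -> bytes:
    """Hypothetical delta: 4-byte shared-prefix length + the rest of target."""
    n = 0
    limit = min(len(base), len(target))
    while n < limit and base[n] == target[n]:
        n += 1
    return n.to_bytes(4, "big") + target[n:]


def best_representation(
    ordered: List[Tuple[int, bytes]], index: int, window: int = 10
) -> Tuple[bytes, Optional[int]]:
    """Return (payload, base_index); base_index is None when stored whole."""
    type_num, raw = ordered[index]
    winner, winner_base = raw, None
    for i in range(max(0, index - window), index):  # only preceding objects
        base_type, base_raw = ordered[i]
        if base_type != type_num:  # deltas only between objects of the same type
            continue
        delta = toy_delta(base_raw, raw)
        if len(delta) < len(winner):
            winner, winner_base = delta, i
    return winner, winner_base


ordered = [(3, b"hello world, version 1"), (3, b"hello world, version 2"), (1, b"tree")]
```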
Removing or commenting out all of the code in the first snippet above significantly reduces Dulwich's memory footprint and speeds up pushing. With that code removed, I was able to push the repository successfully without running out of memory. The change has no negative effect on Dulwich's behavior, since the results of that code were never used.
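With the unused lists gone, writing a pack only needs one pass over the objects. The following is a minimal sketch of that idea, not Dulwich's `write_pack_data`: the `(type_num, raw, path)` input shape and every function name here are hypothetical, and no deltas are produced. The byte layout follows Git's pack format: a `PACK` signature, version 2, the object count, then for each object a type/size header followed by zlib-compressed data, and finally a SHA-1 of everything before it.

```python
import hashlib
import io
import struct
import zlib

TYPE_NAMES = {1: b"commit", 2: b"tree", 3: b"blob", 4: b"tag"}


def pack_object_header(type_num, size):
    """Encode a pack entry header: 3-bit type plus variable-length size."""
    byte = (type_num << 4) | (size & 0x0F)
    size >>= 4
    out = bytearray()
    while size:
        out.append(byte | 0x80)  # high bit set: more size bytes follow
        byte = size & 0x7F
        size >>= 7
    out.append(byte)
    return bytes(out)


def write_pack_streaming(f, objects, num_objects):
    """Write a version-2 pack from an iterable of (type_num, raw, path) items.

    Objects are consumed one at a time; nothing is copied into a list.
    Returns ([(object_sha, offset, crc32), ...], pack_checksum).
    """
    checksum = hashlib.sha1()
    offset = 0

    def write(data):
        nonlocal offset
        checksum.update(data)
        f.write(data)
        offset += len(data)

    write(b"PACK" + struct.pack(">LL", 2, num_objects))
    entries = []
    for type_num, raw, _path in objects:
        # Object ID as Git computes it: SHA-1 of "<type> <size>\0" + content.
        obj_sha = hashlib.sha1(TYPE_NAMES[type_num] + b" %d\x00" % len(raw) + raw).digest()
        entry = pack_object_header(type_num, len(raw)) + zlib.compress(raw)
        entries.append((obj_sha, offset, zlib.crc32(entry)))
        write(entry)
    trailer = checksum.digest()
    f.write(trailer)  # the trailer itself is not part of the checksum
    return entries, trailer
```

Because `objects` is only iterated once, it can be a generator; the caller does still need to know `num_objects` up front, since the count goes in the header.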
In the short term, I'd recommend commenting out that code. In the long term, Dulwich should split up large repositories into several smaller packs, so that it doesn't use so much memory at once.
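One way the long-term suggestion could bound memory is to cut the object stream into fixed-size batches, so at most one batch is held at a time and each batch's object count is known before its pack header is written. This is a hypothetical sketch: `batched_packs` and the `write_pack(batch, count)` callable are assumptions, and how several packs would then be transferred during a push is outside its scope.

```python
from itertools import islice


def batched_packs(objects, max_objects_per_pack, write_pack):
    """Split an object stream into packs of at most max_objects_per_pack objects.

    Only one batch is materialized at a time, so peak memory is bounded by the
    batch size rather than by the size of the whole repository.
    """
    it = iter(objects)
    results = []
    while True:
        batch = list(islice(it, max_objects_per_pack))
        if not batch:
            return results
        results.append(write_pack(batch, len(batch)))
```

For example, `batched_packs(range(10), 4, lambda batch, n: (tuple(batch), n))` produces three "packs" holding 4, 4 and 2 items.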
Changed in dulwich: | |
---|---|
status: | New → Fix Committed |
importance: | Undecided → Medium |
assignee: | nobody → Jelmer Vernooij (jelmer) |
milestone: | none → 0.7.2 |
Changed in dulwich: | |
---|---|
status: | Fix Committed → Fix Released |
One more thing I forgot to mention: if that code is commented out, this line:

```python
for o, path in recency:
```

needs to be changed to:

```python
for o, path in objects:
```
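The change is needed because, once `recency = list(objects)` is gone, the name `recency` no longer exists. One consequence worth keeping in mind, assuming `objects` may be a one-shot iterator such as a generator (whether Dulwich's ObjectStoreIter behaves that way is not established here): it can be consumed only once, so the write loop must be its only consumer. The hypothetical `object_stream` below shows this.

```python
def object_stream():
    """Hypothetical one-shot iterator standing in for `objects`."""
    yield ("blob-1", "a.txt")
    yield ("blob-2", "b.txt")


objects = object_stream()

# The write loop is now the only consumer, and it sees every object once.
written = [o for o, path in objects]

# Any second pass over the same generator finds it already exhausted.
leftover = [o for o, path in objects]
```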