MemoryError in _create_z_content_from_chunks during commit

Bug #890085 reported by Martin Pool on 2011-11-14
Affects: Bazaar — Importance: High — Assigned to: Unassigned

Bug Description

bzr crashes when committing a file nearly twice the size of available memory.

NOTE: please only dupe or comment this bug if you're seeing this specific traceback; otherwise please vote for bug 109114 for generic memory issues.

Following on from bug 566940, bug 185072, bug 109114.

To reproduce:

8.145 Traceback (most recent call last):
  File "bzrlib/commit.py", line 431, in _commit
    self._update_builder_with_changes()
  File "bzrlib/commit.py", line 691, in _update_builder_with_changes
    self.work_tree, self.basis_revid, iter_changes):
  File "bzrlib/vf_repository.py", line 760, in record_iter_changes
    file_id, text, heads, nostore_sha)
  File "bzrlib/vf_repository.py", line 829, in _add_text_to_weave
    nostore_sha=nostore_sha, random_id=self.random_revid)[0:2]
  File "bzrlib/groupcompress.py", line 1320, in _add_text
    nostore_sha=nostore_sha))[0]
  File "bzrlib/groupcompress.py", line 1857, in _insert_record_stream
    flush()
  File "bzrlib/groupcompress.py", line 1721, in flush
    bytes_len, chunks = self._compressor.flush().to_chunks()
  File "bzrlib/groupcompress.py", line 336, in to_chunks
    self._create_z_content()
  File "bzrlib/groupcompress.py", line 332, in _create_z_content
    self._create_z_content_from_chunks(chunks)
  File "bzrlib/groupcompress.py", line 316, in _create_z_content_from_chunks
    compressed_chunks = map(compressor.compress, chunks)
MemoryError

Martin Pool (mbp) wrote :

see John's mail "Reducing peak memory for commit" in April 2010, in which he did some work towards this.

description: updated
Martin Pool (mbp) wrote :

In passing, https://code.launchpad.net/~mbp/bzr/remove-pylzma/+merge/82097 can clean this up a bit.

    def _create_z_content_from_chunks(self, chunks):
        compressor = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION)
        # Peak in this point is 1 fulltext, 1 compressed text, + zlib overhead
        # (measured peak is maybe 30MB over the above...)
        compressed_chunks = map(compressor.compress, chunks)
        compressed_chunks.append(compressor.flush())
        # Ignore empty chunks
        self._z_content_chunks = [c for c in compressed_chunks if c]
        self._z_content_length = sum(map(len, self._z_content_chunks))
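
For reference, the same logic could be written to stream its output rather than materializing every compressed chunk at once — a minimal sketch (a hypothetical helper, not actual bzrlib code; it cannot fill in the length prefix, which is the real constraint discussed below):

```python
import zlib

def iter_z_content_chunks(chunks):
    """Lazily compress chunks, yielding only non-empty compressed pieces.

    Unlike map(compressor.compress, chunks), this never holds the full
    list of compressed chunks in memory at one time.
    """
    compressor = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION)
    for chunk in chunks:
        compressed = compressor.compress(chunk)
        if compressed:
            yield compressed
    tail = compressor.flush()
    if tail:
        yield tail
```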

The whole GroupCompressBlock class is based on a format in which each compressed block is preceded by the length of its compressed data. I don't think we can compatibly remove the per-file length prefix. However, what we could probably do is spill the compressed content to a temporary file, trading disk for memory pressure. Then we'll know the length when we're done compressing, and we can copy from the temporary file out to the actual pack. We then just need to be careful not to read the whole file back in at once.
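
The temporary-file idea could look roughly like this — a sketch with hypothetical names; the one-line decimal length prefix is illustrative, not the real pack format:

```python
import shutil
import tempfile
import zlib

def spill_compressed_to_pack(chunks, pack_file):
    """Compress chunks via a temporary file, then copy into the pack.

    The full compressed block never lives in memory: we learn its
    length only once compression finishes, write the length prefix,
    and stream the temporary file into the pack in bounded pieces.
    """
    compressor = zlib.compressobj()
    total = 0
    with tempfile.TemporaryFile() as tmp:
        for chunk in chunks:
            data = compressor.compress(chunk)
            tmp.write(data)
            total += len(data)
        tail = compressor.flush()
        tmp.write(tail)
        total += len(tail)
        # Now the length is known; emit the prefix, then copy the
        # spilled data in 64 KiB pieces rather than one big read.
        pack_file.write(("%d\n" % total).encode("ascii"))
        tmp.seek(0)
        shutil.copyfileobj(tmp, pack_file, 64 * 1024)
    return total
```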

For local storage we could leave space for the count, write out the compressed content, and then seek back, but that obviously won't work when sending these things across the network, and it might be poor over some transports.
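
A minimal sketch of that seek-back variant for local, seekable storage (hypothetical helper; the fixed-width header is illustrative, and as noted it cannot work on a non-seekable network stream):

```python
import zlib

def write_block_with_seekback(f, chunks):
    """Reserve space for the length, compress, then backfill the count.

    Only valid for seekable files; a streaming transport has nowhere
    to seek back to.
    """
    header_pos = f.tell()
    f.write(b"%020d\n" % 0)  # fixed-width placeholder for the length
    compressor = zlib.compressobj()
    total = 0
    for chunk in chunks:
        data = compressor.compress(chunk)
        f.write(data)
        total += len(data)
    tail = compressor.flush()
    f.write(tail)
    total += len(tail)
    end = f.tell()
    # Go back and fill in the real compressed length, then restore
    # the file position so further writes append normally.
    f.seek(header_pos)
    f.write(b"%020d\n" % total)
    f.seek(end)
    return total
```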

Martin Pool (mbp) wrote :

Earlier on, we're reading the whole file to be added into memory, so it's hard for the compressor to deal with it as chunks.
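
For comparison, feeding the compressor bounded chunks straight from disk would keep the peak near the chunk size rather than the file size — a sketch, not bzrlib's actual code path:

```python
import zlib

def compress_file_streaming(path, out, chunk_size=1 << 20):
    """Compress a file in bounded chunks instead of one big string.

    Peak memory is roughly chunk_size plus zlib's own buffers,
    independent of how large the input file is.
    """
    compressor = zlib.compressobj()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            out.write(compressor.compress(chunk))
    out.write(compressor.flush())
```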

The structure here, where the compressed data is held attached to an object, perhaps makes it more likely that it will be unnecessarily retained.

The traceback where it's added is

-> if expected_sha is not None:
(Pdb) bt
  /home/mbp/bzr/work/bzr(145)<module>()
-> exit_val = bzrlib.commands.main()
  /home/mbp/bzr/work/bzrlib/commands.py(1213)main()
-> ret = run_bzr_catch_errors(argv)
  /home/mbp/bzr/work/bzrlib/commands.py(1226)run_bzr_catch_errors()
-> return exception_to_return_code(run_bzr, argv)
  /home/mbp/bzr/work/bzrlib/commands.py(923)exception_to_return_code()
-> return the_callable(*args, **kwargs)
  /home/mbp/bzr/work/bzrlib/commands.py(1128)run_bzr()
-> ret = run(*run_argv)
  /home/mbp/bzr/work/bzrlib/commands.py(676)run_argv_aliases()
-> return self.run(**all_cmd_args)
  /home/mbp/bzr/work/bzrlib/commands.py(698)run()
-> return self._operation.run_simple(*args, **kwargs)
  /home/mbp/bzr/work/bzrlib/cleanup.py(135)run_simple()
-> self.cleanups, self.func, *args, **kwargs)
  /home/mbp/bzr/work/bzrlib/cleanup.py(165)_do_with_cleanups()
-> result = func(*args, **kwargs)
  /home/mbp/bzr/work/bzrlib/builtins.py(3497)run()
-> lossy=lossy)
  /home/mbp/bzr/work/bzrlib/decorators.py(217)write_locked()
-> result = unbound(self, *args, **kwargs)
  /home/mbp/bzr/work/bzrlib/workingtree_4.py(208)commit()
-> result = WorkingTree.commit(self, message, revprops, *args, **kwargs)
  /home/mbp/bzr/work/bzrlib/decorators.py(217)write_locked()
-> result = unbound(self, *args, **kwargs)
  /home/mbp/bzr/work/bzrlib/mutabletree.py(210)commit()
-> *args, **kwargs)
  /home/mbp/bzr/work/bzrlib/commit.py(289)commit()
-> lossy=lossy)
  /home/mbp/bzr/work/bzrlib/cleanup.py(131)run()
-> self.cleanups, self.func, self, *args, **kwargs)
  /home/mbp/bzr/work/bzrlib/cleanup.py(165)_do_with_cleanups()
-> result = func(*args, **kwargs)
  /home/mbp/bzr/work/bzrlib/commit.py(431)_commit()
-> self._update_builder_with_changes()
  /home/mbp/bzr/work/bzrlib/commit.py(691)_update_builder_with_changes()
-> self.work_tree, self.basis_revid, iter_changes):
  /home/mbp/bzr/work/bzrlib/vf_repository.py(760)record_iter_changes()
-> file_id, text, heads, nostore_sha)
  /home/mbp/bzr/work/bzrlib/vf_repository.py(829)_add_text_to_weave()
-> nostore_sha=nostore_sha, random_id=self.random_revid)[0:2]
  /home/mbp/bzr/work/bzrlib/groupcompress.py(1335)_add_text()
-> nostore_sha=nostore_sha))[0]
  /home/mbp/bzr/work/bzrlib/groupcompress.py(1835)_insert_record_stream()
-> nostore_sha=nostore_sha)
> /home/mbp/bzr/work/bzrlib/groupcompress.py(866)compress()
-> if expected_sha is not None:

With my branch so far in place, and giving it a bit more memory, the place it bombs out is

(Pdb) bt
  /home/mbp/bzr/work/bzr(145)<module>()
-> exit_val = bzrlib.commands.main()
  /home/mbp/bzr/work/bzrlib/commands.py(1213)main()
-> ret = run_bzr_catch_errors(argv)
  /home/mbp/bzr/work/bzrlib/commands.py(1226)run_bzr_catch_errors()
-> return exception_to_return_code(run_bzr, argv)
  /home/mbp/bzr/work/bzrlib/commands.py(923)exception_to_return_code()
-> return the_callable...


John A Meinel (jameinel) wrote :

On 11/14/2011 10:12 AM, Martin Pool wrote:
...

>
> add_raw_records rather bizarrely takes one big bytestring and a
> list of substring lengths and then chops it up again.
>

This is because of knit days where it made sense to do so.

John
=:->


Martin Pool (mbp) on 2013-01-28
Changed in bzr:
assignee: Martin Pool (mbp) → nobody