So _DirectPackAccess._access.add_raw_records apparently takes one big byte string, and that needs to be fixed.
The to_chunks method is building a list of chunks in which one element is the whole compressed content as one big string, so that might need to be fixed too.
add_raw_records rather bizarrely takes one big bytestring and a list of substring lengths and then chops it up again.
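Roughly the shape involved, as a sketch (simplified; the real method takes more arguments, and the bodies here are illustrative only):

    # Sketch of the add_raw_records pattern described above; not the
    # real bzrlib code, just the shape of it.
    def add_raw_records(key_sizes, raw_data):
        # key_sizes is a list of (key, length); raw_data is those
        # records already concatenated, which we immediately re-slice.
        records = []
        offset = 0
        for key, size in key_sizes:
            records.append((key, raw_data[offset:offset + size]))
            offset += size
        return records

    # The caller has to join its chunks into one big string first:
    chunks = ['first record', 'second record']
    add_raw_records([(('rec-1',), len(chunks[0])),
                     (('rec-2',), len(chunks[1]))],
                    ''.join(chunks))

So the data gets held as one big copy purely to satisfy the interface, then cut straight back into pieces.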
From the groupcompress repo, add_raw_records seems only ever to be called with one byte string, so it's a bit redundant as a layer. It calls into ContainerWriter.add_bytes_record, which also needs one single string but could fairly easily be adapted to handle more. That in turn goes to ContainerSerializer, which builds another big string and could just as easily emit a list of chunks instead.
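For example, a chunk-accepting record writer might look something like this (the function name and exact framing here are my own guesses, loosely following the container record layout, not the real bzrlib API):

    def add_record_from_chunks(write, names, chunks):
        # Hypothetical chunk-friendly variant of add_bytes_record.
        # 'write' is any callable taking a byte string, e.g. a file's
        # write method.
        length = sum(len(c) for c in chunks)
        write('B%d\n' % length)   # bytes-record marker plus payload length
        for name in names:
            write(name + '\n')    # one record name per line
        write('\n')               # blank line terminates the names
        for chunk in chunks:
            write(chunk)          # stream the payload, never joining it

Since the total length is written up front, the chunks never need to be joined at any layer below the compressor.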
Earlier on, we're reading the whole file to be added into memory, so it's hard for the compressor to deal with it as chunks.
The structure here, where the compressed data is held attached to an object, perhaps makes it a bit more likely that it will be unnecessarily retained.
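As a toy illustration of that retention risk (the class and attribute here are made up for the example):

    class Block(object):
        # Stand-in for an object that keeps its compressed data attached.
        def __init__(self, z_content):
            self._z_content = z_content   # big string pinned to the object

        def write(self, out):
            out.write(self._z_content)
            # Without dropping the reference, the compressed bytes stay
            # live for as long as the Block does:
            self._z_content = None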
The traceback where it's added is
(Pdb) bt
  /home/mbp/bzr/work/bzr(145)<module>()
-> exit_val = bzrlib.commands.main()
  /home/mbp/bzr/work/bzrlib/commands.py(1213)main()
-> ret = run_bzr_catch_errors(argv)
  /home/mbp/bzr/work/bzrlib/commands.py(1226)run_bzr_catch_errors()
-> return exception_to_return_code(run_bzr, argv)
  /home/mbp/bzr/work/bzrlib/commands.py(923)exception_to_return_code()
-> return the_callable(*args, **kwargs)
  /home/mbp/bzr/work/bzrlib/commands.py(1128)run_bzr()
-> ret = run(*run_argv)
  /home/mbp/bzr/work/bzrlib/commands.py(676)run_argv_aliases()
-> return self.run(**all_cmd_args)
  /home/mbp/bzr/work/bzrlib/commands.py(698)run()
-> return self._operation.run_simple(*args, **kwargs)
  /home/mbp/bzr/work/bzrlib/cleanup.py(135)run_simple()
-> self.cleanups, self.func, *args, **kwargs)
  /home/mbp/bzr/work/bzrlib/cleanup.py(165)_do_with_cleanups()
-> result = func(*args, **kwargs)
  /home/mbp/bzr/work/bzrlib/builtins.py(3497)run()
-> lossy=lossy)
  /home/mbp/bzr/work/bzrlib/decorators.py(217)write_locked()
-> result = unbound(self, *args, **kwargs)
  /home/mbp/bzr/work/bzrlib/workingtree_4.py(208)commit()
-> result = WorkingTree.commit(self, message, revprops, *args, **kwargs)
  /home/mbp/bzr/work/bzrlib/decorators.py(217)write_locked()
-> result = unbound(self, *args, **kwargs)
  /home/mbp/bzr/work/bzrlib/mutabletree.py(210)commit()
-> *args, **kwargs)
  /home/mbp/bzr/work/bzrlib/commit.py(289)commit()
-> lossy=lossy)
  /home/mbp/bzr/work/bzrlib/cleanup.py(131)run()
-> self.cleanups, self.func, self, *args, **kwargs)
  /home/mbp/bzr/work/bzrlib/cleanup.py(165)_do_with_cleanups()
-> result = func(*args, **kwargs)
  /home/mbp/bzr/work/bzrlib/commit.py(431)_commit()
-> self._update_builder_with_changes()
  /home/mbp/bzr/work/bzrlib/commit.py(691)_update_builder_with_changes()
-> self.work_tree, self.basis_revid, iter_changes):
  /home/mbp/bzr/work/bzrlib/vf_repository.py(760)record_iter_changes()
-> file_id, text, heads, nostore_sha)
  /home/mbp/bzr/work/bzrlib/vf_repository.py(829)_add_text_to_weave()
-> nostore_sha=nostore_sha, random_id=self.random_revid)[0:2]
  /home/mbp/bzr/work/bzrlib/groupcompress.py(1335)_add_text()
-> nostore_sha=nostore_sha))[0]
  /home/mbp/bzr/work/bzrlib/groupcompress.py(1835)_insert_record_stream()
-> nostore_sha=nostore_sha)
> /home/mbp/bzr/work/bzrlib/groupcompress.py(866)compress()
-> if expected_sha is not None:
With my branch so far in place, and giving it a bit more memory, the place it bombs out is
(Pdb) bt
  /home/mbp/bzr/work/bzr(145)<module>()
-> exit_val = bzrlib.commands.main()
  /home/mbp/bzr/work/bzrlib/commands.py(1213)main()
-> ret = run_bzr_catch_errors(argv)
  /home/mbp/bzr/work/bzrlib/commands.py(1226)run_bzr_catch_errors()
-> return exception_to_return_code(run_bzr, argv)
  /home/mbp/bzr/work/bzrlib/commands.py(923)exception_to_return_code()
-> return the_callable(*args, **kwargs)
  /home/mbp/bzr/work/bzrlib/commands.py(1128)run_bzr()
-> ret = run(*run_argv)
  /home/mbp/bzr/work/bzrlib/commands.py(676)run_argv_aliases()
-> return self.run(**all_cmd_args)
  /home/mbp/bzr/work/bzrlib/commands.py(698)run()
-> return self._operation.run_simple(*args, **kwargs)
  /home/mbp/bzr/work/bzrlib/cleanup.py(135)run_simple()
-> self.cleanups, self.func, *args, **kwargs)
  /home/mbp/bzr/work/bzrlib/cleanup.py(165)_do_with_cleanups()
-> result = func(*args, **kwargs)
  /home/mbp/bzr/work/bzrlib/builtins.py(3497)run()
-> lossy=lossy)
  /home/mbp/bzr/work/bzrlib/decorators.py(217)write_locked()
-> result = unbound(self, *args, **kwargs)
  /home/mbp/bzr/work/bzrlib/workingtree_4.py(208)commit()
-> result = WorkingTree.commit(self, message, revprops, *args, **kwargs)
  /home/mbp/bzr/work/bzrlib/decorators.py(217)write_locked()
-> result = unbound(self, *args, **kwargs)
  /home/mbp/bzr/work/bzrlib/mutabletree.py(210)commit()
-> *args, **kwargs)
  /home/mbp/bzr/work/bzrlib/commit.py(289)commit()
-> lossy=lossy)
  /home/mbp/bzr/work/bzrlib/cleanup.py(131)run()
-> self.cleanups, self.func, self, *args, **kwargs)
  /home/mbp/bzr/work/bzrlib/cleanup.py(165)_do_with_cleanups()
-> result = func(*args, **kwargs)
  /home/mbp/bzr/work/bzrlib/commit.py(431)_commit()
-> self._update_builder_with_changes()
  /home/mbp/bzr/work/bzrlib/commit.py(691)_update_builder_with_changes()
-> self.work_tree, self.basis_revid, iter_changes):
  /home/mbp/bzr/work/bzrlib/vf_repository.py(760)record_iter_changes()
-> file_id, text, heads, nostore_sha)
  /home/mbp/bzr/work/bzrlib/vf_repository.py(829)_add_text_to_weave()
-> nostore_sha=nostore_sha, random_id=self.random_revid)[0:2]
  /home/mbp/bzr/work/bzrlib/groupcompress.py(1335)_add_text()
-> nostore_sha=nostore_sha))[0]
  /home/mbp/bzr/work/bzrlib/groupcompress.py(1872)_insert_record_stream()
-> flush()
> /home/mbp/bzr/work/bzrlib/groupcompress.py(1747)flush()
-> bytes = ''.join(chunks)
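That last line is the crux: ''.join(chunks) has to materialise a second full copy of the data while the chunks themselves are still referenced, so peak memory is roughly double the payload. A toy demonstration:

    # ~256MB of chunk data, in 16MB pieces
    chunks = [chr(65 + i) * (16 * 1024 * 1024) for i in range(16)]
    bytes = ''.join(chunks)   # allocates another ~256MB before anything is freed
    del chunks                # only now can the original pieces be collected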