Comment 3 for bug 890085

Martin Pool (mbp) wrote:

In passing, https://code.launchpad.net/~mbp/bzr/remove-pylzma/+merge/82097 can clean this up a bit.

    def _create_z_content_from_chunks(self, chunks):
        compressor = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION)
        # Peak memory at this point is 1 fulltext + 1 compressed text + zlib overhead
        # (measured peak is maybe 30MB over the above...)
        compressed_chunks = map(compressor.compress, chunks)
        compressed_chunks.append(compressor.flush())
        # Ignore empty chunks
        self._z_content_chunks = [c for c in compressed_chunks if c]
        self._z_content_length = sum(map(len, self._z_content_chunks))

The whole GroupCompressBlock class is based on a format in which each compressed block is preceded by the length of its compressed data, so I don't think we can compatibly remove the per-file length prefix. However, what we could probably do is spill the compressed content to a temporary file, trading disk for memory pressure. Then we'll know the length when we're done compressing, and we can copy from the temporary file out to the actual pack, as in the sketch below. We just need to be careful not to read the whole temporary file back into memory at once.
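Here is a minimal sketch of that idea (illustrative only, not bzrlib code: the function name, the decimal length prefix, and the copy_chunk_size parameter are all made up for the example):

    import tempfile
    import zlib

    def spill_compress(chunks, out_file, copy_chunk_size=64 * 1024):
        # Compress into a temporary file so the total compressed length
        # is known before anything is written to the real pack.
        compressor = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION)
        with tempfile.TemporaryFile() as spill:
            for chunk in chunks:
                spill.write(compressor.compress(chunk))
            spill.write(compressor.flush())
            z_length = spill.tell()
            # Write the length prefix, then stream the compressed bytes
            # back in bounded pieces so peak memory is one piece, not
            # the whole compressed text.
            out_file.write(str(z_length).encode('ascii') + b'\n')
            spill.seek(0)
            while True:
                piece = spill.read(copy_chunk_size)
                if not piece:
                    break
                out_file.write(piece)
        return z_length

The peak here is one input chunk plus one copy_chunk_size piece, at the cost of an extra round trip to disk.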

For local storage we could leave space for the length field, write out the compressed content, and then seek back to fill it in, but that obviously won't work when sending these things across the network, and it might perform poorly over some transports.
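Roughly, that seek-back variant would look like this (again just a sketch: the fixed 8-byte big-endian length field is an assumption for the example, and it only works on seekable output):

    import struct
    import zlib

    def write_with_length_placeholder(chunks, out_file):
        prefix_pos = out_file.tell()
        out_file.write(b'\x00' * 8)  # reserve a fixed-width slot for the length
        compressor = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION)
        z_length = 0
        for chunk in chunks:
            data = compressor.compress(chunk)
            z_length += len(data)
            out_file.write(data)
        tail = compressor.flush()
        z_length += len(tail)
        out_file.write(tail)
        end_pos = out_file.tell()
        out_file.seek(prefix_pos)
        out_file.write(struct.pack('>Q', z_length))  # fill in the real length
        out_file.seek(end_pos)  # restore position for subsequent writes
        return z_length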