Comment 0 for bug 402652

John A Meinel (jameinel) wrote : smart fetch for --2a does not opportunistically pack

Somewhat related to bug #402114.
The current --2a fetching code is smart enough to fragment on demand. (So if you have a group of length 2MB and you only request the middle 10kB, it will create a new group on the fly containing only those bytes.)
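As a rough illustration of "fragment on demand", here is a toy sketch. It does not use the real groupcompress format or any bzrlib API: plain zlib stands in for the group compressor, and build_group and extract_subgroup are hypothetical helpers invented for this example.

```python
import zlib


def build_group(texts):
    """Pack {key: bytes} into one compressed block plus a byte-offset index."""
    index, parts, offset = {}, [], 0
    for key, text in texts.items():
        index[key] = (offset, offset + len(text))
        parts.append(text)
        offset += len(text)
    return zlib.compress(b"".join(parts)), index


def extract_subgroup(block, index, wanted_keys):
    """Build a new, smaller group containing only the requested texts."""
    raw = zlib.decompress(block)
    subset = {key: raw[index[key][0]:index[key][1]] for key in wanted_keys}
    return build_group(subset)


# A large group from which only the small middle text is requested.
block, index = build_group({"a": b"A" * 1000, "b": b"B" * 10, "c": b"C" * 1000})
small_block, small_index = extract_subgroup(block, index, ["b"])
```

Only the requested bytes end up in the new group, so the sender never ships the unrelated parts of the original block.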

However, it is not smart enough to combine on the fly. So if you commit the same file content 9 times, you will end up with 9 fulltexts in various pack files. If you then fetch this, it will stream those 9 fulltexts into a new pack file, but will *leave* them as 9 fulltexts until an autopack or manual pack decides to rewrite that pack file.
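To give a feel for the cost of leaving the 9 fulltexts separate, the sketch below compares compressing 9 identical texts one at a time against compressing them as one group. It uses plain zlib as a stand-in, not the actual groupcompress delta format, and the 10,000-byte pseudo-random content is an assumption chosen so that a single copy does not compress on its own.

```python
import random
import zlib

# Hypothetical file content: deterministic pseudo-random bytes, so a single
# copy is essentially incompressible by itself.
rng = random.Random(0)
content = bytes(rng.getrandbits(8) for _ in range(10_000))

# Nine separate "fulltexts", each compressed in isolation.
separate = sum(len(zlib.compress(content)) for _ in range(9))

# The same nine texts compressed together, so later copies can refer
# back to the first one.
combined = len(zlib.compress(content * 9))

print("separate:", separate, "bytes; combined:", combined, "bytes")
```

The combined figure should come out at a small fraction of the separate one, which is the saving that is lost until a pack recombines the texts.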

This has fairly strong implications for 'mirror' branches (especially of PQM branches), because the PQM will be creating lots of commits and loosely packed .pack files. Normally these would be cleaned up every 10 revs. However, if someone fetches everything in between autopacks, they will fetch the loose packs into a single (but still not optimally compressed) pack. The autopack at the new location will then be deferred until much later, because of the number of revisions present in the mirror location.

One possibility is for the 'get_record_stream()' code to notice when we are requesting several blocks that all look like they would fit very well together, and to send them as a single larger block. This has the potential to reduce network I/O and improve disk layout, at the cost of more CPU on the server.
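One way such a decision could look is sketched below. This is only a guess at a heuristic, not the existing get_record_stream() logic: plan_groups, small_limit and target are hypothetical names and thresholds invented for the example. Small blocks that are adjacent in the request are batched together up to a target size, and large blocks are passed through untouched.

```python
def plan_groups(block_sizes, small_limit=64 * 1024, target=2 * 1024 * 1024):
    """Return a list of groups; each group lists the indices of the requested
    blocks that would be sent together as one block."""
    plan = []
    pending, pending_size = [], 0
    for position, size in enumerate(block_sizes):
        if size >= small_limit:
            # A large block is already well packed: flush any pending batch
            # first (to preserve stream order), then send it as-is.
            if pending:
                plan.append(pending)
                pending, pending_size = [], 0
            plan.append([position])
            continue
        if pending and pending_size + size > target:
            # The batch would grow past the target size, so close it.
            plan.append(pending)
            pending, pending_size = [], 0
        pending.append(position)
        pending_size += size
    if pending:
        plan.append(pending)
    return plan


# Nine small fulltext blocks end up planned as one combined group.
nine_fulltexts = plan_groups([10_000] * 9)
```

The extra server CPU comes from recompressing each batch; the thresholds would need tuning against real repositories.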

Note that, aside from bug #402645 (injecting too much fragmentation from an optimally packed source), doing a 'bzr pack' on the affected repositories should help minimize the impact of this.