I believe this is the problem. Specifically, because the code below considers 'whole nodes' rather than 'interesting items', we see every item in each new page, not just the items that are new in those pages.
    chk_bytes = self.from_repository.chk_bytes
    def _filter_id_to_entry():
        for record, items in chk_map.iter_interesting_nodes(chk_bytes,
                    self._chk_id_roots, uninteresting_root_keys):
            for name, bytes in items:
                # Note: we don't care about name_utf8, because we are always
                # rich-root = True
                _, file_id, revision_id = bytes_to_info(bytes)
                self._text_keys.add((file_id, revision_id))
            if record is not None:
                yield record
To me this reinforces the need to push this diff code into chk_map.py, and have a single implementation of it.
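
To make the difference concrete, here is a minimal sketch in plain Python. It does not use the bzrlib API: the page keys, the dict-of-dicts layout, and both helper functions are hypothetical stand-ins, chosen only to show why walking whole new pages reports more (file_id, revision_id) pairs than are actually new.

```python
def items_in_new_pages(old_pages, new_pages):
    """Whole-node view: every item of every page whose key is not in old_pages."""
    result = set()
    for key, items in new_pages.items():
        if key not in old_pages:
            result.update(items.items())
    return result


def new_items_only(old_pages, new_pages):
    """Item view: drop any item that already appeared in some old page."""
    old_items = set()
    for items in old_pages.values():
        old_items.update(items.items())
    return items_in_new_pages(old_pages, new_pages) - old_items


# Hypothetical data: one page was rewritten because file-b changed,
# but file-a lives in the same page and is unchanged.
old_pages = {"sha1:aaa": {"file-a": "rev-1", "file-b": "rev-1"}}
new_pages = {"sha1:bbb": {"file-a": "rev-1", "file-b": "rev-2"}}

whole = items_in_new_pages(old_pages, new_pages)
only_new = new_items_only(old_pages, new_pages)

print(sorted(whole))     # includes the unchanged (file-a, rev-1)
print(sorted(only_new))  # only the changed (file-b, rev-2)
```

In this toy case the whole-node view would add the unchanged file-a text key to the set, which is the over-reporting described above.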