bzr log DIR could layer above iter_changes
Bug #503071 reported by John A Meinel
This bug affects 3 people
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Bazaar | Confirmed | Medium | Unassigned |
Bug Description
This is a spin-off from bug #374730.
Basically, 'bzr log DIR' currently goes to each revision, and pulls out a 'minimal inventory' that just includes DIR and things underneath. It then runs 'iter_changes' on that.
However, it would probably be better to run 'iter_changes(..., DIR)' and then filter the result. The main difference is that the current code is O(subtree), while the proposed code is O(changes). What we would like is to be able to combine the two and have O(changes-
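To make the O(subtree) versus O(changes) distinction concrete, here is a toy sketch in plain Python. It does not use bzrlib: inventories are modelled as plain dicts of path to content hash, `delta` stands in for whatever iter_changes would report, and the names `under`, `changed_by_subtree` and `changed_by_delta` are made up for illustration. Each function also returns how many entries it had to look at.

```python
def under(path, directory):
    """True if path is the directory itself or lies beneath it."""
    return path == directory or path.startswith(directory.rstrip("/") + "/")


def changed_by_subtree(old_inv, new_inv, directory):
    # Stand-in for the current approach: restrict both inventories to DIR,
    # then compare every entry in that subtree.
    paths = {p for p in old_inv.keys() | new_inv.keys() if under(p, directory)}
    examined = len(paths)  # grows with the size of the subtree
    changed = sorted(p for p in paths if old_inv.get(p) != new_inv.get(p))
    return changed, examined


def changed_by_delta(delta, directory):
    # Stand-in for the proposed approach: start from the list of changed
    # paths and keep only those under DIR.
    examined = len(delta)  # grows with the number of changes
    changed = sorted(p for p in delta if under(p, directory))
    return changed, examined


old = {"src/a.py": "h1", "src/b.py": "h2", "src/c.py": "h3", "doc/x.txt": "h4"}
new = {"src/a.py": "h1", "src/b.py": "h2b", "src/c.py": "h3", "doc/x.txt": "h4b"}
delta = ["doc/x.txt", "src/b.py"]  # assumed output of a tree-wide diff

print(changed_by_subtree(old, new, "src"))  # (['src/b.py'], 3)
print(changed_by_delta(delta, "src"))       # (['src/b.py'], 2)
```

Both give the same answer; the only difference is whether the work scales with the number of entries under DIR or with the number of changes in the revision.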
tags: | added: check-for-breezy |
Implementation-wise, 'log DIR' is a bit tricky because changes are stored by file-id. So you need to map from paths => file-ids, and then compute the changes on that. (Also, the iter_changes APIs don't always know whether they are path-based or file-id-based.)
So you end up needing to look up in a couple of different chk maps.
The 2a format would allow us to do:
1) compute the mapping from paths => file-ids for this revision
2) compute the mappings in the previous revision, also noting that if the chk root id didn't change, the mapping is known to be identical.
3) Run iter_changes across only those paths/file-ids.
4) continue from step 2
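The four steps above can be sketched as a loop. This is only a toy model under stated assumptions, not bzrlib code: each revision is a dict with a hypothetical `path_map_root` field (standing in for the chk root id of the paths => file-ids map), a `paths` dict, and a `texts` dict of file-id to content hash. The function name `log_dir` and all field names are invented for illustration.

```python
def ids_under(paths, directory):
    """File-ids whose path is the directory itself or lies beneath it."""
    prefix = directory.rstrip("/") + "/"
    return {fid for path, fid in paths.items()
            if path == directory or path.startswith(prefix)}


def log_dir(revisions, directory):
    """revisions is ordered newest first.

    Returns (revision ids that touched the directory, number of
    path => file-id mappings that actually had to be computed).
    """
    touched = []
    cur = revisions[0]
    cur_ids = ids_under(cur["paths"], directory)        # step 1
    mappings_computed = 1
    for prev in revisions[1:]:
        if prev["path_map_root"] == cur["path_map_root"]:
            prev_ids = cur_ids                          # step 2: root unchanged, reuse
        else:
            prev_ids = ids_under(prev["paths"], directory)
            mappings_computed += 1
        # Step 3: compare only the interesting file-ids.
        interesting = cur_ids | prev_ids
        if cur_ids != prev_ids or any(
                cur["texts"].get(f) != prev["texts"].get(f) for f in interesting):
            touched.append(cur["revid"])
        cur, cur_ids = prev, prev_ids                   # step 4: continue from step 2
    if cur_ids:
        # The oldest revision is treated as a change against an empty tree.
        touched.append(cur["revid"])
    return touched, mappings_computed


paths = {"src/a.py": "a-id", "doc/x.txt": "x-id"}
revisions = [
    {"revid": "r3", "path_map_root": "k1", "paths": paths,
     "texts": {"a-id": "h2", "x-id": "h1"}},
    {"revid": "r2", "path_map_root": "k1", "paths": paths,
     "texts": {"a-id": "h1", "x-id": "h1"}},
    {"revid": "r1", "path_map_root": "k1", "paths": paths,
     "texts": {"a-id": "h1", "x-id": "h0"}},
]
print(log_dir(revisions, "src"))  # (['r3', 'r1'], 1)
```

In this example no file is added, removed or renamed, so the mapping is computed once and reused for every revision; r2 only touched doc/x.txt and is skipped.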
Our current design does suffer a bit from locality issues. A big-enough subdir is likely to have its file-ids spread out across all/most of the chk pages. So we end up reading all the pages for every revision anyway. Also, the deserialization (etc.) code means that we probably do a bit more extraction than we need to.
(Ideally, 'iter_changes' could even work down at the bytes level, so that we don't have to extract 50 rows to determine that they are all identical between both sides.)
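A rough illustration of the bytes-level idea, again as a toy and not the real chk page format: pages are modelled as byte strings of `file-id hash` lines keyed by a made-up page key, and rows are only parsed when the raw bytes of a page differ between the two sides. The names `parse_page` and `diff_pages` are hypothetical.

```python
def parse_page(page_bytes):
    """Deserialize one toy page into {file_id: value}."""
    rows = {}
    for line in page_bytes.split(b"\n"):
        if line:
            file_id, value = line.split(b" ", 1)
            rows[file_id] = value
    return rows


def diff_pages(old_pages, new_pages):
    """Return (sorted changed file-ids, number of pages deserialized)."""
    changed = set()
    parsed = 0
    for key in old_pages.keys() | new_pages.keys():
        old_bytes = old_pages.get(key, b"")
        new_bytes = new_pages.get(key, b"")
        if old_bytes == new_bytes:
            continue  # identical bytes: no rows are extracted at all
        old_rows = parse_page(old_bytes)
        new_rows = parse_page(new_bytes)
        parsed += 2
        changed.update(f for f in old_rows.keys() | new_rows.keys()
                       if old_rows.get(f) != new_rows.get(f))
    return sorted(changed), parsed


old_pages = {"p0": b"a-id h1\nb-id h1\n", "p1": b"c-id h1\n"}
new_pages = {"p0": b"a-id h1\nb-id h1\n", "p1": b"c-id h2\n"}
print(diff_pages(old_pages, new_pages))  # ([b'c-id'], 2)
```

Page p0 is never deserialized because its bytes are equal on both sides; only the two copies of p1 are parsed.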