Bazaar

bzr log DIR could layer above iter_changes

Bug #503071 reported by John A Meinel on 2010-01-04

This bug affects 3 people

Affects		Status	Importance	Assigned to	Milestone
	Bazaar	Confirmed	Medium	Unassigned

Bug Description

This is a spin-off from bug #374730.

Basically, 'bzr log DIR' currently goes to each revision, and pulls out a 'minimal inventory' that just includes DIR and things underneath. It then runs 'iter_changes' on that.

However, it could be better to run 'iter_changes(..., DIR)' and then filter on that. The main difference is that the current code is O(subtree), while the proposed code is O(changes). What we would like is to combine the two and get O(changes-in-subtree).
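To make the two costs concrete, here is a minimal sketch on a toy model. It does not use bzrlib: trees are plain dicts mapping path to a text key, and the whole-tree change list is assumed to be available already (in bzr it would come from iter_changes). The helper names are hypothetical.

```python
def under(path, directory):
    """True if path is the directory itself or lies beneath it."""
    return path == directory or path.startswith(directory.rstrip("/") + "/")


def changes_via_subtree(old_tree, new_tree, directory):
    """Current approach: extract the subtree on both sides, then diff it.

    The work is proportional to the size of the subtree, even if
    nothing in it changed.
    """
    old_sub = {p: v for p, v in old_tree.items() if under(p, directory)}
    new_sub = {p: v for p, v in new_tree.items() if under(p, directory)}
    return sorted(p for p in old_sub.keys() | new_sub.keys()
                  if old_sub.get(p) != new_sub.get(p))


def changes_via_filter(tree_changes, directory):
    """Proposed approach: take the whole-tree change list and keep DIR.

    The work is proportional to the number of changes in the revision,
    including changes outside the directory.
    """
    return sorted(p for p in tree_changes if under(p, directory))


# Toy data: one revision that touches README and src/main.py.
old = {"README": "a1", "src": "d1", "src/main.py": "b1",
       "src/util.py": "c1", "doc/index.txt": "e1"}
new = {"README": "a2", "src": "d1", "src/main.py": "b2",
       "src/util.py": "c1", "doc/index.txt": "e1"}
tree_changes = ["README", "src/main.py"]  # assumed output of a whole-tree diff

subtree_result = changes_via_subtree(old, new, "src")
filter_result = changes_via_filter(tree_changes, "src")
```

Both functions report the same paths; they differ only in how much data they have to look at to get there.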

John A Meinel (jameinel) wrote on 2010-01-04:

Implementation-wise, log DIR is a bit tricky because changes are stored by file-id. So you need to map from paths => file-ids, and then compute the changes on that. (Also, the iter_changes APIs don't always know whether they are path based or file-id based.)

So you end up needing to look up in a couple of different chk maps.

The 2a format would allow us to do:

1) compute the mapping from paths => file-ids for this revision
2) compute the mappings in the previous revision, also noting that if the chk root id didn't change, the mapping is known to be identical.
3) Run iter_changes across only those paths/file-ids.
4) continue from step 2
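The steps above can be sketched roughly as follows. This is a toy model, not bzrlib code: the Rev class, its fields, and the function names are all hypothetical, and map_root merely stands in for the chk root id of the path => file-id map.

```python
from dataclasses import dataclass


@dataclass
class Rev:
    revid: str
    map_root: str      # stand-in for the chk root id of the path => file-id map
    path_to_id: dict   # path -> file-id
    id_to_text: dict   # file-id -> text key


def entries_under(rev, directory):
    """Map file-id -> path for DIR and everything beneath it."""
    prefix = directory.rstrip("/") + "/"
    return {fid: path for path, fid in rev.path_to_id.items()
            if path == directory or path.startswith(prefix)}


def changed(cur, cur_entries, prev, prev_entries):
    """Step 3: compare only the file-ids that live under DIR on either side."""
    for fid in cur_entries.keys() | prev_entries.keys():
        if cur_entries.get(fid) != prev_entries.get(fid):
            return True   # added, removed, renamed, or moved in/out of DIR
        if cur.id_to_text.get(fid) != prev.id_to_text.get(fid):
            return True   # content changed
    return False


def log_dir(history, directory):
    """Return revids (newest first) that changed something under DIR.

    `history` is a linear list of Rev objects, newest first.
    """
    touched = []
    if not history:
        return touched
    cur = history[0]
    cur_entries = entries_under(cur, directory)          # step 1
    for prev in history[1:]:
        if prev.map_root == cur.map_root:                # step 2: same root,
            prev_entries = cur_entries                   # so reuse the mapping
        else:
            prev_entries = entries_under(prev, directory)
        if changed(cur, cur_entries, prev, prev_entries):
            touched.append(cur.revid)
        cur, cur_entries = prev, prev_entries            # step 4
    if cur_entries:
        touched.append(cur.revid)  # oldest revision introduced DIR content
    return touched


# r1 adds README, r2 adds src/a.py, r3 edits README, r4 edits src/a.py.
paths = {"README": "readme-id", "src": "src-id", "src/a.py": "a-id"}
r1 = Rev("r1", "m1", {"README": "readme-id"}, {"readme-id": "r1"})
r2 = Rev("r2", "m2", paths, {"readme-id": "r1", "a-id": "t1"})
r3 = Rev("r3", "m2", paths, {"readme-id": "r2", "a-id": "t1"})
r4 = Rev("r4", "m2", paths, {"readme-id": "r2", "a-id": "t2"})
history = [r4, r3, r2, r1]
```

In this model, log_dir(history, "src") only recomputes the path mapping when map_root differs between neighbouring revisions. The sketch ignores merges and the locality problem described below.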

Our current design does suffer a bit from locality issues. A big-enough subdir is likely to have its file-ids spread out across all/most of the chk pages. So we end up reading all the pages for every revision anyway. Also, the deserialization code, etc., means that we probably do a bit more extraction than we need to.

(Ideally, 'iter_changes' could even work down at the bytes level, so that we don't have to extract 50 rows to determine that they are all identical between both sides.)
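As an illustration of the bytes-level idea, here is a small sketch. The page format (one "file-id NUL text-key" row per line) is invented for the example and is not the real chk page serialization; the function names are hypothetical too.

```python
def parse_page(page_bytes):
    """Deserialize a toy page into {file_id: text_key}."""
    rows = {}
    for line in page_bytes.splitlines():
        file_id, text_key = line.split(b"\x00", 1)
        rows[file_id] = text_key
    return rows


def page_changes(old_page, new_page):
    """Return the file-ids whose rows differ between two serialized pages."""
    if old_page == new_page:
        # Identical bytes: no need to extract any rows at all.
        return []
    old_rows = parse_page(old_page)
    new_rows = parse_page(new_page)
    return sorted(fid for fid in old_rows.keys() | new_rows.keys()
                  if old_rows.get(fid) != new_rows.get(fid))


page_a = b"a-id\x00t1\nb-id\x00t1"
page_b = b"a-id\x00t2\nb-id\x00t1"
same = page_changes(page_a, page_a)
diff = page_changes(page_a, page_b)
```

The cheap equality check on the raw bytes is the whole point: rows are only extracted for pages that actually differ.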

Craig Hewetson (craighewetson-deactivatedaccount) wrote on 2010-02-23:

I'm really keen on this fix :) I've been getting pushback from my colleagues about this feature. Is there any way that I can help with this ... maybe not development, but testing etc.?

Matt Doran (matt-doran) wrote on 2010-04-12:

Yeah I'd like to see some improvements here too. We commonly need to do this to check the history of a sub-component of a large repository ... and it would be nice for this to be as fast as the rest of bzr. :)

Per Johansson (per.j) wrote on 2010-05-10:

It seems to me that doing a bzr log <file> plus bzr diff -c <rev> <file> for all revisions is quite a bit faster than bzr log -v <file>, even though it produces a superset of the information. Is that what this bug is about?

E.g. (11551 is the only revision for this file, the one where it was added):

; time sh -c 'bzr log fileinbigrepo && bzr diff -c 11551 fileinbigrepo' > /dev/null
real 0m4.035s
user 0m3.826s
sys 0m0.203s

; time bzr log -v fileinbigrepo > /dev/null
real 0m19.138s
user 0m18.892s
sys 0m0.221s

Jelmer Vernooij (jelmer) on 2017-11-09:

tags: added: check-for-breezy
