small changes to dirstate are too slow
Bug #380202 reported by Ian Clatworthy
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Bazaar | Fix Released | Medium | John A Meinel |
Bug Description
Looking at the profiling data for 'bzr st' after changing a single file on OOo, 47% of the time is spent just saving the dirstate. For 'bzr st file', saving takes 75% of the time. Note that this is the primary reason why 'hg status' is faster than 'bzr status' on OOo (for hg 1.2.1 vs bzr 1.15).
It appears that we deserialise and reserialise every dirstate line, whether it changed or not, as the "is modified" marker is dirstate-wide. Perhaps we should remember what lines changed and only deserialise/serialise those? Or something like that.
Related branches
lp:~ian-clatworthy/bzr/faster-dirstate-saving
Rejected for merging into lp:~bzr/bzr/trunk-old
- Martin Pool: Needs Fixing
- Diff: 351 lines (has conflicts)
lp:~jameinel/bzr/2.4-faster-dirstate-saving-380202
- Vincent Ladeuil: Needs Fixing
- Jelmer Vernooij (community): Approve
- Diff: 708 lines (+315/-50), 8 files modified
bzrlib/_dirstate_helpers_pyx.pyx (+9/-1)
bzrlib/dirstate.py (+102/-28)
bzrlib/tests/per_workingtree/test_workingtree.py (+35/-0)
bzrlib/tests/test__dirstate_helpers.py (+27/-15)
bzrlib/tests/test_dirstate.py (+52/-4)
bzrlib/workingtree_4.py (+24/-2)
doc/developers/dirstate.txt (+54/-0)
doc/en/release-notes/bzr-2.4.txt (+12/-0)
Changed in bzr:
assignee: nobody → Ian Clatworthy (ian-clatworthy)
Changed in bzr:
status: In Progress → Fix Released
2009/5/25 Ian Clatworthy <email address hidden>:
> It appears that we deserialise and reserialise every dirstate line,
> whether it changed or not, as the "is modified" marker is dirstate-wide.
> Perhaps we should remember what lines changed and only
> deserialise/serialise those? Or something like that.
There are a few things we could do.
If there are no changes, we shouldn't even think about writing the
file. Do we get this right at the moment?
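A minimal sketch of that first point, assuming a toy in-memory table rather than bzrlib's real dirstate code: instead of a single table-wide "is modified" flag set on any touch, it remembers which entries actually changed and skips the write entirely when that set is empty. The class name `EntryTable`, its methods, and the tab-separated line format are all hypothetical and exist only for illustration.

```python
class EntryTable:
    """Toy stand-in for a dirstate-like table (hypothetical, not bzrlib)."""

    def __init__(self, entries):
        self._entries = dict(entries)
        self._changed = set()  # keys modified since the last save

    def update(self, key, value):
        # Mark the entry dirty only if the stored value really differs.
        if self._entries.get(key) != value:
            self._entries[key] = value
            self._changed.add(key)

    def save(self, write):
        """Call write(text) only if at least one entry changed.

        Returns True if a write happened, False if it was skipped.
        """
        if not self._changed:
            return False
        lines = ["%s\t%s\n" % (k, v) for k, v in sorted(self._entries.items())]
        write("".join(lines))
        self._changed.clear()
        return True
```

This sketch still reserialises every entry once a write is needed; the changed-key set is the information a smarter writer would use to touch only the affected lines.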
If there are very few changes, even just opening the file for writing
may be more work than is worthwhile. Updating it in place should be
possible, and it might be faster to seek to the right place and just
write that one line or section.
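As a rough illustration of the in-place idea, the sketch below seeks to one record and overwrites only that record. It assumes fixed-width records so that an update never changes the file length; that assumption is made for the example only and is not a description of the actual dirstate file format. `RECORD_SIZE` and all function names are hypothetical.

```python
RECORD_SIZE = 32  # hypothetical fixed record width, including the newline


def pack(text):
    """Pad one record to exactly RECORD_SIZE bytes."""
    data = text.encode("utf-8")
    if len(data) > RECORD_SIZE - 1:
        raise ValueError("record too long for a fixed-width slot")
    return data.ljust(RECORD_SIZE - 1, b" ") + b"\n"


def write_all(path, records):
    """Full rewrite: what happens today on every save."""
    with open(path, "wb") as f:
        for record in records:
            f.write(pack(record))


def update_record(path, index, text):
    """In-place update: seek to one slot and overwrite just that slot."""
    with open(path, "r+b") as f:
        f.seek(index * RECORD_SIZE)
        f.write(pack(text))


def read_all(path):
    with open(path, "rb") as f:
        data = f.read()
    return [
        data[i:i + RECORD_SIZE].rstrip().decode("utf-8")
        for i in range(0, len(data), RECORD_SIZE)
    ]
```

With variable-length lines an in-place write is only safe when the new line is the same length as the old one, and a partially completed write can leave the file inconsistent, so a real implementation would need to handle both cases.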
The other thing to consider is that the hash cache function is
generally only useful to tell if the contents of the file are
different to the basis tree. There are other cases like a file that's
changed from the parent but the same as some other tree the user's
diffing against, but I think they're uncommon. I think, further, it's
reasonable to assume that files will rarely be made the same as in the
parent except by bzr operations like commit, merge, and revert. So if
you accept this, we should probably never write the tree from bzr st,
but only when we've just updated the tree, e.g. from building it, revert,
or commit.
There are three possible snags here: first, that reasoning may be
wrong. Second, because of the granularity of file timestamps, we may
not safely be able to record the timestamp when we first build the
tree. Third, I'm not sure that we always do update the hash cache
when we update the tree, and updating it on read operations will give
us some safety net against that. (But it's kind of hiding the real
problem, maybe we should rip off the bandaid.)
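The timestamp-granularity snag can be sketched as follows. A common approach, shown here as an assumption rather than as bzr's actual behaviour, is to cache a hash only when the file's timestamps are safely older than "now", because a file modified again within the same timestamp tick would look unchanged. The window `CUTOFF_SECONDS`, the cache layout, and the function names are hypothetical.

```python
import hashlib
import os
import time

CUTOFF_SECONDS = 3  # assumed safety window; the right value is filesystem-dependent


def is_cacheable(mtime, ctime, now, window=CUTOFF_SECONDS):
    """A stat fingerprint is trusted only if the file is older than the window."""
    return max(mtime, ctime) < now - window


def sha1_and_maybe_cache(path, cache, now=None):
    """Return the file's SHA-1, reusing or recording a cached value when safe."""
    if now is None:
        now = time.time()
    st = os.stat(path)
    fingerprint = (st.st_size, st.st_mtime, st.st_ctime)
    cached = cache.get(path)
    if cached is not None and cached[0] == fingerprint:
        return cached[1]
    with open(path, "rb") as f:
        digest = hashlib.sha1(f.read()).hexdigest()
    # A freshly written file is not cached: a later edit in the same
    # timestamp tick would be indistinguishable from no edit at all.
    if is_cacheable(st.st_mtime, st.st_ctime, now):
        cache[path] = (fingerprint, digest)
    return digest
```

Under this rule a tree that has only just been built would have few or no cacheable entries, which is exactly the second snag described above.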
Doing this would have the advantage that we'd no longer be doing
physical writes from logical read operations which would be nice.
[Also, this bug may be a dupe.]
--
Martin <http://launchpad.net/~mbp/>