Content filtering breaks commit w/ merge parents
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Bazaar |
Fix Released
|
Critical
|
Martin Pool | ||
1.18 |
Fix Released
|
High
|
Unassigned | ||
2.0 |
Fix Released
|
Critical
|
Unassigned |
Bug Description
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Frits Jalvingh (bug #405251) noticed that the performance of his
repository suddenly died when he upgraded to bzr 1.17. It looked like
all of a sudden he would have thousands of changed files per commit.
I dug into it, and found that there is a specific bug wrt content
filtering and commit that isn't using "record_
This happens whenever you have a merge in a 1.14 repository.
I was able to track it down into this code path:
bzrlib.
has this statement:
if kind == 'file':
if content_summary[2] is None:
if not store:
if (# if the file length changed we have to store:
...
if store:
# We want to record a new node regardless of the presence or
# absence of a content change in the file.
^- The issue is that if you are using content filtering,
'content_
parent_
Or in other words, it is comparing the canonical size with the
convenient size. And when they differ it forces the code to add a new
text. (by setting nostore_sha = None, so we won't notice that nothing
has changed.)
This also happens because of this code in Workingtree:
def path_content_
...
if kind == 'file':
size = stat_result.st_size
# try for a stat cache lookup
stat_result)
return (kind, size, executable, self._sha_
So path_content_
are filters in effect. And this gets passed all the way around to commit
builder.
I believe "CommitBuilder.
because 'iter_changes' was taught to do the right thing wrt content size.
Which means that --1.14 format repositories + WT are probably
vulnerable, but --2a formats are probably not.
John
=:->
affects bzr
status triaged
importance critical
milestone 2.0
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://
iEYEARECAAYFAkq
P2EAoJxfNcFhZU4
=1gFL
-----END PGP SIGNATURE-----
Related branches
- Ian Clatworthy (community): Approve
- John A Meinel (community): Approve
- Diff: 345 lines
The associated branch has a potential fix.
Basically, stop paying attention to 'content_ summary[ ...]' to be accurate about the committed text size.
This means that for files with content filtering, we will read them off disk, apply the filtering, and then see that nothing has changed.
I expect performance will suck a little bit, as we'll be re-reading all the files that have a content filter. But at least it won't cause bogus data to be inserted into the repository on every merge commit.