log --include-merges apparently prints unrelated commits

Bug #842695 reported by Martin von Gagern
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Bazaar
Confirmed
Undecided
Unassigned
Breezy
Triaged
Medium
Unassigned

Bug Description

Looking at the log for the bash_completion plugin in current bzr.dev, I get a strange result:
$ ./bzr log --log-format=line --include-merges bzrlib/plugins/bash_completion/
  5268.7.2: Jelmer Vernooij 2011-03-17 [merge] merge bzr.dev.
    5050.73.11: Vincent Ladeuil 2011-07-06 [merge] Merging in trunk
  5425.4.4: Martin Pool 2011-03-28 [merge] merge trunk
  5609.39.9: Vincent Ladeuil 2011-05-26 [merge] Retarget to trunk and merge ...
  5340.12.24: Martin 2011-05-18 [merge] Merge bzr.dev to renew work
  5074.5.3: INADA Naoki 2011-05-06 [merge] merge lp:bzr

In particular I notice that the lines printed without the --include-merges are not included any more:
$ ./bzr log --log-format=line bzrlib/plugins/bash_completion/
5638: Canonical.com Pat... 2011-01-28 [merge] (vila) Mark test failures on w...
5277: Canonical.com Pat... 2010-06-02 [merge] (lifeless) Tab-complete tags w...
5255: Canonical.com Pat... 2010-05-25 [merge] (lifeless) Generalise probing ...
5251: Canonical.com Pat... 2010-05-24 [merge] (lifeless) Adjusted README.txt...
5249: Canonical.com Pat... 2010-05-21 [merge] (lifeless) Fix deprecation war...
5240: Canonical.com Pat... 2010-05-19 [merge] (lifeless) Replace the unmaint...

Obviously there is somethig broken here in a way which the test suite didn't catch. So I guess that before writing a fix, someone should identify the cause of this and create a smaller test case.

Tags: log has-test

Related branches

Revision history for this message
John A Meinel (jameinel) wrote : Re: [Bug 842695] [NEW] log --include-merges apparently prints unrelated commits

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 09/06/2011 03:33 PM, Martin von Gagern wrote:
> Public bug reported:
>
> Looking at the log for the bash_completion plugin in current bzr.dev, I get a strange result:
> $ ./bzr log --log-format=line --include-merges bzrlib/plugins/bash_completion/
> 5268.7.2: Jelmer Vernooij 2011-03-17 [merge] merge bzr.dev.
> 5050.73.11: Vincent Ladeuil 2011-07-06 [merge] Merging in trunk
> 5425.4.4: Martin Pool 2011-03-28 [merge] merge trunk
> 5609.39.9: Vincent Ladeuil 2011-05-26 [merge] Retarget to trunk and merge ...
> 5340.12.24: Martin 2011-05-18 [merge] Merge bzr.dev to renew work
> 5074.5.3: INADA Naoki 2011-05-06 [merge] merge lp:bzr
>
> In particular I notice that the lines printed without the --include-merges are not included any more:
> $ ./bzr log --log-format=line bzrlib/plugins/bash_completion/
> 5638: Canonical.com Pat... 2011-01-28 [merge] (vila) Mark test failures on w...
> 5277: Canonical.com Pat... 2010-06-02 [merge] (lifeless) Tab-complete tags w...
> 5255: Canonical.com Pat... 2010-05-25 [merge] (lifeless) Generalise probing ...
> 5251: Canonical.com Pat... 2010-05-24 [merge] (lifeless) Adjusted README.txt...
> 5249: Canonical.com Pat... 2010-05-21 [merge] (lifeless) Fix deprecation war...
> 5240: Canonical.com Pat... 2010-05-19 [merge] (lifeless) Replace the unmaint...
>
> Obviously there is somethig broken here in a way which the test suite
> didn't catch. So I guess that before writing a fix, someone should
> identify the cause of this and create a smaller test case.
>
> ** Affects: bzr
> Importance: Undecided
> Status: New
>

I would guess this is intentional, but I'd like to find out more what
you expect.

namely, "--include-merges" is telling you the revision that was the
original change, while '--no-include-merges' (default) is telling you
what mainline revision merged a change.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk5mI7MACgkQJdeBCYSNAAOPQACcCH94wdP4j/L4MOSJMc/MoXS4
VrsAn1CAYnpKrTtX8F/+yLpx1t2dWNCz
=l1z6
-----END PGP SIGNATURE-----

Revision history for this message
Jelmer Vernooij (jelmer) wrote :

On 09/06/2011 03:44 PM, John A Meinel wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 09/06/2011 03:33 PM, Martin von Gagern wrote:
>> Public bug reported:
>>
>> Looking at the log for the bash_completion plugin in current bzr.dev, I get a strange result:
>> $ ./bzr log --log-format=line --include-merges bzrlib/plugins/bash_completion/
>> 5268.7.2: Jelmer Vernooij 2011-03-17 [merge] merge bzr.dev.
>> 5050.73.11: Vincent Ladeuil 2011-07-06 [merge] Merging in trunk
>> 5425.4.4: Martin Pool 2011-03-28 [merge] merge trunk
>> 5609.39.9: Vincent Ladeuil 2011-05-26 [merge] Retarget to trunk and merge ...
>> 5340.12.24: Martin 2011-05-18 [merge] Merge bzr.dev to renew work
>> 5074.5.3: INADA Naoki 2011-05-06 [merge] merge lp:bzr
>>
>> In particular I notice that the lines printed without the --include-merges are not included any more:
>> $ ./bzr log --log-format=line bzrlib/plugins/bash_completion/
>> 5638: Canonical.com Pat... 2011-01-28 [merge] (vila) Mark test failures on w...
>> 5277: Canonical.com Pat... 2010-06-02 [merge] (lifeless) Tab-complete tags w...
>> 5255: Canonical.com Pat... 2010-05-25 [merge] (lifeless) Generalise probing ...
>> 5251: Canonical.com Pat... 2010-05-24 [merge] (lifeless) Adjusted README.txt...
>> 5249: Canonical.com Pat... 2010-05-21 [merge] (lifeless) Fix deprecation war...
>> 5240: Canonical.com Pat... 2010-05-19 [merge] (lifeless) Replace the unmaint...
>>
>> Obviously there is somethig broken here in a way which the test suite
>> didn't catch. So I guess that before writing a fix, someone should
>> identify the cause of this and create a smaller test case.
>>
>> ** Affects: bzr
>> Importance: Undecided
>> Status: New
>>
> I would guess this is intentional, but I'd like to find out more what
> you expect.
>
> namely, "--include-merges" is telling you the revision that was the
> original change, while '--no-include-merges' (default) is telling you
> what mainline revision merged a change.
>
It's a bit odd that "--include-merges" doesn't just include the mergeD
commits but also no longer lists the merge commits.

Cheers,

Jelmer

Revision history for this message
Martin von Gagern (gagern) wrote : [Bug 842695] log --include-merges apparently prints unrelated commits

On 06.09.2011 15:44, John Arbash Meinel wrote:
> I would guess this is intentional, but I'd like to find out more what
> you expect.
>
> namely, "--include-merges" is telling you the revision that was the
> original change, while '--no-include-merges' (default) is telling you
> what mainline revision merged a change.

I'd expect both mainline and non-mainline revisions to show. --include-*
to me implies adding information, but not removing some.

But more to the point, I'd have expected the output to mention those
commits that actually affected the named directory, instead of what
appears to me to be completely unrelated commits. So I'd have expected
the output to start like this:

5638: Canonical.com P... 2011-01-28 [merge] (vila) Mark test failure...
  5636.1.2: Vincent Ladeuil 2011-01-28 Raise KnownFailure on windows...
  5636.1.1: Vincent Ladeuil 2011-01-28 Get rid of the useless __init__.
5277: Canonical.com P... 2010-06-02 [merge] (lifeless) Tab-complete ...
  5255.4.4: Martin von Gagern 2010-05-29 Properly import bzrlib.test...
  5255.4.3: Martin von Gagern 2010-05-29 [merge] Fix test_revspec_ta...
    0.36.1: Martin von Gagern 2010-05-29 Fix test_revspec_tag_endran...
  5255.4.2: Martin von Gagern 2010-05-26 Adjust selftest for renamed...
...

Revision history for this message
Martin von Gagern (gagern) wrote :
Download full text (3.5 KiB)

OK, finally got this thing analyzed and wrote a test case for it. It's in lp:~gagern/bzr/bug842695-log-dir and can be used as a basis for fixing this bug. I wrote a rather verbose commit message, which I'll simply paste here:

When restricting a log to a given directory, _generate_deltas will be used to find out matching revisions. It does so using repository.get_deltas_for_revisions, which describes the difference that the given revision introduced with respect to its left hand parent. So files introduced by the right hand parent of a commit will be considered "added" by the delta. This can lead to false positives, to commits reportedly touching a given dir although they only merged stuff introducing these files.

Note that for in some cases, this is expected behaviour: every merge on the route from a modification to its first merge into mainline should be considered touched by that modification. But for stuff already included in main line, those modifications should not be reported again if they are merged into some side line. In other words, every change should have one direct child reporting it, but no more.

_generate_deltas apparently processes revisions in batches of 200. After each batch, the found additions are removed from the fileid_set and won't be tracked in the next batch. Processing terminates if there are no more files to track. Due to this logic, a false positive in one batch can lead to false negatives later on, as the file gets removed too early, and its actual
addition is therefore lost.

So much is in the commit message. The bug as described above occurred because some branches forked off lp:bzr before the bash_completion_plugin dir was added, and were merged back into lp:bzr after that. At some point, they themselves merged from lp:bzr, and thus had a lot of additions compared to their own mainline. Those additions caused the log to report these merges, and also to miss later merges that I expected.

I guess that before addressing this, we should make certain that we agree on what we expect. To do so, I'd like to define two terms for the scope of this discussion here. I'd say a commit has an ORIGINAL modification to a file if none of its parent has the same file with the same content. In contrast to that, a modification is INHERITED if the left parent doesn't have the same file content, but some other parent does. Obviously, only merges can inherit modifications. The same terms can be applied to directories by comparing their whole recursive content.

Two important observations: if a file is edited in two branches, then merging them will cause a 3-way merge, resulting in a file different from the version in either branch. So the file content merge produces an ORIGINAL modification. On the other hand, if a branch never modifies a given file itself, then repeated merges from a single trunk will never cause an ORIGINAL modification for that file.

So what behaviour would I expect for the -n0 log restricted to a given file or directory? First of all, I'd like to see every ORIGINAL modification to that file. Secondly, I guess that for every ORIGINAL modification, I'd like to see a SINGLE line of (ORIGINAL or INHERITED) modifi...

Read more...

Revision history for this message
Martin von Gagern (gagern) wrote :

If you disagree with my expectation outlined above, or in other words, if you believe that indeed every commit should be compared to its left hand child only, then you may change the test case in the attached branch lp:~gagern/bzr/bug842695-log-dir to also include 1.2.1 in the list of expected revisions.

I believe everybody will agree that revisions 2 and 1.1.1 should be reported in the output, though. As current bzr.dev ONLY reports 1.2.1, and none of the other revisions, current implementation is broken no matter which interpretation you prefer for revisions like 1.2.1.

I would like someone officially acknowledging this is a bug, even if he won't offer writing a fix for it. Just now I've got the impression that previous readers of this report here have been missing the point, and my clarifications were simply too verbose to be read at all. Getting the status updated to "Confirmed" would make me feel a lot better.

Revision history for this message
Vincent Ladeuil (vila) wrote :

Sorry for the delay :-/

> Is this expectation understandable? Do you agree with it? Do you think it can be made to work?

Yes, yes and yes, but I expect the amount of work to be significant.

Also, the log for changes in a dir is not well defined and there have been
discussions in the past about whether a file that was included in a dir at
one point but then moved elsewhere should still be considered part of the
dir for example.

But here, the '200' is clearly an implementation detail leading to a bug in
what is displayed to the user and as such, should be fixed.

Regarding the points you raised about ORIGINAL/INHERITED, I think this needs to be discussed on the list rather than on the bug (even if it's nice to have your summary here, the discussion deserves a better exposure).

Changed in bzr:
status: New → Confirmed
Revision history for this message
Martin von Gagern (gagern) wrote :

> Regarding the points you raised about ORIGINAL/INHERITED, I think
> this needs to be discussed on the list rather than on the bug

Discussion at http://thread.gmane.org/gmane.comp.version-control.bazaar-ng.general/73033

John points out that this means that for different branches "bzr log -n0 dir" would print different revisions. Personally that's just what I'd expect. Robert mentions lp:difftacular by Aaron which does something along these lines. Besides these, there was general agreement to both my expectations and my terminology, and not much debate.

Jelmer Vernooij (jelmer)
tags: added: check-for-breezy
Jelmer Vernooij (jelmer)
Changed in brz:
status: New → Confirmed
status: Confirmed → Triaged
importance: Undecided → Medium
tags: added: has-test log
removed: check-for-breezy
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.