bzr merge puts entire ChangeLog from other branch into conflict, rather than just the diff cherry-picked

Bug #151731 reported by Scott James Remnant (Canonical)
2
Affects Status Importance Assigned to Milestone
Bazaar
Fix Released
Medium
Unassigned

Bug Description

Cherry picking changes from trunk into a stable branch, since the changes are bug fixes useful for a new stable release.

Each and every time, I get a conflict in ChangeLog - which isn't necessarily surprising given the different text - what is surprising is that conflict includes the entire difference between the two branches of the ChangeLog.

This makes me extremely nervous as merge is doing more than what it's told to do!

Example:

 $ bzr branch -r 665 http://bazaar.launchpad.net/~keybuk/upstart/trunk upstart
 $ cd upstart
 $ bzr merge -r 725..726 http://bazaar.launchpad.net/~keybuk/upstart/trunk

ChangeLog will have conflicted, but look at the conflict! The ChangeLog added for that merge should have only been 2 lines, not 200!

Related branches

Revision history for this message
James Westby (james-w) wrote :

The merge picks 725 for the base in this case, is that correct.

If so then the base consists of the Changelog at 725, other with it at 726,
and this at 665, which obviously is missing plenty that the base has. I guess
it is these missing lines that cause the extra conflict.

Taking 725 as the merge base seems correct, and so the conflict seems
natural, but I'm not sure, should it show the lines that the base has, but
this doesn't as deleted in the 'this' side?

Thanks,

James

Revision history for this message
John A Meinel (jameinel) wrote :

You might try using --reprocess. We are thinking to make it the default (it just happens to conflict with --show-base). It has to do with how we find the hunks that conflict, and then 'reprocess' them to remove any similar areas.

Revision history for this message
James Westby (james-w) wrote : Re: [Bug 151731] bzr merge puts entire ChangeLog from other branch into conflict, rather than just the diff cherry-picked

Hi all,

I've looked a little in to the cause of the following, and while it
seems that what bzr is doing is correct in some way it could perhaps be
a little more helpful.

The case is a cherry pick of a revision that contains the current tip in
it's history. The base revision is then taken as the revision before the
one that is being cherry-picked. The 'this' and 'other' revisions are
the tip and the revision that is being picked respectively.

The merge ends up with a large conflicting region in ChangeLog,
consisting of all the lines that were added in further development.
Consider if we start with 'this' as

   a

and we want to cherry pick

   a
   b
   c

taking

   a
   b

as the base. bzr would currently mark 'b\nc' as conflicted, where Scott
would just like 'c' to be inserted.

This happens as the 'base' text doesn't include lines in the 'this'
text, so the diff algorithm just considers them changes in 'other'. It
seems like this situation could perhaps be improved if there was a
special case put in for where 'base' is a descendant of 'this', with the
merge algorithm considering lines that are changed in 'base' compared to
'this' differently than it does currently.

Is this feasible? Is it going to open up a whole can of worms?

Thanks,

James

On (11/10/07 22:00), Scott James Remnant wrote:
> Public bug reported:
>
> Cherry picking changes from trunk into a stable branch, since the
> changes are bug fixes useful for a new stable release.
>
> Each and every time, I get a conflict in ChangeLog - which isn't
> necessarily surprising given the different text - what is surprising is
> that conflict includes the entire difference between the two branches of
> the ChangeLog.
>
> This makes me extremely nervous as merge is doing more than what it's
> told to do!
>
> Example:
>
> $ bzr branch -r 665 http://bazaar.launchpad.net/~keybuk/upstart/main upstart
> $ cd upstart
> $ bzr merge -r 725..726 http://bazaar.launchpad.net/~keybuk/upstart/main
>
> ChangeLog will have conflicted, but look at the conflict! The ChangeLog
> added for that merge should have only been 2 lines, not 200!
>
> ** Affects: bzr
> Importance: Undecided
> Status: New
>
> --
> bzr merge puts entire ChangeLog from other branch into conflict, rather than just the diff cherry-picked
> https://bugs.launchpad.net/bugs/151731
> You received this bug notification because you are a member of Bazaar
> Developers, which is the registrant for Bazaar.
>

--
  James Westby -- GPG Key ID: B577FE13 -- http://jameswestby.net/
  seccure key - (3+)k7|M*edCX/.A:n*N!>|&7U.L#9E)Tu)T0>AM - secp256r1/nistp256

Revision history for this message
James Westby (james-w) wrote : Re: [Bug 151731] Re: bzr merge puts entire ChangeLog from other branch into conflict, rather than just the diff cherry-picked

On (12/10/07 16:07), John Arbash Meinel wrote:
> You might try using --reprocess. We are thinking to make it the default
> (it just happens to conflict with --show-base). It has to do with how we
> find the hunks that conflict, and then 'reprocess' them to remove any
> similar areas.
>

This has no effect on the resulting conflicts in ChangeLog.

Thanks,

James

--
  James Westby -- GPG Key ID: B577FE13 -- http://jameswestby.net/
  seccure key - (3+)k7|M*edCX/.A:n*N!>|&7U.L#9E)Tu)T0>AM - secp256r1/nistp256

Revision history for this message
Scott James Remnant (Canonical) (canonical-scott) wrote :

Isn't this the very definition of a cherry-pick?

I only want that single commit to be merged; I don't want all the interim commits suddenly appearing just to get that one commit.

Why is bzr trying to give me them?

James Westby (james-w)
Changed in bzr:
importance: Undecided → Medium
status: New → Confirmed
John A Meinel (jameinel)
description: updated
Revision history for this message
John A Meinel (jameinel) wrote :

I'm trying to look into it a bit. To start with, the branch seems to have moved from 'main' to 'trunk'. I went ahead and updated the bug details.

However, when using "trunk" and those numbers, I end up seeing a conflict with:
<<<<<<< TREE
=======
2007-06-12 Scott James Remnant <email address hidden>

        * compat/sysv/shutdown.c: Use nih pidfile functions since they're
...
        int (best conversion from an enum)

>>>>>>> MERGE-SOURCE
2007-03-11 Scott James Remnant <email address hidden>

So basically the empty list conflicted with the whole set of Changelog that was introduced. I don't know if these are the same revisions that Scott was referring to in his original bug report.

It probably is, because if I do "bzr diff -r 725..726" I see:
=== modified file 'ChangeLog'
--- ChangeLog 2007-06-11 17:37:44 +0000
+++ ChangeLog 2008-02-29 19:02:08 +0000
@@ -1,3 +1,8 @@
+2007-06-12 Scott James Remnant <email address hidden>
+
+ * compat/sysv/shutdown.c: Use nih pidfile functions since they're
+ more reliable than doing it ourselves.
+
 2007-06-11 Scott James Remnant <email address hidden>

So we should have introduced about 5 lines of Changelog, but we seem to be introducing all of the change lines.

What is also strange is that the diff for 'shutdown.c' seems to be correct. Though to be honest the difference from 665=>726 is the same as 725=>726. (It seems shutdown.c was *only* modified in 726).
This looks like something really weird happening with the merge algorithm picking a different base when merging files and not cherrypicking at all. Only selecting what files to merge based on the cherrypick, but then merging the whole changes between them.

I'll try to look deeper.

Revision history for this message
John A Meinel (jameinel) wrote :
Download full text (3.6 KiB)

So I looked closer, and it seems to be just how 3-way merge works.

Specifically, if you do:

 bzr cat -r 665 Changelog > THIS
 bzr cat -r 725 Changelog > BASE
 bzr cat -r 726 Changelog > OTHER

and then use diff3 as

 diff3 -X -m THIS BASE OTHER > MERGED

you'll get exactly the same results.

I'll explain why I think it is happening.

3-way merge basically looks at the differences from BASE => THIS and BASE => OTHER and then conflicts on some regions where the changes overlap. And otherwise brings the BASE => OTHER changes into THIS.

By using r725 as BASE, when you compare r725 => r665 (THIS) it looks like you *deleted* all of the entries (like the ones for 700, 710, etc). 725=>726 looks like it adds a new one on top.

Adding a new on on top obviously conflicts with deleting the rest.

I think it has to do with how 3-way merge decides overlapping ranges. Digging into the code, it seems to start by finding the lines that both sides agree on (find_sync_regions). This will find that in both sides all of the lines that exist in 665 match to 725. (Basically all the lines from the middle to the end match both sides).

This seems to be the key part....

The next step then compares the regions that don't "sync". What happens is that you compare the whole region to see if 0-0 or THIS matches the unsynced region in base (it can't), and then you compare if 0-120 matches the unsynced region in BASE, now 10-120 does, but 0-120 doesn't so it shows up as lines added by OTHER.

The problem is that you use "intersect()" to break up a matching regions into a subset (where both sides match the same section), and then you lose the information that one side actually matched more of base.

Another way to introduce the same problem is if one side deletes a bunch of lines, and the other side adds lines to it. For example:

BASE:
context
lines
will be deleted
more context
text

THIS:
context
lines
more context
text

OTHER:
context
lines
extra lines
will be deleted
more context
text

The conflict for this will look like:
context
lines
<<<<
====
extra lines
will be deleted
>>>>
more context
text

(this is true for both diff3 and for merge3, and might explain some of our more confusing 'matches an empty set' conflicts)

I think the argument is that the delete in THIS is conflicting with the add in OTHER. But the lines don't exist in THIS so we can't put them in that region of the conflict markers. Another way of looking at it would be:

context
lines
<<<< THIS
==== OTHER
extra lines
will be deleted
||||| BASE
will be deleted
>>>>
more context
text

So while this may not be what we want, it does seem like the correct thing for 3-way merge to be doing.
If one side deletes lines that the other is using, it *should* conflict. 3-way merge markers have no way to indicate that this range was deleted by THIS and it conflicts with lines added by OTHER. So instead it just marks all of the deleted lines as added by OTHER, and marks it as conflicted with the empty range in THIS.

meld (or any gui 3-way diff tool) could show what happened fairly well. It would show that the lines were deleted BASE => THIS and it conflicts with the lines added in BASE => OTHER.

As for cherrypickin...

Read more...

Revision history for this message
John A Meinel (jameinel) wrote :

So a short summary... It seems to be just how 3-way merge works
I'm tempted to mark this as Won't Fix, but arguably we could stop using 3-way merge for cherrypicking.

Revision history for this message
James Westby (james-w) wrote :

On Fri, 2008-02-29 at 21:00 +0000, John A Meinel wrote:
> So a short summary... It seems to be just how 3-way merge works
> I'm tempted to mark this as Won't Fix, but arguably we could stop using 3-way merge for cherrypicking.
>

This was pretty much my conclusion as well. However I thought
the result was something that we didn't want, and so
some special handling for cherrypicks could give something
better. Obviously Won't Fix is a valid thing to do, but I think
as we have the information we could use it to give the
user much better conflicts.

As cherrypick isn't really a first class operation yet
this probably isn't high priority. I don't know if
cherrypicks are on the sprint agenda, and I know from
your summary mail on the subject they have a lot of
problems. If cherrypicks become a first class operation
then I guess this should be fixed as well. Until that point
then the conflicts should not be too bad to sort out, though
they may be nasty on a file that is heavily edited in the
region concerned.

Thanks,

James

John A Meinel (jameinel)
Changed in bzr:
status: Confirmed → Fix Committed
Revision history for this message
John A Meinel (jameinel) wrote :

My change to how cherrypicking merge3 works has been merged into bzr.dev, and should be in 1.3

Changed in bzr:
milestone: none → 1.3
status: Fix Committed → Fix Released
Revision history for this message
Scott James Remnant (Canonical) (canonical-scott) wrote :

Thanks John,

I will no doubt be trying that out pretty soon ;)

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.