diff gives huge areas of unnecessary changes

Bug #315399 reported by Matthew Fuller
This bug affects 2 people
Affects: Bazaar
Status: Confirmed
Importance: Medium
Assigned to: Unassigned
Milestone: (none)

Bug Description

diff seems to grab far too large an area for changes.

Consider this file:

--------------
 x
  A
 y
 x
  B
 y
 x
  C
 y
 x
  D
 y
--------------

Let's lowercase some letters. If we do only D, the diff is fine. With D and C both, it still looks fine:

--------------
@@ -5,8 +5,8 @@
   B
  y
  x
- C
+ c
  y
  x
- D
+ d
  y

--------------

But if we lowercase B as well, it goes all nutty:

--------------
@@ -2,11 +2,11 @@
   A
  y
  x
- B
- y
- x
- C
- y
- x
- D
+ b
+ y
+ x
+ c
+ y
+ x
+ d
  y
--------------

diff(1) doesn't do this:

--------------
@@ -2,11 +2,11 @@
   A
  y
  x
- B
+ b
  y
  x
- C
+ c
  y
  x
- D
+ d
  y

--------------

This makes reading the diffs very hard...
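
For comparison, here is a minimal sketch of the diff(1)-style result using Python's standard difflib as a stand-in. It is not GNU diff's algorithm and not bzr's code; it only shows that a matcher which may align on repeated lines keeps each change confined to one line. The names OLD, NEW and changed_spans are made up for this illustration, and the leading indentation from the example file is dropped.

```python
import difflib

# The example file, before and after lowercasing B, C and D.
OLD = ["x", "A", "y", "x", "B", "y", "x", "C", "y", "x", "D", "y"]
NEW = ["x", "A", "y", "x", "b", "y", "x", "c", "y", "x", "d", "y"]


def changed_spans(old, new):
    """Return the non-equal opcodes from difflib's SequenceMatcher."""
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    return [op for op in matcher.get_opcodes() if op[0] != "equal"]


if __name__ == "__main__":
    for tag, i1, i2, j1, j2 in changed_spans(OLD, NEW):
        print(tag, OLD[i1:i2], "->", NEW[j1:j2])
```

On this input it reports three single-line replacements (B, C and D), with the x and y lines between them left as context.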

Robert Collins (lifeless) wrote : Re: [Bug 315399] [NEW] diff gives huge areas of unnecessary changes

On Fri, 2009-01-09 at 10:25 +0000, fullermd wrote:
> Public bug reported:
>
> diff seems to grab far too large an area for changes.
>
> Consider this file:
>
> --------------
> x
> A
> y
> x
> B
> y
> x
> C
> y
> x
> D
> y
> --------------

bzr only lines up on unique items in the sequence: it first strips out
all duplicates, so the x and y lines go.
Then A, B, C, D are used as anchors on the left side.
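
A rough sketch of that anchor selection, assuming an anchor is simply a line that occurs exactly once on each side. This is an illustration only, not bzr's actual code; unique_common_lines, OLD, NEW_CD and NEW_BCD are hypothetical names, and the leading indentation from the example file is dropped.

```python
from collections import Counter


def unique_common_lines(old, new):
    """Lines that occur exactly once in old and exactly once in new, in old's order."""
    old_counts = Counter(old)
    new_counts = Counter(new)
    return [line for line in old
            if old_counts[line] == 1 and new_counts[line] == 1]


OLD = ["x", "A", "y", "x", "B", "y", "x", "C", "y", "x", "D", "y"]
# Only C and D lowercased.
NEW_CD = ["x", "A", "y", "x", "B", "y", "x", "c", "y", "x", "d", "y"]
# B, C and D lowercased.
NEW_BCD = ["x", "A", "y", "x", "b", "y", "x", "c", "y", "x", "d", "y"]

if __name__ == "__main__":
    print(unique_common_lines(OLD, NEW_CD))   # ['A', 'B']
    print(unique_common_lines(OLD, NEW_BCD))  # ['A']
```

Once B is lowercased as well, A is the only line left that is unique on both sides, so nothing after it can serve as an anchor.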

> Let's lowercase some letters. If we do D, it's fine. D and C both, looks fine:
>
> --------------
> @@ -5,8 +5,8 @@
> B
> y
> x
> - C
> + c
> y
> x
> - D
> + d
> y
>
> --------------

I think there is a fuzz factor used here; I'd need to check the code.

>
> But if we lowercase B as well, it goes all nutty:
>
> --------------
> @@ -2,11 +2,11 @@
> A
> y
> x
> - B
> - y
> - x
> - C
> - y
> - x
> - D
> + b
> + y
> + x
> + c
> + y
> + x
> + d
> y
> --------------

And here it has matched from B on the left to the end of the file on the
right (because only ABCD were anchors).

-Rob

--
GPG key available at: <http://www.robertcollins.net/keys.txt>.

Wesley J. Landaker (wjl) wrote :

This happens in practice VERY often. Imagine that x is "{" and y is "}" and you've just described how bzr often gives terrible diffs for changed C++ code, etc. This happens in many other situations as well.

One workaround is to use --using= and point to an external diff command, but a better solution would be to update bzr's diff algorithm to do whatever other diff tools (like GNU diff) are doing that makes them work so much better.

For bonus points, it'd be nice if bzr's diff did whatever GNU diff's --minimal does, since that's even better for interactive use.

Matthew Fuller (fullermd) wrote :

I run into it regularly with HTML. Imagine that x is "<th>" and y is "</th>". Working with HTML seems to trigger this and similar-though-not-necessarily-related things like bug 278346.

In the case that led to me filing this, <th> was actually about 60 characters of stuff, so it took a long time reading char by char and then manually checking whitespace to convince myself that I HADN'T changed that line.

Changed in bzr:
importance: Undecided → Medium
status: New → Confirmed
Jelmer Vernooij (jelmer)
tags: added: diff
Jelmer Vernooij (jelmer)
tags: added: check-for-breezy