cluttered output on diff of binary file

Bug #71307 reported by Matthew Paul Thomas
8
Affects Status Importance Assigned to Milestone
Bazaar
Confirmed
Low
Unassigned
Breezy
Fix Released
Medium
Jelmer Vernooij

Bug Description

bzr 0.11.0, Ubuntu 6.10

When I "bzr add" a binary file, then do "bzr diff", I get a result like:
=== added file 'lib/canonical/launchpad/icing/app-bugs.large.gif'
Binary files lib/canonical/launchpad/icing/app-bugs.large.gif 1970-01-01 00:00:00 +0000 and lib/canonical/launchpad/icing/app-bugs.large.gif 2006-11-09 01:41:18 +0000 differ

The first line makes sense; the second does not, because neither I nor the file existed in 1970. The second line should be omitted.

Tags: diff easy

Related branches

Revision history for this message
John A Meinel (jameinel) wrote :

Confirmed. The specifics are that:

1) To get the timestamp as part of the output, we actually cheat, and change the filename. So to get:
--- lib/canonical/foo 2005-01-02 ...
+++ lib/canonical/foo 2006-11-09 ...

 we actually change the name of the file from 'lib/canonical/foo' to 'lib/canonical/foo' to 'lib/canonical/foo 2005-01-02 ...'
We really should have a separate parameter that we pass, so we don't get weird things later.

2) 1970 is the date of the epoch, and it is the date that diff uses when it is trying to say 'no such file'.
At one point we didn't have datestamps. Somebody asked for them because they worked better. And they also asked us to not use /dev/null, and instead use the filename.

Specifically, you can do:
$ mkdir a; echo foo > a/foo; mkdir b
$ diff -urN a b
--- a/foo 2007-01-03 15:35:10.000000000 -0600
+++ b/foo 1969-12-31 18:00:00.000000000 -0600
@@ -1 +0,0 @@
-foo

I was surprised to see 'diff' use 1970 adjusted for the local time zone, but regardless that seems to be the "standard" output.

I personally don't really like the datestamps, but if someone finds them useful...

So I would argue that we do need to fix the code to use the real filenames, separate from the datestamps. I'm not sure what to do exactly for the rest.

Changed in bzr:
importance: Undecided → Low
status: Unconfirmed → Confirmed
description: updated
Revision history for this message
Robert Collins (lifeless) wrote :

I dont get where the with respect to the filenames is. Can you enlarge on that?

Revision history for this message
John A Meinel (jameinel) wrote :

patience_diff.unified_diff() is a direct copy of difflib, just allows a custom SequenceMatcher.

It can take the parameters: fromfile, tofile, fromfiledate, tofiledate.

However diff.internal_diff() is where we call it, and it only takes the parameters old_filename, and new_filename.

So in diff._show_diff_trees() it has already reformatted the filename to be:
old_name = '%s%s\t%s' % (old_label, path,
                         _patch_header_date(old_tree, file_id, path))
new_name = '%s%s\t%s' % (new_label, path, EPOCH_DATE)

The reason we did that, is because we didn't want to change all the layers. Because _show_diff_trees() calls inventory[file_id].diff() passing it the names, which eventually calls back into internal_diff(). So we need to upgrade quite a few apis if we want to pass the filenames + datestamps around.

Further, unified_diff uses a space character to separate the filename from the datestamp, and we seem to want to use a tab character. But since we already have a custom copy of unified_diff, we could easily fix that.

(I hope I understood your question)

Revision history for this message
Matthew Paul Thomas (mpt) wrote :

Similarly when I remove a text file, "bzr diff" tells me:

=== removed file 'foo/bar/hum'
--- a/foo/bar/hum 2007-11-01 04:23:25 +0000
+++ b/foo/bar/hum 1970-01-01 00:00:00 +0000
[contents of file with every line preceded by "-"]

Again, neither I nor the file existed in 1970. Should that be a separate bug report, or should this one be generalized?

Revision history for this message
Aaron Bentley (abentley) wrote : Re: [Bug 71307] Re: "bzr diff" produces silly result for added binary file

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Matthew Paul Thomas wrote:
> Similarly when I remove a text file, "bzr diff" tells me:
>
> === removed file 'foo/bar/hum'
> --- a/foo/bar/hum 2007-11-01 04:23:25 +0000
> +++ b/foo/bar/hum 1970-01-01 00:00:00 +0000
> [contents of file with every line preceded by "-"]
>
> Again, neither I nor the file existed in 1970. Should that be a separate
> bug report, or should this one be generalized?

You can file it, but we'll mark it invalid, because we emulate the
behavior of diff, that this is what diff does:

$ mkdir foo
$ mkdir bar
$ echo baz> foo/baz
$ TZ="utc" diff -urN foo bar
diff -urN foo/baz bar/baz
- --- foo/baz 2008-06-13 20:53:00.000000000 +0000
+++ bar/baz 1970-01-01 00:00:00.000000000 +0000
@@ -1 +0,0 @@
- -baz
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIUt780F+nu1YWqI0RAh5tAJ4gHEPQPs9eq3TOVKE1BJ2DV1AQZACcD2b4
a9irQ6tVOLpbr5MS9HiDF6M=
=N+zx
-----END PGP SIGNATURE-----

Revision history for this message
John A Meinel (jameinel) wrote : Re: "bzr diff" produces silly result for added binary file

GNU diff (and I think other 'diff' programs) use epoch time to indicate that the file was removed. Epoch being Jan 1, 1970 UTC time. (the time when 'timestamp = 0').

Aaron's message is a bit confusing, because GPG doubles up leading '-' characters, it should read:
$ TZ="utc" diff -urN foo bar
diff -urN foo/baz bar/baz
--- foo/baz 2008-06-13 20:53:00.000000000 +0000
+++ bar/baz 1970-01-01 00:00:00.000000000 +0000
@@ -1 +0,0 @@
-baz

Revision history for this message
Matthew Fuller (fullermd) wrote : Re: "bzr diff" produces silly result for binary files

This is an ongoing absurdity. It makes reading diff output pretty mindwarping.

=== added file 'src/mail_files/generic_bottom_left.gif'
Binary files src/mail_files/generic_bottom_left.gif 1970-01-01 00:00:00 +0000 and src/mail_files/generic_bottom_left.gif 2009-01-27 16:44:30 +0000 differ
=== added file 'src/mail_files/generic_bottom_right.gif'
Binary files src/mail_files/generic_bottom_right.gif 1970-01-01 00:00:00 +0000 and src/mail_files/generic_bottom_right.gif 2009-01-27 16:44:30 +0000 differ
=== removed file 'src/mail_files/generic_footer.gif'
Binary files src/mail_files/generic_footer.gif 2008-03-14 22:16:39 +0000 and src/mail_files/generic_footer.gif 1970-01-01 00:00:00 +0000 differ
=== removed file 'src/mail_files/generic_header.jpg'
Binary files src/mail_files/generic_header.jpg 2008-03-14 22:16:39 +0000 and src/mail_files/generic_header.jpg 1970-01-01 00:00:00 +0000 differ

Eh? Run that by me again? There are 22 binary files in this diff, and one text file that you can actually see a sensible diff for. It's kinda an eyesore...

Martin Pool (mbp)
summary: - "bzr diff" produces silly result for binary files
+ cluttered output on diff of binary file
tags: added: diff easy
Jelmer Vernooij (jelmer)
tags: added: check-for-breezy
Jelmer Vernooij (jelmer)
tags: removed: check-for-breezy
Jelmer Vernooij (jelmer)
Changed in brz:
status: New → In Progress
assignee: nobody → Jelmer Vernooij (jelmer)
importance: Undecided → Medium
milestone: none → 3.1.0
Jelmer Vernooij (jelmer)
Changed in brz:
status: In Progress → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.