full diff on file id change

Bug #438531 reported by Ted Gould
This bug affects 1 person
Affects   Status      Importance   Assigned to   Milestone
Bazaar    Confirmed   Low          Unassigned

Bug Description

I've got a repository imported from SVN where one of the revisions produces a different diff than the upstream SVN repository. If you do this:

  $ svn diff -r17074:17075 http://inkscape.svn.sourceforge.net/svnroot/inkscape/inkscape/trunk

Then you get a one-line diff showing the change. If you run the equivalent command against the import:

  $ bzr diff -r 4503..4504 lp:~ted/inkscape/newtrunk

The diff is huge: it replaces the whole configure.ac file with a new one. I would expect both commands to generate the same result.

Jelmer Vernooij (jelmer) wrote :

Bazaar does not support copies from historic revisions.

r17075 | gouldtj | 2008-01-16 07:34:56 +0100 (Wed, 16 Jan 2008) | 3 lines
Changed paths:
   M /inkscape/trunk
   R /inkscape/trunk/configure.ac (from /inkscape/trunk/configure.ac:17014)

 r17634@shi: ted | 2008-01-14 21:53:49 -0800
 Changing verison to 0.45+0.46pre0

configure.ac is replaced by an older version of itself. In bzr this causes a remove + add of that file, and that causes the long diff.

Jelmer Vernooij (jelmer) wrote :

For reference the same commit in bzr:

revno: 4504
svn revno: 17075 (on /inkscape/trunk)
committer: gouldtj
timestamp: Wed 2008-01-16 06:34:56 +0000
message:
   r17634@shi: ted | 2008-01-14 21:53:49 -0800
   Changing verison to 0.45+0.46pre0
removed:
  configure.ac
added:
  configure.ac

Jelmer Vernooij (jelmer) wrote :

The main issue here seems to be that "bzr diff" will print the full file contents when a file is removed and then reintroduced with (almost) the same contents but a different file id. I'm not sure if this is something we'd like to fix (these are two different files after all, since their file ids are different).
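
To make the size difference concrete, here is a minimal sketch using Python's standard difflib rather than bzr's own diff code. The file contents and the helper names (render_modify, render_remove_add, changed_lines) are made up for illustration; only the contrast between the two renderings is the point.

```python
import difflib

# Hypothetical stand-in for two versions of configure.ac (not the real file).
OLD = [
    "AC_PREREQ(2.53)",
    "AC_INIT(inkscape, 0.45+devel)",
    "AC_CONFIG_SRCDIR(src/main.cpp)",
    "AM_INIT_AUTOMAKE",
    "AC_OUTPUT",
]
NEW = list(OLD)
NEW[1] = "AC_INIT(inkscape, 0.45+0.46pre0)"


def render_modify(old, new, path):
    """Diff two versions that are treated as the same file (same file id)."""
    return list(difflib.unified_diff(old, new, "a/" + path, "b/" + path, lineterm=""))


def render_remove_add(old, new, path):
    """Diff a removal plus an add of an unrelated file at the same path."""
    removed = list(difflib.unified_diff(old, [], "a/" + path, "/dev/null", lineterm=""))
    added = list(difflib.unified_diff([], new, "/dev/null", "b/" + path, lineterm=""))
    return removed + added


def changed_lines(diff):
    """Keep only the +/- content lines, skipping the ---/+++ file headers."""
    return [
        line for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


if __name__ == "__main__":
    print(len(changed_lines(render_modify(OLD, NEW, "configure.ac"))))
    print(len(changed_lines(render_remove_add(OLD, NEW, "configure.ac"))))
```

With the same file id, only the edited line shows up (one removal, one addition). With different file ids, every line of the old file is removed and every line of the new file is added, which is the "huge" diff reported above.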

summary: - Imported SVN repo differs from upstream repo
+ full diff on file id change
affects: bzr-svn → bzr
Ted Gould (ted) wrote : Re: [Bug 438531] Re: Imported SVN repo differs from upstream repo

On Tue, 2009-09-29 at 10:08 +0000, Jelmer Vernooij wrote:
> removed:
> configure.ac
> added:
> configure.ac

It seems like, since Bazaar has file id tracking, it would be better to do a
diff between the two versions rather than trying to emulate the copy. As
bzr-svn isn't tracking the merge anyway, it seems like the remove/add is
kind of the worst case.

Jelmer Vernooij (jelmer) wrote :

On Tue, 2009-09-29 at 14:28 +0000, Ted Gould wrote:
> On Tue, 2009-09-29 at 10:08 +0000, Jelmer Vernooij wrote:
> > removed:
> > configure.ac
> > added:
> > configure.ac
> It seems like, since bazaar has the file id tracking, to do a diff
> between the two versions rather than trying to emulate the copy. As
> bzr-svn isn't tracking the merge anyway, it seems like the remove/add is
> kind of worst case.

It sounds like you suggest changing this "remove + copy" operation to a
simple "modify" operation on the fly if the copy is from an older
version of the same file. Is that a correct interpretation?

This is unfortunately not possible without severe performance
consequences. If there is no remove+add this means that the file id does
not change, and to prove this we would have to analyse the history of a
particular file to see if it shared history with the file it was copied
from each time we encounter a file copy.

Cheers,

Jelmer

--
Jelmer Vernooij <email address hidden> - http://samba.org/~jelmer/
Jabber: <email address hidden>

Ted Gould (ted) wrote :

On Tue, 2009-09-29 at 15:14 +0000, Jelmer Vernooij wrote:
> It sounds like you suggest changing this "remove + copy" operation to a
> simple "modify" operation on the fly if the copy is from an older
> version of the same file. Is that a correct interpretation?

Yes.

> This is unfortunately not possible without severe performance
> consequences. If there is no remove+add this means that the file id does
> not change, and to prove this we would have to analyse the history of a
> particular file to see if it shared history with the file it was copied
> from each time we encounter a file copy.

I mean, you know SVN a million times better than I do, but it seems to me
that you'd already have to track that information, as both of the file IDs
would have to be found, and you could determine that they're the same.
At that point it's only calculating a diff between them and applying the
diff.

So if you're looking at a rev like this:

r3
   delete:
         342340823498 r2 as configure.ac
   copy:
         342340823498 r1 as configure.ac

Then you see that the two IDs are the same. Do a:

   diff 342340823498 -r 2..3

And record that as the change instead of what was given. I'm unsure why
you'd have to trace the file back through all its revisions, as we only need
the difference between the previous one (being deleted) and the final
one created in the revision we're trying to represent.

Jelmer Vernooij (jelmer) wrote :

On Tue, 2009-09-29 at 15:55 +0000, Ted Gould wrote:
> On Tue, 2009-09-29 at 15:14 +0000, Jelmer Vernooij wrote:
> > This is unfortunately not possible without severe performance
> > consequences. If there is no remove+add this means that the file id does
> > not change, and to prove this we would have to analyse the history of a
> > particular file to see if it shared history with the file it was copied
> > from each time we encounter a file copy.
> I mean, you know SVN a million times better than I, but it seems to me
> that you'd already have to track that information as both of the file ID
> would have to be found, and you could determine that they're the same.
> At that point it's only calculating a diff between them and applying the
> diff.
>
> So if you're looking at a rev like this:
>
> r3
> delete:
> 342340823498 r2 as configure.ac
> copy:
> 342340823498 r1 as configure.ac

Subversion doesn't have anything like file ids (I assume that's what you
mean by the 342340823498?). The only way you can find out if two files
are related is by looking at the output of "svn log" and seeing whether they
were both copied from the same location at some point in history.

> Then you see that the two IDs are the same. Do a:
>
> diff 342340823498 -r 2..3
>
> And record that as the change instead of what was given. I'm unsure why
> you'd have to trace the file back to all its revisions as we only need
> the difference between the previous one (being deleted) and the final
> one created in the revision we're trying to represent.

We have to trace back to see if they were actually the same file. If
they were not the same file then we have a case where one file is being
copied over another, and we want the file id to change in that case.

Cheers,

Jelmer
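
The history walk described above can be sketched roughly as follows. This is a toy model, not bzr-svn's code: the Change record, the LOG layout (revision number mapped to changed paths with an optional copy source) and the origin/same_file helpers are all assumptions made for illustration, and the sketch ignores copies of parent directories, which a real implementation would also have to follow.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Change(NamedTuple):
    action: str                                  # "A" add, "M" modify, "R" replace
    path: str
    copyfrom: Optional[Tuple[str, int]] = None   # (path, revision), if copied


# Hypothetical log: r3 replaces configure.ac with its own r1 version,
# r4 replaces it with a copy of an unrelated file.
LOG: Dict[int, List[Change]] = {
    1: [Change("A", "configure.ac")],
    2: [Change("M", "configure.ac"), Change("A", "other.ac")],
    3: [Change("R", "configure.ac", ("configure.ac", 1))],
    4: [Change("R", "configure.ac", ("other.ac", 2))],
}


def origin(log, path, rev):
    """Walk backwards until the node's creation without a copy source."""
    while rev > 0:
        for change in log.get(rev, []):
            if change.path == path and change.action in ("A", "R"):
                if change.copyfrom is None:
                    return (path, rev)
                # Follow the copy and keep walking from its source.
                path, rev = change.copyfrom
                break
        else:
            # Nothing created this path here; look one revision earlier.
            rev -= 1
    raise KeyError("no origin found")


def same_file(log, a, b):
    """Two (path, revision) pairs are related if they share an origin."""
    return origin(log, *a) == origin(log, *b)


if __name__ == "__main__":
    print(same_file(LOG, ("configure.ac", 2), ("configure.ac", 3)))
    print(same_file(LOG, ("configure.ac", 3), ("configure.ac", 4)))
```

In this model the r3 replacement resolves to the same origin as the file it replaces, so the file id could stay the same, while the r4 replacement resolves to other.ac and should get a new id. The cost is that each copy triggers a walk back through the log, which is the performance concern raised in this thread.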

Ted Gould (ted) wrote :

On Tue, 2009-09-29 at 19:03 +0000, Jelmer Vernooij wrote:
> Subversion doesn't have anything like file ids (I assume that's what you
> mean with the 342340823498?). The only way you can find out if two files
> are related is by looking at the output of "svn log" and see if they
> both were copied from the same location at one point in history.

Ah, okay. That's where I was missing the connection. I didn't realize
that SVN was so... silly. Thanks!

Could there be perhaps an option to run bzr-svn in "slow" mode where
it'd do this check? It would suck for the first branch of a repository,
but for people who are migrating to Bazaar it would seem like a very low
cost over time.

Jelmer Vernooij (jelmer) wrote :

On Tue, 2009-09-29 at 19:24 +0000, Ted Gould wrote:
> On Tue, 2009-09-29 at 19:03 +0000, Jelmer Vernooij wrote:
> > Subversion doesn't have anything like file ids (I assume that's what you
> > mean with the 342340823498?). The only way you can find out if two files
> > are related is by looking at the output of "svn log" and see if they
> > both were copied from the same location at one point in history.
>
> Ah, okay. That's where I was missing the connection. I didn't realize
> that SVN was so... silly. Thanks!
>
> Could there be perhaps an option to run bzr-svn in "slow" mode where
> it'd do this check? It would suck for the first branch of a repository,
> but for people who are migrating to Bazaar it would seem like a very low
> cost over time.

Such an option to svn-import has been proposed in the past. Since the
file ids in such a conversion would be different though, it would mean
that the resulting revision ids would have to be different as well. This
basically means that you wouldn't be able to mix such a "slow branch"
with other bzr-svn operations. I.e. it would be useful for one-time
conversions but not for other uses of bzr-svn.

It seems to me like it would make more sense for "bzr diff" to not care
about file ids in this particular situation and work based on paths
rather than on file ids if that makes more sense.

Cheers,

Jelmer

Ted Gould (ted) wrote :

On Tue, 2009-09-29 at 19:53 +0000, Jelmer Vernooij wrote:
> Such an option to svn-import has been proposed in the past. Since the
> file ids in such a conversion would be different though, it would mean
> that the resulting revision ids would have to be different as well. This
> basically means that you wouldn't be able to mix such a "slow branch"
> with other bzr-svn operations. I.e. it would be useful for one-time
> conversions but not for other uses of bzr-svn.

Yes, that's why it should probably be an option. But I'm for something
scary like "--one-way" ;) It works for my use-case today, but I could
understand it not being for everyone.

> It seems to me like it would make more sense for "bzr diff" to not care
> about file ids in this particular situation and work based on paths
> rather than on file ids if that makes more sense.

I don't like this option because I think that it breaks down the strong
file concept of Bazaar. File IDs and ID tracking are something I do like
in Bazaar, and if diff hid that I think it would make it harder to
understand what's going on.

Martin Pool (mbp) wrote :

2009/9/30 Ted Gould <email address hidden>:
>> It seems to me like it would make more sense for "bzr diff" to not care
>> about file ids in this particular situation and work based on paths
>> rather than on file ids if that makes more sense.
>
> I don't like this option because I think that it breaks down the strong
> file concept of bazaar.  File IDs and ID tracking is something I do like
> in Bazaar and if diff hid that I think it would make things more complex
> for understanding what's going on.

I think we should at least have an option for this: people can get
into the situation of wanting to diff across files with mismatched
ids, in situations other than just bzr-svn.

--
Martin <http://launchpad.net/~mbp/>
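
One way to picture the optional behaviour discussed above is a pairing step that matches entries by file id first and, only when asked, pairs the leftover removed and added entries that share a path. This is a hypothetical sketch: pair_entries, the match_paths flag and the dict-based inventories are invented for illustration and are not bzr's API or an existing option.

```python
def pair_entries(old, new, match_paths=False):
    """Pair entries of two inventories given as {file_id: path} dicts.

    Returns (old_path, new_path) tuples; None marks a pure add or remove.
    """
    # Entries with the same file id are always diffed against each other.
    pairs = [(old[fid], new[fid]) for fid in old if fid in new]
    removed = {path: fid for fid, path in old.items() if fid not in new}
    added = {path: fid for fid, path in new.items() if fid not in old}
    if match_paths:
        # Opt-in fallback: treat remove + add at the same path as a modify.
        for path in sorted(set(removed) & set(added)):
            pairs.append((path, path))
            del removed[path]
            del added[path]
    pairs += [(path, None) for path in sorted(removed)]
    pairs += [(None, path) for path in sorted(added)]
    return pairs


# Hypothetical inventories: configure.ac changed file id between revisions.
OLD_INV = {"configure.ac-id-1": "configure.ac", "makefile-id": "Makefile.am"}
NEW_INV = {"configure.ac-id-2": "configure.ac", "makefile-id": "Makefile.am"}

if __name__ == "__main__":
    print(pair_entries(OLD_INV, NEW_INV))
    print(pair_entries(OLD_INV, NEW_INV, match_paths=True))
```

Keeping the fallback behind a flag leaves the default output honest about file ids, which addresses the objection above, while still giving a short diff to people who ask for it.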

Martin Pool (mbp)
Changed in bzr:
importance: Undecided → Low
status: New → Confirmed
Jelmer Vernooij (jelmer)
tags: added: check-for-breezy