CVSps ignores revision order

Bug #523670 reported by Martin von Gagern
This bug affects 1 person
Affects                  Status   Importance  Assigned to  Milestone
CVS to Bazaar importer   Triaged  Medium      Unassigned

Bug Description

CVSps 2.1 is broken. In particular, it does take timestamps, authors and log messages into account when merging individual file modifications to patchsets, but it does ignore file-level ancestry relations and the commitid, and the fuzzy timestamp matching seems somewhat fishy to me as well.

Example:
export CVSROOT="$PWD/cvsr"
mkdir $CVSROOT
cvs init
mkdir mod
cd mod
cvs import -m Init mod testing start
cd ..
rmdir mod
cvs co mod
cd mod
echo 1 > foo
cvs add foo
cvs ci -m foo
echo 2 > foo
echo 1 > bar
cvs add bar
cvs ci -m bar
echo 2 > bar
cvs ci -m foo
cvsps -x --root $CVSROOT mod

cvs rlog: Logging mod
---------------------
PatchSet 1
Date: 2010/02/18 09:31:23
Author: mvg
Branch: HEAD
Tag: (none)
Log:
foo

Members:
 bar:1.1->1.2
 foo:INITIAL->1.1

---------------------
PatchSet 2
Date: 2010/02/18 09:31:40
Author: mvg
Branch: HEAD
Tag: (none)
Log:
bar

Members:
 bar:INITIAL->1.1
 foo:1.1->1.2

This is certainly no sane way to merge commits, as each patchset depends on the other. This has to be broken up into three commits. I encountered a real-life example pretty much like this, where a file was first modified and then deleted, but cvsps got the order wrong, so cvsps-import complained about the deletion of a non-existing file. Makes me wonder what else it might have gotten wrong without me noticing.

Yes, this is a bug in cvsps, but I have a hope that bzr-cvsps-import is more actively maintained than cvsps, so I'd say it should deal with the situation in one of two possible ways:
A) Check all dependencies are met, i.e. that the left hand revision number in the cvsps members list matches the right hand revision number of the last patchset modifying that file in the bzr ancestry. If not, complain loudly and ask people to get a less buggy cvsps, or to edit its output manually.
B) Drop cvsps altogether, and reimplement its functionality in Python. Shouldn't be too hard, and would increase portability as well.
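
A rough sketch of the check in option A, in Python. It is not code from bzr-cvsps-import: the names Member, parse_member and find_order_violations are made up for illustration, it assumes a single branch, and it assumes that member lines look like the ones in the output above (with an optional "(DEAD)" suffix for removed files).

```python
from typing import NamedTuple


class Member(NamedTuple):
    path: str
    old_rev: str  # "INITIAL" when the file is added by this patchset
    new_rev: str


def parse_member(line):
    """Parse one line of a cvsps 'Members:' list, e.g. ' bar:1.1->1.2'."""
    text = line.strip()
    # Assumption: removed files carry a trailing "(DEAD)" marker.
    if text.endswith("(DEAD)"):
        text = text[: -len("(DEAD)")]
    path, _, revs = text.rpartition(":")
    old_rev, _, new_rev = revs.partition("->")
    return Member(path, old_rev, new_rev)


def find_order_violations(patchsets):
    """Check that each member continues from the last seen revision of its file.

    patchsets: iterable of (patchset_id, [Member, ...]) in cvsps output order.
    Returns a list of (patchset_id, path, expected_old_rev, found_old_rev).
    """
    last_rev = {}  # path -> right-hand revision of the last patchset touching it
    violations = []
    for ps_id, members in patchsets:
        for m in members:
            expected = last_rev.get(m.path, "INITIAL")
            if m.old_rev != expected:
                violations.append((ps_id, m.path, expected, m.old_rev))
            last_rev[m.path] = m.new_rev
    return violations
```

Fed the two patchsets from the example above, this should flag bar in both of them (1.1->1.2 before the file exists, then INITIAL->1.1 after 1.2), which is the point where the importer could complain loudly instead of committing.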

I had some thoughts about how I would merge commits in an abstract sort of way:
1. Have a DAG per file modelling revision ancestry. The next pointers in the rcs format provide this.
2. Have a set s of all revisions without unprocessed parents.
3. While s isn't empty, choose the item i of s with minimum timestamp.
4. Also take all items from s that have the same author, message and commitid as i.
5. If i has no commitid, instead compare timestamps of subsequent commits against some fuzz span.
6. All the revisions selected this way are bundled into a single changeset, and their children get added to s.
7. The timestamp of the whole commit can be the minimum, maximum, middle, mean or median of the individual timestamps.
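
The steps above could be sketched in Python roughly as follows. This is only an illustration under assumptions: the Revision class and build_changesets are hypothetical names, revisions are assumed to be already parsed out of the RCS files with their parent revision known, timestamps are plain integers in seconds, and the default fuzz of 300 seconds is an arbitrary choice. The rule that a changeset holds at most one revision per file is an extra guard that is not in the list above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Revision:
    path: str
    rev: str
    timestamp: int  # seconds
    author: str
    message: str
    commitid: Optional[str] = None
    parent: Optional[str] = None  # parent revision number in the same file


def build_changesets(revisions, fuzz=300):
    """Group per-file revisions into changesets without breaking ancestry."""
    children = {}
    ready = []  # step 2: revisions without unprocessed parents
    for r in revisions:
        if r.parent is None:
            ready.append(r)
        else:
            children.setdefault((r.path, r.parent), []).append(r)

    changesets = []
    while ready:
        # Step 3: pick the ready revision with the minimum timestamp.
        ready.sort(key=lambda r: (r.timestamp, r.path, r.rev))
        first = ready[0]
        group = [first]
        paths = {first.path}
        last_ts = first.timestamp
        for r in ready[1:]:
            if r.path in paths:
                continue  # extra guard: one revision per file per changeset
            if (r.author, r.message) != (first.author, first.message):
                continue
            if first.commitid is not None:
                match = r.commitid == first.commitid  # step 4
            else:
                # Step 5: compare against the previously accepted revision,
                # not against the first one.
                match = r.commitid is None and r.timestamp - last_ts <= fuzz
            if match:
                group.append(r)
                paths.add(r.path)
                last_ts = r.timestamp
        # Step 6: emit the bundle and make its children available.
        for r in group:
            ready.remove(r)
            ready.extend(children.get((r.path, r.rev), []))
        changesets.append(group)
    return changesets
```

For the foo/bar example from this report (no commitids), this should produce three changesets: foo 1.1 alone, then bar 1.1 together with foo 1.2, then bar 1.2 alone. Step 7 is left to the caller, e.g. min(r.timestamp for r in group).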

Benefits:
a) This would ensure that ancestry dependency between revisions would ALWAYS be maintained.
b) Commitids provided by recent cvs are honoured, avoiding problems with duplicate messages and fuzzy timestamps.
c) Comparing subsequent timestamps, instead of all against the first, should allow for proper merging of real long commits.
d) Unrelated changes are still ordered by (minimum) timestamp.

Looking at the source code of cvsps, I have the feeling that implementing my algorithm there, or adjusting cvsps to match my ideas, would be major work, comparable to a complete reimplementation in Python. Therefore I'd like to know where you'd rather address this, so I can start coding (when and if I do find the time for it) in the appropriate language. If you'd rather write this yourself, I'd be more than happy...

Martin von Gagern (gagern) wrote :

Oh, and to prove that bzr-cvsps-import doesn't already handle CVSps bugginess:
bzr log --forward -p
------------------------------------------------------------
revno: 1
committer: mvg
branch nick: HEAD
timestamp: Thu 2010-02-18 07:31:23 +0000
message:
  foo
diff:
=== added file 'bar'
--- bar 1970-01-01 00:00:00 +0000
+++ bar 2010-02-18 07:31:23 +0000
@@ -0,0 +1,1 @@
+2

=== added file 'foo'
--- foo 1970-01-01 00:00:00 +0000
+++ foo 2010-02-18 07:31:23 +0000
@@ -0,0 +1,1 @@
+1
------------------------------------------------------------
revno: 2
committer: mvg
branch nick: HEAD
timestamp: Thu 2010-02-18 07:31:40 +0000
message:
  bar
diff:
=== modified file 'bar'
--- bar 2010-02-18 07:31:23 +0000
+++ bar 2010-02-18 07:31:40 +0000
@@ -1,1 +1,1 @@
-2
+1

=== modified file 'foo'
--- foo 2010-02-18 07:31:23 +0000
+++ foo 2010-02-18 07:31:40 +0000
@@ -1,1 +1,1 @@
-1
+2

So as you can see, the final state of the bzr import has file bar contain the string 1, while the CVS repository originally ended up with the string 2. So anyone importing from CVS and relying on that import for future development is likely to lose data, because I'm not sure it's feasible to fix this kind of corruption once it's committed to bzr.

Is launchpad itself using cvsps-import? If so, this bug should affect launchpad itself as well. If not, what else do they use?

John A Meinel (jameinel) wrote : Re: [Bug 523670] Re: CVSps ignores revision order

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Martin von Gagern wrote:
> Oh, and to prove that bzr-cvsps-import doesn't already handle CVSps buggyness already:
> bzr log --forward -p

...

>
> So as you can see, the final state of the bzr import has file bar contain the string 1, while the cvs originally ended up with the string 2. So anyone importing from CVS and relying on that import for future development is likely to loose data, because I'm not sure it's feasible to fix this kind of corruption once it's committed to bzr.
>
> Is launchpad itself using cvsps-import? If so, this bug should affect
> launchpad itself as well. If not, what else do they use?
>

Launchpad does not use cvsps-import. There are currently 2 options

1) Launchpad uses 'cscvs', which is a rather involved setup, but
probably one of the highest fidelity conversions. It is designed to also
handle stuff like people manually hacking their CVS repository. eg
Copying a ,v file suddenly shows a new file in your whole ancestry,
which wasn't there during the previous conversion, cscvs is the only
converter I know that handles that.

2) The cvs2svn converter actually has a suite of tools to convert to
various targets. One (cvs2bzr) can output a 'fast-import' stream, that
can then be converted using 'bzr fast-import' (with bzr-fastimport
installed).

I'm told that cvs2bzr can be used incrementally, but AIUI it will have
to see the whole history to generate a new 'fast-import' file. So it
cannot extract incrementally, but 'bzr fast-import' should be able to
incrementally update an existing conversion.

I haven't been very actively maintaining bzr-cvsps-import, mostly
because the other tools are better, and as you say, cvsps isn't very
good at converting CVS changes into logical changesets.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkt9VTQACgkQJdeBCYSNAANu+gCdF8hqKbpGk7gnUQq+TlKLWwpH
IEoAoIcFO3NXCrAq+oB2wSPUBE1CFHz2
=yjWf
-----END PGP SIGNATURE-----

Jelmer Vernooij (jelmer)
Changed in bzr-cvsps-import:
status: New → Triaged
importance: Undecided → Low
importance: Low → Medium