Bazaar Version Control System

bzr missing "cp" command for forking files with history

Reported by Guenther Brunthaler on 2008-09-11
This bug affects 79 people
Affects: Bazaar
Importance: Medium
Assigned to: Unassigned

Bug Description

Bazaar-NG SCM is almost a full replacement for Mercurial or Subversion SCMs, but not quite.

There is one big thing left which bzr is unable to do: copying files within the repository, creating a new fork of the file which has a new file-id but inherits a copy of the original file's history up to the commit that forked it.

I.e., it should be possible to do the following:

$ bzr init demo
$ cd demo
$ mkdir prj1
$ touch prj1/Makefile
$ bzr add
$ bzr ci -m "Makefile of primary project"
$ bzr mkdir prj2
$ bzr cp prj1/Makefile prj2/
$ bzr ci -m "Makefile of secondary project forked from primary one"

Currently, the only way to achieve a similar effect in bzr is to copy the file to be "forked" manually, independently of bzr, and then add the copy as if it were a completely new file.

While this approach works, the inheritance relationship, and thus all history of the new file from before it was copied, is lost.

This is a severe loss of historic information!

In a highly productive environment it is often important to be able to find out where new files originated, especially if the origin is another file in the same repository that is already under version control!

I consider this missing functionality bzr's greatest (and actually only) disadvantage compared to svn or hg.

This should really be addressed by some forthcoming version of bzr!

BTW, the inability to track file histories across file duplications proves false the manual's claim that bzr "keeps more metadata than most other SCMs" (not its literal wording, but its meaning).

Strictly speaking, this is an enhancement request, not a bug report.

However, it might also be considered a "documentation bug" or "advertisement bug", because in the manual bzr is advertised as being a replacement for svn or hg which "preserves as many metadata as possible, and more than most other SCMs" during the migration.

As long as there is no "bzr cp" (or a suitable work-around for emulating such a command using other bzr commands), this is simply not true, as its closest competitors all provide this feature, and only bzr does not.

As this seems to be the *ONLY* real shortcoming of bzr compared to hg or svn, it might be worth addressing before any additional new features are introduced to bzr. (Bug fixing should still have a higher priority, of course.)

As far as I understand the data structures of bzr, it might not be easy to directly implement a historic relationship between two repository objects with different file ids.

However, the problem might be solved indirectly by simply adding another metadata item to the commit log entry which describes the parent file-ids of all forked file system objects for this check-in.

Then "bzr cp" would be very much like "bzr add", except that it also adds a metadata record describing the parent file-id of the newly copied file.

This should not require any dramatic changes to the repository data structures, but it will at least record the per-file inheritance relationships until other bzr commands actually make use of this new metadata item.
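A minimal Python sketch of the proposed metadata item, for illustration only: `CommitRecord`, `forks`, `record_cp` and `fork_parent` are hypothetical names invented here, not part of bzr's actual API, and the file-ids are made-up strings. It shows "bzr cp" as "bzr add" plus one extra record mapping the new file-id to its parent file-id.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CommitRecord:
    """Hypothetical commit log entry carrying the proposed fork metadata."""
    revision_id: str
    message: str
    added_file_ids: list[str] = field(default_factory=list)
    # Proposed extra item: file-id of each newly copied file -> file-id it was forked from.
    forks: dict[str, str] = field(default_factory=dict)


def record_cp(commit: CommitRecord, new_file_id: str, parent_file_id: str) -> None:
    """Behave like 'bzr add' (register a new file-id) and also record its parent file-id."""
    commit.added_file_ids.append(new_file_id)
    commit.forks[new_file_id] = parent_file_id


def fork_parent(history: list[CommitRecord], file_id: str) -> Optional[str]:
    """Return the file-id that file_id was forked from, or None if it was added from scratch."""
    for commit in history:
        if file_id in commit.forks:
            return commit.forks[file_id]
    return None
```

For the Makefile example above, calling `record_cp(commit, "makefile-2", "makefile-1")` in the second commit would let a later `fork_parent(history, "makefile-2")` return `"makefile-1"`, even though no current command reads the record yet.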

At least the information will no longer be lost!

To refine my last posting: if "bzr cp" accepted a "file-ids-from" option just like "bzr add" does, this would even allow adding files from different, otherwise unrelated repositories to an existing repository!

In the current implementation of bzr, "branches" and "repositories" are just organizational boundaries between projects which determine where the revision history is physically stored.

But from a conceptual point of view, it should be possible to merge any existing repository from any user at any site into a single large, shared one, because it is revision-ids and file-ids that glue the revision graph together, not branch or repository locations.

And revision-ids as well as file-ids are unique identifiers in space and time.

This would allow, for instance, using the same file-id for a text file "GPL" regardless of the repository, and tools could use this fact to retrieve an inheritance graph of this file across a set of otherwise unrelated repositories.

Thanks to "bzr add --file-ids-from" this already works in a limited way when moving files across repositories: A file might be removed from a repository and added to a different one using the same file-id.

While the file's past history remains physically in the first repository only, a third-party tool can easily extract and combine the file's histories from both repositories and display them, because the related files in both repositories can always be associated through their identical file-id.
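In its simplest form, such a third-party tool only has to collect the log entries for one file-id from each repository and sort them by commit time. This is a hypothetical sketch: the repositories are modelled as plain dictionaries mapping a file-id to `(timestamp, revision_id)` pairs rather than being read through any real bzr API, and `combined_history` and `LogEntry` are illustrative names.

```python
from typing import NamedTuple


class LogEntry(NamedTuple):
    timestamp: float    # commit time in seconds since the epoch
    repository: str     # name of the repository the entry was found in
    revision_id: str


def combined_history(
    file_id: str,
    repositories: dict[str, dict[str, list[tuple[float, str]]]],
) -> list[LogEntry]:
    """Merge the per-repository histories of one file-id into a single time-ordered list."""
    entries = []
    for repo_name, file_logs in repositories.items():
        # A repository that never contained this file-id contributes nothing.
        for timestamp, revision_id in file_logs.get(file_id, []):
            entries.append(LogEntry(timestamp, repo_name, revision_id))
    # NamedTuples compare field by field, so this sorts by timestamp first.
    return sorted(entries)
```

The identical file-id is the only thing joining the two histories; nothing here depends on branch or repository locations.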

A command "bzr cp" would make the same possible *within* a single repository, which is quite impossible now. For instance,

$ bzr add GPLv2
$ bzr ci
$ bzr cp GPLv2 GPLv3
$ "$EDITOR" GPLv3
$ bzr ci

would reveal that GPLv3 is a modified copy of GPLv2 as of the previous revision, but from then on both files will evolve independently (with different file-ids).

Dorin Scutarașu (dorins) wrote :

Confirmed. Forking a file (or a whole file tree) with history is important. Subversion and Perforce can do that, and personally I use it a lot.

Changed in bzr:
status: New → Confirmed
Jonny Dee (jonny.dee) wrote :

I would appreciate the solution suggested by Guenther Brunthaler a lot. In fact, this is an issue which makes me constantly think about switching to hg at work.

Adrian (adrian-dziubek) wrote :

I see no voting widget here, so just: I'm waiting for this too.

Jacob Myers (jacob-whotookspaz) wrote :

This definitely should be implemented. Waiting on this too.

Martin Pool (mbp) on 2009-07-02
Changed in bzr:
importance: Undecided → Medium
Per Johansson (per.j) wrote :

Copy is not only about preserving history, even if that is its most important function. Suppose I copy A to B, and my parent branch then changes A. Then the changes should be applied to both A and B (or only to B if I've removed A, which is likely).

Mercurial describes some more cases:
http://mercurial.selenic.com/wiki/CopyMergeCases

Chris Carlin (volkris) wrote :

I disagree, Per Johansson. What you describe sounds more like linking than copying.

In both normal filesystems and real life one doesn't expect a copy to change as the original changes, so I wouldn't expect such change in version control either.

Per Johansson (per.j) wrote :

I guess since bzr has a separate mv command, cp does not have to substitute for it like it does in Mercurial.

Hi Per,

Bazaar's "mv" command is fine.

But there is no way to duplicate files with "mv" - you can move or rename a file; but it will still be a single file.

To better explain when "cp" would come in handy, just imagine you have a template file under version control which contains only the boilerplate text (copyleft etc.) that each new source file should start with.

The contents of that template file might change over time for the project (for instance the year in the copyleft notice), and new source files will always be "cloned" from the current contents of this template file.

This cloning operation would be an ideal candidate for a "bzr cp" command. You can't use "bzr mv" for this, because then there would be no template file left.

Of course, one could just copy the file using the shell and then do a "bzr add", but doing so would not record where the new file originally came from.

Of course, you might be lucky and correctly guess it has come from the template file - but with a dedicated "bzr cp" command no luck would be needed to find out - "bzr log -v" would show this information.

BTW, there is an interesting new SCM named "fossil" which combines ideas from git and Bazaar (and Trac), and its data structures are well prepared for later addition of a "cp" command (although it does not yet exist). However, it is not nearly as mature as Bazaar and won't be a serious competitor for some time.

Per Johansson (per.j) wrote :

You misunderstood me. I was referring to my previous comment. I agree cp would be very useful.

Joke de Buhr (joke) wrote :

The ability to copy/split files is important for C/C++ programmers.

Often you start with a single source file and decide to split it into a header file and a source file. With Bazaar only one file can retain the history; the other half needs to be added as a completely new file. You can't use diff or anything else to display the changes.

Martitza (martitzam) wrote :

One more vote from the community: We use "copy with history" heavily in the other version control systems we use (or have used). Once you have used this feature, you will not want to give it up. It promotes intelligent reuse and attribution within the project community.

And now...this is the big deal for us...you need a command like 'bzr showforks <file>' to do the following:

For a given file, find all instances (ancestors and descendants, regardless of name in the filesystem) of forks. This is important! Although it would be dangerous (as others have noted) to propagate changes automatically (in which case you should really have a link to a single file anyway), being able to transparently expose all "fork relations" for a file is essential for maintainability. This allows an intelligent decision (or code review) to determine which changes should be cherry-pick-merged back/across and *where* they should be merged.

I would not be concerned about implementation efficiency in the "showforks" command. It's used rarely. Only two places need the fork marker: the parent and the child. This is sufficient to ensure that following the line(s) of descent and the line of ancestry will find all forks, even if it takes a while.
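Because each fork marker links a parent and a child, such a "showforks" command amounts to a graph traversal. A hypothetical sketch, assuming the fork markers have already been collected as `(parent_file_id, child_file_id)` pairs (the possibly slow collection step is left out, and `show_forks` is an invented name): a breadth-first search that follows the links in both directions reaches ancestors, descendants and also siblings forked from a common parent.

```python
from collections import defaultdict, deque


def show_forks(file_id: str, fork_records: list[tuple[str, str]]) -> set[str]:
    """Return every file-id connected to file_id through fork markers (excluding itself)."""
    # The marker is known on both sides, so treat every link as undirected.
    neighbours = defaultdict(set)
    for parent, child in fork_records:
        neighbours[parent].add(child)
        neighbours[child].add(parent)

    # Breadth-first search over the fork links.
    seen = {file_id}
    queue = deque([file_id])
    while queue:
        current = queue.popleft()
        for other in neighbours[current]:
            if other not in seen:
                seen.add(other)
                queue.append(other)
    seen.discard(file_id)
    return seen
```

If only the direct line of ancestry or descent were wanted, the two directions would have to be followed separately instead of merged into one neighbour set.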

I suppose this might be pretty disruptive, but perhaps it could be worked in parallel with other requests for metadata, for example https://bugs.launchpad.net/bzr/+bug/218128 .

It would be great if this could

Per Johansson (per.j) wrote :

Within a couple of weeks I have now encountered lost annotate info several times because this command is missing, similar to what Joke de Buhr describes above.

Jean Jordaan (jean-jordaan) wrote :

I was amazed to find that 'bzr cp' doesn't exist.

I see suggestions to merge changes to both a file and copies of it.
That sounds like something for a 'bzr link' type of command. 'bzr cp' should only record where a file came from.
Thereafter it should be independent of the file it was copied from.

Jelmer Vernooij (jelmer) on 2011-02-01
tags: added: copy
Tomasz Magulski (magul) on 2011-02-07
Changed in bzr:
assignee: nobody → Tomasz Magulski (magul)
Alex Tsepkov (atsepkov) wrote :

Any updates on this? We need this feature for our project as well, but it seems there hasn't been any progress on it for the last few years. Our team wants to stay with bzr, but we've already lost history info for a couple of files we had to split, and many more splits are coming, so we might have to switch to a different version control system to avoid losing track of our changes.

bing (bingbing38) wrote :

Finally found the answer here to my newbie question posted on "Question for Bazaar" #159218, "Newbie question: How does bzr track copied files ?"
So the answer is "not supported". I originally tried to switch from git to bzr because git lacks a proper "git cp". But now this seems to be a no-go. Hg does support it, but there seem to be other issues with hg. I think I will stick with git until this feature is supported. I'm also surprised that the documentation does not specifically mention this, or maybe I didn't read it carefully enough. Anyway, I'll check on the progress of "bzr cp" once in a while, since it is actually the only remaining reason that keeps me from switching.

@bing: True. Except for Subversion and Mercurial, "cp" seems to be a feature generally left out in most prominent SCMs.

In git's case, a "cp" command is not really required: git does not record dependencies between files in different commits anyway.

So a "git cp" would have little effect, because the data structures of git do not support inter-commit name tracking.

In Bazaar, on the other hand, a "bzr cp" would really be useful.

But I'll stick with Bazaar anyway, because of its superb capability to track directories as well as files. If directory trees are often reorganized, this is a real killer feature.

Richard B. (richardb.) wrote :

I'm currently using Mercurial, but just found out that it doesn't track empty directories, so I thought about maybe switching to Bazaar. I was pleased to see that Bzr tracks directories properly, including empty directories, but was disappointed it doesn't support tracking file copies.

So I can either use Mercurial, which tracks file copies properly and has a workaround/hack for tracking empty directories (i.e. putting an empty file in them), or use Bazaar, which solves the directory tracking problem but doesn't support file copying (and there doesn't appear to be a workaround/hack for this).

At the moment, it seems that no DVCS fully supports all required/useful features, so I'm sure that adding proper support for tracking file copies to Bzr would help to improve Bzr's popularity.

This was originally requested over 2.5 years ago, and it was interesting to see that it was finally assigned to someone, Tomasz Magulski, in Feb 2011.

How's progress going on this feature, and will we see it in a release in the near future?

@richard, posting #20:

True. I have used them all - RCS, CVS, svn, svk, hg, git and bzr. All have their problems, but in my opinion bzr is still the best choice for version control of "normal" source code.

Git might be better for managing patches, but only because patches are usually individual entities with little or no version history of their own. (Usually, new patches are applied on top of existing ones instead of modifying existing patches. DARCS is even based on that idea.)

Regarding directories, I think bzr's feature to track individual directories cannot be overemphasized. Especially when merging. Try that in git: Start with some files in one directory, fork off a branch, then change the name of the directory in one branch. From now on, you have to tell git on *every* merge that it should merge the files in the old directory name of the original branch with those in the new directory name of the new branch. Every time! Git is just too dumb to remember it.

Regarding "cp", as far as I remember svn can do it, and therefore also its "distributed brother" svk. I have used svk for some time, but it was really no fun. Lousy documentation, lots of bugs and tons of ugly Perl code. Managing it was really challenging. Git might be hard to learn, but I felt svk's repository/branch management to be even harder.

IMHO Mercurial is the "easiest" version control system to learn, and I tend to recommend it to SCM beginners.

However, it has many shortcomings compared to Bazaar, most notably its weak branch support and the exceptionally stupid way it stores commits for different branches in the *same* repository (this does not apply to the one-branch-one-repository approach).

On the other hand, Mercurial provides "cp" and Bazaar doesn't...

Well, let's hope some day this will change.

In the meantime, I suggest the following approach to ameliorate the situation. Whenever a "bzr cp" would be appropriate (but, as we know, isn't available), do the following:

* Make sure there is a commit reflecting the state of the file to be forked. That is, if the file has changes, commit them *before* doing the "cp".

* Copy the file, add the copy to bzr, and commit the addition before *changing* the contents of the copy.

This approach will make sure there is a commit with the file using the original name and a later commit containing a copy of the file under a new path/name, but both files can be linked by their identical contents.
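The two steps above could be scripted roughly as follows. This is a sketch, not a tested tool: `fork_file` is a hypothetical helper, it assumes `bzr` is on the `PATH` and is run inside a working tree, and it uses `bzr status --short` to decide whether the source file has uncommitted changes (bzr refuses to make a commit with no changes unless `--unchanged` is given). The `run` parameter exists only so the bzr calls can be replaced by a stub.

```python
import shutil
import subprocess


def fork_file(src: str, dst: str, run=subprocess.run) -> None:
    """Emulate 'bzr cp src dst' with the two-commit workaround (hypothetical helper)."""
    # Step 1: if src has uncommitted changes, commit them first so that a
    # revision records src in exactly the state being forked.
    status = run(["bzr", "status", "--short", src],
                 capture_output=True, text=True, check=True)
    if status.stdout.strip():
        run(["bzr", "commit", "-m", f"Snapshot {src} before forking it", src], check=True)

    # Step 2: copy, add and commit the copy *before* editing it, so the two
    # files can later be linked by their identical contents.
    shutil.copy2(src, dst)
    run(["bzr", "add", dst], check=True)
    run(["bzr", "commit", "-m", f"Fork {dst} from {src} (unmodified copy)", dst], check=True)
```

Only after `fork_file("GPLv2", "GPLv3")` has finished should the copy be edited.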

Once "bzr cp" is eventually added to Bazaar, it might be possible to write some sort of "rebase" tool which rewrites the version history, finds files with the same contents in different revisions, and infers a "bzr cp" from them.

Actually, this is how git detects renames nowadays. And aside from evil special cases such as renaming different files with identical contents it works surprisingly well.
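A minimal sketch of that content-matching idea, assuming each revision's tree is available as a dictionary mapping paths to file contents (obtaining it is left out, and `infer_copies` is an invented name). A path that is new in the later revision and whose content is byte-for-byte identical to exactly one file in the earlier revision is reported as a probable copy. Git itself uses a similarity score rather than exact equality; exact matching is enough for the unmodified copies the workaround produces, and ambiguous matches (several originals with the same content, the "evil special case") are skipped rather than guessed.

```python
import hashlib
from collections import defaultdict


def infer_copies(old_tree: dict[str, bytes], new_tree: dict[str, bytes]) -> list[tuple[str, str]]:
    """Guess (original, copy) pairs: new paths whose content matches exactly one old file."""
    # Index the earlier revision by content digest.
    by_digest = defaultdict(list)
    for path, content in old_tree.items():
        by_digest[hashlib.sha1(content).hexdigest()].append(path)

    copies = []
    for path, content in new_tree.items():
        if path in old_tree:
            continue  # not a newly added path
        candidates = by_digest.get(hashlib.sha1(content).hexdigest(), [])
        if len(candidates) == 1:  # skip when several old files share this content
            copies.append((candidates[0], path))
    return sorted(copies)
```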

Mike Gratton (mjog) wrote :

The need for tracking copies has been recognised, so if you do not have a branch or patch which implements the feature, please do not comment further — this bug is not a discussion forum, please do not treat it as such.

Tomasz Magulski (magul) on 2011-11-16
Changed in bzr:
assignee: Tomasz Magulski (magul) → nobody