Comment 0 for bug 351317

Eric Anderson (eric-pixelwareinc) wrote : Cannot join by reference more than one repository

When trying to use subtree formats I cannot join in more than one git repository. To reproduce try the following:

$ mkdir test
$ cd test/
test$ bzr init --development-subtree
Created a standalone tree (format: development2-subtree)
test$ bzr branch git://github.com/harukizaemon/schema_validations.git schema_validations
Branched 4 revision(s).
test$ bzr join --reference schema_validations
test$ bzr branch git://github.com/harukizaemon/redhillonrails_core.git redhillonrails_core
Branched 1 revision(s).
test$ bzr join --reference redhillonrails_core
bzr: ERROR: Cannot join redhillonrails_core. Root id already present in tree

These git repositories are just used as an example because they are small and therefore quick to branch. Any repository shows the same behavior. The problem is in the mapping: the root id for every git repository is the same, namely the constant ROOT_ID. If I make the following change I can add a new repository as a subtree:

    def generate_file_id(self, path):
        # Git paths are just bytestrings
        # We must just hope they are valid UTF-8..
        assert isinstance(path, str)
        if path == "":
            return ROOT_ID + '-a'
        return escape_file_id(path)

    def parse_file_id(self, file_id):
        if file_id.startswith(ROOT_ID):
            return ""
        return unescape_file_id(file_id)

But then I am back to the same problem of not being able to join any more repositories. I can change the '-a' to '-b' (or anything not already used) and so work around the issue, but obviously this is not a solution.

I tried appending a randomly generated string as the suffix, but during the joining process it seems generate_file_id needs to return the same value every time it is called. My next attempt was to append the current time, on the idea that within a single joining operation the time is not likely to change but between different joining operations it will. This seems to work. My naive code for returning the root id is:

ROOT_ID + str(time.mktime(datetime.datetime.now().timetuple()))

I know nothing of Python and this was just borrowed from some site explaining how to get the current number of seconds since the Unix epoch (I'm just a Ruby programmer, so our version would just be ROOT_ID + Time.now.to_i). Anyway, this obviously has two problems:

* It is possible that two repositories could be joined within the same second (via a script or something). Then we are back to our problem.
* It is also possible that joining a repo could span multiple seconds, meaning generate_file_id will not always return the same value within a joining operation, causing an error.
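
To make the two problems above concrete, here is a small sketch that does not touch bzr at all. The value of ROOT_ID, the helper names and the fixed timestamps are all assumptions made up for illustration; the clock is passed in so the behavior is repeatable.

```python
ROOT_ID = "TREE_ROOT"  # assumption: stand-in for the real constant


def time_based_root_id(clock):
    """Mimic the workaround: root id plus whole seconds since the epoch."""
    return ROOT_ID + "-" + str(int(clock()))


def fixed_clock(*timestamps):
    """Hypothetical clock that returns the given timestamps one by one."""
    values = iter(timestamps)
    return lambda: next(values)


# Problem 1: two separate joins that both happen during second 1000.
first_join = time_based_root_id(fixed_clock(1000.2))
second_join = time_based_root_id(fixed_clock(1000.9))
same_second_collision = first_join == second_join

# Problem 2: a single join that asks for the root id twice,
# once just before and once just after a second boundary.
slow_clock = fixed_clock(1000.9, 1001.1)
ids_within_one_join = [time_based_root_id(slow_clock) for _ in range(2)]
unstable_within_join = ids_within_one_join[0] != ids_within_one_join[1]
```

Both same_second_collision and unstable_within_join come out true here, which is the pair of failure cases listed above.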

But it seems to work well enough for my purposes until a real fix gets created. I would imagine the best thing to do would be to append a suffix based on the repo's URI (maybe hashed for fun). But from what I can tell the mapping object doesn't have any reference to the repo it is mapping, which makes that impossible unless we pass more info into the generate_file_id method.
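
A rough sketch of what the URI-based suffix could look like, just to show the idea. Nothing here is the plugin's real API: root_id_for_repo and is_root_id are hypothetical helpers, the value of ROOT_ID is assumed, and the repo URL would have to be handed in from somewhere since the mapping does not seem to know it.

```python
import hashlib

ROOT_ID = "TREE_ROOT"  # assumption: stand-in for the real constant


def root_id_for_repo(repo_url):
    """Hypothetical helper: derive a stable, per-repository root id.

    The same URL always hashes to the same suffix, so repeated calls
    during one join agree, while different URLs get different root ids.
    """
    digest = hashlib.sha1(repo_url.encode("utf-8")).hexdigest()
    return ROOT_ID + "-" + digest


def is_root_id(file_id):
    """Hypothetical helper: recognise the plain or suffixed root id."""
    return file_id == ROOT_ID or file_id.startswith(ROOT_ID + "-")


# The two repositories from the reproduction steps above.
schema_root = root_id_for_repo(
    "git://github.com/harukizaemon/schema_validations.git")
core_root = root_id_for_repo(
    "git://github.com/harukizaemon/redhillonrails_core.git")
```

With this, schema_root and core_root differ from each other, and calling root_id_for_repo again with the same URL gives the same value, so neither of the two problems with the time-based suffix applies. Whether two branches of the same repository should share a root id is a separate question this sketch does not try to answer.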