git: file ids are not very unique
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| Bazaar Git Plugin | Triaged | Wishlist | Unassigned | |
| Breezy | Triaged | Medium | Unassigned | |
Bug Description
When using the subtree formats, I cannot join more than one git repository into a tree. To reproduce, try the following:
```
$ mkdir test
$ cd test/
test$ bzr init --development-
Created a standalone tree (format: development2-
test$ bzr branch git://github.
Branched 4 revision(s).
test$ bzr join --reference schema_validations
test$ bzr branch git://github.
Branched 1 revision(s).
test$ bzr join --reference redhillonrails_core
bzr: ERROR: Cannot join redhillonrails_
```
These git repositories are used only as an example because they are small and therefore quick to branch from; any repository shows the same behavior. The problem is in the mapping: the root id is the same for every git repository, namely the constant ROOT_ID. If I make the following change, I can add a new repository as a subtree:
```python
def generate_
    # Git paths are just bytestrings
    # We must just hope they are valid UTF-8.
    assert isinstance(path, str)
    if path == "":
        # Append a suffix so this root id differs from the plain ROOT_ID
        return ROOT_ID + '-a'
    return escape_

def parse_file_id(self, file_id):
    if file_id.
        return ""
    return unescape_
```
But then I am back to the same problem: I cannot join any further repositories. I can change the '-a' to '-b' (or any suffix not already in use) to work around the issue again, but obviously this is not a real solution.
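A minimal sketch of why a fixed suffix only postpones the clash: the join must refuse a subtree whose root file id is already present in the containing tree, and every git repository still gets the same suffixed id. `ROOT_ID = "TREE_ROOT"`, the `try_join` helper and the containing tree's root id are hypothetical stand-ins, not the plugin's or bzr's actual code.

```python
ROOT_ID = "TREE_ROOT"  # assumed value of the mapping's constant root id

def try_join(used_ids, root_id):
    """Hypothetical model of `bzr join --reference`.

    The join is refused when the subtree's root file id is already in use
    in the containing tree; otherwise the id is recorded.
    """
    if root_id in used_ids:
        return False
    used_ids.add(root_id)
    return True

# Hypothetical root id of the containing bzr tree.
used = {"containing-tree-root"}
outcomes = [
    try_join(used, ROOT_ID),         # 1st git repo: unpatched mapping
    try_join(used, ROOT_ID + "-a"),  # 2nd git repo: '-a' workaround
    try_join(used, ROOT_ID + "-a"),  # 3rd git repo: same id again
]
print(outcomes)  # [True, True, False]
```

Each new suffix buys exactly one more join, which is why switching to '-b' works once and then fails the same way.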
I tried appending a randomly generated string as the suffix, but within the joining process generate_file_id apparently has to return the same value every time it is called. My next attempt was to append the current time, on the assumption that the time is unlikely to change within a single join but will differ between separate joins. This seems to work. My naive code for returning the root id is:
```python
ROOT_ID.
```
I know nothing of Python; this was borrowed from a site explaining how to get the current number of seconds since the Unix epoch (I'm a Ruby programmer, and in Ruby this would simply be ROOT_ID + Time.now.to_i.to_s). Either way, this approach has two obvious problems:
* Two repositories could be joined within the same second (for example, from a script), which brings back the original problem.
* A single join could span more than one second, in which case generate_file_id would not return the same value throughout the join, causing an error.
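Both problems can be reproduced with a small sketch. The author's exact Python is truncated above, so this assumes whole seconds since the Unix epoch (matching the Ruby Time.now.to_i mentioned); `time_suffixed_root_id` and `ROOT_ID = "TREE_ROOT"` are hypothetical, and the timestamp is passed in explicitly so the two failure modes are deterministic.

```python
import time

ROOT_ID = "TREE_ROOT"  # assumed value of the mapping's constant root id

def time_suffixed_root_id(now=None):
    """Hypothetical version of the workaround: append whole epoch seconds."""
    if now is None:
        now = time.time()
    return f"{ROOT_ID}-{int(now)}"

# Problem 1: two joins within the same second get the same root id.
a = time_suffixed_root_id(1_700_000_000.1)
b = time_suffixed_root_id(1_700_000_000.9)
print(a == b)  # True: the ids collide

# Problem 2: one join that straddles a second boundary sees two ids.
start = time_suffixed_root_id(1_700_000_000.9)
end = time_suffixed_root_id(1_700_000_001.2)
print(start == end)  # False: the id is not stable within the join
```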
Still, it works well enough for my purposes until a real fix is available. I imagine the best approach would be to append a suffix derived from the repository's URI (perhaps hashed). However, as far as I can tell the mapping object has no reference to the repository it maps from, so this is not possible unless more information is passed into generate_file_id.
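A sketch of the URI-based idea, assuming the mapping were given the repository URI when it is constructed (the bug notes that the current mapping object has no such reference). `RepoAwareMapping`, `generate_root_id`, the 16-character SHA-1 prefix and the example.org URIs are all hypothetical choices for illustration.

```python
import hashlib

ROOT_ID = "TREE_ROOT"  # assumed value of the mapping's constant root id

class RepoAwareMapping:
    """Hypothetical mapping that knows which repository it maps from."""

    def __init__(self, repo_uri):
        self.repo_uri = repo_uri
        # Hash the URI so the suffix is short and free of path characters.
        digest = hashlib.sha1(repo_uri.encode("utf-8")).hexdigest()
        self._root_id = f"{ROOT_ID}-{digest[:16]}"

    def generate_root_id(self):
        # Deterministic: the same repository yields the same id on every call.
        return self._root_id

a = RepoAwareMapping("git://example.org/schema_validations.git")
b = RepoAwareMapping("git://example.org/redhillonrails_core.git")
print(a.generate_root_id() == a.generate_root_id())  # True
print(a.generate_root_id() == b.generate_root_id())  # False
```

This fixes both problems of the time-based suffix: the id is stable within a join and differs between repositories. One caveat of the design is that the same repository reached through two different URIs (for example git:// and https://) would get two different root ids.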
| Change | Details |
|---|---|
| description | updated |
| tags | added: next-mapping-format |
| summary | "Cannot join by reference more than one repository" → "file ids are not very unique" |
| Changed in brz-git: status | New → Triaged |
| Changed in brz-git: importance | Undecided → Medium |
| summary | "file ids are not very unique" → "git: file ids are not very unique" |
| tags | added: git |
| affects | brz-git → brz |
The fundamental problem here is the way file ids are currently constructed in bzr-git. Even if we fixed the tree root file id issue, the file ids for other paths in the tree would still clash (a README file would have the same file id if it existed in both trees).
Generating proper file ids and following renames is something that we've delayed until the next mapping version, as it can be quite complex.
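To illustrate the maintainer's point, here is a sketch comparing a purely path-derived scheme with a hypothetical per-repository namespace. `path_file_id` is a stand-in for the plugin's truncated escaping code, not its real output format, and `namespaced_file_id` is an invented alternative. Namespacing removes the cross-repository README clash but still derives ids from paths, so a rename still produces a new id; following renames needs more than this.

```python
import hashlib

ROOT_ID = "TREE_ROOT"  # assumed value of the mapping's constant root id

def path_file_id(path):
    """Hypothetical purely path-derived id (not the plugin's real escaping)."""
    if path == "":
        return ROOT_ID
    return "git:" + path

def namespaced_file_id(repo_uri, path):
    """Hypothetical alternative: prefix every id with a per-repository tag."""
    tag = hashlib.sha1(repo_uri.encode("utf-8")).hexdigest()[:16]
    return f"git:{tag}:{path}"

repo_a = "git://example.org/a.git"
repo_b = "git://example.org/b.git"
tree_a = ["", "README", "lib/a.rb"]
tree_b = ["", "README", "lib/b.rb"]

# Path-derived ids: the root and README clash between the two trees.
clash = {path_file_id(p) for p in tree_a} & {path_file_id(p) for p in tree_b}
print(sorted(clash))  # ['TREE_ROOT', 'git:README']

# Namespaced ids: no clash between repositories.
namespaced_clash = (
    {namespaced_file_id(repo_a, p) for p in tree_a}
    & {namespaced_file_id(repo_b, p) for p in tree_b}
)
print(namespaced_clash)  # set()

# Renaming README still yields a different id, so renames are not followed.
print(namespaced_file_id(repo_a, "README") == namespaced_file_id(repo_a, "README.md"))  # False
```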