Activity log for bug #351317

Date Who What changed Old value New value Message
2009-03-30 01:20:44 Eric Anderson bug added bug
2009-03-30 01:23:45 Eric Anderson description When trying to use subtree formats, I cannot join in more than one git repository. To reproduce, try the following:

    $ mkdir test
    $ cd test/
    test$ bzr init --development-subtree
    Created a standalone tree (format: development2-subtree)
    test$ bzr branch git://github.com/harukizaemon/schema_validations.git schema_validations
    Branched 4 revision(s).
    test$ bzr join --reference schema_validations
    test$ bzr branch git://github.com/harukizaemon/redhillonrails_core.git redhillonrails_core
    Branched 1 revision(s).
    test$ bzr join --reference redhillonrails_core
    bzr: ERROR: Cannot join redhillonrails_core. Root id already present in tree

These git repositories are just used as an example because they are small and therefore quick to branch from. Any repository shows the same behavior.

The problem is in the mapping. The root id for all git repositories is the same: the constant ROOT_ID.
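To make the diagnosis concrete, here is a minimal toy model of why the second join fails when every Git root maps to the same constant. It is not bzr's real tree or mapping API: the `Tree` class, the placeholder value of `ROOT_ID`, and the outer tree's root id are all hypothetical.

```python
# Hypothetical toy model; not bzr's real tree or mapping API.
ROOT_ID = "TREE_ROOT"  # placeholder value, assumed for illustration


def generate_file_id(path: str) -> str:
    """Toy version of the current mapping: every Git root gets ROOT_ID."""
    if path == "":
        return ROOT_ID
    return path  # the real mapping escapes the path; omitted here


class Tree:
    """Tracks the file ids already present in the outer tree."""

    def __init__(self) -> None:
        # The outer tree created by `bzr init` has its own (hypothetical) root id.
        self.file_ids = {"outer-root-id"}

    def join(self, name: str) -> None:
        # The root id of the joined branch does not depend on which repo it is.
        root_id = generate_file_id("")
        if root_id in self.file_ids:
            raise ValueError(f"Cannot join {name}. Root id already present in tree")
        self.file_ids.add(root_id)
```

Calling `tree = Tree()`, then `tree.join("schema_validations")` and `tree.join("redhillonrails_core")`, succeeds the first time and raises on the second, mirroring the error in the transcript.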
If I make the following change, I can add a new repository as a subtree:

    def generate_file_id(self, path):
        # Git paths are just bytestrings
        # We must just hope they are valid UTF-8..
        assert isinstance(path, str)
        if path == "":
            return ROOT_ID + '-a'
        return escape_file_id(path)

    def parse_file_id(self, file_id):
        if file_id.startswith(ROOT_ID):
            return ""
        return unescape_file_id(file_id)

But then I am back to the same problem of not being able to add any more repositories. I can change the '-a' to '-b' (or anything not already used) and thereby work around the issue, but obviously this is not a solution.

I tried appending a randomly generated string instead of the fixed suffix, but within the joining process it seems generate_file_id has to return the same value every time it is called. My next attempt was to append the current time, on the theory that the time is unlikely to change within a single join operation but will differ between separate join operations. This seems to work. My naive code for returning the root id is:

    ROOT_ID + str(time.mktime(datetime.datetime.now().timetuple()))

I know nothing of Python and borrowed this from a site explaining how to get the current number of seconds since the Unix epoch (I'm a Ruby programmer; in Ruby this would just be ROOT_ID + Time.now.to_i.to_s). Anyway, this obviously has two problems:

* Two repositories could be joined within the same second (via a script or something), which brings back the original problem.
* Joining a repository could span multiple seconds, so generate_file_id would not always return the same value within one join operation, causing an error.

But it seems to work well enough for my purposes until a real fix gets created. I would imagine the best thing to do would be to append a suffix based on the repository's URI (maybe hashed for fun).
But as far as I can tell, the mapping object has no reference to the repository it is mapping from, which makes that impossible unless we pass more information into the generate_file_id method.
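The URI-based idea could look roughly like the following sketch. It assumes, hypothetically, that the repository URL can be passed into generate_file_id (the report notes the current mapping has no access to it). SHA-1 is an arbitrary choice of hash, and the value of ROOT_ID and the identity stand-ins for escape_file_id/unescape_file_id are placeholders.

```python
import hashlib

ROOT_ID = "TREE_ROOT"  # placeholder; the real constant lives in bzr-git's mapping


def root_id_for(repo_url: str) -> str:
    """Deterministic per-repository root id: same URL, same id, on every call."""
    digest = hashlib.sha1(repo_url.encode("utf-8")).hexdigest()
    return f"{ROOT_ID}-{digest}"


def generate_file_id(path: str, repo_url: str) -> str:
    # repo_url is a hypothetical extra parameter the current mapping lacks.
    if path == "":
        return root_id_for(repo_url)
    return path  # stand-in for escape_file_id(path)


def parse_file_id(file_id: str) -> str:
    # Any id carrying the ROOT_ID prefix is a root, whichever repo it came from.
    # Assumes real path escaping keeps path ids out of this prefix namespace.
    if file_id.startswith(ROOT_ID):
        return ""
    return file_id  # stand-in for unescape_file_id(file_id)
```

Unlike the random or time-based suffix, the id is a pure function of the URL, so every call during one join returns the same value, and different repositories get different roots. One caveat of this design: the same repository branched from two different URLs would end up with two different root ids.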
2009-03-30 01:49:25 Jelmer Vernooij bzr-git: importance Undecided Wishlist
2009-03-30 01:49:25 Jelmer Vernooij bzr-git: status New Triaged
2009-05-16 15:31:59 Jelmer Vernooij tags next-mapping-format
2010-12-25 23:23:47 Jelmer Vernooij summary "Cannot join by reference more than one repository" "file ids are not very unique"
2013-08-25 15:33:28 Sergei Golubchik bug added subscriber Sergei
2013-08-25 15:38:57 Sergei Golubchik attachment added bzr-git.file-id.patch https://bugs.launchpad.net/bzr-git/+bug/351317/+attachment/3787167/+files/bzr-git.file-id.patch
2013-08-26 15:06:34 Sergei Golubchik attachment removed bzr-git.file-id.patch https://bugs.launchpad.net/bzr-git/+bug/351317/+attachment/3787167/+files/bzr-git.file-id.patch
2013-08-26 15:08:08 Sergei Golubchik attachment added bzr-diff-fileid.patch https://bugs.launchpad.net/bzr-git/+bug/351317/+attachment/3788340/+files/bzr-diff-fileid.patch
2018-03-06 01:20:15 Jelmer Vernooij bug task added brz-git
2018-03-06 01:23:22 Jelmer Vernooij brz-git: status New Triaged
2018-05-06 11:50:40 Jelmer Vernooij brz-git: importance Undecided Medium
2018-05-10 01:20:14 Jelmer Vernooij summary "file ids are not very unique" "git: file ids are not very unique"
2018-05-10 01:21:11 Jelmer Vernooij tags "next-mapping-format" "git next-mapping-format"
2018-05-10 01:22:20 Jelmer Vernooij affects brz-git brz