incorrect repository detected with symlink to a branch

Bug #124859 reported by Robert Collins
6
Affects Status Importance Assigned to Milestone
Bazaar
Confirmed
High
Unassigned

Bug Description

Just ran into this. The following situation will break things nastily...

bzr init-repo --trees .
bzr init tree
bzr init-repo --trees subproject
bzr init subproject/subtree
cd tree
ln -s ../subproject/subtree link
cd ../subproject/subtree
bzr commit --unchanged -m 'foo'
cd -
bzr log link
bzr: ERROR: bzrlib.errors.NoSuchRevision: Branch
KnitRepository('file:///tmp/t/.bzr/') has no revision

(this is trivially demonstrated using just bzrlib, and I found this in
bzr-avahi's share command). A test for any fix for it should be a core
test not a blackbox test).

The cause is that the branch object for 'link' has not followed the
symlink - its root and its bzrdir are './link' rather than
'../subproject/subtree' - so when the repository is searched for (by
walking up the tree) the shared repository at '../subproject' is not
found, rather the shared repository at '..' is - which (of course) does
not include the revision or inventory or text data.

I think this is a critical bug, as it can trivially lead to data being
recorded in the wrong repository, and hard to diagnose glitches for
other users - it will let people think things are corrupt (when they
aren't). I also think folk have reported this on IRC but we haven't
diagnosed it correctly at the time.

 affects /products/bzr
 status triaged
 importance critical

--
GPG key available at: <http://www.robertcollins.net/keys.txt>.

Revision history for this message
Robert Collins (lifeless) wrote :

This appears to have been introduced with or around urlutils: the comment for local_abspath in local.py says it calls realpath, but urlutils.py does not call realpath, instead it just does url manipulation now.

Revision history for this message
Aaron Bentley (abentley) wrote : Re: [Bug 124859] incorrect repository detected with symlink to a branch

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Robert Collins wrote:
> The cause is that the branch object for 'link' has not followed the
> symlink

I disagree that following the symlink is The Right Thing.

The symlink may well be a versioned file, so the containing directory is
the correct place to search. This allows "bzr log link" to give the
version history of the symlink, while bzr log "subproject/subtree" will
give the version history of the subtree.

Following the symlink would mean that there would be no way to get the
version history of the symlink, and would have even worse effects with
"commit", "merge", "revert" et al.

In sum, symlinks are first-class entities in Bazaar, so it does not make
sense to follow them.

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGkjpJ0F+nu1YWqI0RAlkjAJ4uL8uNBIUqJlaZPxnjsUToWbup3wCfYv62
mbNwtLIDe8StDNRVx/vjvy0=
=41Yi
-----END PGP SIGNATURE-----

Revision history for this message
Robert Collins (lifeless) wrote : Re: [Bug 124859] incorrect repository detected with symlink to a branch

On Mon, 2007-07-09 at 13:38 +0000, Aaron Bentley wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Robert Collins wrote:
> > The cause is that the branch object for 'link' has not followed the
> > symlink
>
> I disagree that following the symlink is The Right Thing.

In some places I agree with you, in others I don't.

> The symlink may well be a versioned file, so the containing directory is
> the correct place to search. This allows "bzr log link" to give the
> version history of the symlink, while bzr log "subproject/subtree" will
> give the version history of the subtree.

I think getting the log for the history of 'link' in the tree link is in
should be possible. Right now its not possible at all.

I don't have a strong opinion as to what should be the default, but I
strongly believe that we should support *both* following the symlink for
the Branch object, and logging the history of 'link' itself too.

I'm not sure how we should expose this.

> Following the symlink would mean that there would be no way to get the
> version history of the symlink, and would have even worse effects with
> "commit", "merge", "revert" et al.

This is overgeneralising what I was reporting. At the moment the
situation I created is already totally broken. I was not suggesting
changing how we track versioning of symlink - I am talking about the
behaviour of 'BzrDir.open(PATH)' exclusively.

> In sum, symlinks are first-class entities in Bazaar, so it does not make
> sense to follow them.

They are first class entities, but that does not preclude following them
at appropriate times.

-Rob

--
GPG key available at: <http://www.robertcollins.net/keys.txt>.

Revision history for this message
Aaron Bentley (abentley) wrote : Re: [Bug 124859] incorrect repository detected with symlink to a branch

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Robert Collins wrote:
>> The symlink may well be a versioned file, so the containing directory is
>> the correct place to search. This allows "bzr log link" to give the
>> version history of the symlink, while bzr log "subproject/subtree" will
>> give the version history of the subtree.
>
> I think getting the log for the history of 'link' in the tree link is in
> should be possible. Right now its not possible at all.

It is possible, but not using the symlink.

>> Following the symlink would mean that there would be no way to get the
>> version history of the symlink, and would have even worse effects with
>> "commit", "merge", "revert" et al.
>
> This is overgeneralising what I was reporting. At the moment the
> situation I created is already totally broken. I was not suggesting
> changing how we track versioning of symlink - I am talking about the
> behaviour of 'BzrDir.open(PATH)' exclusively.

Well, I am talking about what BzrDir.open_containing should do, and
that's implemented on top of BzrDir.open.

Given:
tree_a/link
tree_b/file

where tree_a and tree_b are both Bazaar working trees, and link is a
pointer to file, what should "bzr commit tree_a/link" do?

If BzrDir.open follows symlinks, then this will commit tree_b/file. I
believe this is incorrect behavior, because tree_a/link may legitimately
refer to the symlink, not the file, and this is the only way of
committing the symlink. Yet by definition, a symlink's target always
has its own name. So if you want to commit "tree_b/file", you can do
"bzr commit tree_b/file"

>> In sum, symlinks are first-class entities in Bazaar, so it does not make
>> sense to follow them.
>
> They are first class entities, but that does not preclude following them
> at appropriate times.

Okay, let me say that "when the symlink is the terminal path element" is
not an appropriate time. You're right that this has come up before, and
this is the reason we didn't "fix" it then.

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGkkUY0F+nu1YWqI0RAlJNAJ0blFg6SDZOl1GA3AZulBPSa1QnmwCfRP9R
xpETx/b0CRCyddtVo/ZIyII=
=EgKc
-----END PGP SIGNATURE-----

Revision history for this message
Aaron Bentley (abentley) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Aaron Bentley wrote:
> Given:
> tree_a/link
> tree_b/file
>
> where tree_a and tree_b are both Bazaar working trees, and link is a
> pointer to file, what should "bzr commit tree_a/link" do?

Actually, in this scenario, you'll probably get the correct
(commit-the-symlink) behavior.

This is a better example:

tree_a/link
tree_b

where link is a versioned symlink to tree_b.

In this scenario, if BzrDir.open() followed symlinks, "bzr commit
tree_a/link" would commit tree_b, which I believe is the wrong behavior.

It's also worth mentioning that following symlinks may not be
implementable on all transports.

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGkk6r0F+nu1YWqI0RAs4JAJ9o1ir1Q0SlhHf6LAl1IyFq6ud52gCfbQ10
QiCp6vx4gwFpHPfDwz7gqsM=
=GSsm
-----END PGP SIGNATURE-----

Revision history for this message
Robert Collins (lifeless) wrote : Re: [Bug 124859] incorrect repository detected with symlink to a branch

On Mon, 2007-07-09 at 14:24 +0000, Aaron Bentley wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Robert Collins wrote:
> >> The symlink may well be a versioned file, so the containing directory is
> >> the correct place to search. This allows "bzr log link" to give the
> >> version history of the symlink, while bzr log "subproject/subtree" will
> >> give the version history of the subtree.
> >
> > I think getting the log for the history of 'link' in the tree link is in
> > should be possible. Right now its not possible at all.
>
> It is possible, but not using the symlink.

??? Its certainly not possible from the command line in the example I
have.

> Given:
> tree_a/link
> tree_b/file
>
> where tree_a and tree_b are both Bazaar working trees, and link is a
> pointer to file, what should "bzr commit tree_a/link" do?
>
> If BzrDir.open follows symlinks, then this will commit tree_b/file. I
> believe this is incorrect behavior, because tree_a/link may legitimately
> refer to the symlink, not the file, and this is the only way of
> committing the symlink. Yet by definition, a symlink's target always
> has its own name. So if you want to commit "tree_b/file", you can do
> "bzr commit tree_b/file"

I don't think your example is correct. If bzrdir.open follows symlinks,
and tree_a/link points to tree_b/file, then BzrDir.open('tree_a/link')
will fail (tree_b/file is not a bzr tree);
BzrDir.open_containing('tree_a/link') will return (BzrDir('tree_a'),
'link').

If tree_a/link points at tree_b, then and only then does the confusion I
am talking about occur.

> >> In sum, symlinks are first-class entities in Bazaar, so it does not make
> >> sense to follow them.
> >
> > They are first class entities, but that does not preclude following them
> > at appropriate times.
>
> Okay, let me say that "when the symlink is the terminal path element" is
> not an appropriate time. You're right that this has come up before, and
> this is the reason we didn't "fix" it then.

Thats true. However, this current behaviour is a regression AFAICT - I
believe that BzrDir.open() used to follow symlinks (but
BzrDir.open_containing never did, and should not start following them).

Rob
--
GPG key available at: <http://www.robertcollins.net/keys.txt>.

Revision history for this message
Aaron Bentley (abentley) wrote : Re: [Bug 124859] incorrect repository detected with symlink to a branch

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Robert Collins wrote:
>> It is possible, but not using the symlink.
>
> ??? Its certainly not possible from the command line in the example I
> have.

I am saying that in your example, "bzr log ../subproject/subtree"
instead of "bzr log link" would work.

> I don't think your example is correct.

> If tree_a/link points at tree_b, then and only then does the confusion I
> am talking about occur.

Agreed. See my subsequent email.

But it's worth noting that "bzr commit tree_b" has unclear semantics.
Does it refer to the tree root or to the branch? Because if it refers
to the tree root, then it's hard to see why bzr "bzr commit link" should
commit "tree_b/." when "bzr commit link-to-file" does not commit
"tree_b/file".

> Thats true. However, this current behaviour is a regression AFAICT - I
> believe that BzrDir.open() used to follow symlinks (but
> BzrDir.open_containing never did, and should not start following them).

I don't believe this is a regression. AFAIK, the behavior of
BzrDir.open has not changed recently.

If BzrDir.open had followed symlinks, then BzrDir.open_containing would
necessarily have followed some symlinks, since it is implemented on top
of BzrDir.open.

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGklkK0F+nu1YWqI0RApYoAJ41dFQH2U8m20PUxZBbTeugOVL15QCeLfUZ
qiszttUPuVLPixYSL0paedY=
=tOQc
-----END PGP SIGNATURE-----

Revision history for this message
John A Meinel (jameinel) wrote :

All of the regression/not regression actually comes from LocalTransport.__init__

It used to use a local function that would *not* call realpath because of how we did/didn't want to follow symlinks. (I believe at the time we didn't want it to follow so that 'bzr add link' would work).

I think the appropriate way to handle it is to have get_transport() not call realpath, but to have the Branch.__init__ (or BzrDir.__init__()) renormalize it's url.

The other problem, though, is that we detect a BzrDir by taking: path + '/.bzr/branch-format'

And that will work for a symlink to a directory. We may want to change the open_containing functions to stat the target, or some other extra hackery. (WT.open_containing() could certainly consider stat'ing the target file/directory/symlink).

Arguably, it could be different for WT.open_containing() (which we've defined *must* be local) and Branch.open_containing() (which may be a URL).

Just a thought.

Martin Pool (mbp)
Changed in bzr:
importance: Critical → High
Martin Pool (mbp)
Changed in bzr:
status: Triaged → Confirmed
Revision history for this message
Martin Pool (mbp) wrote :

This is now, I would say, fixed by the fix for bug 128562. At any rate it does not error.

'bzr log link' now tells you about the history of modifications to the link. 'bzr log link/.' or probably 'log -d link' will tell you about the branch it points to, if any.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.