Stacking is not fully transitive

Bug #715000 reported by William Grant on 2011-02-08
10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Bazaar
Critical
Martin Pool
2.2
Critical
Martin Pool
2.3
Critical
Martin Pool
Launchpad itself
Critical
William Grant
Ubuntu Distributed Development
High
canonical-bazaar
bzr (Ubuntu)
Undecided
Unassigned
Maverick
Undecided
Unassigned
Natty
High
Jelmer Vernooij

Bug Description

Stacking is not transitive in all cases. For example, bzr log -n0 will fail if a revision is only present two levels down.

 - bzr init foo
 - cd foo
 - bzr ci --unchanged -m "foo"
 - bzr push --stacked-on=. ../bar
 - bzr push --stacked-on=../bar ../baz
 - bzr log -n0 ../baz

BOOM like http://launchpadlibrarian.net/63776527/1qgzM86qVTkcJblo3eTosECDEF3.txt

Related branches

James Westby (james-w) wrote :

Seems to be Branch.iter_merge_sorted_revisions that is common to both here,
so may just be limited to that one code path.

Thanks,

James

James Westby (james-w) wrote :

Trying with lp:ubuntu/lucid/ssss I see

471 known_graph = self.repository.get_known_graph_ancestry(
472 [last_revision])

which internally fails to find last_revision in either of the _fallback_vfs that it checks.

I suspect this bug occurs when the tip revision is in a stacked-on repository. It may
happen when any revision in the history is stored in a stacked-on repo, I'm not
sure yet.

Thanks,

James

Andrew Bennetts (spiv) on 2011-02-08
Changed in bzr:
status: New → Confirmed
assignee: nobody → canonical-bazaar (canonical-bazaar)
importance: Undecided → High
James Westby (james-w) wrote :

Hi,

There is a chain of stacking

  A -> B -> C

(-> means stacked on).

We open A, which gets B as a fallback, which causes B to be opened which gets
C as a fallback.

We then try to get the known_graph of A, and we reach this code (groupcompress.py)

   def get_known_graph_ancestry(self, keys):
        """Get a KnownGraph instance with the ancestry of keys."""
        # Note that this is identical to
        # KnitVersionedFiles.get_known_graph_ancestry, but they don't share
        # ancestry.
        parent_map, missing_keys = self._index.find_ancestry(keys)
        for fallback in self._fallback_vfs:
            if not missing_keys:
                break
            (f_parent_map, f_missing_keys) = fallback._index.find_ancestry(
                                                missing_keys)
            parent_map.update(f_parent_map)
            missing_keys = f_missing_keys

So we are inside A.revisions, and we don't find the revision in our index, so we
look in the fallback, B.revisions, which doesn't have it either. If we looked in A.revisions
we would find it, but we don't.

Rob says:

<james_w> I would have assumed that most of this code didn't recurse at all call-sites, and just collected all the fallbacks when opening in to _fallback_vfs
<lifeless> each repo is mutable
<lifeless> so open is the wrong time
<lifeless> flattening at execution is probably appropriate
<lifeless> that or recursing via the public api

I am not going to work on this any further, so someone else should take it over
to stop Launchpad from OOPSing.

Thanks,

James

Andrew Bennetts (spiv) wrote :

This is causing Launchpad to OOPS when scanning branches generated by the package importer. It seems likely the package importer and/or users of those branches may encounter similar issues too. wgrant suggests the backlog of unscannable branches may become a problem for Launchpad too.

So I'm tentatively bumping this up to Critical.

Changed in bzr:
importance: High → Critical
tags: added: stacking udd
Martin Pool (mbp) on 2011-02-08
Changed in bzr:
assignee: canonical-bazaar (canonical-bazaar) → Martin Pool (mbp)
status: Confirmed → In Progress
Martin Pool (mbp) wrote :

Thanks for localizing it, James.

I think bug 571962 is similar, or maybe a dupe.

I have a passing test.

I think I understand #3 which is that although that function looks like it's iterating across all the fallbacks, it doesn't actually have them all in the list yet. That's a bit of a trap and there may well be other occurrences.

Martin Pool (mbp) wrote :

It looks like get_parent_map and _get_parent_map_with_sources could have a similar bug. Perhaps adding a function to get the full expansion, then checking which callers should be updated to use this seems better. Rewriting every function to recurse through the public interface seems to have more danger of causing performance problems.

Doing so seems to fix my test for it, assuming it's representative.

If we're comfortable with this fix, we should also grep for _fallback_vfs and update them, and perhaps even rename that to something like _immediate_fallbacks that won't encourage people to make the same mistake.

John A Meinel (jameinel) wrote :

Just to mention fetch already has code for tracking through fallbacks, as it was the first place we encountered it.

What we have to watch out for is whether operations can happen remotely or not. Imagine this case:

 A stacked on B stacked on C

A = local branch and repository
B = Launchpad branch B
C = Launchpad branch C

The issue is that the local client has something like:

RepoA is a GroupCompressRepo with _fallback_repository = [RemoteRepoB]
RemoteRepoB is a RemoteRepository with _fallback_repository = [RemoteRepoC]
RemoteRepoC is a RemoteRepository with _fallback_repository = []

However, on the server side, we have:

RepoB is a GroupCompressRepo with _fallback_repository = []
RepoC is a GroupCompressRepo with _fallback_repository = []

Fetch is currently special, because it is evaluated in the server side. So client-side fetch had to be updated, since server side repositories don't see their fallbacks.

Note that VersionedFile objects get an equivalent VF for fallbacks.

I'm also pretty sure that at least James didn't realize that
RepoA._fallback_repositories != [RemoteRepoB, RemoteRepoC]

On 9 February 2011 08:32, John A Meinel <email address hidden> wrote:
> I'm also pretty sure that at least James didn't realize that
> RepoA._fallback_repositories != [RemoteRepoB, RemoteRepoC]

I did not realize that either. I think it's kind of a dangerous
interface that way.

Robert Collins (lifeless) wrote :

On Wed, Feb 9, 2011 at 11:23 AM, Martin Pool <email address hidden> wrote:
> On 9 February 2011 08:32, John A Meinel <email address hidden> wrote:
>> I'm also pretty sure that at least James didn't realize that
>> RepoA._fallback_repositories != [RemoteRepoB, RemoteRepoC]
>
> I did not realize that either.  I think it's kind of a dangerous
> interface that way.

Its the immediate fallbacks; they can have their own - and a repo can
have N immediate fallbacks because of shared repositories with stacked
branches in the repo.

Flattening it is possible, but you want to make sure you still have
the fallback repositories know their own fallbacks so they can
maintain their own invariants about inventories, revisions etc. The
downside if you flatten it then is that you need two apis - a
non-recursive one and a recursive one, or you'll end up handling
fallbacks twice - once in the top level, and then once in the first
level. Of course this can still happen when you have several direct
fallbacks, but thats very uncommon today.

-Rob

Martin Pool (mbp) wrote :

On 9 February 2011 10:00, Robert Collins <email address hidden> wrote:
> On Wed, Feb 9, 2011 at 11:23 AM, Martin Pool <email address hidden> wrote:
>> On 9 February 2011 08:32, John A Meinel <email address hidden> wrote:
>>> I'm also pretty sure that at least James didn't realize that
>>> RepoA._fallback_repositories != [RemoteRepoB, RemoteRepoC]
>>
>> I did not realize that either.  I think it's kind of a dangerous
>> interface that way.
>
> Its the immediate fallbacks; they can have their own - and a repo can
> have N immediate fallbacks because of shared repositories with stacked
> branches in the repo.

Sure. I would just change it to be called _immediate_fallbacks or
something so that people don't misinterpret it.

> Flattening it is possible, but you want to make sure you still have
> the fallback repositories know their own fallbacks so they can
> maintain their own invariants about inventories, revisions etc. The
> downside if you flatten it then is that you need two apis - a
> non-recursive one and a recursive one, or you'll end up handling
> fallbacks twice - once in the top level, and then once in the first
> level. Of course this can still happen when you have several direct
> fallbacks, but thats very uncommon today.

It seems like either recursing or flattening them can get into
inefficiency. At the moment flattening it from a method call at the
moment we want to know the list seems easiest.

If you get the chance to review
<https://code.launchpad.net/~mbp/bzr/715000-stacking/+merge/48886> I'd
appreciate it.
If you have a chance I'd like to see your review of the patch.

William Grant (wgrant) on 2011-02-09
Changed in launchpad:
assignee: nobody → William Grant (wgrant)
status: New → In Progress
William Grant (wgrant) on 2011-02-09
Changed in launchpad:
importance: Undecided → Critical
Martin Pool (mbp) wrote :

I looked for other cases and couldn't find any. We can make the api a bit safer by renaming it: https://code.edge.launchpad.net/~mbp/bzr/715000-more-fallbacks/+merge/49025

This is now waiting for deployment to launchpad, probably in the next 24h.

Changed in bzr:
status: In Progress → Fix Released
Launchpad QA Bot (lpqabot) wrote :
Changed in launchpad:
milestone: none → 11.03
tags: added: qa-needstesting
Changed in launchpad:
status: In Progress → Fix Committed
Martin Pool (mbp) wrote :

I'm adding a task against udd for deploying a fixed bzr to the importer. It may or may not actually be necessary.

Changed in udd:
status: New → Confirmed
importance: Undecided → High
assignee: nobody → canonical-bazaar (canonical-bazaar)
William Grant (wgrant) on 2011-02-10
tags: added: qa-ok
removed: qa-needstesting
Martin Pool (mbp) wrote :

bug 716892 for monitoring to catch similar failures.

William Grant (wgrant) on 2011-02-11
Changed in launchpad:
status: Fix Committed → Fix Released
John A Meinel (jameinel) wrote :

So are we still just waiting on this being deployed in order to consider it finished?

John A Meinel (jameinel) wrote :

This landed in bzr-2.3 on revision 5622. So it should be included in whatever bzr-2.3.2 release is made.

Jelmer Vernooij (jelmer) on 2011-06-08
Changed in udd:
status: Confirmed → Fix Released
Changed in bzr (Ubuntu):
status: New → Fix Released
Changed in bzr (Ubuntu Natty):
status: New → In Progress
importance: Undecided → High
assignee: nobody → Jelmer Vernooij (jelmer)

Accepted bzr into natty-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

Changed in bzr (Ubuntu Natty):
status: In Progress → Fix Committed
tags: added: verification-needed
Martin Pool (mbp) wrote :

Hi pitti, thanks for taking this into natty-proposed.

There is a regression in 2.3.3 which should block the SRU, bug 786980. We will fix that and then try again with 2.3.4.

tags: added: sru
tags: added: verification-failed
removed: verification-needed
Martin Pool (mbp) wrote :

(I've marked this verification-failed to abort the SRU; this particular bug is actually fine afaik.)

Clint Byrum (clint-fewbar) wrote :

Hello William, or anyone else affected,

Accepted bzr into natty-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

tags: removed: verification-failed
tags: added: verification-needed
Jelmer Vernooij (jelmer) wrote :

Verified by running the bzr testsuite from the package in a clean natty install.

tags: added: verification-done
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package bzr - 2.3.4-0ubuntu1

---------------
bzr (2.3.4-0ubuntu1) natty-proposed; urgency=low

  * New upstream release.
   + Fix bzr version number in deprecation warnings. LP: #794960
   + Prevent write attemps on remote branch during "bzr up". LP: #786980
   + Fix conflict handling when two trees involved in a merge have different
     root ids. LP: #805809

bzr (2.3.3-0ubuntu1) natty-proposed; urgency=low

  * New upstream release.
   + Fixes deprecation warning on newer versions of Python. LP: #760435
   + Stops 'bzr push' from copying entire repository if a .bzr directory is
     present without a branch. LP: #465517
   + Fixes undefined local variable error when waiting for lock. LP: #733136
   + Fixes lock contention issues pushing to a bound branch. LP: #733350
   + Transfers less data creating a new stacked branch. LP: #737234
   + Several fixes to the test suite, making it more robust. LP: #654733,
      LP: #751824
   + 'bzr merge --pull --preview' actually shows a preview rather than
     actually merging. LP: #760152
   + bzr smart server now supports UTF-8 user names. LP: #659763
   + user identity can now be set based on username and /etc/mailname, not
     requiring it to be set manually. LP: #616878
   + stacking is now fully transitive. LP: #715000
   + makes in-terminal crash report of plugins much shorter. LP: #716389
 -- Jelmer Vernooij <email address hidden> Thu, 14 Jul 2011 21:12:58 +0200

Changed in bzr (Ubuntu Natty):
status: Fix Committed → Fix Released
Jelmer Vernooij (jelmer) on 2011-09-27
tags: removed: verification-done
Martin Pitt (pitti) wrote :

Is there really much point in fixing this in maverick now? It has lived most of its live with this bug, and will go EOL in two months.

Martin Pitt (pitti) on 2012-02-17
Changed in bzr (Ubuntu Maverick):
status: New → Won't Fix
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers