Incremental push to stacked branch on LP copies all tagged revisions

Bug #745664 reported by Andrew Bennetts
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Bazaar
Invalid
Medium
John A Meinel

Bug Description

$ time bzr push
Using saved push location: lp:~spiv/twisted/deferred-no-failure-tracebacks-by-default
BzrBranch7(file:///home/andrew/code/twisted/deferred-no-failure-tracebacks-by-default/) was read locked again
CHKInventoryRepository('file:///home/andrew/code/twisted/.bzr/repository/') was read locked again
Transferred: 16152kB (30.7kB/s r:166kB w:15986kB)
bzr: interrupted
HPSS calls: 513 (0 vfs) SmartSSHClientMedium(bzr+ssh://<email address hidden>/)

real 8m45.832s
user 0m15.869s
sys 0m0.264s

[got fed up, cancelled it, broke the lock]

$ time ~/warthogs/bzr/bzr-2.3/bzr push
Using saved push location: lp:~spiv/twisted/deferred-no-failure-tracebacks-by-default
BzrBranch7(file:///home/andrew/code/twisted/deferred-no-failure-tracebacks-by-default/) was read locked again
CHKInventoryRepository('file:///home/andrew/code/twisted/.bzr/repository/') was read locked again
Pushed up to revision 15786.
Transferred: 56kB (3.8kB/s r:27kB w:29kB)
HPSS calls: 19 (0 vfs) SmartSSHClientMedium(bzr+ssh://<email address hidden>/)

real 0m17.792s
user 0m0.376s
sys 0m0.048s

This has happened to me a couple of times now pushing to lp:~spiv/twisted/*

It appears to be a regression in trunk vs. 2.3

Revision history for this message
John A Meinel (jameinel) wrote : Re: [Bug 745664] [NEW] Incremental push to stacked branch on LP walks all of history

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 3/30/2011 2:28 PM, Andrew Bennetts wrote:
> Public bug reported:
>
> $ time bzr push
> Using saved push location: lp:~spiv/twisted/deferred-no-failure-tracebacks-by-default
> BzrBranch7(file:///home/andrew/code/twisted/deferred-no-failure-tracebacks-by-default/) was read locked again
> CHKInventoryRepository('file:///home/andrew/code/twisted/.bzr/repository/') was read locked again
> Transferred: 16152kB (30.7kB/s r:166kB w:15986kB)
> bzr: interrupted
> HPSS calls: 513 (0 vfs) SmartSSHClientMedium(bzr+ssh://<email address hidden>/)
>
> real 8m45.832s
> user 0m15.869s
> sys 0m0.264s
>
> [got fed up, cancelled it, broke the lock]
>
> $ time ~/warthogs/bzr/bzr-2.3/bzr push
> Using saved push location: lp:~spiv/twisted/deferred-no-failure-tracebacks-by-default
> BzrBranch7(file:///home/andrew/code/twisted/deferred-no-failure-tracebacks-by-default/) was read locked again
> CHKInventoryRepository('file:///home/andrew/code/twisted/.bzr/repository/') was read locked again
> Pushed up to revision 15786.
> Transferred: 56kB (3.8kB/s r:27kB w:29kB)
> HPSS calls: 19 (0 vfs) SmartSSHClientMedium(bzr+ssh://<email address hidden>/)
>
> real 0m17.792s
> user 0m0.376s
> sys 0m0.048s
>
>
> This has happened to me a couple of times now pushing to lp:~spiv/twisted/*
>
> It appears to be a regression in trunk vs. 2.3

Did you get a chance to look into this at all? I'll try to take over it
from now, since it is certainly a significant regression if true.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk2UKSkACgkQJdeBCYSNAAOtGgCgsKFwN+HWBsS+Di7JL1EU8anC
YykAn28YwojVg7wsdPRc+2Hapn1soC5C
=U7nM
-----END PGP SIGNATURE-----

John A Meinel (jameinel)
Changed in bzr:
assignee: nobody → John A Meinel (jameinel)
Revision history for this message
John A Meinel (jameinel) wrote : Re: Incremental push to stacked branch on LP walks all of history

I just tested with bzr.dev, and this is my bzr.log for the second transfer. I did:

bzr commit --unchanged -m "no change"
bzr push lp:~jameinel/bzr/test-commits
bzr commit --unchanged -m "no change2"
bzr push lp:~jameinel/bzr/test-commits

Now there are a few things to note:

1) We are transferring quite a bit of data. 1.6MB for a single revision (with no text content)
  I think this is the 'push always pushes parent CHK pages, even if they are already present' bug.
2) We have a really weird 'missing-basis' return value:
131.083 result: ('missing-basis', 'll32:603f6cad9b67e5fd3900118914a380d9ell11:inventories46:<email address hidden>:inventories63:<email address hidden>:inventories54:<email address hidden>:inventories54:<email address hidden>:inventories57:<email address hidden>:inventories50:<email address hidden>:inventories50:<email address hidden>:inventories50:<email address hidden>:inventories50:<email address hidden>:inventories50:<email address hidden>:inventories50:<email address hidden>:inventories50:<email address hidden>:inventories50:<email address hidden>:inventories50:<email address hidden>:inventories50:<email address hidden>:inventories50:<email address hidden>:inventories50:<email address hidden>:inventories50:<email address hidden>:inventories50:<email address hidden>:inventories50:<email address hidden>:inventories50:<email address hidden>:inventories50:<email address hidden>:inventories50:<email address hidden>:inventories50:<email address hidden>:inventories50:<email address hidden>:inventories50:<email address hidden>:inventories50:<email address hidden>:inventories50:<email address hidden>:inventories50:<email address hidden>')

For a commit that doesn't introduce anything, which has a trivial parent, why would it think we are missing 29 basis inventories. Also, that final content seems to account for: 1,265,025 or 1.2M of the 1.6MB we upload.

3) I do see a surprising amount of parent walking going on, though it doesn't appear to be walking the whole ancestry. Then again, the very first key in the 'give me the parents of these revisions' is '<email address hidden>' which is a 4 year old revision. That seems really surprising.

Revision history for this message
John A Meinel (jameinel) wrote :

Ah, I bet it is the Tags being fetched. Which is also causing us to try to insert the basis inventories for all of those tags. *sigh*.

Revision history for this message
John A Meinel (jameinel) wrote :

Confirmed. If I open the stacked branch, and do this:
>>> k = repo.revisions._index.keys()
>>> len(k)
168

That indicates we have uploaded 168 revisions to the stacked repository.
>>> t = b.tags.get_tag_dict()
>>> len(t)
151

A little odd that we uploaded 168 revisions for 151 tags + 2 actual commits. What is even weirder is if I do:
>>> rev_ids = [k[0] for k in repo.revisions._index.keys()]
>>> tag_rev_ids = set(b.tags.get_reverse_tag_dict())
>>> extra = set(rev_ids) - tag_rev_ids
>>> len(extra)
125

So it isn't that we uploaded all tags and a couple of extra revisions, we uploaded mostly extra revisions.
>>> list(extra)[:10]
['<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>']

Something *really* weird is going on here.

Revision history for this message
John A Meinel (jameinel) wrote :

note that the third push seems to be fine, and doesn't get the big "missing-basis" entry.

Revision history for this message
John A Meinel (jameinel) wrote :

For completeness, here is the first push.

Revision history for this message
John A Meinel (jameinel) wrote :

So looking at "first-push" it appears that we *don't* try to copy the tags, and thus we don't get any missing basis inventories.

"second-push" we know think that there is an existing target branch, and it needs to have its tags properly filled, and for every tag if we have the matching revision we push it. This then causes us to be missing the basis inventory for each of those revisions. So we are asked to push all of that up as well.

"third-push" from then on we've already copied up all the tag revisions and their basis inventories, so everyone is happy.

I think we really need to think closely about how we want the tag thing to interact with stacking, because right now it is interacting very poorly. Think of things like "emacs" where you have 1129 tags. You really don't want to have to push >1k revisions just to push up your one interesting revision.

Also, if tags are copying the revisions, I'm pretty sure that means it is copying up all of the chk bytes and *text* files that are referenced uniquely by those revisions.

I'm pretty sure we *don't* want to push all tags to a stacked branch. Arguably they should exist either in the target branch or the stacked-on branch. But we have to be *really* careful that it doesn't get super expensive to check this on every push because "oh hey look, we have tags, do you need any of them?" doesn't sound nice.

Revision history for this message
Andrew Bennetts (spiv) wrote : Re: [Bug 745664] Re: Incremental push to stacked branch on LP walks all of history

John A Meinel wrote:
> So looking at "first-push" it appears that we *don't* try to copy the
> tags, and thus we don't get any missing basis inventories.

That's a bug; we should be pushing tagged revisions on the first push
(if we push them at all).

[…]
> I think we really need to think closely about how we want the tag thing
> to interact with stacking, because right now it is interacting very
> poorly. Think of things like "emacs" where you have 1129 tags. You
> really don't want to have to push >1k revisions just to push up your one
> interesting revision.

Hmm, that does sound expensive. But if we don't push them, then they
won't be there when someone does 'bzr branch' on that pushed branch. I
agree there's little benefit in pushing a revision to a stacked
repository if it is found in its stacked-on repository. Is that really
happening? (I think that's what you're saying, and it sounds like a
plausible bug, I just want to be sure I'm understanding correctly.)

I think that's probably a fixable problem. Perhaps the fetch code
doesn't distinguish well enough between “revs I want present in the
stacked repo” and “revs that I want present in the repo any of the
fallbacks”.

If that's the issue, and we fix it, then for branches hosted on
Launchpad, if the development focus has our tags (and the revs for those
tags), then pushing should still be pretty cheap, at least in principle.

Note that pushing up stacked branches outside of Launchpad is not a
common scenario. Most people use shared repositories, where the cost of
pushing up tagged revs on the first push is going to be negligible
compared to pushing up the rest of history.

> Also, if tags are copying the revisions, I'm pretty sure that means it
> is copying up all of the chk bytes and *text* files that are referenced
> uniquely by those revisions.

Yes.

> I'm pretty sure we *don't* want to push all tags to a stacked branch.
> Arguably they should exist either in the target branch or the stacked-on
> branch. But we have to be *really* careful that it doesn't get super
> expensive to check this on every push because "oh hey look, we have
> tags, do you need any of them?" doesn't sound nice.

I don't think two get_parent_map calls are going to be too expensive.
We can look at extending the HPSS verbs to allow a get_parent_map call
that is given a list of fallback repos (as relpaths in that server, not
URLs) to try at the same time, to reduce it to one roundtrip to check
a branch and its fallback(s) if they are on the one host.

John A Meinel (jameinel)
summary: - Incremental push to stacked branch on LP walks all of history
+ Incremental push to stacked branch on LP copies all tagged revisions
John A Meinel (jameinel)
Changed in bzr:
status: Confirmed → In Progress
Revision history for this message
John A Meinel (jameinel) wrote :

I might have a slightly better handle on this. The bug might be not that we copy all tagged revisions, but that we copy all tagged revisions that aren't in the ancestry. So if you have:

 A
 |\
 B C (C-tag)
 |
 D (Base tip)
 |
 E (Stacked tip)

The Graph search will have C and E as tips that we want to fetch if we can, and it will automatically set a StopRevision of D because that will show up as 'already present'. However, C is not in the ancestry of D. So while it *is* present in the base repository, it wasn't in the search graph.

I could be wrong, but I might as well mention what I've gotten to so far. I'm having trouble reproducing this in a test case, because when I use:

 A (A-tag)
 |
 B (base-tip)
 |
 B (stacked tip)

It correctly didn't copy A-tag into the stacked branch. Which may be why we see a lot of extra content for bzr (because we have 60/151 tags aren't in the ancestry), but we may not see as much for other projects.

Revision history for this message
John A Meinel (jameinel) wrote :

I think I've tracked all this down, and it looks like it is very bzr specific.

If I track all the way into _walk_to_common_revisions, I end up with

> c:\users\jameinel\dev\bzr\work\bzrlib\repository.py(3433)_walk_to_common_revisions()
-> have_revs = have_revs.union(null_set)
(Pdb) pp len(next_revs)
147
(Pdb) pp len(have_revs)
88

So including tags, we have ~147 tips that we want to fetch. ~bzr-pqm/bzr/bzr.dev has 88 of those, but is still missing 59. New stacked branches are pushing up all 59 revisions (that I have), but not all 147 tips that they could be pushing up.

It actually looks like we are doing what we want, and we just need to get a push of bzr.dev into bzr.dev from someone who has all of these tagged revisions which aren't in the ancestry. Then our stacked-on branch will be populated, and we'll stop pushing up all the extra data.

Revision history for this message
John A Meinel (jameinel) wrote :

This is a new behavior, but it is the desired behavior in 2.4. We will copy up all tags into the stacked location that are not present on the stacked-on location. It just happens that bzr has *lots* of tags that are not in the ancestry of tip, which means the default fetching never copied their history. So we have *lots* of tags that look new on every stacked branch.

Changed in bzr:
importance: Critical → Medium
status: In Progress → Invalid
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.