Ubuntu publisher is taking more than an hour to complete

Bug #884649 reported by Julian Edwards
This bug affects 1 person
Affects: Launchpad itself
Status: Fix Released
Importance: Critical
Assigned to: Jeroen T. Vermeulen

Bug Description

Due to the recent domination changes that allow arch-all binaries to persist until their arch-any counterparts get dominated, the domination part of the publisher now takes 30 minutes or more, which pushes the whole run over 1 hour.

Changed in launchpad:
status: New → Triaged
importance: Undecided → Critical
tags: added: lp-soyuz soyuz-publish
Changed in launchpad:
status: Triaged → In Progress
assignee: nobody → Jeroen T. Vermeulen (jtv)
Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :

See lib/lp/archivepublisher/domination.py for background, and especially the comment in dominateBinaries.

Some notes about our planned approach follow, from IRC (times in UTC):

(11:54:23) jtv:
But the big change I'm hoping for is that we might find that the first binary domination pass can simply keep all arch-all pubs alive.

(11:54:55) bigjools:
That's how I originally did it.

(11:55:47) jtv:
Doesn't work?

(11:56:43) bigjools:
The two corner cases mentioned in the comment ...
The arch-all pubs were never getting dominated.

(12:03:59) jtv:
Looking at the comment, I see why the second pass is needed. Accepting that we need the second pass anyway, there's nothing per se against further increasing the need for it. Which means: legroom for optimization!

About the "keep arch-all bpphs alive" thing: are you saying you tried that with, or without the double domination?

(12:09:18) bigjools:
Without.
Well — 2 sets of queries
rather than a full domination run
so it did arch-any first then arch-all
but that has the schizo problem.

(12:15:30) jtv:
Quite.

(12:15:53) jtv:
So that means that _with_ double domination, this might just work.

Dominate twice, but on the first pass, don't supersede arch-all at all.
(So consider them, but keep them live)
I wonder if we could do the first pass for all architectures, before doing the second pass.
Because that way, we get to supersede all non-live arch-specific pubs before we even start looking at the other-publications-from-same-source.

The second pass could group by SPRs. There'd be only one getOtherPublicationsForSameSource for each.

(12:20:44) bigjools:
Isn't this what we do right now?

(12:20:59) jtv:
Slightly different loop nesting.
Right now we loop over DASes, and for each, do 2 domination passes.

(12:21:15) bigjools:
Actually one way to speed it is to get the list of sources that had arch-all binaries that we left live.

(12:21:23) jtv:
Indeed.
Should be very easy to collect that information in the per-package domination loop.
That's a nice touch.
In fact, it eliminates a whole lot of work that I thought we were going to need!

I guess for the second pass, the algorithm could be something like:
Keep the latest version alive. Keep the remaining arch-specific versions alive. Keep any arch-indep versions alive if there are still arch-specific pubs for their SPR.
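
The second-pass rule just described could be sketched roughly as follows. This is only an illustration, not Launchpad's code: the `Pub` record and `second_pass_live` function are hypothetical names, the publications are assumed to arrive sorted newest version first, and the set of SPRs that still have live arch-specific publications is assumed to have been worked out beforehand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pub:
    """Hypothetical stand-in for one live binary publication (BPPH)."""
    spr: str             # source package release that produced it
    version: str
    arch_specific: bool


def second_pass_live(pubs, sprs_with_live_arch_specific):
    """Return the publications of one package that stay live.

    Assumptions: `pubs` is sorted newest version first, and
    `sprs_with_live_arch_specific` holds the SPRs that still have
    arch-specific publications live somewhere.
    """
    live = []
    for index, pub in enumerate(pubs):
        if index == 0:
            live.append(pub)   # the latest version always stays
        elif pub.arch_specific:
            live.append(pub)   # arch-specific pubs are left alone here
        elif pub.spr in sprs_with_live_arch_specific:
            live.append(pub)   # arch-indep, still needed by its SPR
    return live
```

Everything not returned would be superseded. Note that this sketch ignores the "pre-seed the dominant" subtlety raised later in the conversation.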

“What's the difference with what we do now?” I hear you ask.

(12:26:32) bigjools:
:)

(12:26:57) jtv:
The difference is that there's no need to dominate arch-specific BPPHs at all in that pass.

(12:27:06) bigjools:
Indeed.

(12:27:48) jtv:
AFAICS we can pass just the arch-specific BPPHs to the inner domination method.

Och aye, no we can't quite do that.

Because if the latest version is arch-specific, we'd end up deleting the last arch-all version instead of superseding it.
(Unless we allow the caller to "pre-seed" the dominant)

Anyway, that's going into needless detail.

First I'll apply some loop fission to separate the two passes into separate DAS loops.

Then I add code to collect the SPRs-with-live-arch-indep-BPPHs.

Then I rearrange the second-pass loop to iterate over those, with an inner loop that iterates DAS.
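
As a rough illustration of those three steps (not the actual Launchpad code), the restructured loops might look like this. `dominate_first_pass` and `dominate_second_pass` are hypothetical callables standing in for the real per-DAS domination steps; the first is assumed to return the SPRs whose arch-indep publications it left live.

```python
def dominate_binaries(das_list, dominate_first_pass, dominate_second_pass):
    """Run binary domination as two separate loops over DASes.

    Both callables are hypothetical placeholders:
    dominate_first_pass(das) dominates arch-specific pubs, keeps every
    arch-indep pub live, and returns the SPRs of those arch-indep pubs.
    dominate_second_pass(spr, das) reconsiders one SPR's arch-indep pubs.
    """
    sprs_with_live_arch_indep = set()

    # Pass 1 (after loop fission): finish every architecture first.
    for das in das_list:
        sprs_with_live_arch_indep.update(dominate_first_pass(das))

    # Pass 2: outer loop over the collected SPRs, inner loop over DASes.
    for spr in sorted(sprs_with_live_arch_indep):
        for das in das_list:
            dominate_second_pass(spr, das)

    return sprs_with_live_arch_indep
```

The point of the fission is that every arch-specific publication has already been dealt with by the time the second loop starts looking at arch-indep ones.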

(12:33:15)...

Revision history for this message
Launchpad QA Bot (lpqabot) wrote :
tags: added: qa-needstesting
Changed in launchpad:
status: In Progress → Fix Committed
Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :

Q/A is looking good. Processed Oneiric's Release pocket in about 10 minutes on dogfood, 4 of which are for “superseded processing” and 6 for domination proper. Full domination for Ubuntu (this is not a busy server) completed in 11 minutes, 4 seconds.

Two Oneiric publications are kept alive for arch-independent package lsb-release-udeb: 6.4ubuntu4 and 6.4ubuntu6 (there appears to be no 6.4ubuntu5 on dogfood). The double live publications survived both domination passes, on all architectures. The reason: 6.4ubuntu6 failed to build on all architectures except i386. So the arch-specific builds for those must have stayed live, and that kept the arch-indep build alive (on all architectures). This is exactly the problem case that we were most worried about, and the dominator seems to have dealt with it in exactly the way we were hoping for.

Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :

By the way, an update to the plan outlined in IRC:

 * We no longer need to make the second binary-domination pass iterate by source package. The cache I added in branch-4 obviates it.

 * There's probably still a bit of object-by-object querying while populating that cache (BPR→BPB) that we could speed up further through bulk-loading.

 * We probably also ought to bulk-fetch BPN.

 * Raphaël's suggestion could do wonders for speeding up _sortPackages.
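
The bulk-loading idea in the second and third points can be illustrated with a generic sketch. It does not use Launchpad's or Storm's real API: `bulk_load` is a made-up helper, and `fetch_many` is a hypothetical callable assumed to return every requested row in a single query.

```python
def bulk_load(ids, fetch_many, cache):
    """Return the objects for `ids`, filling `cache` with one bulk fetch.

    fetch_many(ids) is a hypothetical callable that returns (id, obj)
    pairs for all requested ids at once, for example through a single
    "WHERE id IN (...)" query instead of one query per object.
    """
    # Only ask the database for ids we have not seen before.
    missing = sorted(set(ids).difference(cache))
    if missing:
        cache.update(fetch_many(missing))
    return [cache[i] for i in ids]
```

With a helper along these lines, looking up N related objects costs at most one query rather than N.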

tags: added: qa-ok
removed: qa-needstesting
Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :

The Q/A done just now includes all branches attached to this bug so far:
 * bug-884649-branch-1
 * getBinariesForDomination-bulk
 * bug-884649-branch-2
 * bug-884649-branch-3
 * bug-884649-branch-4

(Tested revision 14260).

Revision history for this message
Launchpad QA Bot (lpqabot) wrote :
tags: added: qa-needstesting
removed: qa-ok
tags: added: qa-ok
removed: qa-needstesting
Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :

Whoops: that was the Q/A tagger catching up, unaware that dogfood Q/A has run far ahead of staging rollouts.

Once revision 14256 gets rolled out, we can watch the numbers and see if bug-884649-branch-5 is worth the trouble.

Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :

Ubuntu domination has gone from over half an hour to well under 2 minutes. We probably won't be needing branch-5; at any rate it turns out that the [BS]PPH.{binary,source}packagename columns (which the branch uses) have not been fully populated yet.

Changed in launchpad:
status: Fix Committed → Fix Released