See lib/lp/archivepublisher/domination.py for background, and especially the comment in dominateBinaries. Some notes about our planned approach follow, from IRC (times in UTC):

(11:54:23) jtv: But the big change I'm hoping for is that we might find that the first binary domination pass can simply keep all arch-all pubs alive.
(11:54:55) bigjools: That's how I originally did it.
(11:55:47) jtv: Doesn't work?
(11:56:43) bigjools: The two corner cases mentioned in the comment ... The arch-all pubs were never getting dominated.
(12:03:59) jtv: Looking at the comment, I see why the second pass is needed. Accepting that we need the second pass anyway, there's nothing per se against further increasing the need for it. Which means: legroom for optimization! About the "keep arch-all bpphs alive" thing: are you saying you tried that with, or without the double domination?
(12:09:18) bigjools: Without. Well, 2 sets of queries rather than a full domination run, so it did arch-any first then arch-all, but that has the schizo problem.
(12:15:30) jtv: Quite.
(12:15:53) jtv: So that means that _with_ double domination, this might just work. Dominate twice, but on the first pass, don't supersede arch-all at all. (So consider them, but keep them live.) I wonder if we could do the first pass for all architectures, before doing the second pass. Because that way, we get to supersede all non-live arch-specific pubs before we even start looking at the other-publications-from-same-source. The second pass could group by SPRs. There'd be only one getOtherPublicationsForSameSource for each.
(12:20:44) bigjools: Isn't this what we do right now?
(12:20:59) jtv: Slightly different loop nesting. Right now we loop over DASes, and for each, do 2 domination passes.
(12:21:15) bigjools: Actually one way to speed it is to get the list of sources that had arch-all binaries that we left live.
(12:21:23) jtv: Indeed. Should be very easy to collect that information in the per-package domination loop.
That's a nice touch. In fact, it eliminates a whole lot of work that I thought we were going to need! I guess for the second pass, the algorithm could be something like:
  - Keep the latest version alive.
  - Keep the remaining arch-specific versions alive.
  - Keep any arch-indep versions alive if there are still arch-specific pubs for their SPR.
“What's the difference with what we do now?” I hear you ask.
(12:26:32) bigjools: :)
(12:26:57) jtv: The difference is that there's no need to dominate arch-specific BPPHs at all in that pass.
(12:27:06) bigjools: Indeed.
(12:27:48) jtv: AFAICS we can pass just the arch-specific BPPHs to the inner domination method. Och aye, no we can't quite do that. Because if the latest version is arch-specific, we'd end up deleting the last arch-all version instead of superseding it. (Unless we allow the caller to "pre-seed" the dominant.) Anyway, that's going into needless detail. First I'll apply some loop fission to separate the two passes into separate DAS loops. Then I add code to collect the SPRs-with-live-arch-indep-BPPHs. Then I rearrange the second-pass loop to iterate over those, with an inner loop that iterates DAS.
(12:33:15) bigjools: Nice.
(12:33:37) jtv: And somewhere along the way, I cut computations that duplicate the outer loops out of the domination method. In particular, the call to getPublicationsForSameSource goes into the outer second-pass loop. Its results get re-used for every architecture. (Thus we query it once for every SPR, apply the result to each architecture, then move on to the next SPR.)
(12:36:47) bigjools: ok
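
The loop structure discussed above can be sketched roughly as follows. This is only an illustration of the plan, not the code in domination.py: the Pub class, the function names, and the integer versions are all hypothetical simplifications (real publications are database objects, and versions need Debian version ordering). The sketch also assumes that the SPRs worth collecting are those whose non-latest arch-all pubs were left live by the first pass.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Pub:
    """Hypothetical stand-in for a binary publication (BPPH)."""
    package: str
    version: int      # simplification: real code compares Debian versions
    das: str          # distro arch series name
    spr: str          # source package release that built this binary
    arch_indep: bool  # True for arch-all binaries
    live: bool = True


def live_groups(pubs):
    """Group live pubs by (DAS, binary package name)."""
    groups = defaultdict(list)
    for pub in pubs:
        if pub.live:
            groups[(pub.das, pub.package)].append(pub)
    return groups


def first_pass(pubs):
    """Supersede stale arch-specific pubs in every DAS; spare arch-all ones.

    Returns the SPRs whose stale arch-all pubs were left live.
    """
    spared_sprs = set()
    for group in live_groups(pubs).values():
        latest = max(pub.version for pub in group)
        for pub in group:
            if pub.version == latest:
                continue  # the dominant publication always stays live
            if pub.arch_indep:
                spared_sprs.add(pub.spr)  # considered, but kept live
            else:
                pub.live = False
    return spared_sprs


def second_pass(pubs, spared_sprs):
    """Revisit only the SPRs collected by first_pass."""
    latest = {
        key: max(pub.version for pub in group)
        for key, group in live_groups(pubs).items()
    }
    for spr in spared_sprs:
        # One lookup per SPR; the answer is reused for every architecture.
        if any(p.live and not p.arch_indep and p.spr == spr for p in pubs):
            continue  # arch-specific pubs still need these arch-all pubs
        for pub in pubs:  # this scan covers every DAS
            stale = pub.version != latest.get((pub.das, pub.package))
            if pub.live and pub.arch_indep and pub.spr == spr and stale:
                pub.live = False


def dominate(pubs):
    """Run the first pass over all architectures, then the second pass."""
    second_pass(pubs, first_pass(pubs))
```

In this sketch, an old arch-all pub survives as long as any architecture still carries a live arch-specific pub from the same SPR, and is superseded on all architectures together once the last one goes away. The second pass never touches arch-specific pubs.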