Comment 5 for bug 718478

Paul Sokolovsky (pfalcon) wrote : Re: Mirror service per-host configs and timeouts to alleviate upstream downtimes

Hello James,

On Fri, 03 Jun 2011 17:51:08 -0400
James Westby <email address hidden> wrote:

> I wasn't suggesting that we block this change, just start the
> discussion around a more complete solution.

Ok, sounds good. I deployed those changes in the meantime; they seem to
work nicely, and, well, android.git.kernel.org is back up. I'm also
cc:ing lp:718478 so the discussion is captured.

> On Fri, 3 Jun 2011 23:40:35 +0300, Paul Sokolovsky
> <email address hidden> wrote:
> > What exact issues do you see here? These changes would allow us to
> > always sync with trees we directly develop on (git.linaro.org, maybe
> > some vendor trees later), while sync more lazily with 3rd party
> > trees, of which we don't even use master branch, but some release
> > branch/tag which updates infrequently on their own.
> >
> > Just in case, Alexander on IRC expressed desire for the following
> > schedule: always sync with git.linaro.org, for other hosts, can sync
> > like twice a day.
>
> Well, that's the exact problem that I see. Google does another code
> drop and we have a choice of waiting 12 hours until we can build it,
> or doing some manual intervention?

Well, as Alexander pointed out, 12hrs is probably not that much, but I
agree that a good solution should minimize the delay automagically. I
have ideas on that (below).

But let's first consider the situation we used to have. It's a fact that
upstream git servers can be overloaded or down, even for longer than
12hrs. Potentially, Google could make a code drop during any such outage
(a pretty realistic scenario, actually: Google did a code drop and the
servers got DDoSed). So, when an upstream server is unavailable, would
we want to not build anything at all, on the basis that there's a
possibility that in a place far, far away some new code has landed
that we don't have?

I guess that's a worse alternative than still being able to build what
we have, especially when what we already have is exactly what we need.
After all, we added the mirroring service to minimize extra-cloud
traffic, but it brings us extras, such as also improving our HA.

Now let's consider what the risks are. Building stale code would be a
problem for release builds, but release builds should really only be
built from tags. So, we either have that tag and can build it, or we
don't have it and can't (this relies on a good upstream tagging policy,
like not moving tags).

For daily builds of branches, we'd normally just have a 12hrs average
delay, the same as for the builds themselves. But here's an idea how to
improve that: following the previous patch, also add "soft_stale" and
"hard_stale" settings. An upstream synced less than soft_stale ago
won't be synced at all. After that, a sync will be attempted, but it's
ok for it to fail w/o affecting the build. Once hard_stale has passed, a
failed sync will fail the build. So, for android.git we could set
soft_stale=2hrs and hard_stale=24hrs and be pretty good.
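
To make the soft_stale/hard_stale idea concrete, here is a rough Python
sketch of the decision logic. It is only an illustration: the names
decide_sync, prepare_build and SyncAction, and the "sync" callable, are
made up for this example and are not part of the mirror service code.
Treating an age of exactly hard_stale as "required" is also just my
assumption.

```python
from datetime import datetime, timedelta
from enum import Enum


class SyncAction(Enum):
    SKIP = "skip"                # mirror is fresh enough, don't contact upstream
    BEST_EFFORT = "best_effort"  # try to sync; a failure doesn't affect the build
    REQUIRED = "required"        # try to sync; a failure fails the build


def decide_sync(last_sync, now, soft_stale, hard_stale):
    """Pick the sync policy for one upstream host from the age of its mirror."""
    age = now - last_sync
    if age < soft_stale:
        return SyncAction.SKIP
    if age < hard_stale:
        return SyncAction.BEST_EFFORT
    return SyncAction.REQUIRED


def prepare_build(last_sync, now, soft_stale, hard_stale, sync):
    """Return True if the build may proceed.

    `sync` is a hypothetical callable that attempts the upstream sync
    and returns True on success, False on failure.
    """
    action = decide_sync(last_sync, now, soft_stale, hard_stale)
    if action is SyncAction.SKIP:
        return True
    ok = bool(sync())
    # A failed sync only blocks the build once hard_stale has passed.
    return ok or action is SyncAction.BEST_EFFORT
```

With soft_stale=2hrs and hard_stale=24hrs, a mirror synced 5 hours ago
gets a best-effort sync and the build proceeds even if upstream is down,
while a mirror last synced 30 hours ago fails the build if the sync
fails.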

Finally, for developers' real-time builds, we could indeed provide a
script at first, and later a frontend UI, to request an unconditional
sync.

How does that sound?

--
Best Regards,
Paul