Comment 8 for bug 718478

Paul Sokolovsky (pfalcon) wrote : Re: [Bug 718478] Re: Mirror service per-host configs and timeouts to alleviate upstream downtimes

On Mon, 06 Jun 2011 18:55:30 -0000
James Westby <email address hidden> wrote:

> On Mon, 06 Jun 2011 17:01:59 -0000, Paul Sokolovsky
> <email address hidden> wrote:
> > But let's first consider the situation we used to have. It's a fact
> > that upstream git servers can be overloaded/down, even for
> > longer than 12hrs. Potentially, during any such outage Google can
> > make a code drop (a pretty realistic scenario actually - Google did
> > a code drop and the servers got DDOSed). So, would we want, in case
> > of upstream server non-availability, to not build anything at all,
> > on the basis that there's a possibility that in a place far, far
> > away new code has landed that we don't have?
>
> No, I don't know where you get the idea that I am suggesting that.

Well, I'm not saying you do. I just wanted to draw the extremes, to find
a good place in between where the system can function sustainably.

> We
> need to design a robust system that gives the possibility to have
> quick turnaround when needed. We used to have a non-robust system with
> quick-turnaround. We now have a robust system with slow turnaround.
>

[]

> > For daily builds of branches, we'd normally just have a 12hrs
> > average delay, the same as for the builds themselves. But here's an
> > idea for how to improve that: following the previous patch, also add
> > "soft_stale" and "hard_stale" settings. An upstream synced less than
> > soft_stale time ago won't be synced at all. After that, a sync will
> > be attempted, but it's OK for it to fail without affecting a build.
> > After hard_stale time has passed, a failed sync will fail the build.
> > So, for android.git we could set soft_stale=2hrs and
> > hard_stale=24hrs and be pretty good.
>
> That sounds like a useful part of a solution to me, provided that the
> sync is atomic and so a failed sync after the soft_stale period
> doesn't corrupt the repos.

Can you elaborate on this? If by corrupt you mean an inconsistent state,
then the picture may not be that bright - git pull is of course atomic,
but repo just pulls git subtrees one by one, so it can be the case that
one subtree is updated while another is not.
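
To make the concern concrete, here is a rough model of a subtree-by-subtree
sync. It is not how repo is actually implemented; sync_subtrees, is_mixed
and the pull callback are hypothetical names, and a failed pull is assumed
to raise OSError. It only shows how each pull can be atomic while the
mirror as a whole ends up half-updated.

```python
def sync_subtrees(subtrees, pull):
    """Pull each subtree in turn; return (updated, failed) name lists.

    `pull` is a hypothetical callable that updates one subtree atomically
    and raises OSError if the upstream cannot be reached.
    """
    updated, failed = [], []
    for name in subtrees:
        try:
            pull(name)
            updated.append(name)
        except OSError:
            # This subtree keeps its old state, earlier ones are already new.
            failed.append(name)
    return updated, failed


def is_mixed(updated, failed):
    """True if some subtrees moved forward while others stayed behind."""
    return bool(updated) and bool(failed)
```

A caller could use is_mixed() to decide whether the mirror is safe to
build from after a failed soft sync.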

>
> > Finally, for real-time developers' builds, we could indeed provide
> > a script at first, and later a frontend UI, to request an
> > unconditional sync.
>
> This would be a "request sync" step to force a sync?
>
> Why is it a separate step? The developer would then have to request a
> sync and then wait until it was complete before submitting their build
> to ensure that the build used the code that they wanted.
>
> Could it just be a part of the build config which is translated to
> extra info passed to the mirror service to force an override?

Well, it's a matter of concept separation. So far, what's in the build
config affects just that build. Mirror control, on the other hand, is
on another level, and may affect other builds.

Use case:

1. The mirror has synced, and the upstream host goes down soon after.
2. During the soft_stale period, all builds would still succeed.
3. But one developer decides to force a mirror sync (essentially
by removing the last sync timestamps).
4. As the upstream host is down, the sync doesn't succeed.
5. The developer's build thus fails.
6. But every other build after this one will also fail (whereas
otherwise there would be a fail-free soft_stale period).
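
The scenario above can be played through with a small sketch of the
soft_stale/hard_stale policy. This is only an illustration under stated
assumptions: decide() and build_succeeds() are hypothetical names, times
are in seconds, and a removed timestamp is assumed to be treated as
"infinitely stale", which is what makes a forced sync mandatory.

```python
import enum
import math

HOUR = 3600  # seconds


class Action(enum.Enum):
    SKIP = "skip"            # fresh enough, upstream is not contacted
    TRY_SYNC = "try_sync"    # sync is attempted, failure is tolerated
    MUST_SYNC = "must_sync"  # sync is attempted, failure fails the build


def decide(age, soft_stale, hard_stale):
    """Map the age of the last successful sync to a sync action."""
    if age < soft_stale:
        return Action.SKIP
    if age < hard_stale:
        return Action.TRY_SYNC
    return Action.MUST_SYNC


def build_succeeds(last_sync, now, upstream_up,
                   soft_stale=2 * HOUR, hard_stale=24 * HOUR):
    """Return True if the mirror step would let the build proceed."""
    # Assumption: a missing timestamp (forced sync) has infinite age.
    age = math.inf if last_sync is None else now - last_sync
    action = decide(age, soft_stale, hard_stale)
    if action is Action.SKIP or upstream_up:
        return True
    # Upstream is down: only a soft (optional) sync may fail harmlessly.
    return action is Action.TRY_SYNC
```

With the upstream down one hour after a sync,
build_succeeds(last_sync=0, now=HOUR, upstream_up=False) lets the build
through (steps 1-2), while build_succeeds(last_sync=None, now=HOUR,
upstream_up=False) does not, and keeps failing for every later build
until a sync succeeds (steps 3-6).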

Taking into account that build config changes are persistent (vs
one-off), I envisioned the "force sync" feature as a separate, special
frontend action available only to build admins (like the ability to
create official builds). But if you think that controlling it at the
build config level is useful and the scenario above isn't problematic,
it can be done there.

And we can still start by just providing a couple of scripts on
android-build.linaro.org, so people with access can SSH in and run:

force-mirror-sync - the next mirror service request will force an update
of the mirrored repos (impl: remove the last sync timestamps for all
hosts)

pretend-mirror-sync - mark all hosts as if they were just synced, a
remedy for prolonged upstream host unavailability (impl: touch the last
sync timestamps)
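
Both scripts could be very small. A minimal sketch, assuming the mirror
service keeps one timestamp file per upstream host in a single directory;
the "<host>.last_sync" naming and the function names are made up for
illustration, not the actual layout:

```python
from pathlib import Path

STAMP_SUFFIX = ".last_sync"  # hypothetical naming of timestamp files


def force_mirror_sync(stamp_dir):
    """Remove all last sync timestamps so the next request must sync.

    Returns the names of the removed timestamp files.
    """
    removed = []
    for stamp in sorted(Path(stamp_dir).glob("*" + STAMP_SUFFIX)):
        stamp.unlink()
        removed.append(stamp.name)
    return removed


def pretend_mirror_sync(stamp_dir, hosts):
    """Touch the timestamp of each host, as if it had just been synced."""
    for host in hosts:
        # touch() creates the file if missing, else bumps its mtime to now.
        (Path(stamp_dir) / (host + STAMP_SUFFIX)).touch()
```

pretend_mirror_sync() takes the host list explicitly so that it also works
right after force_mirror_sync() has removed every timestamp.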

>
> Thanks,
>
> James
>

--
Best Regards,
Paul