error reporting is entirely absent

Bug #724096 reported by Michael Hudson-Doyle
This bug affects 1 person
Affects: Linaro Android Mirror
Status: Fix Released
Importance: High
Assigned to: Paul Sokolovsky
Milestone: (none listed)

Bug Description

When a mirror fails, this should be somehow reported to the process on the receiving end of the mirror, at least including which host failed to mirror properly.

Changed in linaro-android-mirror:
assignee: nobody → Paul Sokolovsky (pfalcon)
status: New → In Progress
Paul Sokolovsky (pfalcon) wrote :

Well, I guess we should certainly log such errors to the mirror service logs, to see what kinds of errors we get and how often. Btw, even from local runs it's clear that android.git.kernel.org is pretty overloaded and from time to time times out or resets the connection. Moreover, I saw cases where git did report errors, but repo still finished with a 0 exit code.

The other question is whether we should report such failures to the XMLRPC caller. If we do, the build will fail (a non-deterministic failure). On the other hand, if we don't, we may get a build from a stale codebase, which is of course worse. A compromise might be to auto-retry repo sync failures, say, 3 times, and then return a failure. This way we'll minimize non-deterministic failures but won't have spurious false positives (builds that succeed on a stale codebase).
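
A minimal sketch of the retry compromise described above, assuming the sync is run as an external command and judged only by its exit code. The helper name `run_with_retries` and its parameters are hypothetical, not taken from the mirror service. Because the exit code alone is known to be unreliable for repo, this would only be one part of a real check.

```python
import subprocess
import time


def run_with_retries(cmd, attempts=3, delay=0.0):
    """Run cmd up to `attempts` times.

    Returns (succeeded, exit_codes), where exit_codes lists the exit
    code of every attempt that was made.
    """
    exit_codes = []
    for attempt in range(1, attempts + 1):
        result = subprocess.run(cmd, capture_output=True, text=True)
        exit_codes.append(result.returncode)
        if result.returncode == 0:
            return True, exit_codes
        # Wait before the next attempt, but not after the last one.
        if attempt < attempts:
            time.sleep(delay)
    return False, exit_codes
```

A caller might use `run_with_retries(["repo", "sync"], attempts=3, delay=30)` and report a failure to the XMLRPC caller only when the first element of the result is false.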

Still, my first patch is just for capturing and logging results of repo invocations.
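
As an illustration only (this is not the actual patch), capturing and logging the output of an invocation could look roughly like the sketch below. The logger name `mirror` and the function `run_and_log` are hypothetical. The scan for lines starting with `error:` or `fatal:` is a heuristic based on the prefixes git puts on its own failure messages; it is meant to catch the case where git reports an error but the wrapping command still exits with 0.

```python
import logging
import subprocess

log = logging.getLogger("mirror")  # hypothetical logger name


def run_and_log(cmd):
    """Run cmd, log every output line, and return True only if it looks clean."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # interleave stderr with stdout
        text=True,
    )
    error_lines = []
    for line in result.stdout.splitlines():
        log.info("%s: %s", cmd[0], line)
        # Heuristic: git prefixes its failures with "error:" or "fatal:".
        if line.lstrip().startswith(("error:", "fatal:")):
            error_lines.append(line.strip())
    if result.returncode != 0 or error_lines:
        log.error(
            "%s failed: exit code %d, %d error line(s)",
            cmd[0], result.returncode, len(error_lines),
        )
        return False
    return True
```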

Michael Hudson-Doyle (mwhudson) wrote : Re: [Bug 724096] Re: error reporting is entirely absent

On Mon, 18 Apr 2011 12:55:37 -0000, Paul Sokolovsky <email address hidden> wrote:
> Well, I guess we for sure should log such errors to mirror service logs,
> to see what kind of errors and how often we get. Btw, even from local
> runs it's clear that android.git.kernel.org is pretty overloaded and
> from time to time times out or resets connection.

Yes. This seems to get better and worse on a week-to-week sort of
timescale. It's been OK for the last month or so :)

> Moreover, I saw cases when git did report errors, but repo still
> finished with 0 exit code.

Well, this is just another reason not to run repo on the server I guess.

> Other question is if we should report such failures to XMLRPC caller. If
> we do, then build will fail (non-deterministic failure).

I think a mirror fail should fail the build -- if we weren't mirroring,
and the remote server barfed, the build would fail.

> On the other hand, if we don't, we may have a build from a stale
> codebase which is of course worse. Compromise might be to have
> auto-retrying for repo sync failures, say, 3 times, and then return a
> failure. This way we'll minimize non-deterministic failures but won't
> have spurious false positives due to stale codebase.

I think we should do the simple thing and fail the build to start with,
and see if something more determined turns out to be needed.

Cheers,
mwh

Changed in linaro-android-mirror:
importance: Undecided → Low
status: In Progress → Triaged
Changed in linaro-android-mirror:
importance: Low → High
Paul Sokolovsky (pfalcon) wrote :

OK, right at LDS we had a build failure case (described in lp:781651) which was pretty hard to debug exactly because there was no error reporting from the mirror service: the build continued, only to fail later with a confusing error. So Michael is right that a mirror failure should lead to a build failure. I have already made and deployed the corresponding changes. They can be seen in action at https://android-build.linaro.org/jenkins/job/linaro-android_leb-panda/36/console for example. The problem so far is that only a generic error is propagated, so from the build logs it's not clear what the underlying cause was (in the case above it was a glitch on Google servers). This is a Twisted limitation, and I'm preparing a patch to get around it.
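
To illustrate what propagating the underlying cause could mean at the XML-RPC level, here is a rough sketch using Python's standard `xmlrpc.client` module rather than Twisted, so it does not show how the actual service or the planned patch works. The fault code `MIRROR_FAILED` and both helper functions are hypothetical. The idea is simply to put the failing host and the tail of the command output into the fault string, so that the build log shows more than a generic error.

```python
import xmlrpc.client

MIRROR_FAILED = 1  # hypothetical, application-defined fault code


def mirror_fault(host, exit_code, output, tail_lines=5):
    """Server side: build a Fault naming the failed host and the last output lines."""
    tail = "\n".join(output.splitlines()[-tail_lines:])
    message = "mirror of %s failed (exit code %d):\n%s" % (host, exit_code, tail)
    return xmlrpc.client.Fault(MIRROR_FAILED, message)


def describe_failure(response_xml):
    """Caller side: return the fault string of a response, or None on success."""
    try:
        xmlrpc.client.loads(response_xml)
    except xmlrpc.client.Fault as fault:
        return fault.faultString
    return None
```

On the build side, a non-`None` result from `describe_failure` would be printed to the console log and the build marked as failed.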

Changed in linaro-android-mirror:
status: Triaged → In Progress
Alexander Sack (asac) wrote :

OK, from what I understand, git/repo sync etc. can fail on the "slave" side and on the "mirror" side.

I would think:

 1. error on "slave" must make the build fail! If not, that's a bug (e.g. if repo sync exits 0 -> bug!)

 2. error during "mirror" update might be ok. So, let's still go ahead and mark such builds in the UI with a warning like "might have used outdated repos: <FAILED-TO-SYNC-REPO-LIST>".

 3. if "mirror" initial sync failed, a hard fail happens and it is properly displayed in UI that this was a mirror fail.

 4. log all mirror errors and review regularly.

Paul Sokolovsky (pfalcon) wrote :

Hello Alexander,

On Sun, 22 May 2011 17:43:19 -0000
Alexander Sack <email address hidden> wrote:

> 2. error during "mirror" update might be ok.

I used to think that...

> So, let's still go ahead
> and mark such builds in UI with a warning like "might have used
> outdated repos: <FAILED-TO-SYNC-REPO-LIST>".

... until I spent a day at LDS banging my head over why the build says that
some branch doesn't exist while it is in the repo. So no, all errors should
be errors; otherwise there will still be errors, but much more obscure
and hard to debug.

--
Best Regards,
Paul

Paul Sokolovsky (pfalcon) wrote :

The change which propagates mirror errors to the build side was deployed a week ago and has proven quite helpful in diagnosing build problems, and useful for a wider audience than just the build system maintainer. We uncovered a few more specific issues, and now have a better understanding of the bottlenecks in the system. All in all, we now have pretty good error reporting, so I'm closing this bug.

Changed in linaro-android-mirror:
status: In Progress → Fix Released