communication failed - user timeout caused connection failure

Reported by Michael Nelson on 2010-09-24
This bug affects 1 person
Affects: Launchpad itself
Importance: High
Assigned to: Julian Edwards

Bug Description

A number of times over the past few days, people have complained that builds are being bounced from one builder to another. maxb reported issues this morning with a build of bzr. Checking the logs shows the builder dispatching fine, and then:

2010-09-24 07:00:49+0000 [-] <americium:http://americium.ppa:8221/> communication failed (User timeout caused connection failure.)

Checking the history for this builder shows that, although it's still taking on new builds, the last build to finish was over 3 hrs ago.

Looking for similar errors in the logs turns up quite a few buildds, shown below. I checked a few of these buildds' histories, and they all have a large gap of 3-4 hours where they didn't finish anything - as if none of them could communicate the results back.

2010-09-24 07:31:13+0000 [-] <actinium:http://actinium.ppa:8221/> communication failed (User timeout caused connection failure.)
2010-09-24 07:31:13+0000 [-] <actinium:http://actinium.ppa:8221/> failure (None)
2010-09-24 07:31:13+0000 [-] <einsteinium:http://einsteinium.ppa:8221/> communication failed (User timeout caused connection failure.)
2010-09-24 07:31:13+0000 [-] <einsteinium:http://einsteinium.ppa:8221/> failure (None)
2010-09-24 07:31:53+0000 [-] <cushaw:http://cushaw.buildd:8221/> communication failed (User timeout caused connection failure.)
2010-09-24 07:31:53+0000 [-] <cushaw:http://cushaw.buildd:8221/> failure (None)
2010-09-24 07:31:53+0000 [-] <hawthorn:http://hawthorn.buildd:8221/> communication failed (User timeout caused connection failure.)
2010-09-24 07:31:53+0000 [-] <hawthorn:http://hawthorn.buildd:8221/> failure (None)
2010-09-24 07:31:54+0000 [-] <allspice:http://allspice.buildd:8221/> communication failed (User timeout caused connection failure.)
2010-09-24 07:31:54+0000 [-] <allspice:http://allspice.buildd:8221/> failure (None)
2010-09-24 07:31:54+0000 [-] <adare:http://adare.buildd:8221/> communication failed (User timeout caused connection failure.)
2010-09-24 07:31:54+0000 [-] <adare:http://adare.buildd:8221/> failure (None)
2010-09-24 07:31:54+0000 [-] <gourd:http://gourd.buildd:8221/> communication failed (User timeout caused connection failure.)
2010-09-24 07:31:54+0000 [-] <gourd:http://gourd.buildd:8221/> failure (None)
2010-09-24 07:31:57+0000 [-] <palmer:http://palmer.buildd:8221/> communication failed (User timeout caused connection failure.)
2010-09-24 07:31:57+0000 [-] <palmer:http://palmer.buildd:8221/> failure (None)
2010-09-24 07:31:57+0000 [-] <genip:http://genip.buildd:8221/> communication failed (User timeout caused connection failure.)
2010-09-24 07:31:57+0000 [-] <genip:http://genip.buildd:8221/> failure (None)
2010-09-24 07:31:57+0000 [-] <crested:http://crested.buildd:8221/> communication failed (User timeout caused connection failure.)
2010-09-24 07:31:57+0000 [-] <crested:http://crested.buildd:8221/> failure (None)
2010-09-24 07:31:57+0000 [-] <plutonium:http://plutonium.ppa:8221/> communication failed (User timeout caused connection failure.)
2010-09-24 07:31:57+0000 [-] <plutonium:http://plutonium.ppa:8221/> failure (None)
2010-09-24 07:31:57+0000 [-] <nannyberry:http://nannyberry.ppa:8221/> communication failed (User timeout caused connection failure.)
2010-09-24 07:31:57+0000 [-] <nannyberry:http://nannyberry.ppa:8221/> failure (None)
2010-09-24 07:31:57+0000 [-] <mercury:http://mercury.ppa:8221/> communication failed (User timeout caused connection failure.)
2010-09-24 07:31:58+0000 [-] <mercury:http://mercury.ppa:8221/> failure (None)
2010-09-24 07:31:58+0000 [-] <lakoocha:http://lakoocha.ppa:8221/> communication failed (User timeout caused connection failure.)
2010-09-24 07:31:58+0000 [-] <lakoocha:http://lakoocha.ppa:8221/> failure (None)

etc.
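
For anyone repeating the log search above, here is a minimal sketch of how the affected builders could be pulled out of a buildd-manager log. It assumes the line shape seen in the excerpts above; the names `LINE_RE`, `failed_builders` and `sample` are made up for illustration and are not part of Launchpad.

```python
import re

# Assumed line shape, copied from the log excerpts in this report:
# 2010-09-24 07:31:13+0000 [-] <actinium:http://actinium.ppa:8221/> communication failed (...)
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\+0000 \[-\] "
    r"<(?P<name>[^:>]+):(?P<url>[^>]+)> communication failed \((?P<reason>.*)\)$"
)


def failed_builders(lines):
    """Map each builder name to the timestamps of its 'communication failed' lines."""
    failures = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            failures.setdefault(match.group("name"), []).append(match.group("ts"))
    return failures


# Three lines taken from the excerpt above; the 'failure (None)' line is skipped.
sample = [
    "2010-09-24 07:31:13+0000 [-] <actinium:http://actinium.ppa:8221/> communication failed (User timeout caused connection failure.)",
    "2010-09-24 07:31:13+0000 [-] <actinium:http://actinium.ppa:8221/> failure (None)",
    "2010-09-24 07:31:53+0000 [-] <cushaw:http://cushaw.buildd:8221/> communication failed (User timeout caused connection failure.)",
]

for name, stamps in failed_builders(sample).items():
    print(name, stamps)
```

The resulting builder names can then be checked against each builder's history to see whether they share the same 3-4 hour gap.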

description: updated
Michael Nelson (michael.nelson) wrote :

maxb witnessed the issue again:

10:32 < maxb> noodles775: shipova just ejected my build, it's now starting again on thorium
10:34 < maxb> noodles775: for the record, that build on shipova had most definitely started. It had been running for an hour, and was displaying build log output

Checking the log this time shows what seems to be a different issue - the builder being marked as not ok:

2010-09-24 08:30:53+0000 [-] shipova was made unavailable, resetting attached job

Michael Nelson (michael.nelson) wrote :

Ignore the previous comment - it seems shipova (and other builders) were pulled out of the buildfarm so it is correct. But the original communication error is still a mystery (that has resolved itself).

On Friday 24 September 2010 09:46:37 Michael Nelson wrote:
> Ignore the previous comment - it seems shipova (and other builders) were
> pulled out of the buildfarm so it is correct. But the original
> communication error is still a mystery (that has resolved itself).

I am hoping that Jelmer's changes, which I am cherry-picking (CPing) on
Monday, will alleviate this problem. We'll take it from there.

affects: launchpad-buildd → soyuz
Changed in soyuz:
status: New → Triaged
importance: Undecided → High
assignee: nobody → Julian Edwards (julian-edwards)
tags: added: buildd-manager
Changed in soyuz:
status: Triaged → Fix Released
milestone: none → 10.11