Name resolution failure on gate job

Bug #1270382 reported by Ildiko Vancsa on 2014-01-18
This bug affects 10 people
Affects                        Status        Importance  Assigned to     Milestone
OpenStack Core Infrastructure  Fix Released  Medium      James E. Blair
OpenStack-Gate                 Fix Released  Medium      James E. Blair
devstack                       Invalid       Undecided   Unassigned

Bug Description

Tests failed in the gate with a "Temporary failure in name resolution" error:

2014-01-17 21:59:06.598 | Started by user anonymous
2014-01-17 21:59:06.606 | [EnvInject] - Loading node environment variables.
2014-01-17 21:59:09.804 | Building remotely on bare-precise-rax-ord-1118767 in workspace /home/jenkins/workspace/gate-ceilometer-pep8
2014-01-17 21:59:27.318 | [gate-ceilometer-pep8] $ /bin/bash -xe /tmp/hudson754625762436948936.sh
2014-01-17 21:59:28.122 | + /usr/local/jenkins/slave_scripts/gerrit-git-prep.sh https://review.openstack.org http://zuul.openstack.org git://git.openstack.org
2014-01-17 21:59:28.127 | Triggered by: https://review.openstack.org/62158
2014-01-17 21:59:28.133 | + [[ ! -e .git ]]
2014-01-17 21:59:28.134 | + ls -a
2014-01-17 21:59:28.135 | .
2014-01-17 21:59:28.136 | ..
2014-01-17 21:59:28.137 | + rm -fr '.[^.]*' '*'
2014-01-17 21:59:28.138 | + '[' -d /opt/git/openstack/ceilometer/.git ']'
2014-01-17 21:59:28.139 | + git clone file:///opt/git/openstack/ceilometer .
2014-01-17 21:59:28.152 | Cloning into '.'...
2014-01-17 21:59:29.009 | + git remote set-url origin git://git.openstack.org/openstack/ceilometer
2014-01-17 21:59:29.014 | + git remote update
2014-01-17 21:59:29.019 | Fetching origin
2014-01-17 22:04:09.302 | fatal: unable to connect to git.openstack.org:
2014-01-17 22:04:09.303 | git.openstack.org: Temporary failure in name resolution
2014-01-17 22:04:09.303 |
2014-01-17 22:04:09.313 | error: Could not fetch origin
2014-01-17 22:04:09.314 | + echo 'The remote update failed, so garbage collecting before trying again.'
2014-01-17 22:04:09.315 | The remote update failed, so garbage collecting before trying again.
2014-01-17 22:04:09.315 | + git gc
2014-01-17 22:04:09.657 | + git remote update
2014-01-17 22:04:09.664 | Fetching origin
2014-01-17 22:08:49.955 | fatal: unable to connect to git.openstack.org:
2014-01-17 22:08:49.955 | git.openstack.org: Temporary failure in name resolution
2014-01-17 22:08:49.955 |
2014-01-17 22:08:49.957 | error: Could not fetch origin
2014-01-17 22:08:50.197 | Build step 'Execute shell' marked build as failure
2014-01-17 22:08:51.045 | [SCP] Copying console log.
2014-01-17 22:08:51.053 | Finished: FAILURE

Joe Gordon (jogo) wrote :

logstash query: message:"git.openstack.org: Temporary failure in name resolution" AND filename:"console.html"

20 hits in logstash

Reviewed: https://review.openstack.org/68280
Committed: https://git.openstack.org/cgit/openstack-infra/elastic-recheck/commit/?id=d8e4e1ba0078b9b50575793bbc8344e97c5853e7
Submitter: Jenkins
Branch: master

commit d8e4e1ba0078b9b50575793bbc8344e97c5853e7
Author: Joe Gordon <email address hidden>
Date: Tue Jan 21 14:40:46 2014 -0800

    Add fingerprint for bug 1270382

    bug 1270382 is a infra bug

    Change-Id: I040b007634cff3d2ce21c6cf85c008167f385901
    Related-Bug: #1270382

The above logstash query does not cover failures like the following:

2014-01-16 19:23:47.779 | Downloading/unpacking dnspython>=1.9.4 (from -r /home/jenkins/workspace/gate-swift-docs/requirements.txt (line 1))
2014-01-16 19:23:47.779 | Error <urlopen error [Errno -3] Temporary failure in name resolution> while getting http://pypi.openstack.org/openstack/dnspython/dnspython-1.11.1.zip (from http://pypi.openstack.org/openstack/dnspython/)
2014-01-16 19:23:47.779 | Cleaning up...

See: http://logs.openstack.org/07/66407/2/gate/gate-swift-docs/6b1df29/console.html.gz

It would seem we should just search for message:"Temporary failure in name resolution".
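To illustrate the difference between the two queries, here is a minimal sketch that applies both phrases as plain substring matches to console lines quoted in this report. The helper name `find_matches` and the constant names are hypothetical, and a substring test only approximates how logstash evaluates a quoted phrase.

```python
# Hypothetical helper for illustration; not part of elastic-recheck.
BROAD_PHRASE = "Temporary failure in name resolution"
NARROW_PHRASE = "git.openstack.org: " + BROAD_PHRASE


def find_matches(console_lines, phrase):
    """Return the console lines that contain the given phrase."""
    return [line for line in console_lines if phrase in line]


# Sample lines copied from the console logs quoted in this report.
SAMPLE_LINES = [
    "2014-01-17 22:04:09.303 | git.openstack.org: "
    "Temporary failure in name resolution",
    "2014-01-16 19:23:47.779 | Error <urlopen error [Errno -3] "
    "Temporary failure in name resolution> while getting "
    "http://pypi.openstack.org/openstack/dnspython/dnspython-1.11.1.zip",
    "2014-01-16 19:23:47.779 | Cleaning up...",
]

narrow_hits = find_matches(SAMPLE_LINES, NARROW_PHRASE)
broad_hits = find_matches(SAMPLE_LINES, BROAD_PHRASE)
```

On these samples the narrow phrase matches only the git fetch failure, while the broad phrase also picks up the pip download failure.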

Reviewed: https://review.openstack.org/68961
Committed: https://git.openstack.org/cgit/openstack-infra/elastic-recheck/commit/?id=314486a36aa3e1ff46518b323bfee8cb52c8ea5b
Submitter: Jenkins
Branch: master

commit 314486a36aa3e1ff46518b323bfee8cb52c8ea5b
Author: Peter Portante <email address hidden>
Date: Fri Jan 24 11:41:38 2014 -0500

    Expand DNS failures to catch all "errno" string output

    Related bug 1270382

    Change-Id: I6a958d0575fb0b5ece0bdc3b3ce50990615b5557

James E. Blair (corvus) on 2014-02-04
Changed in openstack-ci:
status: New → Incomplete
importance: Undecided → Medium

Jeremy Stanley (fungi) wrote :

This seems to be mainly affecting slaves in Rackspace (ironic, since that's where the DNS record in question is hosted). My best guess is that their resolvers get overloaded from time to time.

One possible workaround would be to prime a local resolver cache on each slave by looking up, with multiple retries, the name of each host we need to communicate with during normal job runs.
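A minimal sketch of that priming step, assuming the slave already runs a local caching resolver so that a successful lookup stays cached for later job steps. The function name `prime_resolver`, its parameters, and the retry counts are hypothetical and not taken from any infra script.

```python
import socket
import time


def prime_resolver(hostnames, attempts=5, delay=1.0,
                   resolve=socket.getaddrinfo):
    """Look up each hostname, retrying on resolution errors.

    Returns a dict mapping each hostname to True if any attempt
    succeeded and False otherwise. The `resolve` parameter exists only
    so the lookup can be replaced in tests (an assumption of this sketch).
    """
    results = {}
    for host in hostnames:
        results[host] = False
        for attempt in range(attempts):
            try:
                # A successful lookup populates the local cache, if any.
                resolve(host, None)
                results[host] = True
                break
            except socket.gaierror:
                # Covers "Temporary failure in name resolution" (EAI_AGAIN).
                if attempt < attempts - 1:
                    time.sleep(delay)
    return results
```

A node setup script could call `prime_resolver(["git.openstack.org", "pypi.openstack.org"])` before the job starts and refuse to run the job if any value in the result is False.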

Jeremy Stanley (fungi) on 2014-03-11
Changed in openstack-ci:
status: Incomplete → Confirmed
milestone: none → icehouse

Mark McLoughlin (markmc) wrote :

Sounds like there isn't any devstack work needed here.

Changed in devstack:
status: New → Invalid

Clark Boylan (cboylan) wrote :

The underlying issue was that Rackspace would blacklist IPs that queried its DNS resolvers too heavily. When those IPs were recycled and assigned to our slaves, the blacklist entries were still in place and prevented name resolution. To correct this, we have installed unbound as a local caching, forwarding resolver on all of our slaves. Instead of talking to the Rackspace and HP Cloud DNS servers, the slaves now talk to Google's DNS servers.

Changed in openstack-ci:
assignee: nobody → James E. Blair (corvus)
status: Confirmed → Fix Released

Craig Bryant (craig-bryant) wrote :

My job still failed in the gate on 2014-07-16:

https://review.openstack.org/#/c/107181/

http://logs.openstack.org/81/107181/8/gate/gate-cookbook-monasca-thresh-chef-unit/b1b0322/console.html

2014-07-16 22:01:43.289 | Started by user anonymous
2014-07-16 22:01:43.292 | Building remotely on bare-precise-rax-dfw-928290 in workspace /home/jenkins/workspace/gate-cookbook-monasca-thresh-chef-unit
2014-07-16 22:01:50.040 | [gate-cookbook-monasca-thresh-chef-unit] $ /bin/bash /tmp/hudson4648257686961539642.sh
2014-07-16 22:01:52.009 | [gate-cookbook-monasca-thresh-chef-unit] $ /bin/bash -xe /tmp/hudson4842354825482043824.sh
2014-07-16 22:01:52.015 | + /usr/local/jenkins/slave_scripts/gerrit-git-prep.sh https://review.openstack.org git://git.openstack.org
2014-07-16 22:01:52.213 | Triggered by: https://review.openstack.org/107181
2014-07-16 22:01:52.213 | + [[ ! -e .git ]]
2014-07-16 22:01:52.213 | + ls -a
2014-07-16 22:01:52.216 | .
2014-07-16 22:01:52.216 | ..
2014-07-16 22:01:52.216 | + rm -fr '.[^.]*' '*'
2014-07-16 22:01:52.219 | + '[' -d /opt/git/stackforge/cookbook-monasca-thresh/.git ']'
2014-07-16 22:01:52.344 | + git clone file:///opt/git/stackforge/cookbook-monasca-thresh .
2014-07-16 22:01:53.462 | Cloning into '.'...
2014-07-16 22:02:03.988 | + git remote set-url origin git://git.openstack.org/stackforge/cookbook-monasca-thresh
2014-07-16 22:02:04.075 | + git remote update
2014-07-16 22:02:04.368 | Fetching origin
2014-07-16 22:02:04.380 | fatal: unable to connect to git.openstack.org:
2014-07-16 22:02:04.380 | git.openstack.org: Temporary failure in name resolution
2014-07-16 22:02:04.380 |
2014-07-16 22:02:04.381 | error: Could not fetch origin
2014-07-16 22:02:04.381 | + echo 'The remote update failed, so garbage collecting before trying again.'
2014-07-16 22:02:04.381 | The remote update failed, so garbage collecting before trying again.
2014-07-16 22:02:04.382 | + git gc
2014-07-16 22:02:20.297 | + git remote update
2014-07-16 22:02:20.303 | Fetching origin
2014-07-16 22:02:20.308 | fatal: unable to connect to git.openstack.org:
2014-07-16 22:02:20.308 | git.openstack.org: Temporary failure in name resolution
2014-07-16 22:02:20.308 |
2014-07-16 22:02:20.308 | error: Could not fetch origin
2014-07-16 22:02:20.317 | Build step 'Execute shell' marked build as failure
2014-07-16 22:02:20.426 | [SCP] Copying console log.
2014-07-16 22:02:20.972 | [SCP] Trying to create /srv/static/logs/81/107181/8/gate
2014-07-16 22:02:20.988 | [SCP] Trying to create /srv/static/logs/81/107181/8/gate/gate-cookbook-monasca-thresh-chef-unit
2014-07-16 22:02:20.990 | [SCP] Trying to create /srv/static/logs/81/107181/8/gate/gate-cookbook-monasca-thresh-chef-unit/b1b0322
2014-07-16 22:02:21.004 | Finished: FAILURE

Jan Klare (j-klare) wrote :

I'm also running into the same error, but retriggering the job solved the problem for me (I know that this is not the solution ;) )

Jeremy Stanley (fungi) on 2014-12-04
Changed in openstack-gate:
status: New → Fix Released
importance: Undecided → Medium
assignee: nobody → James E. Blair (corvus)